Binary matching is a powerful feature in Elixir. It is useful both for extracting information from binaries and for pattern matching in function heads. This article is a short overview of the options available when matching binaries, and it demonstrates a few common use cases.
Binary matching can be used by itself to extract information from binaries:
```elixir
iex> <<"Hello, ", place::binary>> = "Hello, World"
"Hello, World"
iex> place
"World"
```

Or as a part of function definitions to pattern match:
```elixir
defmodule ImageTyper do
  # The eight-byte signature at the start of every PNG file.
  @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
                   13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
  # JPEG files start with the two-byte SOI marker 0xFF 0xD8.
  @jpg_signature <<255::size(8), 216::size(8)>>

  def type(<<@png_signature, _rest::binary>>), do: :png
  def type(<<@jpg_signature, _rest::binary>>), do: :jpg
  def type(_), do: :unknown
end
```

There are 9 types used in binary matching:
- integer
- float
- bits (alias for bitstring)
- bitstring
- binary
- bytes (alias for binary)
- utf8
- utf16
- utf32
When no type is specified, the default is integer.
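The following sketch makes the list more concrete by matching a few of these types. It is written as plain Elixir rather than an IEx session, and results appear as comments. The input strings are arbitrary examples.

```elixir
# With no type given, a segment is an unsigned 8-bit integer.
<<a, b, rest::binary>> = "hello"
# a == 104 (?h), b == 101 (?e), rest == "llo"

# utf8 matches one whole UTF-8 encoded character, however many bytes it occupies.
<<first::utf8, tail::binary>> = "é!"
# first == 233 (the code point of "é"), tail == "!"

# bits (bitstring) can match data that is not a whole number of bytes.
<<flag::1, remaining::bits>> = <<0b10100000>>
# flag == 1, and remaining holds the other 7 bits
```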
The length of a segment, in bits, is its size multiplied by its unit. The unit is a number of bits, and the size is the number of units the segment spans. Each type has a default unit:
| Type | Default Unit |
|---|---|
| integer | 1 bit |
| float | 1 bit |
| binary | 8 bits |
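This arithmetic can be checked directly. In the sketch below, an integer segment with an explicit unit(8) and a binary segment that relies on its default unit of 8 both span whole bytes. A bitstring segment needs its size given in bits, because bitstring's default unit is 1 bit. The input values are illustrative.

```elixir
# size(2) * unit(8) = 16 bits, read as a big-endian unsigned integer.
<<n::size(2)-unit(8)>> = <<1, 0>>
# n == 256

# binary has a default unit of 8, so size(3) means 3 bytes (24 bits).
<<prefix::binary-size(3), rest::binary>> = "abcdef"
# prefix == "abc", rest == "def"

# bitstring has a default unit of 1, so the same 3 bytes need size(24).
<<same::bitstring-size(24), _::bitstring>> = "abcdef"
# same == "abc"
```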
Sizes for types are a bit more nuanced. The default size for integers is 8.
For floats, it is 64. For floats, size * unit must result in 32 or 64, corresponding to the IEEE 754 binary32 and binary64 formats, respectively (OTP 24 and later also accept 16, for binary16).
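The following sketch uses float segments with the default 64-bit size and with an explicit 32-bit size. The values 1.5 and 0.25 are chosen because both formats represent them exactly, so each round trip is lossless.

```elixir
# The default float size is 64 bits (binary64).
double = <<1.5::float>>
<<x::float>> = double
# x == 1.5, byte_size(double) == 8

# A 32-bit float (binary32) has to be requested explicitly.
single = <<0.25::float-size(32)>>
<<y::float-size(32)>> = single
# y == 0.25, byte_size(single) == 4
```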
For binaries, the default size is whatever remains of the binary being matched. Because of this, only the last binary segment in a pattern can rely on the default size. All others must have their size specified explicitly, even if the match would be unambiguous.
For example:
```elixir
iex> <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
** (CompileError): a binary field without size is only allowed at the end of a binary pattern
iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
"Frank the Walrus"
iex> {name, species}
{"Frank", "Walrus"}
```
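The size of a binary segment does not have to be a literal. It can also be a variable bound by an earlier segment in the same pattern. The sketch below assumes a hypothetical format in which a one-byte length precedes a name. The module `LengthPrefixed` and its `parse/1` function are illustrative names and do not come from any library.

```elixir
defmodule LengthPrefixed do
  # Hypothetical format: a 1-byte length, then that many bytes of name, then the rest.
  # The size of `name` comes from `len`, which the preceding segment binds.
  def parse(<<len::8, name::binary-size(len), rest::binary>>), do: {:ok, name, rest}

  # Empty input, or fewer bytes than the declared length.
  def parse(_other), do: :error
end

LengthPrefixed.parse(<<5, "Frank", " the Walrus">>)
# => {:ok, "Frank", " the Walrus"}
```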
Some types have associated modifiers that resolve ambiguity in how their bytes are interpreted. The following table lists the modifiers and the types they apply to:
| Modifier | Relevant Type(s) |
|---|---|
| signed | integer |
| unsigned (default) | integer |
| little | integer, float, utf16, utf32 |
| big (default) | integer, float, utf16, utf32 |
| native | integer, float, utf16, utf32 |
Integers can be signed or unsigned, defaulting to unsigned.
```elixir
iex> <<int::integer>> = <<-100>>
<<156>>
iex> int
156
iex> <<int::integer-signed>> = <<-100>>
<<156>>
iex> int
-100
```

Elixir has three options for endianness: big, little, and native. The default is big. native uses the byte order of the host machine, which the VM determines at startup.
```elixir
iex> <<number::little-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
256
iex> <<number::big-integer-size(16)>> = <<0, 1>>
<<0, 1>>
iex> number
1
iex> <<number::native-integer-size(16)>> = <<0, 1>> # result depends on the host's byte order
<<0, 1>>
iex> number # 256 on a little-endian host such as x86-64
256
```
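The same endianness modifiers also apply to utf16 and utf32. The sketch below uses a hypothetical `Utf16LE` module to decode little-endian UTF-16 text. Each iteration of a binary comprehension matches one utf16-little character and re-encodes it as utf8. The sketch assumes well-formed input and makes no attempt to report malformed or incomplete data.

```elixir
defmodule Utf16LE do
  # Hypothetical helper: converts a little-endian UTF-16 binary into an
  # Elixir (UTF-8) string, one character per comprehension iteration.
  def decode(bin) when is_binary(bin) do
    for <<codepoint::utf16-little <- bin>>, into: "", do: <<codepoint::utf8>>
  end
end

Utf16LE.decode(<<72, 0, 105, 0>>)
# => "Hi"
```

For real input, Erlang's `:unicode.characters_to_binary/3` performs this conversion and reports invalid or incomplete data instead of ignoring it.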