Hello, this brief tutorial will walk you through understanding regular expressions using some examples. The phrase "regular expression" is often shortened to "regex". It is pronounced with a hard G, like the word "graphics". Also like golf, game, gourd, or gif.
We will break a regex down into its parts and discuss each part in turn.
Here is an example of a regular expression that checks whether a string looks like a (lower case) email address:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
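To try the pattern above, here is a minimal Python sketch. It assumes Python's built-in re module and drops the surrounding / delimiters, which Python does not use. The name looks_like_email is made up for this illustration.

```python
import re

# The pattern from above, without the surrounding / delimiters.
EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

def looks_like_email(text):
    """Return True if text matches the tutorial's email pattern."""
    return EMAIL_RE.match(text) is not None

print(looks_like_email("jane.doe@example.com"))  # True
print(looks_like_email("not an email"))          # False

# The three pairs of parentheses capture the user, domain, and suffix.
print(EMAIL_RE.match("jane.doe@example.com").groups())
# ('jane.doe', 'example', 'com')
```

Note that the pattern only lists lower case letters, so "Jane@Example.com" is rejected unless a case-insensitive flag is added.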
There are different versions of regular expressions, known as dialects, so please check your requirements carefully. Search for "BRE vs ERE" (POSIX basic vs extended regular expressions) to learn more; that comparison is outside the scope of this tutorial.
- Anchors
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
Anchors don't match a character. Instead, they match a position, such as the start or the end of the string. The caret ^ will only match something at the beginning of the string. The dollar sign $ will only match something at the end of the string. With the multi-line flag, they match at the beginning and end of each line instead.
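As a small sketch of how anchors pin a match to a position, assuming Python's re module (the sample text is made up for illustration):

```python
import re

text = "cat concat"

# Without anchors, "cat" is found anywhere in the string.
unanchored = [m.start() for m in re.finditer(r"cat", text)]
print(unanchored)  # [0, 7]

# ^ only allows a match at the very start of the string.
at_start = re.search(r"^cat", text)
print(at_start.start())  # 0

# $ only allows a match at the very end of the string.
at_end = re.search(r"cat$", text)
print(at_end.start())  # 7

# "concat" is in the text, but not at the start, so ^concat fails.
print(re.search(r"^concat", text))  # None
```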
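Quantifiers specify how many times the preceding character or group may repeat. For example, the question mark ? means that the preceding character is optional: it may appear zero times or once. Do not confuse quantifiers with possessive quantifiers, a variant that never gives back what it has already matched.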
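A minimal sketch of the optional ? quantifier, assuming Python's re module. The pattern colou?r is an illustrative choice, not one from this tutorial.

```python
import re

# The ? makes the "u" optional: zero or one "u" is allowed.
pattern = re.compile(r"colou?r")

words = ["color", "colour", "colouur", "colr"]
results = {word: bool(pattern.fullmatch(word)) for word in words}

for word, matched in results.items():
    print(word, matched)
# color True
# colour True
# colouur False
# colr False
```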
In most dialects the "OR" operator is a vertical bar |.
The resulting regular expression will match the pattern on either side of the vertical bar.
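A short sketch of the vertical bar in action, assuming Python's re module and a made-up sentence:

```python
import re

text = "I have a cat and a dog, but no bird."

# cat|dog matches either "cat" or "dog" wherever they appear.
pets = re.findall(r"cat|dog", text)
print(pets)  # ['cat', 'dog']
```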
Character classes are shorthand names for a specific class of characters. They must be written inside a bracket expression, for example [[:alpha:]], and not every dialect supports them. Examples are: [:alpha:], [:digit:], [:punct:], and [:blank:].
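Python's re module does not support the POSIX class names, so the sketch below uses hand-written ASCII stand-ins instead. The dictionary POSIX_STAND_INS and the sample text are assumptions made for illustration. In a POSIX tool such as grep you would write [[:digit:]] directly.

```python
import re
import string

# Hand-written ASCII stand-ins for four POSIX class names (an assumption,
# not a feature of Python's re module).
POSIX_STAND_INS = {
    "alpha": "[A-Za-z]",
    "digit": "[0-9]",
    "punct": "[" + re.escape(string.punctuation) + "]",
    "blank": r"[ \t]",
}

sample = "Room 42,\tBuilding B!"
found = {name: re.findall(pattern, sample)
         for name, pattern in POSIX_STAND_INS.items()}

print(found["digit"])  # ['4', '2']
print(found["punct"])  # [',', '!']
print(found["blank"])  # [' ', '\t', ' ']
```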
Flags are used for special functionality, such as searching globally or ignoring case. You can specify case-insensitive searching with an i. The m flag will assist in a multi-line search by letting ^ and $ match at the start and end of each line.
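Here is a hedged sketch of the i and m flags, assuming Python's re module, where they are spelled re.IGNORECASE and re.MULTILINE. Python has no global flag; re.findall already returns every match.

```python
import re

text = "Hello\nhello\nHELLO"

# No flags: ^ and $ refer to the whole string, so nothing matches.
no_flags = re.findall(r"^hello$", text)
print(no_flags)  # []

# i flag: ignore case.
ignore_case = re.findall(r"hello", text, re.IGNORECASE)
print(ignore_case)  # ['Hello', 'hello', 'HELLO']

# m flag: ^ and $ match at the start and end of each line.
multi_line = re.findall(r"^hello$", text, re.MULTILINE)
print(multi_line)  # ['hello']

# Flags can be combined with |.
both = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)
print(both)  # ['Hello', 'hello', 'HELLO']
```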
When we enclose part of a pattern in ( parentheses ), that part is treated as a single unit, and the text it matches is captured for later use. The search pattern hi+ will match hi, hiiiii, hiiiiiiiiii, and so on and so forth, because the + applies only to the i. Enclosing hi in parentheses, as in (hi)+, will only match hi, hihi, hihihihihi, etc.
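To see what the parentheses change, here is a small sketch assuming Python's re module. It compares the patterns hi+ and (hi)+; the + quantifier is an assumption added here to make the repetition explicit.

```python
import re

# hi+ : the + applies only to the "i".
print(bool(re.fullmatch(r"hi+", "hiiiii")))    # True
print(bool(re.fullmatch(r"hi+", "hihi")))      # False

# (hi)+ : the + applies to the whole group "hi".
print(bool(re.fullmatch(r"(hi)+", "hihihi")))  # True
print(bool(re.fullmatch(r"(hi)+", "hiiiii")))  # False

# The group also captures the text it matched.
captured = re.fullmatch(r"(hi)+", "hihihi").group(1)
print(captured)  # hi
```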
Similar to a character class, a bracket expression is a set of characters contained within square brackets []. [a-z] will match any single lower case letter from a to z (inclusive), but [^a-z] will match any single character that is NOT a lower case letter.
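A minimal sketch of a bracket expression and its negated form, assuming Python's re module and a made-up sample string:

```python
import re

sample = "aB3-z"

# [a-z] matches one lower case letter at a time.
lower = re.findall(r"[a-z]", sample)
print(lower)  # ['a', 'z']

# [^a-z] matches one character that is NOT a lower case letter.
not_lower = re.findall(r"[^a-z]", sample)
print(not_lower)  # ['B', '3', '-']
```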
Greedy means to match the loooooooongest possible string.
Lazy means to match the shortest possible string.
Greedy will only stop once the condition is no longer satisfiable. Lazy will stop as soon as the condition is initially satisfied. In most modern dialects quantifiers are greedy by default, and adding a ? after a quantifier makes it lazy.
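The difference is easiest to see side by side. This sketch assumes Python's re module, where a quantifier is greedy by default and becomes lazy when followed by a ?. The HTML-like sample text is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group()
print(greedy)  # <b>bold</b> and <i>italic</i>

# Lazy: .+? stops at the first ">" that completes the match.
lazy = re.search(r"<.+?>", text).group()
print(lazy)  # <b>

all_lazy = re.findall(r"<.+?>", text)
print(all_lazy)  # ['<b>', '</b>', '<i>', '</i>']
```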
Like the ^ and the $, the \b expression is an anchor. It signifies the start, or the end, of a word. It has a length of zero. So if you're looking for a specific letter, and you know it is at the beginning or end of a word, the \b expression is your friend.
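A short sketch of \b marking the edge of a word, assuming Python's re module. The patterns are written as raw strings (r"...") because in a normal Python string "\b" means a backspace character.

```python
import re

text = "cat concat category"

# \bcat : "cat" at the start of a word.
word_start = [m.start() for m in re.finditer(r"\bcat", text)]
print(word_start)  # [0, 11]

# cat\b : "cat" at the end of a word.
word_end = [m.start() for m in re.finditer(r"cat\b", text)]
print(word_end)  # [0, 7]

# \bcat\b : "cat" as a whole word only.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(whole_word)  # [0]
```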
Think of back-references like variables in JavaScript. They refer to text that was captured by an earlier group, which you can then use again later in the same regex. For example, \1 refers to whatever the first group matched.
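As a hedged illustration, the sketch below uses a back-reference to find a word that is accidentally typed twice. It assumes Python's re module, where \1 refers to whatever the first pair of parentheses captured. The sample sentence is made up.

```python
import re

text = "this is is a test"

# (\w+) captures a word; \1 requires that exact same word to appear again.
doubled = re.search(r"\b(\w+) \1\b", text)
print(doubled.group())   # is is
print(doubled.group(1))  # is

# The captured text can also be reused in a replacement.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)
print(fixed)  # this is a test
```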
Together, look-ahead and look-behind are known as lookarounds. Use them when you need to find a certain pattern only IF it is followed or preceded by another pattern. Like anchors, they check the surrounding text without including it in the match.
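The four lookaround forms can be sketched as follows, assuming Python's re module syntax: (?=...) and (?!...) for look-ahead, (?<=...) and (?<!...) for look-behind. The sample text is made up, and other dialects may support only some of these forms.

```python
import re

text = "10 dollars, 20 euros, $30, 40"

# Look-ahead: digits only IF followed by " dollars".
followed = re.findall(r"\d+(?= dollars)", text)
print(followed)  # ['10']

# Negative look-ahead: whole numbers NOT followed by " dollars".
not_followed = re.findall(r"\b\d+\b(?! dollars)", text)
print(not_followed)  # ['20', '30', '40']

# Look-behind: digits only IF preceded by "$".
preceded = re.findall(r"(?<=\$)\d+", text)
print(preceded)  # ['30']

# Negative look-behind: numbers NOT preceded by "$".
not_preceded = re.findall(r"(?<!\$)\b\d+", text)
print(not_preceded)  # ['10', '20', '40']
```

In every case only the digits are returned; the text inside the lookaround is checked but not consumed.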
Thomas J. Begush is a 40-year-old mediocre white dude who has somehow managed to fail up and forward for his entire life. Here's my GitHub