@tbegush
Last active September 17, 2021 01:37

Regular Expressions - Regex - the basics

Hello, this brief tutorial will walk you through understanding regular expressions using some examples. The phrase "regular expressions" is sometimes shortened to "regex". It is pronounced with a hard G, like the word 'graphics'. Also like golf, game, gourd or gif.

Summary ----------------------------------------------------------------------------------------

We will parse and discuss some regex, using some examples.

Here is an example of a regular expression that checks whether a string looks like an email address:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
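To see the email pattern in action, here is a minimal sketch in Python (an assumption; the tutorial itself does not pick a language). The surrounding / delimiters are dropped because Python takes the pattern as a string, and the helper name looks_like_email is made up for illustration.

```python
import re

# The pattern from above, without the surrounding / delimiters.
EMAIL = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

def looks_like_email(text):
    """Return True if text matches the tutorial's email pattern (hypothetical helper)."""
    return EMAIL.match(text) is not None

print(looks_like_email("user.name@example.com"))  # True
print(looks_like_email("not-an-email"))           # False
print(looks_like_email("User@Example.com"))       # False: uppercase is not in [a-z]

# The three pairs of parentheses capture the name, the domain, and the ending.
parts = EMAIL.match("user.name@example.com").groups()
print(parts)  # ('user.name', 'example', 'com')
```

Note that this pattern is a teaching example: it rejects some real addresses (such as ones with uppercase letters or a + sign), so treat it as a rough check only.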

There are different versions of regular expressions, known as dialects, so please check your requirements carefully. Search for "BRE vs ERE" (POSIX Basic vs Extended Regular Expressions) to learn more; that is outside of the scope of this tutorial.

Table of Contents -------------------------------------------------------------------------------

Regex Components ------------------------------------------------------------------------------

Anchors

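Anchors don't match a character - instead they match a position, such as the start or the end of a string. The caret ^ will only match at the beginning of the string (or of each line, when the multi-line flag is set). The dollar sign $ will only match at the end of the string (or of each line, in multi-line mode). In the email example above, ^ and $ force the pattern to match the whole string rather than just a part of it.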
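A small sketch of anchors, assuming Python's re module. Here ^ and $ pin the match to the start and end of the whole string; the sample text is made up.

```python
import re

text = "cat concatenate cat"

# Without anchors, "cat" is found wherever it occurs (including inside "concatenate").
anywhere = re.findall(r"cat", text)
print(anywhere)  # ['cat', 'cat', 'cat']

# ^ pins the match to the very start of the string.
at_start = re.findall(r"^cat", text)
print(at_start)  # ['cat']

# $ pins the match to the very end of the string.
end_pos = re.search(r"cat$", text).start()
print(end_pos)  # 16
```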

Quantifiers

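Quantifiers specify how many times the preceding character or group may repeat. For example the question mark ? means that a character can be optional (zero or one), * means zero or more, + means one or more, and {2,6} means between two and six times. Do not confuse quantifiers with possessive quantifiers, as they are different: a possessive quantifier never gives back what it has already matched.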
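Here is a quick sketch of the common quantifiers, assuming Python's re module. Besides ?, it also shows *, + and {m,n}, which work the same way in most dialects; the sample strings are made up.

```python
import re

# ? makes the preceding character optional (zero or one).
optional = re.findall(r"colou?r", "color colour colouur")
print(optional)  # ['color', 'colour']

# * means zero or more of the preceding character.
star = re.findall(r"go*d", "gd god good")
print(star)  # ['gd', 'god', 'good']

# + means one or more of the preceding character.
plus = re.findall(r"go+d", "gd god good")
print(plus)  # ['god', 'good']

# {m,n} means between m and n repetitions.
counted = re.findall(r"\d{2,4}", "1 22 333 55555")
print(counted)  # ['22', '333', '5555']
```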

OR Operator

In most dialects the "OR" operator is a vertical bar |. The resulting regular expression will match the pattern on either side of the vertical bar. For example, cat|dog matches either "cat" or "dog".
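A short sketch of the vertical bar, assuming Python's re module. The second half shows that | splits the whole pattern unless you wrap the alternatives in parentheses.

```python
import re

# Match either word.
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# Parentheses limit the OR to just the letters "a" or "e".
grouped = re.fullmatch(r"gr(a|e)y", "grey") is not None
print(grouped)  # True

# Without parentheses this means "gra" OR "ey", so "grey" as a whole does not match.
ungrouped = re.fullmatch(r"gra|ey", "grey") is not None
print(ungrouped)  # False
```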

Character Classes

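Character classes are shorthand for a specific class of characters. POSIX character classes such as [:alpha:], [:digit:], [:punct:], and [:blank:] must themselves be contained in brackets [], so in practice you write [[:alpha:]] or [[:digit:][:punct:]]. Many dialects also offer backslash shorthands, such as \d for a digit (used in the email example above), \w for a word character, and \s for whitespace.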
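Not every dialect supports the POSIX names. Python's re module, assumed here, does not, so this sketch uses rough equivalents instead: \d in place of [:digit:], an explicit [A-Za-z] range in place of [:alpha:], and "not a word character and not whitespace" as a stand-in for [:punct:].

```python
import re

text = "Room 42, Floor 7!"

# \d is a rough equivalent of [[:digit:]].
digits = re.findall(r"\d+", text)
print(digits)  # ['42', '7']

# [A-Za-z] is a rough (ASCII-only) equivalent of [[:alpha:]].
words = re.findall(r"[A-Za-z]+", text)
print(words)  # ['Room', 'Floor']

# Anything that is neither a word character nor whitespace: roughly punctuation.
punct = re.findall(r"[^\w\s]", text)
print(punct)  # [',', '!']
```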

Flags

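Flags change how the whole pattern is applied, such as searching globally or ignoring case. In JavaScript-style regex, g searches globally (it finds every match instead of stopping at the first one) and i makes the search case-insensitive. The m flag turns on multi-line mode, in which ^ and $ match at the start and end of each line.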
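A sketch of the i and m flags, assuming Python's re module, where they are spelled re.IGNORECASE and re.MULTILINE. Python has no separate global flag; re.findall already returns every match.

```python
import re

text = "Hello\nhello\nHELLO"

# No flags: case matters, so only the lowercase line matches.
plain = re.findall(r"hello", text)
print(plain)  # ['hello']

# i flag: ignore case.
ignore_case = re.findall(r"hello", text, re.IGNORECASE)
print(ignore_case)  # ['Hello', 'hello', 'HELLO']

# Without the m flag, ^ and $ only see the start and end of the whole string.
whole_string = re.findall(r"^hello$", text)
print(whole_string)  # []

# m flag: ^ and $ match at the start and end of each line.
per_line = re.findall(r"^hello$", text, re.MULTILINE)
print(per_line)  # ['hello']

# Flags can be combined with |.
both = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)
print(both)  # ['Hello', 'hello', 'HELLO']
```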

Grouping and Capturing

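When we enclose part of our search pattern in (parentheses), it is treated as a single group: a quantifier after the group applies to the whole group, and the text the group matched is captured for later use. The search pattern hi+ will match hi, hiiiii, hiiiiiiiiii, and so on and so forth, because the + applies only to the i. Enclosing the letters in parentheses, as in (hi)+, will only match hi, hihi, hihihihihi, etc., because the + applies to the whole group.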
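The difference between quantifying one character and quantifying a group can be sketched like this, assuming Python's re module and using the patterns hi+ and (hi)+ as illustrations. The last part shows capturing; the date-like sample string is made up.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# hi+ : the + repeats only the "i".
print(whole(r"hi+", "hiiiii"))    # True
print(whole(r"hi+", "hihihi"))    # False

# (hi)+ : the + repeats the whole group "hi".
print(whole(r"(hi)+", "hihihi"))  # True
print(whole(r"(hi)+", "hiiiii"))  # False

# Groups also capture what they matched, numbered from 1.
m = re.search(r"(\d{4})-(\d{2})", "released 2021-09")
year, month = m.group(1), m.group(2)
print(year, month)  # 2021 09
```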

Bracket Expressions

Similar to a character class - a bracket expression is a set of characters contained within square brackets [], and it matches any one character from that set. [a-z] will match any lower case letter between a and z (inclusive), but [^a-z] will match any character that is NOT a lower case letter.
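A brief sketch of bracket expressions, assuming Python's re module; the sample strings are made up. Each bracket expression matches exactly one character at a time.

```python
import re

text = "abc XYZ 123"

# [a-z] matches one lower case letter at a time.
lower = re.findall(r"[a-z]", text)
print(lower)  # ['a', 'b', 'c']

# A leading ^ inside the brackets negates the set (spaces count too).
not_lower = re.findall(r"[^a-z]", text)
print(not_lower)  # [' ', 'X', 'Y', 'Z', ' ', '1', '2', '3']

# A set can also list individual characters.
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
```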

Greedy and Lazy Match

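Greedy means to match the loooooooongest possible string. Lazy means to match the shortest possible string. Quantifiers are greedy by default; adding a ? after one (as in +? or *?) makes it lazy. A greedy quantifier takes as much as it can and only gives characters back if the rest of the pattern would otherwise fail. A lazy quantifier stops as soon as the rest of the pattern can match.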
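The classic way to see greedy versus lazy is matching tags, sketched here assuming Python's re module. The sample markup is made up, and this is only an illustration, not a way to parse HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, from the first < to the last >.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first > it can.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```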

Boundaries

Like the ^ and the $, the \b expression is an anchor. It signifies the start, or the end, of a word: the position between a word character and a non-word character. It has a length of zero. So if you're looking for a specific letter, and you know it is at the beginning or end of a word, the \b expression is your friend.
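A sketch of \b, assuming Python's re module (where the pattern should be a raw string so that \b is not read as a backspace). The sample text is made up.

```python
import re

text = "cat concatenate bobcat"

# \b on both sides: only the standalone word "cat".
whole_word = re.findall(r"\bcat\b", text)
print(whole_word)  # ['cat']

# Words that begin with the letter c.
starts_with_c = re.findall(r"\bc\w*", text)
print(starts_with_c)  # ['cat', 'concatenate']

# Words that end with the letter t.
ends_with_t = re.findall(r"\w*t\b", text)
print(ends_with_t)  # ['cat', 'bobcat']
```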

Back-references

Think of back-references like variables in JavaScript. They refer to text that an earlier capturing group has already matched, so you can use it again later in the same regex. For example, \1 refers to whatever the first group captured.
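A common use of back-references is spotting repeated words, sketched here assuming Python's re module. The sample sentences and the group name quote are made up for illustration.

```python
import re

text = "this is is a test test"

# \1 must match exactly the same text that group 1 captured.
doubled = re.findall(r"\b(\w+) \1\b", text)
print(doubled)  # ['is', 'test']

# The same back-reference can be used in a replacement to remove the repeat.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)
print(fixed)  # this is a test

# Groups can be named; (?P=quote) refers back to the group called "quote".
m = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")
quoted = m.group(0)
print(quoted)  # 'hi'
```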

Look-ahead and Look-behind

Together, these are known as lookarounds. They are useful when we need to find a certain pattern only IF it is followed (look-ahead) or preceded (look-behind) by another pattern. The other pattern is checked, but it is not included in the match.
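A sketch of lookarounds, assuming Python's re module, where (?=...) is a look-ahead, (?<=...) is a look-behind, and (?<!...) is a negative look-behind. The price text is made up.

```python
import re

text = "price: $30, cost: 45 USD, $7"

# Look-ahead: digits only if they are followed by " USD".
before_usd = re.findall(r"\d+(?= USD)", text)
print(before_usd)  # ['45']

# Look-behind: digits only if they are preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
print(after_dollar)  # ['30', '7']

# Negative look-behind: digits NOT preceded by "$" (or by another digit,
# so that the "0" in "$30" is not picked up on its own).
no_dollar = re.findall(r"(?<![\$\d])\d+", text)
print(no_dollar)  # ['45']
```

In each case only the digits end up in the result; the " USD" and "$" are checked but not consumed.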

Author

Thomas J. Begush is a 40-year-old mediocre white dude who has somehow managed to fail up and forward for his entire life. Here's my GitHub.
