Regular Expression Tutorial

This tutorial is meant to be a foundational reference guide for anyone learning regular expressions. By the end of this tutorial you will know what a regular expression is, when to use one, what its main features do, and some tips for improving your own regex writing. Throughout this tutorial we'll reference a specific regular expression, breaking it down into its components and learning what each part does.

Summary

Regular expression for email: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

The goal of each regex is to return a match for a continuous series of characters. The type of characters, the number of characters, and the order of characters can all be specified in the regex. The regex we'll be referencing in this tutorial searches for an email address such as "[email protected]" and returns a match if the structure of that email matches the criteria of our regular expression.
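As a rough sketch of how the email expression could be used in JavaScript, the snippet below runs it against two inputs and reads the three captured groups. The helper name `parseEmail` and the sample address are assumptions made for illustration; they are not part of the original tutorial.

```javascript
// The email pattern from the Summary, written as a JavaScript regex literal.
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: returns the three captured parts,
// or null when the input does not match the pattern.
function parseEmail(input) {
  const match = emailPattern.exec(input);
  if (match === null) return null;
  const [, user, domain, tld] = match;
  return { user, domain, tld };
}

console.log(parseEmail("jane_doe@example.com")); // { user: 'jane_doe', domain: 'example', tld: 'com' }
console.log(parseEmail("not an email"));         // null
```

Because the pattern has no /i flag and only lists lowercase letters, an address containing capital letters would not match.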

Regex Components

Anchors

Description:

Anchors match a position rather than a character: the beginning or the end of a string. By default the end of a string is the end of the whole input; with the multiline flag, every line break also counts as the end of a line.
Syntax: The ^ "caret" references the beginning of a line and the $ "dollar sign" references the end of a line.

Example:

I wanted to eat, so I ate a cheeseburger at McDonald's

Demo:

Let's say we wanted to find a match for the character located at the beginning of our example string, and another for the character at the end. In regex, the . "period" means any single character except a line break.

Regex: ^.

Match: the "I" at the start of the sentence

Explanation:

The regex ^. is basically saying, "Find any single character . located at the beginning of the string ^."

Regex: .$

Match: the final "s" in "McDonald's"

Explanation:

The regex .$ is basically saying, "Find any single character . located at the end of the string $."
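The two anchor demos above can be checked with a few lines of JavaScript. This is only a sketch; the variable names are made up for the example.

```javascript
// Example sentence from the Anchors section.
const anchorExample = "I wanted to eat, so I ate a cheeseburger at McDonald's";

// ^. : any single character at the very start of the string.
const firstChar = anchorExample.match(/^./)[0];

// .$ : any single character at the very end of the string.
const lastChar = anchorExample.match(/.$/)[0];

console.log(firstChar); // "I"
console.log(lastChar);  // "s"
```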

Quantifiers

Description:

Quantifiers are used when you want to match a certain number of repetitions of the preceding character or group.
All RegEx Quantifiers:
- * 0 or more
- + 1 or more
- ? 0 or 1
- {5} exactly 5
- {5,6} between 5 and 6 (minimum and maximum)
Example:
```
I am very veryy veryyy veryyyy hungry
```
Demo:

Regex: very *

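Match: "very " (including the trailing space) in the first word, and "very" inside "veryy", "veryyy", and "veryyyy"

Explanation:

The regex very * is basically saying, "Find all instances of 'very' that are followed by 0 or more " " space characters."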

Regex: very +

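Match: only the first "very " (including the trailing space), because it is the only "very" that is followed by a space

Explanation:

The regex very + is basically saying, "Find all instances of 'very' that are followed by one or more " " space characters."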

Regex: very?

Match: "very" in all four words

Explanation:

The regex very? is basically saying, "Find all instances of 'ver' that are followed by 0 or 1 "y" characters."

Regex: very{3}

Match: "veryyy" inside both "veryyy" and "veryyyy"

Explanation:

The regex very{3} is basically saying, "Find all instances of 'ver' that are followed by exactly 3 "y" characters."

Regex: very{2,4}

Match: "veryy", "veryyy", and "veryyyy"

Explanation:

The regex very{2,4} is basically saying, "Find all instances of 'ver' that are followed by 2-4 "y" characters."
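To see all five quantifier demos side by side, the sketch below runs each expression against the example sentence with the global flag. The helper name `findAll` is an assumption for illustration.

```javascript
// Example sentence from the Quantifiers section.
const quantifierExample = "I am very veryy veryyy veryyyy hungry";

// Hypothetical helper: every match of a global pattern, or [] when there are none.
function findAll(pattern) {
  return quantifierExample.match(pattern) || [];
}

console.log(findAll(/very */g));    // [ 'very ', 'very', 'very', 'very' ]
console.log(findAll(/very +/g));    // [ 'very ' ]
console.log(findAll(/very?/g));     // [ 'very', 'very', 'very', 'very' ]
console.log(findAll(/very{3}/g));   // [ 'veryyy', 'veryyy' ]
console.log(findAll(/very{2,4}/g)); // [ 'veryy', 'veryyy', 'veryyyy' ]
```

Note that a quantifier applies only to the item directly before it, so `very{3}` repeats the "y", not the whole word.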

OR Operator

The OR operator is used to find a match for one pattern or another. The OR operator is invoked with the `|` "pipe" character.

Example String:

“I like chocolate ice cream. I like vanilla ice cream.”

If we wanted to match both sentences in the example above, we could do so with the following expression:

/I like (chocolate|vanilla) ice cream\./g
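Here is a small sketch of the OR operator in JavaScript. It assumes the example sentences read "I like ... ice cream." and that the final period is escaped so it matches a literal period; the variable names are illustrative.

```javascript
const flavorText = "I like chocolate ice cream. I like vanilla ice cream.";

// (chocolate|vanilla) accepts either word at that position.
const flavorPattern = /I like (chocolate|vanilla) ice cream\./g;

// Whole sentences that matched.
const sentences = flavorText.match(flavorPattern);

// Just the alternative that was chosen in each match (capture group 1).
const flavors = [...flavorText.matchAll(flavorPattern)].map((hit) => hit[1]);

console.log(sentences); // [ 'I like chocolate ice cream.', 'I like vanilla ice cream.' ]
console.log(flavors);   // [ 'chocolate', 'vanilla' ]
```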

Character Classes

Character classes match any one character from a specific set and are invoked by the `[]` brackets. You can also join multiple character sets together in one class by simply adding the next set immediately after the previous set.

For example, if we wanted to find all lower-case alpha characters, we could do so with the following expression:

/[a-z]/g

If we wanted to find lower-case alpha characters, upper-case alpha characters, and numeric characters, we could do so with the following expression:

/[a-zA-Z0-9]/g
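The two character classes above could be tried out like this. The sample string is made up for the example.

```javascript
// Hypothetical sample text mixing cases, digits, spaces, and punctuation.
const roomLabel = "Room 42B, floor 3";

// [a-z]: lowercase letters only.
const lowercaseOnly = roomLabel.match(/[a-z]/g).join("");

// [a-zA-Z0-9]: three ranges joined in a single class.
const lettersAndDigits = roomLabel.match(/[a-zA-Z0-9]/g).join("");

console.log(lowercaseOnly);    // "oomfloor"
console.log(lettersAndDigits); // "Room42Bfloor3"
```

Each match is a single character; adding a quantifier such as `+` after the class would match runs of characters instead.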

Flags

Flags in regex are placed after the closing slash of an expression, and they change how the search behaves.

- All RegEx Flags:

/g Global
The Global flag returns all matches in the entire string instead of only returning the first instance of the match.
/i Case Insensitive
The Case Insensitive flag returns matches regardless of upper or lowercase alpha characters.
/m Multiline
The Multiline flag is used in conjunction with the ^ and $ anchors. By default, the ^ and $ anchors match only at the start and end of the whole string. When the /m flag is added, however, they also match at the start and end of every line.
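/s Single Line
The Single Line flag lets the . "period" match line breaks as well, so the whole input is treated as a single line.
/u Unicode
The Unicode flag makes the expression work with full Unicode code points, so a character such as an emoji counts as one character, and escapes like \u{...} and \p{...} become available.
/y Sticky
The Sticky flag returns a match only if it starts exactly at the position stored in the regex's lastIndex property.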
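The sketch below tries each flag on a small multi-line string in JavaScript. The sample text and variable names are assumptions for illustration only.

```javascript
const menu = "Apple pie\napple tart\nbanana split";

// No flags: only the first case-sensitive match is returned.
const firstOnly = menu.match(/apple/);

// g + i: every match, ignoring case.
const allApples = menu.match(/apple/gi);

// Without m, ^ matches only at the start of the whole string.
const firstWordOfString = menu.match(/^\w+/g);

// With m, ^ also matches at the start of every line.
const firstWordOfEachLine = menu.match(/^\w+/gm);

// Without s the dot stops at a line break; with s it can cross it.
const dotStopsAtNewline = /pie.apple/.test(menu);
const dotCrossesNewline = /pie.apple/s.test(menu);

// u: an emoji is one code point but two UTF-16 code units.
const emoji = "\u{1F600}";
const oneCharWithoutU = /^.$/.test(emoji);
const oneCharWithU = /^.$/u.test(emoji);

// y: the match must start exactly at lastIndex.
const sticky = /apple/y;
sticky.lastIndex = 10; // index where the lowercase "apple" begins
const stickyAtTen = sticky.test(menu);
sticky.lastIndex = 0; // "Apple" at index 0 has a capital A
const stickyAtZero = sticky.test(menu);

console.log(firstOnly.index, allApples);             // 10 [ 'Apple', 'apple' ]
console.log(firstWordOfString, firstWordOfEachLine); // [ 'Apple' ] [ 'Apple', 'apple', 'banana' ]
console.log(dotStopsAtNewline, dotCrossesNewline);   // false true
console.log(oneCharWithoutU, oneCharWithU);          // false true
console.log(stickyAtTen, stickyAtZero);              // true false
```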

Grouping and Capturing

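Grouping is useful if we want to treat part of an expression as a single unit, or capture a specific character or phrase within the larger phrase we're searching for. Groups are invoked with the `()` parentheses.

- Example String: Peter piper picked a patch of pickled peppers

For example, if we wrote /p(i|e|a)/g as our expression, we would match every lowercase "p" that is immediately followed by "i", "e", or "a": "pi" and "pe" in "piper", "pi" in "picked", "pa" in "patch", "pi" in "pickled", and "pe" twice in "peppers". The capital "P" in "Peter" is not matched because the expression is case sensitive. Each match also captures the vowel as group 1.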
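As a quick sketch, `matchAll` in JavaScript returns both the full match and the text captured by each group, which makes the grouping example easy to inspect. Variable names here are illustrative.

```javascript
const tongueTwister = "Peter piper picked a patch of pickled peppers";

// Each result holds the whole match at [0] and capture group 1 at [1].
const groupHits = [...tongueTwister.matchAll(/p(i|e|a)/g)];

const wholeMatches = groupHits.map((hit) => hit[0]).join(" ");
const capturedVowels = groupHits.map((hit) => hit[1]).join(" ");

console.log(wholeMatches);   // "pi pe pi pa pi pe pe"
console.log(capturedVowels); // "i e i a i e e"
```

The capital "P" in "Peter" is skipped because the expression has no /i flag.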

Bracket Expressions

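Bracket expressions are very similar to character classes in that they are invoked by the same `[]` brackets, except they are primarily used for matching specific special characters. Inside the brackets most special characters lose their special meaning, so only a few, such as `\` and `]`, need to be escaped.

For example, the regex `[.[{()\\+*\]^$|?]` would match any single one of the characters . [ { ( ) \ + * ] ^ $ | ?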
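One possible use of this bracket expression is escaping text so it can be searched for literally. The sketch below assumes a hypothetical helper named `escapeForRegex` and a made-up formula string.

```javascript
// The bracket expression from the text: any one regex special character.
const specialChar = /[.[{()\\+*\]^$|?]/g;

const formula = "a+b*(c^2)?";
const specialsFound = formula.match(specialChar);

// Hypothetical helper: put a backslash in front of every special character
// so the text can be used as a literal inside another regex.
function escapeForRegex(text) {
  return text.replace(specialChar, "\\$&");
}

const escapedFormula = escapeForRegex(formula);
const literalPattern = new RegExp(escapedFormula);

console.log(specialsFound);                         // [ '+', '*', '(', '^', ')', '?' ]
console.log(escapedFormula);                        // a\+b\*\(c\^2\)\?
console.log(literalPattern.test("x = a+b*(c^2)?")); // true
```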

Greedy and Lazy Match

Description:

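By default, quantifiers are greedy: they match as many characters as possible while still letting the rest of the expression match. Adding a ? after a quantifier makes it lazy, so it matches as few characters as possible.
Syntax: *? +? ?? {2,4}?
Example:
```
He said "yes" and then "no"
```
Demo:

The greedy regex ".+" returns one match that runs from the first quotation mark to the last one: "yes" and then "no". The lazy regex ".+?" stops at the first closing quotation mark it finds, so it returns "yes" and "no" as two separate matches.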
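As a minimal sketch of the difference: a greedy quantifier takes as many characters as it can, while a lazy one (the same quantifier followed by ?) takes as few as it can. The sample sentence below is an assumption chosen for illustration.

```javascript
const quoteExample = 'He said "yes" and then "no"';

// Greedy: .+ runs to the last quotation mark in the string.
const greedyMatches = quoteExample.match(/".+"/g);

// Lazy: .+? stops at the first closing quotation mark it can.
const lazyMatches = quoteExample.match(/".+?"/g);

console.log(greedyMatches); // [ '"yes" and then "no"' ]
console.log(lazyMatches);   // [ '"yes"', '"no"' ]
```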

Boundaries

Description:

Boundaries or "word boundaries" are used when we want to match one or more characters of a word, but only if they're located at the beginning or the end of the word.
Syntax: \b references a word boundary and \B references a non-word boundary.

Example:

I wanted to eat, so I ate a cheeseburger at McDonald’s

Demo:

Let's say we wanted to find a match for "at" inside of our example string. Depending on where we place the boundary in our expression, we can match different instances of "at".

Regex: at\b

Match: the "at" in "eat" and the standalone word "at"

Explanation:

The regex at\b is basically saying, "Find all instances of 'at' that are followed by a word boundary."

Regex: at\B

Match: the "at" in "ate"

Explanation:

The regex at\B is basically saying, "Find all instances of 'at' that are NOT followed by a word boundary."

Regex: \bat

Match: the "at" in "ate" and the standalone word "at"

Explanation:

The regex \bat is basically saying, "Find all instances of 'at' that are preceded by a word boundary."

Regex: \Bat

Match: the "at" in "eat"

Explanation:

The regex \Bat is basically saying, "Find all instances of 'at' that are NOT preceded by a word boundary."
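To check which "at" each boundary expression picks, the sketch below reports the word that contains every match. The helper name `wordsContainingMatch` is an assumption for illustration.

```javascript
const boundaryExample = "I wanted to eat, so I ate a cheeseburger at McDonald's";

// Hypothetical helper: for every match of a global pattern,
// return the word of the sentence in which the match sits.
function wordsContainingMatch(pattern) {
  const words = [...boundaryExample.matchAll(/\w+/g)];
  return [...boundaryExample.matchAll(pattern)].map((hit) => {
    const word = words.find(
      (w) => w.index <= hit.index && hit.index < w.index + w[0].length
    );
    return word[0];
  });
}

console.log(wordsContainingMatch(/at\b/g)); // [ 'eat', 'at' ]
console.log(wordsContainingMatch(/at\B/g)); // [ 'ate' ]
console.log(wordsContainingMatch(/\bat/g)); // [ 'ate', 'at' ]
console.log(wordsContainingMatch(/\Bat/g)); // [ 'eat' ]
```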

Back-references

Description:

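A back-reference matches the same text that an earlier capturing group matched, which makes it possible to find repeated text inside a single string. Back-references are written as a backslash followed by the group number, such as \1 for the first group.
Syntax: (criteria-1)\1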

Example:

We the People of the the United States, in Order to form a more more perfect Union, establish Justice, insure domestic Tranquility, provide for the the common defence, promote the general Welfare, and and secure the Blessings of Liberty to ourselves and our Posterity, do do ordain and establish this Constitution for the United States of of America.

Demo:

Let's say we wanted to find all instances of repeated words in our example. We can use back-referencing to do this.

Regex: \b(\w+)\s\1\b

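Match: "the the" (twice), "more more", "and and", "do do", and "of of"

Explanation:

Our regex is basically saying, "Find a whole word (\w+, one or more word characters) followed by a whitespace character \s and then the same word again \1." The \b at each end makes sure we compare whole words rather than parts of words.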
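A short sketch of the repeated-word search in JavaScript follows. It uses only the first part of the example text to keep the output short, and it also shows one possible use: replacing each doubled word with a single copy via `$1`.

```javascript
// First part of the example text, including two doubled words.
const preamble =
  "We the People of the the United States, in Order to form a more more perfect Union";

// (\w+) captures a word, \s matches the space, \1 requires the same word again.
const repeatedWord = /\b(\w+)\s\1\b/g;

const doubles = preamble.match(repeatedWord);

// "$1" in the replacement stands for whatever group 1 captured.
const cleaned = preamble.replace(repeatedWord, "$1");

console.log(doubles); // [ 'the the', 'more more' ]
console.log(cleaned); // We the People of the United States, in Order to form a more perfect Union
```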

Look-ahead and Look-behind

Description:

Look-ahead and look-behind, collectively called “lookaround”, check for 2 criteria in sequence but return a match for only one of them. The other criteria must (or must not) appear "ahead of" or "behind" the match, but it is not included in the result.

Look-around example:

https://www.google.com
http://www.google.com
https://www.facebook.com
http://www.facebook.com

Positive Look-behind

Description:

Positive Look-behind searches for 2 criteria in sequence and returns a match for the second criteria, but only if the first criteria is "behind" it.
Syntax: (?<=(criteria-1))(criteria-2)
Demo:

Let's say we wanted to write a regex using Positive Look-behind that returns all matches for "google.com" so long as it has "https://www." behind it. We could do this with the following:

Regex: (?<=(https:\/\/www\.))(google\.com)

Match: the "google.com" on line 1

Explanation:

This expression is basically saying, "Return and match 'google.com' but only if 'https://www.' is behind it." Notice the "google.com" on line 2 is not matched because the string behind it, "http://www.", doesn't match criteria-1 in our regex. The periods are escaped as \. so that they match a literal period instead of any character.

Negative Look-behind

Description:

Negative Look-behind searches the exact same way as Positive Look-behind except it matches and returns the inverse.
Syntax: (?<!(criteria-1))(criteria-2)
Demo:

If we use the same example as above except we change our regex to have a Negative Look-behind syntax, (?<!(https:\/\/www\.))(google\.com), it will match and return all instances of "google.com" that do NOT have "https://www." behind it.

Regex: (?<!(https:\/\/www\.))(google\.com)

Match: the "google.com" on line 2

Explanation:

This expression will only return the "google.com" on line 2 because that's the only instance of "google.com" in our example where "https://www." is not behind it.
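Both look-behind variants can be checked against the four example URLs with a sketch like the one below. The helper name `linesMatching` is hypothetical, and the periods are escaped here so they match literal periods.

```javascript
const behindExample = [
  "https://www.google.com",
  "http://www.google.com",
  "https://www.facebook.com",
  "http://www.facebook.com",
].join("\n");

// Hypothetical helper: 1-based line numbers on which a global pattern matches.
function linesMatching(pattern) {
  return [...behindExample.matchAll(pattern)].map(
    (hit) => behindExample.slice(0, hit.index).split("\n").length
  );
}

// Positive look-behind: "google.com" only when "https://www." is behind it.
const positiveBehind = /(?<=(https:\/\/www\.))(google\.com)/g;

// Negative look-behind: "google.com" only when "https://www." is NOT behind it.
const negativeBehind = /(?<!(https:\/\/www\.))(google\.com)/g;

console.log(linesMatching(positiveBehind)); // [ 1 ]
console.log(linesMatching(negativeBehind)); // [ 2 ]
console.log(behindExample.match(positiveBehind)); // [ 'google.com' ]
```

The text inside the look-behind is checked but never becomes part of the returned match.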

Positive Look-ahead

Description:

Positive Look-ahead searches for 2 criteria in sequence and returns a match for the first criteria but only if the second criteria is ahead of it.
Syntax: (criteria-1)(?=(criteria-2))
Demo:

Using our example, let's say we want to return all instances of "https://www." but only if "google.com" is ahead of it. We could do this with the following:

Regex: (https:\/\/www\.)(?=(google\.com))

Match: the "https://www." on line 1

Explanation:

This expression will only return the "https://www." on line 1 because that's the only instance in our example where "google.com" is ahead of it.

Negative Look-ahead

Description:

Just like how Negative Look-behind searches the inverse of Positive Look-behind, Negative Look-ahead searches the exact same way as Positive Look-ahead but matches and returns the inverse. Negative Look-ahead searches for 2 criteria in sequence and returns a match for the first criteria EXCEPT if the second criteria is ahead of it.
Syntax: (criteria-1)(?!(criteria-2))
Demo:

Regex: (https:\/\/www\.)(?!(google\.com))

Match: the "https://www." on line 3

Explanation:

This expression will only return the "https://www." on line 3 because that is the only instance of "https://www." in our example where "google.com" is NOT ahead of it.
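The look-ahead variants can be sketched the same way against the four example URLs. The helper name `linesWithMatch` is hypothetical, and the periods are escaped here so they match literal periods.

```javascript
const aheadExample = [
  "https://www.google.com",
  "http://www.google.com",
  "https://www.facebook.com",
  "http://www.facebook.com",
].join("\n");

// Hypothetical helper: 1-based line numbers on which a global pattern matches.
function linesWithMatch(pattern) {
  return [...aheadExample.matchAll(pattern)].map(
    (hit) => aheadExample.slice(0, hit.index).split("\n").length
  );
}

// Positive look-ahead: "https://www." only when "google.com" is ahead of it.
const positiveAhead = /(https:\/\/www\.)(?=(google\.com))/g;

// Negative look-ahead: "https://www." only when "google.com" is NOT ahead of it.
const negativeAhead = /(https:\/\/www\.)(?!(google\.com))/g;

console.log(linesWithMatch(positiveAhead)); // [ 1 ]
console.log(linesWithMatch(negativeAhead)); // [ 3 ]
console.log(aheadExample.match(negativeAhead)); // [ 'https://www.' ]
```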

About the Author

Abdulmelik Ersoy is a coder and web developer. He started his journey in the world of coding with the Full-Stack Web Development Bootcamp at Rutgers University. He looks forward to learning more about all aspects of web development, sharpening his coding skills, and meeting more awesome coders who are just as excited about coding as he is!

Abdulmelik Ersoy's GitHub

Abduler21/Tutorial.md
