
@stevecalla
Last active October 10, 2022 17:11

REGEX TUTORIAL

REGEX is short for regular expression. A regular expression is a way to search for a sequence of characters or a character pattern in a body of text. REGEX search patterns can be difficult to read, but a single pattern can find every instance of a character pattern throughout a text document.

For example, if I want to find a phone number in a body of text (such as a Markdown document, JavaScript code, or simply an article or Word document), I can use a REGEX search pattern.

The following REGEX search pattern will find all phone numbers (e.g. 123-456-7899) that match the pattern of 3 digits, followed by a "-", followed by 3 digits, followed by a "-", followed by 4 digits. The REGEX to do so is

REGEX = "\d{3}-\d{3}-\d{4}"

which will search for phone numbers in the pattern "123-456-7899".
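As a quick check, the sketch below runs the phone-number pattern in JavaScript (Node.js or a browser console is assumed). The sample text is the test sentence stored in this gist; the variable names are only illustrative. The "g" flag makes match() return every match instead of just the first.

```javascript
// Phone-number pattern from above; "g" returns all matches, not only the first.
const phonePattern = /\d{3}-\d{3}-\d{4}/g;

// Sample sentence taken from this gist's test text.
const text = "This is a test for phone number search including 917-555-1234 or 614.123-4567.";

const matches = text.match(phonePattern);
console.log(matches); // [ '917-555-1234' ]
```

The second number is not matched because it uses a "." instead of a "-" after the first three digits.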

Here is an excellent video series and a very good article to dive deep into REGEX.

Video Series or Short Tutorial

Summary

This tutorial will break down the following REGEX pattern, which is designed to match a URL string:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Table of Contents

Regex Components

Overall Breakdown

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Let's break this regex pattern search down:

  1. /^(https?:\/\/)?

At the beginning of the string "^", the group (https?:\/\/) matches "http" followed by an optional "s" ("s?"), then ":" and two forward slashes. Each forward slash is escaped with a backslash ("\/") so that it is not read as the end of the regex. The "?" after the group makes the entire protocol optional, so the string may start with "https://", "http://", or neither.

  2. /^(https?:\/\/)?([\da-z\.-]+)\.

Next, the group ([\da-z\.-]+) matches one or more "+" characters, each of which is a digit "\d", a lowercase letter "a" through "z", a period ".", or a hyphen "-". It is followed by "\.", which matches a literal period. Together these match the domain name up to the last period before the top-level domain (for example, "www.facebook.").

  3. ([a-z\.]{2,6})

After that period, the group ([a-z\.]{2,6}) matches between two (2) and six (6) characters {2,6}, each of which is a lowercase letter "a" through "z" or a period. This is the top-level domain, such as "com".

  4. ([\/\w \.-]*)*\/?$

In the last part of the regex, the group ([\/\w \.-]*) matches zero or more "*" characters, each of which is a forward slash "\/" (escaped with a backslash "\"), a word character "\w" (a letter, digit, or underscore), a space, a period, or a hyphen. This is the path, such as "/login-123". The "*" after the group allows the group to repeat zero or more times. Finally, "\/?" matches an optional trailing forward slash, and "$" anchors the match to the end of the string.
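To see how the four parts divide up a real URL, the sketch below runs the full pattern with JavaScript's RegExp.prototype.exec, which returns the whole match at index 0 followed by each parenthesized group. The sample URL is one of the examples used in this tutorial; the variable names are illustrative.

```javascript
// The URL pattern broken down above, written as a JavaScript regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

const result = urlPattern.exec("https://www.facebook.com/login-123");

// result[1]..result[4] hold the text captured by the four groups.
console.log(result[1]); // https://      (part 1: optional protocol)
console.log(result[2]); // www.facebook  (part 2: domain up to the last ".")
console.log(result[3]); // com           (part 3: top-level domain)
console.log(result[4]); // /login-123    (part 4: path)
```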

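Because of the "^" and "$" anchors, the pattern matches a URL only when it makes up the entire string (or the entire line, when the search tool treats each line separately). Examples of matching URLs include the following: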

https://www.facebook.com/

www.facebook.com/

www.facebook.com/login-123
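A minimal sketch of checking these examples in JavaScript, assuming each candidate is tested as its own string so that "^" and "$" apply to the whole line. The non-URL sentence comes from this gist's sample text; the array and variable names are illustrative.

```javascript
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

const candidates = [
  "https://www.facebook.com/",
  "www.facebook.com/",
  "www.facebook.com/login-123",
  "Hello this is a test.", // not a URL
];

// test() returns true only when the whole string matches, because of ^ and $.
const results = candidates.map((line) => urlPattern.test(line));

candidates.forEach((line, i) => console.log(`${results[i]}  ${line}`));
```

The last line fails immediately because "H" is uppercase and the domain character class only allows lowercase letters, digits, periods, and hyphens.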

This screenshot shows the regex in the search box (upper portion of the screenshot) matching various URLs within the text but not matching non-URL strings.

[Screenshot: Screen Shot 2022-10-09 at 2 16 51 PM]


Note: In JavaScript, a string is enclosed in quotes, such as var x = "Hello World". A REGEX literal is enclosed in two forward slashes, such as /Hello World/; in the URL match above, note the "/" at the start and the "/" at the end of the REGEX.
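The same pattern can also be built from a string with JavaScript's RegExp constructor. The sketch below, using the phone-number pattern from earlier, shows both forms side by side; note that inside a string each backslash must itself be escaped ("\\d"). Variable names are illustrative.

```javascript
// Regex literal: the pattern sits between two forward slashes.
const literal = /\d{3}-\d{3}-\d{4}/;

// The same pattern built from a string; "\\d" in the string becomes \d in the regex.
const fromString = new RegExp("\\d{3}-\\d{3}-\\d{4}");

console.log(literal.source === fromString.source); // true
console.log(fromString.test("917-555-1234"));      // true
```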


REGEX COMPONENTS

Anchors

Anchor characters match a position in a string rather than a character. For example, "^" matches the position at the start of a string and "$" matches the position at the end of a string.

Example #1: In the code below, "^" anchors the match to the beginning of the string, where the group then optionally matches "http://" or "https://".

^(https?:\/\/)?

Example #2: The code below searches for the character "/" at the end of the string, as indicated by the "$" character.

/$
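A small sketch of both anchors in JavaScript. Inside a regex literal the "/" from Example #2 has to be escaped, so "/$" is written /\/$/. The test strings are based on the URL examples above; the variable names are illustrative.

```javascript
const startsWithProtocol = /^https?:\/\//; // "^": match only at the start
const endsWithSlash = /\/$/;               // "$": match only at the end

console.log(startsWithProtocol.test("https://www.facebook.com/"));     // true
console.log(startsWithProtocol.test("see https://www.facebook.com/")); // false: not at the start
console.log(endsWithSlash.test("www.facebook.com/"));                  // true
console.log(endsWithSlash.test("www.facebook.com/login-123"));         // false: "/" is not last
```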

Character Classes

A character class is a set of one or more characters that appear inside a bracket expression (square brackets). A character class matches any single character from that set, so the characters are effectively combined with an "or" condition. For example, [abc] matches a lowercase a or b or c.

Another example is [-.], which matches "-" or ".". Here "-" and "." are literal characters, not meta characters: a period is always literal inside brackets, and a hyphen is literal when it appears first or last. Between two characters, "-" is a meta character that defines a range, so [a-z] means any character from "a" through "z". When "^" is the first character inside the brackets, it means "not", so [^0-5] matches any character that is not 0 through 5.

Example #1: The code below matches a single lowercase letter "a" through "z" or a period, using a bracket expression to represent a character class.

[a-z\.]

Example #2: The code below matches one or more characters, each of which is a digit, a lowercase letter "a" through "z", a period, or a hyphen. The "+" after the brackets is a quantifier, described in the next section.

[\da-z\.-]+
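The sketch below tries these character classes in JavaScript, including the negated class [^0-5] from the description above. The "^" and "$" anchors around single-character classes force the whole test string to be exactly one character; the variable names are illustrative.

```javascript
// One character: a lowercase letter or a period.
const letterOrPeriod = /^[a-z\.]$/;
console.log(letterOrPeriod.test("q")); // true
console.log(letterOrPeriod.test(".")); // true
console.log(letterOrPeriod.test("Q")); // false: uppercase is not in the class

// "^" as the first character inside the brackets negates the class.
const notZeroToFive = /^[^0-5]$/;
console.log(notZeroToFive.test("7")); // true
console.log(notZeroToFive.test("3")); // false

// Digits, lowercase letters, periods, and hyphens, one or more times.
const domainPart = "www.facebook.com/login-123".match(/[\da-z\.-]+/)[0];
console.log(domainPart); // www.facebook.com  (stops at "/", which is not in the class)
```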

Quantifiers

A quantifier is a meta character that modifies the preceding character, character class, or group by specifying how many times it must repeat. For example, {min,max} matches at least min and at most max repetitions, and {n} matches exactly n repetitions. In the phone number example above, "\d{3}" matches three digits: "\d" means a digit and {3} means 3 digits in a row.

Min/Max Quantifier: The code below matches a lowercase letter "a" through "z" or a period, using a quantifier that requires a minimum of 2 and a maximum of 6 such characters.

[a-z\.]{2,6}

"+" Quantifier: The code below matches a digit, a lowercase letter "a" through "z", a period, or a "-", using the "+" quantifier to require one or more characters that match the described pattern.

[\da-z\.-]+
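A short sketch of the quantifiers used in this tutorial, tested in JavaScript. Each pattern is anchored with "^" and "$" so the whole string must satisfy the repetition count. In JavaScript, {2,6} must be written without a space; "{2, 6}" would be read as literal text rather than a quantifier.

```javascript
// {n}: exactly n repetitions.
console.log(/^\d{3}$/.test("917"));  // true
console.log(/^\d{3}$/.test("9175")); // false

// {min,max}: between 2 and 6 repetitions.
const tld = /^[a-z\.]{2,6}$/;
console.log(tld.test("com"));     // true
console.log(tld.test("c"));       // false: fewer than 2
console.log(tld.test("abcdefg")); // false: more than 6

// "?": zero or one; "+": one or more; "*": zero or more.
console.log(/^https?$/.test("http")); // true: the "s" is optional
console.log(/^\d+$/.test(""));        // false: needs at least one digit
console.log(/^\d*$/.test(""));        // true: zero digits is allowed
```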

Grouping Constructs

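A grouping construct is a part of the pattern enclosed in parentheses. The group is treated as a single unit, so a quantifier placed after it applies to the whole group, and the text it matches is captured so that it can be retrieved separately, such as in the example below.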

Example #1: The code below contains two grouping constructs. After the start-of-string anchor "^", the first group (https?:\/\/) matches "http://" or "https://" (the "?" after "s" makes the "s" optional), and the "?" after the closing parenthesis makes the entire group optional. The second group ([\da-z\.-]+) matches one or more "+" digits "\d", lowercase letters "a" through "z", periods, or hyphens, such as the domain name "www.facebook.com".

^(https?:\/\/)?([\da-z\.-]+)
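The sketch below shows what the two groups capture in JavaScript, with and without the optional protocol, and how a quantifier after a group repeats the whole group. The sample strings are based on this tutorial's URL examples; the variable names are illustrative.

```javascript
const pattern = /^(https?:\/\/)?([\da-z\.-]+)/;

const withProtocol = pattern.exec("https://www.facebook.com");
console.log(withProtocol[1]); // https://
console.log(withProtocol[2]); // www.facebook.com

// The "?" applies to the whole first group, so the protocol may be absent;
// a group that did not take part in the match is reported as undefined.
const withoutProtocol = pattern.exec("www.facebook.com");
console.log(withoutProtocol[1]); // undefined
console.log(withoutProtocol[2]); // www.facebook.com

// A quantifier after a group repeats the whole group.
console.log(/^(ab)+$/.test("ababab")); // true
console.log(/^ab+$/.test("ababab"));   // false: "+" applies only to "b"
```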

Bracket Expressions

A bracket expression indicates a set of characters to match, as described above in the Character Classes section.

Example #1: The code below matches a single lowercase letter "a" through "z" or a period, using a bracket expression to represent a character class.

[a-z\.]

The OR Operator

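The OR operator is the pipe character "|". It matches either the expression on its left or the expression on its right, and it is typically enclosed in parentheses to limit which part of the pattern the alternatives cover.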

Example #1: The URL pattern under review does not use an OR operator, but in the example below the pattern matches the word "cat" or the word "dog".

(cat|dog)
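A minimal JavaScript sketch of the OR operator, with anchors added so the whole string must be one of the alternatives. The second pattern shows why the parentheses matter: without them, "|" splits the entire pattern into "^cat" or "dog$". The test words are illustrative.

```javascript
const pet = /^(cat|dog)$/;
console.log(pet.test("cat"));  // true
console.log(pet.test("dog"));  // true
console.log(pet.test("bird")); // false

// Without parentheses the alternatives are "^cat" and "dog$".
const unscoped = /^cat|dog$/;
console.log(unscoped.test("hotdog")); // true: matches "dog" at the end
console.log(pet.test("hotdog"));      // false
```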

Character Escapes

Some characters, such as the period ".", are meta characters. To create a search pattern that matches one of these characters literally, an "escape", the backslash "\", must precede it.

Example #1: In the example below, "\." (after the z and before the "-") is a character escape that matches a literal period. Outside a bracket expression, an unescaped "." matches any single character (other than a line break), so "\." is required to match only a period. Inside a bracket expression like this one, "." is already literal, so the backslash is optional but harmless.

[\da-z\.-]
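The sketch below demonstrates the difference in JavaScript between an unescaped "." and an escaped "\.", both outside and inside a bracket expression. The test strings are illustrative.

```javascript
// Outside brackets, "." matches any single character except a line break...
console.log(/^a.c$/.test("abc"));  // true
// ...so a literal period must be escaped as "\.".
console.log(/^a\.c$/.test("abc")); // false
console.log(/^a\.c$/.test("a.c")); // true

// Inside brackets, "." is already literal; [.] and [\.] behave the same.
console.log(/^[.]$/.test("x"));    // false
console.log(/^[\.]$/.test("."));   // true
```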

Other Resources:

Author

Steve Calla is a student in the University of Denver Full Stack Coding (part-time) bootcamp. GitHub Profile.

This is a test for phone number search including 917-555-1234 or 614.123-4567. Hello this is a test.

stevecalla commented Oct 9, 2022

Hello - Here is my regex tutorial submission. I treated this like a normal git repo to edit this document including using git add -A, git commit -m and git push origin main.

The history of this work product is available in the gist's revision history as well as in the screenshot below.

URL: https://gist.github.com/stevecalla/b1f49a30c95b7fee813052f66d35c649

Revision History: See image in the comment above.

Thanks, Steve
