REGEX is short for regular expression. A regular expression is a way to search for a sequence of characters or a character pattern in a body of text. A REGEX search pattern can be difficult to read, but it can find every instance of a character pattern throughout a text document.
For example, if I want to find a phone number in a body of text (such as a markdown document, JavaScript code, an article, or a Word document), I can use a REGEX search pattern.
The following REGEX search pattern will find all phone numbers (e.g. 123-456-7899) written as 3 digits followed by a "-" followed by 3 digits followed by a "-" followed by 4 digits:
REGEX = "\d{3}-\d{3}-\d{4}"
It will match phone numbers in the pattern
"123-456-7899"
Here is an excellent video series and a very good article to dive deep into REGEX.
Video Series or Short Tutorial
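As a quick check of the phone number pattern introduced earlier, here is a minimal JavaScript sketch. The sample text and the variable names are made up for illustration; the "g" flag is added so that every match is returned, not just the first.

```javascript
// Phone number pattern: 3 digits, "-", 3 digits, "-", 4 digits.
// The "g" (global) flag asks for all matches in the text.
const phonePattern = /\d{3}-\d{3}-\d{4}/g;

// Hypothetical sample text for illustration only.
const phoneText = "Call 123-456-7899 or 303-555-0147, not 12-34-5678.";

// match() returns null when nothing matches, so fall back to an empty array.
const phoneMatches = phoneText.match(phonePattern) || [];

console.log(phoneMatches); // [ '123-456-7899', '303-555-0147' ]
```

The last number in the sample text is skipped because it does not follow the 3-3-4 digit layout.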
This tutorial will break down the following REGEX pattern, which is designed to match a URL string:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
- Overall Breakdown
- Anchors
- Quantifiers
- Grouping Constructs
- Bracket Expressions
- Character Classes
- The OR Operator
- Character Escapes
- Other Resources
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Let's break this regex pattern search down:
/^(https?:\/\/)?
At the beginning of the string "^", match "http" followed by an optional "s" (the first "?"), then a ":" and two forward slashes "//", each escaped with a backslash as "\/". The whole group "(https?:\/\/)" is itself optional (the second "?"), so a URL may start with "http://", "https://", or no scheme at all.
([\da-z\.-]+)\.
Next comes ([\da-z\.-]+)\. which matches one or more "+" characters that are each a digit "\d", a lower case letter "a" through "z", a "." or a "-", followed by a literal "." that separates the domain name from its ending.
([a-z\.]{2,6})
After the "." in the middle of the statement, the pattern matches a sequence of lower case letters "a" through "z" or periods "." with a minimum of two (2) characters and a maximum of six (6) characters {2,6}, such as "com" or "co.uk".
([\/\w \.-]*)*\/?$
The last part of the regex matches the path: zero or more "*" characters that are each a forward slash "/" (escaped with a backslash "\"), a word character "\w", a space, a period "." or a "-", followed by an optional "?" trailing "/" at the end of the string "$".
This pattern will match URL strings in a body of text such as the following:
This screenshot shows the regex in the search box (upper portion of the screenshot) matching various URLs within the text but not matching non-URL strings.
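The same behaviour can be sketched in JavaScript by running the URL pattern against a few strings. The candidate strings below are invented for illustration and are not the ones in the screenshot. Because the pattern is anchored with "^" and "$", each candidate is tested as a whole string.

```javascript
// The URL pattern from this tutorial, written as a JavaScript regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical test strings for illustration only.
const urlCandidates = [
  "https://www.example.com",           // scheme + domain
  "http://example.org/docs/page.html", // scheme + domain + path
  "example.com",                       // scheme is optional
  "not a url",                         // no "." followed by a domain ending
  "https://",                          // scheme only, no domain
];

// test() returns true when the whole string fits the pattern.
const urlResults = urlCandidates.map((candidate) => urlPattern.test(candidate));

urlCandidates.forEach((candidate, i) => {
  console.log(`${candidate} -> ${urlResults[i]}`);
});
```

The first three strings match and the last two do not. Note that the pattern only accepts lower case letters unless the "i" (ignore case) flag is added.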
Note: In JavaScript, a string is enclosed in quotes such as var x = "Hello World". A REGEX literal is enclosed in two forward slashes such as /Hello World/. In the URL match above, note the "/" at the start and the "/" at the end of the REGEX.
Anchor characters match a position in a string rather than a character. For example, "^" means match the position at the start of a string and "$" means match the position at the end of a string.
Example #1: The code below searches for the character sequence "http://" or "https://" at the beginning of the string as indicated by the "^" character.
^(https?:\/\/)?
Example #2: The code below searches for the character sequence "/" at the end of the string as indicated by the "$" character.
/$
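A minimal JavaScript sketch of both anchors follows. One assumption to flag: the trailing "?" from Example #1 is dropped here, because an optional group matches every string and would hide the effect of "^". The sample strings are invented for illustration.

```javascript
// "^" anchors the match to the start of the string.
const startsWithScheme = /^https?:\/\//;

// "$" anchors the match to the end of the string.
const endsWithSlash = /\/$/;

console.log(startsWithScheme.test("https://example.com"));     // true
console.log(startsWithScheme.test("see https://example.com")); // false, scheme is not at the start
console.log(endsWithSlash.test("example.com/"));               // true
console.log(endsWithSlash.test("example.com/docs"));           // false, "/" is not at the end
```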
A character class is a set of one or more characters that appear in a bracket expression. The pattern matches any one of the characters inside the brackets, under an "or" condition. For example, [abc] will match a lower case a or b or c.
Another example is [-.] which will match "-" or ".". In this instance, "-" and "." are the literal characters, not the meta-characters. The meta-character "-" between two characters inside brackets defines a range, so [a-z] means any character from "a" through "z". The meta-character "^" at the start of a bracket expression means "not", such that [^0-5] means any character other than 0 through 5.
Example #1: The code below searches for the character a through z or a period using the bracket expression to represent a character class.
[a-z\.]
Example #2: The code below searches for one or more characters that are each a digit, a through z, a period, or a hyphen, using the bracket expression to represent a character class.
[\da-z\.-]+
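The character classes described in this section can be tried out with a short JavaScript sketch. The input strings are made up for illustration, and the "g" flag is used so that match() returns every match.

```javascript
// [-.] matches a literal "-" or a literal ".".
const literalMatches = "a-b.c".match(/[-.]/g);
console.log(literalMatches); // [ '-', '.' ]

// [^0-5] matches any single character that is NOT 0 through 5.
const negatedMatches = "a1b7c3".match(/[^0-5]/g);
console.log(negatedMatches); // [ 'a', 'b', '7', 'c' ]

// [\da-z\.-]+ matches a run of digits, lower case letters, "." or "-".
const domainMatches = "my-site.io!".match(/[\da-z\.-]+/g);
console.log(domainMatches); // [ 'my-site.io' ]
```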
A quantifier is a meta-character that modifies how many times the preceding character, class, or group may repeat. For example, the quantifier {min,max} matches at least min and at most max repetitions, and {n} matches exactly n repetitions. In the phone number example above "\d{3}" will search for three digits, with "\d" indicating a digit and {3} indicating 3 digits in a sequence.
Min/Max Quantifier: The code below searches for the character a through z or a period using a quantifier which modifies the search to a minimum of 2 characters and a maximum of 6 characters.
[a-z\.]{2,6}
"+" Quantifier: The code below searches for a digit or a through z or a period or "-" using the quantifier "+", which modifies the search to one or more characters that match the described pattern.
[\da-z\.-]+
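To see the repeat counts in action, here is a small JavaScript sketch. As an assumption for the demo, each pattern is wrapped in "^" and "$" so that the count applies to the whole string; the sample strings are invented.

```javascript
// {3}: exactly three digits.
const exactlyThreeDigits = /^\d{3}$/;
console.log(exactlyThreeDigits.test("123"));  // true
console.log(exactlyThreeDigits.test("12"));   // false, too short
console.log(exactlyThreeDigits.test("1234")); // false, too long

// {2,6}: between two and six characters from the class.
const twoToSix = /^[a-z\.]{2,6}$/;
console.log(twoToSix.test("com"));     // true (3 characters)
console.log(twoToSix.test("co.uk"));   // true (5 characters)
console.log(twoToSix.test("c"));       // false (1 character)
console.log(twoToSix.test("museums")); // false (7 characters)

// +: one or more characters from the class.
const oneOrMore = /^[\da-z\.-]+$/;
console.log(oneOrMore.test("a")); // true
console.log(oneOrMore.test(""));  // false, "+" needs at least one character
```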
A grouping construct treats the pattern inside parentheses as a single unit, such as the examples below.
Example #1: The code below contains two grouping constructs, each enclosed in its own pair of parentheses. The first group matches, at the start of the string indicated by "^", "http" with an optional "s" (as indicated by the 1st "?") followed by a ":" and "/" and "/"; the 2nd "?" after the closing parenthesis makes the whole "https://" group optional. The second group matches 1 or more "+" characters that are each a digit "\d", a lower case "a" through "z", a "." or a "-".
^(https?:\/\/)?([\da-z\.-]+)
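In JavaScript, each pair of parentheses also captures the text it matched, which can be read back from the result of match(). The sketch below is illustrative only; the sample URLs and variable names are assumptions.

```javascript
// Two groups: (1) the optional scheme, (2) the host name characters.
const groupPattern = /^(https?:\/\/)?([\da-z\.-]+)/;

// Index 0 is the full match; indexes 1 and 2 are the two groups.
const withScheme = "https://www.example.com/path".match(groupPattern);
console.log(withScheme[0]); // "https://www.example.com"
console.log(withScheme[1]); // "https://"
console.log(withScheme[2]); // "www.example.com"

// When the optional group does not take part, its slot is undefined.
const withoutScheme = "example.com".match(groupPattern);
console.log(withoutScheme[1]); // undefined
console.log(withoutScheme[2]); // "example.com"
```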
A bracket expression indicates a set of characters to match, as described above in the character class section.
Example #1: The code below searches for the character a through z or a period using the bracket expression to represent a character class.
[a-z\.]
The OR operator is the pipe character "|" and is typically enclosed in parentheses.
Example #1: The code under review does not have an OR operator but an example is below where the match pattern is searching for the word "cat" or "dog".
(cat|dog)
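A short JavaScript sketch of the OR operator, using an invented sentence for illustration:

```javascript
// (cat|dog) matches either the word "cat" or the word "dog".
const petPattern = /(cat|dog)/g;

const petMatches = "My cat chased the dog, not the bird.".match(petPattern);
console.log(petMatches); // [ 'cat', 'dog' ]

// With anchors, the whole string must be one of the alternatives.
const onlyPet = /^(cat|dog)$/;
console.log(onlyPet.test("cat")); // true
console.log(onlyPet.test("cow")); // false
```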
Some characters such as a period "." are meta-characters. In order to create a search pattern that matches these meta-characters literally, an "escape", the backslash "\", must precede these characters.
Example #1: In the example below "\." (after the z and before the "-") is an example of a character escape to search for a period ".". Outside a bracket expression, a period "." listed without the "\" escape matches any character (except a line break), not just a period. Thus the "\" is used to indicate a literal ".". Inside a bracket expression the period is already literal, so the "\" here is optional but harmless.
[\da-z\.-]
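The difference between an escaped and an unescaped period can be sketched in JavaScript as follows. The patterns and sample strings are hypothetical and chosen only to show the contrast.

```javascript
// Unescaped "." matches any character (except a line break).
const anyChar = /a.c/;
console.log(anyChar.test("abc")); // true
console.log(anyChar.test("a.c")); // true

// Escaped "\." matches only a literal period.
const literalDot = /a\.c/;
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

// Inside brackets, "." is already literal even without the backslash.
const dotInBrackets = /[.]/;
console.log(dotInBrackets.test("x")); // false
console.log(dotInBrackets.test(".")); // true

// When a pattern is built from a string, the backslash itself must be
// escaped, so "\\." in the string becomes "\." in the regex.
const fromString = new RegExp("a\\.c");
console.log(fromString.test("abc")); // false
```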
- video series
- #1 INTRO: https://www.youtube.com/watch?v=7DG3kCDx53c
- #2 META CHARACTERS: https://www.youtube.com/watch?v=YTocEnDsMNw
- #3 CHARACTER CLASSES: https://www.youtube.com/watch?v=EfJU0Y9WAZ4
- #4 CAPTURING GROUPS: https://www.youtube.com/watch?v=c9HbsUSWilw
- #5 BACK REFERENCE: https://www.youtube.com/watch?v=Z66TeSTcP-Q
- #6 JS - test() & match(): https://www.youtube.com/watch?v=W7S_Vmq0GSs
- #7 JS - exec(): https://www.youtube.com/watch?v=t029QcVHtas
- #8 JS - split(): https://www.youtube.com/watch?v=fdyqutmcI2Q
- #9 JS - replace(): https://www.youtube.com/watch?v=7a-a6lKoyIQ
- https://coding-boot-camp.github.io/full-stack/computer-science/regex-tutorial
Steve Calla is a student at University of Denver Full Stack Coding (part-time) bootcamp. GitHub Profile.
Hello - Here is my regex tutorial submission. I treated this like a normal git repo to edit this document including using git add -A, git commit -m and git push origin main.
The history of this work product is readily available via the revision history in the gist as well as the screen shot below.
URL: https://gist.github.com/stevecalla/b1f49a30c95b7fee813052f66d35c649
Revision History: See image in the comment above.
Thanks, Steve