@wonjong2
Last active June 24, 2022 22:21
Tutorial for Regex Expression

Regular Expression Tutorial (Matching a URL)

A regular expression (shortened to 'regex' or 'regexp') is a sequence of characters that defines a search pattern in text. Most general-purpose programming languages support regular expressions either natively or through libraries, including JavaScript, Python, C, C++, and Java.

Summary

In this Gist, let's break down a regex for matching a URL and take a look at each component.

  • Matching a URL : /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
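Before breaking the pattern down, it may help to see it run. The following is a minimal sketch, not a full URL validator; the sample strings and the names urlPattern and samples are assumptions made for illustration.

```javascript
// The URL pattern from this tutorial, stored as a regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "https://github.com/wonjong2", // scheme, host, and path
  "example.org",                 // the scheme is optional
  "not a url",                   // no dot-separated host, so no match
];

for (const sample of samples) {
  console.log(sample, "->", urlPattern.test(sample));
}
// https://github.com/wonjong2 -> true
// example.org -> true
// not a url -> false
```

Note that the pattern has no i flag, so it only accepts lowercase letters in the host name.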

Regex Components

Anchors

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Anchors match a position before or after characters rather than the characters themselves.

  • ^ : The caret anchor matches the beginning of the text
  • $ : The dollar anchor matches the end of the text

Examples (JavaScript)

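A minimal sketch of the two anchors in action. The sample strings and the name wholeWord are hypothetical.

```javascript
// ^ anchors the match to the start of the string.
console.log(/^http/.test("http://example.com"));     // true
console.log(/^http/.test("see http://example.com")); // false

// $ anchors the match to the end of the string.
console.log(/com$/.test("example.com"));      // true
console.log(/com$/.test("example.com/path")); // false

// Used together, they force the pattern to cover the whole string.
const wholeWord = /^cat$/;
console.log(wholeWord.test("cat"));  // true
console.log(wholeWord.test("cats")); // false
```

This is why the URL pattern starts with ^ and ends with $: text before or after the URL makes the match fail.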

Quantifiers

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Quantifiers match a number of instances of a character, group, or character class in a string.

Quantifier   Description
*            Match zero or more times, same as {0,}
+            Match one or more times, same as {1,}
?            Match zero or one time, same as {0,1}
{n}          Match exactly n times
{n,}         Match at least n times
{n,m}        Match from n to m times

Examples (JavaScript)

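A minimal sketch of the quantifiers that appear in the URL pattern. The names scheme and tld and the sample strings are hypothetical.

```javascript
// ? makes the preceding "s" optional, as in https? from the URL pattern.
const scheme = /^https?$/;
console.log(scheme.test("http"));   // true
console.log(scheme.test("https"));  // true
console.log(scheme.test("httpss")); // false

// + requires at least one occurrence; * also accepts none.
console.log(/^a+$/.test("")); // false
console.log(/^a*$/.test("")); // true

// {2,6} bounds the repetition count, as in the {2,6} part of the URL pattern.
const tld = /^[a-z]{2,6}$/;
console.log(tld.test("io"));      // true
console.log(tld.test("c"));       // false (too short)
console.log(tld.test("example")); // false (7 letters)
```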

Grouping Constructs

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Groups use the ( ) symbols. They are useful for creating blocks of patterns, so you can apply repetitions or other modifiers to them as a whole.

Example (JavaScript)

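A minimal sketch of the two things groups do: let a quantifier apply to a whole block, and capture the text the block matched. The sample URL and the name parts are hypothetical.

```javascript
// A quantifier after a group applies to the whole group.
console.log(/^(ab)+$/.test("ababab")); // true
console.log(/^(ab)+$/.test("abb"));    // false

// Each group also captures the text it matched.
// Index 0 holds the full match; groups start at index 1.
const parts = "https://example.com/docs".match(
  /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
);
console.log(parts[1]); // "https://"
console.log(parts[2]); // "example"
console.log(parts[3]); // "com"
```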

Bracket Expressions

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

A bracket expression matches one character out of a set of characters. The square brackets can contain a character range such as [a-z], [0-9], or [a-zA-Z0-9].

Examples (JavaScript)

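A minimal sketch of bracket expressions, including the set used for the host name in the URL pattern. The name hostChars and the sample strings are hypothetical.

```javascript
// [a-z] matches exactly one lowercase letter.
console.log(/^[a-z]$/.test("q")); // true
console.log(/^[a-z]$/.test("Q")); // false

// Ranges and single characters can be mixed in one set.
// [\da-z\.-] accepts digits, lowercase letters, a dot, or a hyphen.
const hostChars = /^[\da-z\.-]+$/;
console.log(hostChars.test("my-site.example1")); // true
console.log(hostChars.test("my_site"));          // false (underscore is not in the set)

// A leading ^ inside the brackets negates the set.
console.log(/^[^0-9]+$/.test("abc"));  // true
console.log(/^[^0-9]+$/.test("abc1")); // false
```

The hyphen is placed last in the set so that it is read as a literal hyphen rather than as part of a range.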

Character Classes

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

A character class matches any single character from a predefined set of characters. A character class is also called a character set.

Characters   Meaning
\d           Matches any digit (Arabic numeral), same as [0-9]
\D           Matches any character that is not a digit (Arabic numeral), same as [^0-9]
\w           Matches any alphanumeric character from the basic Latin alphabet, including the underscore, same as [A-Za-z0-9_]
\W           Matches any character that is not a word character from the basic Latin alphabet, same as [^A-Za-z0-9_]
\s           Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces, same as [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
\S           Matches a single character other than white space, same as [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
\t           Matches a horizontal tab
\r           Matches a carriage return
\n           Matches a line feed

Examples (JavaScript)

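A minimal sketch of the most common character classes. The sample strings are hypothetical.

```javascript
// \d matches digits only.
console.log(/^\d+$/.test("2022")); // true
console.log(/^\d+$/.test("20a2")); // false

// \w matches letters, digits, and the underscore, but not a hyphen.
console.log(/^\w+$/.test("user_01")); // true
console.log(/^\w+$/.test("user-01")); // false

// \s matches white space such as a space or a tab.
console.log(/\s/.test("a b"));  // true
console.log(/\s/.test("a\tb")); // true
console.log(/\s/.test("ab"));   // false

// \D is the negation of \d, so removing every \D leaves only the digits.
console.log("a1b22c".replace(/\D/g, "")); // "122"
```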

Character Escapes

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Some characters have a special meaning in a regular expression, such as []{}()\^$.|?*+. To match one of them literally, put a backslash before it: \

Examples (JavaScript)

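A minimal sketch of the difference an escape makes, using the dot and the forward slash from the URL pattern. The name fromString and the sample strings are hypothetical.

```javascript
// An unescaped dot matches any character except a line terminator.
console.log(/^a.c$/.test("abc")); // true
console.log(/^a.c$/.test("a.c")); // true

// An escaped dot matches only a literal ".".
console.log(/^a\.c$/.test("abc")); // false
console.log(/^a\.c$/.test("a.c")); // true

// In a regex literal the forward slash is escaped as well,
// because an unescaped / would end the literal.
console.log(/^https?:\/\/$/.test("https://")); // true

// When the pattern is built from a string, each backslash is doubled.
const fromString = new RegExp("^a\\.c$");
console.log(fromString.test("a.c")); // true
console.log(fromString.test("abc")); // false
```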

Flags

A flag changes the default search behavior of a regular expression. In a regex literal, flags are written after the closing slash, for example /pattern/gi.

Flag   Name          Modification
i      Ignore case   With this flag the search is case-insensitive: no difference between A and a
g      Global        With this flag the search looks for all matches; without it, only the first match is returned
s      Dot all       Enables "dotall" mode, which allows a dot . to match the newline character \n
m      Multiline     Makes the anchors ^ and $ match the beginning and end of every line instead of the beginning and end of the whole string
u      Unicode       Enables full Unicode support
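A minimal sketch showing each flag from the table next to the same pattern without it. The sample strings and the name text are hypothetical.

```javascript
// i: case-insensitive matching.
console.log(/^https/.test("HTTPS://example.com"));  // false
console.log(/^https/i.test("HTTPS://example.com")); // true

// g: find every match instead of only the first.
console.log("a1b2c3".match(/\d/)[0]); // "1"
console.log("a1b2c3".match(/\d/g));   // ["1", "2", "3"]

// m: ^ and $ also match at the start and end of each line.
const text = "first\nsecond";
console.log(/^second$/.test(text));  // false
console.log(/^second$/m.test(text)); // true

// s: the dot also matches a newline.
console.log(/a.b/.test("a\nb"));  // false
console.log(/a.b/s.test("a\nb")); // true

// u: the pattern works on whole code points instead of UTF-16 code units.
// "\u{1F600}" is a single emoji stored as two code units.
console.log(/^.$/.test("\u{1F600}"));  // false
console.log(/^.$/u.test("\u{1F600}")); // true
```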

Author

Wonjong Park : https://github.com/wonjong2
