Understanding Regular Expressions in JavaScript

Or How To /r[a-e]+d\s(pa|gi)[tb]{2}er[ni]sh?\?/gi
Ferenc Almasi • Last updated 2021 April 07 • Read time 11 min
Regular expressions in JavaScript can look scary at first, but once you get the hang of it, there's no pattern you can't create.

In a previous article, I talked about how I managed to reduce my CSS bundle size by more than 20%. I had a lot of examples of regex patterns there, and recently I also got questions related to the topic, so I thought it’s time to collect everything in one place.

What are regular expressions?

Let’s start off by defining what regular expressions actually are. According to Wikipedia:

A regular expression, regex or regexp is a sequence of characters that define a search pattern.

That’s a pretty good definition; regexes are nothing more than a combination of characters that are mostly used to find patterns in text or to validate user input.


Tools of the Trade

To give you a simple example, say we have an input field where we expect the user to type in some numbers in the following format: YYYY/MM/DD.
Four digits followed by a slash, followed by two digits, a slash, and two digits again. A date. 🗓️

Now when it comes to writing regex patterns, there are a number of great tools out there that can help you achieve your goals. There are two I’d like to mention:

RegExr helps you with a handy cheat sheet and also lets you test patterns right away, as the expressions are evaluated in real time. This is how I actually “learned” to write regex. Regexper is another great tool that helps you visualize the pattern with a diagram.

Back to the example, the right solution is as simple as doing:

/\d{4}\/\d{2}\/\d{2}/g
Diagram: the above date regex, visualized.
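To see the pattern at work outside of RegExr, here is a minimal sketch that runs it against a few inputs. The helper name looksLikeDate is an assumption made for illustration, and the g flag is left off because a single yes/no check does not need it.

```javascript
// The date pattern from the article, without the g flag.
const datePattern = /\d{4}\/\d{2}\/\d{2}/;

// Hypothetical helper: true when the input contains YYYY/MM/DD-shaped text.
function looksLikeDate(input) {
  return datePattern.test(input);
}

console.log(looksLikeDate('2021/04/07')); // true
console.log(looksLikeDate('07/04/2021')); // false: no four digits before a slash
console.log(looksLikeDate('2021-04-07')); // false: dashes instead of slashes
```

Keep in mind that this only checks the shape of the text. Something like 2021/99/99 would also pass, so a real date check needs more than this pattern.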

Before starting, I would like to advise you to follow along by copy-pasting the examples into RegExr and playing around with the “Text” field.


The Start

Now let’s break it down, starting from the basics. Every regex literal is delimited by two forward slashes (/), and the pattern itself goes between them. We can also have flags after the closing slash. The two most common ones you are going to come across are g and i, or the combination of both: gi. They mean global and case insensitive, respectively.

Say you have a paragraph in which the digits appear more than once. In order to select every occurrence, you have to set the global flag. Otherwise, only the first occurrence will be matched.

Say you want to select both javascript and JavaScript in a piece of text. This is where you would use the i flag. In case you want to select all occurrences, you need the global flag as well, making it /javascript/gi. Everything that goes between the slashes is treated as the pattern. So let’s examine what we have between the slashes in our date example and what each part actually means.
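The effect of the two flags is easy to see with String.match. This is a small sketch; the sample sentence is made up for illustration.

```javascript
const text = 'JavaScript is not Java, but javascript is JavaScript.';

// No flags: only the first match with the exact casing.
const first = text.match(/javascript/);
console.log(first[0]); // "javascript"

// g: every match with the exact casing.
const allExact = text.match(/javascript/g);
console.log(allExact); // ["javascript"]

// gi: every match, regardless of casing.
const allAnyCase = text.match(/javascript/gi);
console.log(allAnyCase); // ["JavaScript", "javascript", "JavaScript"]
```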


Character Classes

The regex in the example starts with \d. This is called a character class. Character classes (also called “character sets”) let you tell the regex engine to match a single character out of a set of characters. \d matches any digit. To define your own set of characters, you can use brackets. For example, [0-9] does the same as \d.

This can also be done with letters. [a-z] will match any letter from a to z. Note that this only matches lowercase letters. To include uppercase as well, you need to say [a-zA-Z]. Multiple ranges can be combined by simply writing them one after another. Can you guess what [a-z0-9] will do? That’s right, it will match any letter from a to z as well as any digit from 0 to 9.
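To check these character classes against real text, the sketch below collects every matching character from a made-up sample string and joins them back together.

```javascript
const sample = 'Room 42B, floor 7';

// \d and [0-9] pick out the same characters.
const digits = sample.match(/\d/g).join('');
const digitsViaSet = sample.match(/[0-9]/g).join('');
console.log(digits, digitsViaSet); // "427" "427"

// Lowercase letters only.
const lowercase = sample.match(/[a-z]/g).join('');
console.log(lowercase); // "oomfloor"

// Lowercase and uppercase letters.
const letters = sample.match(/[a-zA-Z]/g).join('');
console.log(letters); // "RoomBfloor"

// Lowercase letters and digits combined in one set.
const lowercaseAndDigits = sample.match(/[a-z0-9]/g).join('');
console.log(lowercaseAndDigits); // "oom42floor7"
```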


Quantifiers and Alternations

Moving on, we have {4} after \d. This is called a quantifier, and all it does is tell the regex engine to look for exactly four digits. Therefore /\d{4}/g will match 2019, but not 20 19, 20, 201, or anything else that doesn’t contain four digits in a row. This is what we have done to months and days with \d{2}: we want numbers that are exactly two digits long. We can also define a range with two numbers, starting from the minimum: \d{2,4}. This will match numbers that are at least 2 digits long but not longer than 4. You can also omit the max value, as in \d{2,}, and it will match every number that is at least 2 digits long.
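Here is a short sketch of the three quantifier forms. The input string is made up for illustration; note how \d{2,4} only takes the first four digits of a longer number.

```javascript
const exactlyFour = /\d{4}/;
console.log(exactlyFour.test('2019'));  // true
console.log(exactlyFour.test('20 19')); // false: the space breaks the run of digits
console.log(exactlyFour.test('201'));   // false: only three digits

const numbers = '1 22 333 4444 55555';

// Between 2 and 4 digits: "55555" is only matched up to its fourth digit.
const twoToFour = numbers.match(/\d{2,4}/g);
console.log(twoToFour); // ["22", "333", "4444", "5555"]

// 2 digits or more, with no upper limit.
const twoOrMore = numbers.match(/\d{2,}/g);
console.log(twoOrMore); // ["22", "333", "4444", "55555"]
```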

There are also four other operators I would like to cover, as they are often used. The | (or) operator, known as alternation, lets you define multiple alternatives. Say you have to write a regex for URLs and need to match both “HTTP” and “WWW”. Piping them together lets us match either one of them: /http|www/g.
The other three are really similar to each other and are used to define quantity. They are, in order: \d*, \d+, and \d?.

  • Star is used to match 0 or more of the preceding character.
  • Plus is used to match 1 or more of the preceding character.
  • The question mark is used to match 0 or 1 of the preceding character. It can be used if we want to express optionality. Let’s say you want to match both http and https this time. This can be done by /https?/g, which will make the (preceding) letter “s” optional.
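The alternation and the three quantifiers above can be compared side by side. This is a small sketch; the example.* hosts and the "ct cat caat" string are made up for illustration.

```javascript
const urls = 'http://a.example https://b.example www.c.example';

// Alternation: match either "http" or "www".
const httpOrWww = urls.match(/http|www/g);
console.log(httpOrWww); // ["http", "http", "www"]

// Optional "s": matches both "http" and "https".
const protocols = urls.match(/https?/g);
console.log(protocols); // ["http", "https"]

const words = 'ct cat caat';

const zeroOrMore = words.match(/ca*t/g); // 0 or more "a"
console.log(zeroOrMore); // ["ct", "cat", "caat"]

const oneOrMore = words.match(/ca+t/g); // 1 or more "a"
console.log(oneOrMore); // ["cat", "caat"]

const zeroOrOne = words.match(/ca?t/g); // 0 or 1 "a"
console.log(zeroOrOne); // ["ct", "cat"]
```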

Escaped Characters

Next, we have the following: \/. This is an escaped character. We wanted to match a forward slash, but since a slash would otherwise close the regex, we first need to escape it with a backslash. Likewise, to match a backslash itself, you escape it with another backslash: \\. The same goes for other special characters that would otherwise have another meaning.

For example, a dot means any character, except a new line. But if you specifically want to match three dots (“...”), you can’t just write /.../g. Instead, you need to escape them with a backslash: /\.\.\./g.

We know that brackets are used to match for character sets. But what if we want to target the [] characters themselves? They also need to be escaped, so instead of [] we would do \[\], and so on.
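The difference between escaped and unescaped characters shows up clearly when both versions run on the same text. The strings in this sketch are made up for illustration.

```javascript
const sentence = 'Wait... what';

// Unescaped dots match ANY three characters at a time.
const anyThree = sentence.match(/.../g);
console.log(anyThree); // ["Wai", "t..", ". w", "hat"]

// Escaped dots match three literal dots only.
const literalDots = sentence.match(/\.\.\./g);
console.log(literalDots); // ["..."]

// Escaped brackets match the literal "[]" characters.
const brackets = 'items[]'.match(/\[\]/);
console.log(brackets[0], brackets.index); // "[]" 5

// An escaped forward slash can be used to split a date.
const parts = '2020/01/02'.split(/\//);
console.log(parts); // ["2020", "01", "02"]

// A backslash is escaped with another backslash (in the string literal too).
const backslash = 'C:\\temp'.match(/\\/);
console.log(backslash.index); // 2
```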


Groups and Lookarounds

Now say you use this regex in your JavaScript code and whenever you find a match, you want to extract a portion of it. In this case, it would be nice if we could retrieve the year, month, and day separately so we could do stuff with them later. This is where capturing groups come into play. See the three examples below:

// Original example
/\d{4}\/\d{2}\/\d{2}/g.exec('2020/01/02'); // Outputs: ["2020/01/02", index: 0, input: "2020/01/02", groups: undefined]

// With capturing groups
/(\d{4})\/(\d{2})\/(\d{2})/g.exec('2020/01/02'); // Outputs: ["2020/01/02", "2020", "01", "02", index: 0, input: "2020/01/02", groups: undefined]

// With named capturing groups (part of the language since ES2018)
/(?<year>\d{4})\/(?<month>\d{2})\/(?<day>\d{2})/g.exec('2020/01/02'); // Outputs: ["2020/01/02", "2020", "01", "02", index: 0, input: "2020/01/02", groups: {…}]

/**
 * Groups will include the following:
 * groups:
 *   day: "02"
 *   month: "01"
 *   year: "2020"
 */

In the original example, when we use the exec method on the regex and pass in a date, we get an array back (meaning we have a match; otherwise exec would return null).

In this case, we would still need to call '2020/01/02'.split('/'); to get what we want. With the second example, we can get around this by grouping everything together with parentheses. By saying (\d{4}), we group the year which we can later extract with exec. Now in the output, we get back the year, the month and the day separately and we can access them, starting from the first index of the array: arr[1]. The zero index will always return the whole match itself.

I also included a third example which uses named capturing groups. This will give us a groups object on the output array, which will hold our named groups with their values. Named capturing groups were standardized in ES2018, but older browsers don’t support them, so I would advise checking the browsers you need to support before using them in production code.

There can also be cases where you need to group part of the pattern together, but you don’t actually want to create a group for it when calling from JavaScript. A non-capturing group will help you in this case. Adding ?: to the beginning of the group will mark it as non-capturing: (?:\d{4}).
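Put together, the group types can be used from JavaScript as in the following sketch. The variable names are assumptions made for illustration, and the g flag is left off since only one match is needed.

```javascript
const dateString = '2020/01/02';

// Capturing groups: skip index 0 (the whole match) and destructure the rest.
const [, year, month, day] = /(\d{4})\/(\d{2})\/(\d{2})/.exec(dateString);
console.log(year, month, day); // "2020" "01" "02"

// Named capturing groups: read the values from the groups object.
const named = /(?<year>\d{4})\/(?<month>\d{2})\/(?<day>\d{2})/.exec(dateString);
console.log(named.groups.year);  // "2020"
console.log(named.groups.month); // "01"

// Non-capturing group: the year is still required, but not captured,
// so the first captured value is now the month.
const mixed = /(?:\d{4})\/(\d{2})\/(\d{2})/.exec(dateString);
console.log(mixed.length); // 3: whole match, month, day
console.log(mixed[1]);     // "01"
```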

Lookarounds

We talked about groups, but we also have so-called “lookarounds”. Among them, we have positive and negative lookaheads, which basically tell the regex engine to “look forward and see if the pattern is followed by a certain pattern!”

Imagine you have a domain regex and you only want to match domains that are ending with “.net”. You want a positive lookahead because you want to end it with “.net”. You can turn your capturing group into that by adding ?= to the beginning: domainRegex\.(?=net).

The opposite of that is a negative lookahead. You want a negative lookahead when you don’t want to end it with “.net”. The pattern in this case is ?!, so domainRegex\.(?!net) will match every domain, except the ones that have a “.net” ending.

There are also lookbehinds, which do the exact opposite: look back and see if a pattern is preceded by the one specified in the lookbehind. Like named capturing groups, they are ES2018 features, so check that the browsers you target support them before using them in production.

It’s important to note, lookarounds will not be part of a match, they only validate or invalidate it!
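The sketch below tries the lookarounds on a few made-up domains. It assumes [a-z]+ as a stand-in for the domainRegex placeholder used above; a real domain pattern would be more involved. Notice that “net” never shows up in the lookahead matches, because lookarounds are not part of the match.

```javascript
const domains = 'example.net example.com sample.net';

// Positive lookahead: a name and a dot, only when followed by "net".
const followedByNet = domains.match(/[a-z]+\.(?=net)/g);
console.log(followedByNet); // ["example.", "sample."]

// Negative lookahead: a name and a dot, only when NOT followed by "net".
const notFollowedByNet = domains.match(/[a-z]+\.(?!net)/g);
console.log(notFollowedByNet); // ["example."]

// Positive lookbehind: letters, only when preceded by a dot.
const afterDot = domains.match(/(?<=\.)[a-z]+/g);
console.log(afterDot); // ["net", "com", "net"]
```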


Practice Time

Let’s say I want to create a regex that matches a URL for my portfolio and I want it to work with “HTTP”, “HTTPS”, “WWW” or no protocol at all. That means I need to cover four different cases:

  • https://allma.si
  • http://allma.si
  • www.allma.si
  • allma.si

Starting from the beginning I can just say:

/https?/g

This will match both “HTTP” and “HTTPS”. This is followed by a colon and two forward slashes. Light shines through you and you say: “We must escape those slashes!” So we can expand the pattern to:

/https?:\/\//g

And now we can finish up the rest with the hostname itself, taking into consideration that we also have to escape the dot, leading us to:

/https?:\/\/allma\.si/g

Now, this will definitely work for the first two cases but we can also have “WWW” and no protocol at all. So we “or” it with a pipe:

/https?:\/\/|www\.allma\.si/g

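The only thing left to do is to make the prefix optional, so we still have a match when we don’t provide any protocol. We can do this with a question mark, but a question mark placed right after “www\.” would not apply to the HTTP part, and the pipe currently splits the whole pattern into two alternatives. Grouping the two prefixes together with parentheses solves both problems, which leaves us with: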

/(https?:\/\/|www\.)?allma\.si/g
Diagram for domain protocol regex
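To double-check the final pattern, here is a small sketch that runs it against the four cases listed at the start of this section, and also shows what the ungrouped version from the previous step matches. The variable names are assumptions made for illustration.

```javascript
const portfolioUrl = /(https?:\/\/|www\.)?allma\.si/;

const candidates = [
  'https://allma.si',
  'http://allma.si',
  'www.allma.si',
  'allma.si',
];

// For every case, the matched text is the whole input.
const matched = candidates.map((url) => url.match(portfolioUrl)[0]);
console.log(matched);

// Without the group, the pipe splits the pattern into
// "https?:\/\/" OR "www\.allma\.si", so only the protocol is matched here.
const ungrouped = /https?:\/\/|www\.allma\.si/;
const partial = 'https://allma.si'.match(ungrouped)[0];
console.log(partial); // "https://"
```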

Use Cases in JavaScript

There are a couple of methods that we can use with regular expressions in JavaScript. We have to differentiate between methods attached to the RegExp object and methods on the String object. We already looked at exec, but we also have another common RegExp method, test, which returns either true or false based on the provided input. With that, we can easily create checks in our code:

if (/graph/g.test('paragraph')) { /* ... */ } // Will evaluate to true

We also have a couple of handy functions on the String object. The most common one that you will use is probably match, which returns an array of matches if there are any, or null if there are none. The above example can be rewritten in the following way:

'paragraph'.match(/graph/g); // Returns ["graph"]

There’s also matchAll, which requires the g flag and always returns an iterator (a RegExpStringIterator) that you can loop through. Each item is an array, similar to the output of exec. You can collect the results by using spread on the return value of matchAll; if there are no matches, you get an empty array.

[...'paragraph'.matchAll(/graph/g)];
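Since each item from matchAll looks like the output of exec, it pairs nicely with capturing groups. Here is a minimal sketch that reuses the date pattern from earlier; the sample sentence is made up for illustration.

```javascript
const dates = 'Released 2020/01/02, updated 2021/04/07';
const datePattern = /(\d{4})\/(\d{2})\/(\d{2})/g; // matchAll needs the g flag

// Loop over the iterator directly.
for (const match of dates.matchAll(datePattern)) {
  const [whole, year, month, day] = match;
  console.log(match.index, whole, year, month, day);
}

// Or spread it into an array and pick out the first capturing group.
const years = [...dates.matchAll(datePattern)].map((match) => match[1]);
console.log(years); // ["2020", "2021"]

// With no matches, spreading gives an empty array.
const none = [...'no dates here'.matchAll(datePattern)];
console.log(none.length); // 0
```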

Last but not least, there’s String.search, which returns the index of the first match, in case there is one. If there’s none, it will return -1 instead. In the example below, it will find a match starting from the 5th character of the provided string, hence it returns 4 (as indexes start from 0).

'paragraph'.search(/graph/g); // Returns 4

As a last word, I would like to encourage you to practice and hack the regex used in the subtitle and comment your solution down below. The right answer gets the cookie 🍪. To give you a little bit of help, here’s a diagram of that.

Diagram for regex in the subtitle

Cheatsheet

To recap everything, here’s a quick reference to things mentioned in this article. I marked ES2018 features with a warning sign (⚠️).

Flags
g - Global
i - Case Insensitive

Character classes
\d - Match any digit
\w - Match any word character (letter, digit, or underscore)
[a-z] - Match any character in the set inside the brackets (a to z)

Quantifiers, Alternations
a{4} - Match exactly 4 of the preceding token
a{2,4} - Match between 2 and 4 of the preceding token
a{2,} - Match 2 or more of the preceding token

z* - Match 0 or more of the preceding character
z+ - Match 1 or more of the preceding character
z? - Match 0 or 1 of the preceding character

a|z - Match “a” or “z”

Escaped characters
\/ - Escape a forward slash (char code 47)
\\ - Escape a backslash (char code 92)
\. - Escape a dot (char code 46)

Groups, Lookarounds
(2020) - Capturing group
(?:2020) - Non-capturing group
(?<year>2020) - Named capturing group ⚠️
(?=2020) - Positive lookahead
(?!2020) - Negative lookahead
(?<=2020) - Positive lookbehind ⚠️
(?<!2020) - Negative lookbehind ⚠️

JavaScript functions
regex.exec('string') - Returns null or array containing the match
regex.test('string') - Returns true or false based on the provided string

str.match(/regex/g) - Returns null or array containing matches
str.matchAll(/regex/g) - Returns a RegExpStringIterator (it yields nothing if there are no matches)
str.search(/regex/g) - Returns index, returns -1 if no match is found
