Try Regex

Type some JavaScript here:

Getting started with regex

This interactive tutorial serves as an introduction to regular expressions, specifically regular expressions in JavaScript. This will still teach you to write regular expressions that work in other languages, but you should be aware that there are differences.

The console to the left is just a JavaScript console. Tell us your name using setName('Your name') (substituting your name in) to get started with the tutorial.

There are a number of useful commands: run help() to view them.

What are regular expressions?

A regular expression (also known as a regex or a regexp) is a string for describing a search pattern—similar to asterisks for wildcard file name matching, but more powerful (and thus more complicated).

We'll start with a very basic example so that you can get the hang of the syntax and regular expressions in JavaScript.

The bio variable contains a string which may or may not contain your name. To see if it does, type bio.match(/{{ firstEscaped }}/);

It does!

There are a couple things that you can get from the previous example. The first is the syntax used to define regular expressions: you simply surround your expression in forward slashes:

/your expression/

If you type that into the console, you'll see that the regular expression is returned.

The second is that you can use the .match() method to test an expression on a string. There are a couple of other methods you can call: you can use the .exec() method directly on the regex to run it against a string. Type /{{ firstEscaped }}/.exec(bio).

Simple testing

For the expressions we've used so far, the .exec() method does the same thing as the .match() method, but is called on the expression instead of the string, which can be pretty useful.

Another method you can use—and probably the simplest of them all—is the .test() method. It is similar to .exec(), but returns a boolean value. Try it out!
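To make the three methods concrete, here is a minimal sketch using a made-up string. The sampleBio variable and the name "Ada" are placeholders for this illustration, not the tutorial's bio variable.

```javascript
// Hypothetical sample data, standing in for the tutorial's bio variable.
const sampleBio = 'Ada is a developer from London.';

// .match() is called on the string and returns an array (or null).
const viaMatch = sampleBio.match(/Ada/);

// .exec() is called on the expression and returns the same kind of array.
const viaExec = /Ada/.exec(sampleBio);

// .test() only reports whether there was a match.
const found = /Ada/.test(sampleBio);
const missing = /Grace/.test(sampleBio);

console.log(viaMatch[0], viaMatch.index); // Ada 0
console.log(viaExec[0], viaExec.index);   // Ada 0
console.log(found, missing);              // true false
```

When you only need a yes/no answer, .test() is usually the clearest choice.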

Helpful hint: you can use the up arrow on your keyboard to go back to a previous expression.

String replacements

The final method we'll be using is the .replace() method of a string to replace a bit of a string with another string. Type the following to hide your name from the bio var:

bio.replace(/{{ firstEscaped }}/, '[redacted]')
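As a small sketch of what .replace() returns, the example below uses a made-up string (sampleText is a placeholder, not one of the tutorial's variables). It shows that the original string is left untouched and that, without any flags, only the first match is replaced.

```javascript
// Hypothetical sample string for illustration.
const sampleText = 'Ada met Ada.';

// .replace() returns a new string; it does not modify sampleText.
const redacted = sampleText.replace(/Ada/, '[redacted]');

console.log(redacted);   // [redacted] met Ada.
console.log(sampleText); // Ada met Ada.
```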

Special characters

None of the expressions we have used so far have been especially interesting, and none of them have contained any special characters. The following characters have a special meaning in regular expressions and need escaping when you want to match them literally:

$()*+.?[^|]

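To escape them, use a backslash, eg /what\?/. The backslash itself also needs escaping (/\\/), as does a forward slash inside a regex literal (/\//).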
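The difference an escape makes can be seen in a short sketch. The two sample strings are invented for this illustration.

```javascript
// Hypothetical sample strings.
const question = 'So what?';
const statement = 'So whatever';

// Unescaped, the question mark is a special character (it makes the "t" optional),
// so this still finds "what" inside "whatever".
const unescaped = /what?/.test(statement);

// Escaped, the question mark must appear literally in the string.
const escapedOnStatement = /what\?/.test(statement);
const escapedOnQuestion = /what\?/.test(question);

console.log(unescaped);          // true
console.log(escapedOnStatement); // false
console.log(escapedOnQuestion);  // true
```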

Write an expression to see if the num variable contains the string "3.5".

The dot operator

It doesn't! num equalled 123456, and so didn't contain the string "3.5".

The dot character has a special meaning in regular expressions: it matches any single character except for new line characters (so /a.c/ would match "abc", "a c", "a$c", etc). Using /3.5/ without escaping the dot would match the string stored in num, as the dot would match the 4.

Try it out.
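The behaviour of the dot described above can be sketched as follows. The value '123456' is taken from the text; the other strings are made up for illustration.

```javascript
// The dot matches any single character except a new line.
const dotMatches = ['abc', 'a c', 'a$c'].map((s) => /a.c/.test(s));
const dotNeedsOneChar = /a.c/.test('ac');    // nothing between a and c
const dotSkipsNewline = /a.c/.test('a\nc');  // new line is not matched

// Escaped versus unescaped dot against the digits from the text.
const unescapedDot = /3.5/.test('123456');   // the dot matches the 4
const escapedDot = /3\.5/.test('123456');    // needs a literal "."
const escapedDotHit = /3\.5/.test('13.5');

console.log(dotMatches);                       // [ true, true, true ]
console.log(dotNeedsOneChar, dotSkipsNewline); // false false
console.log(unescapedDot, escapedDot, escapedDotHit); // true false true
```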

Quantifiers

There are a number of "quantifiers" that you can use to say how many times something should be matched. The first one is the question mark, which makes the previous token in the expression (the previous character or group of characters) optional.

The expression /regexp?/ will match both "regex" and "regexp", as the question mark makes the p (but only the p) optional.
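As a quick sketch of the optional quantifier, the snippet below runs the expression from the text against a few made-up inputs and looks at what was actually matched.

```javascript
const optionalP = /regexp?/;

// The matched text is the first item of the array returned by .exec().
const withoutP = optionalP.exec('regex')[0];
const withP = optionalP.exec('regexp')[0];

// Only the p is optional; the rest of the word is still required.
const tooShort = optionalP.test('rege');

// The question mark allows at most one p.
const doubleP = optionalP.exec('regexpp')[0];

console.log(withoutP, withP); // regex regexp
console.log(tooShort);        // false
console.log(doubleP);         // regexp
```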

Write an expression that will match both "frontend" and "front-end", and give it as an argument to the answer() function (eg answer(/your expression/)).

The plus sign

The next quantifier we'll be looking at is the plus sign. It means "one or more of the previous token"; /Princes+/ will match "Princes", "Princess", "Princesssss", etc. It will not, however, match "Prince".

The next expression you need to write is a little trickier. Write a regular expression which extracts everything between the opening bracket and the closing bracket of the shortStory variable (note that you can view the contents of the variable just by typing shortStory). Hint: you'll need the previously mentioned dot operator.

The asterisk

Similar to the plus sign is the asterisk; but instead of meaning "one or more", the asterisk means "zero or more" of the previous token. /Princes*/, in addition to matching all the examples from /Princes+/, would also match "Prince".
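The contrast between the plus sign and the asterisk can be sketched side by side, using the example words from the text.

```javascript
const words = ['Prince', 'Princes', 'Princesssss'];

// "One or more" s: "Prince" has none after the e, so it fails.
const plusResults = words.map((w) => /Princes+/.test(w));

// "Zero or more" s: every word passes.
const starResults = words.map((w) => /Princes*/.test(w));

// Both quantifiers take as many characters as they can.
const longestPlus = /Princes+/.exec('Princesssss')[0];
const zeroStar = /Princes*/.exec('Prince')[0];

console.log(plusResults); // [ false, true, true ]
console.log(starResults); // [ true, true, true ]
console.log(longestPlus); // Princesssss
console.log(zeroStar);    // Prince
```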

Repeat the previous example, but using the asterisk instead of the plus sign. Extract everything from the shortStory variable between the opening and closing brackets, even if there is nothing there.

Limited repetition

There is one final quantifier that you can use which allows you to limit repetition. The syntax is {min,max} where min is the minimum number of repetitions and max the maximum. For example, /a{3,5}/ would match "aaa", "aaaa" and "aaaaa", but not "aa" (and in a longer run of the letter a it would match only the first five).

Write an expression to match the text between an opening and closing bracket in the bracketNumbers variable—but only if the contents are between 5 and 8 characters long.

More limited repetition

In addition to specifying a range of repetitions, you can specify an exact number of repetitions using {n} where n is the number of repetitions. The expression /a{6}/, for example, will match exactly six repetitions of the letter a.

You can also leave out the maximum when using curly brackets (keeping the comma), which will match at least the minimum number of repetitions, with no upper limit. For example, /a{5,}/ will match five or more of the letter a.
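One way to see the three curly-bracket forms at work is to run them against the same run of letters and check how much each one consumes. The seven-letter string is made up for this sketch.

```javascript
// Seven letters: 'aaaaaaa'
const run = 'a'.repeat(7);

// {min,max}: takes as many as allowed, here the maximum of five.
const rangeLength = /a{3,5}/.exec(run)[0].length;

// {n}: takes exactly six.
const exactLength = /a{6}/.exec(run)[0].length;

// {min,}: no upper limit, so it takes all seven.
const openLength = /a{5,}/.exec(run)[0].length;

// Fewer than the minimum is not a match at all.
const tooFew = /a{3,5}/.test('aa');

console.log(rangeLength, exactLength, openLength); // 5 6 7
console.log(tooFew);                               // false
```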

Pass the answer() function the equivalent of /a?b+c*/, but without using any of these characters: ?*+

Flags—the case insensitive flag

Flags are used to modify the behaviour of a regular expression, and they are specified after the expression (eg /your expression/ig). Each flag is represented by a letter. JavaScript supports several flags, two of which will be covered in this tutorial. The i flag makes the expression case insensitive: without the flag, /a/ would match "a" and not "A", while /a/i would match both "a" and "A".
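A minimal sketch of the i flag, using made-up inputs:

```javascript
// Without the flag, case matters.
const strictLower = /a/.test('a');
const strictUpper = /a/.test('A');

// With the i flag, both cases match.
const looseUpper = /a/i.test('A');

// The matched text keeps the casing of the input, not of the expression.
const matchedText = /cat/i.exec('CATEGORY')[0];

console.log(strictLower, strictUpper, looseUpper); // true false true
console.log(matchedText);                          // CAT
```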

Run /CAT/i.exec('Category') to see the i flag in action.

Flags—The global flag

The second commonly used flag is the global flag, represented by the letter g. While /a/ only matches the first a in the string given to it, /a/g would match every single letter a.
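The effect of the g flag shows up in both .match() and .replace(). The following sketch uses the made-up word "banana" to compare the two behaviours.

```javascript
// Hypothetical sample word.
const fruit = 'banana';

// Without g: only the first match, with its position.
const first = fruit.match(/a/);

// With g: an array of every match.
const all = fruit.match(/a/g);

// The same applies to replacements.
const replaceFirst = fruit.replace(/a/, 'o');
const replaceAll = fruit.replace(/a/g, 'o');

console.log(first[0], first.index); // a 1
console.log(all);                   // [ 'a', 'a', 'a' ]
console.log(replaceFirst);          // bonana
console.log(replaceAll);            // bonono
```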

Write a regular expression to replace every instance of the letter "a" in the shortStory variable with the letter "e".

Remember that strings have a .replace(expr, replace) method that you can use for replacements.

Character classes

Character classes allow you to specify a set or range of characters to be matched. /[aeiou]/ matches any vowel, /[a-m]/ matches any letter in the first half of the alphabet, and /[aeiou0-9]/ matches any vowel or digit.

Note that inside a character class, you don't need to escape dots and they will be matched literally. If you want a literal hyphen, however, you will need to escape it.
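The character classes above can be tried out with a few made-up strings. This is only a sketch of the rules just described, including the literal dot and the escaped hyphen.

```javascript
// A set of characters: any vowel.
const hasVowel = /[aeiou]/.test('sea');
const noVowel = /[aeiou]/.test('sky');

// A range of characters.
const inFirstHalf = /[a-m]/.test('c');
const inSecondHalf = /[a-m]/.test('z');

// Sets and ranges combined; the g flag collects every match.
const vowelsOrDigits = 'r2d2'.match(/[aeiou0-9]/g);

// Inside a class the dot is literal.
const dotInClass = /[.]/.test('abc');

// An escaped hyphen is literal too, so this class is "a", "-" or "z".
const hyphenLiteral = /[a\-z]/.test('-');
const notARange = /[a\-z]/.test('m');

console.log(hasVowel, noVowel);         // true false
console.log(inFirstHalf, inSecondHalf); // true false
console.log(vowelsOrDigits);            // [ '2', '2' ]
console.log(dotInClass);                // false
console.log(hyphenLiteral, notARange);  // true false
```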

We're given a string which should contain a username consisting of 5 to 12 letters (uppercase or lowercase) or hyphens. Write some code that will return true if the username variable contains a valid username.

Negated character classes

A negated character class will match any character that isn't in the character class. You negate a character class by putting a caret character (^) at the beginning of the class. For example, /[^a-m]/ will match "z" and "$", but it will not match "c".

It's important to note the distinction between "not [a-m]" and "something that isn't [a-m]". /c[^a]t/ will match "cut", but it won't match "cat" and it won't match "ct"—this is important.
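The distinction is easier to see when run. The sketch below uses the examples from the text: a negated class still has to match one character, so "ct" fails.

```javascript
// Anything outside a-m matches, including symbols.
const matchesZ = /[^a-m]/.test('z');
const matchesDollar = /[^a-m]/.test('$');
const matchesC = /[^a-m]/.test('c');

// [^a] means "one character that is not a", not "no character".
const cut = /c[^a]t/.test('cut');
const cat = /c[^a]t/.test('cat');
const ct = /c[^a]t/.test('ct');

console.log(matchesZ, matchesDollar, matchesC); // true true false
console.log(cut, cat, ct);                      // true false false
```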

The username can now contain any character that isn't a space (but still has to be between 5 and 12 characters long). Write a new expression to validate the username variable.

Character types

Character types can be used as shorthand for common character classes. There are six character types: \d matches decimal digits (0-9), \s matches whitespace characters, and \w matches word characters (the letters a-z and A-Z, digits and the underscore; in JavaScript it does not match accented or other non-ASCII letters).

The other three character types can be found by capitalising the first three character types, which will negate their effect; \S will match any non-whitespace character, for example.
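A short sketch of the character types follows, using invented inputs. It also shows that in JavaScript, \w used without any flags covers only ASCII letters, digits and the underscore, so an accented letter such as é (written here as the escape \u00e9) falls under \W instead.

```javascript
// \d: decimal digits.
const digits = 'Room 42'.match(/\d/g);

// \s and its negation \S.
const hasWhitespace = /\s/.test('a\tb');
const onlyWhitespace = /\S/.test(' \n');

// \w: word characters, including the underscore.
const underscoreIsWord = /\w/.test('_');

// An accented letter is not a word character in JavaScript.
const accentIsWord = /\w/.test('\u00e9');
const accentIsNonWord = /\W/.test('\u00e9');

console.log(digits);                        // [ '4', '2' ]
console.log(hasWhitespace, onlyWhitespace); // true false
console.log(underscoreIsWord);              // true
console.log(accentIsWord, accentIsNonWord); // false true
```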

Write an expression to match a word, followed by a space, followed by a string of digits. Test the charTypeTest variable with it: don't use any literal characters.

Positions

If you want to make sure that an expression starts or ends at a certain place in a string (for example, if you want to make sure that a string starts with a capital letter), then you can use an anchor. The dollar sign ($) matches the end of a string, and the caret sign (^) matches the beginning. /^cat$/ will match "cat" and nothing else (while just /cat/ will match anything with "cat" in it).
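The effect of the two anchors can be sketched with a few made-up strings:

```javascript
// With both anchors, the whole string has to be "cat".
const exactCat = /^cat$/.test('cat');
const plural = /^cat$/.test('cats');
const buried = /^cat$/.test('concatenate');

// Without anchors, "cat" may appear anywhere.
const anywhere = /cat/.test('concatenate');

// A single anchor: the string must start with a capital letter.
const capitalised = /^[A-Z]/.test('Hello');
const notCapitalised = /^[A-Z]/.test('hello');

console.log(exactCat, plural, buried);    // true false false
console.log(anywhere);                    // true
console.log(capitalised, notCapitalised); // true false
```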

Write an expression to test whether the possibleUrl variable starts with "http://" or "https://" and then doesn't contain any spaces all the way to the end.

Hint: Use a question mark for the protocol, and then a negated character class for the rest. You'll need both anchors.

Capturing groups

You can use parentheses to create groups, which can group multiple tokens together or store a result for later reference:

/"(.+)"/

That's an example of a capturing group, meaning that the part of the matched string within the parentheses is also stored as a separate item in the array returned by .match() or .exec().
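Here is a minimal sketch of what the returned array looks like for the expression above. The sentence in quoted is made up for illustration.

```javascript
// Hypothetical sample sentence.
const quoted = 'She said "hello" to me';

const captured = /"(.+)"/.exec(quoted);

console.log(captured.length); // 2
console.log(captured[0]);     // "hello"   (the whole match, quotes included)
console.log(captured[1]);     // hello     (only what the group matched)
```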

Take our previous example where we grabbed the data between two parentheses using an expression like /\(.{5,8}\)/.exec(shorterStory). Try running that again, and then wrapping ".{5,8}" in parentheses and trying again.

Non-capturing groups

You can see that the array is now two items long: the first item is the entire match, and the second is only the data that the capturing group matched.

There is another type of group called a non-capturing group. This type of group, which has a slightly different syntax, doesn't store the value to an array. If you don't need to refer back to the group, you should prefer a non-capturing group: it keeps the return array cleaner. Turn the group in the previous expression into a non-capturing group by inserting "?:" into the beginning of the group before the dot.

Quantifiers

It's almost as if we don't have a group.

The main use of non-capturing groups is to apply a quantifier to a number of tokens. The following would match "I ate" and "{{ firstName }} and I ate", but nothing else:

/^(?:{{ firstEscaped }} and )?I ate$/
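To illustrate, the sketch below uses the placeholder name "Ada" in place of the templated name, and compares the non-capturing group with a capturing one to show the difference in the returned array.

```javascript
// "Ada" is a placeholder name for this illustration.
const optionalName = /^(?:Ada and )?I ate$/;

const alone = optionalName.test('I ate');
const together = optionalName.test('Ada and I ate');
const incomplete = optionalName.test('Ada I ate');

// A non-capturing group adds nothing to the returned array...
const nonCapturingLength = optionalName.exec('Ada and I ate').length;

// ...whereas a capturing group adds one item.
const capturingLength = /^(Ada and )?I ate$/.exec('Ada and I ate').length;

console.log(alone, together, incomplete);         // true true false
console.log(nonCapturingLength, capturingLength); // 1 2
```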

Write an expression which matches "ha" repeated two or more times (eg, "haha" or "hahahahaha"), and pass it to the answer() function.

Hint: your expression shouldn't match "hahah". Use anchors to ensure that it doesn't.

The pipe symbol

You can specify an "or" using the pipe symbol (|). The following will match "The dog ate" and "The cat ate":

/The (dog|cat) ate/

We could also use a non-capturing group, but in this case we wanted to access the result. You can use as many pipes in one group as you want. Make the previous expression match "The rabbit ate" (currently stored in the rabbit variable), in addition to what it used to match.
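As a sketch of why a capturing group is handy here, the snippet below reads back which alternative was matched. The test sentences are made up.

```javascript
const animalAte = /The (dog|cat) ate/;

// The group tells us which alternative matched.
const whichAnimal = animalAte.exec('The cat ate')[1];

// Anything outside the listed alternatives does not match.
const birdMatches = animalAte.test('The bird ate');

console.log(whichAnimal); // cat
console.log(birdMatches); // false
```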

Backreferences

You can reference the value of a previous capturing group later within the same expression. You simply write a backslash followed by the number of the capturing group (the index of where it will be in the returned array). For example, the following will match "The cat ate with the other cat" and "The dog ate with the other dog", but not "The cat ate with the other dog" (after all, that would just be absurd):

/The (dog|cat) ate with the other \1/
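Running the expression against the three sentences from the text gives a quick check of how the backreference behaves:

```javascript
const sameAnimal = /The (dog|cat) ate with the other \1/;

// \1 must repeat exactly what the first group matched.
const catCat = sameAnimal.test('The cat ate with the other cat');
const dogDog = sameAnimal.test('The dog ate with the other dog');
const catDog = sameAnimal.test('The cat ate with the other dog');

console.log(catCat, dogDog, catDog); // true true false
```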

Write an expression to match the same two words in a row (eg "hello hello world"): give the expression to the answer() function as in previous examples.

The RegExp object

In addition to the literal syntax (the slashes), JavaScript provides a RegExp constructor which allows you to specify your desired expression as a string. This is useful for putting variables in expressions. It works like this:

// Same as /regexp?/ig
new RegExp('regexp?', 'ig');
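As a sketch of the "variables in expressions" use case, the example below builds an expression from a made-up variable (word is hypothetical). It also shows one thing to watch for: because the expression is a string, backslashes have to be doubled.

```javascript
// Hypothetical variable holding part of the pattern.
const word = 'cat';

// Build /cats?/i from the variable.
const fromString = new RegExp(word + 's?', 'i');

console.log(fromString.source);           // cats?
console.log(fromString.flags);            // i
console.log(fromString.test('Two CATS')); // true

// In a string, "\\d" is needed to produce the \d character type.
const digitsPattern = new RegExp('\\d+');
const firstNumber = digitsPattern.exec('abc 123')[0];

console.log(firstNumber); // 123
```

If the variable may contain special characters that should be matched literally, they need escaping before being passed to the constructor.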

The username variable (still) contains a username. The userData variable contains user data: print it to the console to see the format of the data. Use the username variable to extract the word associated with our user. Please put your entire answer on one line so that it can be validated.

Advanced replacement

We've seen two ways in which capturing groups can have their captured value used later on: the first was the returned array, and the second was in a backreference. You can also access them from the second argument of the string .replace() method:

var text = '*italic text*';
var replace = '<em>$1</em>';
text.replace(/\*([^*]+)\*/, replace);
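To show what the $1 placeholder produces, here is a sketch that applies the same expression to a made-up string with two starred phrases, once without and once with the g flag.

```javascript
// Hypothetical sample string with two starred phrases.
const starred = '*one* and *two*';

// $1 is replaced by whatever the first capturing group matched.
const firstOnly = starred.replace(/\*([^*]+)\*/, '<em>$1</em>');
const everyOne = starred.replace(/\*([^*]+)\*/g, '<em>$1</em>');

console.log(firstOnly); // <em>one</em> and *two*
console.log(everyOne);  // <em>one</em> and <em>two</em>
```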

Write some similar code, but to turn the value of the boldText variable into a <strong> element.

Lazy vs greedy matching

By default, pattern matching in JavaScript is "greedy", which means that it matches as much as it possibly can:

'"Hi", "Hello"'.match(/".+"/)

That will return "Hi", "Hello", as it matches the two outermost quotes. Lazy pattern matching is the opposite of greedy pattern matching, and will match as little as possible—so in this case, only "Hi".

Lazy pattern matching can be achieved by putting a question mark after the quantifier—try it out with the example above.
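The same contrast can be sketched on a different, made-up string containing two pairs of tags:

```javascript
// Hypothetical sample markup.
const markup = '<b>one</b> <b>two</b>';

// Greedy: .+ runs to the last closing tag it can find.
const greedy = markup.match(/<b>.+<\/b>/)[0];

// Lazy: .+? stops at the first closing tag.
const lazy = markup.match(/<b>.+?<\/b>/)[0];

console.log(greedy); // <b>one</b> <b>two</b>
console.log(lazy);   // <b>one</b>
```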

Assertions

An assertion is a pattern that should be matched, but will not be stored: so instead of "match a and then b", we have "match a that is followed by b, but don't match b". This tutorial covers two types of assertion, positive lookaheads and negative lookaheads. Lookahead just means looking forwards; lookbehinds, which look backwards, were added to JavaScript in ES2018 and are not covered here.

A positive lookahead means that we want to look ahead for a match. To look for an a followed by a b, we could use /a(?=b)/.
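A minimal sketch of a positive lookahead follows. The strings are made up; the point is that the text inside the lookahead is checked but not included in the match.

```javascript
// The b must be there, but only the a is returned.
const matchedA = /a(?=b)/.exec('cab')[0];
const noBAfterA = /a(?=b)/.test('cat');

// A more practical use: digits that are followed by "px".
const pixels = '10px 20em'.match(/\d+(?=px)/)[0];

console.log(matchedA);  // a
console.log(noBAfterA); // false
console.log(pixels);    // 10
```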

Use an assertion to extract "6+3" from the partialSums variable. Don't use any literal digits, use \d.

Negative assertions

Assertions can also be negative, to say that you want to match something that isn't followed by something else. Note that unlike a negated character class, a negative assertion doesn't require a character to be there at all: if you say "a that isn't followed by b", the a can be at the end of the string.

The syntax for a negative assertion is similar to that of a positive assertion, but you replace the equals sign with an exclamation mark: for example, /a(?!b)/ would find a letter a that isn't followed by a letter b.
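The difference between a negative assertion and a negated character class can be sketched with three short, made-up strings:

```javascript
// An a followed by b is rejected; an a followed by anything else is fine.
const followedByB = /a(?!b)/.test('ab');
const followedByC = /a(?!b)/.test('ac');

// At the end of the string there is no b, so the assertion holds.
const assertionAtEnd = /a(?!b)/.test('a');

// A negated class still needs a character to match, so this fails.
const classAtEnd = /a[^b]/.test('a');

console.log(followedByB, followedByC);   // false true
console.log(assertionAtEnd, classAtEnd); // true false
```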

Use a positive assertion followed by a negative assertion to extract "3+3" from the partialSums variable.

You have finished!

Congratulations, {{ firstName }}, on finishing Try Regex. You've now briefly covered most areas of regular expressions in JavaScript, and you should be able to write regex for most situations.

For further reading, try out the following links:
