Based on...
Software Engineer at Ticketleap, working on Port (currently in beta).
Monetate: 3rd party script that allows marketers to personalize/customize their e-commerce sites and measure the impact.
Regular expressions allow us to match patterns (sometimes extremely complex ones) in strings.
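Given some text and a pattern, you can "split", "replace", "match", "search", "test", and "exec". But how you call them is a bit quirky: split, replace, match, and search are string methods that take the pattern as an argument, while test and exec are RegExp methods that take the string.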
var str = 'which way';
str.search(/hi/); // returns 1
var pattern = /hi/;
pattern.test('chocolate chip'); // returns true
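To make the string-method versus RegExp-method split concrete, here is a small sketch that calls all six on the same sample string. The variable names are illustrative, and the comments show the values returned.

```javascript
// Illustrative sample text and pattern.
const text = 'which way';
const hi = /hi/;

// String methods: the pattern is the argument.
const position = text.search(hi);        // 1 (index of the first match)
const found = text.match(hi);            // ['hi'] (plus index and input properties)
const swapped = text.replace(hi, 'HI');  // 'wHIch way'
const pieces = text.split(/\s/);         // ['which', 'way']

// RegExp methods: the string is the argument.
const matches = hi.test(text);           // true
const details = hi.exec(text);           // same shape as match() for a non-global regex
const detailsIndex = details.index;      // 1
```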
Regexes are VERY POWERFUL, but can be difficult to use.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. —Jamie Zawinski
Fun/silly stuff:
Real uses:
For the following examples, you can use regex101 to easily experiment.
'Hello, class'
/a/, or the equivalent new RegExp('a'): either would match the first lowercase "a".
/Hello/ would match the literal text "Hello".
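A quick sketch of the literal and constructor forms side by side, using the slide's sample string. The extra variable names are illustrative; the last line shows that backslashes must be doubled when a pattern is written as a string.

```javascript
const greeting = 'Hello, class';

const literal = /a/;
const constructed = new RegExp('a');

// Both find the first lowercase "a" (in "class").
const literalIndex = greeting.search(literal);          // 9
const constructedIndex = greeting.search(constructed);  // 9

// The constructor is handy when the pattern lives in a variable.
const word = 'Hello';
const hasWord = new RegExp(word).test(greeting);        // true

// In a string, '\\$' is the two characters \$, the same source as /\$/.
const sameSource = new RegExp('\\$').source === /\$/.source; // true
```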
^ and $ match the start and end of the string (you may recognize them from CSS attribute selectors such as [href^="http"]). Against 'Hello, class', /^class/ would not find a match, but /class$/ would.
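The anchor behavior above, as a tiny sketch against the same sample string:

```javascript
const greeting = 'Hello, class';

const startsWithClass = /^class/.test(greeting); // false: "class" is not at the start
const endsWithClass = /class$/.test(greeting);   // true: "class" ends the string
const startsWithHello = /^Hello/.test(greeting); // true
```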
'$50.00'
We've already seen that the dollar sign has special meaning in a regular expression, so what if we're actually looking for a dollar sign character? Escape it with a backslash:
/\$/
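Here is a short sketch of why the escape matters. Unescaped, $ is the end anchor, so it "matches" an empty position at the end of the string rather than the dollar sign.

```javascript
const price = '$50.00';

// Unescaped: $ is the end-of-string anchor, so the match is at index 6 (the end).
const anchorIndex = price.search(/$/);   // 6
// Escaped: \$ is a literal dollar sign.
const dollarIndex = price.search(/\$/);  // 0
```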
[0123456789]
You can also match any of a group of characters by using square brackets
[0-9]
Or use a hyphen to signify any of a range of characters
[0-9a-z]
Or multiple ranges.
This signifies a range of Unicode characters, by number.
What's the difference between [A-z] and [A-Za-z]?
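One way to answer the question is to test the characters that sit between "Z" and "a" in the character table. This sketch is illustrative; the \u0041-\u005A range is an example of a numeric range chosen here, equivalent to A-Z.

```javascript
// The six characters between 'Z' (code 90) and 'a' (code 97).
const between = '[\\]^_`';
const codes = Array.from(between, c => c.charCodeAt(0)); // [91, 92, 93, 94, 95, 96]

const loose = /^[A-z]+$/.test(between);     // true: [A-z] includes all six symbols
const strict = /^[A-Za-z]+$/.test(between); // false: only letters

// A range written by code point: \u0041-\u005A is the same as A-Z.
const byNumber = /^[\u0041-\u005A]+$/.test('ABC'); // true
```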
[^13579]
You can also put a caret first inside the brackets to exclude characters. This example would match any single character except an odd digit: even digits, but also letters, spaces, and punctuation.
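A small sketch of the difference between "not an odd digit" and "an even digit" (sample strings are illustrative):

```javascript
// [^13579] matches any single character that is NOT an odd digit.
const notOdd = /[^13579]/g;
const fromDigits = '1234'.match(notOdd);   // ['2', '4']
const fromMixed = 'a1 3'.match(notOdd);    // ['a', ' ']: letters and spaces qualify too

// To match only even digits, list them instead.
const evenOnly = 'a1 2'.match(/[02468]/g); // ['2']
```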
x|y|z
Any of the pipe-separated options
Here, there is no difference from [xyz], but it becomes more useful when using longer expressions, for example x1|y2|z3.
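To see why alternation is more than a bracket shortcut, compare it with two bracketed sets on an illustrative string. The pipe version accepts only the listed pairs; the brackets accept every combination.

```javascript
const codes = 'x1 y3 z3';

const pairs = codes.match(/x1|y2|z3/g);      // ['x1', 'z3']: only the exact options
const anyCombo = codes.match(/[xyz][123]/g); // ['x1', 'y3', 'z3']: any letter with any digit
```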
There are also some special wildcards that you can use to find pre-defined character sets:
.
A period matches any single character except a line break. Escape it with a backslash (\.) to match an actual period.
\s for whitespace (\S for non-whitespace). Whitespace includes spaces, tabs, newlines, carriage returns, etc.
\d for digits (\D for non-digits)
\w for "word" characters, which include letters, digits, and the underscore (\W for non-word)
\b for a word boundary, the position at the beginning or end of a word (\B for any position that is not a word boundary)
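A sketch exercising each shorthand on one made-up sample line. Note the \b case: because the underscore is a word character, "floor_3" counts as a single word.

```javascript
// Illustrative sample line.
const line = 'Room 101, floor_3.';

const digits = line.match(/\d/g).join('');  // '1013'
const words = line.match(/\w+/g);           // ['Room', '101', 'floor_3']
const chunks = line.split(/\s/);            // ['Room', '101,', 'floor_3.']

const wholeWord = /\bfloor\b/.test(line);   // false: no boundary between "r" and "_"
const wordStart = /\bfloor/.test(line);     // true: boundary between the space and "f"

const anyChar = /a.c/.test('abc');          // true: . matches "b"
const literalDot = /a\.c/.test('abc');      // false: \. needs a real period
const acrossLines = /a.c/.test('a\nc');     // false: . does not match a line break
```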
Flags added after the final slash modify the search: g (global, find every match), i (case-insensitive), and m (multi-line, so ^ and $ match the start and end of each line as well as of the whole string). There are a few other flags, such as u (Unicode) and y (sticky), but they're used less often.
'Hello, class.\nHow was your week?'
/h/
would have no matches.
/h/i
would match the first "H"
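/h/ig
would match all instances of "H". The g flag matters most for "match" and "replace". "split" always splits on every match, and "search" ignores g. "test" and "exec" are affected in a subtler way: with g, they start searching at the regex's lastIndex and update it, so repeated calls on the same regex can return different results.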
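A sketch of the flags on the two-line sample string, including the lastIndex gotcha with a global regex and test(). The names are illustrative; the comments show the values produced.

```javascript
const text = 'Hello, class.\nHow was your week?';

const none = text.match(/h/);              // null: no lowercase "h"
const first = text.match(/h/i)[0];         // 'H' (only the first one)
const all = text.match(/h/ig);             // ['H', 'H']
const replaced = text.replace(/h/ig, 'J'); // 'Jello, class.\nJow was your week?'
const searched = text.search(/h/ig);       // 0: search ignores g

// A global regex remembers where it stopped (lastIndex), so calling
// test() repeatedly on the same object walks through the string.
const globalH = /h/ig;
const results = [];
for (let i = 0; i < 3; i++) {
  results.push([globalH.test(text), globalH.lastIndex]);
}
// results: [[true, 1], [true, 15], [false, 0]]
```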
/^h/ig would match only the first "H", while /^h/igm would match both.
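The same comparison in code, plus the matching effect of m on $ (the class\. pattern is an illustrative addition):

```javascript
const text = 'Hello, class.\nHow was your week?';

const startOnly = text.match(/^h/ig);    // ['H']: ^ matches only the very start
const eachLine = text.match(/^h/igm);    // ['H', 'H']: with m, ^ also matches after each line break

const endNoM = /class\.$/.test(text);    // false: "class." is not at the end of the string
const endWithM = /class\.$/m.test(text); // true: with m, $ also matches before a line break
```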
Sometimes we'll want to look for a pattern, but only use part of it. Parentheses indicate a part of the pattern you would specifically like the search to "capture" and return.
"ace".match(/a(c)e/); // returns ["ace", "c"]
The first element is the whole match, and the next element of the returned array is the specifically "captured" group of matched characters.
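A sketch of what the returned array holds, plus a related gotcha: with the g flag, match() returns only the full matches and drops the captured groups.

```javascript
const result = 'ace'.match(/a(c)e/);
const whole = result[0];     // 'ace': the whole match
const group = result[1];     // 'c': the first captured group
const start = result.index;  // 0: where the match begins

// With g, match() returns just the full matches; groups are not included.
const globalResult = 'ace ate'.match(/a(c|t)e/g); // ['ace', 'ate']
```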
Now combine that with the character range syntax /a([a-z])e/
var pattern = /a([a-z])e/;
"age".match(pattern); // returns ["age", "g"]
"ale".match(pattern); // returns ["ale", "l"]
"variegated".match(pattern); // returns ["ate", "t"]
You can also specify different numbers of characters to allow in the match.
+
Matches one or more of the preceding character or expression.
*
Looks for zero or more of the preceding character or expression
?
Matches exactly 0 or 1 of something
{3} {1,3}
Match a specific quantity, or a specific quantity range.
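One sketch per quantifier, on illustrative strings:

```javascript
const needsOne = /ab+c/.test('ac');    // false: + needs at least one "b"
const allowsZero = /ab*c/.test('ac');  // true: * allows zero

const color = /colou?r/.test('color');   // true: ? allows zero "u"
const colour = /colou?r/.test('colour'); // true: ...or one

const year = '2015-09-30'.match(/\d{4}/)[0];        // '2015': exactly four digits
const upToTwo = 'a1 a22 a333'.match(/a\d{1,2}\b/g); // ['a1', 'a22']: one or two digits, then a word boundary
```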
This looks like...
"variegated".match(/a([a-z]+)e/); // returns ["ariegate", "riegat"]
(This might not be what you expected. Watch out for "greediness"! Quantifiers grab as many characters as they can. Use +?, *?, or {n,m}? to make a "reluctant" (lazy) rather than greedy search.)
"variegated".match(/a([a-z]+?)e/); // returns ["arie", "ri"]
/a([a-z])e/ // matches ace, ate, ale and returns the middle letter
/a(pa)?ce/ // matches ace or apace and returns "pa" or undefined (the optional group didn't participate)
/a(cr|bl)e/ // matches acre or able and returns "cr" or "bl"
In the second and third examples, the parentheses are needed for grouping. If you don't need the captured result, put ?: right after the opening parenthesis to make a non-capturing group.
/a(?:[a-z])e/ // this is the same as leaving out the parens entirely
/a(?:pa)?ce/ // matches ace or apace and captures nothing
/a(?:cr|bl)e/ // matches acre or able and captures nothing
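A sketch comparing the returned arrays for capturing and non-capturing groups (the comments show array contents, ignoring the extra index and input properties):

```javascript
const capturing = 'ace'.match(/a(pa)?ce/);        // ['ace', undefined]
const capturingApace = 'apace'.match(/a(pa)?ce/); // ['apace', 'pa']
const nonCapturing = 'apace'.match(/a(?:pa)?ce/); // ['apace']: only the full match
const alternation = 'able'.match(/a(cr|bl)e/);    // ['able', 'bl']
```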
/(\d+)[\,\.](?=\d{3})/
(\d+)
Capture one or more digits
[\,\.]
A period or comma (inside square brackets neither needs escaping, so [,.] works too)
(?= )
Followed by... (a lookahead: the text inside must come next, but it isn't consumed or included in the match)
\d{3}
Exactly 3 digits
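A small sketch of the lookahead on its own, with illustrative strings. The looked-ahead text must be present, but it is left out of the match.

```javascript
const size = '100px'.match(/\d+(?=px)/)[0];   // '100': "px" must follow but isn't matched
const wrongUnit = /\d+(?=px)/.test('100em');  // false: nothing is followed by "px"

// In the separator pattern, the three digits after the comma stay unconsumed.
const sepMatch = '1,234'.match(/(\d+)[,.](?=\d{3})/)[0]; // '1,'
```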
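var pattern = /(\d+)[\,\.](?=\d{3})/g;
// Remove thousands separators, e.g. "1,234,567" becomes "1234567"
function stripSeparators(numStr) {
  // "$1" puts back the digits captured by (\d+), dropping the separator
  return numStr.replace(pattern, "$1");
}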
Each time the full pattern is matched, it is replaced with the first captured group (element 1 of the array returned by the pattern match).
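A usage sketch, repeating the function (with const) so the snippet runs on its own. The last input shows a limitation: a decimal point followed by three or more digits is indistinguishable from a separator, so it is stripped too.

```javascript
const pattern = /(\d+)[\,\.](?=\d{3})/g;
function stripSeparators(numStr) {
  return numStr.replace(pattern, "$1");
}

const us = stripSeparators('1,234,567');     // '1234567'
const european = stripSeparators('1.234.567'); // '1234567'
const withCents = stripSeparators('1,234.56'); // '1234.56': ".56" is not followed by 3 digits
const pi = stripSeparators('3.14159');         // '314159': the decimal point is lost
```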
Regular expressions can be incredibly useful and powerful. When you start small and get familiar, they can even be fun!