Regular Expressions

What are Regular Expressions?

In JavaScript, a regular expression object

  • is an instance of the RegExp constructor function, and

  • represents a pattern, written as a sequence of characters, that can be matched against one or more character sequences in a string.

RegExp Constructor

In JavaScript, a regular expression object can be initialized with either the literal syntax (/pattern/) or the constructor syntax (new RegExp('pattern')).

const foo = /oach/

'Roach'.match(foo) // => ['oach', index: 1, input: 'Roach', groups: undefined]
'Kelpie'.match(foo) // => null

const bar = new RegExp('ike')

'Mike'.match(bar) // => ['ike', index: 1, input: 'Mike', groups: undefined]
'Leo'.match(bar) // => null

foo.constructor // => ƒ RegExp() { [native code] }
bar.constructor // => ƒ RegExp() { [native code] }

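Regular expressions initialized with the literal syntax are compiled when the script is loaded, whereas regular expressions initialized with the constructor syntax are compiled at runtime. The constructor syntax is therefore the one to use when the pattern is not known in advance.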
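As an illustration of the runtime case, here is a minimal sketch of building a regex from a value that is only available while the program runs. The names userInput, dynamicRegex and literalRegex are hypothetical and chosen only for this example.

```javascript
// Hypothetical value that is only known at runtime (e.g. typed by a user).
const userInput = 'ike'

// A literal cannot embed a variable, so the constructor is used instead.
const dynamicRegex = new RegExp(userInput)

const mikeMatches = dynamicRegex.test('Mike') // => true
const leoMatches = dynamicRegex.test('Leo') // => false

// Both syntaxes produce RegExp objects with the same pattern source.
const literalRegex = /ike/
const sameSource = literalRegex.source === dynamicRegex.source // => true
```

Note that any special characters in such a runtime value keep their special meaning unless they are escaped first.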

Regular Expression Pattern Syntax

Regular expression patterns include:

  • simple alphanumeric characters - matched literally (e.g. michael),

  • special characters - matching a class of characters (e.g. digits \d, word characters \w, whitespace \s), a repetition of characters (e.g. a+ matches one or more a), etc.

Special characters can be escaped with a backslash (\) to be matched literally.
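The effect of escaping can be sketched with the dot character. The variable names below are made up for this example. The last part shows a detail that is easy to miss: in the constructor syntax the pattern is a string, so the backslash itself has to be escaped.

```javascript
// Unescaped, the dot matches any character, so 'a-b' matches too.
const anyChar = /a.b/
const anyCharResults = [anyChar.test('a.b'), anyChar.test('a-b')] // => [true, true]

// Escaped, the dot only matches a literal '.'.
const literalDot = /a\.b/
const literalDotResults = [literalDot.test('a.b'), literalDot.test('a-b')] // => [true, false]

// In a string, '\\.' is needed to produce the pattern \.
const literalDotFromString = new RegExp('a\\.b')
const fromStringResults = [
  literalDotFromString.test('a.b'),
  literalDotFromString.test('a-b'),
] // => [true, false]
```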

Special Characters

Special characters are used for:

  • matching any single character except a line break using the dot character (.),

  • matching types of characters - digits (\d), non-digits (\D), word characters (\w), non-word characters (\W), whitespace (\s), non-whitespace (\S), etc.,

  • indicating specific match locations - e.g. the beginning of a string (^), the end of a string ($), immediately before a specific pattern (foo(?=bar) matches foo only when bar follows), etc.,

  • matching logical disjunctions - either foo or bar (foo|bar), any one of a, b or c ([abc] or [a-c]),

  • matching logical negation - e.g. any character but a ([^a]),

  • matching groups of characters using (),

  • quantifying how many times the preceding item repeats - zero or more (*), one or more (+), zero or one (?), exactly n times (a{n}), at least n times (a{n,}), between n and m times (a{n,m}). A space inside the braces prevents them from acting as a quantifier.
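The categories above can be tried out with a few short checks. This is only an illustrative sketch; the sample strings and variable names are invented for the example.

```javascript
// Character classes: \d matches a digit, \w a word character (letter, digit or _).
const hasDigit = /\d/.test('room 7') // => true
const threeWordChars = /^\w\w\w$/.test('a_1') // => true

// Anchors: ^ and $ tie the pattern to the start and end of the string.
const wholeString = /^cat$/.test('cat') // => true
const partOfString = /^cat$/.test('concat') // => false

// Lookahead: 'foo' only matches when it is immediately followed by 'bar',
// and 'bar' itself is not part of the match.
const lookahead = 'foobar'.match(/foo(?=bar)/)[0] // => 'foo'
const noLookahead = /foo(?=bar)/.test('foobaz') // => false

// Alternation, character sets and negated sets.
const eitherWord = /^(foo|bar)$/.test('bar') // => true
const oneOfSet = /^[a-c]$/.test('b') // => true
const negatedSet = /^[^a]$/.test('a') // => false

// Quantifiers apply to the preceding item.
const exactlyThree = /^a{3}$/.test('aaa') // => true
const atLeastTwo = /^a{2,}$/.test('a') // => false
const twoToFour = /^a{2,4}$/.test('aaaaa') // => false
const optionalU = /^colou?r$/.test('color') // => true
```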

RegExp() Methods

The most notable RegExp instance methods are:

  • RegExp.prototype.exec() - returns an array of match data on match or null otherwise, and

  • RegExp.prototype.test() - returns true on match or false otherwise.

const myRegex = /abc/

myRegex.exec('abc') // => ['abc', index: 0, input: 'abc', groups: undefined]
myRegex.exec('123') // => null

myRegex.test('abc') // => true
myRegex.test('123') // => false
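When the pattern contains capturing groups, the array returned by exec() also holds one entry per group. A minimal sketch, using a hypothetical date pattern and sample sentence that are not from the original text:

```javascript
// Hypothetical pattern with three capturing groups: year, month, day.
const dateRegex = /(\d{4})-(\d{2})-(\d{2})/

const result = dateRegex.exec('Released on 2024-01-15.')

const fullMatch = result[0] // => '2024-01-15'
const year = result[1] // => '2024'
const month = result[2] // => '01'
const day = result[3] // => '15'

// index is the position in the input where the match starts.
const position = result.index // => 12
```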

String() Regex Methods

The most notable String instance methods that accept a regular expression are:

  • String.prototype.match() - returns an array of match data, including capturing groups, on match and null otherwise,

  • String.prototype.search() - returns the index of the first match or -1 otherwise,

  • String.prototype.replace() - returns a new string in which the first match of the pattern is replaced with a provided string,

  • String.prototype.split() - splits the string at every match of the pattern and returns the pieces as an array.

const myRegex = /abc/
const foo = 'xyz.abc.xyz.abc.xyz.abc'

foo.match(myRegex) // => ['abc', index: 4, input: 'xyz.abc.xyz.abc.xyz.abc', groups: undefined]
foo.search(myRegex) // => 4
foo.replace(myRegex, '123') // => 'xyz.123.xyz.abc.xyz.abc'
foo.split(myRegex) // => (4) ['xyz.', '.xyz.', '.xyz.', '']
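Capturing groups are also available through the string methods. The sketch below uses a made-up name pattern and sample strings to show group access in match(), group references ($1, $2) in replace(), and a pattern with a quantifier in split().

```javascript
// Hypothetical pattern: two words separated by a single space.
const nameRegex = /(\w+) (\w+)/
const fullName = 'Ada Lovelace'

// match() exposes each capturing group after the full match.
const nameMatch = fullName.match(nameRegex)
const firstName = nameMatch[1] // => 'Ada'
const lastName = nameMatch[2] // => 'Lovelace'

// In replace(), $1 and $2 refer to the first and second capturing group.
const swapped = fullName.replace(nameRegex, '$2, $1') // => 'Lovelace, Ada'

// split() on one or more whitespace characters.
const words = 'a  b   c'.split(/\s+/) // => ['a', 'b', 'c']
```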

Flags

Flags can be appended to a regex to change how its pattern is matched, e.g. g (global), i (ignore case) or m (multiline).

Flags are set when the regex is initialized.

const myRegexA = /foo/g // literal syntax: flags follow the closing slash
const myRegexB = new RegExp('bar', 'g') // constructor syntax: flags are the second argument
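To see what individual flags do, here is a short sketch covering three common ones (i, g and m). The sample strings and variable names are invented for the example.

```javascript
// i: ignore case.
const caseSensitive = /abc/.test('ABC') // => false
const caseInsensitive = /abc/i.test('ABC') // => true

// g: global, so every match is considered instead of only the first one.
const firstOnly = 'a-a-a'.replace(/a/, 'b') // => 'b-a-a'
const allMatches = 'a-a-a'.replace(/a/g, 'b') // => 'b-b-b'

// m: multiline, so ^ and $ also match at line breaks.
const singleLine = /^b$/.test('a\nb') // => false
const multiLine = /^b$/m.test('a\nb') // => true

// The flags of an existing regex can be read back as a string.
const flagsOfRegex = new RegExp('abc', 'gi').flags // => 'gi'
```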

Combining Flags

Multiple flags can be used simultaneously.

const multiRegex = /abc/gi
const foo = 'ABC.ABC.ABC'
foo.replace(multiRegex, '123') // => '123.123.123'
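The interplay of the two flags can be sketched by dropping one of them at a time. The mixedCase string and the other names are hypothetical and used only for this example.

```javascript
const mixedCase = 'ABC.abc.Abc'

// With g and i together, match() returns every match regardless of case.
const everyMatch = mixedCase.match(/abc/gi) // => ['ABC', 'abc', 'Abc']

// The order in which the flags are written does not matter.
const sameResult = mixedCase.match(/abc/ig) // => ['ABC', 'abc', 'Abc']

// Dropping either flag changes the outcome.
const globalOnly = mixedCase.match(/abc/g) // => ['abc']
const ignoreCaseOnly = mixedCase.match(/abc/i)[0] // => 'ABC'
```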