Regex in JavaScript

Time to unpack another regex, but this time, I’ll also walk through how we can use the regex in JavaScript to add a comma to a string. First, let’s take a look at what we’ll be matching, using Scriptular, a JavaScript regex editor.

[Screenshot: the regex and test string in Scriptular]

The regex above includes two captures from the string “New York NY 10006 United States” using the syntax ([a-zA-Z\s]+) ([A-Z]{2}). Let’s unpack the syntax.

  • In regular expressions, captures are delimited by parentheses. The parentheses tell the regex engine to store these captures in its memory, so they can be accessed via methods in JavaScript, Ruby, PHP, etc. The first capture in the expression is wrapped in parentheses, just like every other capture: ([a-zA-Z\s]+)
  • The square brackets define a character class: a set of characters, any one of which can match at that position. In this case, we’ve specified any single character in the range a-z or A-Z, or any whitespace character.
  • The plus quantifier means “one or more,” and it is greedy by default; in other words, don’t just match one character that meets the conditions of the bracketed class, match as many characters as possible.
  • Notice that I put a single space between the two captures enclosed in parentheses. It is a literal space that has to match a space in the string, so think of it as a boundary between captures. Because the first capture’s class also allows whitespace, the greedy plus quantifier initially munches up too many characters, and the engine then backs off until the space and the second capture can match.
  • On the other side of the space boundary, we put our second capture, ([A-Z]{2}). Here, we’re telling regex to match exactly two characters within the range of A-Z. The curly bracketed {2} tells regex that we want exactly two characters from the range.
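To see each piece of the pattern in isolation, here is a minimal sketch that runs the sub-patterns against the sample string. The variable names (str, oneChar, greedyRun, twoUpper, full) are just illustrative choices, not part of the original example.

```javascript
const str = "New York NY 10006 United States";

// A bracketed class on its own matches exactly one character.
const oneChar = str.match(/[a-zA-Z\s]/)[0];

// Adding + makes it match as many class characters as it can;
// the run stops at "1" because digits are not in the class.
const greedyRun = str.match(/[a-zA-Z\s]+/)[0];

// {2} demands exactly two consecutive characters from A-Z.
const twoUpper = str.match(/[A-Z]{2}/)[0];

// In the full pattern the greedy group has to give characters back
// (backtrack) until a space followed by two capitals can match.
const full = str.match(/([a-zA-Z\s]+) ([A-Z]{2})/);

console.log(JSON.stringify(oneChar));   // "N"
console.log(JSON.stringify(greedyRun)); // "New York NY " (note the trailing space)
console.log(JSON.stringify(twoUpper));  // "NY"
console.log(full[1] + " | " + full[2]); // New York | NY
```

Notice how the greedy run on its own swallows “New York NY ”, while inside the full pattern the first capture shrinks to “New York” so the rest of the expression can match.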

Now let’s move from the regex editor to actually executing JavaScript with regex. We can assign both our regex and our string to variables and then call .replace on the string variable to add a comma after the first capture (technically, we are replacing the whole match with a new string built from the captures, plus a comma after the first one).

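To execute the .replace function, we pass it two parameters: the reg variable, which holds our two captures, and the replacement string. Notice that funny $1 … weird, right? It’s shorthand for “capture 1 from our regex.” To follow capture 1 with a comma, we write $1, in the replacement string. Keep in mind that .replace swaps out the entire match (both captures and the space between them), so the replacement also needs a space and $2 if the state abbreviation should stay in the result.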
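Here is a minimal sketch of the .replace call, assuming the sample string and regex from earlier in the post. The names str, reg, withComma, and cityOnly are illustrative, and the second call is only there to show what happens when $2 is left out.

```javascript
const str = "New York NY 10006 United States";
const reg = /([a-zA-Z\s]+) ([A-Z]{2})/;

// $1 and $2 stand for the first and second captures.
// The whole match ("New York NY") is swapped for "New York, NY".
const withComma = str.replace(reg, "$1, $2");
console.log(withComma); // "New York, NY 10006 United States"

// Leaving $2 out drops the state abbreviation, because the
// entire match is replaced, not just the first capture.
const cityOnly = str.replace(reg, "$1,");
console.log(cityOnly); // "New York, 10006 United States"
```

Strings are immutable in JavaScript, so .replace returns a new string and leaves str untouched.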

If you’re new to programming and wondering where you can execute this code, try JSConsole. Also, try getting comfortable working directly in your browser console via the Inspector.

What’s that ‘g’ for?

The ‘g’ is the global search flag. If we place a g after the closing slash of our regex literal like this…

var reg = /([a-zA-Z\s]+) ([A-Z]{2})/g

…we are telling it to match ALL occurrences of the pattern we have specified. This brings us to an important difference between regex in Ruby and regex in JavaScript. Ruby has no g flag; you get global behavior by choosing a method such as gsub or scan instead of sub or match. In JavaScript, you have to explicitly add “g” to the regex to execute a global search. In addition, when we call .match on our string, the g flag tells JavaScript to return an array of every full match, instead of the first match followed by its capture groups.

Here is an example of the array of matches .match returns when using the g flag.
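A minimal sketch of that call, assuming the same sample string. Since that string contains the pattern only once, the sketch also uses a second, made-up string (twoAddresses) with two city and state pairs to show the flag finding more than one match.

```javascript
const reg = /([a-zA-Z\s]+) ([A-Z]{2})/g;

// The sample string contains the pattern only once.
const str = "New York NY 10006 United States";
const single = str.match(reg);
console.log(single); // ["New York NY"]

// Hypothetical string with two city/state pairs, for illustration only.
const twoAddresses = "New York NY 10006;Boston MA 02110";
const both = twoAddresses.match(reg);
console.log(both); // ["New York NY", "Boston MA"]
```

Each element is a full match; the individual captures (“New York”, “NY”) do not appear in the array when the g flag is set.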

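Without the g flag, .match returns an array describing only the first match: element 0 is the full matched text, and the remaining elements are the match groups, in order. Match groups are not separate matches, but the components of the match. The terminology is fuzzy and confusing. The clearest definition I have found is from Jeffrey Friedl in Mastering Regular Expressions: “The main difference between a Group object and a Capture object is that each Group object contains a collection of Captures representing all the intermediary matches by the group during the match, as well as the final text matched by the group.” That quote appears to describe the Group and Capture objects of .NET’s regex library; JavaScript has no separate Capture object, so here a group is simply the text captured by one pair of parentheses.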

Here is an example of calling .match on our string without the g flag.
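A minimal sketch of that call, assuming the same sample string and the regex without the g flag (the names str, reg, and result are illustrative):

```javascript
const str = "New York NY 10006 United States";
const reg = /([a-zA-Z\s]+) ([A-Z]{2})/; // no g flag

const result = str.match(reg);

console.log(result[0]);     // "New York NY" (the full match)
console.log(result[1]);     // "New York"    (capture 1)
console.log(result[2]);     // "NY"          (capture 2)
console.log(result.index);  // 0 (where the match starts in str)
```

If the pattern does not occur in the string at all, .match returns null rather than an empty array, so it is worth checking the result before indexing into it.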

