Regex Review

As a developer, I want to improve my understanding of regular expressions so I can make my code and workflow more efficient. To that end, I will be unpacking regular expressions more often in blog posts.

The exercise below shows how much logic, and how many concepts, go into writing even a basic regular expression like this:

(\w+)\s(\w+)

[Screenshot: the expression tested against the string “John Smith” in Scriptular]

In a nutshell, this expression captures one run of word characters before a whitespace character and another run after it. But much more is going on under the hood.

  • () Anything in parentheses is a capture. With a capture you are telling the regex engine to store the matched text in its memory so you can retrieve or alter it later, for example with string methods like .match and .replace in JavaScript. (.search accepts the same expression, but it only reports the position where the match starts.)
  • \w The backslash turns literal characters into special characters. We call this escaping the literal meaning. In this case, when we escape ‘w’ with ‘\’, we create the special character \w: any word character (letter, digit, or underscore). This is shorthand for the regular expression [0-9a-zA-Z_].
  • + The plus character has a special meaning: it is a quantifier, and it is greedy. It matches the preceding token one or more times and is equivalent to {1,}. In this case, the preceding token is \w, any word character. With (\w+) we’re telling regex to match one or more word characters. Word characters will be gobbled up until we come to a non-word character; in this case, a space. You will hear a lot about greedy and lazy quantifiers in regex.
  • \s Escape the literal value of s to get its special meaning, i.e., any single whitespace character. Because this whitespace exists in our string, “John Smith”, if we want to match the words around that space, we have to tell regex that the space exists. The regular expression (\w+)(\w+) would match the string “JohnSmith” (although the greedy first group takes “JohnSmit” and leaves only “h” for the second), but it will only return “John” from “John Smith”.
  • (\w+) The last step is to capture one or more word characters after the whitespace, \s. To do this, we repeat the capture we used before the whitespace.
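
The pieces above can be tried out directly in JavaScript. What follows is a minimal sketch, not a definitive reference: the variable names (pattern, sample, noSpace, and so on) are made up for illustration, and “John Smith” is the sample string used in this post.

```javascript
// The pattern from the post: two runs of word characters separated by
// exactly one whitespace character.
const pattern = /(\w+)\s(\w+)/;

// Sample string from the post.
const sample = "John Smith";

// .match returns an array: index 0 is the whole match, and
// indexes 1 and 2 hold the text stored by each capture group.
const result = sample.match(pattern);
console.log(result[0]); // John Smith
console.log(result[1]); // John
console.log(result[2]); // Smith

// .replace can reuse the captures: $1 and $2 refer to groups 1 and 2.
const swapped = sample.replace(pattern, "$2, $1");
console.log(swapped); // Smith, John

// .search only reports the index where the match starts.
console.log(sample.search(pattern)); // 0

// Without \s, the two groups have to share one run of word characters.
// The greedy first group takes as much as it can, then gives back a
// single character so that the second group can still match.
const noSpace = /(\w+)(\w+)/;
const joined = "JohnSmith".match(noSpace);
console.log(joined[1], joined[2]); // JohnSmit h

// On the spaced string, the match stops at the space.
const spaced = sample.match(noSpace);
console.log(spaced[0]); // John
console.log(spaced[1], spaced[2]); // Joh n
```

The last two cases show why the \s matters: without it the expression still matches, but the captures no longer line up with the two words.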

Note that the example above was executed in Scriptular, a JavaScript regular expression editor. Rubyists can also check out Rubular. Both are great environments for checking the efficacy of your regex.
