Regex in JavaScript

Time to unpack another regex, but this time, I’ll also walk through how we can use the regex in JavaScript to add a comma to a string. First, let’s take a look at what we’ll be matching, using Scriptular, a JavaScript regex editor.

[Screenshot, 2014-01-25: the regex ([a-zA-Z\s]+) ([A-Z]{2}) matched against “New York NY 10006 United States” in Scriptular]

The regex above captures two parts of the string “New York NY 10006 United States” using the pattern ([a-zA-Z\s]+) ([A-Z]{2}): the city, “New York”, and the state, “NY”. Let’s unpack the syntax.

  • In regular expressions, captures are delimited by parentheses. The parentheses tell the regex engine to store these captures in its memory, so they can be accessed via methods in JavaScript, Ruby, PHP, etc. The first capture in the expression is wrapped in parentheses, just like every other capture: ([a-zA-Z\s]+)
  • The square brackets define a character class: a set of characters, any single one of which can match. In this case, we’ve specified any character in the range a-z or A-Z, or any whitespace character (\s).
  • The plus quantifier means “one or more” of the preceding item, and like other quantifiers it is greedy by default; in other words, don’t just match one character that meets the conditions of the bracketed class, match as many characters as possible.
  • Notice that I put a single space between the two captures enclosed in parentheses. Think of this as a boundary between captures. Because the character class also includes whitespace, the greedy plus quantifier actually munches right past this space at first (all the way to “New York NY ”, stopping at the digit 1), then backtracks one character at a time until a literal space followed by two capital letters can match. That is why capture 1 ends up as “New York”.
  • On the other side of the space boundary, we put our second capture, ([A-Z]{2}). Here, we’re telling regex to match only characters in the range A-Z. The {2} in curly brackets tells regex that we want exactly two characters from that range.

Now let’s move from the regex editor to actually executing JavaScript with regex. We can assign both our regex and our string to variables and then call .replace on the string variable to add a comma after the first capture. (Technically, .replace swaps out the entire match, “New York NY”, so the replacement has to put both captures back, with a comma after the first.)

To execute .replace, we pass it two arguments: the reg variable, which holds our pattern, and the replacement string. Notice that funny $1 … weird, right? It’s shorthand for “whatever capture 1 matched.” Likewise, $2 stands for capture 2. To put capture 1 back with a comma after it, followed by capture 2, we pass "$1, $2" as the second argument.
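Here is a minimal sketch of that .replace call, assuming the string and pattern from the Scriptular example (the original code was a screenshot, so the variable names are illustrative). It also shows why passing "$1," on its own loses the state.

```javascript
// The string and pattern from the Scriptular example.
var str = "New York NY 10006 United States";
var reg = /([a-zA-Z\s]+) ([A-Z]{2})/;

// .replace swaps the WHOLE match ("New York NY") for the replacement string.
// $1 is capture 1 ("New York") and $2 is capture 2 ("NY").
var withComma = str.replace(reg, "$1, $2");
console.log(withComma); // New York, NY 10006 United States

// Passing only "$1," puts capture 1 back but never restores capture 2.
var dropsState = str.replace(reg, "$1,");
console.log(dropsState); // New York, 10006 United States
```

Note that .replace returns a new string; str itself is unchanged.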

If you’re new to programming and wondering where you can execute this code, try JSConsole. Also, try getting comfortable working directly in your browser’s console via the Inspector.

What’s that ‘g’ for?

The ‘g’ is the global search flag. If we place a g after the closing slash of our regex, like this…

var reg = /([a-zA-Z\s]+) ([A-Z]{2})/g

…we are telling JavaScript to match ALL occurrences of the pattern we have specified, not just the first one. This brings us to an important difference between regex in Ruby and regex in JavaScript. Ruby has no g flag at all: you choose a method instead, such as sub (first match) versus gsub (every match), or match versus scan. In JavaScript, you have to explicitly add “g” to the regex to execute a global search. In addition, when we call .match on our string, the g flag tells JavaScript to return an array of every full match, without the individual captures, instead of the first match plus its capture groups.
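To see the flag in action with .replace, here is a small sketch using the city name from our example (swapping spaces for underscores is just for illustration): without g, only the first space is replaced; with g, every space is.

```javascript
var city = "New York NY";

// Without g, .replace stops after the first match.
var firstOnly = city.replace(/ /, "_");
console.log(firstOnly); // New_York NY

// With g, .replace keeps going until it has replaced every match.
var everyOne = city.replace(/ /g, "_");
console.log(everyOne); // New_York_NY
```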

Here is an example of what .match returns when using the g flag.
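A sketch of .match with the g flag, reusing the string and pattern from earlier (variable names assumed, since the original example was a screenshot). Only the full match comes back; “New York” and “NY” do not appear as separate captures. A second, simpler pattern shows an array with more than one match.

```javascript
var str = "New York NY 10006 United States";
var reg = /([a-zA-Z\s]+) ([A-Z]{2})/g;

// With g, .match returns an array of full matches; the captures are dropped.
var allMatches = str.match(reg);
console.log(allMatches); // [ 'New York NY' ] (only one city/state pair here)

// A simpler pattern that matches more than once in similar text.
var states = "New York NY, Newark NJ".match(/[A-Z]{2}/g);
console.log(states); // [ 'NY', 'NJ' ]
```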

Without the g flag, .match returns the first match, followed by the text captured by each group in that match. Captured groups are not separate matches, but the components of a single match. The difference is fuzzy and confusing. The clearest definition I have found is from Jeffrey Friedl in Mastering Regular Expressions: “The main difference between a Group object and a Capture object is that each Group object contains a collection of Captures representing all the intermediary matches by the group during the match, as well as the final text matched by the group.”

Here is an example of calling .match on our string without the g flag.
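A sketch of the same call without the g flag, again assuming the string and pattern from the Scriptular example. The first element is the full match, followed by one element per capture; the array also carries an index property showing where the match starts. Note that capture 1 is “New York” with no trailing space, because the engine backtracked to make room for the literal space and the two capitals.

```javascript
var str = "New York NY 10006 United States";
var reg = /([a-zA-Z\s]+) ([A-Z]{2})/; // no g flag

// Without g, .match returns the first match followed by each capture.
var result = str.match(reg);
console.log(result[0]);    // New York NY (the full match)
console.log(result[1]);    // New York    (capture 1)
console.log(result[2]);    // NY          (capture 2)
console.log(result.index); // 0           (where the match starts)
```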

Regex Review

As a developer, I want to improve my understanding of regular expressions so I can make my code and workflow more efficient. To that end, I will be unpacking regular expressions more often in blog posts.

The exercise below demonstrates how much logic, and how many concepts, go into writing even a basic regular expression like this:

(\w+)\s(\w+)

[Screenshot, 2014-01-25: the regex (\w+)\s(\w+) matched against “John Smith” in Scriptular]

In a nutshell, this expression captures a run of word characters before a whitespace character, and another run after it. But much more is going on under the hood.

  • () Anything in parentheses is a capture. With a capture, you are telling the regex engine to store the match in its memory so that, in JavaScript, string methods like .match and .replace can retrieve or alter it.
  • \w The backslash turns literal characters into special characters. We call this escaping the literal meaning. In this case, when we escape ‘w’ with ‘\’, we create a special character, \w: any word character (letter, number, underscore). This is shorthand for the regular expression [0-9a-zA-Z_].
  • + The plus character has a special meaning: it matches the preceding item one or more times and is equivalent to {1,}. It is also greedy, meaning it matches as many times as it can. In this case, the preceding item is \w, or any word character. When we combine them as (\w+), we’re telling regex to match one or more of any word character. Word characters will be gobbled up until we come to a non-word character; in this case, a space. You will hear a lot about greedy and lazy quantifiers in regex. Read more about these concepts.
  • \s Escaping s gives it its special meaning: any whitespace character. Because this whitespace exists in our string, “John Smith”, if we want to match the words around that space, we have to tell regex that the space exists. The regular expression (\w+)(\w+) would match all of “JohnSmith”, but from “John Smith” it will only match “John”.
  • (\w+) The last step is to capture one or more of any word character after the whitespace, \s. To do this, we repeat the capture we used before the whitespace.
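A short sketch of the breakdown above in plain JavaScript, using the “John Smith” string from the text (variable names are illustrative). It also shows what (\w+)(\w+) actually captures without the \s, and how a lazy quantifier (+?) behaves differently from the greedy +.

```javascript
var fullName = "John Smith";

// \s between the captures accounts for the space in the string.
var nameReg = /(\w+)\s(\w+)/;
var parts = fullName.match(nameReg);
console.log(parts[1], parts[2]); // John Smith

// Reusing the captures: swap first and last name.
var swapped = fullName.replace(nameReg, "$2, $1");
console.log(swapped); // Smith, John

// Without \s, the pattern cannot cross the space, so it only matches "John".
// The greedy first group gives back one character so the second can match.
var noSpace = /(\w+)(\w+)/;
var partial = fullName.match(noSpace);
console.log(partial[0], partial[1], partial[2]); // John Joh n

// With no space to stop it, the same pattern matches the whole word.
var joined = "JohnSmith".match(noSpace);
console.log(joined[0], joined[1], joined[2]); // JohnSmith JohnSmit h

// A lazy quantifier (+?) matches as FEW characters as possible instead.
var lazy = "JohnSmith".match(/(\w+?)(\w+)/);
console.log(lazy[1], lazy[2]); // J ohnSmith
```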

Note: the example above was executed in Scriptular, a JavaScript regular expression editor. Rubyists can also check out Rubular. Both are great environments for testing your regex.

Prototype inheritance in JS

Tonight in New York City the Yeti is on the prowl. Fourteen inches of snow are forecast to fall by morning, and at the top of the food chain is a menacing monster. We can represent this beast as a JavaScript object to demonstrate prototype inheritance in JavaScript.
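The yeti’s code appeared as a screenshot in the original post; here is a minimal reconstruction, assuming only what the paragraph below describes: a find key whose function returns this.appetite, and an appetite of “gigantic”.

```javascript
// Reconstruction of the yeti object: a find method plus an appetite.
var yeti = {
  // find reports the appetite of whichever object it is called on.
  find: function () {
    return this.appetite;
  },
  appetite: "gigantic"
};

console.log(yeti.find()); // gigantic
```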

Notice that the yeti object is written as a set of key-value pairs, much like a hash. The first key, find, has a value that is a function. That function returns the value of the second key in the yeti object, appetite, by using the “this” keyword as shorthand for “self” or “this object”. It is strange for an object to reflect on itself like this, but as an academic exercise, it illustrates inheritance very well. Things will get stranger. For now, just know that when a function is called as a method, as in yeti.find(), the this keyword refers to the object it was called on.

To prove this works, let’s call .find on the yeti.

yeti.find();

//gigantic

So far so good: we returned exactly what was expected, the value of the yeti’s appetite. But there are many other hungry beasts in the blizzard, and we want to display the value of their appetites too. How can we do this in JavaScript? Well, we could create individual objects for each beast with their own find methods, but that would violate one of the basic tenets of programming: don’t repeat yourself.

Instead, we’ll let each of our new objects inherit the find function from the abominable snow beast. Let’s go down the food chain to demonstrate the quirks of prototype inheritance in JavaScript and create a polar bear object that inherits from the yeti.

var polar_bear = Object.create(yeti);
polar_bear.appetite = "huge";

Because the polar bear is not nearly as ferocious as the yeti, we set its appetite to “huge”. When we call .find on polar_bear, JavaScript doesn’t find a find function on the polar bear itself, so it goes up the prototype chain searching for one. When it finds the function in the yeti object, it starts evaluating it and sees the keyword “this”. Because this refers to the object the method was called on, i.e., the polar bear, the value returned will be the polar bear’s appetite (huge), not the yeti’s (gigantic).

polar_bear.find();  //huge

We can keep going down the food chain, creating descendants of the yeti that all inherit the find function from the dreaded monster. But prototype inheritance is tricky. If we don’t give a new object its own appetite property, you might think that the object would simply inherit the yeti’s appetite too, but that is not necessarily the case.

Prototype inheritance will use the appetite of the nearest ancestor in the inheritance chain that does have an appetite.

For example, when JavaScript sees the .find function called on the snow mouse object, it goes all the way up the chain until it finds the function in the yeti. When JS sees the keyword “this”, it refers back down the chain to the object that called .find (the snow mouse). Because the snow mouse has no appetite set, JS goes back up the inheritance chain again until it finds an appetite key-value pair. Once it finds an appetite property (in the seal object), JS considers its work done and returns that value to the descendant object, the snow mouse.
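The post’s full food chain was shown in a screenshot, so this is a hypothetical sketch: the seal’s appetite of “big” comes from the text, but the exact chain (yeti, then polar_bear, then seal, then snow_mouse) and the mouse’s later “tiny” appetite are assumptions for illustration.

```javascript
// The yeti from earlier: find returns the appetite of whatever object it is called on.
var yeti = {
  find: function () {
    return this.appetite;
  },
  appetite: "gigantic"
};

// Hypothetical chain (assumed): yeti -> polar_bear -> seal -> snow_mouse.
var polar_bear = Object.create(yeti);
polar_bear.appetite = "huge";

var seal = Object.create(polar_bear);
seal.appetite = "big";

var snow_mouse = Object.create(seal); // no appetite of its own

// find is found on the yeti, this is snow_mouse, and this.appetite
// is looked up from snow_mouse upward until the seal supplies one.
var mouseAppetite = snow_mouse.find();
console.log(mouseAppetite);                              // big
console.log(snow_mouse.hasOwnProperty("appetite"));      // false
console.log(Object.getPrototypeOf(snow_mouse) === seal); // true

// Giving the mouse its own appetite shadows the inherited one;
// the seal's own appetite is untouched.
snow_mouse.appetite = "tiny"; // assumed value, for illustration
console.log(snow_mouse.find()); // tiny
console.log(seal.find());       // big
```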

The lesson is, when using prototype inheritance, we need to be very clear on how JavaScript will interpret the this keyword from a parent object to avoid inheriting erroneous appetites, such as big for the snow mouse.

pancakes

I have been getting the error “stack level too deep” in Ruby for months now. I never knew what it meant until this morning, when I was studying JavaScript. When a function is called, it must remember its context in order to execute properly. That context is stored on the “call stack.” If another function is called inside a function, its context must also be thrown on top of the stack. Contexts, like pancakes, are added on top of one another. Only when a function returns is its context taken off the stack. Because the stack has a limited amount of memory, if functions keep calling other functions (or themselves) without ever returning, the computer will get into a recursive K-hole and throw an error like “maximum call stack size exceeded” or “go to hell i am tired.”
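A small sketch of a pancake tower that never stops growing; the function names addPancake and stackPancakes are made up for illustration. The exact error depends on the engine: V8 (Chrome, Node.js) throws a RangeError with the message “Maximum call stack size exceeded”, while Firefox reports “too much recursion”. A version with a base case, which stops adding frames and returns, is included for contrast.

```javascript
// Each call adds a new frame (a pancake) to the call stack.
// There is no base case, so no call ever returns and the stack keeps growing.
function addPancake(height) {
  // "1 + ..." means this frame must wait for the next call's result,
  // so it has to stay on the stack.
  return 1 + addPancake(height + 1);
}

var overflowError = null;
try {
  addPancake(0);
} catch (e) {
  overflowError = e; // the engine gives up once the stack is full
}
console.log(overflowError.name + ": " + overflowError.message);

// With a base case, the recursion stops and every frame is popped off again.
function stackPancakes(height, max) {
  if (height === max) {
    return height; // base case: stop stacking
  }
  return stackPancakes(height + 1, max);
}
console.log(stackPancakes(0, 100)); // 100
```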

Anatomy of a JavaScript function

Before we get to the anatomy of a JavaScript function, I learned an important distinction about the scope of variables in JS today. A variable declared with var inside a block, such as an if statement, is not local to that block; it belongs to whatever function (or the global scope) the block sits in. A variable declared inside a function, however, is local only to that function. This means the variable cannot be accessed outside the function, which is great, because it prevents problematic side effects, such as unintended reassignment of variables. In other words, functions constrain the namespace of the variables they hold. (Variables declared with let or const, added in ES2015, are local to the block itself.)
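A quick sketch of that scoping difference; the variable and function names are made up for illustration. A var declared in an if block is still visible after the block, a var declared in a function is not visible outside it, and a let declared in a block disappears at the closing curly bracket.

```javascript
// var inside a block (here, an if block) is NOT local to the block:
// it belongs to the surrounding function or global scope.
if (true) {
  var fromBlock = "visible outside the block";
}
console.log(fromBlock); // visible outside the block

// var inside a function IS local to that function.
function scopeDemo() {
  var insideOnly = "only visible in here";
  return insideOnly.length;
}
scopeDemo();
console.log(typeof insideOnly); // undefined

// let (ES2015 and later) is block-scoped, unlike var.
if (true) {
  let blockOnly = "gone after the closing bracket";
}
console.log(typeof blockOnly); // undefined
```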

And now… a simple recipe for a JavaScript function that finds the absolute value of a number (num).

1. Declare a function and give it a name. Use the keyword function, followed by the function’s name and parentheses:

    function absolute()

2. The parentheses serve as a container for the function’s parameters. If you want the function to take arguments, now would be a good time to name them inside the parentheses.

    function absolute(num)

3. Wrap the entire body of the function in curly brackets.

    function absolute(num) {
      // I’m the body
    }

4. When using conditionals, wrap each condition in parentheses.

    function absolute(num) {
      if (num < 0) // wrap me, I’m a condition
    }

5. Wrap the body of each control flow statement in curly brackets. In this case, if a number is less than zero, we return its opposite value, i.e., its absolute value.

Note: A statement that ends with a block, such as a function declaration or an if statement, doesn’t need a semicolon after its closing curly bracket. Other statements, such as return -num; or var x = 1;, should end with a semicolon. (JavaScript’s automatic semicolon insertion often fills in missing semicolons, but relying on it can cause subtle bugs.)
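Putting the recipe’s steps together gives something like the following minimal sketch. The exact layout of the if and return statements is one reasonable choice, not necessarily the author’s original; in everyday code, JavaScript’s built-in Math.abs does the same job.

```javascript
// The finished recipe: name, parameter, body, condition, curly brackets.
function absolute(num) {
  if (num < 0) {
    // Negative: return the opposite value.
    return -num;
  }
  // Zero or positive: already its own absolute value.
  return num;
}

console.log(absolute(-7)); // 7
console.log(absolute(3));  // 3
console.log(absolute(0));  // 0
```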