Series: Using Regular Expressions in Ruby — Part 2 of 3

This is the second installment in our three-part series on regular expressions. Before continuing, be sure to check out Part I and also Part III.
Nell Shamrell works as a Software Development Engineer for Blue Box. She also sits on the advisory board for the University of Washington Certificate in Ruby Programming. She specializes in Ruby, Rails, and Test Driven Development.  Prior to entering the world of software development, she studied and worked in the field of Theatre. The world of Theatre prepared her well for the dynamic world of creating software applications. In both, she strives to create a cohesive and extraordinary experience. In her free time she enjoys practicing the martial art Naginata.

Lookarounds

Lookarounds let me do more than just match a pattern directly. They let me define a context for that match. An expression with a lookaround only returns a match when that match is surrounded by a certain context. Let’s start with a new string, yet another Star Wars quote from Obi-Wan Kenobi.

string = "Who’s the more foolish?  The fool or the fool who follows him?"

I want to know all the places the word "fool" is used in this string. I’m going to use the regular expression /fool/. In this case, I’m going to use Ruby’s scan method on my string. The scan method will return all matches for my regular expression in my string:

string.scan(/fool/)
=> ["fool", "fool", "fool"]

Notice it matches the "fool" inside the word "foolish" as well as the two standalone uses of the word "fool."
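To see which part of the string each of those matches comes from, here is a small sketch. The contexts array and the 7-character window are my own choices for illustration, not part of the original example. It passes a block to scan and reads the position of the current match from Regexp.last_match:

```ruby
string = "Who’s the more foolish?  The fool or the fool who follows him?"

# For every match, grab the 7 characters starting at the match position.
# Inside the block, Regexp.last_match refers to the current match.
contexts = []
string.scan(/fool/) do
  start = Regexp.last_match.begin(0)
  contexts << string[start, 7]
end

contexts
# => ["foolish", "fool or", "fool wh"]
```

The first match sits inside "foolish"; the other two are the standalone word.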

What if I only want to match the pattern /fool/ when it is part of the word foolish? I would use a positive lookahead. This tells my regular expression engine to find every match for my pattern that is directly followed by a match for another pattern. In Ruby, we designate something as a positive lookahead by using the ?= operator:

/fool(?=ish)/

Here’s my modified regular expression.  Notice I have the primary pattern, which is the literal word "fool," and directly to the right of it I have the lookahead pattern, the letters "ish":

string.scan(/fool(?=ish)/)
=> ["fool"]

This time, the scan method only returns one match: the one time the word "fool" is followed by the characters "ish". Let’s take this a step further and use the gsub method to change our string. Anytime we match the pattern fool followed by the letters "ish", let’s replace it with the word "self":

string.gsub(/fool(?=ish)/, "self")
=> "Who’s the more selfish?  The fool or the fool who follows him?"

And with apologies to Obi-Wan Kenobi, we’ve modified the line to "Who’s the more selfish?  The fool or the fool who follows him?" Technically, this is referred to as a zero-width positive lookahead assertion. That’s a mouthful, isn’t it? In The Well-Grounded Rubyist, David A. Black breaks it down like this:

ZERO-WIDTH means the lookahead pattern (which is "ish" in our case) does not consume any characters.  This means it makes a match, but it doesn’t return that match.  It only returns whether there was a match or not.  True or false.
POSITIVE means a match for the lookahead pattern should be there, you’re expecting it.
LOOKAHEAD means something needs to be ahead of the match for our main pattern.  It needs to come after the match for the main pattern.
ASSERTION means our lookahead isn’t meant to return a match, it’s only meant to assert whether a match exists or whether it does not.
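The "zero-width" part can be checked directly. The following is a minimal sketch (the variable names are mine) that uses String#match and looks at what the match contains and at what is left over right after it:

```ruby
string = "Who’s the more foolish?  The fool or the fool who follows him?"

match = string.match(/fool(?=ish)/)

# The matched text is only the main pattern.
matched_text = match[0]
# => "fool"

# The text right after the match still begins with "ish", because the
# lookahead checked those characters without consuming them.
leftover = match.post_match[0, 3]
# => "ish"

# Without the lookahead, "ish" becomes part of the match itself.
consumed = string.match(/foolish/)[0]
# => "foolish"
```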

What if I wanted to do something slightly different? What if I wanted to match every time the word fool is NOT followed by the letters "ish"?  I would use a negative lookahead. Technically, this is referred to as a zero-width negative lookahead assertion. Negative means a match for our lookahead should NOT be present, we’re not expecting it to be there. You use the ?! operator to designate a negative lookahead.

I’m going to run scan on my string again, but this time with a negative lookahead in my regular expression. I want it to match every time the word fool is NOT a part of the word foolish:

string.scan(/fool(?!ish)/)
=> ["fool", "fool"]

It returns two matches, the two times the string uses the word "fool" without being part of the word foolish. Let’s take it a step further and use it with the gsub method.  Anytime we match the pattern fool, but only when it is NOT followed by the letters "ish", let’s replace it with the word "self":

string.gsub(/fool(?!ish)/, "self")
=> "Who’s the more foolish?  The self or the self who follows him?"

Once again, I’ve changed a classic line. Now it reads "Who’s the more foolish? The self or the self who follows him?"
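As a quick sanity check, the positive and negative lookaheads should split the plain /fool/ matches between them. This sketch counts each set; the "you fool" string at the end is a hypothetical sample I added to show one edge case:

```ruby
string = "Who’s the more foolish?  The fool or the fool who follows him?"

all_matches  = string.scan(/fool/)         # every "fool"
followed     = string.scan(/fool(?=ish)/)  # only when "ish" comes next
not_followed = string.scan(/fool(?!ish)/)  # only when "ish" does not come next

counts = [all_matches.length, followed.length, not_followed.length]
# => [3, 1, 2]

# A negative lookahead also succeeds when nothing at all follows the
# main pattern, for example at the very end of a string.
at_end = "you fool".scan(/fool(?!ish)/)
# => ["fool"]
```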

These examples are great when I want to find a match based on what comes after it.  Again, let’s take it a step further.  What if I want to find a match based on what comes before? I need to use a positive lookbehind assertion. This means I want to match a pattern every time it is directly preceded by another pattern.

Let’s use another Star Wars quote for our string, this one from Yoda:

string = "For my ally is the force, and a powerful ally it is."

The main pattern I want to match is the word ally using the regular expression /ally/. I only want to match the word "ally" when the word "powerful" comes directly before it, however. This is where the positive lookbehind comes in. Positive lookbehinds use the ?<= operator. Let’s add it to our regular expression:

/(?<=powerful )ally/

This regular expression matches the word "ally" every time it is directly preceded by the word powerful. Notice the lookbehind is behind the main pattern. The lookbehind needs to come before the main match.  The word "powerful" needs to come before the word "ally."

Now I’m going to use the gsub method on the string. Every time the word "ally" is directly preceded by the word powerful, I want to replace it with the word "friend":

string.gsub(/(?<=powerful )ally/, "friend")
=> "For my ally is the force, and a powerful friend it is."

It changes Yoda’s words a little bit:  "For my ally is the force, and a powerful friend it is."
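Like the lookahead, the lookbehind is zero-width: "powerful " is checked but never becomes part of the match, which is why gsub leaves it alone. Here is a small sketch with my own variable names. The last line shows one possible alternative without a lookbehind, using a capture group and a backreference, purely for comparison:

```ruby
string = "For my ally is the force, and a powerful ally it is."

match = string.match(/(?<=powerful )ally/)

matched_text = match[0]
# => "ally"

# "powerful " was checked but is not part of the match; it is still
# sitting at the end of the text that precedes the match.
context_kept = match.pre_match.end_with?("powerful ")
# => true

# For comparison: capture the context and put it back with \1.
with_capture = string.gsub(/(powerful )ally/, '\1friend')
# => "For my ally is the force, and a powerful friend it is."
```

With the lookbehind, the replacement string only has to describe the new word; with the capture group, the context has to be restored by hand.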

What if I want to do the opposite?  What if I want to match every time the word "ally" is NOT preceded by the word "powerful"? I would use a negative lookbehind. This means I want to match my pattern every time it is NOT directly preceded by another pattern. Negative lookbehinds use the ?<! operator. Let’s apply it to the regular expression:

/(?<!powerful )ally/

Let’s run gsub using this regular expression, replacing the word "ally" with the word "friend" every time it is NOT directly preceded by the word "powerful":

string.gsub(/(?<!powerful )ally/, "friend")
=> "For my friend is the force, and a powerful ally it is."

Again, I changed Yoda’s words a little bit:  "For my friend is the force, and a powerful ally it is."
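Two details of "NOT directly preceded" are worth trying out. The sample strings below are hypothetical ones I made up for this sketch; they are not from the quote:

```ruby
pattern = /(?<!powerful )ally/

# Nothing precedes "ally" here, so there is no "powerful " to rule it out.
at_start = "ally of mine".scan(pattern)
# => ["ally"]

# Here "powerful " sits directly before "ally", so the match is rejected.
directly_preceded = "a powerful ally".scan(pattern)
# => []

# "powerful, " (with a comma) is not the same text as "powerful ",
# so the negative lookbehind lets this one through.
loosely_preceded = "a powerful, ally".scan(pattern)
# => ["ally"]
```

The lookbehind compares the exact characters immediately before the match, including the trailing space in "powerful ".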

Lookarounds provide a tremendous boost to your regular expressions because they help you define context. Rather than being a static pattern that either matches or doesn’t, your regular expression becomes powerful, flexible, and capable of much more.

Please check back next week for the third installment of this three-part series.

Nell Shamrell
