Skip to main content

I love regular expressions, but I hate writing them, and I especially hate inheriting them. Why? Because they can be super complex and very hard to read. I will often stumble across some regex in one of my projects and ask ‘who the hell wrote this!? I have NO idea what it is doing!’. After doing a git blame, I find out that the offender was me. Shame on me.

Even worse are situations in which I find myself tasked with fixing some regex that 1) I did not write, 2) is critical to a production system and 3) has crazy look ahead and look behind assertions.

So what was my solution to this? 3 fold:

  1. I grew up and came to terms with the fact that my memory sucks and that I probably won’t be responsible for the code that I write in a few years.
  2. I commented the crap out of my regex, explaining each block and why it is there
  3. Most importantly, I created some unit tests for my regex.

Number 3 is key, and I’d recommend it to everyone. When you write regex, always write a unit test to back it up. Even if you think it is self explanatory. Give it some strings that should match and some strings that should not match. Make sure you cover all the edge cases, and when you find an edge case that you missed (which you will), add it to the unit test.

First posted on 15 May 2017