You won’t up your algorithm game if you don’t learn this skill
23 Sep 2019 - John
In my last post, we explored different approaches to solving a DNA transcriber that gave back the corresponding RNA sequence to any given DNA string.
As a refresher, DNA is represented by four characters, G, C, T and A, while each character has a corresponding RNA match: C, G, A and U.
At the end of the exercise, we concluded that this was an acceptable solution:
const rna = {
  G: 'C',
  C: 'G',
  T: 'A',
  A: 'U'
};
const toRna = (dna) => {
  return dna.split('').map((letter) => {
    return rna[letter];
  }).join('');
};
but we also concluded that there was still room for improvement. Enter regex.
What is regex?
A regular expression, shortened to regex, represents a sequence of characters that make up a search pattern. In other words, it’s a way to tell the computer "hey, find me anything that matches this pattern."
Cool, how do I use it?
The two main ways to create a regex are using a literal and using a constructor:
// Using a literal:
const myRegex = /some pattern/;
// Using a constructor (note the name: RegExp, not Regex):
const myOtherRegex = new RegExp('some pattern');
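The constructor form is mostly handy when the pattern is only known at runtime. Here is a minimal sketch of that idea; `word`, `literalRegex` and `wordRegex` are hypothetical names picked for illustration:

```javascript
// A literal is fixed at the time the script is written.
const literalRegex = /cat/;

// The constructor builds the pattern from a string at runtime.
// `word` is a hypothetical variable standing in for, say, user input.
const word = 'cat';
const wordRegex = new RegExp(word);

// Both regexes describe the same pattern, so both find "cat" in "concatenate".
const literalResult = literalRegex.test('concatenate'); // true
const constructorResult = wordRegex.test('concatenate'); // true
```

Keep in mind that characters with a special meaning in regex (like `.` or `*`) are still treated as pattern syntax when they appear in the string passed to the constructor.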
So let’s see how we can put it to use. For this example, we’re going to use the String.replace() method. It takes two arguments: the pattern to match and what we’re replacing it with:
const myRegex = /aaa/;
const myString = "123aaa456"
myString.replace(myRegex, 'bbb'); // "123bbb456"
Simple enough. We’re declaring a regex made up of the characters 'aaa', and then searching a second string for this pattern so we can substitute it with the string 'bbb'. Now, we could’ve done that with simple strings, but that’s because our pattern was super simple. Let’s dig a bit deeper.
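To see what "doing that with simple strings" looks like, here is a small sketch comparing a string pattern with the equivalent regex. The variable names are made up for this example:

```javascript
const original = 'aaa-aaa';

// With a plain string pattern, replace() swaps only the first occurrence.
const withString = original.replace('aaa', 'bbb'); // "bbb-aaa"

// A regex without any flags behaves the same way here.
const withRegex = original.replace(/aaa/, 'bbb'); // "bbb-aaa"

// replace() returns a new string; the original is left untouched.
const stillOriginal = original; // "aaa-aaa"
```

For a fixed piece of text like this, both approaches give the same result, which is why regex only starts to pay off once the pattern gets more flexible.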
Defining ranges
We can define a range by using brackets: /[a-z]/ matches any lowercase letter from a to z. However, replace() will only substitute the first match and then stop, so we need to tell it to keep going until it finds all the matches. To do this, all we need to do is add the g flag, which stands for "global": /[a-z]/g. We can also make the match case insensitive with the i flag: /[a-z]/gi:
const myString = '123a45B67d89e';
const myRegex = /[a-z]/gi;
myString.replace(myRegex, '?'); // "123?45?67?89?"
So we took a string that has a bunch of numbers mixed with letters in different cases. Then we built an expression that said "find all the letters from a to z regardless of their case" and then ran the regex over the string to substitute any matches with a question mark. How neat is that?
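To make the effect of each flag easier to see, here is a quick sketch that runs the same range over the same string with no flags, with g only, and with both g and i. The variable names are just for illustration:

```javascript
const mixed = '123a45B67d89e';

// No flags: only the first lowercase letter is replaced.
const firstOnly = mixed.replace(/[a-z]/, '?'); // "123?45B67d89e"

// g flag: every lowercase letter is replaced, but the uppercase B survives.
const allLowercase = mixed.replace(/[a-z]/g, '?'); // "123?45B67?89?"

// g and i flags: every letter is replaced, regardless of case.
const allLetters = mixed.replace(/[a-z]/gi, '?'); // "123?45?67?89?"
```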
Excluding matches
What if we want to match everything except our vowels? We can tell our expression to ignore certain characters. Placing the ^ character at the start of the brackets basically tells it "not these":
const myString = 'The quick brown fox jumps over the lazy dog';
const myRegex = /[^aeiou]/gi;
myString.replace(myRegex, '_'); //"__e__ui_____o____o___u____o_e____e__a____o_"
This time we took our string and matched it with everything that was not a vowel, then we substituted every match with an underscore.
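Notice that the pattern above also swallows the spaces, since a space is "not a vowel" too. If we only wanted the consonants, one option (a sketch, not the only way to do it) is to list the consonant ranges explicitly instead of negating the vowels:

```javascript
const sentence = 'The quick brown fox';

// Ranges covering every letter except a, e, i, o and u.
const consonants = /[b-df-hj-np-tv-z]/gi;

// Vowels and spaces are left alone; only consonants become underscores.
const masked = sentence.replace(consonants, '_'); // "__e _ui__ __o__ _o_"
```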
Back to our original exercise
We can use our newly acquired powers to simplify our original algorithm even more:
export const toRna = dna => {
  const map = {
    C: 'G',
    G: 'C',
    A: 'U',
    T: 'A'
  };
  // The callback receives each matched DNA letter and returns its RNA match.
  return dna.replace(/[CGAT]/g, nucleotide => map[nucleotide]);
};
toRna('ACGTGGTCTTAA'); // 'UGCACCAGAAUU'
We took our original map, then let the regex match every C, G, A or T in our string. For each match, the function we passed to replace() looks the letter up in our map and substitutes it with its corresponding value. Awesome!
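One detail worth knowing about this approach: anything the regex doesn’t match is simply left as it is. The sketch below uses a standalone version of the same idea (`complement` and `transcribe` are hypothetical names, not part of the original solution) to show what happens with unexpected input:

```javascript
const complement = { C: 'G', G: 'C', A: 'U', T: 'A' };

// replace() calls the function once per match and uses its return value.
const transcribe = (dna) =>
  dna.replace(/[CGAT]/g, (nucleotide) => complement[nucleotide]);

const valid = transcribe('GATTACA'); // "CUAAUGU"

// "X" is not in the range, so it passes through untouched.
const withUnknown = transcribe('GAXT'); // "CUXA"

// No i flag, so lowercase letters are not matched either.
const lowercase = transcribe('gat'); // "gat"
```

Whether passing unknown characters through is acceptable depends on the exercise; a stricter version could check the input first and throw.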
Digging deeper
We only saw ONE of the methods we can use with regex. Other methods include test, search, match and many more. This is a very interesting topic that gets pretty messy pretty quick, so be sure to take it one day at a time and sooner than you know, your algorithm game will go through the roof!
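As a small taste of those other methods, here is a sketch that runs test, search and match against the DNA string from our exercise. The variable names are made up for the example:

```javascript
const dna = 'ACGTGGTCTTAA';

// test() lives on the regex and returns true or false.
// Here we ask: is there any character that is NOT C, G, A or T?
const hasInvalidLetter = /[^CGAT]/.test(dna); // false

// search() lives on the string and returns the index of the first match (or -1).
const firstT = dna.search(/T/); // 3

// match() with the g flag returns an array with every match (or null if none).
const allGs = dna.match(/G/g); // ['G', 'G', 'G']
```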