Regular Expressions and String Manipulation
These days, there are many great reasons to program in Perl. One of the first among them is its natural ability to work with strings and, in particular, regular expressions. The following two operators, =~ (match) and !~ (no match), are among the most basic. In scalar context, =~ returns true if a substring matching the regular expression is found in the supplied string and false if it is not, so it is normally used as a true/false expression. It does not return a count on its own; with the /g modifier in list context it returns the list of matches, which can then be counted. The "not in" operator !~ returns true if no match is found.
The general forms are as follows:
$found = ($somestring =~ /regular expression/);
$notin = ($somestring !~ /regular expression/);
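Here is a minimal sketch of how the two operators behave in practice. The sample string and the variable names are made up for illustration. It also shows one common idiom for counting matches, which relies on the /g modifier and list context rather than on the plain scalar result of =~.

```perl
use strict;
use warnings;

my $somestring = "cat hat bat";

# In scalar context, =~ yields true (1) on a match, false otherwise.
my $found = ($somestring =~ /at/);

# !~ is the negation: true because "dog" does not occur.
my $notin = ($somestring !~ /dog/);

# Counting idiom: /g in list context returns every match; assigning
# that list to () and then to a scalar yields the number of matches.
my $count = () = $somestring =~ /at/g;

print "found: $found\n";   # found: 1
print "notin: $notin\n";   # notin: 1
print "count: $count\n";   # count: 3
```

The empty-list assignment looks odd at first, but it is only there to force list context before the result is stored in a scalar.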
If you group parts of a regular expression within parentheses, and the regular expression matches, the text matched by each parenthesized group will be saved into a special variable, much as was the case with, for example, sed. These special variables are $1, $2, etc. Careful! Everyone wants to believe that these variables represent command-line arguments, as they do in shell. They do not: in Perl, command-line arguments live in @ARGV. It is also worth noting that Perl will accept the \1, \2, \3, etc., notation common in many other programs. Inside the pattern itself, that notation is the normal way to refer back to an earlier group; in the replacement half of a substitution it is accepted, but $1, $2, etc. are preferred. Regardless, here's a quick example:
if ($somestring =~ /([0-9]+)[a-zA-Z]*([0-9]+)/) {
    # $1 is the first group of digits in the match
    # $2 is the second group of digits, after any letters
} else {
    # $1 and $2 are unchanged
}

Perl also has a special variable, $_, which represents the default string. Several important operators act on this string by default. For example, Perl can do sed-style searching and replacing. When a substitution is written without =~ binding it to a particular variable, it acts upon $_:
$_ = "This is an example string: Hello World";
$changes = s/World/WORLD/g;
print "$_\n";        # "World" is now WORLD
print "$changes\n";  # The number of substitutions made; in this case, 1
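Substitution is not limited to $_. The sketch below binds s to a named variable with =~ and uses the capture variables $1, $2 and $3 in the replacement. The date string and variable names are hypothetical, and the braces are just an alternative delimiter chosen so the slashes in the replacement need no escaping.

```perl
use strict;
use warnings;

my $date = "2024-01-31";

# Bind the substitution to $date instead of relying on $_.
# Each parenthesized group is available as $1, $2, $3 in the replacement.
my $changes = ($date =~ s{(\d+)-(\d+)-(\d+)}{$3/$2/$1});

print "$date\n";      # 31/01/2024
print "$changes\n";   # 1
```

Without the /g modifier, only the first match is replaced, so the return value here is at most 1.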
The tr operator is also very powerful. It acts much like the tr command. It allows the user to define a mapping of character-for-character substitutions and applies them to $_. Each character in the first field will be replaced by the corresponding character in the second field. As with the s operator above, it returns a count, in this case the number of characters it replaced:
$changes = tr/abc/123/; # a becomes 1, b becomes 2, c becomes 3
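A short sketch of tr at work, with made-up sample strings. The second half assumes you only want to count characters: with an empty second field and no modifiers, tr leaves the string alone and simply reports how many characters from the first field it saw.

```perl
use strict;
use warnings;

# tr acts on $_ unless bound to another variable with =~.
$_ = "aabbccdd";
my $changes = tr/abc/123/;
print "$_\n";         # 112233dd
print "$changes\n";   # 6

# Counting only: the empty replacement list leaves $text unmodified.
my $text = "hello world";
my $ls = ($text =~ tr/l//);
print "$text\n";      # hello world
print "$ls\n";        # 3
```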
Please note: In the examples above, there are no quotes around the tr and s expressions. This is important. If the expressions are quoted, they'll be interpreted as strings and assigned, instead of being treated as operations and performed.
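To see the difference, here is a small sketch that tries both forms on the same string. The variable names are illustrative only.

```perl
use strict;
use warnings;

$_ = "Hello World";

# Quoted: this is ordinary string assignment; no substitution happens.
my $quoted = 's/World/WORLD/';
print "$quoted\n";    # s/World/WORLD/
print "$_\n";         # Hello World

# Unquoted: the substitution operator runs against $_.
my $changes = s/World/WORLD/;
print "$changes\n";   # 1
print "$_\n";         # Hello WORLD
```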