The result may be used as a subpattern in a match:
$re = qr/$pattern/;
- $string =~ /foo${re}bar/; # can be interpolated in other patterns
+ $string =~ /foo${re}bar/; # can be interpolated in other
+ # patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
i Do case-insensitive pattern matching.
x Use extended regular expressions.
p When matching preserve a copy of the matched string so
- that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
+ that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
+ defined.
o Compile pattern only once.
- a ASCII-restrict: Use ASCII for \d, \s, \w; specifying two a's
- further restricts /i matching so that no ASCII character will
- match a non-ASCII one
+ a ASCII-restrict: Use ASCII for \d, \s, \w; specifying two
+ a's further restricts /i matching so that no ASCII
+ character will match a non-ASCII one
l Use the locale
u Use Unicode rules
d Use Unicode or native charset, as in 5.12 and earlier
process modifiers are available:
g Match globally, i.e., find all occurrences.
- c Do not reset search position on a failed match when /g is in effect.
+ c Do not reset search position on a failed match when /g is
+ in effect.
If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-whitespace (ASCII) characters
Examples:
- open(TTY, "+</dev/tty")
- || die "can't access /dev/tty: $!";
+ open(TTY, "+</dev/tty")
+ || die "can't access /dev/tty: $!";
- <TTY> =~ /^y/i && foo(); # do foo if desired
+ <TTY> =~ /^y/i && foo(); # do foo if desired
- if (/Version: *([0-9.]*)/) { $version = $1; }
+ if (/Version: *([0-9.]*)/) { $version = $1; }
- next if m#^/usr/spool/uucp#;
+ next if m#^/usr/spool/uucp#;
- # poor man's grep
- $arg = shift;
- while (<>) {
- print if /$arg/o; # compile only once (no longer needed!)
- }
+ # poor man's grep
+ $arg = shift;
+ while (<>) {
+ print if /$arg/o; # compile only once (no longer needed!)
+ }
- if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
+ if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
Here's another way to check for sentences in a paragraph:
- my $sentence_rx = qr{
- (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or whitespace
- \p{Lu} # capital letter
- .*? # a bunch of anything
- (?<= \S ) # that ends in non-whitespace
- (?<! \b [DMS]r ) # but isn't a common abbreviation
- (?<! \b Mrs )
- (?<! \b Sra )
- (?<! \b St )
- [.?!] # followed by a sentence ender
- (?= $ | \s ) # in front of end-of-string or whitespace
- }sx;
- local $/ = "";
- while (my $paragraph = <>) {
- say "NEW PARAGRAPH";
- my $count = 0;
- while ($paragraph =~ /($sentence_rx)/g) {
- printf "\tgot sentence %d: <%s>\n", ++$count, $1;
- }
+ my $sentence_rx = qr{
+ (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
+ # whitespace
+ \p{Lu} # capital letter
+ .*? # a bunch of anything
+ (?<= \S ) # that ends in non-
+ # whitespace
+ (?<! \b [DMS]r ) # but isn't a common abbr.
+ (?<! \b Mrs )
+ (?<! \b Sra )
+ (?<! \b St )
+ [.?!] # followed by a sentence
+ # ender
+ (?= $ | \s ) # in front of end-of-string
+ # or whitespace
+ }sx;
+ local $/ = "";
+ while (my $paragraph = <>) {
+ say "NEW PARAGRAPH";
+ my $count = 0;
+ while ($paragraph =~ /($sentence_rx)/g) {
+ printf "\tgot sentence %d: <%s>\n", ++$count, $1;
}
+ }
Here's how to use C<m//gc> with C<\G>:
regexp tries to match where the previous one leaves off.
$_ = <<'EOL';
- $url = URI::URL->new( "http://example.com/" ); die if $url eq "xXx";
+ $url = URI::URL->new( "http://example.com/" );
+ die if $url eq "xXx";
EOL
LOOP: {
print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
- print(" lowercase"), redo LOOP if /\G\p{Ll}+\b[,.;]?\s*/gc;
- print(" UPPERCASE"), redo LOOP if /\G\p{Lu}+\b[,.;]?\s*/gc;
- print(" Capitalized"), redo LOOP if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
+ print(" lowercase"), redo LOOP
+ if /\G\p{Ll}+\b[,.;]?\s*/gc;
+ print(" UPPERCASE"), redo LOOP
+ if /\G\p{Lu}+\b[,.;]?\s*/gc;
+ print(" Capitalized"), redo LOOP
+ if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
- print(" alphanumeric"), redo LOOP if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
+ print(" alphanumeric"), redo LOOP
+ if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
print(" line-noise"), redo LOOP if /\G\W+/gc;
print ". That's all!\n";
}
Here is the output (split into several lines):
- line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
- line-noise lowercase line-noise lowercase line-noise lowercase
- lowercase line-noise lowercase lowercase line-noise lowercase
- lowercase line-noise MiXeD line-noise. That's all!
+ line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
+ line-noise lowercase line-noise lowercase line-noise lowercase
+ lowercase line-noise lowercase lowercase line-noise lowercase
+ lowercase line-noise MiXeD line-noise. That's all!
=item m?PATTERN?msixpodualgc
X<?> X<operator, match-once>
specific options:
e Evaluate the right side as an expression.
- ee Evaluate the right side as a string then eval the result.
- r Return substitution and leave the original string untouched.
+ ee Evaluate the right side as a string then eval the
+ result.
+ r Return substitution and leave the original string
+ untouched.
Any non-whitespace delimiter may replace the slashes. Add space after
the C<s> when using a character allowed in identifiers. If single quotes
Examples:
- s/\bgreen\b/mauve/g; # don't change wintergreen
+ s/\bgreen\b/mauve/g; # don't change wintergreen
$path =~ s|/usr/bin|/usr/local/bin|;
s/Login: $foo/Login: $bar/; # run-time pattern
- ($foo = $bar) =~ s/this/that/; # copy first, then change
- ($foo = "$bar") =~ s/this/that/; # convert to string, copy, then change
+ ($foo = $bar) =~ s/this/that/; # copy first, then
+ # change
+ ($foo = "$bar") =~ s/this/that/; # convert to string,
+ # copy, then change
$foo = $bar =~ s/this/that/r; # Same as above using /r
$foo = $bar =~ s/this/that/r
- =~ s/that/the other/r; # Chained substitutes using /r
- @foo = map { s/this/that/r } @bar # /r is very useful in maps
+ =~ s/that/the other/r; # Chained substitutes
+ # using /r
+ @foo = map { s/this/that/r } @bar # /r is very useful in
+ # maps
- $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
+ $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
$_ = 'abc123xyz';
s/\d+/$&*2/e; # yields 'abc246xyz'
\*/ # Match the closing delimiter.
} []gsx;
- s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively
+ s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
+ # expensively
- for ($variable) { # trim whitespace in $variable, cheap
+ for ($variable) { # trim whitespace in $variable,
+ # cheap
s/^\s+//;
s/\s+$//;
}
However, when backslashes are used as the delimiters (like C<qq\\> and
C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters or
-backslashes are removed (exactly speaking, they are not copied to the
+other backslashes are removed (exactly speaking, they are not copied to the
safe location).
For constructs with three-part delimiters (C<s///>, C<y///>, and