From 8129baca5dd762540c807db6ddf8d2e9fa4121b2 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 5 Feb 2012 16:17:54 -0700
Subject: [PATCH] perrebackslash, perlrecharclass: Note locale effects

This adds text to specify what happens under 'use locale'.
---
 pod/perlrebackslash.pod |  3 +++
 pod/perlrecharclass.pod | 22 +++++++++++++++-------
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index cc72a1f..98435e5 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -618,6 +618,9 @@ C<\R> can match a sequence of more than one character, it cannot be put
 inside a bracketed character class; C is an error; use C<\v>
 instead. C<\R> was introduced in perl 5.10.0.
 
+Note that this does not respect any locale that might be in effect; it
+matches according to the platform's native character set.
+
 Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
 and more importantly because Unicode recommends such a regular expression
 metacharacter, and suggests C<\R> as its notation.
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index f50699b..06d206b 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -252,24 +252,30 @@ Which rules apply are determined as described in L is matched by C<\S>.
 
 C<\h> matches any character considered horizontal whitespace;
-this includes the space and tab characters and several others
+this includes the platform's space and tab characters and several others
 listed in the table below. C<\H> matches any character
-not considered horizontal whitespace.
+not considered horizontal whitespace. They use the platform's native
+character set, and do not consider any locale that may otherwise be in
+use.
 
 C<\v> matches any character considered vertical whitespace;
-this includes the carriage return and line feed characters (newline)
+this includes the platform's carriage return and line feed characters (newline)
 plus several other characters, all listed in the table below.
 C<\V> matches any character not considered vertical whitespace.
+They use the platform's native character set, and do not consider any
+locale that may otherwise be in use.
 
 C<\R> matches anything that can be considered a newline under Unicode
 rules. It's not a character class, as it can match a multi-character
 sequence. Therefore, it cannot be used inside a bracketed character
-class; use C<\v> instead (vertical whitespace).
+class; use C<\v> instead (vertical whitespace). It uses the platform's
+native character set, and does not consider any locale that may
+otherwise be in use.
 Details are discussed in L.
 
 Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match
-the same characters, without regard to other factors, such as whether the
-source string is in UTF-8 format.
+the same characters, without regard to other factors, such as the active
+locale or whether the source string is in UTF-8 format.
 
 One might think that C<\s> is equivalent to C<[\h\v]>. This is not true.
 The difference is that the vertical tab (C<"\x0b">) is not matched by
@@ -777,7 +783,9 @@ The POSIX class matches the same as its Full-range counterpart.
 
 =item if locale rules are in effect ...
 
-The POSIX class matches according to the locale.
+The POSIX class matches according to the locale, except that
+C uses the platform's native underscore character, no matter what
+the locale is.
 
 =item if Unicode rules are in effect or if on an EBCDIC platform ...
 
-- 
2.7.4