perllocale: Nits, corrections

author Karl Williamson <public@khwilliamson.com>

Wed, 22 Jan 2014 17:52:21 +0000 (10:52 -0700)

committer Karl Williamson <public@khwilliamson.com>

Wed, 22 Jan 2014 18:46:00 +0000 (11:46 -0700)
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index 32b91bb..5efc754 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -193,17 +193,25 @@ The operations that are affected by locale are:
  
  =item B<Not within the scope of any C<"use locale"> variant>
  
-Only operations originating outside Perl should be affected.
+Only operations originating outside Perl should be affected, as follows:
+
+=over 4
+
+=item *
  
  The variable L<$!|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
  C<$OS_ERROR>) when used as strings always are in terms of the current
  locale.
  
+=item *
+
  The current locale is also used when going outside of Perl with
  operations like L<system()|perlfunc/system LIST> or
  L<qxE<sol>E<sol>|perlop/qxE<sol>STRINGE<sol>>, if those operations are
  locale-sensitive.
  
+=item *
+
  Also Perl gives access to various C library functions through the
  L<POSIX> module.  Some of those functions are always affected by the
  current locale.  For example, C<POSIX::strftime()> uses C<LC_TIME>;
@@ -211,9 +219,25 @@ C<POSIX::strtod()> uses C<LC_NUMERIC>; C<POSIX::strcoll()> and
  C<POSIX::strxfrm()> use C<LC_COLLATE>; and character classification
  functions like C<POSIX::isalnum()> use C<LC_CTYPE>.  All such functions
  will behave according to the current underlying locale, even if that
-isn't exposed to Perl space.
+locale isn't exposed to Perl space.
  
-And, certain Perl operations that are set-up within the scope of a
+=item *
+
+Perl also provides lite wrappers for XS modules to use some C library
+C<printf> functions.  These wrappers don't do anything with the locale,
+and the underlying C library function is affected by the locale in
+effect at the time of the wrapper call.
+The affected functions are
+L<perlapi/my_sprintf>,
+L<perlapi/my_snprintf>,
+and
+L<perlapi/my_vsnprintf>.
+
+=back
+
+=item Lingering effects of C<S<use locale>>
+
+Certain Perl operations that are set-up within the scope of a
  C<use locale> variant retain that effect even outside the scope.
  These include:
  
@@ -470,7 +494,7 @@ Setting the environment variable C<PERL_DEBUG_FULL_TEST> to 1
  will cause it to output detailed results.  For example, on Linux, you
  could say
  
- PERL_DEBUG_FULL_TEST=1 ./perl -T lib/locale.t > locale.log 2>&1
+ PERL_DEBUG_FULL_TEST=1 ./perl -T -Ilib lib/locale.t > locale.log 2>&1
  
  Besides many other tests, it will test every locale it finds on your
  system to see if they conform to the POSIX standard.  If any have
@@ -769,9 +793,10 @@ strings and C<s///> substitutions; and case-independent regular expression
  pattern matching using the C<i> modifier.
  
  Finally, C<LC_CTYPE> affects the POSIX character-class test
-functions--C<isalpha()>, C<islower()>, and so on.  For example, if you move
-from the "C" locale to a 7-bit Scandinavian one, you may find--possibly
-to your surprise--that "|" moves from the C<ispunct()> class to C<isalpha()>.
+functions--C<POSIX::isalpha()>, C<POSIX::islower()>, and so on.  For
+example, if you move from the "C" locale to a 7-bit Scandinavian one,
+you may find--possibly to your surprise--that "|" moves from the
+C<POSIX::ispunct()> class to C<POSIX::isalpha()>.
  Unfortunately, this creates big problems for regular expressions. "|" still
  means alternation even though it matches C<\w>.
  
@@ -779,7 +804,7 @@ Note that there are quite a few things that are unaffected by the
  current locale.  All the escape sequences for particular characters,
  C<\n> for example, always mean the platform's native one.  This means,
  for example, that C<\N> in regular expressions (every character
-but new-line) work on the platform character set.
+but new-line) works on the platform character set.
  
  B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
  in clearly ineligible characters being considered to be alphanumeric by
@@ -932,7 +957,7 @@ Scalar true/false (or less/equal/greater) result is never tainted.
  
  =item  *
  
-B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u> or C<\U>)
+B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>)
  
  Result string containing interpolated material is tainted if
  C<use locale> (but not S<C<use locale ':not_characters'>>) is in effect.
@@ -943,15 +968,19 @@ B<Matching operator> (C<m//>):
  
  Scalar true/false result never tainted.
  
-All subpatterns, either delivered as a list-context result or as $1 etc.
-are tainted if C<use locale> (but not S<C<use locale ':not_characters'>>)
-is in effect, and the subpattern regular
-expression contains C<\w> (to match an alphanumeric character), C<\W>
-(non-alphanumeric character), C<\s> (whitespace character), or C<\S>
-(non whitespace character).  The matched-pattern variable, $&, $`
-(pre-match), $' (post-match), and $+ (last match) are also tainted if
-C<use locale> is in effect and the regular expression contains C<\w>,
-C<\W>, C<\s>, or C<\S>.
+All subpatterns, either delivered as a list-context result or as C<$1>
+I<etc>., are tainted if C<use locale> (but not
+S<C<use locale ':not_characters'>>) is in effect, and the subpattern
+regular expression is matched case-insensitively (C</i>) or contains a
+locale-dependent construct.  These constructs include C<\w>
+(to match an alphanumeric character), C<\W> (non-alphanumeric
+character), C<\s> (whitespace character), C<\S> (non whitespace
+character), and the POSIX character classes, such as C<[:alpha:]> (see
+L<perlrecharclass/POSIX Character Classes>).
+The matched-pattern variables, C<$&>, C<$`> (pre-match), C<$'>
+(post-match), and C<$+> (last match) also are tainted.
+(Note that currently there are some bugs where not everything that
+should be tainted gets tainted in all circumstances.)
  
  =item  *
  
@@ -961,8 +990,8 @@ Has the same behavior as the match operator.  Also, the left
  operand of C<=~> becomes tainted when C<use locale>
  (but not S<C<use locale ':not_characters'>>) is in effect if modified as
  a result of a substitution based on a regular
-expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
-case-mapping with C<\l>, C<\L>,C<\u> or C<\U>.
+expression match involving any of the things mentioned in the previous
+item, or of case-mapping, such as C<\l>, C<\L>,C<\u>, C<\U>, or C<\F>.
  
  =item *
  
@@ -988,9 +1017,10 @@ Results are never tainted.
  
  =item *
  
-B<POSIX character class tests> (C<isalnum()>, C<isalpha()>, C<isdigit()>,
-C<isgraph()>, C<islower()>, C<isprint()>, C<ispunct()>, C<isspace()>, C<isupper()>,
-C<isxdigit()>):
+B<POSIX character class tests> (C<POSIX::isalnum()>,
+C<POSIX::isalpha()>, C<POSIX::isdigit()>, C<POSIX::isgraph()>,
+C<POSIX::islower()>, C<POSIX::isprint()>, C<POSIX::ispunct()>,
+C<POSIX::isspace()>, C<POSIX::isupper()>, C<POSIX::isxdigit()>):
  
  True/false results are never tainted.
  
@@ -1036,7 +1066,7 @@ Compare this with a similar but locale-aware program:
          open(F, ">$localized_output_file")
              or warn "Open of $localized_output_file failed: $!\n";
  
-This third program fails to run because $& is tainted: it is the result
+This third program fails to run because C<$&> is tainted: it is the result
  of a match involving C<\w> while C<use locale> is in effect.
  
  =head1 ENVIRONMENT
@@ -1246,7 +1276,7 @@ into bankers, bikers, gamers, and so on.
  =head1 Unicode and UTF-8
  
  The support of Unicode is new starting from Perl version v5.6, and more fully
-implemented in version v5.8 and later.  See L<perluniintro>.  It is
+implemented in versions v5.8 and later.  See L<perluniintro>.  It is
  strongly recommended that when combining Unicode and locale (starting in
  v5.16), you use
  
@@ -1305,7 +1335,7 @@ in the ISO8859-1 locale, Latin1, it is a multiplication sign.  The POSIX
  regular expression character class C<[[:alpha:]]> will magically match
  0xD7 in the Greek locale but not in the Latin one.
  
-However, there are places where this breaks down.  Certain constructs are
+However, there are places where this breaks down.  Certain Perl constructs are
  for Unicode only, such as C<\p{Alpha}>.  They assume that 0xD7 always has its
  Unicode meaning (or the equivalent on EBCDIC platforms).  Since Latin1 is a
  subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
@@ -1336,6 +1366,10 @@ Perl that way under the Greek locale.  This is not a problem
  I<provided> you make certain that all locales will always and only be either
  an ISO8859-1, or, if you don't have a deficient C library, a UTF-8 locale.
  
+Still another problem is that this approach can lead to two code
+points meaning the same character.  Thus in a Greek locale, both U+03A7
+and U+00D7 are GREEK CAPITAL LETTER CHI.
+
  Vendor locales are notoriously buggy, and it is difficult for Perl to test
  its locale-handling code because this interacts with code that Perl has no
  control over; therefore the locale-handling code in Perl may be buggy as