Update pods

author Karl Williamson <khw@khw-desktop.(none)>

Fri, 25 Dec 2009 17:40:56 +0000 (10:40 -0700)

committer Abigail <abigail@abigail.be>

Fri, 25 Dec 2009 21:06:00 +0000 (22:06 +0100)
diff --git a/pod/perl5113delta.pod b/pod/perl5113delta.pod
index 5b85e62..55fe29d 100644
--- a/pod/perl5113delta.pod
+++ b/pod/perl5113delta.pod
@@ -56,7 +56,8 @@ now accepted.
  C<qr/\X/>, which matches a Unicode logical character, has been expanded to work
  better with various Asian languages.  It now is defined as an C<extended
  grapheme cluster>.  (See L<http://www.unicode.org/reports/tr29/>).
-Anything matched by previously will continue to be matched.  But in addition:
+Anything matched previously that made sense will continue to be matched.  But
+in addition:
  
  =over
  
@@ -73,7 +74,9 @@ C<\X> will now match a sequence including the C<ZWJ> and C<ZWNJ> characters.
  C<\X> will now always match at least one character, including an initial mark.
  Marks generally come after a base character, but it is possible in Unicode to
  have them in isolation, and C<\X> will now handle that case, for example at the
-beginning of a line or after a C<ZWSP>.
+beginning of a line or after a C<ZWSP>.  And this is the part where C<\X>
+doesn't match the things that it used to that don't make sense.  Formerly, for
+example, you could have the nonsensical case of an accented LF.
  
  =item *
  
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index cf33dc5..b7a6bdc 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -511,10 +511,10 @@ This matches a Unicode I<extended grapheme cluster>.
  
  C<\X> matches quite well what normal (non-Unicode-programmer) usage
  would consider a single character.  As an example, consider a G with some sort
-of accent mark over it (a diacritic).  There is no such single character in
-Unicode, but something like one can be constructed by using a G followed by a
-Unicode combining accent, and would be displayed by Unicode-aware software as
-if it were a single character.
+of diacritic mark, such as an arrow.  There is no such single character in
+Unicode, but one can be composed using a G followed by a Unicode "COMBINING
+UPWARDS ARROW BELOW", and would be displayed by Unicode-aware software as if it
+were a single character.
  
  Mnemonic: eI<X>tended Unicode character.
  
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 6807e70..09b5215 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -754,15 +754,15 @@ UTS#18 grouping, intersection, union, and removal (subtraction) syntax.
  
  [c] Try the C<:crlf> layer (see L<PerlIO>).
  
-[d] Avoid C<use warning 'utf8';> (or say C<no warning 'utf8';>) to allow
-U+FFFF (C<\x{FFFF}>).
+[d] U+FFFF will currently generate a warning message if 'utf8' warnings are
+    enabled
  
  =item *
  
  Level 2 - Extended Unicode Support
  
          RL2.1   Canonical Equivalents           - MISSING       [10][11]
-        RL2.2   Default Grapheme Clusters       - MISSING       [12][13]
+        RL2.2   Default Grapheme Clusters       - MISSING       [12]
          RL2.3   Default Word Boundaries         - MISSING       [14]
          RL2.4   Default Loose Matches           - MISSING       [15]
          RL2.5   Name Properties                 - MISSING       [16]
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index e0142d4..01915c2 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -47,15 +47,15 @@ lowercasing, and collating (sorting) are defined.
  
  A Unicode I<logical> "character" can actually consist of more than one internal
  I<actual> "character" or code point.  For Western languages, this is adequately
-represented by a I<base character> (like C<LATIN CAPITAL LETTER A>), followed
+modelled by a I<base character> (like C<LATIN CAPITAL LETTER A>) followed
  by one or more I<modifiers> (like C<COMBINING ACUTE ACCENT>).  This sequence of
  base character and modifiers is called a I<combining character
  sequence>.  Some non-western languages require more complicated
-representations, so Unicode invented a I<grapheme cluster> and then an
-I<extended grapheme cluster>.  For example, A Korean Hangul syllable is
+models, so Unicode created the I<grapheme cluster> concept, and then the
+I<extended grapheme cluster>.  For example, a Korean Hangul syllable is
  considered a single logical character, but most often consists of three actual
-characters: a leading consonant followed by an interior vowel followed by a
-trailing consonant.
+Unicode characters: a leading consonant followed by an interior vowel followed
+by a trailing consonant.
  
  Whether to call these extended grapheme clusters "characters" depends on your
  point of view. If you are a programmer, you probably would tend towards seeing