From 13e5d9cdc0275b57d080d6599d0913b19bf5572e Mon Sep 17 00:00:00 2001
From: Brian Fraser
Date: Tue, 31 Jan 2012 23:40:59 -0300
Subject: [PATCH] perlretut: #109408

---
 pod/perlretut.pod | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 226b0ff..bf4ab3b 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -869,7 +869,7 @@ with one higher than the maximum reached across all the alternatives.
 
 =head2 Position information
 
-In addition to what was matched, Perl (since 5.6.0) also provides the
+In addition to what was matched, Perl also provides the
 positions of what was matched as contents of the C<@-> and C<@+>
 arrays. C<$-[0]> is the position of the start of the entire match and
 C<$+[0]> is the position of the end. Similarly, C<$-[n]> is the
@@ -1874,8 +1874,8 @@ work if they appear in a regular expression embedded directly
 in a program, but not when contained in a string that is interpolated
 in a pattern.
 
-With the advent of 5.6.0, Perl regexps can handle more than just the
-standard ASCII character set. Perl now supports I, a standard
+Perl regexps can handle more than just the
+standard ASCII character set. Perl supports I, a standard
 for representing the alphabets from virtually all of the world's written
 languages, and a host of symbols. Perl's text strings are Unicode strings, so
 they can contain characters with a value (codepoint or character number) higher
@@ -1926,13 +1926,13 @@ Consortium, L; explanatory material with
 links to other resources at
 L.
 
-The answer to requirement 2) is, as of 5.6.0, that a regexp (mostly)
-uses Unicode characters. (The "mostly" is for messy backward
+The answer to requirement 2) is that a regexp (mostly)
+uses Unicode characters. The "mostly" is for messy backward
 compatibility reasons, but starting in Perl 5.14, any regex compiled in
 the scope of a C (which is automatically
 turned on within the scope of a C or higher) will turn that
 "mostly" into "always". If you want to handle Unicode properly, you
-should ensure that C<'unicode_strings'> is turned on.)
+should ensure that C<'unicode_strings'> is turned on.
 Internally, this is encoded to bytes using either UTF-8 or a native 8
 bit encoding, depending on the history of the string, but conceptually
 it is a sequence of characters, not bytes. See L for a
-- 
2.7.4