Note that Perl considers grapheme clusters to be separate characters, so for
example
- print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"), "\n";
+ print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"),
+ "\n";
will print 2, not 1. The only exception is that regular expressions
have C<\X> for matching an extended grapheme cluster. (Thus C<\X> in a
join("",
map { $_ > 255 ? # if wide character...
sprintf("\\x{%04X}", $_) : # \x{...}
- chr($_) =~ /[[:cntrl:]]/ ? # else if control character ...
+ chr($_) =~ /[[:cntrl:]]/ ? # else if control character...
sprintf("\\x%02X", $_) : # \x..
quotemeta(chr($_)) # else quoted or as themselves
} unpack("W*", $_[0])); # unpack Unicode characters
my $unicode = chr(0x100);
print length($unicode), "\n"; # will print 1
require Encode;
- print length(Encode::encode_utf8($unicode)), "\n"; # will print 2
+ print length(Encode::encode_utf8($unicode)),"\n"; # will print 2
use bytes;
print length($unicode), "\n"; # will also print 2
# (the 0xC4 0x80 of the UTF-8)
pod/perltru64.pod ? Should you be using F<...> or maybe L<...> instead of 1
pod/perltru64.pod Verbatim line length including indents exceeds 79 by 4
pod/perlunifaq.pod empty section in previous paragraph 1
-pod/perluniintro.pod Verbatim line length including indents exceeds 79 by 3
pod/perluniprops.pod =item type mismatch 6
pod/perlvar.pod Verbatim line length including indents exceeds 79 by 9
pod/perlvms.pod ? Should you be using F<...> or maybe L<...> instead of 1