=item isalnum
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:alnum:]]/> construct instead, or possibly the C</\w/> construct.
+This is identical to the C function, except that it can apply to a
+single character or to a whole string. Note that locale settings may
+affect which characters C<isalnum> accepts. Does not work on Unicode
+characters with code points 256 or higher. Consider using regular
+expressions and the C</[[:alnum:]]/> construct instead, or possibly
+the C</\w/> construct.
=item isalpha
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:alpha:]]/> construct instead.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isalpha> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:alpha:]]/> construct instead.
=item isatty
=item iscntrl
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:cntrl:]]/> construct instead.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<iscntrl> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:cntrl:]]/> construct instead.
=item isdigit
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:digit:]]/> construct instead, or the C</\d/> construct.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isdigit> accepts (unlikely, but still
+possible). Does not work on Unicode characters with code points 256
+or higher. Consider using regular expressions and the C</[[:digit:]]/>
+construct instead, or the C</\d/> construct.
=item isgraph
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:graph:]]/> construct instead.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isgraph> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:graph:]]/> construct instead.
=item islower
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:lower:]]/> construct instead. Do B<not> use C</[a-z]/>.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<islower> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:lower:]]/> construct instead. Do
+B<not> use C</[a-z]/>.
=item isprint
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:print:]]/> construct instead.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isprint> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:print:]]/> construct instead.
=item ispunct
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:punct:]]/> construct instead.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<ispunct> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:punct:]]/> construct instead.
=item isspace
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:space:]]/> construct instead, or the C</\s/> construct.
-(Note that C</\s/> and C</[[:space:]]/> are slightly different in that
-C</[[:space:]]/> can normally match a vertical tab, while C</\s/> does
-not.)
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isspace> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:space:]]/> construct instead, or
+the C</\s/> construct. (Note that C</\s/> and C</[[:space:]]/> are
+slightly different in that C</[[:space:]]/> can normally match a
+vertical tab, while C</\s/> does not.)
=item isupper
-This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:upper:]]/> construct instead. Do B<not> use C</[A-Z]/>.
+This is identical to the C function, except that it can apply to
+a single character or to a whole string. Note that locale settings
+may affect which characters C<isupper> accepts. Does not work on
+Unicode characters with code points 256 or higher. Consider using
+regular expressions and the C</[[:upper:]]/> construct instead. Do
+B<not> use C</[A-Z]/>.
=item isxdigit
This is identical to the C function, except that it can apply to a single
-character or to a whole string. Consider using regular expressions and the
-C</[[:xdigit:]]/> construct instead, or simply C</[0-9a-f]/i>.
+character or to a whole string. Note that locale settings may affect which
+characters C<isxdigit> accepts (unlikely, but still possible).
+Does not work on Unicode characters with code points 256 or higher.
+Consider using regular expressions and the C</[[:xdigit:]]/>
+construct instead, or simply C</[0-9a-f]/i>.
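The regular-expression alternatives recommended in the entries above can be sketched as follows. This is only an illustration: the helper names C<all_alpha> and C<all_xdigit> are hypothetical and not part of the POSIX module, and the anchored whole-string match is one assumed way to mirror the "whole string" behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of POSIX): true if $str is non-empty and
# consists only of alphabetic characters, including those above code point 255.
sub all_alpha {
    my ($str) = @_;
    return $str =~ /\A[[:alpha:]]+\z/ ? 1 : 0;
}

# Hypothetical helper: the same idea for hexadecimal digits.
sub all_xdigit {
    my ($str) = @_;
    return $str =~ /\A[0-9a-f]+\z/i ? 1 : 0;
}

print all_alpha("abc"), "\n";             # 1
print all_alpha("ab1"), "\n";             # 0
print all_alpha("\x{100}\x{3b1}"), "\n";  # 1 (code points above 255 are handled)
print all_xdigit("DEADbeef"), "\n";       # 1
```

Unlike the C-style functions, the bracketed character classes apply Unicode rules to strings that contain code points of 256 or higher.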
=item kill
year (C<year>) is given in years since 1900. I.e., the year 1995 is 95; the
year 2001 is 101. Consult your system's C<strftime()> manpage for details
about these and the other arguments.
+
If you want your code to be portable, your format (C<fmt>) argument
should use only the conversion specifiers defined by the ANSI C
-standard. These are C<aAbBcdHIjmMpSUwWxXyYZ%>.
-The given arguments are made consistent
-as though by calling C<mktime()> before calling your system's
-C<strftime()> function, except that the C<isdst> value is not affected.
+standard (C89, to play it safe). These are C<aAbBcdHIjmMpSUwWxXyYZ%>.
+But even then, the B<results> of some of the conversion specifiers are
+non-portable. For example, the specifiers C<aAbBcpZ> change according
+to the locale settings of the user, and both how to set locales (the
+locale names) and what output to expect are non-standard.
+The specifier C<c> changes according to the timezone settings of the
+user and the timezone computation rules of the operating system.
+The C<Z> specifier is notoriously unportable since the names of
+timezones are non-standard. Sticking to the numeric specifiers is the
+safest route.
+
+The given arguments are made consistent as though by calling
+C<mktime()> before calling your system's C<strftime()> function,
+except that the C<isdst> value is not affected.
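As a minimal sketch of the "numeric specifiers only" advice, the following formats a date using just C<%Y>, C<%m> and C<%d>, all of which are in the C89 list. The helper name C<numeric_date> is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format a date with numeric-only conversion specifiers,
# so the result does not depend on locale or on timezone names.
sub numeric_date {
    my ($mday, $mon, $year) = @_;   # $mon is 0-based; $year is years since 1900
    return strftime("%Y-%m-%d", 0, 0, 0, $mday, $mon, $year);
}

print numeric_date(12, 11, 95), "\n";   # 1995-12-12
```

Specifiers such as C<%A> or C<%B> in the same call would produce day and month names that vary with the user's locale.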
The string for Tuesday, December 12, 1995.
$| = 1;
-print "1..942\n";
+print "1..968\n";
BEGIN {
chdir 't' if -d 't';
++$test;
}
-# last test 942
+{
+ print "# [perl #15763]\n";
+
+ $a = "x\x{100}";
+ chop $a; # but leaves the UTF-8 flag
+ $a .= "y"; # 1 byte before "y"
+
+ ok($a =~ /^\C/, 'match one \C on 1-byte UTF-8');
+ ok($a =~ /^\C{1}/, 'match \C{1}');
+
+ ok($a =~ /^\Cy/, 'match \Cy');
+ ok($a =~ /^\C{1}y/, 'match \C{1}y');
+
+ $a = "\x{100}y"; # 2 bytes before "y"
+
+ ok($a =~ /^\C/, 'match one \C on 2-byte UTF-8');
+ ok($a =~ /^\C{1}/, 'match \C{1}');
+ ok($a =~ /^\C\C/, 'match two \C');
+ ok($a =~ /^\C{2}/, 'match \C{2}');
+
+ ok($a =~ /^\C\C\C/, 'match three \C on 2-byte UTF-8 and a byte');
+ ok($a =~ /^\C{3}/, 'match \C{3}');
+
+ ok($a =~ /^\C\Cy/, 'match two \Cy');
+ ok($a =~ /^\C{2}y/, 'match \C{2}y');
+
+ ok($a !~ /^\C\C\Cy/, 'not match three \Cy');
+ ok($a !~ /^\C{2}\Cy/, 'not match \C{3}y');
+
+ $a = "\x{1000}y"; # 3 bytes before "y"
+
+ ok($a =~ /^\C/, 'match one \C on three-byte UTF-8');
+ ok($a =~ /^\C{1}/, 'match \C{1}');
+ ok($a =~ /^\C\C/, 'match two \C');
+ ok($a =~ /^\C{2}/, 'match \C{2}');
+ ok($a =~ /^\C\C\C/, 'match three \C');
+ ok($a =~ /^\C{3}/, 'match \C{3}');
+
+ ok($a =~ /^\C\C\C\C/, 'match four \C on three-byte UTF-8 and a byte');
+ ok($a =~ /^\C{4}/, 'match \C{4}');
+
+ ok($a =~ /^\C\C\Cy/, 'match three \Cy');
+ ok($a =~ /^\C{3}y/, 'match \C{3}y');
+
+ ok($a !~ /^\C\C\C\Cy/, 'not match four \Cy');
+ ok($a !~ /^\C{4}y/, 'not match \C{4}y');
+}
+
+# last test 968
+
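The byte counts noted in the test comments above (1, 2 and 3 bytes before "y") can be checked with a short sketch like the one below. The helper name C<utf8_byte_length> is hypothetical; it relies on the built-in C<utf8::encode> to obtain the UTF-8 encoded form of a copy of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
sub utf8_byte_length {
    my ($str) = @_;          # work on a copy; utf8::encode modifies in place
    utf8::encode($str);
    return length $str;
}

my $s = "x\x{100}";
chop $s;                                     # leaves "x"
print utf8_byte_length($s), "\n";            # 1
print utf8_byte_length("\x{100}"), "\n";     # 2
print utf8_byte_length("\x{1000}"), "\n";    # 3
```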