=head2 Using Unicode in XS
If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful:
+the following C APIs useful (see perlapi for details):
=over 4
=item *
-utf8_length(s, len) returns the length of the UTF-8 encoded buffer in
-characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
+utf8_length(start, end) returns the length of the UTF-8 encoded buffer
+in characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
scalar.
=item *
=item *
-is_utf8_char(buf) returns true if the buffer points to valid UTF-8.
+is_utf8_char(s) returns true if the pointer points to a valid UTF-8
+character.
=item *
UTF8SKIP(buf) will return the number of bytes in the UTF-8 encoded
character in the buffer. UNISKIP(chr) will return the number of bytes
-required to UTF-8-encode the Unicode character code point.
+required to UTF-8-encode the Unicode character code point. UTF8SKIP()
+is useful for example for iterating over the characters of a UTF-8
+encoded buffer; UNISKIP() is useful for example in computing
+the size required for a UTF-8 encoded buffer.
=item *
utf8_hop(s, off) will return a pointer to an UTF-8 encoded buffer that
is C<off> (positive or negative) Unicode characters displaced from the
-UTF-8 buffer C<s>.
+UTF-8 buffer C<s>. Be careful not to overstep the buffer: utf8_hop()
+will merrily run off the end or the beginning if told to do so.
=item *
pv_uni_display(dsv, spv, len, pvlim, flags) and sv_uni_display(dsv,
ssv, pvlim, flags) are useful for debug output of Unicode strings and
-scalars (only for debug: they display B<all> characters as hexadecimal
-code points).
+scalars. By default they are useful only for debug: they display
+B<all> characters as hexadecimal code points, but with the flags
+UNI_DISPLAY_ISPRINT and UNI_DISPLAY_BACKSLASH you can make the output
+more readable.
=item *
-ibcmp_utf8(s1, u1, len1, s2, u2, len2) can be used to compare two
-strings case-insensitively in Unicode. (For case-sensitive
-comparisons you can just use memEQ() and memNE() as usual.)
+ibcmp_utf8(s1, pe1, u1, l1, u1, s2, pe2, l2, u2) can be used to
+compare two strings case-insensitively in Unicode.
+(For case-sensitive comparisons you can just use memEQ() and memNE()
+as usual.)
=back