UTF-8 C API doc tweaks.

author Jarkko Hietaniemi <jhi@iki.fi>

Tue, 19 Feb 2002 15:01:25 +0000 (15:01 +0000)

committer Jarkko Hietaniemi <jhi@iki.fi>

Tue, 19 Feb 2002 15:01:25 +0000 (15:01 +0000)
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 484f356..1accda4 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -825,7 +825,7 @@ for more discussion of the issues.
  =head2 Using Unicode in XS
  
  If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful:
+the following C APIs useful (see perlapi for details):
  
  =over 4
  
@@ -856,8 +856,8 @@ the UTF-8 byte sequence).
  
  =item *
  
-utf8_length(s, len) returns the length of the UTF-8 encoded buffer in
-characters.  sv_len_utf8(sv) returns the length of the UTF-8 encoded
+utf8_length(start, end) returns the length of the UTF-8 encoded buffer
+in characters.  sv_len_utf8(sv) returns the length of the UTF-8 encoded
  scalar.
  
  =item *
@@ -869,7 +869,8 @@ get turned on.  sv_utf8_decode() does the opposite of sv_utf8_encode().
  
  =item *
  
-is_utf8_char(buf) returns true if the buffer points to valid UTF-8.
+is_utf8_char(s) returns true if the pointer points to a valid UTF-8
+character.
  
  =item *
  
@@ -880,7 +881,10 @@ are valid UTF-8.
  
  UTF8SKIP(buf) will return the number of bytes in the UTF-8 encoded
  character in the buffer.  UNISKIP(chr) will return the number of bytes
-required to UTF-8-encode the Unicode character code point.
+required to UTF-8-encode the Unicode character code point.  UTF8SKIP()
+is useful for example for iterating over the characters of a UTF-8
+encoded buffer; UNISKIP() is useful for example in computing
+the size required for a UTF-8 encoded buffer.
  
  =item *
  
@@ -891,20 +895,24 @@ two pointers pointing to the same UTF-8 encoded buffer.
  
  utf8_hop(s, off) will return a pointer to an UTF-8 encoded buffer that
  is C<off> (positive or negative) Unicode characters displaced from the
-UTF-8 buffer C<s>.
+UTF-8 buffer C<s>.  Be careful not to overstep the buffer: utf8_hop()
+will merrily run off the end or the beginning if told to do so.
  
  =item *
  
  pv_uni_display(dsv, spv, len, pvlim, flags) and sv_uni_display(dsv,
  ssv, pvlim, flags) are useful for debug output of Unicode strings and
-scalars (only for debug: they display B<all> characters as hexadecimal
-code points).
+scalars.  By default they are useful only for debug: they display
+B<all> characters as hexadecimal code points, but with the flags
+UNI_DISPLAY_ISPRINT and UNI_DISPLAY_BACKSLASH you can make the output
+more readable.
  
  =item *
  
-ibcmp_utf8(s1, u1, len1, s2, u2, len2) can be used to compare two
-strings case-insensitively in Unicode.  (For case-sensitive
-comparisons you can just use memEQ() and memNE() as usual.)
+ibcmp_utf8(s1, pe1, l1, u1, s2, pe2, l2, u2) can be used to
+compare two strings case-insensitively in Unicode.
+(For case-sensitive comparisons you can just use memEQ() and memNE()
+as usual.)
  
  =back
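
Note on the patched text: the UTF8SKIP()/UNISKIP() use cases, the (start, end) form of utf8_length(), and the utf8_hop() overstep warning can be illustrated with a small standalone sketch. This is not Perl's implementation and does not include the Perl headers. Every sketch_* name below is a hypothetical stand-in, and the code assumes well-formed UTF-8 limited to code points up to U+10FFFF (at most 4 bytes per character), which is narrower than what Perl itself accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UTF8SKIP(): number of bytes in the
 * character whose lead byte is at *s (assumes at most 4 bytes). */
static size_t sketch_utf8_skip(const unsigned char *s)
{
    assert(s != NULL);
    if (*s < 0x80)           return 1;  /* ASCII */
    if ((*s & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((*s & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((*s & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;  /* continuation or invalid byte: step a single byte */
}

/* Hypothetical stand-in for UNISKIP(): bytes needed to encode the
 * code point cp, e.g. when sizing an output buffer. */
static size_t sketch_uni_skip(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Same shape as utf8_length(start, end): count the characters in
 * the half-open byte range [start, end). A character truncated by
 * the end of the buffer is not counted. */
static size_t sketch_utf8_length(const unsigned char *start,
                                 const unsigned char *end)
{
    size_t n = 0;
    assert(start != NULL && end != NULL && start <= end);
    while (start < end) {
        size_t skip = sketch_utf8_skip(start);
        if (skip > (size_t)(end - start))
            break;
        start += skip;
        n++;
    }
    return n;
}

/* Forward-only hop that takes the buffer end explicitly and returns
 * NULL instead of running off the end. The bound check is the
 * caller-side care that the utf8_hop() warning asks for. */
static const unsigned char *sketch_utf8_hop_fwd(const unsigned char *s,
                                                const unsigned char *end,
                                                size_t off)
{
    assert(s != NULL && end != NULL && s <= end);
    while (off > 0) {
        size_t skip;
        if (s >= end)
            return NULL;
        skip = sketch_utf8_skip(s);
        if (skip > (size_t)(end - s))
            return NULL;
        s += skip;
        off--;
    }
    return s;
}
```

In real XS code the perlapi functions and macros should be used instead. The sketch only shows why iterating needs a per-character skip and why a hop needs an explicit bound.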