From 13a6c0e08df522ae430c77e8a15932642751b605 Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi
Date: Mon, 11 Mar 2002 12:57:45 +0000
Subject: [PATCH] Undocument the use of .*utf8.*{upgrade,downgrade,encode,decode} as general purpose encoding transformation interfaces since that's not what they are.

p4raw-id: //depot/perl@15169
---
 lib/utf8.pm         | 18 ++++++++++++------
 pod/perlunicode.pod |  4 ++++
 sv.c                |  9 +++++++++
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/lib/utf8.pm b/lib/utf8.pm
index 7b1ef0d..b18a043 100644
--- a/lib/utf8.pm
+++ b/lib/utf8.pm
@@ -79,21 +79,27 @@ The following functions are defined in the C package by the perl core.
 
 Converts internal representation of string to the Perl's internal
 I form. Returns the number of octets necessary to represent
-the string as I.
+the string as I. Note that this should not be used to convert
+a legacy byte encoding to Unicode: use Encode for that. Affected
+by the encoding pragma.
 
 =item * utf8::downgrade($string[, CHECK])
 
 Converts internal representation of string to be un-encoded bytes.
+Note that this should not be used to convert Unicode back to a legacy
+byte encoding: use Encode for that. B<Not> affected by the encoding
+pragma.
 
 =item * utf8::encode($string)
 
-Converts (in-place) I<$string> from logical characters to octet sequence
-representing it in Perl's I encoding.
-
-=item * $flag = utf8::decode($string)
+Converts (in-place) I<$string> from logical characters to octet
+sequence representing it in Perl's I encoding. Note that this
+should not be used to convert a legacy byte encoding to Unicode: use
+Encode for that. =item * $flag = utf8::decode($string)
 
 Attempts to convert I<$string> in-place from Perl's I encoding
-into logical characters.
+into logical characters. Note that this should not be used to convert
+Unicode back to a legacy byte encoding: use Encode for that.
 
 =back
 
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index a885555..518d239 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -873,6 +873,10 @@ sv_utf8_upgrade(sv) converts the string of the scalar to its UTF-8
 encoded form. sv_utf8_downgrade(sv) does the opposite (if possible).
 sv_utf8_encode(sv) is like sv_utf8_upgrade but the UTF8 flag does not
 get turned on. sv_utf8_decode() does the opposite of sv_utf8_encode().
+Note that none of these are to be used as general purpose encoding/decoding
+interfaces: use Encode for that. sv_utf8_upgrade() is affected by the
+encoding pragma, but sv_utf8_downgrade() is not (since the encoding
+pragma is designed to be a one-way street).
 
 =item *
 
diff --git a/sv.c b/sv.c
index 69f338c..83e2973 100644
--- a/sv.c
+++ b/sv.c
@@ -3313,6 +3313,9 @@ Forces the SV to string form if it is not already.
 Always sets the SvUTF8 flag to avoid future validity checks even
 if all the bytes have hibit clear.
 
+This is not as a general purpose byte encoding to Unicode interface:
+use the Encode extension for that.
+
 =cut
 */
 
@@ -3332,6 +3335,9 @@ if all the bytes have hibit clear.
 If C has C bit set, will C on C if appropriate, else not.
 C and C are implemented in terms of this function.
 
+This is not as a general purpose byte encoding to Unicode interface:
+use the Encode extension for that.
+
 =cut
 */
 
@@ -3397,6 +3403,9 @@ This may not be possible if the PV contains non-byte encoding characters;
 if this is the case, either returns false or, if C is not
 true, croaks.
 
+This is not as a general purpose Unicode to byte encoding interface:
+use the Encode extension for that.
+
 =cut
 */
 
-- 
2.7.4
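A worked illustration of why the upgrade path is not a general purpose decoder. This is a sketch under stated assumptions: an ASCII platform with no encoding pragma in effect, where sv_utf8_upgrade() reads each byte as a Latin-1 code point (U+0000 to U+00FF) and expands it to UTF-8. The helper latin1_to_utf8() is hypothetical and not part of the Perl API; it only mimics that byte expansion. In CP1252 the byte 0x80 is the euro sign (U+20AC, UTF-8 E2 82 AC), yet a Latin-1 style upgrade turns it into C2 80, the C1 control U+0080.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of the Perl API: encodes each input byte,
 * read as a Latin-1 code point (U+0000..U+00FF), as UTF-8. This mimics the
 * expansion sv_utf8_upgrade() performs on an ASCII platform when no
 * encoding pragma is in effect. `out` must have room for 2 * len bytes.
 * Returns the number of bytes written to `out`. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII bytes are invariant under UTF-8. */
            out[o++] = c;
        } else {
            /* 0x80..0xFF become a two-byte sequence 110xxxxx 10xxxxxx. */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

/* Example: the CP1252 euro sign is the single byte 0x80. Treated as
 * Latin-1 it becomes C2 80 (U+0080, a C1 control character), not
 * E2 82 AC (U+20AC), which is what a real CP1252 decoder would yield. */
```

Called on the one-byte input { 0x80 }, latin1_to_utf8() writes C2 80 and returns 2. Performing the real CP1252 mapping is the job of Encode, e.g. Encode::decode('cp1252', $bytes) on the Perl side.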