From 61984ee1c56aaa8a989b7eed4cbc2effd74177c5 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 23 Jun 2012 13:30:36 -0600
Subject: [PATCH] perlguts: Document that PV can point to non-string

---
 pod/perlguts.pod | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index b9f2ed3..fcc9811 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -37,6 +37,15 @@ they will both be 64 bits.
 An SV can be created and loaded with one command. There are five types of
 values that can be loaded: an integer value (IV), an unsigned integer value
 (UV), a double (NV), a string (PV), and another scalar (SV).
+("PV" stands for "Pointer Value". You might think that it is misnamed
+because it is described as pointing only to strings. However, it is
+possible to have it point to other things. For example, inversion
+lists, used in regular expression data structures, are scalars, each
+consisting of an array of UVs which are accessed through PVs. But,
+using it for non-strings requires care, as the underlying assumption of
+much of the internals is that PVs are just for strings. Often, for
+example, a trailing NUL is tacked on automatically. The non-string use
+is documented only in this paragraph.)
 
 The seven routines are:
 
@@ -2633,7 +2642,7 @@ is what makes Unicode input an interesting problem.
 In general, you either have to know what you're dealing with, or you
 have to guess. The API function C can help; it'll tell
 you if a string contains only valid UTF-8 characters. However, it can't
-do the work for you. On a character-by-character basis, C
+do the work for you. On a character-by-character basis, XXX C
 will tell you whether the current character in a string is valid UTF-8.
 
 =head2 How does UTF-8 represent Unicode characters?
-- 
2.7.4