From e90781262cc0345e050fc2f0f67544a04f551a79 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Wed, 30 Mar 2011 17:11:34 -0600
Subject: [PATCH] perluniintro: revise text on blocks vs scripts

---
 pod/perluniintro.pod | 49 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 3fbff00..37bab49 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -93,25 +93,40 @@ character. Firstly, there are unallocated code points within
 otherwise used blocks. Secondly, there are special Unicode control
 characters that do not represent true characters.
 
-A common myth about Unicode is that it is "16-bit", that is,
-Unicode is only represented as C<0x10000> (or 65536) characters from
-C<0x0000> to C<0xFFFF>. B Since Unicode 2.0 (July
+When Unicode was first conceived, it was thought that all the world's
+characters could be represented using a 16-bit word; that is a maximum of
+C<0x10000> (or 65536) characters from C<0x0000> to C<0xFFFF> would be
+needed. This soon proved to be false, and since Unicode 2.0 (July
 1996), Unicode has been defined all the way up to 21 bits (C<0x10FFFF>),
-and since Unicode 3.1 (March 2001), characters have been defined
-beyond C<0xFFFF>. The first C<0x10000> characters are called the
-I<Plane 0>, or the I<Basic Multilingual Plane> (BMP). With Unicode
-3.1, 17 (yes, seventeen) planes in all were defined--but they are
-nowhere near full of defined characters, yet.
-
-Another myth is about Unicode blocks--that they have something to
-do with languages--that each block would define the characters used
-by a language or a set of languages. B
+and Unicode 3.1 (March 2001) defined the first characters above C<0xFFFF>.
+The first C<0x10000> characters are called the I<Plane 0>, or the
+I<Basic Multilingual Plane> (BMP). With Unicode 3.1, 17 (yes,
+seventeen) planes in all were defined--but they are nowhere near full of
+defined characters, yet.
+
+When a new language is being encoded, Unicode generally will choose a
+C<block> of consecutive unallocated code points for its characters. So
+far, the number of code points in these blocks has always been evenly
+divisible by 16. Extras in a block, not currently needed, are left
+unallocated, for future growth. But there have been occasions when
+a later release needed more code points than available extras, and a new
+block had to be allocated somewhere else, not contiguous to the initial one,
+to handle the overflow. Thus, it became apparent early on that "block"
+wasn't an adequate organizing principle, and so the C