1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
2 * License & terms of use: http://www.unicode.org/copyright.html
3 * Copyright (C) 2004-2016, International Business Machines
4 * Corporation and others. All Rights Reserved.
6 * file name: changes.txt
8 * tab size: 8 (not used)
11 * created on: 2004may06
12 * created by: Markus W. Scherer
14 * change log for Unicode updates
16 ---------------------------------------------------------------------------- ***
18 * New ISO 15924 script codes
20 Starting with ICU 55, we no longer add UScriptCode constants for new scripts
21 until they are encoded in Unicode,
22 or can be assumed to be encoded in the next Unicode version.
23 Script enum constant names should follow the Unicode script property value aliases,
24 which are assigned only when the scripts are encoded.
25 When we add constants early and guess the names wrong, we end up with confusing enum constants
26 and have sometimes had to add aliases.
28 Variant script codes like Latf and Aran that are not subject to separate encoding
29 can be added at any time.
30 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
32 We add script codes used in CLDR or in the spoof checker.
33 This includes combination/alias codes like Hanb and Jamo.
34 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
35 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
37 We add special Z* script codes like Zsye.
39 For new script codes see http://www.unicode.org/iso15924/codechanges.html
41 ---------------------------------------------------------------------------- ***
43 Unicode 9.0 update for ICU 58
45 * Command-line environment setup
47 ICU_ROOT=~/svn.icu/trunk
48 ICU_SRC_DIR=$ICU_ROOT/src
50 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
51 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
52 UNIDATA=$ICU_SRC_DIR/source/data/unidata
54 http://www.unicode.org/review/pri323/ -- beta review
55 http://www.unicode.org/reports/uax-proposed-updates.html
56 http://www.unicode.org/versions/beta-9.0.0.html
57 http://www.unicode.org/versions/Unicode9.0.0/
58 http://www.unicode.org/reports/tr44/tr44-17.html
62 - ticket:12526: integrate Unicode 9
63 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
64 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
69 - ^/branches/markus/uni90 at r11518 from trunk at r11517
71 - cldrbug 8745: Unicode 9.0 script metadata
73 *** Unicode version numbers
76 - com.ibm.icu.util.VersionInfo
77 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
79 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
80 so that the makefiles see the new version number.
82 *** data files & enums & parser code
86 - download UCD & IDNA files
87 - make sure that the Unicode data folder passed into preparseucd.py
88 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
89 - only for manual diffs: remove version suffixes from the file names
90 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
91 (see https://sites.google.com/site/unicodetools/inputdata)
92 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
93 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
94 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
96 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
98 cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
100 * preparseucd.py changes
101 - remove or add new Unicode scripts from/to the
102 only-in-ISO-15924 list according to the error messages:
103 ValueError: remove ['Tang'] from _scripts_only_in_iso15924
104 ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
105 ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
106 ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
107 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
108 and in com.ibm.icu.dev.test.lang.TestUScript.java
109 - DerivedNumericValues.txt new numeric values
110 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
111 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
112 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
113 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
114 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
115 -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
116 uchar.c, UCharacterProperty.java
117 to support a new series of values
118 - adjust preparseucd.py for Tangut algorithmic names
120 algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
122 algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
123 - avoid block-compressing most String/Miscellaneous property values,
124 triggered by genprops not coping with a multi-code point Case_Folding on
125 block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
126 keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
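
As a quick cross-check of the new Malayalam fraction values in the DerivedNumericValues.txt item above, the decimal and rational fields can be compared with Python's fractions module. This is only an illustrative sketch: the helper name parse_numeric_line is made up here and is not part of preparseucd.py, and it does not reproduce the ICU encoding in encodeNumericValue().

```python
from fractions import Fraction

# Sample lines copied from the DerivedNumericValues.txt excerpt in this log.
LINES = [
    "0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH",
    "0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS",
]


def parse_numeric_line(line):
    """Hypothetical helper: return (code point, decimal value, rational value)."""
    data = line.split("#", 1)[0]                    # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]   # cp ; decimal ; (empty) ; rational
    return int(fields[0], 16), Fraction(fields[1]), Fraction(fields[3])


# Collect the distinct denominators, which hint at the new series of values.
DENOMINATORS = sorted({parse_numeric_line(line)[2].denominator for line in LINES})
```

In this excerpt the denominators are 20, 40, 80 and 160, which earlier data did not need.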
128 * PropertyAliases.txt changes
129 - 1 new property PCM=Prepended_Concatenation_Mark
130 Ignore: Only useful for layout engines.
131 Ok to list in ppucd.txt.
133 * PropertyValueAliases.txt new property values
135 blk; Bhaiksuki ; Bhaiksuki
136 blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
137 blk; Glagolitic_Sup ; Glagolitic_Supplement
138 blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
139 blk; Marchen ; Marchen
140 blk; Mongolian_Sup ; Mongolian_Supplement
144 blk; Tangut_Components ; Tangut_Components
146 use long property names for enum constants
147 -> add to UCharacter.UnicodeBlock IDs
148 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
149 replace public static final int \1_ID = \2; \3
150 -> add to UCharacter.UnicodeBlock objects
151 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
152 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
155 GCB; EBG ; E_Base_GAZ
157 GCB; GAZ ; Glue_After_Zwj
159 -> uchar.h & UCharacter.GraphemeClusterBreak
161 jg ; African_Feh ; African_Feh
162 jg ; African_Noon ; African_Noon
163 jg ; African_Qaf ; African_Qaf
164 -> uchar.h & UCharacter.JoiningGroup
169 -> uchar.h & UCharacter.LineBreak
172 sc ; Bhks ; Bhaiksuki
177 -> all of them had already been added to uscript.h & com.ibm.icu.lang.UScript
180 WB ; EBG ; E_Base_GAZ
182 WB ; GAZ ; Glue_After_Zwj
184 -> uchar.h & UCharacter.WordBreak
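
The two Eclipse find/replace patterns given above for the UnicodeBlock constants can also be tried outside the IDE. The sketch below applies the same patterns with Python's re module, which uses the same \1 backreference syntax. The two input lines only imitate the style of the uchar.h enum; their numeric values and comments are illustrative assumptions, and the function names are made up for this example.

```python
import re

# Illustrative lines in the style of the uchar.h UBlockCode enum (values are examples).
HEADER_LINES = [
    "UBLOCK_BHAIKSUKI = 264, /*[11C00]*/",
    "UBLOCK_CYRILLIC_EXTENDED_C = 265, /*[1C80]*/",
]

# Same find patterns as in the Eclipse instructions.
FIND_ID = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
FIND_OBJECT = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def to_java_id(line):
    """Turn a C enum line into the Java *_ID int constant."""
    return FIND_ID.sub(r"public static final int \1_ID = \2; \3", line)


def to_java_object(line):
    """Turn a C enum line into the Java UnicodeBlock object declaration."""
    return FIND_OBJECT.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line,
    )


if __name__ == "__main__":
    for header_line in HEADER_LINES:
        print(to_java_id(header_line))
        print(to_java_object(header_line))
```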
186 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
187 (not strictly necessary for NOT_ENCODED scripts)
188 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
190 * generate normalization data files
192 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
193 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
194 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
195 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
196 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
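
The five gennorm2 invocations above follow one pattern: each output is built from a cumulative list of input files in $UNIDATA/norm2. The following sketch only assembles the argument lists so the pattern is easy to see; it does not run anything. The function name gennorm2_commands is hypothetical, and the only options used are the ones that appear in the commands above (-o, -s, --csource).

```python
import posixpath


def gennorm2_commands(icu_src_dir):
    """Return the gennorm2 argument lists from this log for a given ICU source dir."""
    common = posixpath.join(icu_src_dir, "source", "common")
    data_in = posixpath.join(icu_src_dir, "source", "data", "in")
    norm2 = posixpath.join(icu_src_dir, "source", "data", "unidata", "norm2")
    # (output file, input files, extra options)
    jobs = [
        (posixpath.join(common, "norm2_nfc_data.h"), ["nfc.txt"], ["--csource"]),
        (posixpath.join(data_in, "nfc.nrm"), ["nfc.txt"], []),
        (posixpath.join(data_in, "nfkc.nrm"), ["nfc.txt", "nfkc.txt"], []),
        (posixpath.join(data_in, "nfkc_cf.nrm"), ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
        (posixpath.join(data_in, "uts46.nrm"), ["nfc.txt", "uts46.txt"], []),
    ]
    return [
        ["bin/gennorm2", "-o", out, "-s", norm2] + inputs + extra
        for out, inputs, extra in jobs
    ]


if __name__ == "__main__":
    for argv in gennorm2_commands("/path/to/icu/src"):
        print(" ".join(argv))
```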
198 * build ICU (make install)
199 so that the tools build can pick up the new definitions from the installed header files.
201 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
203 * build Unicode tools using CMake+make
205 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
207 # Location (--prefix) of where ICU was installed.
208 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
209 # Location of the ICU source tree.
210 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
212 ~/svn.icutools/trunk/dbg/unicode/c$
213 cmake ../../../src/unicode/c
216 * generate core properties data files
217 ~/svn.icutools/trunk/dbg/unicode/c$
218 genprops/genprops $ICU_SRC_DIR
219 genuca/genuca --hanOrder implicit $ICU_SRC_DIR
220 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
221 - rebuild ICU (make install) & tools
223 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
224 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
225 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
226 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
227 - nothing new in 9.0, no test file to update
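
The grep step above can be sketched as a small filter that keeps only non-ASCII entries with status disallowed_STD3_valid. The sample lines below imitate the "code point(s) ; status ; mapping # comment" layout of IdnaMappingTable.txt but are abbreviated for illustration, and the function name is an assumption of this sketch.

```python
# Abbreviated, illustrative lines in the layout of IdnaMappingTable.txt.
SAMPLE = [
    "003D          ; disallowed_STD3_valid                  # 1.1  EQUALS SIGN",
    "0041          ; mapped                 ; 0061          # 1.1  LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]


def non_ascii_std3_valid(lines):
    """Return (first, last) code point ranges that are disallowed_STD3_valid and non-ASCII."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:                        # skip pure-ASCII entries such as '='
            found.append((first, last))
    return found


if __name__ == "__main__":
    for first, last in non_ascii_std3_valid(SAMPLE):
        print("U+%04X..U+%04X" % (first, last))
```

Any range that is not already covered by the uts46 tests would then be a candidate for a new test case.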
229 * run & fix ICU4C tests
230 - Andy handles RBBI & spoof check test failures
232 * collation: CLDR collation root, UCA DUCET
234 - UCA DUCET goes into Mark's Unicode tools, see
235 https://sites.google.com/site/unicodetools/home#TOC-UCA
236 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
237 cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
239 - cd (CLDR UCA branch)/common/uca/
240 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
241 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
242 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
243 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
244 (note: the destination file name drops the underscore before "Rules")
245 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
246 - restore TODO diffs in UCARules.txt
247 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
248 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
249 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
250 from the CLDR root files (..._CLDR_..._SHORT.txt)
251 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
252 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
253 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
254 - if CLDR common/uca/unihan-index.txt changes, then update
255 CLDR common/collation/root.xml <collation type="private-unihan">
256 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
258 - run genuca, see command line above;
260 Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
261 FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
262 (add the character to genuca.cpp sampleCharsToScripts[])
263 + look up the USCRIPT_ code for the new sample characters
264 (should be obvious from the comment in the error output)
265 + *add* mappings to sampleCharsToScripts[], do not replace them
266 (in case the script sample characters flip-flop)
267 + insert new scripts in DUCET script order, see the top_byte table
268 at the beginning of FractionalUCA.txt
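
When genuca reports an unknown first-primary sample character, the code point and the script name can be read off the FractionalUCA.txt line quoted in the error. The sketch below does that for the Osage line shown above. The regular expression is an assumption based on that single line, and mapping the name to a USCRIPT_ constant for sampleCharsToScripts[] remains a manual step.

```python
import re

# The FractionalUCA.txt line quoted in the genuca error output above.
LINE = "FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)"

# Assumed shape: "FDD1 <sample char>; [...] # <script name> first primary ..."
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*# (.+?) first primary")


def sample_char_and_script(line):
    """Return (sample code point, script name from the comment), or None if no match."""
    match = FIRST_PRIMARY.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


if __name__ == "__main__":
    code_point, script_name = sample_char_and_script(LINE)
    print("U+%04X belongs to script: %s" % (code_point, script_name))
```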
273 org.unicode.draft.GenerateUnihanCollators
275 -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
276 -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
277 -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
278 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
282 org.unicode.draft.GenerateUnihanCollatorFiles
283 with the same arguments
286 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
287 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
290 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
291 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
293 - generate ICU zh collation data: run CLDR
294 org.unicode.cldr.icu.NewLdml2IcuConverter
295 with program arguments
297 -s /home/mscherer/svn.cldr/trunk/common/collation
298 -m /home/mscherer/svn.cldr/trunk/common/supplemental
299 -d /home/mscherer/svn.icu/trunk/src/source/data/coll
300 -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
303 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
306 * run & fix ICU4C tests, now with new CLDR collation root data
307 - run all tests with the collation test data *_SHORT.txt or the full files
308 (the full ones have comments, useful for debugging)
309 - note on intltest: if collate/UCAConformanceTest fails, then
310 utility/MultithreadTest/TestCollators will fail as well;
311 fix the conformance test before looking into the multi-thread test
313 * update Java data files
314 - to be safe, refresh only the UCD/UCA-related/derived files
315 - see (ICU4C)/source/data/icu4j-readme.txt
317 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
320 Unicode .icu files built to ./out/build/icudt58l
321 echo timestamp > uni-core-data
322 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
323 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
324 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
325 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
326 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
327 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
328 mkdir -p /tmp/icu4j/main/shared/data
329 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
330 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
331 mkdir -p /tmp/icu4j/main/shared/data
332 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
333 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
334 - copy the big-endian Unicode data files to another location,
335 separate from the other data files,
336 and then refresh ICU4J
337 cd ~/svn.icu/trunk/dbg/data/out/icu4j
338 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
339 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
340 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
341 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
342 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
343 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
344 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
345 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
346 jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
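
The cp/rm sequence above selects a subset of the big-endian data files before updating icudata.jar. The sketch below expresses the same selection as a predicate on paths relative to the $ICUDT folder. The function name is hypothetical, and $ICUDT is not defined in this log; it is presumably the big-endian data folder name (for example icudt58b), which is an assumption here.

```python
import posixpath


def is_unicode_data_file(rel_path):
    """Return True if a path relative to the $ICUDT folder is copied by the steps above."""
    folder, name = posixpath.split(rel_path)
    if folder in ("coll", "brkitr"):
        return True                      # cp .../coll/* and cp .../brkitr/*
    if folder:
        return False                     # no other subfolders are copied
    if name == "cnvalias.icu":
        return False                     # copied by *.icu, then removed again
    return name == "confusables.cfu" or name.endswith((".icu", ".nrm"))


if __name__ == "__main__":
    for path in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/ucadata.icu", "root.res"]:
        print(path, is_unicode_data_file(path))
```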
348 * When refreshing all of ICU4J data from ICU4C
349 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
350 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
352 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
354 * update CollationFCD.java
355 + copy & paste the initializers of lcccIndex[] etc. from
356 ICU4C/source/i18n/collationfcd.cpp to
357 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
359 * refresh Java test .txt files
360 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
361 cd $ICU_SRC_DIR/source/data/unidata
362 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
363 cd ../../test/testdata
364 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
365 cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
367 * run & fix ICU4J tests
369 *** LayoutEngine script information
371 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
372 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
373 in the working directory.
375 (It also generates ScriptRunData.cpp, which is no longer needed.)
377 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
379 which maps ICU versions to the numbers of script/language constants
380 that were added then.
381 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
383 The generated files have a current copyright date and "@deprecated" statement.
385 * Review changes, fix Java tool if necessary, and copy to ICU4C
386 cd ~/svn.icu4j/trunk/src
387 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
388 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
389 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
392 - send notice to icu-design about new born-@stable API (enum constants etc.)
394 *** merge the Unicode update branches back onto the trunk
395 - do not merge the icudata.jar and testdata.jar,
396 instead rebuild them from merged & tested ICU4C
397 - make sure that changes to Unicode tools & ICU tools are checked in
398 http://www.unicode.org/utility/trac/log/trunk/unicodetools
399 http://bugs.icu-project.org/trac/log/tools/trunk
401 ---------------------------------------------------------------------------- ***
403 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
406 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
407 - new combination/alias codes: Hanb, Jamo
408 - used in CLDR 29 and in spoof checker
411 Add new codes to uscript.h & UScript.java, see Unicode update logs.
412 -> com.ibm.icu.lang.UScript
413 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
414 replace public static final int \1 = \2; \3
416 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
417 add new script codes.
418 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
420 Note: If we have to run preparseucd.py again before the Unicode 9 update,
421 then we need to manually keep/restore the new script codes.
423 ICU_ROOT=~/svn.icu/trunk
424 ICU_SRC_DIR=$ICU_ROOT/src
426 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
427 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
428 UNIDATA=$ICU_SRC_DIR/source/data/unidata
430 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
431 see http://bugs.icu-project.org/trac/ticket/12141
433 make install, then icutools cmake & make, then
434 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
436 Generate Java data as usual, only update pnames.icu & uprops.icu.
438 *** LayoutEngine script information
440 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
441 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
442 in the working directory.
444 (It also generates ScriptRunData.cpp, which is no longer needed.)
446 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
448 which maps ICU versions to the numbers of script/language constants
449 that were added then.
450 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
452 The generated files have a current copyright date and "@deprecated" statement.
454 * Review changes, fix Java tool if necessary, and copy to ICU4C
455 cd ~/svn.icu4j/trunk/src
456 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
457 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
458 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
460 ---------------------------------------------------------------------------- ***
462 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
464 Edit preparseucd.py to add & parse new properties.
465 They share the UCD property namespace but are not listed in PropertyAliases.txt.
467 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
468 Initial data from emoji/2.0/
470 ICU_ROOT=~/svn.icu/trunk
471 ICU_SRC_DIR=$ICU_ROOT/src
473 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
474 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
475 UNIDATA=$ICU_SRC_DIR/source/data/unidata
477 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
479 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
480 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
482 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
484 make install, then icutools cmake & make, then
485 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
487 Generate Java data as usual, only update pnames.icu & uprops.icu.
489 ---------------------------------------------------------------------------- ***
491 Unicode 8.0 update for ICU 56
493 * Command-line environment setup
495 ICU_ROOT=~/svn.icu/trunk
496 ICU_SRC_DIR=$ICU_ROOT/src
498 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
499 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
500 UNIDATA=$ICU_SRC_DIR/source/data/unidata
502 http://www.unicode.org/review/pri297/ -- beta review
503 http://www.unicode.org/reports/uax-proposed-updates.html
504 http://unicode.org/versions/beta-8.0.0.html
505 http://www.unicode.org/versions/Unicode8.0.0/
506 http://www.unicode.org/reports/tr44/tr44-15.html
510 - ticket:11574: Unicode 8
511 - C++ branches/markus/uni80 at r37351 from trunk at r37343
512 - Java branches/markus/uni80 at r37352 from trunk at r37338
516 - cldrbug 8311: UCA 8
517 - branches/markus/uni80 at r11518 from trunk at r11517
519 - cldrbug 8109: Unicode 8.0 script metadata
520 - cldrbug 8418: Updated segmentation for Unicode 8.0
522 *** Unicode version numbers
525 - com.ibm.icu.util.VersionInfo
526 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
528 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
529 so that the makefiles see the new version number.
531 *** data files & enums & parser code
535 - download UCD & IDNA files
536 - make sure that the Unicode data folder passed into preparseucd.py
537 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
538 - only for manual diffs: remove version suffixes from the file names
539 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
540 (see https://sites.google.com/site/unicodetools/inputdata)
541 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
542 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
543 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
545 - also: from http://unicode.org/Public/security/8.0.0/ download new
546 confusables.txt & confusablesWholeScript.txt
548 ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
549 ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
551 * initial preparseucd.py changes
552 - remove new Unicode scripts from the
553 only-in-ISO-15924 list according to the error message:
554 ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
555 from _scripts_only_in_iso15924
556 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
557 and in com.ibm.icu.dev.test.lang.TestUScript.java
558 - property and file name change:
559 IndicMatraCategory -> IndicPositionalCategory
560 - UnicodeData.txt unusual numeric values (fractions not in lowest terms)
561 109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
562 109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
563 109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
564 109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
565 109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
566 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
567 109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
568 109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
569 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
570 109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
571 -> change preparseucd.py to map them to fractions in lowest terms (e.g., 2/12 -> 1/6),
572 which are listed in DerivedNumericValues.txt;
573 this keeps storage in the data file simple
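
The mapping of the Meroitic twelfths to their reduced forms can be illustrated with Python's fractions module, which normalizes to lowest terms automatically. This is a minimal sketch of the idea, not the actual preparseucd.py change; the function name reduce_numeric_value is hypothetical.

```python
from fractions import Fraction


def reduce_numeric_value(text):
    """Return a UnicodeData.txt numeric field such as '2/12' in lowest terms."""
    value = Fraction(text)               # Fraction('2/12') normalizes to 1/6
    if value.denominator == 1:
        return str(value.numerator)      # whole numbers stay plain integers
    return "%d/%d" % (value.numerator, value.denominator)


# The ten numeric fields from the UnicodeData.txt lines above, reduced.
REDUCED = [reduce_numeric_value("%d/12" % n) for n in range(1, 11)]

if __name__ == "__main__":
    print(REDUCED)
```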
575 * PropertyValueAliases.txt changes
576 - 10 new Block (blk) values:
578 blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
579 blk; Cherokee_Sup ; Cherokee_Supplement
580 blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
581 blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
583 blk; Multani ; Multani
584 blk; Old_Hungarian ; Old_Hungarian
585 blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
586 blk; Sutton_SignWriting ; Sutton_SignWriting
588 use long property names for enum constants
589 -> add to UCharacter.UnicodeBlock IDs
590 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
591 replace public static final int \1_ID = \2; \3
592 -> add to UCharacter.UnicodeBlock objects
593 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
594 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
595 - 6 new Script (sc) values:
598 sc ; Hluw ; Anatolian_Hieroglyphs
599 sc ; Hung ; Old_Hungarian
601 sc ; Sgnw ; SignWriting
602 -> all of them had already been added to uscript.h & com.ibm.icu.lang.UScript
604 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
605 (not strictly necessary for NOT_ENCODED scripts)
606 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
608 * generate normalization data files
610 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
611 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
612 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
613 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
614 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
616 * build ICU (make install)
617 so that the tools build can pick up the new definitions from the installed header files.
619 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
621 * build Unicode tools using CMake+make
623 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
625 # Location (--prefix) of where ICU was installed.
626 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
627 # Location of the ICU source tree.
628 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
630 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
631 ~/svn.icutools/trunk/dbg/unicode/c$ make
633 * generate core properties data files
634 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
635 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
636 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
637 - rebuild ICU (make install) & tools
638 - run genuca again (see step above) so that it picks up the new nfc.nrm
639 - rebuild ICU (make install) & tools
641 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
642 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
643 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
644 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
645 - nothing new in 8.0, no test file to update
647 * run & fix ICU4C tests
648 - bad Cherokee case folding due to difference in fallbacks:
649 UCD case folding falls back to no mapping,
650 ICU runtime case folding falls back to lowercasing;
651 fixed casepropsbuilder.cpp to generate scf mappings to self
652 when there is an slc mapping but no scf
653 - Andy handles RBBI & spoof check test failures
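
The Cherokee case folding problem noted above comes from two different fallback rules. The following sketch models both rules with plain dictionaries and shows why adding explicit scf mappings to self fixes the mismatch. It is a simplified model under stated assumptions, not the casepropsbuilder.cpp code; the function names are made up, and only one Cherokee letter pair (U+13A0 and U+AB70) is used as sample data.

```python
# Sample data for one Cherokee pair: U+13A0 lowercases to U+AB70,
# while case folding maps the small letter U+AB70 to U+13A0.
SLC = {0x13A0: 0xAB70}   # simple lowercase mappings
SCF = {0xAB70: 0x13A0}   # simple case folding mappings


def ucd_fold(cp, scf):
    """UCD rule: a missing scf mapping means the character folds to itself."""
    return scf.get(cp, cp)


def runtime_fold(cp, scf, slc):
    """Runtime rule as described above: a missing scf falls back to lowercasing."""
    if cp in scf:
        return scf[cp]
    return slc.get(cp, cp)


def add_self_mappings(scf, slc):
    """Add scf mappings to self where there is an slc mapping but no scf."""
    fixed = dict(scf)
    for cp in slc:
        fixed.setdefault(cp, cp)
    return fixed


if __name__ == "__main__":
    fixed_scf = add_self_mappings(SCF, SLC)
    print(hex(ucd_fold(0x13A0, SCF)))                 # UCD result
    print(hex(runtime_fold(0x13A0, SCF, SLC)))        # runtime result before the fix
    print(hex(runtime_fold(0x13A0, fixed_scf, SLC)))  # runtime result after the fix
```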
655 * collation: CLDR collation root, UCA DUCET
657 - UCA DUCET goes into Mark's Unicode tools, see
658 https://sites.google.com/site/unicodetools/home#TOC-UCA
659 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
660 - cd (CLDR UCA branch)/common/uca/
661 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
662 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
663 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
664 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
665 (note: the destination file name drops the underscore before "Rules")
666 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
667 - restore TODO diffs in UCARules.txt
668 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
669 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
670 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
671 from the CLDR root files (..._CLDR_..._SHORT.txt)
672 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
673 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
674 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
675 - if CLDR common/uca/unihan-index.txt changes, then update
676 CLDR common/collation/root.xml <collation type="private-unihan">
677 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
678 - run genuca, see command line above;
680 Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
681 (add the character to genuca.cpp sampleCharsToScripts[])
682 + look up the script for the new sample characters
683 (e.g., in FractionalUCA.txt)
684 + *add* mappings to sampleCharsToScripts[], do not replace them
685 (in case the script sample characters flip-flop)
686 + insert new scripts in DUCET script order, see the top_byte table
687 at the beginning of FractionalUCA.txt
690 * run & fix ICU4C tests, now with new CLDR collation root data
691 - run all tests with the collation test data *_SHORT.txt or the full files
692 (the full ones have comments, useful for debugging)
693 - note on intltest: if collate/UCAConformanceTest fails, then
694 utility/MultithreadTest/TestCollators will fail as well;
695 fix the conformance test before looking into the multi-thread test
696 - fixed bug in CollationWeights::getWeightRanges()
697 exposed by new data and CollationTest::TestRootElements
699 * update Java data files
700 - to be safe, refresh only the UCD/UCA-related/derived files
701 - see (ICU4C)/source/data/icu4j-readme.txt
703 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
706 Unicode .icu files built to ./out/build/icudt56l
707 echo timestamp > uni-core-data
708 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
709 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
710 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
711 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
712 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
713 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
714 mkdir -p /tmp/icu4j/main/shared/data
715 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
716 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
717 mkdir -p /tmp/icu4j/main/shared/data
718 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
719 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
720 - copy the big-endian Unicode data files to another location,
721 separate from the other data files,
722 and then refresh ICU4J
723 cd ~/svn.icu/trunk/dbg/data/out/icu4j
724 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
725 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
726 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
727 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
728 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
729 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
730 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
731 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
732 jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
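The cp/rm sequence above amounts to a file-selection rule. The following Python sketch expresses that rule as a pure function, assuming paths are given relative to com/ibm/icu/impl/data/$ICUDT; the names RULES, EXCLUDED and select_unicode_data_files are hypothetical, and nothing is copied.

```python
import fnmatch
import posixpath

# Folder (relative to the $ICUDT directory) -> file name patterns to copy.
# "" is the $ICUDT directory itself; the patterns mirror the cp commands above.
RULES = {
    "": ["confusables.cfu", "*.icu", "*.nrm"],
    "coll": ["*"],
    "brkitr": ["*"],
}
# Mirrors the "rm .../cnvalias.icu" step.
EXCLUDED = {"cnvalias.icu"}


def select_unicode_data_files(relative_paths):
    """Return the paths that the copy steps above would keep, in input order."""
    selected = []
    for path in relative_paths:
        if path in EXCLUDED:
            continue
        folder, name = posixpath.split(path)
        patterns = RULES.get(folder, [])
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(path)
    return selected
```

Matching on the folder first keeps "*.icu" from picking up files in other subfolders, which is how the shell glob in the cp command behaves.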
734 * When refreshing all of ICU4J data from ICU4C
735 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
736 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
738 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
740 * update CollationFCD.java
741 + copy & paste the initializers of lcccIndex[] etc. from
742 ICU4C/source/i18n/collationfcd.cpp to
743 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
745 * refresh Java test .txt files
746 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
747 cd $ICU_SRC_DIR/source/data/unidata
748 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
749 cd ../../test/testdata
750 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
751 cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
753 * run & fix ICU4J tests
755 *** LayoutEngine script information
757 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
758 because the layout engine was deprecated in ICU 54.
759 Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
760 to write lines that we used to add manually.
762 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
763 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
764 in the working directory.
766 (It also generates ScriptRunData.cpp, which is no longer needed.)
768 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
770 which maps each ICU version to the number of script/language constants
771 that were added in that version.
772 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
774 The generated files have a current copyright date and "@deprecated" statement.
776 * Review changes, fix Java tool if necessary, and copy to ICU4C
777 cd ~/svn.icu4j/trunk/src
778 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
779 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
780 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
783 - send notice to icu-design about new born-@stable API (enum constants etc.)
785 *** merge the Unicode update branches back onto the trunk
786 - do not merge the icudata.jar and testdata.jar,
787 instead rebuild them from merged & tested ICU4C
788 - make sure that changes to Unicode tools & ICU tools are checked in
789 http://www.unicode.org/utility/trac/log/trunk/unicodetools
790 http://bugs.icu-project.org/trac/log/tools/trunk
792 ---------------------------------------------------------------------------- ***
794 Unicode 7.0 update for ICU 54
796 http://www.unicode.org/review/pri271/ -- beta review
797 http://www.unicode.org/reports/uax-proposed-updates.html
798 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
799 http://www.unicode.org/reports/tr44/tr44-13.html
803 - ticket 10821: Unicode 7.0, UCA 7.0
804 - C++ branches/markus/uni70 at r35584 from trunk at r35580
805 - Java branches/markus/uni70 at r35587 from trunk at r35545
809 - ticket 7195: UCA 7.0 CLDR root collation
810 - branches/markus/uni70 at r10062 from trunk at r10061
812 - ticket 6762: script metadata for Unicode 7.0 new scripts
814 *** Unicode version numbers
817 - com.ibm.icu.util.VersionInfo
818 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
820 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
821 so that the makefiles see the new version number.
823 *** data files & enums & parser code
827 - download UCD & IDNA files
828 - make sure that the Unicode data folder passed into preparseucd.py
829 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
830 - only for manual diffs: remove version suffixes from the file names
831 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
832 (see https://sites.google.com/site/unicodetools/inputdata)
833 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
834 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
835 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
836 - Restore TODO diffs in source/data/unidata/UCARules.txt
838 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
839 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
841 - also: from http://unicode.org/Public/security/7.0.0/ download new
842 confusables.txt & confusablesWholeScript.txt
843 and copy to $ICU_ROOT/src/source/data/unidata/
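For the "remove version suffixes" step, here is a rough Python sketch of the renaming idea. It is not the code of desuffixucd.py; it assumes that draft UCD files carry suffixes shaped like "-7.0.0" or "-7.0.0d1" directly before the extension, and the function name is hypothetical.

```python
import re

# Assumed suffix shape: "-<digits>(.<digits>)*" plus an optional draft marker
# "d<digits>", immediately before the final file extension.
_VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.[A-Za-z]+$)")


def strip_version_suffix(file_name):
    """Return file_name without a version suffix; other names pass through."""
    return _VERSION_SUFFIX.sub("", file_name, count=1)
```

With suffixes removed, files from two data drops have identical names and can be compared folder against folder in a diff tool.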
845 * initial preparseucd.py changes
846 - remove new Unicode scripts from the
847 only-in-ISO-15924 list according to the error message:
848 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
849 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
850 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
851 from _scripts_only_in_iso15924
852 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
853 and in com.ibm.icu.dev.test.lang.TestUScript.java
854 - NamesList.txt now has a heading with a non-ASCII character
855 + keep ppucd.txt in platform charset, rather than changing tool/test parsers
856 + escape non-ASCII characters in heading comments
857 - preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, whose copyright year is currently still 2013
858 + get the copyright from the first file whose copyright line contains the current year
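The ValueError quoted above comes from a consistency check between the ISO-only script list and the scripts that Unicode now encodes. A minimal Python sketch of such a check follows; it is not the actual preparseucd.py code. Only the name _scripts_only_in_iso15924 comes from the log, while the function names and the exact message format are assumptions.

```python
def stale_iso_only_scripts(only_in_iso, unicode_scripts):
    """Return the script codes listed as ISO-only that Unicode now encodes."""
    return sorted(set(only_in_iso) & set(unicode_scripts))


def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if the ISO-only list needs pruning."""
    stale = stale_iso_only_scripts(only_in_iso, unicode_scripts)
    if stale:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % stale)
```

Failing loudly here forces the list to be pruned in the same change that picks up the new PropertyValueAliases.txt.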
860 * PropertyValueAliases.txt changes
861 - 32 new Block (blk) values:
862 blk; Bassa_Vah ; Bassa_Vah
863 blk; Caucasian_Albanian ; Caucasian_Albanian
864 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
865 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
866 blk; Duployan ; Duployan
867 blk; Elbasan ; Elbasan
868 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
869 blk; Grantha ; Grantha
871 blk; Khudawadi ; Khudawadi
872 blk; Latin_Ext_E ; Latin_Extended_E
873 blk; Linear_A ; Linear_A
874 blk; Mahajani ; Mahajani
875 blk; Manichaean ; Manichaean
876 blk; Mende_Kikakui ; Mende_Kikakui
879 blk; Myanmar_Ext_B ; Myanmar_Extended_B
880 blk; Nabataean ; Nabataean
881 blk; Old_North_Arabian ; Old_North_Arabian
882 blk; Old_Permic ; Old_Permic
883 blk; Ornamental_Dingbats ; Ornamental_Dingbats
884 blk; Pahawh_Hmong ; Pahawh_Hmong
885 blk; Palmyrene ; Palmyrene
886 blk; Pau_Cin_Hau ; Pau_Cin_Hau
887 blk; Psalter_Pahlavi ; Psalter_Pahlavi
888 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
889 blk; Siddham ; Siddham
890 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
891 blk; Sup_Arrows_C ; Supplemental_Arrows_C
892 blk; Tirhuta ; Tirhuta
893 blk; Warang_Citi ; Warang_Citi
895 use long property names for enum constants
896 -> add to UCharacter.UnicodeBlock IDs
897 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
898 replace public static final int \1_ID = \2; \3
899 -> add to UCharacter.UnicodeBlock objects
900 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
901 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
902 - 28 new Joining_Group (jg) values:
903 jg ; Manichaean_Aleph ; Manichaean_Aleph
904 jg ; Manichaean_Ayin ; Manichaean_Ayin
905 jg ; Manichaean_Beth ; Manichaean_Beth
906 jg ; Manichaean_Daleth ; Manichaean_Daleth
907 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
908 jg ; Manichaean_Five ; Manichaean_Five
909 jg ; Manichaean_Gimel ; Manichaean_Gimel
910 jg ; Manichaean_Heth ; Manichaean_Heth
911 jg ; Manichaean_Hundred ; Manichaean_Hundred
912 jg ; Manichaean_Kaph ; Manichaean_Kaph
913 jg ; Manichaean_Lamedh ; Manichaean_Lamedh
914 jg ; Manichaean_Mem ; Manichaean_Mem
915 jg ; Manichaean_Nun ; Manichaean_Nun
916 jg ; Manichaean_One ; Manichaean_One
917 jg ; Manichaean_Pe ; Manichaean_Pe
918 jg ; Manichaean_Qoph ; Manichaean_Qoph
919 jg ; Manichaean_Resh ; Manichaean_Resh
920 jg ; Manichaean_Sadhe ; Manichaean_Sadhe
921 jg ; Manichaean_Samekh ; Manichaean_Samekh
922 jg ; Manichaean_Taw ; Manichaean_Taw
923 jg ; Manichaean_Ten ; Manichaean_Ten
924 jg ; Manichaean_Teth ; Manichaean_Teth
925 jg ; Manichaean_Thamedh ; Manichaean_Thamedh
926 jg ; Manichaean_Twenty ; Manichaean_Twenty
927 jg ; Manichaean_Waw ; Manichaean_Waw
928 jg ; Manichaean_Yodh ; Manichaean_Yodh
929 jg ; Manichaean_Zayin ; Manichaean_Zayin
930 jg ; Straight_Waw ; Straight_Waw
931 -> uchar.h & UCharacter.JoiningGroup
932 - 23 new Script (sc) values:
933 sc ; Aghb ; Caucasian_Albanian
934 sc ; Bass ; Bassa_Vah
938 sc ; Hmng ; Pahawh_Hmong
942 sc ; Mani ; Manichaean
943 sc ; Mend ; Mende_Kikakui
946 sc ; Narb ; Old_North_Arabian
947 sc ; Nbat ; Nabataean
948 sc ; Palm ; Palmyrene
949 sc ; Pauc ; Pau_Cin_Hau
950 sc ; Perm ; Old_Permic
951 sc ; Phlp ; Psalter_Pahlavi
953 sc ; Sind ; Khudawadi
955 sc ; Wara ; Warang_Citi
956 -> uscript.h (many were added before)
957 comment "Mende Kikakui" for USCRIPT_MENDE
958 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
959 -> com.ibm.icu.lang.UScript
960 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
961 replace public static final int \1 = \2; \3
962 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
971 -> uscript.h (some overlap with additions from Unicode)
972 -> com.ibm.icu.lang.UScript
973 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
974 replace public static final int \1 = \2; \3
975 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
976 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
977 and in com.ibm.icu.dev.test.lang.TestUScript.java
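The find/replace pairs above can also be applied outside the editor. This Python sketch ports the three patterns unchanged to re.sub; the sample C enum lines in the usage note are illustrative and should be checked against the real headers, and the rule names are hypothetical.

```python
import re

# (find, replace) pairs copied from the notes above.
BLOCK_ID = (
    r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
    r"public static final int \1_ID = \2; \3",
)
BLOCK_OBJECT = (
    r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
)
SCRIPT = (
    r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
    r"public static final int \1 = \2; \3",
)


def convert(rule, c_line):
    """Apply one find/replace rule to a line copied from the C header."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, c_line)
```

For example, convert(BLOCK_ID, "UBLOCK_BASSA_VAH = 221, /*[16AD0]*/") gives "public static final int BASSA_VAH_ID = 221; /*[16AD0]*/". Alias lines without a numeric value do not match the patterns and are returned unchanged, so they still need manual handling.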
979 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
980 (not strictly necessary for NOT_ENCODED scripts)
981 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
983 * generate normalization data files
985 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
986 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
987 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
988 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
989 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
990 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
991 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
992 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
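The four .nrm commands above differ only in the output name and the list of input files, each of which builds on nfc.txt. A small Python sketch that generates the same argument lists, using only the -o and -s options shown above; it does not run anything, the --csource variant is left out, and the names NORM2_TARGETS and gennorm2_commands are hypothetical.

```python
# (output file, input files) pairs taken from the gennorm2 commands above.
NORM2_TARGETS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Build one argv list per .nrm file; nothing is executed here."""
    return [
        ["bin/gennorm2", "-o", "%s/%s" % (src_data_in, output),
         "-s", "%s/norm2" % unidata] + inputs
        for output, inputs in NORM2_TARGETS
    ]
```

Each list could be passed to subprocess.run() from the ICU build directory once LD_LIBRARY_PATH is set as shown above.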
994 * build ICU (make install)
995 so that the tools build can pick up the new definitions from the installed header files.
997 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
999 * build Unicode tools using CMake+make
1001 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1003 # Location (--prefix) of where ICU was installed.
1004 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
1005 # Location of the ICU source tree.
1006 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
1008 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1009 ~/svn.icutools/trunk/dbg/unicode/c$ make
1012 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
1013 + add a second array of Joining_Group values, covering at most 10800..10FFF
1014 icutools: unicode/c/genprops/bidipropsbuilder.cpp
1015 icu: source/common/ubidi_props.h/.c/_data.h
1016 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
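The two-array idea can be sketched as follows in Python. This is not the actual ubidi_props layout: the class, the start of the first range and all stored values are made up for illustration, and only the 10AC0..10AFF Manichaean range comes from the note above.

```python
NO_JOINING_GROUP = 0


class JoiningGroups:
    """Joining_Group lookup backed by two separate linear arrays."""

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, values
        self.start2, self.values2 = start2, values2

    def get(self, c):
        """Return the Joining_Group value stored for code point c."""
        if self.start <= c < self.start + len(self.values):
            return self.values[c - self.start]
        if self.start2 <= c < self.start2 + len(self.values2):
            return self.values2[c - self.start2]
        # Code points outside both ranges have no joining group.
        return NO_JOINING_GROUP


# Hypothetical data: a short first array, plus 0x40 entries for 10AC0..10AFF.
jg = JoiningGroups(0x620, [1, 2, 3], 0x10AC0, [7] * 0x40)
```

Two short arrays avoid one array spanning the large gap between the first range and the supplementary-plane range.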
1018 * generate core properties data files
1019 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1020 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
1021 - rebuild ICU (make install) & tools
1022 - run genuca again (see step above) so that it picks up the new nfc.nrm
1023 - rebuild ICU (make install) & tools
1025 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1026 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1027 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1028 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
1029 - nothing new in 7.0, no test file to update
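The grep step above can be narrowed to non-ASCII code points with a short script. This Python sketch assumes the usual IdnaMappingTable.txt layout (semicolon-separated fields, a code point or "start..end" range in the first field, the status in the second, comments after "#"); the function name is hypothetical.

```python
def non_ascii_disallowed_std3_valid(lines):
    """Yield (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clip a range that starts in ASCII to its non-ASCII part.
            yield (max(start, 0x80), end)
```

Any range it reports beyond U+2260, U+226E and U+226F would be a candidate for new test cases in uts46test.cpp and UTS46Test.java.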
1031 * run & fix ICU4C tests
1033 * update Java data files
1034 - refresh only the UCD-related files, to be safe
1035 - see (ICU4C)/source/data/icu4j-readme.txt
1037 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1040 Unicode .icu files built to ./out/build/icudt53l
1041 echo timestamp > uni-core-data
1042 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1043 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
1044 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1045 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1046 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
1047 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
1048 mkdir -p /tmp/icu4j/main/shared/data
1049 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1050 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
1051 mkdir -p /tmp/icu4j/main/shared/data
1052 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1053 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
1054 - copy the big-endian Unicode data files to another location,
1055 separate from the other data files
1057 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1058 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1059 cd ~/svn.icu/uni70/dbg/data/out/icu4j
1060 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1061 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1062 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1063 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1064 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1065 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1067 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1069 * update CollationFCD.java
1070 + copy & paste the initializers of lcccIndex[] etc. from
1071 ICU4C/source/i18n/collationfcd.cpp to
1072 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1074 * refresh Java test .txt files
1075 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1076 cd $ICU_SRC_DIR/source/data/unidata
1077 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1078 cd ../../test/testdata
1079 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1080 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1084 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
1085 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
1086 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
1087 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
1088 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
1089 - review data; compare files, use blankweights.sed or similar
1090 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
1091 - cd ~/svn.unitools/Generated/uca/7.0.0/
1092 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1093 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1094 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1095 (note that the underscore before "Rules" is removed in the target file name)
1096 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1097 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1098 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1099 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1100 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1101 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1102 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1103 - run genuca, see command line above
1105 - refresh ICU4J collation data:
1106 (a subset of the instructions above for the properties data refresh, except that this copies all of coll/*)
1108 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1109 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1110 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1111 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1112 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1113 - note on intltest: if collate/UCAConformanceTest fails, then
1114 utility/MultithreadTest/TestCollators will fail as well;
1115 fix the conformance test before looking into the multi-thread test
1116 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
1117 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
1118 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
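The renames hidden in the cp commands above (dropping "_SHORT", the underscore in "UCA_Rules" and "_CLDR") are easy to get wrong by hand. A Python sketch of the same copy plan as data follows; it only computes source/destination pairs, and the names UCA_TO_ICU4C and copy_plan are hypothetical.

```python
# Generated file name -> destination path inside the ICU4C source tree,
# as in the cp commands above.
UCA_TO_ICU4C = {
    "FractionalUCA_SHORT.txt": "source/data/unidata/FractionalUCA.txt",
    "UCA_Rules_SHORT.txt": "source/data/unidata/UCARules.txt",
    "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationTest_CLDR_SHIFTED_SHORT.txt":
        "source/test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def copy_plan(collation_auxiliary_dir, icu_src_dir):
    """Return (source, destination) path pairs, sorted by source name."""
    return [
        ("%s/%s" % (collation_auxiliary_dir, name),
         "%s/%s" % (icu_src_dir, destination))
        for name, destination in sorted(UCA_TO_ICU4C.items())
    ]
```

Each pair could be handed to shutil.copyfile(); the ICU4J copy of the CollationTest files is a separate step, as shown above.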
1120 * When refreshing all of ICU4J data from ICU4C
1121 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1122 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1124 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1126 * run & fix ICU4J tests
1128 *** LayoutEngine script information
1130 (For details see the Unicode 5.2 change log below.)
1132 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1133 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1134 in the working directory.
1135 (It also generates ScriptRunData.cpp, which is no longer needed.)
1137 The generated files have a current copyright date and "@stable" statement.
1138 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
1139 for "born stable" Unicode API constants, and to stop parsing ICU version numbers
1140 which may not contain dots any more.
1142 - diff current <icu>/source/layout files vs. generated ones
1143 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1144 review and manually merge desired changes;
1145 fix gratuitous changes, incorrect @draft/@stable and missing aliases;
1146 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1147 - if you just copy the above files, then
1148 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1149 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1152 - send notice to icu-design about new born-@stable API (enum constants etc.)
1154 *** merge the Unicode update branches back onto the trunk
1155 - do not merge the icudata.jar and testdata.jar,
1156 instead rebuild them from merged & tested ICU4C
1158 ---------------------------------------------------------------------------- ***
Unicode 6.3 update for ICU 52
1162 http://www.unicode.org/review/pri249/ -- beta review
1163 http://www.unicode.org/reports/uax-proposed-updates.html
1164 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
1165 http://www.unicode.org/reports/tr44/tr44-11.html
1169 - ticket 10128: update ICU to Unicode 6.3 beta
1170 - ticket 10168: update ICU to Unicode 6.3 final
1171 - C++ branches/markus/uni63 at r33552 from trunk at r33551
1172 - Java branches/markus/uni63 at r33550 from trunk at r33553
1174 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
1176 *** Unicode version numbers
1179 (configure.in & configure have been modified to extract the version from uchar.h)
1180 - com.ibm.icu.util.VersionInfo
1181 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1183 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1184 so that the makefiles see the new version number.
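To check which version the build will pick up before running configure, the version string can be read from uchar.h directly. This Python sketch looks for the U_UNICODE_VERSION macro; it is not what the configure script itself does, and the function name is hypothetical.

```python
import re

# Matches a line such as:  #define U_UNICODE_VERSION "6.3"
_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_from_header(header_text):
    """Return the Unicode version string defined in the text of uchar.h."""
    match = _VERSION_RE.search(header_text)
    if match is None:
        raise ValueError("U_UNICODE_VERSION not found")
    return match.group(1)
```

Pass it the contents of source/common/unicode/uchar.h; if the result is still the old version, the header has not been updated yet and configure should wait.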
1186 *** data files & enums & parser code
1190 - download UCD, UCA & IDNA files
1191 - make sure that the Unicode data folder passed into preparseucd.py
1192 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1193 - modify preparseucd.py:
1194 parse new file BidiBrackets.txt
1195 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
1196 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
1197 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1198 - Check test file diffs for previously commented-out, known-failing data lines;
1199 probably need to keep those commented out.
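The check for previously commented-out data lines can be partly automated. The Python sketch below assumes that a known-failing line was disabled by prefixing it with "#" and is otherwise identical to the regenerated line; the function name and the sample data in the usage note are hypothetical.

```python
def reactivated_lines(old_lines, new_lines):
    """Return new data lines that the old file carried only as comments."""
    commented_out = set()
    for line in old_lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            # Remember the text behind the comment marker(s).
            commented_out.add(stripped.lstrip("#").strip())
    result = []
    for line in new_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and stripped in commented_out:
            result.append(stripped)
    return result
```

For example, with old lines ["#0061 0308;x", "0041;y"] and new lines ["0061 0308;x", "0041;y"], it returns ["0061 0308;x"], the line that probably needs to be commented out again.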
1201 * PropertyAliases.txt changes
1202 - 1 new Enumerated Property
1203 bpt ; Bidi_Paired_Bracket_Type
1204 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
1205 -> ubidi_props.h & .c & UBiDiProps.java
1206 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
1208 -> change ubidi.icu format version from 2.0 to 2.1
1209 - 1 new Miscellaneous Property
1210 bpb ; Bidi_Paired_Bracket
1211 -> uchar.h & UProperty.java
1214 * PropertyValueAliases.txt changes
1215 - 3 Bidi_Paired_Bracket_Type (bpt) values:
1219 -> uchar.h & UCharacter.BidiPairedBracketType
1220 -> ubidi_props.h & .c & UBiDiProps.java
1221 -> change ubidi.icu format version from 2.0 to 2.1
1222 - 4 new Bidi_Class (bc) values:
1223 bc ; FSI ; First_Strong_Isolate
1224 bc ; LRI ; Left_To_Right_Isolate
1225 bc ; RLI ; Right_To_Left_Isolate
1226 bc ; PDI ; Pop_Directional_Isolate
1227 -> uchar.h & UCharacterEnums.ECharacterDirection
1228 -> until the bidi code gets updated,
1229 Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
1230 - 3 new Word_Break (WB) values:
1231 WB ; HL ; Hebrew_Letter
1232 WB ; SQ ; Single_Quote
1233 WB ; DQ ; Double_Quote
1234 -> uchar.h & UCharacter.WordBreak
1235 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
1236 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1238 Aghb 239 Caucasian Albanian
1241 -> com.ibm.icu.lang.UScript
1242 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1243 replace public static final int \1 = \2;\3
1244 -> preparseucd.py _scripts_only_in_iso15924
1245 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1246 and in com.ibm.icu.dev.test.lang.TestUScript.java
1247 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1248 (not strictly necessary for NOT_ENCODED scripts)
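The Word_Break note above is a matter of simple arithmetic: constants numbered 0..n-1 fit into k bits only while n <= 2^k. A short Python check (the function name is hypothetical):

```python
def bits_needed(value_count):
    """Bits required to store enum constants 0..value_count-1."""
    if value_count < 1:
        raise ValueError("need at least one value")
    # The largest constant is value_count - 1; at least one bit is always used.
    return max(1, (value_count - 1).bit_length())
```

With the 14 values before this update (17 minus the 3 new ones) 4 bits suffice, and 16 values would still fit, but 17 values need 5 bits, so any storage that packs Word_Break into a 4-bit field has to be widened.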
1250 * generate normalization data files
1251 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
1252 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
1253 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
1254 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1255 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1256 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1257 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1259 * build ICU (make install)
1260 so that the tools build can pick up the new definitions from the installed header files.
1262 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1264 * build Unicode tools using CMake+make
1266 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1268 # Location (--prefix) of where ICU was installed.
1269 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
1270 # Location of the ICU source tree.
1271 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
1273 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1274 ~/svn.icutools/trunk/dbg/unicode/c$ make
1276 * generate core properties data files
1277 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
1278 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
1279 - rebuild ICU (make install) & tools
1280 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1281 - rebuild ICU (make install) & tools
1283 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1284 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1285 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1286 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
1287 - nothing new in 6.3, no test file to update
1289 * update Java data files
1290 - refresh only the UCD-related files, to be safe
1291 - see (ICU4C)/source/data/icu4j-readme.txt
1293 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1296 Unicode .icu files built to ./out/build/icudt52l
1297 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1298 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
1299 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1300 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1301 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
1302 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
1303 mkdir -p /tmp/icu4j/main/shared/data
1304 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1305 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
1306 mkdir -p /tmp/icu4j/main/shared/data
1307 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1308 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
1309 - copy the big-endian Unicode data files to another location,
1310 separate from the other data files
1311 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1312 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1313 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1314 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
1315 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1316 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1317 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1319 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1321 * refresh Java test .txt files
1322 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1324 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
1326 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1327 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1328 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1329 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1330 (note that the underscore before "Rules" is removed in the target file name)
1331 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1332 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1333 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1334 - check test file diffs for previously commented-out, known-failing data lines;
1335 probably need to keep those commented out
1336 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1337 - run genuca, see command line above
1339 - refresh ICU4J collation data:
1340 (a subset of the instructions above for the properties data refresh, except that this copies all of coll/*)
1341 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1342 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1343 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1344 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1345 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1346 - note on intltest: if collate/UCAConformanceTest fails, then
1347 utility/MultithreadTest/TestCollators will fail as well;
1348 fix the conformance test before looking into the multi-thread test
1350 * test ICU, fix test code where necessary
1352 * When refreshing all of ICU4J data from ICU4C
1353 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1354 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1356 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1358 *** LayoutEngine script information
1359 - skipped for Unicode 6.3: no new scripts
1361 *** merge the Unicode update branches back onto the trunk
1362 - do not merge the icudata.jar and testdata.jar,
1363 instead rebuild them from merged & tested ICU4C
1365 ---------------------------------------------------------------------------- ***
1369 http://www.unicode.org/review/pri230/
1370 http://www.unicode.org/versions/beta-6.2.0.html
1371 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
1372 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
1373 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
1374 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
1375 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
1376 http://unicode.org/Public/idna/6.2.0/
1380 - ticket 9515: Unicode 6.2: final ICU update
1382 - ticket 9514: UCA 6.2: fix UCARules.txt
1384 - ticket 9437: update ICU to Unicode 6.2
1385 - C++ branches/markus/uni62 at r32050 from trunk at r32041
1386 - Java branches/markus/uni62 at r32068 from trunk at r32066
1388 *** Unicode version numbers
1391 (configure.in & configure: have been modified to extract the version from uchar.h)
1392 - com.ibm.icu.util.VersionInfo
1393 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1395 *** data files & enums & parser code
1399 - download UCD, UCA & IDNA files
1400 - make sure that the Unicode data folder passed into preparseucd.py
1401 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1402 - modify preparseucd.py: NamesList.txt is now in UTF-8
1403 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
1404 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1405 - Check test file diffs for previously commented-out, known-failing data lines;
1406 probably need to keep those commented out.
1408 * PropertyValueAliases.txt changes
1409 - 1 new Line_Break (lb) value:
1410 lb ; RI ; Regional_Indicator
1411 -> uchar.h & UCharacter.LineBreak
1412 - 1 new Word_Break (WB) value:
1413 WB ; RI ; Regional_Indicator
1414 -> uchar.h & UCharacter.WordBreak
1415 - 1 new Grapheme_Cluster_Break (GCB) value:
1416 GCB; RI ; Regional_Indicator
1417 -> uchar.h & UCharacter.GraphemeClusterBreak
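
The three new values above share one line layout: property alias, short value name, long value name, separated by semicolons. A minimal sketch of splitting such lines when checking which enum constants need to be added follows. The helper name parse_value_alias is hypothetical and is not part of preparseucd.py.

```python
def parse_value_alias(line):
    """Split one PropertyValueAliases.txt-style line into its first three fields."""
    # Drop any trailing comment, then split on the semicolon separators.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    return fields[0], fields[1], fields[2]


# The three lines quoted in the notes above.
lines = [
    "lb ; RI ; Regional_Indicator",
    "WB ; RI ; Regional_Indicator",
    "GCB; RI ; Regional_Indicator",
]
new_values = [parse_value_alias(line) for line in lines]
for prop, short_name, long_name in new_values:
    print(prop, short_name, long_name)
```

Each tuple then maps onto one constant to add in uchar.h and in the matching UCharacter interface.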
1419 * 3 new numeric values
1420 The new value -1 was really meant to be NaN, but that would have required new
1421 UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
1422 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1423 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1424 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1425 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1426 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1427 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1428 -> uprops.h, uchar.c & UCharacterProperty.java
1429 -> cucdtst.c & UCharacterTest.java
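
As an illustration of the "fraction" representation mentioned above, here is a small sketch that reads the nv= field from the four ppucd.txt-style lines quoted in this section and stores each value as a numerator/denominator pair. It only models the values. It does not reproduce the actual encoding in corepropsbuilder.cpp, and the helper name parse_ppucd_cp is hypothetical.

```python
from fractions import Fraction


def parse_ppucd_cp(line):
    """Return (code point, {property: value}) for a 'cp;...' line."""
    fields = line.split(";")
    if fields[0] != "cp":
        raise ValueError("not a cp line: " + line)
    code_point = int(fields[1], 16)
    props = dict(f.split("=", 1) for f in fields[2:])
    return code_point, props


lines = [
    "cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1",
    "cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1",
    "cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000",
    "cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000",
]

numeric_values = {}
for line in lines:
    cp, props = parse_ppucd_cp(line)
    # Fraction("-1") is stored as numerator -1 over denominator 1.
    numeric_values[cp] = Fraction(props["nv"])

for cp, value in sorted(numeric_values.items()):
    print("U+%04X" % cp, value.numerator, "/", value.denominator)
```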
1431 * generate normalization data files
1432 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1433 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1434 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1435 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1436 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1437 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1438 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
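
The four gennorm2 invocations above differ only in the output file and the list of input files. The following sketch keeps that mapping in one table and builds the argument lists from it. It uses only the -o and -s options shown in the commands above. The function name gennorm2_commands is hypothetical, and the sketch prints the commands rather than running them.

```python
import posixpath

# Output file name -> input files, exactly as in the commands above.
NORM_FILES = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata):
    """Build one argument list per .nrm file, without executing anything."""
    commands = []
    for out_name, inputs in NORM_FILES.items():
        commands.append(
            ["bin/gennorm2",
             "-o", posixpath.join(src_data_in, out_name),
             "-s", posixpath.join(unidata, "norm2")] + inputs)
    return commands


commands = gennorm2_commands("$SRC_DATA_IN", "$UNIDATA")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run() from the build directory once the paths are real ones rather than shell variable names.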
1440 * build ICU (make install)
1441 so that the tools build can pick up the new definitions from the installed header files.
1442 * build Unicode tools using CMake+make
1444 * generate core properties data files
1445 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1446 - in initial bootstrapping, change the UCA version
1447 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1448 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1449 - rebuild ICU (make install) & tools
1450 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1451 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1453 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1454 - rebuild ICU (make install) & tools
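
The version check mentioned above can be done before running genrb. The sketch below assumes that FractionalUCA.txt states its version on a line of the form "[UCA version = x.y.z]"; that line format is an assumption and should be verified against the actual file. The function names are hypothetical.

```python
import re

# Assumed format of the version line in FractionalUCA.txt.
UCA_VERSION_RE = re.compile(r"\[UCA version\s*=\s*([0-9.]+)\]")


def uca_version(text):
    """Return the version string from the text, or None if there is none."""
    match = UCA_VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(fractional_uca_text, unicode_version):
    """True if the UCA version starts with the given Unicode version fields."""
    found = uca_version(fractional_uca_text)
    if found is None:
        return False
    want = unicode_version.split(".")
    return found.split(".")[:len(want)] == want


sample = "# Fractional UCA Table (illustrative excerpt)\n[UCA version = 6.2.0]\n"
print(uca_version(sample), versions_match(sample, "6.2"))
```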
1456 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1457 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1458 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1459 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
1460 - nothing new in 6.2, no test file to update
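
The grep described above can also be scripted so that it reports only non-ASCII code points. A minimal sketch, assuming the semicolon-separated "range ; status # comment" layout of IdnaMappingTable.txt; the sample lines are illustrative stand-ins rather than a copy of the real file, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges marked disallowed_STD3_valid above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0]
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clip a range that starts in ASCII to its non-ASCII part.
            yield max(start, 0x80), end


sample = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "0061..007A    ; valid                   # LATIN SMALL LETTER A..Z",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]
hits = list(non_ascii_std3_valid(sample))
for start, end in hits:
    print("U+%04X..U+%04X" % (start, end))
```

Any range reported here that is not already covered by uts46test.cpp and UTS46Test.java is a candidate for a new test case.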
1462 * update Java data files
1463 - refresh just the UCD-related files, just to be safe
1464 - see (ICU4C)/source/data/icu4j-readme.txt
1466 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1469 Unicode .icu files built to ./out/build/icudt50l
1470 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1471 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1472 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1473 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1474 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1475 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1476 mkdir -p /tmp/icu4j/main/shared/data
1477 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1478 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1479 mkdir -p /tmp/icu4j/main/shared/data
1480 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1481 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1482 - copy the big-endian Unicode data files to another location,
1483 separate from the other data files
1484 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1485 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1486 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1487 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1488 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1489 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1490 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1492 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
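
The mkdir/cp/rm sequence above selects a fixed set of files: top-level *.icu except cnvalias.icu, top-level *.nrm, coll/*.icu and everything in brkitr. A sketch of the same selection as a single function follows. The function name copy_unicode_data is hypothetical; the sketch only copies files and leaves the jar uf step to the command shown above.

```python
import shutil
from pathlib import Path

# (subdirectory, glob pattern) pairs taken from the cp commands above.
PATTERNS = [("", "*.icu"), ("", "*.nrm"), ("coll", "*.icu"), ("brkitr", "*")]


def copy_unicode_data(src, dst):
    """Copy only the Unicode-related data files from src to dst."""
    src, dst = Path(src), Path(dst)
    copied = []
    for subdir, pattern in PATTERNS:
        (dst / subdir).mkdir(parents=True, exist_ok=True)
        for path in sorted((src / subdir).glob(pattern)):
            # The steps above remove cnvalias.icu again, so never copy it.
            if path.is_file() and path.name != "cnvalias.icu":
                shutil.copy(path, dst / subdir / path.name)
                copied.append((Path(subdir) / path.name).as_posix())
    return copied
```

Here src would be the com/ibm/icu/impl/data/icudt50b folder under data/out/icu4j, and dst the folder of the same name under /tmp/icu4j.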
1494 * refresh Java test .txt files
1495 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1499 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1500 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1501 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1502 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1503 (note removing the underscore before "Rules")
1504 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1505 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1506 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1507 - check test file diffs for previously commented-out, known-failing data lines;
1508 probably need to keep those commented out
1509 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1510 - run genuca, see command line above
1512 - refresh ICU4J collation data:
1513 (subset of instructions above for properties data refresh, except copies all coll/*)
1514 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1515 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1516 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1517 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1518 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1519 - note on intltest: if collate/UCAConformanceTest fails, then
1520 utility/MultithreadTest/TestCollators will fail as well;
1521 fix the conformance test before looking into the multi-thread test
1523 * test ICU, fix test code where necessary
1525 * When refreshing all of ICU4J data from ICU4C
1526 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1527 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1529 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1531 *** LayoutEngine script information
1532 - skipped for Unicode 6.2: no new scripts
1534 *** merge the Unicode update branches back onto the trunk
1535 - do not merge the icudata.jar and testdata.jar,
1536 instead rebuild them from merged & tested ICU4C
1538 ---------------------------------------------------------------------------- ***
1540 Future Unicode update
1542 The tools have been simplified since the Unicode 6.1 update. See
1543 - http://site.icu-project.org/design/props/ppucd
1544 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
1546 * Unicode version numbers
1547 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
1550 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
1551 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
1552 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1553 - Check test file diffs for previously commented-out, known-failing data lines;
1554 probably need to keep those commented out.
1556 * PropertyValueAliases.txt changes
1557 - Script codes that are in ISO 15924 but not in Unicode are now listed in
1558 preparseucd.py, in the _scripts_only_in_iso15924 variable.
1559 If there are new ISO codes, then add them.
1560 If Unicode adds some of them, then remove them from the .py variable.
1562 * UnicodeData.txt changes
1563 - No more manual changes for CJK ranges for algorithmic names;
1564 those are now written to ppucd.txt and genprops reads them from there.
1566 * generate core properties data files (makeprops.sh was deleted)
1567 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
1569 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
1570 - it is now generated by preparseucd.py
1572 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
1573 - it is now generated by preparseucd.py
1574 - make sure that the Unicode data folder passed into preparseucd.py
1575 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1576 (can be in some subfolder)
1578 * generate normalization data files
1579 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
1580 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
1581 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
1582 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1583 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1584 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1585 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1587 * build ICU (make install)
1588 * build Unicode tools using CMake+make
1590 * new way to call genuca (makeuca.sh was deleted)
1591 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
1593 ---------------------------------------------------------------------------- ***
1599 - ticket 8995 final update to Unicode 6.1
1600 - ticket 8994 regenerate source/layout/CanonData.cpp
1602 - ticket 8961 support Unicode "Age" value *names*
1603 - ticket 8963 support multiple character name aliases & types
1605 - ticket 8827 "update ICU to Unicode 6.1"
1606 - C++ branches/markus/uni61 at r30864 from trunk at r30843
1607 - Java branches/markus/uni61 at r30865 from trunk at r30863
1609 *** Unicode version numbers
1612 (configure.in & configure: have been modified to extract the version from uchar.h)
1613 - com.ibm.icu.util.VersionInfo
1614 - icutools/unicode/makedefs.sh
1615 + also review & update other definitions in that file,
1616 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
1618 *** data files & enums & parser code
1622 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
1623 - This prepares both unidata and testdata files in respective output subfolders.
1624 - Check test file diffs for previously commented-out, known-failing data lines;
1625 probably need to keep those commented out.
1627 * PropertyValueAliases.txt changes
1628 - 11 new block names:
1630 Arabic_Mathematical_Alphabetic_Symbols
1632 Meetei_Mayek_Extensions
1634 Meroitic_Hieroglyphs
1638 Sundanese_Supplement
1641 -> add to UCharacter.UnicodeBlock IDs
1642 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1643 replace public static final int \1_ID = \2; \3
1644 -> add to UCharacter.UnicodeBlock objects
1645 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1646 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647 - 1 new Joining_Group (jg) value:
1649 -> uchar.h & UCharacter.JoiningGroup
1650 - 2 new Line_Break (lb) values:
1651 CJ=Conditional_Japanese_Starter
1653 -> uchar.h & UCharacter.LineBreak
1656 sc ; Merc ; Meroitic_Cursive
1657 sc ; Mero ; Meroitic_Hieroglyphs
1660 sc ; Sora ; Sora_Sompeng
1662 -> remove these from SyntheticPropertyValueAliases.txt
1663 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1664 and in com.ibm.icu.dev.test.lang.TestUScript.java
1665 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1669 and another one added 2011-12-09
1670 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
1672 -> com.ibm.icu.lang.UScript
1673 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1674 replace public static final int \1 = \2;\3
1675 -> SyntheticPropertyValueAliases.txt
1676 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1677 and in com.ibm.icu.dev.test.lang.TestUScript.java
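
The find/replace for com.ibm.icu.lang.UScript above can be reproduced outside of an editor. The sketch below applies the same regular expression and replacement with Python's re module. The constant name and number in the sample line are made up for illustration and are not real UScriptCode values.

```python
import re

# Same pattern as the find expression above.
USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def c_enum_to_java(line):
    """Turn a uscript.h enum line into a Java int constant declaration."""
    return USCRIPT_RE.sub(r"public static final int \g<1> = \g<2>;\g<3>", line)


# Hypothetical input line; real lines come from uscript.h.
c_line = "USCRIPT_EXAMPLE_SCRIPT = 999, /* Xmpl */"
java_line = c_enum_to_java(c_line)
print(java_line)
```

Lines that do not match the pattern pass through unchanged, so the function can be mapped over a whole pasted block of enum constants.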
1679 * UnicodeData.txt changes
1680 - the last Unihan code point changes from U+9FCB to U+9FCC
1681 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
1682 + do change gennames.c
1683 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
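
The case-insensitive search for 9FC[BC] described above can be run over source text as follows. This is a minimal sketch; the sample text and the identifiers in it are invented for illustration and are not taken from gennames.c or ucol.cpp.

```python
import re

# Matches both the old end (9FCB) and the new end (9FCC), in either case.
UNIHAN_END_RE = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_limits(text):
    """Return (line number, line) pairs that mention 9FCB or 9FCC."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), 1)
            if UNIHAN_END_RE.search(line)]


# Hypothetical source excerpt.
sample = (
    "#define CJK_END 0x9fcb\n"
    "static const int32_t limit = 0x9FCC;\n"
    "static const int32_t other = 0x9FA5;\n"
)
hits = find_unihan_limits(sample)
for number, line in hits:
    print(number, line)
```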
1685 * DerivedBidiClass.txt changes
1686 - 2 new default-AL blocks:
1687 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
1688 # Arabic Mathematical Alphabetic Symbols:
1689 # U+1EE00 - U+1EEFF (was default-R)
1690 - 2 new default-R blocks:
1691 # Meroitic Hieroglyphs:
1693 # Meroitic Cursive: U+109A0 - U+109FF
1694 -> should be picked up by the explicit data in the file
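
For reference, the default values listed above amount to a range lookup for code points that have no explicit Bidi_Class data. The sketch below covers only the three ranges whose boundaries are given in these notes. The fallback value "L" and the function name are assumptions made for the example; ICU itself reads the explicit data from the file, as noted above.

```python
# (start, end, default Bidi_Class) for the ranges spelled out above.
DEFAULT_BIDI_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def default_bidi_class(code_point, fallback="L"):
    """Return the default class for a code point; fallback is an assumption."""
    for start, end, bidi_class in DEFAULT_BIDI_RANGES:
        if start <= code_point <= end:
            return bidi_class
    return fallback


for cp in (0x08A0, 0x1EE7F, 0x109A0, 0x0041):
    print("U+%04X" % cp, default_bidi_class(cp))
```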
1696 * NameAliases.txt changes
1698 # Each line has two fields
1699 # First field: Code point
1700 # Second field: Alias
1702 # Each line has three fields, as described here:
1704 # First field: Code point
1705 # Second field: Alias
1707 - Also, the file format previously allowed multiple aliases per code point, but only now
1708 does the file actually provide them, even several of the same type. For example,
1709 FEFF;BYTE ORDER MARK;alternate
1710 FEFF;BOM;abbreviation
1711 FEFF;ZWNBSP;abbreviation
1712 - This breaks our gennames parser, unames.icu data structure, and API.
1713 Fix gennames to only pick up "correction" aliases.
1714 New ticket #8963 for further changes.
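
The interim fix described above, keeping only "correction" aliases from the three-field format, can be sketched as a small filter. This is an illustration in Python, not the gennames C code. The FEFF lines are the ones quoted above, the 01A2 line is included as an illustrative correction entry, and the function name is hypothetical.

```python
def correction_aliases(lines):
    """Map code point -> alias, keeping only aliases of type 'correction'."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code, alias, alias_type = [f.strip() for f in data.split(";")]
        if alias_type == "correction":
            result[int(code, 16)] = alias
    return result


sample = [
    "# code point;alias;type",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
aliases = correction_aliases(sample)
for cp, alias in sorted(aliases.items()):
    print("U+%04X" % cp, alias)
```

With this filter a code point contributes at most one alias as long as it has a single correction, which is what the existing unames.icu structure expects.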
1716 * run genpname/preparse.pl (on Linux)
1717 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1718 + make sure that data.h is writable
1719 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1720 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1722 * build ICU (make install)
1723 so that the tools build can pick up the new definitions from the installed header files.
1724 * build Unicode tools (at least genpname) using CMake+make
1727 (builds both pnames.icu and propname_data.h)
1728 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1729 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1731 * build ICU (make install)
1732 * build Unicode tools using CMake+make
1734 * update source/data/unidata/norm2/nfkc_cf.txt
1735 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1737 * update source/data/unidata/norm2/uts46.txt
1738 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1739 to ~/svn.icu/tools/trunk/src/unicode/py
1740 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1741 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1742 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1744 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1745 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1746 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1747 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
1748 - nothing new in 6.1, no test file to update
1750 * generate core properties data files
1751 - in initial bootstrapping, change the UCA version
1752 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1753 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1754 - rebuild ICU & tools
1755 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1756 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1758 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1759 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1760 - rebuild ICU & tools
1762 * update Java data files
1763 - refresh just the UCD-related files, just to be safe
1764 - see (ICU4C)/source/data/icu4j-readme.txt
1766 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1769 Unicode .icu files built to ./out/build/icudt49l
1770 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1771 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1772 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1773 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1774 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1775 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1776 mkdir -p /tmp/icu4j/main/shared/data
1777 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1778 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1779 mkdir -p /tmp/icu4j/main/shared/data
1780 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1781 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1782 - copy the big-endian Unicode data files to another location,
1783 separate from the other data files
1784 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1785 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1786 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1787 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1788 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1789 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1790 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1792 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1794 * refresh Java test .txt files
1795 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1797 * test ICU so far, fix test code where necessary
1798 - temporarily ignore collation issues that look like UCA/UCD mismatches,
1799 until UCA data is updated
1803 - get output from Mark's tools; look in
1804 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1805 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1806 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1807 (note removing the underscore before "Rules")
1808 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1809 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1810 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1811 - check test file diffs for previously commented-out, known-failing data lines;
1812 probably need to keep those commented out
1813 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1815 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1817 - refresh ICU4J collation data:
1818 (subset of instructions above for properties data refresh, except copies all coll/*)
1819 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1820 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1821 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1822 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1823 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1824 - note on intltest: if collate/UCAConformanceTest fails, then
1825 utility/MultithreadTest/TestCollators will fail as well;
1826 fix the conformance test before looking into the multi-thread test
1828 * When refreshing all of ICU4J data from ICU4C
1829 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1830 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1832 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1834 *** LayoutEngine script information
1836 (For details see the Unicode 5.2 change log below.)
1838 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1839 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1840 in the working directory.
1841 (It also generates ScriptRunData.cpp, which is no longer needed.)
1843 The generated files have a current copyright date and "@draft" statement.
1845 - diff current <icu>/source/layout files vs. generated ones
1846 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1847 review and manually merge desired changes;
1848 fix gratuitous changes, incorrect @draft and missing aliases;
1849 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1850 - if you just copy the above files, then
1851 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1852 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1854 *** merge the Unicode update branches back onto the trunk
1855 - do not merge the icudata.jar and testdata.jar,
1856 instead rebuild them from merged & tested ICU4C
1858 ---------------------------------------------------------------------------- ***
1860 ICU 4.8 (no Unicode update, just new script codes)
1862 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1868 Shrd 319 Sharada, Śāradā
1869 Sora 398 Sora Sompeng
1870 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
1874 -> com.ibm.icu.lang.UScript
1875 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1876 replace public static final int \1 = \2;\3
1877 -> genpname/SyntheticPropertyValueAliases.txt
1878 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1879 and in com.ibm.icu.dev.test.lang.TestUScript.java
1881 * run genpname/preparse.pl (on Linux)
1882 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1883 + make sure that data.h is writable
1884 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1885 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1887 * rebuild Unicode tools (at least genpname) using make
1888 - You might first need to "make install" ICU so that the tools build can pick
1889 up the new definitions from the installed header files.
1892 (builds both pnames.icu and propname_data.h)
1893 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1894 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1895 - rebuild ICU & tools
1898 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1899 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1900 - rebuild ICU & tools
1902 * update Java data files
1903 - refresh just the UCD-related files, just to be safe
1904 - see (ICU4C)/source/data/icu4j-readme.txt
1906 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1907 - copy the big-endian Unicode data files to another location,
1908 separate from the other data files
1909 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1910 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1911 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1913 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1915 * the layout engine script codes should have been updated as well, but this step was forgotten
1917 ---------------------------------------------------------------------------- ***
1921 *** related ICU Trac tickets
1923 7264 Unicode 6.0 Update
1925 *** Unicode version numbers
1928 (configure.in & configure: have been modified to extract the version from uchar.h)
1929 - com.ibm.icu.util.VersionInfo
1931 *** data files & enums & parser code
1935 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1936 - This now prepares both unidata and testdata files in respective output subfolders.
1938 * PropertyAliases.txt changes
1939 - new Script_Extensions property defined in the new ScriptExtensions.txt file
1940 but not listed in PropertyAliases.txt; reported to unicode.org;
1941 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1942 scx; Script_Extensions
1943 -> uchar.h with new UProperty section
1944 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1946 * PropertyValueAliases.txt changes
1947 - 12 new block names:
1952 CJK_Unified_Ideographs_Extension_D
1957 Miscellaneous_Symbols_And_Pictographs
1959 Transport_And_Map_Symbols
1961 -> add to UCharacter.UnicodeBlock
1962 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1963 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1964 - Joining_Group (jg) values:
1965 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1966 -> uchar.h & UCharacter.JoiningGroup
1971 -> remove these from SyntheticPropertyValueAliases.txt
1972 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1973 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1974 and in com.ibm.icu.dev.test.lang.TestUScript.java
1975 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1976 (added 2009-11-11..2010-07-18)
1978 Dupl 755 Duployan shorthand
1984 Merc 101 Meroitic Cursive
1985 Narb 106 Old North Arabian
1989 Wara 262 Warang Citi
1991 -> com.ibm.icu.lang.UScript
1992 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1993 replace public static final int \1 = \2;\3
1994 -> SyntheticPropertyValueAliases.txt
1995 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1996 and in com.ibm.icu.dev.test.lang.TestUScript.java
1997 - ISO 15924 name change
1998 Mero 100 Meroitic Hieroglyphs (was Meroitic)
1999 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2000 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
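
The Eclipse find/replace for UCharacter.UnicodeBlock objects in this list can also be applied with a short script. The sketch below uses the same pattern and replacement via Python's re module. The block name, number and comment in the sample line are invented for illustration and are not real UBlockCode values.

```python
import re

# Same pattern as the Eclipse find expression for UnicodeBlock objects.
UBLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def c_enum_to_unicode_block(line):
    """Turn a uchar.h UBLOCK_ enum line into a Java UnicodeBlock constant."""
    return UBLOCK_RE.sub(
        r'public static final UnicodeBlock \g<1> = '
        r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line)


# Hypothetical input line; real lines come from uchar.h.
c_line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"
java_line = c_enum_to_unicode_block(c_line)
print(java_line)
```

The pattern requires exactly one space around "=" and after the comma, as the Eclipse expression does, so lines with different spacing are left unchanged and need to be handled by hand.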
2002 * UnicodeData.txt changes
2004 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2005 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
2006 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
2008 * build Unicode tools using CMake+make
2010 * run genpname/preparse.pl (on Linux)
2011 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2012 + make sure that data.h is writable
2013 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2014 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2016 * rebuild Unicode tools (at least genpname) using make
2017 - You might first need to "make install" ICU so that the tools build can pick
2018 up the new definitions from the installed header files.
2021 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2022 - rebuild ICU & tools
2024 * update source/data/unidata/norm2/nfkc_cf.txt
2025 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2027 * update source/data/unidata/norm2/uts46.txt
2028 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
2029 to ~/svn.icu/tools/trunk/src/unicode/py
2030 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
2031 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2032 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2034 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2035 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2036 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2037 - Unicode 6.0: U+2260, U+226E, U+226F
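The grep step above can be approximated with a small filter. This is a sketch only: it assumes the semicolon-separated "code point(s) ; status" layout with trailing "#" comments, the function name is hypothetical, and the sample lines are hand-written rather than copied from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Return code points or ranges above U+007F marked disallowed_STD3_valid."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]              # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start = int(fields[0].split("..")[0], 16)  # first code point of a range
        if start > 0x7F:
            hits.append(fields[0])
    return hits

# Hand-written sample lines in the assumed layout.
sample = [
    "003D          ; disallowed_STD3_valid  # EQUALS SIGN",
    "0041          ; mapped ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
]
print(non_ascii_std3_valid(sample))
```

A range that starts in ASCII and ends outside it is ignored by this sketch; such a range would need separate handling.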
2039 * generate core properties data files
2040 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2041 - rebuild ICU & tools
2042 - run makeuca.sh so that genuca picks up the new nfc.nrm:
2043 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2044 - rebuild ICU & tools
2046 * implement new Script_Extensions property (provisional)
2047 - parser & generator: genprops & uprops.icu
2048 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
2049 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
2051 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
2053 - genbidi/gencase/genprops tools changes
2054 - re-run makeprops.sh (see above)
2055 - UCharacterProperty.java, UCharacterTypeIterator.java,
2056 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
2057 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
2059 * update Java data files
2060 - refresh just the UCD-related files, just to be safe
2061 - see (ICU4C)/source/data/icu4j-readme.txt
2063 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2066 Unicode .icu files built to ./out/build/icudt45l
2067 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2068 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2069 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2070 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
2071 mkdir -p /tmp/icu4j/main/shared/data
2072 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2073 - copy the big-endian Unicode data files to another location,
2074 separate from the other data files
2075 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2076 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2077 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2078 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
2079 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2080 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2081 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2083 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2085 * refresh Java test .txt files
2086 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2088 * un-hardcode normalization skippable (NF*_Inert) test data
2089 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
2091 * copy updated break iterator test files
2092 - now handled by early ucdcopy.py and
2093 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
2095 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
2096 to ~/svn.icu/trunk/src/source/test/testdata)
2097 - they are not used in ICU4J
2101 - get output from Mark's tools; look in
2102 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
2103 http://www.macchiato.com/unicode/utc/additional-uca-files
2104 http://www.unicode.org/Public/UCA/6.0.0/
2105 http://www.unicode.org/~mdavis/uca/
2106 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2107 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2108 - update Han-implicit ranges for new CJK extensions:
2109 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2110 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
2111 do not add it into invuca so that tailoring primary-after an ignorable works
2112 - genuca: permit space between [variable top] bytes
2113 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
2115 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2117 - refresh ICU4J collation data:
2118 (subset of instructions above for properties data refresh, except copies all coll/*)
2119 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2120 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2121 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2122 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2123 - update (ICU)/source/test/testdata/CollationTest_*.txt
2124 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2125 with output from Mark's Unicode tools
2126 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2127 - note on intltest: if collate/UCAConformanceTest fails, then
2128 utility/MultithreadTest/TestCollators will fail as well;
2129 fix the conformance test before looking into the multi-thread test
2131 * When refreshing all of ICU4J data from ICU4C
2132 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2133 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2135 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2137 *** LayoutEngine script information
2139 (For details see the Unicode 5.2 change log below.)
2141 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2142 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2143 ScriptRunData.cpp, which is no longer needed.)
2145 The generated files have a current copyright date and "@draft" statement.
2147 * copy the above files into <icu>/source/layout, replacing the old files.
2148 * fix mixed line endings
2149 * review the diffs and fix incorrect @draft and missing aliases;
2150 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2151 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2153 ---------------------------------------------------------------------------- ***
2157 *** related ICU Trac tickets
2161 7167 verify collation bytes
2162 7235 Java test NAME_ALIAS
2163 7236 Java DerivedCoreProperties.txt test
2164 7237 Java BidiTest.txt
2165 7238 UTrie2 in core unidata
2166 7239 test for tailoring gaps
2167 7240 Java fix CollationMiscTest
2168 7243 update layout engine for Unicode 5.2
2170 *** Unicode version numbers
2173 - configure.in & configure
2174 - update ucdVersion in gennames.c if an algorithmic range changes
2176 *** data files & enums & parser code
2180 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
2181 - includes finding files regardless of version numbers,
2182 copying them, and performing the equivalent processing of the
2183 ucdstrip and ucdmerge tools on the desired set of files
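"Finding files regardless of version numbers" can be pictured as stripping a version suffix from each file name before comparing it with the wanted name. The sketch below is not the ucdcopy.py code. The naming scheme (an optional "-<version>" before ".txt", optionally with a draft suffix such as "d3") and the function name are assumptions.

```python
import re

# Assumed naming scheme: "Name-<major>.<minor>.<update>[d<n>].txt".
_VERSION_SUFFIX = re.compile(r"-[0-9]+(\.[0-9]+)*(d[0-9]+)?(?=\.txt$)")

def strip_version(filename):
    """Return the file name without a trailing version suffix, if it has one."""
    return _VERSION_SUFFIX.sub("", filename)

for name in ["LineBreak-5.2.0.txt", "Scripts-5.2.0d3.txt", "UnicodeData.txt"]:
    print(name, "->", strip_version(name))
```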
2186 - PropertyAliases.txt
2187 moved from numeric to enumerated:
2188 ccc ; Canonical_Combining_Class
2189 new string properties:
2190 NFKC_CF ; NFKC_Casefold
2191 Name_Alias; Name_Alias
2192 new binary properties:
2195 CWCF ; Changes_When_Casefolded
2196 CWCM ; Changes_When_Casemapped
2197 CWKCF ; Changes_When_NFKC_Casefolded
2198 CWL ; Changes_When_Lowercased
2199 CWT ; Changes_When_Titlecased
2200 CWU ; Changes_When_Uppercased
2201 new CJK Unihan properties (not supported by ICU)
2202 - PropertyValueAliases.txt
2205 one script code change:
2206 sc ; Qaai ; Inherited
2208 sc ; Zinh ; Inherited ; Qaai
2209 new Line_Break (lb) value:
2210 lb ; CP ; Close_Parenthesis
2211 new Joining_Group (jg) values: Farsi_Yeh, Nya
2213 ccc; 214; ATA ; Attached_Above
2214 - DerivedBidiClass.txt
2215 new default-R range: U+1E800 - U+1EFFF
2217 all of the ISO comments are gone
2219 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
2221 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2222 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2226 + cd \svn\icuproj\icu\trunk\source\tools\genpname
2227 + make sure that data.h is writable
2228 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
2229 + preparse.pl complains with errors like the following:
2230 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
2231 This is because ICU 4.0 had scripts from ISO 15924 which are now
2232 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2233 and PropertyValueAliases.txt.
2234 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2235 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
2236 + preparse.pl complains with errors about block names missing from uchar.h; add them
2238 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2239 - new block & script values
2241 copy new blocks from Blocks.txt
2242 MS VC++ 2008 regular expression:
2243 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
2244 replace with " UBLOCK_\3 = 172, /*[\1]*/"
2245 + several new script values already added in ICU 4.0 for ISO 15924 coverage
2246 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
2247 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
2248 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
2249 (added to SyntheticPropertyValueAliases.txt)
2250 - new Joining Group (JG) values: Farsi_Yeh, Nya
2251 - new Line_Break (lb) value:
2252 lb ; CP ; Close_Parenthesis
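The MS VC++ find/replace pair for new blocks above uses "{...}" for groups; the sketch below shows roughly the same transformation in Python. It goes beyond the original pair in two ways, both assumptions made for illustration: it upper-cases the block name and replaces spaces and hyphens with underscores, and it numbers the constants consecutively from a caller-supplied start value instead of emitting a fixed 172. The resulting numbers are illustrative, not the values assigned in uchar.h.

```python
import re

# Python spelling of: ^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")

def blocks_to_enum(lines, first_value):
    """Turn Blocks.txt-style lines into UBLOCK_ enum lines, numbered from first_value."""
    out = []
    value = first_value
    for line in lines:
        m = BLOCK_LINE.match(line)
        if not m:
            continue  # comments and blank lines are skipped
        name = re.sub(r"[^A-Za-z0-9]+", "_", m.group(3)).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, value, m.group(1)))
        value += 1
    return out

sample = [
    "# comment line",
    "A960..A97F; Hangul Jamo Extended-A",
    "D7B0..D7FF; Hangul Jamo Extended-B",
]
print("\n".join(blocks_to_enum(sample, 172)))
```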
2254 * hardcoded Unihan range end/limit
2255 - Unihan range end moves from 9FC3 to 9FCB
2256 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
2257 + do change gennames.c
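The case-insensitive search for the old range end and limit can be scripted. This is a minimal sketch with a hypothetical helper name and made-up sample lines; in practice one would feed it the lines of each source file.

```python
import re

def find_range_hits(lines, pattern=r"9FC[34]"):
    """Return (line number, text) pairs for lines matching the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(n, line) for n, line in enumerate(lines, 1) if rx.search(line)]

# Made-up sample lines: one lower-case end value, one upper-case limit, one miss.
sample = ["    0x4e00, 0x9fc3,", "limit = 0x9FC4;", "0x9fc5 is unrelated"]
for number, text in find_range_hits(sample):
    print(number, text)
```

Each hit still needs a manual decision, since some hardcoded values must not change even when the range grows.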
2259 * Compare definitions of new binary properties with what we used to use
2260 in algorithms, to see if the definitions changed.
2261 - Verified that definitions for Cased and Case_Ignorable are unchanged.
2262 The gencase tool now parses the newly public Case_Ignorable values
2263 in case the definition changes in the future.
2265 * uchar.c & uprops.h & uprops.c & genprops
2266 - new numeric values that didn't exist in Unicode data before:
2267 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
2268 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
2269 therefore redesign the encoding of numeric types and values for formatVersion 6;
2270 design for simple numbers up to at least 144 ("one gross"),
2271 large values up to at least 10^20,
2272 and fractions with numerators -1..17 and denominators 1..16
2273 to cover current and expected future values
2274 (e.g., more Han numeric values, Meroitic twelfths)
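To make the stated fraction range concrete, here is one possible way to pack numerators -1..17 and denominators 1..16 into a small integer. This is a sketch under assumptions, not the uprops.icu formatVersion 6 layout, which this log does not spell out; the function names and the bit layout are hypothetical.

```python
NUM_MIN, NUM_MAX = -1, 17   # numerator range named in the log
DEN_MIN, DEN_MAX = 1, 16    # denominator range named in the log

def pack_fraction(numerator, denominator):
    """Pack a fraction into one integer: numerator offset in the high bits,
    denominator offset in the low 4 bits (hypothetical layout)."""
    if not (NUM_MIN <= numerator <= NUM_MAX and DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("fraction out of the supported range")
    return ((numerator - NUM_MIN) << 4) | (denominator - DEN_MIN)

def unpack_fraction(code):
    """Inverse of pack_fraction; returns (numerator, denominator)."""
    return (code >> 4) + NUM_MIN, (code & 0xF) + DEN_MIN

# The new Unicode 5.2 values listed above all fit and round-trip.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert unpack_fraction(pack_fraction(num, den)) == (num, den)
```

With 19 numerators and 16 denominators this layout needs 304 distinct codes, which shows why a denominator limit of 9 could not simply be stretched within the older format.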
2276 * reimplement Hangul_Syllable_Type for new Jamo characters
2277 - the old code assumed that all Jamo characters are in the 11xx block
2278 - Unicode 5.2 fills holes there and adds new Jamo characters in
2279 A960..A97F; Hangul Jamo Extended-A
2281 D7B0..D7FF; Hangul Jamo Extended-B
2282 - Hangul_Syllable_Type can be trivially derived from a subset of
2283 Grapheme_Cluster_Break values
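The derivation mentioned above can be sketched as a lookup on the short property value names: the five Hangul-related Grapheme_Cluster_Break values map to the same-named Hangul_Syllable_Type values, and everything else is NA. The function name is hypothetical, and this is an illustration rather than the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond to a Hangul_Syllable_Type.
_GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}

def hangul_syllable_type(gcb_value):
    """Derive hst from a GCB short value; all non-Hangul values yield NA."""
    return _GCB_TO_HST.get(gcb_value, "NA")

for gcb in ["L", "LVT", "EX", "XX"]:
    print(gcb, "->", hangul_syllable_type(gcb))
```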
2285 * build Unicode data source code for hardcoding core data
2286 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
2288 ICU data make path is \svn\icuproj\icu\trunk\source\data\
2289 ICU root path is \svn\icuproj\icu\trunk
2290 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2291 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2292 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2293 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2294 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2295 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2296 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2297 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
2298 Creating data file for Unicode Property Names
2299 Creating data file for Unicode Character Properties
2300 Creating data file for Unicode Case Mapping Properties
2301 Creating data file for Unicode BiDi/Shaping Properties
2302 Creating data file for Unicode Normalization
2303 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
2304 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
2306 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
2307 and rebuild the common library
2311 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
2312 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
2313 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
2314 [ Begin obsolete instructions:
2315 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
2316 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
2318 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
2319 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
2320 End obsolete instructions]
2321 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2322 not just the *_STUB.txt files
2323 - note on intltest: if collate/UCAConformanceTest fails, then
2324 utility/MultithreadTest/TestCollators will fail as well;
2325 fix the conformance test before looking into the multi-thread test
2327 *** Implement Cased & Case_Ignorable properties
2328 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
2329 - Problem: These properties should be disjoint, but aren't
2330 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
2331 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
2333 *** Implement Changes_When_Xyz properties
2334 - without stored data
2336 *** Implement Name_Alias property
2337 - add it as another name field in unames.icu
2338 - make it available via u_charName() and UCharNameChoice
2339 - consider it in u_charFromName()
2343 * Update break iterator rules to new UAX versions and new property values
2344 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
2346 *** new BidiTest file
2347 - review format and data
2348 - copy BidiTest.txt to source/test/testdata
2349 - write test code using this data
2350 - fix ICU code where it fails the conformance test
2353 - generally, find and update code corresponding to C/C++
2354 - UCharacter.UnicodeBlock constants:
2355 a) add an _ID integer per new block, update COUNT
2356 b) add a class instance per new block
2357 Visual Studio regex:
2358 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
2359 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2360 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
2362 - port test changes to Java
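The Visual Studio find/replace pair for the UCharacter.UnicodeBlock instances above translates to Python as follows. This is a sketch; the function name is hypothetical and the sample enum line, including its numeric value, is illustrative only.

```python
import re

# Python spelling of: find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_LINE = (r'public static final UnicodeBlock \g<1> = '
             r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')

def ublock_to_java(line):
    """Convert one C UBLOCK_ enum line; non-matching lines are returned unchanged."""
    return UBLOCK_LINE.sub(JAVA_LINE, line)

# Illustrative input line; the number is not taken from uchar.h.
print(ublock_to_java("    UBLOCK_EXAMPLE_BLOCK = 999, /*[E000]*/"))
```

The _ID integer constants from step a) still have to be added separately, since this substitution only produces the class instances of step b).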
2364 *** LayoutEngine script information
2366 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
2368 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2369 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2370 ScriptRunData.cpp, which is no longer needed.)
2372 The generated files have a current copyright date and "@draft" statement.
2374 -> Eric Mader wrote in email on 20090930:
2375 "I think the tool has been modified to update @draft to @stable for
2376 older scripts and to add @draft for new scripts.
2377 (I worked with an intern on this last year.)
2378 You should check the output after you run it."
2380 * copy the above files into <icu>/source/layout, replacing the old files.
2381 * fix mixed line endings
2382 * review the diffs and fix incorrect @draft and missing aliases
2383 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2385 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2386 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2388 -> Eric Mader wrote in email on 20090930:
2389 "This is just a matter of making sure that all the per-script tables have
2390 entries for any new scripts that were added.
2391 If any new Indic characters were added, then the class tables in
2392 IndicClassTables.cpp should be updated to reflect this.
2393 John Emmons should know how to do this if it's required."
2395 * rebuild the layout and layoutex libraries.
2399 + Jamo_Short_Name, sfc->scf, binary property value aliases
2401 ---------------------------------------------------------------------------- ***
2405 *** related ICU Trac tickets
2407 5696 Update to Unicode 5.1
2409 *** Unicode version numbers
2412 - configure.in & configure
2413 - update ucdVersion in gennames.c if an algorithmic range changes
2415 *** data files & enums & parser code
2419 DerivedCoreProperties.txt
2420 DerivedNormalizationProps.txt
2421 NormalizationTest.txt
2424 GraphemeBreakProperty.txt
2425 SentenceBreakProperty.txt
2426 WordBreakProperty.txt
2427 - ucdstrip and ucdmerge:
2431 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2432 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
2433 copy 5.1.0\ucd\Blocks.txt ..\unidata\
2434 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
2435 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
2436 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2437 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2438 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2439 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2440 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
2441 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
2442 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
2443 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
2444 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
2446 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2447 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2448 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2449 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
2450 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2451 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2452 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2453 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2454 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2455 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
2459 + cd \svn\icuproj\icu\uni51\source\tools\genpname
2460 + make sure that data.h is writable
2461 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
2462 + preparse.pl complains with errors like the following:
2463 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
2464 This is because ICU 3.8 had scripts from ISO 15924 which are now
2465 added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
2466 and PropertyValueAliases.txt.
2467 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2468 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
2469 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
2470 N/Y, No/Yes, F/T, False/True
2471 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
2472 It will use further values from the file if present.
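The four spellings per boolean value can be handled with a single table. This minimal sketch is not the preparse.pl code; the function name is hypothetical, and matching case-insensitively is a simplifying assumption.

```python
# All spellings of the two boolean property values listed above.
_BINARY_VALUES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}

def parse_binary_value(alias):
    """Map a boolean property value alias to True/False; reject anything else."""
    key = alias.strip().lower()
    if key not in _BINARY_VALUES:
        raise ValueError("not a boolean property value alias: %r" % alias)
    return _BINARY_VALUES[key]

for alias in ["N", "Yes", "F", "True"]:
    print(alias, "->", parse_binary_value(alias))
```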
2474 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2475 - new block & script values
2477 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
2478 (removed from SyntheticPropertyValueAliases.txt)
2479 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
2480 (added to SyntheticPropertyValueAliases.txt)
2481 - uprops.icu (uprops.h) only provides 7 bits for script codes.
2482 ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
2483 No code above 127 is yet the script code of an
2484 assigned Unicode character, so ICU 4.0 uprops.icu does not store any
2485 script code values greater than 127.
2486 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
2487 in a parallel bit field, and that overflows now.
2488 Also, future values >=128 would be incompatible anyway.
2489 uprops.h is modified to move around several of the bit fields
2490 in the properties vector words, and now uses 8 bits for the script code.
2491 Two other bit fields also grow to accommodate future growth:
2492 Block (current count: 172) grows from 8 to 9 bits,
2493 and Word_Break grows from 4 to 5 bits.
2494 - renamed property Simple_Case_Folding (sfc->scf)
2495 + nothing to be done: handled as normal alias
2496 - new property JSN Jamo_Short_Name
2497 + no new API: only contributes to the Name property
2498 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
2499 - new Joining Group (JG) value: Burushashki_Yeh_Barree
2500 - new Sentence_Break (SB) values:
2505 - new Word_Break (WB) values:
2507 WB ; Extend ; Extend
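The script-code overflow described in this section comes down to bit-width arithmetic: the largest value that must be stored decides how many bits a field needs. A small Python sketch, with a hypothetical helper name and the numbers taken from the notes above:

```python
def bits_needed(max_value):
    """Number of bits needed to store every value in 0..max_value."""
    return max(1, max_value.bit_length())

USCRIPT_CODE_LIMIT = 130   # ICU 4.0 count from the notes above
BLOCK_COUNT = 172          # block count from the notes above

print(bits_needed(127))                     # 7: the old script field is just enough
print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # 8: the maximum value 129 overflows 7 bits
print(bits_needed(BLOCK_COUNT - 1))         # 8: still fits, so 9 bits adds headroom
```

So the script field had to grow for the stored maximum alone, while the Block and Word_Break fields were widened ahead of need.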
2511 * Further changes in the 2008-02-29 update:
2512 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
2513 because they should not normally be invisible.
2514 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
2515 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
2516 - new Word_Break (WB) value: NL=Newline
2518 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
2519 - Unihan range end moves from 9FBB to 9FC3
2520 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
2521 + do change gennames.c
2523 * build Unicode data source code for hardcoding core data
2524 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
2526 ICU data make path is \svn\icuproj\icu\uni51\source\data\
2527 ICU root path is \svn\icuproj\icu\uni51
2528 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2529 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2530 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2531 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2532 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2533 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2534 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2535 Creating data file for Unicode Character Properties
2536 Creating data file for Unicode Case Mapping Properties
2537 Creating data file for Unicode BiDi/Shaping Properties
2538 Creating data file for Unicode Normalization
2539 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
2540 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
2542 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
2543 and rebuild the common library
2547 * Update break iterator rules to new UAX versions and new property values
2551 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2554 - Test that APIs using Unicode property value aliases (like UnicodeSet)
2555 support all of the boolean values N/Y, No/Yes, F/T, False/True
2556 -> TestBinaryValues() tests in both cintltst and intltest
2558 *** LayoutEngine script information
2559 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2560 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2561 ScriptRunData.cpp, which is no longer needed.)
2563 The generated files have a current copyright date and "@draft" statement.
2565 * copy the above files into <icu>/source/layout, replacing the old files.
2567 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2568 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2570 * rebuild the layout and layoutex libraries.
2574 + Jamo_Short_Name, sfc->scf, binary property value aliases
2576 ---------------------------------------------------------------------------- ***
2580 *** related Jitterbugs
2582 5084 RFE: Update to Unicode 5.0
2584 *** data files & enums & parser code
2588 DerivedCoreProperties.txt
2589 DerivedNormalizationProps.txt
2590 NormalizationTest.txt
2593 GraphemeBreakProperty.txt
2594 SentenceBreakProperty.txt
2595 WordBreakProperty.txt
2596 - ucdstrip and ucdmerge:
2600 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2601 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
2602 copy 5.0.0\ucd\Blocks.txt ..\unidata\
2603 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
2604 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
2605 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2606 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2607 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2608 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2609 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
2610 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
2611 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
2612 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
2613 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
2615 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2616 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2617 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2618 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
2619 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2620 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2621 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2622 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2623 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2624 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
2626 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2630 + make sure that data.h is writable
2631 + perl preparse.pl \cvs\oss\icu > out.txt
2633 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2634 - new block & script values
2635 + script values already added in ICU 3.6 because all of ISO 15924 is now covered
2637 * build Unicode data source code for hardcoding core data
2638 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
2640 ICU data make path is \cvs\oss\icu\source\data\
2641 ICU root path is \cvs\oss\icu
2642 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2644 Creating data file for Unicode Character Properties
2645 Creating data file for Unicode Case Mapping Properties
2646 Creating data file for Unicode BiDi/Shaping Properties
2647 Creating data file for Unicode Normalization
2648 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
2649 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
2651 - copy the .c source files to C:\cvs\oss\icu\source\common
2652 and rebuild the common library
2654 *** Unicode version numbers
2659 *** LayoutEngine script information
2660 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2661 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2662 ScriptRunData.cpp, which is no longer needed.)
2664 The generated files have a current copyright date and "@draft" statement.
2666 * copy the above files into <icu>/source/layout, replacing the old files.
2668 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2669 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2671 * rebuild the layout and layoutex libraries.
2673 ---------------------------------------------------------------------------- ***
2677 *** related Jitterbugs
2679 4332 RFE: Update to Unicode 4.1
2680 4157 RBBI, TR29 4.1 updates
2682 *** data files & enums & parser code
2686 DerivedCoreProperties.txt
2687 DerivedNormalizationProps.txt
2688 NormalizationTest.txt
2689 GraphemeBreakProperty.txt
2690 SentenceBreakProperty.txt
2691 WordBreakProperty.txt
2692 - ucdstrip and ucdmerge:
2696 * add new files to the repository
2697 GraphemeBreakProperty.txt
2698 SentenceBreakProperty.txt
2699 WordBreakProperty.txt
2701 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2704 - handle new enumerated properties in sub read_uchar
2707 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2708 - new binary properties
2710 + Pattern_White_Space
2711 - new enumerated properties
2712 + Grapheme_Cluster_Break
2715 - new block & script & line break values
2718 - case-ignorable changes
2719 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2720 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
2722 *** Unicode version numbers
2728 - verify that u_charMirror() round-trips
2729 - test all new properties and some new values of old properties
2733 * hardcoded Unihan range end/limit
2734 - Unihan range end moves from 9FA5 to 9FBB
2735 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
2736 + do not modify BOCU/BOCSU code because that would change the encoding
2737 and break binary compatibility!
2738 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
2740 + ignore trietest.c: test data is arbitrary
2741 + ignore tstnorm.cpp: test optimization, not important
2742 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
2743 + do change line_th.txt and word_th.txt
2744 by replacing hardcoded ranges with the new property values
2745 + do change gennames.c
2747 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2748 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2749 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
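The search described above (regex 9FA[56], case-insensitive) could be scripted roughly as follows. This is a sketch under assumptions: the function names and the list of file suffixes are illustrative and not part of the ICU tools.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches the old Unihan range end (9FA5) and limit (9FA6) in any letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hits(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, line) for each line mentioning 9FA5 or 9FA6."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if UNIHAN_END_OR_LIMIT.search(line):
            yield lineno, line


def scan_tree(root: Path, suffixes=(".c", ".cpp", ".h", ".txt")) -> None:
    """Print hits as path(line): text, in the style of the listing above."""
    # Assumption: the suffix list is illustrative; adjust it to the source tree.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in find_hits(text):
                print(f"{path}({lineno}): {line.strip()}")
```

Each hit still needs a manual decision, since several of the matches listed above must deliberately be left unchanged.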
2752 - compare new special casing context conditions with previous ones
2753 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2756 - consider storing only the short name if it is the same as the long name
2759 - UAX #29 changes (grapheme/word/sentence breaks)
2760 - UAX #14 changes (line breaks)
2761 - Pattern_Syntax & Pattern_White_Space
2763 ---------------------------------------------------------------------------- ***
2765 Unicode 4.0.1 update
2767 *** related Jitterbugs
2769 3170 RFE: Update to Unicode 4.0.1
2770 3171 Add new Unicode 4.0.1 properties
2771 3520 use Unicode 4.0.1 updates for break iteration
2773 *** data files & enums & parser code
2776 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
2777 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
2780 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
2781 according to PRI #26
2782 http://www.unicode.org/review/resolved-pri.html#pri26
2783 - undone again because no corrigendum in sight;
2784 instead modified tests to not check consistency on this for Unicode 4.0.1
2787 - update from http://www.unicode.org/copyright.html
2788 formatted for plain text
2790 * uchar.h & uprops.h & uprops.c & genprops
2791 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
2792 - add U_LB_INSEPARABLE due to a spelling fix
2793 + put short name comment only on line with new constant
2794 for genpname perl script parser
2795 - new binary properties
2797 + Variation_Selector
2800 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
2801 - perl script: correctly calculate the maximum number of fields per row
2804 - new script code Hrkt=Katakana_Or_Hiragana
2806 * gennorm.c: track changes in DerivedNormalizationProps.txt
2807 - "FNC" -> "FC_NFKC"
2808 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
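The two format changes above can be handled by normalizing old-style rows to the new shape while parsing. The sketch below is illustrative and is not the gennorm.c code. The OLD_QC_NAMES table is an assumption: the note names only "NFD_NO" explicitly, and the other entries are filled in by analogy.

```python
from typing import Optional, Tuple

# Assumption: old single-field quick-check names and their new two-field form.
# Only "NFD_NO" is named in the note above; the rest follow the same pattern.
OLD_QC_NAMES = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}


def parse_row(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (code point range, property, value), or None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()  # drop trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 fields: {line!r}")
    rng, name = fields[0], fields[1]
    value = fields[2] if len(fields) > 2 else ""
    if name == "FNC":            # old property name
        name = "FC_NFKC"
    elif name in OLD_QC_NAMES:   # old single-field quick-check form
        name, value = OLD_QC_NAMES[name]
    return rng, name, value
```

With this, an old row such as "00C0..00C5 ; NFD_NO" and a new row "00C0..00C5 ; NFD_QC; N" yield the same tuple, so downstream code only has to deal with the new format.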
2810 * genprops/props2.c: track changes in DerivedNumericValues.txt
2811 - changed from 3 columns to 2, dropping the numeric type
2812 + assume that the type is always numeric for Han characters,
2813 and that only those are added in addition to what UnicodeData.txt lists
2815 *** Unicode version numbers
2821 - update test of default bidi classes according to PRI #28
2822 /tsutil/cucdtst/TestUnicodeData
2823 http://www.unicode.org/review/resolved-pri.html#pri28
2824 - bidi tests: change exemplar character for ES depending on Unicode version
2825 - change hardcoded expected property values where they change
2833 - use new Hrkt=Katakana_Or_Hiragana
2836 - ZWJ & ZWNJ are now part of combining character sequences
2837 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ