source/data/unidata/changes.txt

   1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
   2 * License & terms of use: http://www.unicode.org/copyright.html
   3 * Copyright (C) 2004-2016, International Business Machines
   4 * Corporation and others.  All Rights Reserved.
   5 *
   6 *   file name:  changes.txt
   7 *   encoding:   US-ASCII
   8 *   tab size:   8 (not used)
   9 *   indentation:4
  10 *
  11 *   created on: 2004may06
  12 *   created by: Markus W. Scherer
  13 *
  14 * change log for Unicode updates
  15
  16 ---------------------------------------------------------------------------- ***
  17
  18 * New ISO 15924 script codes
  19
   20 Starting with ICU 55, we no longer add UScriptCode constants for new scripts
   21 until they are encoded in Unicode,
   22 or can be assumed to be encoded in the next Unicode version.
   23 Script enum constant names should follow the Unicode script property value aliases,
   24 which are assigned only when the scripts are encoded.
   25 When we add script codes early and guess the names wrong, we end up with confusing
   26 enum constants, and we have sometimes had to add aliases.
  27
  28 Variant script codes like Latf and Aran that are not subject to separate encoding
  29 can be added at any time.
  30 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
  31
  32 We add script codes used in CLDR or in the spoof checker.
  33 This includes combination/alias codes like Hanb and Jamo.
  34 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
  35 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
  36
  37 We add special Z* script codes like Zsye.
  38
  39 For new script codes see http://www.unicode.org/iso15924/codechanges.html
  40
  41 ---------------------------------------------------------------------------- ***
  42
  43 Unicode 9.0 update for ICU 58
  44
  45 * Command-line environment setup
  46
  47 ICU_ROOT=~/svn.icu/trunk
  48 ICU_SRC_DIR=$ICU_ROOT/src
  49 ICUDT=icudt58b
  50 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
  51 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
  52 UNIDATA=$ICU_SRC_DIR/source/data/unidata
  53
  54 http://www.unicode.org/review/pri323/  -- beta review
  55 http://www.unicode.org/reports/uax-proposed-updates.html
  56 http://www.unicode.org/versions/beta-9.0.0.html
  57 http://www.unicode.org/versions/Unicode9.0.0/
  58 http://www.unicode.org/reports/tr44/tr44-17.html
  59
  60 *** ICU Trac
  61
  62 - ticket:12526: integrate Unicode 9
  63 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
  64 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
  65
  66 *** CLDR Trac
  67
  68 - cldrbug 9414: UCA 9
  69 - ^/branches/markus/uni90 at r11518 from trunk at r11517
  70
  71 - cldrbug 8745: Unicode 9.0 script metadata
  72
  73 *** Unicode version numbers
  74 - makedata.mak
  75 - uchar.h
  76 - com.ibm.icu.util.VersionInfo
  77 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
  78
  79 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  80   so that the makefiles see the new version number.
  81
  82 *** data files & enums & parser code
  83
  84 * file preparation
  85
  86 - download UCD & IDNA files
  87 - make sure that the Unicode data folder passed into preparseucd.py
  88   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
  89 - only for manual diffs: remove version suffixes from the file names
  90   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  91   (see https://sites.google.com/site/unicodetools/inputdata)
  92 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
  93 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
  94 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  95
  96 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
  97   and copy to $UNIDATA
  98     cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
  99
 100 * preparseucd.py changes
 101 - remove or add new Unicode scripts from/to the
 102   only-in-ISO-15924 list according to the error messages:
 103     ValueError: remove ['Tang'] from _scripts_only_in_iso15924
 104     ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
 105     ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
 106     ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
 107   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 108       and in com.ibm.icu.dev.test.lang.TestUScript.java
 109 - DerivedNumericValues.txt new numeric values
 110     0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
 111     0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
 112     0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
 113     0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
 114     0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
 115   -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
 116      uchar.c, UCharacterProperty.java
 117      to support a new series of values
 118 - adjust preparseucd.py for Tangut algorithmic names
 119   in ppucd.txt:
 120     algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
 121   ->
 122     algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
 123 - avoid block-compressing most String/Miscellaneous property values,
 124   triggered by genprops not coping with a multi-code point Case_Folding on
 125     block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
 126   keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
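
The new DerivedNumericValues.txt values listed earlier in this list (1/160, 1/40, 3/80, 1/20, 3/20) share a pattern: each is a small numerator over 20 times a power of two. The Python sketch below only illustrates that pattern. The helper name split_fraction20 and the (numerator, shift) pair are assumptions made for this example; they are not the actual encoding used in uprops.h or corepropsbuilder.cpp/encodeNumericValue().

```python
from fractions import Fraction

def split_fraction20(value, max_shift=7):
    """Return (numerator, shift) such that value == numerator / (20 << shift).

    Hypothetical helper for illustration only. Returns None when the value
    cannot be written as an integer over 20 * 2**shift for shift <= max_shift.
    """
    value = Fraction(value)
    for shift in range(max_shift + 1):
        scaled = value * (20 << shift)
        if scaled.denominator == 1:
            # Smallest shift for which the numerator becomes an integer.
            return (scaled.numerator, shift)
    return None

# The five Malayalam fraction values from the log.
MALAYALAM_VALUES = {
    0x0D58: Fraction(1, 160),
    0x0D59: Fraction(1, 40),
    0x0D5A: Fraction(3, 80),
    0x0D5B: Fraction(1, 20),
    0x0D5D: Fraction(3, 20),
}

if __name__ == "__main__":
    for cp, value in MALAYALAM_VALUES.items():
        print("%04X %s -> %s" % (cp, value, split_fraction20(value)))
```

A value such as 1/7 does not fit the pattern, so the helper returns None for it.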
 127
 128 * PropertyAliases.txt changes
 129 - 1 new property PCM=Prepended_Concatenation_Mark
 130   Ignore: Only useful for layout engines.
 131   Ok to list in ppucd.txt.
 132
 133 * PropertyValueAliases.txt new property values
 134     blk; Adlam                            ; Adlam
 135     blk; Bhaiksuki                        ; Bhaiksuki
 136     blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
 137     blk; Glagolitic_Sup                   ; Glagolitic_Supplement
 138     blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
 139     blk; Marchen                          ; Marchen
 140     blk; Mongolian_Sup                    ; Mongolian_Supplement
 141     blk; Newa                             ; Newa
 142     blk; Osage                            ; Osage
 143     blk; Tangut                           ; Tangut
 144     blk; Tangut_Components                ; Tangut_Components
 145   -> add to uchar.h
 146     use long property names for enum constants
 147   -> add to UCharacter.UnicodeBlock IDs
 148     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 149             replace  public static final int \1_ID = \2; \3
 150   -> add to UCharacter.UnicodeBlock objects
 151     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 152             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 153
 154     GCB; EB                               ; E_Base
 155     GCB; EBG                              ; E_Base_GAZ
 156     GCB; EM                               ; E_Modifier
 157     GCB; GAZ                              ; Glue_After_Zwj
 158     GCB; ZWJ                              ; ZWJ
 159   -> uchar.h & UCharacter.GraphemeClusterBreak
 160
 161     jg ; African_Feh                      ; African_Feh
 162     jg ; African_Noon                     ; African_Noon
 163     jg ; African_Qaf                      ; African_Qaf
 164   -> uchar.h & UCharacter.JoiningGroup
 165
 166     lb ; EB                               ; E_Base
 167     lb ; EM                               ; E_Modifier
 168     lb ; ZWJ                              ; ZWJ
 169   -> uchar.h & UCharacter.LineBreak
 170
 171     sc ; Adlm                             ; Adlam
 172     sc ; Bhks                             ; Bhaiksuki
 173     sc ; Marc                             ; Marchen
 174     sc ; Newa                             ; Newa
 175     sc ; Osge                             ; Osage
 176     sc ; Tang                             ; Tangut
 177   -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
 178
 179     WB ; EB                               ; E_Base
 180     WB ; EBG                              ; E_Base_GAZ
 181     WB ; EM                               ; E_Modifier
 182     WB ; GAZ                              ; Glue_After_Zwj
 183     WB ; ZWJ                              ; ZWJ
 184   -> uchar.h & UCharacter.WordBreak
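
The two Eclipse find/replace patterns given above for the UCharacter.UnicodeBlock IDs and objects can be tried outside Eclipse. This is a minimal Python sketch that applies the same patterns with re.sub; the sample input line, including its name, number and comment, is made up for the example and is not taken from uchar.h.

```python
import re

# Patterns and replacements copied from the log (with \1-style backreferences).
ID_PATTERN = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACEMENT = r"public static final int \1_ID = \2; \3"

OBJ_PATTERN = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACEMENT = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

def to_java_id(line):
    """Turn a C enum line into a Java int constant declaration."""
    return ID_PATTERN.sub(ID_REPLACEMENT, line)

def to_java_object(line):
    """Turn a C enum line into a Java UnicodeBlock object declaration."""
    return OBJ_PATTERN.sub(OBJ_REPLACEMENT, line)

# Hypothetical enum line in the general shape that the patterns expect.
SAMPLE = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"

if __name__ == "__main__":
    print(to_java_id(SAMPLE))
    print(to_java_object(SAMPLE))
```

Lines that do not match the pattern pass through unchanged, so the functions can be applied to a whole pasted enum section.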
 185
 186 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 187     (not strictly necessary for NOT_ENCODED scripts)
 188   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
 189
 190 * generate normalization data files
 191   cd $ICU_ROOT/dbg
 192   bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
 193   bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 194   bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 195   bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 196   bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
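
The five gennorm2 invocations above differ only in the output file and in which input files are layered on top of nfc.txt. The Python sketch below makes that layering explicit by building the same command lines as argument lists. It is a dry run that executes nothing; the function name and the placeholder path /path/to/icu/src are assumptions for the example.

```python
def gennorm2_commands(icu_src_dir):
    """Build the five gennorm2 command lines from the log as argument lists."""
    unidata_norm2 = icu_src_dir + "/source/data/unidata/norm2"
    src_data_in = icu_src_dir + "/source/data/in"
    # (output file, input files, extra options), in the order used in the log
    jobs = [
        (icu_src_dir + "/source/common/norm2_nfc_data.h", ["nfc.txt"], ["--csource"]),
        (src_data_in + "/nfc.nrm", ["nfc.txt"], []),
        (src_data_in + "/nfkc.nrm", ["nfc.txt", "nfkc.txt"], []),
        (src_data_in + "/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
        (src_data_in + "/uts46.nrm", ["nfc.txt", "uts46.txt"], []),
    ]
    return [["bin/gennorm2", "-o", out, "-s", unidata_norm2] + inputs + extra
            for out, inputs, extra in jobs]

if __name__ == "__main__":
    # Placeholder path; substitute the real $ICU_SRC_DIR.
    for cmd in gennorm2_commands("/path/to/icu/src"):
        print(" ".join(cmd))
```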
 197
 198 * build ICU (make install)
 199   so that the tools build can pick up the new definitions from the installed header files.
 200
 201   $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
 202
 203 * build Unicode tools using CMake+make
 204
 205 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
 206
 207   # Location (--prefix) of where ICU was installed.
 208   set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
 209   # Location of the ICU source tree.
 210   set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
 211
 212   ~/svn.icutools/trunk/dbg/unicode/c$
 213     cmake ../../../src/unicode/c
 214     make
 215
 216 * generate core properties data files
 217   ~/svn.icutools/trunk/dbg/unicode/c$
 218     genprops/genprops $ICU_SRC_DIR
 219     genuca/genuca --hanOrder implicit $ICU_SRC_DIR
 220     genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
 221 - rebuild ICU (make install) & tools
 222
 223 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 224   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 225 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 226 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
 227 - nothing new in 9.0, no test file to update
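
The grep step above can also be done with a small filter that applies the "non-ASCII" condition automatically. This is a Python sketch under the assumption that each data line has the layout "code point or range ; status ; ... # comment". The sample lines are illustrative and written for this example rather than copied from IdnaMappingTable.txt, and the function name is hypothetical.

```python
def disallowed_std3_valid_non_ascii(lines):
    """Return (start, end) ranges above U+007F whose status is disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start_cp = int(start, 16)
        end_cp = int(end, 16) if end else start_cp
        if end_cp > 0x7F:
            # Clip ranges that begin in ASCII so only the non-ASCII part is reported.
            result.append((max(start_cp, 0x80), end_cp))
    return result

# Illustrative sample lines (assumed layout, not real file content).
SAMPLE = """\
0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA
0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A
2260          ; disallowed_STD3_valid                  # NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN
""".splitlines()

if __name__ == "__main__":
    for start, end in disallowed_std3_valid_non_ascii(SAMPLE):
        print("U+%04X..U+%04X" % (start, end))
```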
 228
 229 * run & fix ICU4C tests
 230 - Andy handles RBBI & spoof check test failures
 231
 232 * collation: CLDR collation root, UCA DUCET
 233
 234 - UCA DUCET goes into Mark's Unicode tools, see
 235   https://sites.google.com/site/unicodetools/home#TOC-UCA
 236 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
 237     cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
 238
 239 - cd (CLDR UCA branch)/common/uca/
 240 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 241     cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
 242 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 243     cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
 244     (note removing the underscore before "Rules")
 245     cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 246 - restore TODO diffs in UCARules.txt
 247     meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 248 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 249   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 250   from the CLDR root files (..._CLDR_..._SHORT.txt)
 251     cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
 252     cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
 253     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
 254 - if CLDR common/uca/unihan-index.txt changes, then update
 255   CLDR common/collation/root.xml <collation type="private-unihan">
 256   and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
 257
 258 - run genuca, see command line above;
 259   deal with
 260     Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
 261     FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
 262         (add the character to genuca.cpp sampleCharsToScripts[])
 263   + look up the USCRIPT_ code for the new sample characters
 264     (should be obvious from the comment in the error output)
 265   + *add* mappings to sampleCharsToScripts[], do not replace them
 266     (in case the script sample characters flip-flop)
 267   + insert new scripts in DUCET script order, see the top_byte table
 268     at the beginning of FractionalUCA.txt
 269 - rebuild ICU4C
 270
 271 * Unihan collators
 272 - run Unicode Tools
 273     org.unicode.draft.GenerateUnihanCollators
 274   with VM arguments
 275     -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
 276     -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
 277     -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
 278     -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
 279     -DUVERSION=9.0.0
 280     -ea
 281 - run Unicode Tools
 282     org.unicode.draft.GenerateUnihanCollatorFiles
 283   with the same arguments
 284 - check CLDR diffs
 285     cd ~/svn.cldr/trunk
 286     meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
 287     meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
 288 - copy to CLDR
 289     cd ~/svn.cldr/trunk
 290     cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
 291     cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
 292 - commit to CLDR
 293 - generate ICU zh collation data: run CLDR
 294     org.unicode.cldr.icu.NewLdml2IcuConverter
 295   with program arguments
 296     -t collation
 297     -s /home/mscherer/svn.cldr/trunk/common/collation
 298     -m /home/mscherer/svn.cldr/trunk/common/supplemental
 299     -d /home/mscherer/svn.icu/trunk/src/source/data/coll
 300     -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
 301     zh
 302   and VM arguments
 303     -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
 304 - rebuild ICU4C
 305
 306 * run & fix ICU4C tests, now with new CLDR collation root data
 307 - run all tests with the collation test data *_SHORT.txt or the full files
 308   (the full ones have comments, useful for debugging)
 309 - note on intltest: if collate/UCAConformanceTest fails, then
 310   utility/MultithreadTest/TestCollators will fail as well;
 311   fix the conformance test before looking into the multi-thread test
 312
 313 * update Java data files
 314 - refresh just the UCD/UCA-related/derived files, just to be safe
 315 - see (ICU4C)/source/data/icu4j-readme.txt
 316 - mkdir /tmp/icu4j
 317 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 318   output:
 319     ...
 320     Unicode .icu files built to ./out/build/icudt58l
 321     echo timestamp > uni-core-data
 322     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
 323     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
 324     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
 325     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
 326     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
 327     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
 328     mkdir -p /tmp/icu4j/main/shared/data
 329     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 330     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
 331     mkdir -p /tmp/icu4j/main/shared/data
 332     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 333     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
 334 - copy the big-endian Unicode data files to another location,
 335   separate from the other data files,
 336   and then refresh ICU4J
 337     cd ~/svn.icu/trunk/dbg/data/out/icu4j
 338     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 339     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 340     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 341     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 342     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 343     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 344     cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 345     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 346     jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 347
 348 * When refreshing all of ICU4J data from ICU4C
 349 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 350 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 351 or
 352 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 353
 354 * update CollationFCD.java
 355   + copy & paste the initializers of lcccIndex[] etc. from
 356     ICU4C/source/i18n/collationfcd.cpp to
 357     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
 358
 359 * refresh Java test .txt files
 360 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 361     cd $ICU_SRC_DIR/source/data/unidata
 362     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 363     cd ../../test/testdata
 364     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 365     cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 366
 367 * run & fix ICU4J tests
 368
 369 *** LayoutEngine script information
 370
 371 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 372   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 373   in the working directory.
 374
 375   (It also generates ScriptRunData.cpp, which is no longer needed.)
 376
 377   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
 378   (a plain text file)
 379   which maps ICU versions to the numbers of script/language constants
 380   that were added then.
 381   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
 382
 383   The generated files have a current copyright date and "@deprecated" statement.
 384
 385 * Review changes, fix Java tool if necessary, and copy to ICU4C
 386   cd ~/svn.icu4j/trunk/src
 387   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 388   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
 389   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
 390
 391 *** API additions
 392 - send notice to icu-design about new born-@stable API (enum constants etc.)
 393
 394 *** merge the Unicode update branches back onto the trunk
 395 - do not merge the icudata.jar and testdata.jar,
 396   instead rebuild them from merged & tested ICU4C
 397 - make sure that changes to Unicode tools & ICU tools are checked in
 398   http://www.unicode.org/utility/trac/log/trunk/unicodetools
 399   http://bugs.icu-project.org/trac/log/tools/trunk
 400
 401 ---------------------------------------------------------------------------- ***
 402
 403 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
 404
 405 Adding
 406 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
 407 - new combination/alias codes: Hanb, Jamo
 408   - used in CLDR 29 and in spoof checker
 409 - new Z* code: Zsye
 410
 411 Add new codes to uscript.h & UScript.java, see Unicode update logs.
 412   -> com.ibm.icu.lang.UScript
 413     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 414     replace  public static final int \1 = \2; \3
 415
 416 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
 417 add new script codes.
 418 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
 419
 420 Note: If we have to run preparseucd.py again before the Unicode 9 update,
 421 then we need to manually keep/restore the new script codes.
 422
 423 ICU_ROOT=~/svn.icu/trunk
 424 ICU_SRC_DIR=$ICU_ROOT/src
 425 ICUDT=icudt57b
 426 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 427 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 428 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 429
 430 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
 431 see http://bugs.icu-project.org/trac/ticket/12141
 432
 433 make install, then icutools cmake & make, then
 434 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
 435
 436 Generate Java data as usual, only update pnames.icu & uprops.icu.
 437
 438 *** LayoutEngine script information
 439
 440 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 441   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 442   in the working directory.
 443
 444   (It also generates ScriptRunData.cpp, which is no longer needed.)
 445
 446   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
 447   (a plain text file)
 448   which maps ICU versions to the numbers of script/language constants
 449   that were added then.
 450   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
 451
 452   The generated files have a current copyright date and "@deprecated" statement.
 453
 454 * Review changes, fix Java tool if necessary, and copy to ICU4C
 455   cd ~/svn.icu4j/trunk/src
 456   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 457   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
 458   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
 459
 460 ---------------------------------------------------------------------------- ***
 461
 462 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
 463
 464 Edit preparseucd.py to add & parse new properties.
 465 They share the UCD property namespace but are not listed in PropertyAliases.txt.
 466
 467 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
 468 Initial data from emoji/2.0/
 469
 470 ICU_ROOT=~/svn.icu/trunk
 471 ICU_SRC_DIR=$ICU_ROOT/src
 472 ICUDT=icudt56b
 473 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 474 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 475 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 476
 477 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
 478
 479 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 480 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
 481
 482 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
 483
 484 make install, then icutools cmake & make, then
 485 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
 486
 487 Generate Java data as usual, only update pnames.icu & uprops.icu.
 488
 489 ---------------------------------------------------------------------------- ***
 490
 491 Unicode 8.0 update for ICU 56
 492
 493 * Command-line environment setup
 494
 495 ICU_ROOT=~/svn.icu/trunk
 496 ICU_SRC_DIR=$ICU_ROOT/src
 497 ICUDT=icudt56b
 498 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 499 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 500 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 501
 502 http://www.unicode.org/review/pri297/  -- beta review
 503 http://www.unicode.org/reports/uax-proposed-updates.html
 504 http://unicode.org/versions/beta-8.0.0.html
 505 http://www.unicode.org/versions/Unicode8.0.0/
 506 http://www.unicode.org/reports/tr44/tr44-15.html
 507
 508 *** ICU Trac
 509
 510 - ticket:11574: Unicode 8
 511 - C++ branches/markus/uni80 at r37351 from trunk at r37343
 512 - Java branches/markus/uni80 at r37352 from trunk at r37338
 513
 514 *** CLDR Trac
 515
 516 - cldrbug 8311: UCA 8
 517 - branches/markus/uni80 at r11518 from trunk at r11517
 518
 519 - cldrbug 8109: Unicode 8.0 script metadata
 520 - cldrbug 8418: Updated segmentation for Unicode 8.0
 521
 522 *** Unicode version numbers
 523 - makedata.mak
 524 - uchar.h
 525 - com.ibm.icu.util.VersionInfo
 526 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 527
 528 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
 529   so that the makefiles see the new version number.
 530
 531 *** data files & enums & parser code
 532
 533 * file preparation
 534
 535 - download UCD & IDNA files
 536 - make sure that the Unicode data folder passed into preparseucd.py
 537   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 538 - only for manual diffs: remove version suffixes from the file names
 539   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
 540   (see https://sites.google.com/site/unicodetools/inputdata)
 541 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 542 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 543 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 544
 545 - also: from http://unicode.org/Public/security/8.0.0/ download new
 546   confusables.txt & confusablesWholeScript.txt
 547   and copy to $UNIDATA
 548     ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
 549     ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
 550
 551 * initial preparseucd.py changes
 552 - remove new Unicode scripts from the
 553   only-in-ISO-15924 list according to the error message:
 554     ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
 555     from _scripts_only_in_iso15924
 556   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 557       and in com.ibm.icu.dev.test.lang.TestUScript.java
 558 - property and file name change:
 559     IndicMatraCategory -> IndicPositionalCategory
  560 - UnicodeData.txt unusual numeric values (unreduced fractions such as 2/12)
 561     109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
 562     109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
 563     109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
 564     109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
 565     109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
 566     109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
 567     109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
 568     109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
 569     109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
 570     109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
  571   -> change preparseucd.py to map them to fractions in lowest terms (e.g., 2/12 -> 1/6)
  572      which are listed in DerivedNumericValues.txt;
  573      this keeps the storage in the data file simple
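
Mapping the Meroitic twelfths from UnicodeData.txt to fractions in lowest terms is plain fraction reduction. The following Python sketch shows the idea with the standard fractions module; it is not the code in preparseucd.py, and the function name is an assumption for the example.

```python
from fractions import Fraction

def reduce_numeric_value(nv):
    """Return a numeric value string in lowest terms, e.g. '2/12' -> '1/6'.

    Hypothetical helper; strings without a slash are returned unchanged.
    """
    if "/" not in nv:
        return nv
    num, den = nv.split("/")
    f = Fraction(int(num), int(den))  # Fraction reduces automatically
    if f.denominator == 1:
        return str(f.numerator)
    return "%d/%d" % (f.numerator, f.denominator)

if __name__ == "__main__":
    for n in range(1, 11):
        raw = "%d/12" % n
        print(raw, "->", reduce_numeric_value(raw))
```

Values such as 1/12, 5/12 and 7/12 are already in lowest terms and pass through unchanged.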
 574
 575 * PropertyValueAliases.txt changes
 576 - 10 new Block (blk) values:
 577     blk; Ahom                             ; Ahom
 578     blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
 579     blk; Cherokee_Sup                     ; Cherokee_Supplement
 580     blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
 581     blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
 582     blk; Hatran                           ; Hatran
 583     blk; Multani                          ; Multani
 584     blk; Old_Hungarian                    ; Old_Hungarian
 585     blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
 586     blk; Sutton_SignWriting               ; Sutton_SignWriting
 587   -> add to uchar.h
 588     use long property names for enum constants
 589   -> add to UCharacter.UnicodeBlock IDs
 590     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 591             replace  public static final int \1_ID = \2; \3
 592   -> add to UCharacter.UnicodeBlock objects
 593     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 594             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 595 - 6 new Script (sc) values:
 596     sc ; Ahom                             ; Ahom
 597     sc ; Hatr                             ; Hatran
 598     sc ; Hluw                             ; Anatolian_Hieroglyphs
 599     sc ; Hung                             ; Old_Hungarian
 600     sc ; Mult                             ; Multani
 601     sc ; Sgnw                             ; SignWriting
 602   -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
 603
 604 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 605     (not strictly necessary for NOT_ENCODED scripts)
 606   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
 607
 608 * generate normalization data files
 609   cd $ICU_ROOT/dbg
 610   bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
 611   bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 612   bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 613   bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 614   bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
 615
 616 * build ICU (make install)
 617   so that the tools build can pick up the new definitions from the installed header files.
 618
 619   $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
 620
 621 * build Unicode tools using CMake+make
 622
 623 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
 624
 625   # Location (--prefix) of where ICU was installed.
 626   set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
 627   # Location of the ICU source tree.
 628   set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
 629
 630   ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
 631   ~/svn.icutools/trunk/dbg/unicode/c$ make
 632
 633 * generate core properties data files
 634 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
 635 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
 636 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
 637 - rebuild ICU (make install) & tools
 638 - run genuca again (see step above) so that it picks up the new nfc.nrm
 639 - rebuild ICU (make install) & tools
 640
 641 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 642   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 643 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 644 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
 645 - nothing new in 8.0, no test file to update
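
  The grep step above can be sketched in Python. This is only a sketch: it
  assumes the IdnaMappingTable.txt layout of semicolon-separated fields with
  '#' comments, and the function name and sample lines are illustrative,
  not part of the ICU tools.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges above U+007F with status disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            # Clip ranges that begin in ASCII so only the non-ASCII part is reported.
            found.append((max(first, 0x80), last))
    return found


# Illustrative sample lines (assumed layout, not copied from the data file).
sample = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid    # ASCII range, ignored",
    "00DF          ; deviation ; 0073 0073    # not the status we look for",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
]
for first, last in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (first, last))
```

  Any code point reported here that is not already covered by uts46test.cpp
  and UTS46Test.java is a candidate for a new test case.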
 646
 647 * run & fix ICU4C tests
 648 - bad Cherokee case folding due to difference in fallbacks:
 649   UCD case folding falls back to no mapping,
 650   ICU runtime case folding falls back to lowercasing;
 651   fixed casepropsbuilder.cpp to generate scf mappings to self
 652   when there is an slc mapping but no scf
 653 - Andy handles RBBI & spoof check test failures
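
  The Cherokee case folding item above can be illustrated with a small Python
  model. It is a sketch under stated assumptions: the dictionaries stand in
  for the scf (simple case folding) and slc (simple lowercase) data, the
  function names are hypothetical, and the single Cherokee pair is used only
  as an example.

```python
def ucd_fold(c, scf):
    """UCD semantics: no scf entry means the character folds to itself."""
    return scf.get(c, c)


def runtime_fold(c, scf, slc):
    """Runtime semantics described above: no scf entry falls back to lowercasing."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)


def add_self_mappings(scf, slc):
    """Builder fix: add an scf mapping to self where slc exists but scf does not."""
    fixed = dict(scf)
    for c in slc:
        if c not in fixed:
            fixed[c] = c
    return fixed


# Example pair: U+13A0 lowercases to U+AB70, while U+AB70 case-folds to U+13A0.
slc = {0x13A0: 0xAB70}
scf = {0xAB70: 0x13A0}

print(hex(runtime_fold(0x13A0, scf, slc)))                          # before the fix
print(hex(runtime_fold(0x13A0, add_self_mappings(scf, slc), slc)))  # after the fix
```

  With the explicit self mapping, the runtime lookup agrees with the UCD
  result without changing the runtime fallback itself.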
 654
 655 * collation: CLDR collation root, UCA DUCET
 656
 657 - UCA DUCET goes into Mark's Unicode tools, see
 658   https://sites.google.com/site/unicodetools/home#TOC-UCA
 659 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
 660 - cd (CLDR UCA branch)/common/uca/
 661 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 662   cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
 663 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 664     cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
 665     (note: the copy below drops the underscore before "Rules" in the file name)
 666     cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 667 - restore TODO diffs in UCARules.txt
 668     meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 669 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 670   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 671   from the CLDR root files (..._CLDR_..._SHORT.txt)
 672     cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
 673     cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
 674     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
 675 - if CLDR common/uca/unihan-index.txt changes, then update
 676   CLDR common/collation/root.xml <collation type="private-unihan">
 677   and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
 678 - run genuca, see command line above;
 679   deal with
 680     Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
 681         (add the character to genuca.cpp sampleCharsToScripts[])
 682   + look up the script for the new sample characters
 683     (e.g., in FractionalUCA.txt)
 684   + *add* mappings to sampleCharsToScripts[], do not replace them
 685     (in case the script sample characters flip-flop)
 686   + insert new scripts in DUCET script order, see the top_byte table
 687     at the beginning of FractionalUCA.txt
 688 - rebuild ICU4C
 689
 690 * run & fix ICU4C tests, now with new CLDR collation root data
 691 - run all tests with the collation test data *_SHORT.txt or the full files
 692   (the full ones have comments, useful for debugging)
 693 - note on intltest: if collate/UCAConformanceTest fails, then
 694   utility/MultithreadTest/TestCollators will fail as well;
 695   fix the conformance test before looking into the multi-thread test
 696 - fixed bug in CollationWeights::getWeightRanges()
 697   exposed by new data and CollationTest::TestRootElements
 698
 699 * update Java data files
 700 - refresh only the UCD/UCA-related and derived files, to be safe
 701 - see (ICU4C)/source/data/icu4j-readme.txt
 702 - mkdir /tmp/icu4j
 703 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 704   output:
 705     ...
 706     Unicode .icu files built to ./out/build/icudt56l
 707     echo timestamp > uni-core-data
 708     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
 709     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
 710     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
 711     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
 712     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
 713     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
 714     mkdir -p /tmp/icu4j/main/shared/data
 715     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 716     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
 717     mkdir -p /tmp/icu4j/main/shared/data
 718     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 719     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
 720 - copy the big-endian Unicode data files to another location,
 721   separate from the other data files,
 722   and then refresh ICU4J
 723     cd ~/svn.icu/trunk/dbg/data/out/icu4j
 724     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 725     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 726     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 727     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 728     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 729     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 730     cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 731     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 732     jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 733
 734 * When refreshing all of ICU4J data from ICU4C
 735 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 736 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 737 or
 738 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 739
 740 * update CollationFCD.java
 741   + copy & paste the initializers of lcccIndex[] etc. from
 742     ICU4C/source/i18n/collationfcd.cpp to
 743     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
 744
 745 * refresh Java test .txt files
 746 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 747     cd $ICU_SRC_DIR/source/data/unidata
 748     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 749     cd ../../test/testdata
 750     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 751     cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 752
 753 * run & fix ICU4J tests
 754
 755 *** LayoutEngine script information
 756
 757 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
 758   because the layout engine was deprecated in ICU 54.
 759   Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
 760   to write lines that we used to add manually.
 761
 762 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 763   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 764   in the working directory.
 765
 766   (It also generates ScriptRunData.cpp, which is no longer needed.)
 767
 768   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
 769   (a plain text file)
 770   which maps ICU versions to the numbers of script/language constants
 771   that were added then.
 772   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
 773
 774   The generated files have a current copyright date and "@deprecated" statement.
 775
 776 * Review changes, fix Java tool if necessary, and copy to ICU4C
 777   cd ~/svn.icu4j/trunk/src
 778   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 779   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
 780   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
 781
 782 *** API additions
 783 - send notice to icu-design about new born-@stable API (enum constants etc.)
 784
 785 *** merge the Unicode update branches back onto the trunk
 786 - do not merge the icudata.jar and testdata.jar,
 787   instead rebuild them from merged & tested ICU4C
 788 - make sure that changes to Unicode tools & ICU tools are checked in
 789   http://www.unicode.org/utility/trac/log/trunk/unicodetools
 790   http://bugs.icu-project.org/trac/log/tools/trunk
 791
 792 ---------------------------------------------------------------------------- ***
 793
 794 Unicode 7.0 update for ICU 54
 795
 796 http://www.unicode.org/review/pri271/  -- beta review
 797 http://www.unicode.org/reports/uax-proposed-updates.html
 798 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
 799 http://www.unicode.org/reports/tr44/tr44-13.html
 800
 801 *** ICU Trac
 802
 803 - ticket 10821: Unicode 7.0, UCA 7.0
 804 - C++ branches/markus/uni70 at r35584 from trunk at r35580
 805 - Java branches/markus/uni70 at r35587 from trunk at r35545
 806
 807 *** CLDR Trac
 808
 809 - ticket 7195: UCA 7.0 CLDR root collation
 810 - branches/markus/uni70 at r10062 from trunk at r10061
 811
 812 - ticket 6762: script metadata for Unicode 7.0 new scripts
 813
 814 *** Unicode version numbers
 815 - makedata.mak
 816 - uchar.h
 817 - com.ibm.icu.util.VersionInfo
 818 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 819
 820 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
 821   so that the makefiles see the new version number.
 822
 823 *** data files & enums & parser code
 824
 825 * file preparation
 826
 827 - download UCD & IDNA files
 828 - make sure that the Unicode data folder passed into preparseucd.py
 829   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 830 - only for manual diffs: remove version suffixes from the file names
 831   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
 832   (see https://sites.google.com/site/unicodetools/inputdata)
 833 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 834 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 835 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 836 - Restore TODO diffs in source/data/unidata/UCARules.txt
 837     cd $ICU_SRC_DIR
 838     meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
 839 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
 840
 841 - also: from http://unicode.org/Public/security/7.0.0/ download new
 842   confusables.txt & confusablesWholeScript.txt
 843   and copy to $ICU_ROOT/src/source/data/unidata/
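
  For the "remove version suffixes" step above, a minimal Python sketch of the
  renaming idea follows. It is not the real desuffixucd.py, which may work
  differently; the suffix pattern (-major.minor.update plus an optional dNN
  draft number) and the function name are assumptions.

```python
import re

# Assumed suffix shape, e.g. "-7.0.0" or "-7.0.0d14", directly before the extension.
_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[^.]+$)")


def strip_version_suffix(name):
    """Return the file name without a version suffix; other names are unchanged."""
    return _SUFFIX.sub("", name)


for name in ["UnicodeData-7.0.0d14.txt", "Unihan-7.0.0.zip", "BidiTest.txt"]:
    print(name, "->", strip_version_suffix(name))
```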
 844
 845 * initial preparseucd.py changes
 846 - remove new Unicode scripts from the
 847   only-in-ISO-15924 list according to the error message:
 848     ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
 849                         'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
 850                         'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
 851     from _scripts_only_in_iso15924
 852   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 853       and in com.ibm.icu.dev.test.lang.TestUScript.java
 854 - NamesList.txt now has a heading with a non-ASCII character
 855   + keep ppucd.txt in platform charset, rather than changing tool/test parsers
 856   + escape non-ASCII characters in heading comments
 857 - the script gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
 858   + get the copyright from the first file whose copyright line contains the current year
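
  The last two preparseucd.py items above can be sketched as follows. This is
  an illustration only: the function names are hypothetical, the \uXXXX and
  \UXXXXXXXX escape format is an assumption about what the tool writes, and
  the sample copyright lines are made up.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with an ASCII-only escape (assumed format)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def pick_copyright(candidates, year):
    """Return the first candidate line that mentions the given year, else None."""
    for line in candidates:
        if str(year) in line:
            return line
    return None


lines = [
    "# Copyright (c) 1991-2013 Unicode, Inc.",  # stale file (sample text)
    "# Copyright (c) 1991-2014 Unicode, Inc.",  # current file (sample text)
]
print(pick_copyright(lines, 2014))
print(escape_non_ascii("caf\u00e9"))
```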
 859
 860 * PropertyValueAliases.txt changes
 861 - 32 new Block (blk) values:
 862     blk; Bassa_Vah                        ; Bassa_Vah
 863     blk; Caucasian_Albanian               ; Caucasian_Albanian
 864     blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
 865     blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
 866     blk; Duployan                         ; Duployan
 867     blk; Elbasan                          ; Elbasan
 868     blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
 869     blk; Grantha                          ; Grantha
 870     blk; Khojki                           ; Khojki
 871     blk; Khudawadi                        ; Khudawadi
 872     blk; Latin_Ext_E                      ; Latin_Extended_E
 873     blk; Linear_A                         ; Linear_A
 874     blk; Mahajani                         ; Mahajani
 875     blk; Manichaean                       ; Manichaean
 876     blk; Mende_Kikakui                    ; Mende_Kikakui
 877     blk; Modi                             ; Modi
 878     blk; Mro                              ; Mro
 879     blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
 880     blk; Nabataean                        ; Nabataean
 881     blk; Old_North_Arabian                ; Old_North_Arabian
 882     blk; Old_Permic                       ; Old_Permic
 883     blk; Ornamental_Dingbats              ; Ornamental_Dingbats
 884     blk; Pahawh_Hmong                     ; Pahawh_Hmong
 885     blk; Palmyrene                        ; Palmyrene
 886     blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
 887     blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
 888     blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
 889     blk; Siddham                          ; Siddham
 890     blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
 891     blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
 892     blk; Tirhuta                          ; Tirhuta
 893     blk; Warang_Citi                      ; Warang_Citi
 894   -> add to uchar.h
 895     use long property names for enum constants
 896   -> add to UCharacter.UnicodeBlock IDs
 897     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 898             replace  public static final int \1_ID = \2; \3
 899   -> add to UCharacter.UnicodeBlock objects
 900     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 901             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 902 - 28 new Joining_Group (jg) values:
 903     jg ; Manichaean_Aleph                 ; Manichaean_Aleph
 904     jg ; Manichaean_Ayin                  ; Manichaean_Ayin
 905     jg ; Manichaean_Beth                  ; Manichaean_Beth
 906     jg ; Manichaean_Daleth                ; Manichaean_Daleth
 907     jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
 908     jg ; Manichaean_Five                  ; Manichaean_Five
 909     jg ; Manichaean_Gimel                 ; Manichaean_Gimel
 910     jg ; Manichaean_Heth                  ; Manichaean_Heth
 911     jg ; Manichaean_Hundred               ; Manichaean_Hundred
 912     jg ; Manichaean_Kaph                  ; Manichaean_Kaph
 913     jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
 914     jg ; Manichaean_Mem                   ; Manichaean_Mem
 915     jg ; Manichaean_Nun                   ; Manichaean_Nun
 916     jg ; Manichaean_One                   ; Manichaean_One
 917     jg ; Manichaean_Pe                    ; Manichaean_Pe
 918     jg ; Manichaean_Qoph                  ; Manichaean_Qoph
 919     jg ; Manichaean_Resh                  ; Manichaean_Resh
 920     jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
 921     jg ; Manichaean_Samekh                ; Manichaean_Samekh
 922     jg ; Manichaean_Taw                   ; Manichaean_Taw
 923     jg ; Manichaean_Ten                   ; Manichaean_Ten
 924     jg ; Manichaean_Teth                  ; Manichaean_Teth
 925     jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
 926     jg ; Manichaean_Twenty                ; Manichaean_Twenty
 927     jg ; Manichaean_Waw                   ; Manichaean_Waw
 928     jg ; Manichaean_Yodh                  ; Manichaean_Yodh
 929     jg ; Manichaean_Zayin                 ; Manichaean_Zayin
 930     jg ; Straight_Waw                     ; Straight_Waw
 931   -> uchar.h & UCharacter.JoiningGroup
 932 - 23 new Script (sc) values:
 933     sc ; Aghb                             ; Caucasian_Albanian
 934     sc ; Bass                             ; Bassa_Vah
 935     sc ; Dupl                             ; Duployan
 936     sc ; Elba                             ; Elbasan
 937     sc ; Gran                             ; Grantha
 938     sc ; Hmng                             ; Pahawh_Hmong
 939     sc ; Khoj                             ; Khojki
 940     sc ; Lina                             ; Linear_A
 941     sc ; Mahj                             ; Mahajani
 942     sc ; Mani                             ; Manichaean
 943     sc ; Mend                             ; Mende_Kikakui
 944     sc ; Modi                             ; Modi
 945     sc ; Mroo                             ; Mro
 946     sc ; Narb                             ; Old_North_Arabian
 947     sc ; Nbat                             ; Nabataean
 948     sc ; Palm                             ; Palmyrene
 949     sc ; Pauc                             ; Pau_Cin_Hau
 950     sc ; Perm                             ; Old_Permic
 951     sc ; Phlp                             ; Psalter_Pahlavi
 952     sc ; Sidd                             ; Siddham
 953     sc ; Sind                             ; Khudawadi
 954     sc ; Tirh                             ; Tirhuta
 955     sc ; Wara                             ; Warang_Citi
 956   -> uscript.h (many were added before)
 957     comment "Mende Kikakui" for USCRIPT_MENDE
 958     add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
 959   -> com.ibm.icu.lang.UScript
 960     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 961     replace  public static final int \1 = \2; \3
 962 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 963   (added 2012-11-01)
 964     Ahom        338     Ahom
 965     Hatr        127     Hatran
 966     Mult        323     Multani
 967   (added 2013-10-12)
 968     Modi        324     Modi
 969     Pauc        263     Pau Cin Hau
 970     Sidd        302     Siddham
 971   -> uscript.h (some overlap with additions from Unicode)
 972   -> com.ibm.icu.lang.UScript
 973     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 974     replace  public static final int \1 = \2; \3
 975   -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
 976   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
 977       and in com.ibm.icu.dev.test.lang.TestUScript.java
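
  The find/replace patterns listed above for turning C enum lines into Java
  constants can also be applied outside Eclipse. The Python sketch below
  reuses those three patterns verbatim; the function names and the sample
  input lines (including their numeric values) are made up for illustration.

```python
import re

# Patterns copied from the find/replace notes above.
BLOCK_ID = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
BLOCK_OBJ = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
SCRIPT = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def block_id_line(c_line):
    """C block enum line -> Java UnicodeBlock ID constant."""
    return BLOCK_ID.sub(r"public static final int \1_ID = \2; \3", c_line)


def block_object_line(c_line):
    """C block enum line -> Java UnicodeBlock object declaration."""
    return BLOCK_OBJ.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        c_line)


def script_line(c_line):
    """C script enum line -> Java UScript constant."""
    return SCRIPT.sub(r"public static final int \1 = \2; \3", c_line)


# Hypothetical input lines, shaped like the enum lines the patterns expect.
print(block_id_line("    UBLOCK_SAMPLE_BLOCK = 999, /*[12345]*/"))
print(block_object_line("    UBLOCK_SAMPLE_BLOCK = 999, /*[12345]*/"))
print(script_line("    USCRIPT_SAMPLE = 999,/* Smpl */"))
```

  Lines that do not match a pattern pass through unchanged, so the output
  still needs the same manual review as the Eclipse result.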
 978
 979 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 980     (not strictly necessary for NOT_ENCODED scripts)
 981   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
 982
 983 * generate normalization data files
 984 - cd $ICU_ROOT/dbg
 985 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 986 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 987 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
 988 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
 989 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 990 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 991 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 992 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
 993
 994 * build ICU (make install)
 995   so that the tools build can pick up the new definitions from the installed header files.
 996
 997 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
 998
 999 * build Unicode tools using CMake+make
1000
1001 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1002
1003 # Location (--prefix) of where ICU was installed.
1004 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
1005 # Location of the ICU source tree.
1006 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
1007
1008 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1009 ~/svn.icutools/trunk/dbg/unicode/c$ make
1010
1011 * genprops work
1012 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
1013   + add second array of Joining_Group values for at most 10800..10FFF
1014     icutools: unicode/c/genprops/bidipropsbuilder.cpp
1015     icu: source/common/ubidi_props.h/.c/_data.h
1016     icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
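
  The idea of a second Joining_Group array can be sketched as a two-range
  lookup. This Python model is an illustration, not the ubidi_props code:
  the class and field names, the first range start and all stored values
  are placeholders.

```python
NO_JOINING_GROUP = 0


class JoiningGroups:
    """Joining_Group lookup backed by two dense arrays for two code point ranges."""

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, values
        self.start2, self.values2 = start2, values2

    def get(self, c):
        # First range: the original array.
        if self.start <= c < self.start + len(self.values):
            return self.values[c - self.start]
        # Second range: the added array for the new code point range.
        if self.start2 <= c < self.start2 + len(self.values2):
            return self.values2[c - self.start2]
        return NO_JOINING_GROUP


# Placeholder data: three values per range, numbers chosen arbitrarily.
jg = JoiningGroups(0x0620, [11, 12, 13], 0x10AC0, [21, 22, 23])
print(jg.get(0x0621), jg.get(0x10AC2), jg.get(0x0041))
```

  Two small arrays avoid one large, mostly empty array spanning the gap
  between the two ranges.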
1017
1018 * generate core properties data files
1019 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1020 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
1021 - rebuild ICU (make install) & tools
1022 - run genuca again (see step above) so that it picks up the new nfc.nrm
1023 - rebuild ICU (make install) & tools
1024
1025 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1026   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1027 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1028 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
1029 - nothing new in 7.0, no test file to update
1030
1031 * run & fix ICU4C tests
1032
1033 * update Java data files
1034 - refresh just the UCD-related files, just to be safe
1035 - see (ICU4C)/source/data/icu4j-readme.txt
1036 - mkdir /tmp/icu4j
1037 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1038   output:
1039     ...
1040     Unicode .icu files built to ./out/build/icudt53l
1041     echo timestamp > uni-core-data
1042     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1043     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
1044     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1045     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1046     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
1047     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
1048     mkdir -p /tmp/icu4j/main/shared/data
1049     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1050     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
1051     mkdir -p /tmp/icu4j/main/shared/data
1052     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1053     make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
1054 - copy the big-endian Unicode data files to another location,
1055   separate from the other data files
1056     ICUDT=icudt54b
1057     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1058     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1059     cd ~/svn.icu/uni70/dbg/data/out/icu4j
1060     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1061     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1062     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1063     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1064     cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1065     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1066 - refresh ICU4J
1067     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1068
1069 * update CollationFCD.java
1070   + copy & paste the initializers of lcccIndex[] etc. from
1071     ICU4C/source/i18n/collationfcd.cpp to
1072     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1073
1074 * refresh Java test .txt files
1075 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1076     cd $ICU_SRC_DIR/source/data/unidata
1077     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1078     cd ../../test/testdata
1079     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1080     cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1081
1082 * UCA
1083
1084 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
1085 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
1086 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
1087 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
1088 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
1089 - review data; compare files, use blankweights.sed or similar
1090   ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
1091 - cd ~/svn.unitools/Generated/uca/7.0.0/
1092 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1093   cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1094 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1095     (note: the copy below drops the underscore before "Rules" in the file name)
1096     cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1097 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1098   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1099   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1100     cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1101     cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1102     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1103 - run genuca, see command line above
1104 - rebuild ICU4C
1105 - refresh ICU4J collation data:
1106   (a subset of the instructions above for the properties data refresh, except that this copies all of coll/*)
1107     ICUDT=icudt54b
1108     ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1109     ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1110     ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1111     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1112 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1113 - note on intltest: if collate/UCAConformanceTest fails, then
1114   utility/MultithreadTest/TestCollators will fail as well;
1115   fix the conformance test before looking into the multi-thread test
1116 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
1117 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
1118   ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
1119
1120 * When refreshing all of ICU4J data from ICU4C
1121 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1122 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1123 or
1124 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1125
1126 * run & fix ICU4J tests
1127
1128 *** LayoutEngine script information
1129
1130 (For details see the Unicode 5.2 change log below.)
1131
1132 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1133   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1134   in the working directory.
1135   (It also generates ScriptRunData.cpp, which is no longer needed.)
1136
1137   The generated files have a current copyright date and "@stable" statement.
1138   ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
1139   for "born stable" Unicode API constants, and to stop parsing ICU version numbers
1140   which may not contain dots any more.
1141
1142 - diff current <icu>/source/layout files vs. generated ones
1143     ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1144   review and manually merge desired changes;
1145   fix gratuitous changes, incorrect @draft/@stable and missing aliases;
1146   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1147 - if you just copy the above files, then
1148   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1149   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1150
1151 *** API additions
1152 - send notice to icu-design about new born-@stable API (enum constants etc.)
1153
1154 *** merge the Unicode update branches back onto the trunk
1155 - do not merge the icudata.jar and testdata.jar,
1156   instead rebuild them from merged & tested ICU4C
1157
1158 ---------------------------------------------------------------------------- ***
1159
1160 Unicode 6.3 update
1161
1162 http://www.unicode.org/review/pri249/  -- beta review
1163 http://www.unicode.org/reports/uax-proposed-updates.html
1164 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
1165 http://www.unicode.org/reports/tr44/tr44-11.html
1166
1167 *** ICU Trac
1168
1169 - ticket 10128: update ICU to Unicode 6.3 beta
1170 - ticket 10168: update ICU to Unicode 6.3 final
1171 - C++ branches/markus/uni63 at r33552 from trunk at r33551
1172 - Java branches/markus/uni63 at r33550 from trunk at r33553
1173
1174 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
1175
1176 *** Unicode version numbers
1177 - makedata.mak
1178 - uchar.h
1179   (configure.in & configure have been modified to extract the version from uchar.h)
1180 - com.ibm.icu.util.VersionInfo
1181 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1182
1183 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1184   so that the makefiles see the new version number.
1185
1186 *** data files & enums & parser code
1187
1188 * file preparation
1189
1190 - download UCD, UCA & IDNA files
1191 - make sure that the Unicode data folder passed into preparseucd.py
1192   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1193 - modify preparseucd.py:
1194   parse new file BidiBrackets.txt
1195   with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
1196 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
1197 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1198 - Check test file diffs for previously commented-out, known-failing data lines;
1199   probably need to keep those commented out.
1200
1201 * PropertyAliases.txt changes
1202 - 1 new Enumerated Property
1203   bpt                      ; Bidi_Paired_Bracket_Type
1204   -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
1205   -> ubidi_props.h & .c & UBiDiProps.java
1206   -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
1207   -> uprops.cpp
1208   -> change ubidi.icu format version from 2.0 to 2.1
1209 - 1 new Miscellaneous Property
1210   bpb                      ; Bidi_Paired_Bracket
1211   -> uchar.h & UProperty.java
1212   -> ppucd.h & .cpp
1213
1214 * PropertyValueAliases.txt changes
1215 - 3 Bidi_Paired_Bracket_Type (bpt) values:
1216   bpt; c                                ; Close
1217   bpt; n                                ; None
1218   bpt; o                                ; Open
1219   -> uchar.h & UCharacter.BidiPairedBracketType
1220   -> ubidi_props.h & .c & UBiDiProps.java
1221   -> change ubidi.icu format version from 2.0 to 2.1
1222 - 4 new Bidi_Class (bc) values:
1223   bc ; FSI                              ; First_Strong_Isolate
1224   bc ; LRI                              ; Left_To_Right_Isolate
1225   bc ; RLI                              ; Right_To_Left_Isolate
1226   bc ; PDI                              ; Pop_Directional_Isolate
1227   -> uchar.h & UCharacterEnums.ECharacterDirection
1228   -> until the bidi code gets updated,
1229      Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
1230 - 3 new Word_Break (WB) values:
1231   WB ; HL                               ; Hebrew_Letter
1232   WB ; SQ                               ; Single_Quote
1233   WB ; DQ                               ; Double_Quote
1234   -> uchar.h & UCharacter.WordBreak
1235   -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
1236 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1237   (added 2012-10-16)
1238   Aghb  239     Caucasian Albanian
1239   Mahj  314     Mahajani
1240   -> uscript.h
1241   -> com.ibm.icu.lang.UScript
1242     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1243     replace  public static final int \1 = \2;\3
1244   -> preparseucd.py _scripts_only_in_iso15924
1245   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1246       and in com.ibm.icu.dev.test.lang.TestUScript.java
1247   -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1248      (not strictly necessary for NOT_ENCODED scripts)
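
The find/replace pair for com.ibm.icu.lang.UScript above can also be applied with a short
Python script instead of an editor. This is only a sketch: the pattern and replacement are
transcribed from the notes above, while the function name and the sample input line
(including its numeric value) are made up for illustration.

```python
import re

# Pattern and replacement transcribed from the find/replace pair above;
# \1, \2, \3 are written as \g<1>, \g<2>, \g<3> for Python.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \g<1> = \g<2>;\g<3>"


def c_enum_to_java(line: str) -> str:
    """Rewrite one uscript.h enum line as a UScript.java constant (hypothetical helper)."""
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Illustrative input line; the value and the comment are assumptions.
sample = "      USCRIPT_MAHAJANI = 160,/* Mahj */"
print(c_enum_to_java(sample))
```

Lines that do not match the pattern pass through unchanged, so the function can be run
over a whole copied enum block.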
1249
1250 * generate normalization data files
1251 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
1252 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
1253 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
1254 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1255 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1256 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1257 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
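
The four gennorm2 invocations above differ only in the output file and the list of input
files. A minimal sketch that builds the same argument lists from a table; the function name
and the example paths are assumptions, and only the -o and -s options shown above are used.

```python
from pathlib import PurePosixPath

# Output file -> input files, as listed in the four commands above.
NORM2_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata):
    """Return one argv list per .nrm file (hypothetical helper; nothing is executed)."""
    out_dir = PurePosixPath(src_data_in)
    norm2_dir = PurePosixPath(unidata) / "norm2"
    return [
        ["bin/gennorm2", "-o", str(out_dir / name), "-s", str(norm2_dir), *inputs]
        for name, inputs in NORM2_INPUTS.items()
    ]


# Example paths are placeholders for $SRC_DATA_IN and $UNIDATA.
for argv in gennorm2_commands("src/source/data/in", "src/source/data/unidata"):
    print(" ".join(argv))
```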
1258
1259 * build ICU (make install)
1260   so that the tools build can pick up the new definitions from the installed header files.
1261
1262 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1263
1264 * build Unicode tools using CMake+make
1265
1266 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1267
1268 # Location (--prefix) of where ICU was installed.
1269 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
1270 # Location of the ICU source tree.
1271 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
1272
1273 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1274 ~/svn.icutools/trunk/dbg/unicode/c$ make
1275
1276 * generate core properties data files
1277 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
1278 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
1279 - rebuild ICU (make install) & tools
1280 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1281 - rebuild ICU (make install) & tools
1282
1283 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1284   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1285 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1286 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
1287 - nothing new in 6.3, no test file to update
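
The grep step above can be sketched in Python so that ASCII-only entries are filtered out
automatically. This assumes the usual IdnaMappingTable.txt layout (code point or range,
then status, separated by semicolons, with # comments); the sample lines and function
names are illustrative, not taken from a particular data file version.

```python
def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into (start, end) code points."""
    first, _, last = field.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return start, end


def non_ascii_std3_valid(lines):
    """Return the non-ASCII parts of all disallowed_STD3_valid entries."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, end = parse_range(fields[0])
        if end > 0x7F:
            found.append((max(start, 0x80), end))
    return found


# Illustrative lines in the assumed file layout.
sample = [
    "# sample data",
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "00C0          ; mapped                 ; 00E0          # 1.1  LATIN CAPITAL LETTER A WITH GRAVE",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]
for start, end in non_ascii_std3_valid(sample):
    print(f"U+{start:04X}..U+{end:04X}")
```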
1288
1289 * update Java data files
1290 - refresh just the UCD-related files, just to be safe
1291 - see (ICU4C)/source/data/icu4j-readme.txt
1292 - mkdir /tmp/icu4j
1293 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1294   output:
1295     ...
1296     Unicode .icu files built to ./out/build/icudt52l
1297     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1298     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
1299     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1300     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1301     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
1302     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
1303     mkdir -p /tmp/icu4j/main/shared/data
1304     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1305     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
1306     mkdir -p /tmp/icu4j/main/shared/data
1307     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1308     make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
1309 - copy the big-endian Unicode data files to another location,
1310   separate from the other data files
1311     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1312     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1313     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1314     ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
1315     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1316     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1317     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1318 - refresh ICU4J
1319     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
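
The copy steps above (all top-level *.icu except cnvalias.icu, all *.nrm, coll/*.icu and
brkitr/*) can be sketched as one Python function. The function name, the rule table and the
file names in the demo are assumptions; the demo runs on a throwaway directory with empty
placeholder files, not on real ICU data.

```python
import shutil
import tempfile
from pathlib import Path

# (subfolder, glob pattern) pairs mirroring the cp commands above.
COPY_RULES = [("", "*.icu"), ("", "*.nrm"), ("coll", "*.icu"), ("brkitr", "*")]


def copy_unicode_data(src, dst):
    """Copy the Unicode-related data files from src to dst; return relative paths."""
    src, dst = Path(src), Path(dst)
    copied = []
    for subdir, pattern in COPY_RULES:
        (dst / subdir).mkdir(parents=True, exist_ok=True)
        for path in sorted((src / subdir).glob(pattern)):
            # The steps above copy cnvalias.icu and then remove it; skip it here instead.
            if not path.is_file() or (subdir == "" and path.name == "cnvalias.icu"):
                continue
            shutil.copy2(path, dst / subdir / path.name)
            copied.append(path.relative_to(src).as_posix())
    return copied


# Demo on a temporary tree; the file names are examples only.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "icudt52b"), Path(tmp, "copy", "icudt52b")
    for rel in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res",
                "coll/ucadata.icu", "coll/root.res", "brkitr/char.brk"]:
        target = src / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    print(copy_unicode_data(src, dst))
```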
1320
1321 * refresh Java test .txt files
1322 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1323
1324 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
1325
1326 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1327 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1328 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1329 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1330   (note: this removes the underscore before "Rules")
1331 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1332   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1333   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1334 - check test file diffs for previously commented-out, known-failing data lines;
1335   probably need to keep those commented out
1336 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1337 - run genuca, see command line above
1338 - rebuild ICU4C
1339 - refresh ICU4J collation data:
1340   (subset of instructions above for properties data refresh, except copies all coll/*)
1341     ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1342     ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1343     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1344     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1345 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1346 - note on intltest: if collate/UCAConformanceTest fails, then
1347   utility/MultithreadTest/TestCollators will fail as well;
1348   fix the conformance test before looking into the multi-thread test
1349
1350 * test ICU, fix test code where necessary
1351
1352 * When refreshing all of ICU4J data from ICU4C
1353 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1354 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1355 or
1356 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1357
1358 *** LayoutEngine script information
1359 - skipped for Unicode 6.3: no new scripts
1360
1361 *** merge the Unicode update branches back onto the trunk
1362 - do not merge the icudata.jar and testdata.jar,
1363   instead rebuild them from merged & tested ICU4C
1364
1365 ---------------------------------------------------------------------------- ***
1366
1367 Unicode 6.2 update
1368
1369 http://www.unicode.org/review/pri230/
1370 http://www.unicode.org/versions/beta-6.2.0.html
1371 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
1372 http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
1373 http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
1374 http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
1375 http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
1376 http://unicode.org/Public/idna/6.2.0/
1377
1378 *** ICU Trac
1379
1380 - ticket 9515: Unicode 6.2: final ICU update
1381
1382 - ticket 9514: UCA 6.2: fix UCARules.txt
1383
1384 - ticket 9437: update ICU to Unicode 6.2
1385 - C++ branches/markus/uni62 at r32050 from trunk at r32041
1386 - Java branches/markus/uni62 at r32068 from trunk at r32066
1387
1388 *** Unicode version numbers
1389 - makedata.mak
1390 - uchar.h
1391   (configure.in & configure have been modified to extract the version from uchar.h)
1392 - com.ibm.icu.util.VersionInfo
1393 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1394
1395 *** data files & enums & parser code
1396
1397 * file preparation
1398
1399 - download UCD, UCA & IDNA files
1400 - make sure that the Unicode data folder passed into preparseucd.py
1401   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1402 - modify preparseucd.py: NamesList.txt is now in UTF-8
1403 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
1404 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1405 - Check test file diffs for previously commented-out, known-failing data lines;
1406   probably need to keep those commented out.
1407
1408 * PropertyValueAliases.txt changes
1409 - 1 new Line_Break (lb) value:
1410   lb ; RI                               ; Regional_Indicator
1411   -> uchar.h & UCharacter.LineBreak
1412 - 1 new Word_Break (WB) value:
1413   WB ; RI                               ; Regional_Indicator
1414   -> uchar.h & UCharacter.WordBreak
1415 - 1 new Grapheme_Cluster_Break (GCB) value:
1416   GCB; RI                               ; Regional_Indicator
1417   -> uchar.h & UCharacter.GraphemeClusterBreak
1418
1419 * 3 new numeric values
1420   The new value -1 was really supposed to be NaN, but that would have required
1421   new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
1422   but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1423     cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1424     cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1425   The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1426     cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1427     cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1428   -> uprops.h, uchar.c & UCharacterProperty.java
1429   -> cucdtst.c & UCharacterTest.java
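
To illustrate the "fraction of -1/1" remark, here is a small sketch that reads the nv=
field from ppucd.txt-style lines like the ones quoted above and holds it as an exact
fraction. It does not reproduce the ICU encoding in corepropsbuilder.cpp; the function
names are made up, and the handling of properties without "=" is an assumption.

```python
from fractions import Fraction


def parse_ppucd_line(line):
    """Split a 'cp;XXXX;name=value;...' line into (code point, property dict)."""
    fields = line.strip().split(";")
    if len(fields) < 2 or fields[0] != "cp":
        raise ValueError("not a cp line: " + line)
    code_point = int(fields[1], 16)
    # Assumption: fields without '=' (e.g. binary properties) are ignored here.
    props = dict(f.split("=", 1) for f in fields[2:] if "=" in f)
    return code_point, props


def numeric_value(props):
    """Return nv as an exact Fraction, or None if the line has no nv field."""
    nv = props.get("nv")
    return None if nv is None else Fraction(nv)


for line in [
    "cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1",
    "cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000",
]:
    cp, props = parse_ppucd_line(line)
    value = numeric_value(props)
    print(f"U+{cp:04X} {value.numerator}/{value.denominator}")
```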
1430
1431 * generate normalization data files
1432 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1433 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1434 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1435 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1436 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1437 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1438 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1439
1440 * build ICU (make install)
1441   so that the tools build can pick up the new definitions from the installed header files.
1442 * build Unicode tools using CMake+make
1443
1444 * generate core properties data files
1445 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1446 - in initial bootstrapping, change the UCA version
1447   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1448 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1449 - rebuild ICU (make install) & tools
1450   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1451     check if the UCA version in FractionalUCA.txt matches the new Unicode version
1452     (see step above)
1453 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1454 - rebuild ICU (make install) & tools
1455
1456 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1457   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1458 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1459 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
1460 - nothing new in 6.2, no test file to update
1461
1462 * update Java data files
1463 - refresh just the UCD-related files, just to be safe
1464 - see (ICU4C)/source/data/icu4j-readme.txt
1465 - mkdir /tmp/icu4j
1466 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1467   output:
1468     ...
1469     Unicode .icu files built to ./out/build/icudt50l
1470     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1471     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1472     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1473     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1474     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1475     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1476     mkdir -p /tmp/icu4j/main/shared/data
1477     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1478     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1479     mkdir -p /tmp/icu4j/main/shared/data
1480     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1481     make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1482 - copy the big-endian Unicode data files to another location,
1483   separate from the other data files
1484     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1485     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1486     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1487     ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1488     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1489     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1490     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1491 - refresh ICU4J
1492     ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1493
1494 * refresh Java test .txt files
1495 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1496
1497 * UCA
1498
1499 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1500 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1501 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1502 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1503   (note: this removes the underscore before "Rules")
1504 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1505   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1506   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1507 - check test file diffs for previously commented-out, known-failing data lines;
1508   probably need to keep those commented out
1509 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1510 - run genuca, see command line above
1511 - rebuild ICU4C
1512 - refresh ICU4J collation data:
1513   (subset of instructions above for properties data refresh, except copies all coll/*)
1514     ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1515     ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1516     ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1517     ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1518 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1519 - note on intltest: if collate/UCAConformanceTest fails, then
1520   utility/MultithreadTest/TestCollators will fail as well;
1521   fix the conformance test before looking into the multi-thread test
1522
1523 * test ICU, fix test code where necessary
1524
1525 * When refreshing all of ICU4J data from ICU4C
1526 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1527 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1528 or
1529 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1530
1531 *** LayoutEngine script information
1532 - skipped for Unicode 6.2: no new scripts
1533
1534 *** merge the Unicode update branches back onto the trunk
1535 - do not merge the icudata.jar and testdata.jar,
1536   instead rebuild them from merged & tested ICU4C
1537
1538 ---------------------------------------------------------------------------- ***
1539
1540 Future Unicode update
1541
1542 Tools simplified since the Unicode 6.1 update. See
1543 - http://site.icu-project.org/design/props/ppucd
1544 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
1545
1546 * Unicode version numbers
1547 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
1548
1549 * file preparation
1550 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
1551 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
1552 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1553 - Check test file diffs for previously commented-out, known-failing data lines;
1554   probably need to keep those commented out.
1555
1556 * PropertyValueAliases.txt changes
1557 - Script codes that are in ISO 15924 but not in Unicode are now listed in
1558   preparseucd.py, in the _scripts_only_in_iso15924 variable.
1559   If there are new ISO codes, then add them.
1560   If Unicode adds some of them, then remove them from the .py variable.
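
The maintenance rule above amounts to a set update: add new ISO-only codes, drop codes that
Unicode now encodes. A sketch with illustrative data; the function name is hypothetical,
and the actual layout of the _scripts_only_in_iso15924 variable in preparseucd.py may differ.

```python
def update_iso_only_scripts(iso_only, new_iso_codes, unicode_scripts):
    """Return the script codes that are in ISO 15924 but not (yet) in Unicode."""
    return (set(iso_only) | set(new_iso_codes)) - set(unicode_scripts)


# Illustrative inputs only, not the real lists.
iso_only = {"Cakm", "Khoj"}
new_iso_codes = {"Aghb", "Mahj"}
unicode_scripts = {"Latn", "Cakm"}  # Cakm is now encoded, so it drops out
print(sorted(update_iso_only_scripts(iso_only, new_iso_codes, unicode_scripts)))
```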
1561
1562 * UnicodeData.txt changes
1563 - No more manual changes for CJK ranges for algorithmic names;
1564   those are now written to ppucd.txt and genprops reads them from there.
1565
1566 * generate core properties data files (makeprops.sh was deleted)
1567 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
1568
1569 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
1570 - it is now generated by preparseucd.py
1571
1572 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
1573 - it is now generated by preparseucd.py
1574 - make sure that the Unicode data folder passed into preparseucd.py
1575   includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1576   (can be in some subfolder)
1577
1578 * generate normalization data files
1579 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
1580 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
1581 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
1582 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1583 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1584 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1585 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1586
1587 * build ICU (make install)
1588 * build Unicode tools using CMake+make
1589
1590 * new way to call genuca (makeuca.sh was deleted)
1591 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
1592
1593 ---------------------------------------------------------------------------- ***
1594
1595 Unicode 6.1 update
1596
1597 *** ICU Trac
1598
1599 - ticket 8995 final update to Unicode 6.1
1600 - ticket 8994 regenerate source/layout/CanonData.cpp
1601
1602 - ticket 8961 support Unicode "Age" value *names*
1603 - ticket 8963 support multiple character name aliases & types
1604
1605 - ticket 8827 "update ICU to Unicode 6.1"
1606 - C++ branches/markus/uni61 at r30864 from trunk at r30843
1607 - Java branches/markus/uni61 at r30865 from trunk at r30863
1608
1609 *** Unicode version numbers
1610 - makedata.mak
1611 - uchar.h
1612   (configure.in & configure have been modified to extract the version from uchar.h)
1613 - com.ibm.icu.util.VersionInfo
1614 - icutools/unicode/makedefs.sh
1615   + also review & update other definitions in that file,
1616     e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
1617
1618 *** data files & enums & parser code
1619
1620 * file preparation
1621
1622 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
1623 - This prepares both unidata and testdata files in respective output subfolders.
1624 - Check test file diffs for previously commented-out, known-failing data lines;
1625   probably need to keep those commented out.
1626
1627 * PropertyValueAliases.txt changes
1628 - 11 new block names:
1629   Arabic_Extended_A
1630   Arabic_Mathematical_Alphabetic_Symbols
1631   Chakma
1632   Meetei_Mayek_Extensions
1633   Meroitic_Cursive
1634   Meroitic_Hieroglyphs
1635   Miao
1636   Sharada
1637   Sora_Sompeng
1638   Sundanese_Supplement
1639   Takri
1640   -> add to uchar.h
1641   -> add to UCharacter.UnicodeBlock IDs
1642     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1643             replace  public static final int \1_ID = \2; \3
1644   -> add to UCharacter.UnicodeBlock objects
1645     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1646             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647 - 1 new Joining_Group (jg) value:
1648   Rohingya_Yeh
1649   -> uchar.h & UCharacter.JoiningGroup
1650 - 2 new Line_Break (lb) values:
1651   CJ=Conditional_Japanese_Starter
1652   HL=Hebrew_Letter
1653   -> uchar.h & UCharacter.LineBreak
1654 - 7 new scripts:
1655   sc ; Cakm      ; Chakma
1656   sc ; Merc      ; Meroitic_Cursive
1657   sc ; Mero      ; Meroitic_Hieroglyphs
1658   sc ; Plrd      ; Miao
1659   sc ; Shrd      ; Sharada
1660   sc ; Sora      ; Sora_Sompeng
1661   sc ; Takr      ; Takri
1662   -> remove these from SyntheticPropertyValueAliases.txt
1663   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1664       and in com.ibm.icu.dev.test.lang.TestUScript.java
1665 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1666   (added 2011-06-21)
1667   Khoj        322     Khojki
1668   Tirh        326     Tirhuta
1669     and another one added 2011-12-09
1670   Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
1671   -> uscript.h
1672   -> com.ibm.icu.lang.UScript
1673     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1674     replace  public static final int \1 = \2;\3
1675   -> SyntheticPropertyValueAliases.txt
1676   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1677       and in com.ibm.icu.dev.test.lang.TestUScript.java
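
The two Eclipse find/replace pairs for UCharacter.UnicodeBlock above can be run outside
Eclipse as well. A sketch in Python: the patterns are transcribed from the notes above,
while the function names and the sample enum line (value and comment) are illustrative.

```python
import re

# Patterns and replacements transcribed from the Eclipse find/replace pairs above.
BLOCK_ID = (
    re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"),
    r"public static final int \g<1>_ID = \g<2>; \g<3>",
)
BLOCK_OBJECT = (
    re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"),
    r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
)


def to_block_id(line):
    """uchar.h enum line -> UnicodeBlock *_ID constant (hypothetical helper)."""
    pattern, replacement = BLOCK_ID
    return pattern.sub(replacement, line)


def to_block_object(line):
    """uchar.h enum line -> UnicodeBlock object declaration (hypothetical helper)."""
    pattern, replacement = BLOCK_OBJECT
    return pattern.sub(replacement, line)


# Illustrative input line; the value and the comment are assumptions.
sample = "UBLOCK_TAKRI = 220, /*[11680]*/"
print(to_block_id(sample))
print(to_block_object(sample))
```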
1678
1679 * UnicodeData.txt changes
1680 - the last Unihan code point changes from U+9FCB to U+9FCC
1681   search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
1682   + do change gennames.c
1683   + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
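
The case-insensitive search for 9FC[BC] suggested above can be sketched as follows. The
function name and the sample text are hypothetical; in practice the text would come from
the source files being checked.

```python
import re

# Regex from the note above, matched case-insensitively.
UNIHAN_END = re.compile(r"9FC[BC]", re.IGNORECASE)


def lines_mentioning_unihan_end(text):
    """Return (line number, line) pairs that mention 9FCB or 9FCC."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if UNIHAN_END.search(line)
    ]


# Hypothetical source snippet for demonstration.
sample = "start = 0x4E00;\nend = 0x9fcb;\nlimit = 0x9FCC;\nother = 0x9FA5;\n"
for number, line in lines_mentioning_unihan_end(sample):
    print(number, line)
```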
1684
1685 * DerivedBidiClass.txt changes
1686 - 2 new default-AL blocks:
1687 #     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
1688 #     Arabic Mathematical Alphabetic Symbols:
1689 #                       U+1EE00  - U+1EEFF  (was default-R)
1690 - 2 new default-R blocks:
1691 #     Meroitic Hieroglyphs:
1692 #                        U+10980 - U+1099F
1693 #     Meroitic Cursive:  U+109A0 - U+109FF
1694   -> should be picked up by the explicit data in the file
1695
1696 * NameAliases.txt changes
1697 - from
1698     # Each line has two fields
1699     # First field: Code point
1700     # Second field: Alias
1701 - to
1702     # Each line has three fields, as described here:
1703     #
1704     # First field:  Code point
1705     # Second field: Alias
1706     # Third field:  Type
1707 - Also, the file format previously allowed multiple aliases per code point, but only now does it
1708   actually provide several, even several of the same type. For example,
1709     FEFF;BYTE ORDER MARK;alternate
1710     FEFF;BOM;abbreviation
1711     FEFF;ZWNBSP;abbreviation
1712 - This breaks our gennames parser, unames.icu data structure, and API.
1713   Fix gennames to only pick up "correction" aliases.
1714   New ticket #8963 for further changes.
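
A sketch of the three-field parsing described above, keeping only "correction" aliases as
the gennames fix does. This is Python for illustration, not the gennames C code; the
function name is made up, and the sample combines the FEFF lines quoted above with one
correction line added as an example.

```python
def correction_aliases(lines):
    """Map code point -> list of aliases whose type is 'correction'."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code_point, alias, alias_type = (f.strip() for f in data.split(";"))
        if alias_type == "correction":
            result.setdefault(int(code_point, 16), []).append(alias)
    return result


sample = [
    "# First field:  Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",  # example line, not quoted in these notes
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
for cp, aliases in correction_aliases(sample).items():
    print(f"U+{cp:04X}", aliases)
```

Using a list per code point keeps the structure ready for several aliases of the same
type, which is what the new file format permits.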
1715
1716 * run genpname/preparse.pl (on Linux)
1717   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1718   + make sure that data.h is writable
1719   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1720   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1721
1722 * build ICU (make install)
1723   so that the tools build can pick up the new definitions from the installed header files.
1724 * build Unicode tools (at least genpname) using CMake+make
1725
1726 * run genpname
1727   (builds both pnames.icu and propname_data.h)
1728 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1729 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1730
1731 * build ICU (make install)
1732 * build Unicode tools using CMake+make
1733
1734 * update source/data/unidata/norm2/nfkc_cf.txt
1735 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1736
1737 * update source/data/unidata/norm2/uts46.txt
1738 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1739   to ~/svn.icu/tools/trunk/src/unicode/py
1740 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1741 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1742 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1743
1744 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1745   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1746 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1747 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
1748 - nothing new in 6.1, no test file to update
1749
1750 * generate core properties data files
1751 - in initial bootstrapping, change the UCA version
1752   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1753 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1754 - rebuild ICU & tools
1755   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1756     check if the UCA version in FractionalUCA.txt matches the new Unicode version
1757     (see step above)
1758 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1759   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1760 - rebuild ICU & tools
1761
1762 * update Java data files
1763 - refresh just the UCD-related files, just to be safe
1764 - see (ICU4C)/source/data/icu4j-readme.txt
1765 - mkdir /tmp/icu4j
1766 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1767   output:
1768     ...
1769     Unicode .icu files built to ./out/build/icudt49l
1770     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1771     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1772     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1773     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1774     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1775     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1776     mkdir -p /tmp/icu4j/main/shared/data
1777     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1778     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1779     mkdir -p /tmp/icu4j/main/shared/data
1780     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1781     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1782 - copy the big-endian Unicode data files to another location,
1783   separate from the other data files
1784     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1785     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1786     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1787     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1788     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1789     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1790     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1791 - refresh ICU4J
1792     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1793
1794 * refresh Java test .txt files
1795 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1796
1797 * test ICU so far, fix test code where necessary
1798 - temporarily ignore collation issues that look like UCA/UCD mismatches,
1799   until UCA data is updated
1800
1801 * UCA
1802
1803 - get output from Mark's tools; look in
1804     http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1805 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1806 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1807   (note: the target file name drops the underscore before "Rules")
1808 - update (ICU)/source/test/testdata/CollationTest_*.txt
1809   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1810   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1811 - check test file diffs for previously commented-out, known-failing data lines;
1812   probably need to keep those commented out
1813 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1814 - run makeuca.sh:
1815   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1816 - rebuild ICU4C
1817 - refresh ICU4J collation data:
1818   (subset of instructions above for properties data refresh, except copies all coll/*)
1819     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1820     ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1821     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1822     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1823 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1824 - note on intltest: if collate/UCAConformanceTest fails, then
1825   utility/MultithreadTest/TestCollators will fail as well;
1826   fix the conformance test before looking into the multi-thread test
1827
1828 * When refreshing all of ICU4J data from ICU4C
1829 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1830 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1831 or
1832 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1833
1834 *** LayoutEngine script information
1835
1836 (For details see the Unicode 5.2 change log below.)
1837
1838 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1839   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1840   in the working directory.
1841   (It also generates ScriptRunData.cpp, which is no longer needed.)
1842
1843   The generated files have a current copyright date and "@draft" statement.
1844
1845 - diff current <icu>/source/layout files vs. generated ones
1846     ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1847   review and manually merge desired changes;
1848   fix gratuitous changes, incorrect @draft and missing aliases;
1849   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1850 - if you just copy the above files, then
1851   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1852   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
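
The "fix mixed line endings" step can be sketched in Python as below. This assumes LF is
the desired ending for the generated layout files; the function names are hypothetical and
any path passed in is up to the caller.

```python
def normalize_line_endings(data):
    """Convert CRLF and lone CR line endings in a bytes object to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path):
    """Rewrite the file at path only if its line endings need changing."""
    with open(path, "rb") as f:
        original = f.read()
    fixed = normalize_line_endings(original)
    if fixed != original:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != original
```

Reading and writing in binary mode keeps Python from translating line endings on its own,
so the diff against the old files shows only real content changes.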
1853
1854 *** merge the Unicode update branches back onto the trunk
1855 - do not merge the icudata.jar and testdata.jar,
1856   instead rebuild them from merged & tested ICU4C
1857
1858 ---------------------------------------------------------------------------- ***
1859
1860 ICU 4.8 (no Unicode update, just new script codes)
1861
1862 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1863   (added 2010-12-21)
1864     Afak    439     Afaka
1865     Jurc    510     Jurchen
1866     Mroo    199     Mro, Mru
1867     Nshu    499     Nüshu
1868     Shrd    319     Sharada, Śāradā
1869     Sora    398     Sora Sompeng
1870     Takr    321     Takri, Ṭākrī, Ṭāṅkrī
1871     Tang    520     Tangut
1872     Wole    480     Woleai
1873   -> uscript.h
1874   -> com.ibm.icu.lang.UScript
1875     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1876     replace  public static final int \1 = \2;\3
1877   -> genpname/SyntheticPropertyValueAliases.txt
1878   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1879       and in com.ibm.icu.dev.test.lang.TestUScript.java
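
The find/replace pair above, for turning uscript.h enum lines into UScript.java constants,
translates directly to Python's re module. This is only a sketch for trying the pattern
outside an editor; the sample line and its numeric value are illustrative.

```python
import re

# Same pattern as in the find/replace instructions above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def to_java_constant(c_line):
    """Rewrite one uscript.h enum line as a UScript.java int constant."""
    return USCRIPT_LINE.sub(r"public static final int \1 = \2;\3", c_line)


# Illustrative input; the value 147 is an assumption for this example.
sample = "      USCRIPT_AFAKA                         = 147,/* Afak */"
print(to_java_constant(sample))
```

Leading indentation is outside the match and is therefore preserved; lines that do not
match the pattern are returned unchanged.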
1880
1881 * run genpname/preparse.pl (on Linux)
1882   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1883   + make sure that data.h is writable
1884   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1885   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1886
1887 * rebuild Unicode tools (at least genpname) using make
1888 - You might first need to "make install" ICU so that the tools build can pick
1889   up the new definitions from the installed header files.
1890
1891 * run genpname
1892   (builds both pnames.icu and propname_data.h)
1893 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1894 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1895 - rebuild ICU & tools
1896
1897 * run genprops
1898 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1899 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1900 - rebuild ICU & tools
1901
1902 * update Java data files
1903 - refresh just the UCD-related files, just to be safe
1904 - see (ICU4C)/source/data/icu4j-readme.txt
1905 - mkdir /tmp/icu4j
1906 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1907 - copy the big-endian Unicode data files to another location,
1908   separate from the other data files
1909     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1910     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1911     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1912 - refresh ICU4J
1913     ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1914
1915 * should have updated the layout engine script codes but forgot
1916
1917 ---------------------------------------------------------------------------- ***
1918
1919 Unicode 6.0 update
1920
1921 *** related ICU Trac tickets
1922
1923 7264 Unicode 6.0 Update
1924
1925 *** Unicode version numbers
1926 - makedata.mak
1927 - uchar.h
1928   (configure.in & configure: have been modified to extract the version from uchar.h)
1929 - com.ibm.icu.util.VersionInfo
1930
1931 *** data files & enums & parser code
1932
1933 * file preparation
1934
1935 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1936 - This now prepares both unidata and testdata files in respective output subfolders.
1937
1938 * PropertyAliases.txt changes
1939 - new Script_Extensions property defined in the new ScriptExtensions.txt file
1940   but not listed in PropertyAliases.txt; reported to unicode.org;
1941   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1942     scx; Script_Extensions
1943   -> uchar.h with new UProperty section
1944   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1945
1946 * PropertyValueAliases.txt changes
1947 - 12 new block names:
1948   Alchemical_Symbols
1949   Bamum_Supplement
1950   Batak
1951   Brahmi
1952   CJK_Unified_Ideographs_Extension_D
1953   Emoticons
1954   Ethiopic_Extended_A
1955   Kana_Supplement
1956   Mandaic
1957   Miscellaneous_Symbols_And_Pictographs
1958   Playing_Cards
1959   Transport_And_Map_Symbols
1960   -> add to uchar.h
1961   -> add to UCharacter.UnicodeBlock
1962     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1963             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1964 - Joining_Group (jg) values:
1965   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1966   -> uchar.h & UCharacter.JoiningGroup
1967 - 3 new scripts:
1968   sc ; Batk      ; Batak
1969   sc ; Brah      ; Brahmi
1970   sc ; Mand      ; Mandaic
1971   -> remove these from SyntheticPropertyValueAliases.txt
1972   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1973   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1974       and in com.ibm.icu.dev.test.lang.TestUScript.java
1975 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1976   (added 2009-11-11..2010-07-18)
1977   Bass        259     Bassa Vah
1978   Dupl        755     Duployan shorthand
1979   Elba        226     Elbasan
1980   Gran        343     Grantha
1981   Kpel        436     Kpelle
1982   Loma        437     Loma
1983   Mend        438     Mende
1984   Merc        101     Meroitic Cursive
1985   Narb        106     Old North Arabian
1986   Nbat        159     Nabataean
1987   Palm        126     Palmyrene
1988   Sind        318     Sindhi
1989   Wara        262     Warang Citi
1990   -> uscript.h
1991   -> com.ibm.icu.lang.UScript
1992     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1993     replace  public static final int \1 = \2;\3
1994   -> SyntheticPropertyValueAliases.txt
1995   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1996       and in com.ibm.icu.dev.test.lang.TestUScript.java
1997 - ISO 15924 name change
1998   Mero        100     Meroitic Hieroglyphs (was Meroitic)
1999   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2000 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
2001
2002 * UnicodeData.txt changes
2003 - new CJK block:
2004   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2005   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
2006   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
2007
2008 * build Unicode tools using CMake+make
2009
2010 * run genpname/preparse.pl (on Linux)
2011   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2012   + make sure that data.h is writable
2013   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2014   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2015
2016 * rebuild Unicode tools (at least genpname) using make
2017 - You might first need to "make install" ICU so that the tools build can pick
2018   up the new definitions from the installed header files.
2019
2020 * run genpname
2021 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2022 - rebuild ICU & tools
2023
2024 * update source/data/unidata/norm2/nfkc_cf.txt
2025 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2026
2027 * update source/data/unidata/norm2/uts46.txt
2028 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
2029   to ~/svn.icu/tools/trunk/src/unicode/py
2030 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
2031 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2032 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2033
2034 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2035   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2036 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2037 - Unicode 6.0: U+2260, U+226E, U+226F
2038
2039 * generate core properties data files
2040 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2041 - rebuild ICU & tools
2042 - run makeuca.sh so that genuca picks up the new nfc.nrm:
2043   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2044 - rebuild ICU & tools
2045
2046 * implement new Script_Extensions property (provisional)
2047 - parser & generator: genprops & uprops.icu
2048 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
2049 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
2050
2051 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
2052 - (one-time change)
2053 - genbidi/gencase/genprops tools changes
2054 - re-run makeprops.sh (see above)
2055 - UCharacterProperty.java, UCharacterTypeIterator.java,
2056   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
2057   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
2058
2059 * update Java data files
2060 - refresh just the UCD-related files, just to be safe
2061 - see (ICU4C)/source/data/icu4j-readme.txt
2062 - mkdir /tmp/icu4j
2063 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2064   output:
2065     ...
2066     Unicode .icu files built to ./out/build/icudt45l
2067     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2068     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2069     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2070     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
2071     mkdir -p /tmp/icu4j/main/shared/data
2072     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2073 - copy the big-endian Unicode data files to another location,
2074   separate from the other data files
2075     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2076     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2077     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2078     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
2079     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2080     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2081     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2082 - refresh ICU4J
2083     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2084
2085 * refresh Java test .txt files
2086 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2087
2088 * un-hardcode normalization skippable (NF*_Inert) test data
2089 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
2090
2091 * copy updated break iterator test files
2092 - now handled by early ucdcopy.py and
2093   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
2094   (old instructions:
2095    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
2096    to ~/svn.icu/trunk/src/source/test/testdata)
2097 - they are not used in ICU4J
2098
2099 * UCA
2100
2101 - get output from Mark's tools; look in
2102     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
2103     http://www.macchiato.com/unicode/utc/additional-uca-files
2104     http://www.unicode.org/Public/UCA/6.0.0/
2105     http://www.unicode.org/~mdavis/uca/
2106 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2107 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2108 - update Han-implicit ranges for new CJK extensions:
2109   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2110 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
2111   do not add it into invuca so that tailoring primary-after an ignorable works
2112 - genuca: permit space between [variable top] bytes
2113 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
2114 - run makeuca.sh:
2115   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2116 - rebuild ICU4C
2117 - refresh ICU4J collation data:
2118   (subset of instructions above for properties data refresh, except copies all coll/*)
2119     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2120     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2121     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2122     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2123 - update (ICU)/source/test/testdata/CollationTest_*.txt
2124   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2125   with output from Mark's Unicode tools
2126 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2127 - note on intltest: if collate/UCAConformanceTest fails, then
2128   utility/MultithreadTest/TestCollators will fail as well;
2129   fix the conformance test before looking into the multi-thread test
2130
2131 * When refreshing all of ICU4J data from ICU4C
2132 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2133 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2134 or
2135 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2136
2137 *** LayoutEngine script information
2138
2139 (For details see the Unicode 5.2 change log below.)
2140
2141 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2142 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2143 ScriptRunData.cpp, which is no longer needed.)
2144
2145 The generated files have a current copyright date and "@draft" statement.
2146
2147 * copy the above files into <icu>/source/layout, replacing the old files.
2148 * fix mixed line endings
2149 * review the diffs and fix incorrect @draft and missing aliases;
2150   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2151 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2152
2153 ---------------------------------------------------------------------------- ***
2154
2155 Unicode 5.2 update
2156
2157 *** related ICU Trac tickets
2158
2159 7084 Unicode 5.2
2160
2161 7167 verify collation bytes
2162 7235 Java test NAME_ALIAS
2163 7236 Java DerivedCoreProperties.txt test
2164 7237 Java BidiTest.txt
2165 7238 UTrie2 in core unidata
2166 7239 test for tailoring gaps
2167 7240 Java fix CollationMiscTest
2168 7243 update layout engine for Unicode 5.2
2169
2170 *** Unicode version numbers
2171 - makedata.mak
2172 - uchar.h
2173 - configure.in & configure
2174 - update ucdVersion in gennames.c if an algorithmic range changes
2175
2176 *** data files & enums & parser code
2177
2178 * file preparation
2179
2180 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
2181 - includes finding files regardless of version numbers,
2182   copying them, and performing the equivalent processing of the
2183   ucdstrip and ucdmerge tools on the desired set of files
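
"Finding files regardless of version numbers" amounts to stripping a version suffix from
the file name before comparing it with the expected base name. The following Python sketch
illustrates the idea; the pattern and the function name are assumptions, not the code in
ucdcopy.py.

```python
import re

# Hypothetical pattern: a "-<version>" suffix such as "-5.2.0" or "-5.2.0d12"
# directly before the file extension.
_VERSION_SUFFIX = re.compile(r"-[0-9]+(\.[0-9]+)*(d[0-9]+)?(?=\.[A-Za-z]+$)")


def strip_version(filename):
    """Return the file name without a trailing version suffix."""
    return _VERSION_SUFFIX.sub("", filename)


for name in ("LineBreakTest-5.2.0.txt", "DerivedAge-5.2.0d12.txt", "UnicodeData.txt"):
    print(name, "->", strip_version(name))
```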
2184
2185 * notes on changes
2186 - PropertyAliases.txt
2187   moved from numeric to enumerated:
2188     ccc       ; Canonical_Combining_Class
2189   new string properties:
2190     NFKC_CF   ; NFKC_Casefold
2191     Name_Alias; Name_Alias
2192   new binary properties:
2193     Cased     ; Cased
2194     CI        ; Case_Ignorable
2195     CWCF      ; Changes_When_Casefolded
2196     CWCM      ; Changes_When_Casemapped
2197     CWKCF     ; Changes_When_NFKC_Casefolded
2198     CWL       ; Changes_When_Lowercased
2199     CWT       ; Changes_When_Titlecased
2200     CWU       ; Changes_When_Uppercased
2201   new CJK Unihan properties (not supported by ICU)
2202 - PropertyValueAliases.txt
2203   new block names
2204   new scripts
2205   one script code change:
2206     sc ; Qaai      ; Inherited
2207     ->
2208     sc ; Zinh      ; Inherited                        ; Qaai
2209   new Line_Break (lb) value:
2210     lb ; CP        ; Close_Parenthesis
2211   new Joining_Group (jg) values: Farsi_Yeh, Nya
2212   other new values:
2213     ccc; 214; ATA  ; Attached_Above
2214 - DerivedBidiClass.txt
2215   new default-R range: U+1E800 - U+1EFFF
2216 - UnicodeData.txt
2217   all of the ISO comments are gone
2218   new CJK block end:
2219     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
2220   new CJK block:
2221     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2222     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2223
2224 * genpname
2225 - run preparse.pl
2226   + cd \svn\icuproj\icu\trunk\source\tools\genpname
2227   + make sure that data.h is writable
2228   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
2229   + preparse.pl complains with errors like the following:
2230       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
2231     This is because ICU 4.0 had scripts from ISO 15924 which are now
2232     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2233     and PropertyValueAliases.txt.
2234     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2235        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
2236   + preparse.pl complains with errors about block names missing from uchar.h; add them
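
The script conflicts reported by preparse.pl can also be found ahead of time by comparing
the "sc" entries of the two alias files. The Python sketch below assumes that
SyntheticPropertyValueAliases.txt uses the same "sc ; Code ; Name" layout as
PropertyValueAliases.txt; the function names and the sample lines are hypothetical.

```python
def script_codes(lines):
    """Collect short script codes from "sc ; Code ; Name" alias lines."""
    codes = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicate_scripts(synthetic_lines, official_lines):
    """Return script codes that are defined in both files, sorted."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))


# Made-up excerpts of the two files.
synthetic = ["sc ; Egyp ; Egyp", "sc ; Xxxx ; Xxxx"]
official = [
    "sc ; Egyp      ; Egyptian_Hieroglyphs",
    "sc ; Zinh      ; Inherited                        ; Qaai",
]
print(duplicate_scripts(synthetic, official))
```

Every code printed is a candidate for removal from SyntheticPropertyValueAliases.txt.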
2237
2238 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2239 - new block & script values
2240   + 26 new blocks
2241     copy new blocks from Blocks.txt
2242     MS VC++ 2008 regular expression:
2243       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
2244       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
2245   + several new script values already added in ICU 4.0 for ISO 15924 coverage
2246     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
2247   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
2248   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
2249     (added to SyntheticPropertyValueAliases.txt)
2250 - new Joining Group (JG) values: Farsi_Yeh, Nya
2251 - new Line_Break (lb) value:
2252     lb ; CP        ; Close_Parenthesis
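
The MS VC++ regular expression above, for turning Blocks.txt lines into uchar.h enum
entries, can be tried out in Python as follows. VC++ "{...}" groups become "(...)" groups;
the function name is hypothetical. As with the editor replacement, the constant name still
has to be uppercased by hand and the placeholder number 172 replaced with the real value.

```python
import re

# Python form of: ^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def to_ublock(blocks_line):
    """Rewrite one Blocks.txt line as a draft UBLOCK_ enum entry."""
    return BLOCK_LINE.sub(r"    UBLOCK_\3 = 172, /*[\1]*/", blocks_line)


print(to_ublock("0800..083F; Samaritan"))
```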
2253
2254 * hardcoded Unihan range end/limit
2255 - Unihan range end moves from 9FC3 to 9FCB
2256   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
2257   + do change gennames.c
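
The search for the old range end and limit can be sketched in Python with the same regex.
The function name and the sample source lines are hypothetical; in practice the lines would
come from the source files being checked.

```python
import re

# 9FC3 is the old range end, 9FC4 the old limit; match either, any case.
UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)


def find_unihan_limits(lines):
    """Return (line number, text) pairs that mention 9FC3 or 9FC4."""
    return [(n, text.rstrip("\n"))
            for n, text in enumerate(lines, start=1)
            if UNIHAN_END.search(text)]


# Made-up source lines for illustration.
sample = ["#define CJK_END 0x9fc3", "int x = 0;", "if (c < 0x9FC4) {"]
for number, text in find_unihan_limits(sample):
    print(number, text)
```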
2258
2259 * Compare definitions of new binary properties with what we used to use
2260   in algorithms, to see if the definitions changed.
2261 - Verified that definitions for Cased and Case_Ignorable are unchanged.
2262   The gencase tool now parses the newly public Case_Ignorable values
2263   in case the definition changes in the future.
2264
2265 * uchar.c & uprops.h & uprops.c & genprops
2266 - new numeric values that didn't exist in Unicode data before:
2267     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
2268   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
2269   therefore redesign the encoding of numeric types and values for formatVersion 6;
2270   design for simple numbers up to at least 144 ("one gross"),
2271   large values up to at least 10^20,
2272   and fractions with numerators -1..17 and denominators 1..16
2273   to cover current and expected future values
2274   (e.g., more Han numeric values, Meroitic twelfths)
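
To illustrate why numerators -1..17 and denominators 1..16 fit into a compact code, here
is one possible packing as a Python sketch. It is not the actual uprops.icu formatVersion 6
layout; the function names and the bit assignment are assumptions chosen for the example.

```python
MIN_NUM, MAX_NUM = -1, 17
MIN_DEN, MAX_DEN = 1, 16


def pack_fraction(num, den):
    """Pack a fraction into one integer: biased numerator in the upper bits,
    denominator minus one in the low four bits."""
    if not (MIN_NUM <= num <= MAX_NUM and MIN_DEN <= den <= MAX_DEN):
        raise ValueError("fraction out of range: %d/%d" % (num, den))
    return ((num - MIN_NUM) << 4) | (den - MIN_DEN)


def unpack_fraction(code):
    """Inverse of pack_fraction(): return (numerator, denominator)."""
    return (code >> 4) + MIN_NUM, (code & 0xF) + MIN_DEN


# The new Unicode 5.2 values listed above all round-trip.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert unpack_fraction(pack_fraction(num, den)) == (num, den)
```

With 19 numerator values and 16 denominators this uses 304 codes, which is why a
denominator limit of 9 in the older format could not accommodate 1/10 or 1/16.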
2275
2276 * reimplement Hangul_Syllable_Type for new Jamo characters
2277 - the old code assumed that all Jamo characters are in the 11xx block
2278 - Unicode 5.2 fills holes there and adds new Jamo characters in
2279     A960..A97F; Hangul Jamo Extended-A
2280   and in
2281     D7B0..D7FF; Hangul Jamo Extended-B
2282 - Hangul_Syllable_Type can be trivially derived from a subset of
2283   Grapheme_Cluster_Break values
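
The derivation can be sketched in Python using the short property value names: the
Grapheme_Cluster_Break values L, V, T, LV and LVT map to the Hangul_Syllable_Type values
of the same name, and everything else maps to NA. The function name is hypothetical and
this is not the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond to Hangul syllable types.
_HANGUL_GCB = {"L", "V", "T", "LV", "LVT"}


def hangul_syllable_type(gcb):
    """Derive the Hangul_Syllable_Type short name from a
    Grapheme_Cluster_Break short name."""
    return gcb if gcb in _HANGUL_GCB else "NA"


for gcb in ("L", "LVT", "EX", "XX"):
    print(gcb, "->", hangul_syllable_type(gcb))
```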
2284
2285 * build Unicode data source code for hardcoding core data
2286 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
2287
2288 ICU data make path is \svn\icuproj\icu\trunk\source\data\
2289 ICU root path is \svn\icuproj\icu\trunk
2290 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2291 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2292 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2293 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2294 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2295 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2296 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2297 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
2298 Creating data file for Unicode Property Names
2299 Creating data file for Unicode Character Properties
2300 Creating data file for Unicode Case Mapping Properties
2301 Creating data file for Unicode BiDi/Shaping Properties
2302 Creating data file for Unicode Normalization
2303 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
2304 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
2305
2306 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
2307   and rebuild the common library
2308
2309 *** UCA
2310
2311 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
2312 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
2313 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
2314 [ Begin obsolete instructions:
2315   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
2316     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
2317       on Windows:
2318         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
2319         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
2320   End obsolete instructions]
2321 - run all tests with the *_SHORT.txt or the full files (the full ones have comments),
2322   not just the *_STUB.txt files
2323 - note on intltest: if collate/UCAConformanceTest fails, then
2324   utility/MultithreadTest/TestCollators will fail as well;
2325   fix the conformance test before looking into the multi-thread test
2326
2327 *** Implement Cased & Case_Ignorable properties
2328 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
2329 - Problem: These properties should be disjoint, but aren't
2330 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
2331 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
2332
2333 *** Implement Changes_When_Xyz properties
2334 - without stored data
2335
2336 *** Implement Name_Alias property
2337 - add it as another name field in unames.icu
2338 - make it available via u_charName() and UCharNameChoice
2339 - consider it in u_charFromName()
2340
2341 *** Break iterators
2342
2343 * Update break iterator rules to new UAX versions and new property values
2344 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
2345
2346 *** new BidiTest file
2347 - review format and data
2348 - copy BidiTest.txt to source/test/testdata
2349 - write test code using this data
2350 - fix ICU code where it fails the conformance test
2351
2352 *** Java
2353 - generally, find and update code corresponding to C/C++
2354 - UCharacter.UnicodeBlock constants:
2355   a) add an _ID integer per new block, update COUNT
2356   b) add a class instance per new block
2357      Visual Studio regex:
2358         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
2359         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2360 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
2361
2362 - port test changes to Java
2363
2364 *** LayoutEngine script information
2365
2366 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
2367
2368 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2369 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2370 ScriptRunData.cpp, which is no longer needed.)
2371
2372 The generated files have a current copyright date and "@draft" statement.
2373
2374 -> Eric Mader wrote in email on 20090930:
2375     "I think the tool has been modified to update @draft to @stable for
2376      older scripts and to add @draft for new scripts.
2377      (I worked with an intern on this last year.)
2378      You should check the output after you run it."
2379
2380 * copy the above files into <icu>/source/layout, replacing the old files.
2381 * fix mixed line endings
2382 * review the diffs and fix incorrect @draft and missing aliases
2383 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
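
For the "fix mixed line endings" step, a minimal sketch in Python. It assumes the generated files should end up with one consistent convention (LF by default here, which is an assumption; pass b"\r\n" if the repository expects CRLF). The function name is hypothetical.

```python
def normalize_line_endings(data, newline=b"\n"):
    """Convert CRLF and lone CR line endings in a bytes object to `newline`."""
    # First unify everything to LF, then convert to the requested convention.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


def fix_file(path, newline=b"\n"):
    """Rewrite one file in place with consistent line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_line_endings(data, newline))
```

Review the diffs afterwards as described above; this does not touch @draft tags or aliases.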
2384
2385 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2386 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2387
2388 -> Eric Mader wrote in email on 20090930:
2389     "This is just a matter of making sure that all the per-script tables have
2390      entries for any new scripts that were added.
2391      If any new Indic characters were added, then the class tables in
2392      IndicClassTables.cpp should be updated to reflect this.
2393      John Emmons should know how to do this if it's required."
2394
2395 * rebuild the layout and layoutex libraries.
2396
2397 *** Documentation
2398 - Update User Guide
2399   + Jamo_Short_Name, sfc->scf, binary property value aliases
2400
2401 ---------------------------------------------------------------------------- ***
2402
2403 Unicode 5.1 update
2404
2405 *** related ICU Trac tickets
2406
2407 5696 Update to Unicode 5.1
2408
2409 *** Unicode version numbers
2410 - makedata.mak
2411 - uchar.h
2412 - configure.in & configure
2413 - update ucdVersion in gennames.c if an algorithmic range changes
2414
2415 *** data files & enums & parser code
2416
2417 * file preparation
2418 - ucdstrip:
2419     DerivedCoreProperties.txt
2420     DerivedNormalizationProps.txt
2421     NormalizationTest.txt
2422     PropList.txt
2423     Scripts.txt
2424     GraphemeBreakProperty.txt
2425     SentenceBreakProperty.txt
2426     WordBreakProperty.txt
2427 - ucdstrip and ucdmerge:
2428     EastAsianWidth.txt
2429     LineBreak.txt
2430
2431 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2432 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
2433 copy 5.1.0\ucd\Blocks.txt ..\unidata\
2434 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
2435 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
2436 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2437 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2438 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2439 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2440 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
2441 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
2442 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
2443 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
2444 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
2445
2446 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2447 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2448 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2449 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
2450 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2451 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2452 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2453 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2454 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2455 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
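
Since the batch file has to be edited for each UCD version, its lines could be generated instead. A minimal sketch in Python, assuming the file lists stay as in the 5.1 listing above; the function name ucd2unidata_lines is hypothetical, and the output is only the text of the batch file (it does not run ucdstrip or ucdmerge itself).

```python
# Files copied unchanged (paths relative to <version>\ucd).
COPY_FILES = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    "extracted\\DerivedBidiClass.txt", "extracted\\DerivedJoiningGroup.txt",
    "extracted\\DerivedJoiningType.txt", "extracted\\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
# Files passed through ucdstrip only.
STRIP_FILES = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    "auxiliary\\GraphemeBreakProperty.txt",
    "auxiliary\\SentenceBreakProperty.txt",
    "auxiliary\\WordBreakProperty.txt",
]
# Files passed through ucdstrip and then ucdmerge.
MERGE_FILES = ["EastAsianWidth.txt", "LineBreak.txt"]


def ucd2unidata_lines(version):
    """Return the batch file lines for one UCD version, e.g. "5.1.0"."""
    lines = []
    for path in COPY_FILES:
        lines.append("copy %s\\ucd\\%s ..\\unidata\\" % (version, path))
    lines.append("")
    for path in STRIP_FILES:
        name = path.split("\\")[-1]  # output goes to the flat unidata folder
        lines.append("ucdstrip < %s\\ucd\\%s > ..\\unidata\\%s"
                     % (version, path, name))
    for name in MERGE_FILES:
        lines.append("ucdstrip < %s\\ucd\\%s | ucdmerge > ..\\unidata\\%s"
                     % (version, name, name))
    return lines


if __name__ == "__main__":
    print("\n".join(ucd2unidata_lines("5.1.0")))
```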
2456
2457 * genpname
2458 - run preparse.pl
2459   + cd \svn\icuproj\icu\uni51\source\tools\genpname
2460   + make sure that data.h is writable
2461   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
2462   + preparse.pl complains with errors like the following:
2463       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
2464     This is because ICU 3.8 had scripts from ISO 15924 which are now
2465     added to Unicode 5.1, so preparse.pl reports a conflict between SyntheticPropertyValueAliases.txt
2466     and PropertyValueAliases.txt.
2467     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2468        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
2469   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
2470       N/Y, No/Yes, F/T, False/True
2471     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
2472        It will use further values from the file if present.
2473
2474 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2475 - new block & script values
2476   + 17 new blocks
2477   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
2478     (removed from SyntheticPropertyValueAliases.txt)
2479   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
2480     (added to SyntheticPropertyValueAliases.txt)
2481 - uprops.icu (uprops.h) only provides 7 bits for script codes.
2482   ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
2483   None of the codes above 127 is yet the script code of an
2484   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
2485   script code values greater than 127.
2486   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
2487   in a parallel bit field, and that overflows now.
2488   Also, future values >=128 would be incompatible anyway.
2489   uprops.h is modified to move around several of the bit fields
2490   in the properties vector words, and now uses 8 bits for the script code.
2491   Two other bit fields also grow to accommodate future growth:
2492   Block (current count: 172) grows from 8 to 9 bits,
2493   and Word_Break grows from 4 to 5 bits.
2494 - renamed property Simple_Case_Folding (sfc->scf)
2495   + nothing to be done: handled as normal alias
2496 - new property JSN Jamo_Short_Name
2497   + no new API: only contributes to the Name property
2498 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
2499 - new Joining Group (JG) value: Burushashki_Yeh_Barree
2500 - new Sentence_Break (SB) values:
2501     SB ; CR        ; CR
2502     SB ; EX        ; Extend
2503     SB ; LF        ; LF
2504     SB ; SC        ; SContinue
2505 - new Word_Break (WB) values:
2506     WB ; CR        ; CR
2507     WB ; Extend    ; Extend
2508     WB ; LF        ; LF
2509     WB ; MB        ; MidNumLet
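
The script code bit-field overflow described above (uprops.h, 7 bits vs. 8 bits) can be checked with a little arithmetic. A minimal sketch in Python; the helper names are hypothetical, and the block maximum of 171 assumes block values run from 0 to count-1.

```python
def bits_needed(max_value):
    """Number of bits required to store every value in 0..max_value."""
    return max(1, max_value.bit_length())


def fits(value, bits):
    """True if `value` can be stored in an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)


USCRIPT_CODE_LIMIT = 130               # ICU 4.0, from the notes above
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1    # 129, stored in the parallel bit field

if __name__ == "__main__":
    print(fits(MAX_SCRIPT, 7))      # old 7-bit field
    print(bits_needed(MAX_SCRIPT))  # width needed for the maximum value
    print(bits_needed(172 - 1))     # Block, assuming values 0..171
```

The last line illustrates that 8 bits still hold the current block count; the move to 9 bits is headroom, as noted above.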
2510
2511 * Further changes in the 2008-02-29 update:
2512 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
2513   because they should not normally be invisible.
2514 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
2515 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
2516 - new Word_Break (WB) value: NL=Newline
2517
2518 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
2519 - Unihan range end moves from 9FBB to 9FC3
2520   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
2521   + do change gennames.c
2522
2523 * build Unicode data source code for hardcoding core data
2524 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
2525
2526 ICU data make path is \svn\icuproj\icu\uni51\source\data\
2527 ICU root path is \svn\icuproj\icu\uni51
2528 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2529 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2530 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2531 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2532 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2533 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2534 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2535 Creating data file for Unicode Character Properties
2536 Creating data file for Unicode Case Mapping Properties
2537 Creating data file for Unicode BiDi/Shaping Properties
2538 Creating data file for Unicode Normalization
2539 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
2540 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
2541
2542 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
2543   and rebuild the common library
2544
2545 *** Break iterators
2546
2547 * Update break iterator rules to new UAX versions and new property values
2548
2549 *** UCA
2550
2551 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2552
2553 *** Test suites
2554 - Test that APIs using Unicode property value aliases (like UnicodeSet)
2555   support all of the boolean values N/Y, No/Yes, F/T, False/True
2556   -> TestBinaryValues() tests in both cintltst and intltest
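
A minimal sketch in Python of what such a test checks: all eight boolean aliases map to the expected value. The case-insensitive comparison is a simplifying assumption and does not reproduce ICU's actual property name matching; parse_binary_value is a hypothetical name.

```python
_TRUE_ALIASES = {"y", "yes", "t", "true"}
_FALSE_ALIASES = {"n", "no", "f", "false"}


def parse_binary_value(alias):
    """Map a binary property value alias to a Python bool.

    Raises ValueError for anything that is not one of
    N/Y, No/Yes, F/T, False/True (compared case-insensitively).
    """
    key = alias.strip().lower()
    if key in _TRUE_ALIASES:
        return True
    if key in _FALSE_ALIASES:
        return False
    raise ValueError("not a binary property value alias: %r" % alias)


if __name__ == "__main__":
    for name in ("N", "Y", "No", "Yes", "F", "T", "False", "True"):
        print(name, parse_binary_value(name))
```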
2557
2558 *** LayoutEngine script information
2559 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2560 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2561 ScriptRunData.cpp, which is no longer needed.)
2562
2563 The generated files have a current copyright date and "@draft" statement.
2564
2565 * copy the above files into <icu>/source/layout, replacing the old files.
2566
2567 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2568 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2569
2570 * rebuild the layout and layoutex libraries.
2571
2572 *** Documentation
2573 - Update User Guide
2574   + Jamo_Short_Name, sfc->scf, binary property value aliases
2575
2576 ---------------------------------------------------------------------------- ***
2577
2578 Unicode 5.0 update
2579
2580 *** related Jitterbugs
2581
2582 5084 RFE: Update to Unicode 5.0
2583
2584 *** data files & enums & parser code
2585
2586 * file preparation
2587 - ucdstrip:
2588     DerivedCoreProperties.txt
2589     DerivedNormalizationProps.txt
2590     NormalizationTest.txt
2591     PropList.txt
2592     Scripts.txt
2593     GraphemeBreakProperty.txt
2594     SentenceBreakProperty.txt
2595     WordBreakProperty.txt
2596 - ucdstrip and ucdmerge:
2597     EastAsianWidth.txt
2598     LineBreak.txt
2599
2600 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2601 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
2602 copy 5.0.0\ucd\Blocks.txt ..\unidata\
2603 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
2604 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
2605 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2606 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2607 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2608 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2609 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
2610 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
2611 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
2612 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
2613 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
2614
2615 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2616 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2617 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2618 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
2619 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2620 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2621 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2622 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2623 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2624 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
2625
2626 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2627
2628 * genpname
2629 - run preparse.pl
2630   + make sure that data.h is writable
2631   + perl preparse.pl \cvs\oss\icu > out.txt
2632
2633 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2634 - new block & script values
2635   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
2636
2637 * build Unicode data source code for hardcoding core data
2638 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
2639
2640 ICU data make path is \cvs\oss\icu\source\data\
2641 ICU root path is \cvs\oss\icu
2642 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2643 [etc.]
2644 Creating data file for Unicode Character Properties
2645 Creating data file for Unicode Case Mapping Properties
2646 Creating data file for Unicode BiDi/Shaping Properties
2647 Creating data file for Unicode Normalization
2648 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
2649 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
2650
2651 - copy the .c source files to C:\cvs\oss\icu\source\common
2652   and rebuild the common library
2653
2654 *** Unicode version numbers
2655 - makedata.mak
2656 - uchar.h
2657 - configure.in
2658
2659 *** LayoutEngine script information
2660 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2661 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2662 ScriptRunData.cpp, which is no longer needed.)
2663
2664 The generated files have a current copyright date and "@draft" statement.
2665
2666 * copy the above files into <icu>/source/layout, replacing the old files.
2667
2668 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2669 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2670
2671 * rebuild the layout and layoutex libraries.
2672
2673 ---------------------------------------------------------------------------- ***
2674
2675 Unicode 4.1 update
2676
2677 *** related Jitterbugs
2678
2679 4332 RFE: Update to Unicode 4.1
2680 4157 RBBI, TR29 4.1 updates
2681
2682 *** data files & enums & parser code
2683
2684 * file preparation
2685 - ucdstrip:
2686     DerivedCoreProperties.txt
2687     DerivedNormalizationProps.txt
2688     NormalizationTest.txt
2689     GraphemeBreakProperty.txt
2690     SentenceBreakProperty.txt
2691     WordBreakProperty.txt
2692 - ucdstrip and ucdmerge:
2693     EastAsianWidth.txt
2694     LineBreak.txt
2695
2696 * add new files to the repository
2697     GraphemeBreakProperty.txt
2698     SentenceBreakProperty.txt
2699     WordBreakProperty.txt
2700
2701 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2702
2703 * genpname
2704 - handle new enumerated properties in sub read_uchar
2705 - run preparse.pl
2706
2707 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2708 - new binary properties
2709   + Pattern_Syntax
2710   + Pattern_White_Space
2711 - new enumerated properties
2712   + Grapheme_Cluster_Break
2713   + Sentence_Break
2714   + Word_Break
2715 - new block & script & line break values
2716
2717 * gencase
2718 - case-ignorable changes
2719   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2720   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
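
The case-ignorable definition above (Word_Break=MidLetter or general category Mn, Me, Cf, Lm, Sk) can be sketched in Python. Assumptions: Python's unicodedata module reflects whatever Unicode version the interpreter ships, not 4.1, and it has no Word_Break data, so the MidLetter set below is a small hand-picked excerpt for illustration only. This is not how gencase implements it.

```python
import unicodedata

# General categories named in the definition above.
_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Partial, illustrative excerpt of Word_Break=MidLetter (assumed values):
# APOSTROPHE, MIDDLE DOT, RIGHT SINGLE QUOTATION MARK.
_MID_LETTER_SAMPLE = frozenset({0x0027, 0x00B7, 0x2019})


def is_case_ignorable(cp, mid_letter=_MID_LETTER_SAMPLE):
    """True if code point `cp` is case-ignorable under the definition above."""
    if cp in mid_letter:
        return True
    return unicodedata.category(chr(cp)) in _IGNORABLE_CATEGORIES


if __name__ == "__main__":
    for cp in (0x0027, 0x0301, 0x00AD, ord("a")):
        print("U+%04X" % cp, is_case_ignorable(cp))
```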
2721
2722 *** Unicode version numbers
2723 - makedata.mak
2724 - uchar.h
2725 - configure.in
2726
2727 *** tests
2728 - verify that u_charMirror() round-trips
2729 - test all new properties and some new values of old properties
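
The round-trip check for mirroring can be sketched as follows in Python. It works on a plain code point mapping such as one parsed from BidiMirroring.txt and treats an unmapped code point as its own mirror, which is how u_charMirror() behaves; the function name and the sample table are illustrative only.

```python
def non_round_trips(mirror):
    """Return the sorted code points c for which mirror(mirror(c)) != c.

    `mirror` maps code points to their mirrored code points; a code point
    without an entry is treated as its own mirror.
    """
    def lookup(c):
        return mirror.get(c, c)

    return sorted(c for c in mirror if lookup(lookup(c)) != c)


if __name__ == "__main__":
    # Small excerpt in the style of BidiMirroring.txt pairs.
    sample = {0x0028: 0x0029, 0x0029: 0x0028, 0x003C: 0x003E, 0x003E: 0x003C}
    print(non_round_trips(sample))   # empty when every pair is symmetric
    sample[0x005B] = 0x005D          # one-way entry: 005D has no mapping back
    print(["%04X" % c for c in non_round_trips(sample)])
```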
2730
2731 *** other code
2732
2733 * hardcoded Unihan range end/limit
2734 - Unihan range end moves from 9FA5 to 9FBB
2735   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
2736   + do not modify BOCU/BOCSU code because that would change the encoding
2737     and break binary compatibility!
2738   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
2739     NamePrepProfile.txt
2740   + ignore trietest.c: test data is arbitrary
2741   + ignore tstnorm.cpp: test optimization, not important
2742   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
2743   + do change line_th.txt and word_th.txt
2744     by replacing hardcoded ranges with the new property values
2745   + do change gennames.c
2746
2747 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2748 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2749 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
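
The search that produced the hits above can be sketched in Python. The pattern is built from the old range end, with limit = end + 1, and matched case-insensitively as described. The function names are hypothetical, and walking the source tree is left out; the sketch only scans a list of lines.

```python
import re


def end_limit_pattern(old_end):
    """Regex matching the old range end or limit (end + 1) in hex."""
    return re.compile("%04X|%04X" % (old_end, old_end + 1), re.IGNORECASE)


def find_hardcoded(lines, old_end):
    """Return (line_number, line) pairs that mention the old end or limit."""
    pattern = end_limit_pattern(old_end)
    return [(n, line) for n, line in enumerate(lines, 1)
            if pattern.search(line)]


if __name__ == "__main__":
    text = [
        "\\u3400-\\u4DB5 \\u4E00-\\u9FA5 \\uA000-\\uA48C",
        "        0x4e00, 0x9fa5,",
        "unrelated line",
    ]
    for n, line in find_hardcoded(text, 0x9FA5):
        print(n, line)
```

Each hit still needs a manual decision, as in the do/do-not list above.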
2750
2751 * case mappings
2752 - compare new special casing context conditions with previous ones
2753   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2754
2755 * genpname
2756 - consider storing only the short name if it is the same as the long name
2757
2758 *** other reviews
2759 - UAX #29 changes (grapheme/word/sentence breaks)
2760 - UAX #14 changes (line breaks)
2761 - Pattern_Syntax & Pattern_White_Space
2762
2763 ---------------------------------------------------------------------------- ***
2764
2765 Unicode 4.0.1 update
2766
2767 *** related Jitterbugs
2768
2769 3170 RFE: Update to Unicode 4.0.1
2770 3171 Add new Unicode 4.0.1 properties
2771 3520 use Unicode 4.0.1 updates for break iteration
2772
2773 *** data files & enums & parser code
2774
2775 * file preparation
2776 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
2777 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
2778
2779 * file fixes
2780 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
2781   according to PRI #26
2782   http://www.unicode.org/review/resolved-pri.html#pri26
2783 - undone again because no corrigendum in sight;
2784   instead modified tests to not check consistency on this for Unicode 4.0.1
2785
2786 * ucdterms.txt
2787 - update from http://www.unicode.org/copyright.html
2788   formatted for plain text
2789
2790 * uchar.h & uprops.h & uprops.c & genprops
2791 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
2792 - add U_LB_INSEPARABLE due to a spelling fix
2793   + put short name comment only on line with new constant
2794     for genpname perl script parser
2795 - new binary properties
2796   + STerm
2797   + Variation_Selector
2798
2799 * genpname
2800 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
2801 - perl script: correctly calculate the maximum number of fields per row
2802
2803 * uscript.h
2804 - new script code Hrkt=Katakana_Or_Hiragana
2805
2806 * gennorm.c track changes in DerivedNormalizationProps.txt
2807 - "FNC" -> "FC_NFKC"
2808 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
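
A minimal sketch in Python of a parser that accepts both the old single-field and the new two-field quick check format. Assumptions: data lines are "code points ; fields # comment", the old suffixes are NO and MAYBE (only NO is named above), and the FNC -> FC_NFKC rename is not handled here. This is not the gennorm.c code.

```python
# Old single-field suffix -> new quick check value (MAYBE is an assumption).
_OLD_SUFFIX = {"NO": "N", "MAYBE": "M"}


def parse_qc_fields(line):
    """Return (code_points, property, value) for a quick check line, else None.

    Accepts the old form "range ; NFD_NO" and the new form "range ; NFD_QC; N".
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) == 3 and fields[1].endswith("_QC"):
        return (fields[0], fields[1], fields[2])
    if len(fields) == 2:
        form, _, suffix = fields[1].rpartition("_")
        if form and suffix in _OLD_SUFFIX:
            return (fields[0], form + "_QC", _OLD_SUFFIX[suffix])
    return None


if __name__ == "__main__":
    # Illustrative lines, not copied from a specific UCD version.
    print(parse_qc_fields("00C0..00C5 ; NFD_NO # old format"))
    print(parse_qc_fields("00C0..00C5 ; NFD_QC; N # new format"))
```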
2809
2810 * genprops/props2.c track changes in DerivedNumericValues.txt
2811 - changed from 3 columns to 2, dropping the numeric type
2812   + assume that the type is always numeric for Han characters,
2813     and that only those are added in addition to what UnicodeData.txt lists
2814
2815 *** Unicode version numbers
2816 - makedata.mak
2817 - uchar.h
2818 - configure.in
2819
2820 *** tests
2821 - update test of default bidi classes according to PRI #28
2822   /tsutil/cucdtst/TestUnicodeData
2823   http://www.unicode.org/review/resolved-pri.html#pri28
2824 - bidi tests: change exemplar character for ES depending on Unicode version
2825 - change hardcoded expected property values where they change
2826
2827 *** other code
2828
2829 * name matching
2830 - read UCD.html
2831
2832 * scripts
2833 - use new Hrkt=Katakana_Or_Hiragana
2834
2835 * ZWJ & ZWNJ
2836 - are now part of combining character sequences
2837 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ