2 URL: http://site.icu-project.org/
8 This directory contains the source code of ICU 4.6 for C/C++
10 1. It was obtained with the following:
12 $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46
14 2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:
16 - Apply platform.patch in the patches directory. It applies the upstream
17 patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
18 and changes source/common/unicode/ptypes.h to refer to plinux.h and
19 pmac.h generated below.
21 - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
22 'runConfigureICU MacOSX' are run to generate
23 source/common/unicode/platform.h.
25 - On OpenBSD, source/common/unicode/platform.h is generated
26 by the icu4c port in the ports directory and not by runConfigureICU.
27 If the file has to be updated, run:
28 cd /home/ports/textproc/icu4c && make configure
30 - Rename it to 'plinux.h', 'pfreebsd.h', 'popenbsd.h' and 'pmac.h'
32 - Apply patches/pmach.h.patch on Mac to pmac.h
34 - On Android, pandroid.h was generated by copying plinux.h to
35 pandroid.h and applying patches/pandroid.h.patch.
37 - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h
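
The generate-and-rename steps in item 2 can be sketched as a small shell helper. This is only a sketch under assumptions: the function name save_platform_header and its arguments are hypothetical and not part of ICU or Chromium. Only the mapping from platform to header name (plinux.h, pfreebsd.h, popenbsd.h, pmac.h) is taken from the list above.

```shell
# Hypothetical helper (not part of ICU or Chromium): rename a freshly
# generated platform.h to the per-platform name used in item 2.
# Usage: save_platform_header <Linux|FreeBSD|OpenBSD|MacOSX> <unicode-dir>
save_platform_header() {
  case "$1" in
    Linux)   name=plinux.h ;;
    FreeBSD) name=pfreebsd.h ;;
    OpenBSD) name=popenbsd.h ;;
    MacOSX)  name=pmac.h ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
  mv "$2/platform.h" "$2/$name"
}

# Example, once 'runConfigureICU Linux' has generated platform.h:
#   save_platform_header Linux source/common/unicode
```

The patches mentioned above (pmach.h.patch, pandroid.h.patch, the plinux.h CL) would still be applied by hand after the rename.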
39 3. The following directories were removed because they're not used by Chromium
49 4. The word breaking for Chinese and Japanese was modified to use a word
50 frequency list with the following patch and cjdict.txt.
52 - patches/segmentation.patch :
53 Adds a dictionary (word-frequency)-based word breaking for CJK
54 (Korean is supported in the code, but it does not do anything
55 because we don't have a Korean word-list.)
57 - source/data/brkitr/cjdict.txt :
58 Chinese and Japanese word frequency list.
59 See the file for license/copyright notice
61 - source/data/brkitr/cc_edict.txt :
62 the list of words derived from CC-Edict.
64 - patches/brkitr.patch
65 * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
66 handling of U+0022, and splitting of FQDN into labels at '.'.
67 For Hebrew, see http://unicode.org/cldr/track/ticket/3120
68 * line.txt : Incorporated line_he and minor changes in CL, OP and ID
70 For Hebrew, see http://unicode.org/cldr/track/ticket/4004
71 For others, see http://unicode.org/cldr/track/ticket/3974
72 http://unicode.org/cldr/track/ticket/4200
73 http://unicode.org/cldr/track/ticket/
74 * brklocal.mk : build file changes to drop unnecessary brkitr rule
75 files (e.g. word_ja.txt, line_he.txt)
77 - android/brkitr.patch (to be applied for Android build only) :
78 Reverts some changes about Chinese/Japanese segmentation rules in
79 patches/brkitr.patch to reduce binary size for Android.
81 If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
82 to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.
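
The copy needed before running the ICU tests can be written as a one-function sketch. The function name stage_cjdict_for_tests and the idea of passing the ICU root directory as an argument are assumptions made for illustration; the two paths are the ones given above.

```shell
# Hypothetical helper: stage the CJ dictionary for the ICU test suite.
# $1 is the ICU root directory (the one that contains source/).
stage_cjdict_for_tests() {
  cp "$1/source/data/brkitr/cjdict.txt" \
     "$1/source/test/testdata/cjdict-truncated.txt"
}

# Example, from the directory that contains source/:
#   stage_cjdict_for_tests .
```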
84 5. Converter changes : converters.patch
85 - Include what we really need. See source/data/mappings/ucmlocal.txt
86 - Alias and mapping changes : source/data/mappings/convrtrs.txt
87 - Change several tables and add six new tables, three of which
88 are 'fake' tables for ISO-2022-CN(-Ext).
89 - ucnv2022.c is modified to use 3 'fake' tables added above for
93 - patches/locale1.patch :
94 Filipino, Amharic, and Swahili locales
95 exemplar character set changes for CJK + 9 Indian locales
96 Minor fixes for Danish, , Turkish, and Korean.
98 - patches/locale2.patch :
99 The minimum locale data Chrome needs for 47 languages Chrome is
100 not localized to. Each locale data file has ExemplarCharacters,
101 LocaleScript, layout, and the name of the language for a locale
102 in its native language.
104 - patches/locale3.patch : Locale build configuration files. They
105 add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
106 source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.
108 - In source/data/region, run the following command to get rid of numeric region
109 display names we don't use (everything other than 419).
110 $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt
112 - android/patch_locale.sh (to be run for Android build only):
113 Makes changes to source/data/{curr,region,lang} to exclude these data
114 except the language and script names of zh_Hans and zh_Hant.
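
To see what the sed expression for source/data/region removes, here is a sketch that applies the same expression to stdin instead of editing files in place. The three sample lines are made up for illustration and are not copied from the real data files. Note also that 'sed -i' without a suffix argument is GNU sed syntax; BSD sed (for example on Mac OS X) expects an argument after -i.

```shell
# Same expression as the sed command above, but reading stdin and writing
# stdout instead of editing files in place.
drop_numeric_regions() {
  sed '/[0-35-9][0-9][0-9]{/ d'
}

# Made-up sample lines in the style of a region data file:
printf '%s\n' '001{"World"}' '419{"Latin America"}' 'US{"United States"}' |
  drop_numeric_regions
```

Only the 001 line is dropped. The expression deletes lines containing a three-digit code followed by '{' whose first digit is not 4, so any code starting with 4 survives, not just 419.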
116 7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt
118 - patches/unihan.patch:
119 unihan collation tables are never used in Chrome/Webkit, but they take
120 about 1MB in the uncompressed ICU data file in ICU 4.2.1.
122 8. Timezone data update
123 - Grab the latest version of the following timezone data files and
124 put them in source/data/misc.
131 As of Nov, 2011, the latest version is 2011n and the above files
133 http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2011n/44/
135 9. Build-related changes
138 - patches/vscomp.patch
139 (see http://bugs.icu-project.org/trac/ticket/8355 and
140 http://bugs.icu-project.org/trac/ticket/8356 )
141 - patches/rtti.patch : Make RTTI work without exception handling on Windows
142 (see http://bugs.icu-project.org/trac/ticket/8343)
143 - patches/data.build.patch :
144 To remove some data files we don't use and cut down the data size.
145 - patches/data.build.win.patch :
146 Windows-only data build patch. Add a new target DATALIB to makedata.mak
147 - patches/clang.patch: To build with Clang.
148 (see http://bugs.icu-project.org/trac/ticket/8954 Two other chunks in
149 the patch have already been fixed in the ICU trunk.)
150 - add an empty file (stubdatabuilt.txt) to source/stubdata
152 10. Pre-built data libraries are checked in.
154 Before building the data file on Linux, re-run runConfigureICU Linux
155 if it was run without data.build.patch in #9 above.
157 'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
158 to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
159 in {BUILD_DIR_ROOT}/data.
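
The workaround for the failing first pass can be sketched as follows. The function name stage_invuca and its two arguments are hypothetical. The 'mkdir -p' is a precaution added here in case the destination directory does not exist yet, and running the first 'make' in {BUILD_DIR_ROOT}/data is an assumption, since the text above only says where to re-run it.

```shell
# Hypothetical helper: copy invuca.icu to where the second 'make' pass
# expects it. $1 = ICU root (contains source/), $2 = BUILD_DIR_ROOT.
stage_invuca() {
  dest="$2/data/out/build/icudt46l/coll"
  mkdir -p "$dest" &&
    cp "$1/source/data/in/coll/invuca.icu" "$dest/"
}

# Intended sequence (not run here):
#   (cd "$BUILD_DIR_ROOT/data" && make)   # first pass, expected to fail
#   stage_invuca "$ICU_ROOT" "$BUILD_DIR_ROOT"
#   (cd "$BUILD_DIR_ROOT/data" && make)   # second pass
```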
161 - source/data/in/icudt46l.dat : Built on Linux with all the patches
164 - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
165 and header files moved (#11 below), generated by building icudt_build
166 project of build/icudt_build.sln on Windows. icudt46.dll is
167 generated in bin/{Release,Debug} and copied to windows/icudt.dll
168 and checked in. Note that we drop the version number ('46') from the
169 dll name to avoid having to update our build scripts/configuration
170 files every time ICU is upgraded to a new version.
172 - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
173 patches above (except android/brkitr.patch) applied and checked in.
175 - android/icudt46l_dat.S : Built on Linux with all the patches above and
176 android/brkitr.patch applied and android/patch_locale.sh executed, and
179 11. The header files were moved as shown below:
181 source/common/unicode ==> public/common/unicode
182 source/i18n/unicode ==> public/i18n/unicode
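
The header move in item 11 corresponds to two directory renames. A minimal sketch, assuming the public/ tree does not yet contain the unicode directories (the function name move_public_headers is hypothetical; in a checkout one would likely use the version control system's own move command instead of plain mv):

```shell
# Hypothetical helper: move the public headers as described in item 11.
# $1 = ICU root (contains source/). Assumes public/common/unicode and
# public/i18n/unicode do not exist yet, so each mv is a plain rename.
move_public_headers() {
  mkdir -p "$1/public/common" "$1/public/i18n" &&
    mv "$1/source/common/unicode" "$1/public/common/unicode" &&
    mv "$1/source/i18n/unicode" "$1/public/i18n/unicode"
}

# Example, from the directory that contains source/:
#   move_public_headers .
```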
184 12. Apply the fix found with static analysis tools such as PSV and Coverity
186 - patches/static.analysis.patch
187 - upstream trunk/4.8 does not have this code any more.
189 13. Fix for msvs2010 applied:
190 --- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
192 +++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
195 * Visual Studios 9.0.
196 * Cygwin with MSVC 9.0 also complains here about redefinition.
198 -#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
199 +#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
200 const int32_t StringPiece::npos;
203 14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
204 - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
205 - Handle other chars besides the dot. This is required because decNumber's
206 parser expects the dot as a decimal separator.
207 - Locales that don't use dot were producing "NaN" values.
209 15. Fix a bug in the regex engine.
210 - patches/regex.patch
211 - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)
213 16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
214 - patches/search_collation.patch
215 - upstream bug: http://bugs.icu-project.org/trac/ticket/8290
217 17. Fix a use of uninitialized memory bug in regular expression matching
218 - patches/rematch.patch
219 - upstream bug: http://bugs.icu-project.org/trac/ticket/8824
221 18. Make it compile with -Werror on gcc 4.6
222 - patches/gcc46.patch (ToT upstream does not have this code any more).
224 19. Fix four out-of-bounds memory access errors in common/uloc.c
225 and common/uresbund.c
228 1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
229 2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
230 3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
231 http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
232 4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)
234 20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
236 - upstream bug : http://bugs.icu-project.org/trac/ticket/9115
238 21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
239 - patches/changeset_30255.patch
240 - upstream change : http://bugs.icu-project.org/trac/changeset/30255
242 22. Fix time zone handling and compilation on iOS.
243 - patches/ios_timezone.patch
244 - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
245 - http://bugs.icu-project.org/trac/ticket/8661
247 23. Fix a buffer overflow in utext
248 - patches/utext.patch
249 - upstream change : http://bugs.icu-project.org/trac/changeset/29356
251 24. Fix compilation errors on VS2012 and above.
252 - patches/vs2012.patch
254 25. Fix a buffer overflow in UTF-16/32 detection.
255 - patches/csetdet.patch
256 - upstream bug: http://bugs.icu-project.org/trac/ticket/10318
258 26. Add BreakIterator::getRuleStatus
259 - patches/breakiterator.patch
260 - Copy and paste BreakIterator::getRuleStatus API from ICU 52