Name: icu
URL: http://site.icu-project.org/
-Version: 4.6
+Version: 52.1
License: MIT
Security Critical: yes
Description:
-This directory contains the source code of ICU 4.6 for C/C++
+This directory contains the source code of ICU 52.1 for C/C++
1. It was obtained with the following:
- $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46
+ $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-52-1 icu52
-2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX:
+   The following directories, which we don't use, are removed:
- - Apply platform.patch in patches directory. : It applies the upstream
- patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
- and change source/common/unicode/ptypes.h to refer to plinux.h and
- pmac.h generated below.
+ - as_is
+ - packaging
+ - source/layout
+ - source/layoutex
+ - source/data/xml
- - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
- 'runConfigureICU MacOSX' are run to generate
- source/common/unicode/platform.h.
+   patches/configure.patch is applied so that runConfigureICU works in the
+   icudata generation step without the layout and layoutex directories. It
+   removes the corresponding Makefiles from the ac_config variable.
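The export and pruning in step 1 could be scripted roughly as below. This is a hedged sketch, not the procedure actually used: the function names are hypothetical, and the naming pattern (tag release-<major>-<minor>, directory icu<major>) is inferred from the 52.1 command above, so it only fits the two-part version numbers ICU has used since ICU 49.

```shell
# Hypothetical helpers sketching step 1; names are illustrative only.

# "52.1" -> "release-52-1" (tag naming inferred from the command above)
icu_release_tag() {
  printf 'release-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# "52.1" -> "icu52" (local directory named after the major version)
icu_export_dir() {
  printf 'icu%s\n' "${1%%.*}"
}

# Print, rather than run, the svn export command for a version.
icu_export_cmd() {
  printf 'svn export --native-eol LF %s/%s %s\n' \
    "http://source.icu-project.org/repos/icu/icu/tags" \
    "$(icu_release_tag "$1")" "$(icu_export_dir "$1")"
}

# Delete the unused directories listed above from an exported tree ($1).
remove_unused_dirs() {
  [ -d "$1" ] || return 1   # refuse to run without an existing tree root
  for d in as_is packaging source/layout source/layoutex source/data/xml; do
    rm -rf "$1/$d"
  done
}
```

For example, `icu_export_cmd 52.1` prints the export command quoted above, and `remove_unused_dirs icu52` prunes the exported tree. Printing the command instead of running it keeps the sketch free of network access.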
- - On OpenBSD, source/common/unicode/platform.h is being generated
- by the icu4c port in the ports directory and not by runConfigureICU.
- In case the file has to be updated you can do:
- cd /home/ports/textproc/icu4c && make configure
+2. Apply the following patches for platform-related headers (putilimpl.h and
+   others).
- - Rename it to 'plinux.h', 'pfreebsd.h', 'popenbsd.h' and 'pmac.h'
+   - patches/putil.patch for Android, QNX and newlib (NaCl-newlib).
+ Upstream bug for Android : http://bugs.icu-project.org/trac/ticket/10478
+ Upstream bug for QNX : http://bugs.icu-project.org/trac/ticket/10811
+ Upstream bug for newlib : http://bugs.icu-project.org/trac/ticket/10873
- - Apply patches/pmach.h.patch on Mac to pmac.h
+ - patches/platform_nacl.patch to add U_PF_NATIVE_CLIENT
+ Upstream bug : http://bugs.icu-project.org/trac/ticket/11033
- - On Android, the pandroid.h was generated by copying plinux.h to
- pandroid.h and applying the patches/pandroid.h.patch.
- - For QNX, the pqnx.h was generated by copying plinux.h to
- pqnx.h and applying the patches/platform.qnx.patch.
+3. Breakiterator patches
- - For NaCl (icu_nacl.gypi), the pnacl.h was generated by copying plinux.h to
- pnacl.h and applying the patches/pnacl.h.patch.
+ - Apply patches/brkitr.patch
+ * word.txt
+ a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
+ FQDN labels can be split at '.'
+ b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
+ See http://unicode.org/cldr/trac/ticket/6555
+ * line.txt
+ a. Use Japanese rules for all locales because Japanese tailoring only
+ affects Japanese specific characters.
+ See http://unicode.org/cldr/trac/ticket/3974
+ b. Minor changes in CL, OP and IS definitions to handle 'comma-variants'
+          more consistently.
+ See http://unicode.org/cldr/trac/ticket/6557
+ c. Fix line breaking for Chinese characters and quotation marks
+ See http://unicode.org/cldr/trac/ticket/4200 and
+ http://crbug.com/39779
+
- - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h
+ - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
+ and word_POSIX.txt dropped from the build list.
-3. The following directories were removed because they're not used by Chromium
- at the moment:
- as_is
- packaging
- source/extra
- source/sample
- source/layout
- source/layoutex
+ - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
+ (source/data/brkitr/khmerdict.txt) obtained from
+ http://bugs.icu-project.org/trac/ticket/9451
+ - Add several common Chinese words that were dropped previously to
+ source/data/cjdict/brkitr/cjdict.txt
+ patch: patches/cjdict.patch
+ upstream bug: http://bugs.icu-project.org/trac/ticket/10888
-4. The word breaking for Chinese and Japanese were modified to use a word
- frequency list with the following patch and cjdict.txt.
-
- - patches/segmentation.patch :
- Adds a dictionary (word-frequency)-based word breaking for CJK
- (Korean is supported in the code, but it does not do anything
- because we don't have a Korean word-list.)
-
- - source/data/brkitr/cjdict.txt :
- Chinese and Japanese word frequency list.
- See the file for license/copyright notice
-
- - source/data/brkitr/cc_edict.txt :
- the list of words derived from CC-Edict.)
-
- - patches/brkitr.patch
- * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
- handling of U+0022, and splitting of FQDN into labels at '.'.
- For Hebrew, see http://unicode.org/cldr/track/ticket/3120
- * line.txt : Incorporated line_he and minor changes in CL, OP and ID
- definitions.
- For Hebrew, see http://unicode.org/cldr/track/ticket/4004
- For others, see http://unicode.org/cldr/track/ticket/3974
- http://unicode.org/cldr/track/ticket/4200
- http://unicode.org/cldr/track/ticket/
- * brklocal.mk : build file changes to drop unnecessary brkitr rule
- files (e.g. word_ja.txt, line_he.txt)
- android/brkitr.patch (to be applied for Android build only) :
Reverts some changes about Chinese/Japanese segmentation rules in
patches/brkitr.patch to reduce binary size for Android.
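The brklocal.mk step can be sketched as a copy of brkfiles.mk with two entries filtered out. This assumes, without having checked the ICU 52 file, that the rule files are listed as whitespace-separated words (possibly across backslash-continued lines); `make_brklocal` is a hypothetical helper name.

```shell
# Hypothetical helper: derive brklocal.mk from brkfiles.mk with line_ja.txt
# and word_POSIX.txt dropped from the build list.
# Assumes the names are whitespace-separated words; run from source/data/brkitr.
make_brklocal() {
  src=${1:-brkfiles.mk}
  dst=${2:-brklocal.mk}
  [ -f "$src" ] || return 1
  # For each name: drop it together with its leading blank when another
  # blank follows it, and also when it is the last word on the line.
  sed -e 's/[[:space:]]line_ja\.txt\([[:space:]]\)/\1/g' \
      -e 's/[[:space:]]line_ja\.txt$//' \
      -e 's/[[:space:]]word_POSIX\.txt\([[:space:]]\)/\1/g' \
      -e 's/[[:space:]]word_POSIX\.txt$//' \
      "$src" > "$dst"
}
```

Matching whole words keeps similarly named files such as line.txt and word_ja.txt in the list.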
- If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
- to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.
+4. Converter changes :
-5. Converter changes : converters.patch
- - Include what we really need. See source/data/mappings/ucmlocal.txt
- - Alias and mapping changes : source/data/mappings/convrtrs.txt
- - Changes several tables and add six new tables, three of which
- are 'fake' tables for ISO-2022-CN(-Ext).
- - ucnv2022.c is modified to use 3 'fake' tables added above for
- ISO-2022-CN(-Ext).
+ - converters.patch :
+    a. Revise existing mapping tables.
+    b. Remove a lot of unused aliases from the converter alias table
+       (source/data/mappings/convrtrs.txt), for a 40kB size reduction.
-6. Locale changes
- - patches/locale1.patch :
- Filipino, Amharic, and Swahili locales
- exemplar character set changes for CJK + 9 Indian locales
- Minor fixes for Danish, , Turkish, and Korean.
+  - Add source/data/mappings/ucmlocal.txt to list only the converters we need.
+ - Add three new tables per WHATWG encoding standards for EUC-JP,
+ Shift_JIS and CP866.
+    They're generated with scripts/{eucjp,sjis,ibm866}_gen.sh.
+ - Add three 'fake' tables for ISO-2022-CN(-Ext) : noop-*.ucm.
+
+ - uconv.patch
+ a. ucnv2022 uses 3 fake tables for ISO-2022-CN(-Ext) instead of two
+ huge tables.
+ b. ISO-2022-JP-[1-4] is dropped.
+    c. SCSU, BOCU, ISCII and UTF-7 conversions are disabled, reducing
+       the code size by 47kB.
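The three generator scripts named in the converter step could be run in turn as sketched here. Both the working directory (the ICU tree root) and the absence of arguments are assumptions, since this file does not say how the scripts are invoked; `run_table_generators` is hypothetical.

```shell
# Hypothetical helper: run the EUC-JP, Shift_JIS and CP866 table generators.
# Assumes (not confirmed here) that each script takes no arguments and is
# run from the ICU tree root given as $1. Stops at the first failure.
run_table_generators() {
  for name in eucjp sjis ibm866; do
    if [ ! -f "$1/scripts/${name}_gen.sh" ]; then
      echo "missing: $1/scripts/${name}_gen.sh" >&2
      return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (cd "$1" && sh "scripts/${name}_gen.sh") || return 1
  done
}
```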
- - patches/locale2.patch :
- The minimum locale data Chrome needs for 47 languages Chrome is
- not localized to. Each locale data file has ExemplarCharacters,
- LocaleScript, layout, and the name of the language for a locale
- in its native language.
+5. Locale changes
+ - patches/locale1.patch :
+ a. Exemplar character set changes for zh*, ja + 9 Indian locales
+ b. Minor fixes for Korean, a few Indic (AmPmMarkers) and
+ others (datetime format)
- - patches/locale3.patch : Locale build configuration files. They
+ - Locale build configuration files: To include the full locale data
+ for Chrome's UI languages and the minimum locale data for other locales,
add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang,locales,region,translit,zone,rbnf,sprep}.
- - In source/data/region, run the following command to get rid of numeric region
- display names we don't use (everything other than 419).
- $ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt
+    This, along with #8 (data.build.patch), #3 (breakiterator) and #4
+    (converter), cuts down the data size by about 11MB.
+
+ - Run scripts/trim_data.sh : About 2.1MB data size reduction.
+    a. Trim the locale data for Chrome's UI languages :
+ locales, lang, region, currency
+ b. Trim the locale data for non-UI languages to the bare minimum :
+ ExemplarCharacters, LocaleScript, layout, and the name of the
+ language for a locale in its native language.
+    c. Remove the legacy Chinese character set-based collations
+       (big5han/gb2312han), which make no sense and which nobody uses.
- android/patch_locale.sh (to be run for Android build only):
- Makes changes to source/data/{curr,region,lang} to exclude these data
- except the language and script names of zh_Hans and zh_Hant.
+ a. Makes changes to source/data/{curr,region,lang} to exclude these data
+ except the language and script names of zh_Hans and zh_Hant.
+ b. Remove exemplar cities in timezone data (data/zone)
+ c. Keep only the minimal calendar data in data/locales
- Add tg.txt to source/data/locale source/data/lang to add the minimal locale
data necessary for the spellchecker. In both directories, add tg.txt to
reslocal.mk
-7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt
-
- - patches/unihan.patch:
- unihan collation tables are never used in Chrome/Webkit, but it takes
- about 1MB in the uncompressed ICU data file in ICU 4.2.1.
-
-8. Timezone data update
+6. Timezone data update
- Grab the latest version of the following timezone data files and
put them in source/data/misc.
windowsZones.txt
zoneinfo64.txt
- As of Mar 2014, the latest version is 2014a and the above files
+ As of August 2014, the latest version is 2014f and the above files
are available at
- http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/
-
-9. Transliterator customization
+ http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014f/44/
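Fetching the two timezone files could look like the following. The URL layout is copied from the address above, and placing both files directly in that directory is an assumption; the helper names are hypothetical and `curl` is assumed to be installed.

```shell
# Hypothetical helpers for the timezone update. The URL layout is copied from
# the address above; both files are assumed to sit directly in it.
tz_base_url() {
  printf 'http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/%s/44\n' "$1"
}

# Print one download URL per file for a tzdata release such as 2014f.
tz_file_urls() {
  for f in windowsZones.txt zoneinfo64.txt; do
    printf '%s/%s\n' "$(tz_base_url "$1")" "$f"
  done
}

# Download both files into a directory ($2, e.g. source/data/misc).
# Needs curl and network access; -f makes HTTP errors fail the call.
fetch_tz_files() {
  for f in windowsZones.txt zoneinfo64.txt; do
    curl -fsSL -o "$2/$f" "$(tz_base_url "$1")/$f" || return 1
  done
}
```

For example, `fetch_tz_files 2014f source/data/misc`.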
- - Add el_Upper.txt taken from ICU 52 to source/data/trnslit
+7. Transliterator customization
- - Also add css3transform.txt to the same directory
+  - Add css3transform.txt to source/data/translit.
- Put the following line in trnslocal.mk
TRANSLIT_SOURCE=css3transform.txt
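Writing trnslocal.mk for the transliterator step produces a one-line file. A minimal sketch with a hypothetical helper that refuses to write the file unless css3transform.txt is already in place; the argument is the transliterator source directory.

```shell
# Hypothetical helper: put the TRANSLIT_SOURCE line above into trnslocal.mk.
# $1 is the transliterator source directory.
write_trnslocal() {
  if [ ! -f "$1/css3transform.txt" ]; then
    echo "css3transform.txt not found in $1" >&2
    return 1
  fi
  printf 'TRANSLIT_SOURCE=css3transform.txt\n' > "$1/trnslocal.mk"
}
```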
-10. Build-related changes
+8. Build-related changes
- patches/wpo.patch
- - patches/vscomp.patch
- (see http://bugs.icu-project.org/trac/ticket/8355 and
- http://bugs.icu-project.org/trac/ticket/8356 )
- - patches/rtti.patch : Make RTTI work without exception handling on Windows
- (see http://bugs.icu-project.org/trac/ticket/8343)
+ Upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
+ http://bugs.icu-project.org/trac/ticket/5701
+ - patches/vscomp.patch for building with Visual Studio on Windows.
+ a. do not use WINDOWS_LOCALE_API in locmap.c
+ b. do not redefine stringpiece::npos
+ c. fix a Windows build failure with U_USING_ICU_NAMESPACE=0
+ upstream bug: http://bugs.icu-project.org/trac/ticket/10486
+         (fixed in ICU 53)
+ d. Explicitly use Windows 'A' API when argument is an LPSTR in wintz.c
+ upstream bug : http://bugs.icu-project.org/trac/ticket/10870
+
- patches/data.build.patch :
- To remove some data files we don't use and cut down the data size.
+ Remove unnecessary resources : invuca, unames, collator source, stringprep
- patches/data.build.win.patch :
- Windows-only data build patch. Add a new target DATALIB to makedata.mak
- - patches/clang.patch: To build with Clang.
- (see http://bugs.icu-project.org/trac/ticket/8954 Two other chunks in
- the patch have already been fixed in the ICU trunk.)
- - add an empty file (stubdatabuilt.txt) to source/stubdata
+ Windows-only data build patch.
-11. Pre-built data libraries are checked in.
+ - patches/clang_win.patch :
+    Fix three warnings from Clang and MSVC 2013.
+ upstream bug : http://bugs.icu-project.org/trac/ticket/11102
- Before building data file on Linux, re-run 'runConfigureICU Linux' again
- if it's run without data.build.patch in #10 above.
+9. Pre-built data files are checked in with the following steps on Linux:
- Because we removed layout and layoutex directories in step 3,
- 'runConfigureICU Linux' will fail even with '--disable-layout'. A
- work-around is to have a copy of our icu tree in a separate build directory
- and add back directories we removed in step 3 before
- running 'runConfigure'.
+  a. Make an ICU data build directory outside the Chromium source tree
+ and cd to that directory.
+ b. Run
- 'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
- to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
- in {BUILD_DIR_ROOT}/data.
+ ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
- 'make' will fail again when pkgdata looks for css3transform.res. Edit
+ c. Run 'make'
+ d. 'make' will fail in the 1st pass. Copy
+ ${CHROME_ICU_TREE_TOP}/source/data/in/coll/invuca.icu
+ to {BUILD_DIR_ROOT}/data/out/build/icudt52l/coll and re-run 'make'
+ in {BUILD_DIR_ROOT}/data.
+
+ e. 'make' will fail again when pkgdata looks for css3transform.res. Edit
data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'.
(see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
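The two manual fixes in steps d and e, and the order in which they interleave with the failing 'make' runs, can be sketched as follows. The helper names are hypothetical; the paths and the --disable-layout flag come from the steps above, and the icudt52l directory name is specific to ICU 52.

```shell
# Hypothetical helpers for steps a-e. CHROME_ICU_TREE_TOP is the ICU tree in
# Chromium; BUILD_DIR_ROOT is the build directory made in step a.

# Step d: copy the checked-in invuca.icu into the build output.
copy_invuca() {
  dst="$BUILD_DIR_ROOT/data/out/build/icudt52l/coll"
  mkdir -p "$dst" &&
    cp "$CHROME_ICU_TREE_TOP/source/data/in/coll/invuca.icu" "$dst/"
}

# Step e: make pkgdata look for root.res instead of css3transform.res.
fix_icudata_lst() {
  lst="$BUILD_DIR_ROOT/data/out/tmp/icudata.lst"
  sed 's/css3transform\.res/root.res/' "$lst" > "$lst.tmp" &&
    mv "$lst.tmp" "$lst"
}

# The whole sequence. Each 'make' is expected to fail until its fix is
# applied; the subshell body keeps the caller's directory unchanged.
build_icu_data() (
  cd "$BUILD_DIR_ROOT" || exit 1
  "$CHROME_ICU_TREE_TOP/source/runConfigureICU" Linux --disable-layout || exit 1
  make && exit 0                 # 1st pass: expected to fail (step d)
  copy_invuca || exit 1
  (cd data && make) && exit 0    # 2nd pass: expected to fail (step e)
  fix_icudata_lst || exit 1
  cd data && make
)
```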
- source/data/in/icudtl.dat : Built on Linux with all the patches
- above applied. icudt46l.dat is generated in
+ above applied. icudt52l.dat is generated in
{BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
- version number (46) dropped.
+ version number (52) dropped.
- - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
- and header files moved (#11 below), generated by building icudt_build
- project of build/icudt_build.sln on Windows. icudt46.dll is
- generated in bin/{Release,Debug} and copied to windows/icudt.dll
- and checked in. Note that we drop the version number ('46') from the
- dll name to avoind having to update our build scripts/configuration
- files everytime ICU is upgraded to a new version.
- - {mac,linux}/icudt46l_dat.S : Built on Linux with all the
+ - {mac,linux}/icudtl_dat.S : Built on Linux with all the
patches above (except android/brkitr.patch) applied and checked in.
- This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.
-
- mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S. It's made
- by changing the header portion of the Linux version to read as following
- (no leading whitespace) :
-
- .globl _icudt46_dat
- #ifdef U_HIDE_DATA_SYMBOL
- .private_extern _icudt46_dat
- #endif
- .data
- .const
- .align 4
- _icudt46_dat:
-
-
- - android/icudt46l_dat.S : Built on Linux with all the patches above and
- android/brkitr.patch applied and android/patch_locale.sh executed, and
- checked in.
- - android/icudtl.dat : Generated as icudt46l.dat in
- {BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and
- copied to the above location with '46' dropped in its name.
-
-
-12. Apply the fix found with static analysis tools such as PSV and coverity
-
- - patches/static.analysis.patch
- - upstream trunk/4.8 do not have this code any more.
-
-13. Fix for msvs2010 applied:
---- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
- (revision 78292)
-+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
- (working copy)
-@@ -75,7 +75,7 @@
- * Visual Studios 9.0.
- * Cygwin with MSVC 9.0 also complains here about redefinition.
- */
--#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
-+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
- const int32_t StringPiece::npos;
- #endif
-
-14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
- - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
- - Handle other chars besides the dot. This is required because decNumber's
- parser expects the dot as a decimal separator.
- - Locales that don't use dot were producing "NaN" values.
-
-15. Fix a bug in the regex engine.
- - patches/regex.patch
- - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)
-
-16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
- - patches/search_collation.patch
- - upstream bug: http://bugs.icu-project.org/trac/ticket/8290
-
-17. Fix a use of uninitialized memory bug in regular expression matching
- - patches/rematch.patch
- - upstream bug: http://bugs.icu-project.org/trac/ticket/8824
-
-18. Make it compile with -Werror on gcc 4.6
- - patches/gcc46.patch (ToT upstream does not have this code any more).
-
-19. Fix four out of bounds memory access error in common/uloc.c
- and common/uresbund.c
- - patches/uloc.patch
- - upstream bug:
- 1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
- 2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
- 3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
- http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
- 4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)
-
-20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
- - patches/ubrk.patch
- - upstream bug : http://bugs.icu-project.org/trac/ticket/9115
-
-21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
- - patches/changeset_30255.patch
- - upstream change : http://bugs.icu-project.org/trac/changeset/30255
-
-22. Fix time zone handling and compilation on iOS.
- - patches/ios_timezone.patch
- - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
- http://bugs.icu-project.org/trac/ticket/8661
-
-23. Fix a buffer overflow in utext
- - patches/utext.patch
- - upstream change : http://bugs.icu-project.org/trac/changeset/29356
-
-24. Fix compilation errors on VS2012 and above.
- - patches/vs2012.patch
-
-25. Fix a buffer overflow in UTF-16/32 detection.
- - patches/csetdet.patch
- - upstream bug: http://bugs.icu-project.org/trac/ticket/10318
-
-26. Add BreakIterator::getRuleStatus
- - patches/breakiterator.patch
- - Copy and paste BreakIterator::getRuleStatus API from ICU 52
-
-27. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
- - patches/declspec.patch
+ This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
+ icudt52l_dat.S, but '52' is dropped while copying.
+
+ mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
+ the header portion. With "linux/icudtl_dat.S" in its place,
+ run scripts/make_mac_assembly.sh to generate it.
+
+ - android/icudtl_dat.S : Built on Linux with all the patches above and
+ android/brkitr.patch applied and android/patch_locale.sh executed.
+ '52' is dropped from the name generated in the build tree.
+
+ - android/icudtl.dat : Generated as icudt52l.dat in
+ {BUILD_DIR_ROOT}/data/out/tmp along with icudt52l_dat.S and
+ copied to the above location with '52' dropped in its name.
+
+ - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
+ and don't use this file.)
-28. Add support for QNX Neutrino.
- - patches/platform.qnx.patch:
- See #2 about the platform header generation.
- - patches/si_value.undef.patch:
- Work around an all-lowercase macro defined in <signal.h>.
- Upstream took a different approach:
- http://bugs.icu-project.org/trac/ticket/9935
- - patches/xopen_source.patch:
- Set _XOPEN_SOURCE to 600 as in the upstream changeset:
- http://bugs.icu-project.org/trac/changeset/30418
+    a. On Windows, export a clean copy of ICU 52.1 from upstream
+       outside the Chrome tree:
+
+      $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-52-1 ${SEPARATE_ICU_ROOT}
+
+ b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
+ ${SEPARATE_ICU_ROOT}/source/data/in/icudt52l.dat
+ c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
+ ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
+    d. In Visual Studio, open the source/allinone/allinone.sln solution
+       in ${SEPARATE_ICU_ROOT}.
+    e. Build the 'makedata' target.
+    f. icudt52.dll will be generated in ${SEPARATE_ICU_ROOT}/bin.
+    g. Copy that icudt52.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
+       and check it in.
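The naming convention used throughout step 9 (the ICU major version is dropped from every checked-in artifact) and the Windows copy steps that seed the separate tree can be sketched as below. The helpers are hypothetical; the file names and the CHROME_ICU_ROOT/SEPARATE_ICU_ROOT variables come from the steps above.

```shell
# Hypothetical helpers. Generated files carry the ICU major version
# (icudt52l.dat, icudt52l_dat.S, icudt52.dll); checked-in copies drop it.
drop_icu_version() {
  printf '%s\n' "$1" | sed 's/^icudt[0-9][0-9]*/icudt/'
}

# Copy a generated file ($1) into a checked-in directory ($2) under its
# unversioned name, e.g. icudt52l_dat.S -> linux/icudtl_dat.S.
install_unversioned() {
  cp "$1" "$2/$(drop_icu_version "${1##*/}")"
}

# Windows steps that copy icudtl.dat (as icudt52l.dat) and makedata.mak
# from the Chromium ICU tree into the separate ICU 52 tree.
seed_separate_tree() {
  cp "$CHROME_ICU_ROOT/source/data/in/icudtl.dat" \
     "$SEPARATE_ICU_ROOT/source/data/in/icudt52l.dat" &&
  cp "$CHROME_ICU_ROOT/source/data/makedata.mak" \
     "$SEPARATE_ICU_ROOT/source/data/makedata.mak"
}
```

Keeping the version out of checked-in names avoids touching build files on every ICU upgrade; only these copy steps need the version.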
+
+
+10. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
+ - patches/declspec.patch