Greek sigma is handled incorrectly in case-insensitive regexps.
https://bugs.webkit.org/show_bug.cgi?id=82063
Reviewed by Oliver Hunt.
Source/JavaScriptCore:
The bug here is that we assume any given codepoint has at most one additional value it
should match under a case-insensitive match, and that the pair of matching codepoints (if
a codepoint does not match only itself) can be determined by calling toUpper/toLower on the
given codepoint. Life is not that simple.
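As an illustration (not part of this patch), the sigma case can be sketched in JavaScript.
The canonicalize helper below is a hypothetical approximation of the ES5.1 Canonicalize
step (15.10.2.8) built on String.prototype.toUpperCase; it is an assumption made for this
example, not the code used in Yarr.

```javascript
// Hypothetical helper: approximates ES5.1 Canonicalize for one UCS2 character.
function canonicalize(ch) {
  var upper = ch.toUpperCase();
  // If uppercasing yields more than one character, the character is left alone.
  if (upper.length !== 1) return ch;
  // A non-ASCII character must not canonicalize to an ASCII one.
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

var capitalSigma = "\u03A3"; // GREEK CAPITAL LETTER SIGMA
var smallSigma = "\u03C3";   // GREEK SMALL LETTER SIGMA
var finalSigma = "\u03C2";   // GREEK SMALL LETTER FINAL SIGMA

// All three forms share one canonical value, so each must match the other two.
var canonicalForms = [capitalSigma, smallSigma, finalSigma].map(canonicalize);

// The pair obtained from toUpper/toLower on the capital form omits the final sigma.
var pairFromCapitalSigma = [capitalSigma.toUpperCase(), capitalSigma.toLowerCase()];
```

Here canonicalForms holds the same value three times, while pairFromCapitalSigma contains
only two of the three characters that have to match.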
Instead, pre-calculate a set of tables mapping from a UCS2 codepoint to the set of characters
it may match, under the ES5.1 case-insensitive matching rules. Since Unicode is fairly regular,
we can pack this table quite nicely, and get it down to 364 entries. This means we can use a
simple binary search to find an entry in typically eight compares.
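A minimal sketch of such a lookup follows. The table layout, the field names (begin, end,
kind, delta, set) and the functions rangeFor and equivalents are all hypothetical, and the
table is a hand-written excerpt in which ranges marked "simplified" do not reflect real
Unicode data; the generated YarrCanonicalizeUCS2.cpp may be organized differently.

```javascript
// Hypothetical excerpt: sorted, contiguous ranges covering 0x0000-0xFFFF.
var sigmaSet = [0x03A3, 0x03C2, 0x03C3];
var ranges = [
  { begin: 0x0000, end: 0x0040, kind: "unique" },
  { begin: 0x0041, end: 0x005A, kind: "offset", delta: 0x20 },  // A-Z pair with a-z
  { begin: 0x005B, end: 0x0060, kind: "unique" },
  { begin: 0x0061, end: 0x007A, kind: "offset", delta: -0x20 }, // a-z pair with A-Z
  { begin: 0x007B, end: 0x03A2, kind: "unique" },               // simplified
  { begin: 0x03A3, end: 0x03A3, kind: "set", set: sigmaSet },
  { begin: 0x03A4, end: 0x03C1, kind: "unique" },               // simplified
  { begin: 0x03C2, end: 0x03C3, kind: "set", set: sigmaSet },
  { begin: 0x03C4, end: 0xFFFF, kind: "unique" }                // simplified
];

// Binary search for the range containing a UCS2 code unit.
function rangeFor(codeUnit) {
  var lo = 0;
  var hi = ranges.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    var range = ranges[mid];
    if (codeUnit < range.begin) hi = mid - 1;
    else if (codeUnit > range.end) lo = mid + 1;
    else return range;
  }
  return null; // not reached while the table covers every code unit
}

// List every code unit that the given one must match when case is ignored.
function equivalents(codeUnit) {
  var range = rangeFor(codeUnit);
  if (range.kind === "set") return range.set.slice();
  if (range.kind === "offset") return [codeUnit, codeUnit + range.delta];
  return [codeUnit];
}
```

Because 2^8 = 256 < 364 < 512 = 2^9, a binary search over 364 entries needs at most nine
probes, which is in line with the typical figure quoted above.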
* CMakeLists.txt:
* GNUmakefile.list.am:
* JavaScriptCore.gypi:
* JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj:
* JavaScriptCore.xcodeproj/project.pbxproj:
* yarr/yarr.pri:
- Added new files to build systems.
* yarr/YarrCanonicalizeUCS2.cpp: Added.
- New, autogenerated UCS2 canonicalized comparison tables.
* yarr/YarrCanonicalizeUCS2.h: Added.
(JSC::Yarr::rangeInfoFor):
- Look up the canonicalization info for a UCS2 character.
(JSC::Yarr::getCanonicalPair):
- For a UCS2 character with a single equivalent value, look it up.
(JSC::Yarr::isCanonicallyUnique):
- Returns true if no other UCS2 code points are canonically equal.
(JSC::Yarr::areCanonicallyEquivalent):
- Compare two values, under canonicalization rules.
* yarr/YarrCanonicalizeUCS2.js: Added.
- Script used to generate YarrCanonicalizeUCS2.cpp.
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::tryConsumeBackReference):
- Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
- Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
* yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::putChar):
- Updated to determine canonical equivalents correctly.
(JSC::Yarr::CharacterClassConstructor::putUnicodeIgnoreCase):
- Added, used to put a non-ASCII, non-unique character in a case-insensitive match.
(JSC::Yarr::CharacterClassConstructor::putRange):
- Updated to determine canonical equivalents correctly.
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
- Changed to call putUnicodeIgnoreCase instead of putChar, to avoid a double lookup of rangeInfo.
LayoutTests:
* fast/regex/script-tests/unicodeCaseInsensitive.js: Added.
(shouldBeTrue.ucs2CodePoint):
* fast/regex/unicodeCaseInsensitive-expected.txt: Added.
* fast/regex/unicodeCaseInsensitive.html: Added.
- Added test cases for case-insensitive matches of non-ASCII characters.
git-svn-id: http://svn.webkit.org/repository/webkit/trunk@112143 268f45cc-cd09-0410-ab3c-d52691b4dbfc