Greek sigma is handled wrong in case independent regexp.
author barraclough@apple.com <barraclough@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 26 Mar 2012 20:13:39 +0000 (20:13 +0000)
committer barraclough@apple.com <barraclough@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 26 Mar 2012 20:13:39 +0000 (20:13 +0000)
commit 3f23b2bd35dd34cdd8be1a9c91953b477ecaa17c
tree 8de462436f24095fb32dc8b3207f99c17a430453
parent dbefeeaafe3d106799159e1dfb2295a62d660d59
https://bugs.webkit.org/show_bug.cgi?id=82063

Reviewed by Oliver Hunt.

Source/JavaScriptCore:

The bug here is that we assume that any given codepoint has at most one additional value it
should match under a case-insensitive match, and that the pair of codepoints that match (if
a codepoint does not only match itself) can be determined by calling toUpper/toLower on the
given codepoint. Life is not that simple.
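
As an illustration of why a toUpper/toLower pair is not enough, the sketch below uses plain
JavaScript string methods (not the Yarr code itself) on the three forms of Greek sigma. The
variable names are illustrative only. The last check is what a fixed engine is expected to
report.

```javascript
const capitalSigma = "\u03A3"; // Σ
const smallSigma = "\u03C3";   // σ
const finalSigma = "\u03C2";   // ς

// Upper-casing final sigma gives capital sigma ...
const upperOfFinal = finalSigma.toUpperCase();
// ... but lower-casing a lone capital sigma gives small sigma, never final sigma.
const lowerOfCapital = capitalSigma.toLowerCase();

// So {toUpper(ch), toLower(ch)} covers at most two of the three forms,
// while a case-insensitive match has to accept all three.
const allMatch = [capitalSigma, smallSigma, finalSigma]
  .every((ch) => /\u03C3/i.test(ch));

console.log(upperOfFinal === capitalSigma, lowerOfCapital === smallSigma, allMatch);
```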

Instead, pre-calculate a set of tables mapping from a UCS2 codepoint to the set of characters
it may match, under the ES5.1 case-insensitive matching rules. Since Unicode is fairly regular,
we can pack this table quite nicely, and get it down to 364 entries. This means we can use a
simple binary search to find an entry, typically in eight compares.
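
A minimal sketch of the idea, written in JavaScript rather than the C++ of the real tables.
The four entries, the field names (begin, end, kind, delta, set) and the helpers findRange
and matchSet are all hypothetical and hand-written for illustration; they are not the
generated contents of YarrCanonicalizeUCS2.cpp. Code units outside this tiny excerpt return
null, meaning only that the excerpt does not cover them.

```javascript
// Hypothetical excerpt: sorted, non-overlapping ranges of UCS2 code units.
const SIGMA_SET = [0x03a3, 0x03c2, 0x03c3]; // Σ, ς, σ all match each other
const ranges = [
  { begin: 0x0041, end: 0x005a, kind: "offset", delta: 32 },  // A-Z pair with a-z
  { begin: 0x0061, end: 0x007a, kind: "offset", delta: -32 }, // a-z pair with A-Z
  { begin: 0x03a3, end: 0x03a3, kind: "set", set: SIGMA_SET },
  { begin: 0x03c2, end: 0x03c3, kind: "set", set: SIGMA_SET },
];

// Binary search for the range containing codeUnit; null if none does.
function findRange(table, codeUnit) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const range = table[mid];
    if (codeUnit < range.begin) hi = mid - 1;
    else if (codeUnit > range.end) lo = mid + 1;
    else return range;
  }
  return null;
}

// Every code unit that codeUnit may match, or null if the excerpt lacks it.
function matchSet(table, codeUnit) {
  const range = findRange(table, codeUnit);
  if (range === null) return null;
  if (range.kind === "offset") return [codeUnit, codeUnit + range.delta];
  return range.set;
}

console.log(matchSet(ranges, 0x03c2)); // the three sigma code units
```

Because one entry can describe a whole run of code units that follow the same rule, the table
stays small. A binary search over 364 sorted entries needs at most nine probes.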

* CMakeLists.txt:
* GNUmakefile.list.am:
* JavaScriptCore.gypi:
* JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj:
* JavaScriptCore.xcodeproj/project.pbxproj:
* yarr/yarr.pri:
    - Added new files to build systems.
* yarr/YarrCanonicalizeUCS2.cpp: Added.
    - New - autogenerated, UCS2 canonicalized comparison tables.
* yarr/YarrCanonicalizeUCS2.h: Added.
(JSC::Yarr::rangeInfoFor):
    - Look up the canonicalization info for a UCS2 character.
(JSC::Yarr::getCanonicalPair):
    - For a UCS2 character with a single equivalent value, look it up.
(JSC::Yarr::isCanonicallyUnique):
    - Returns true if no other UCS2 code points are canonically equal.
(JSC::Yarr::areCanonicallyEquivalent):
    - Compare two values, under canonicalization rules.
* yarr/YarrCanonicalizeUCS2.js: Added.
    - script used to generate YarrCanonicalizeUCS2.cpp.
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::tryConsumeBackReference):
    - Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
    - Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
* yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::putChar):
    - Updated to determine canonical equivalents correctly.
(JSC::Yarr::CharacterClassConstructor::putUnicodeIgnoreCase):
    - Added, used to put a non-ASCII, non-unique character in a case-insensitive match.
(JSC::Yarr::CharacterClassConstructor::putRange):
    - Updated to determine canonical equivalents correctly.
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
    - Changed to call putUnicodeIgnoreCase, instead of putChar, to avoid a double lookup of rangeInfo.
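
The intended behaviour of the helpers listed under yarr/YarrCanonicalizeUCS2.h can be
sketched in JavaScript as follows. This is a brute-force model built on the ES5.1
Canonicalize operation (section 15.10.2.8), not the table-driven C++ implementation. The
function names mirror the entries above, but the bodies, the classes map, and returning null
from getCanonicalPair when there is not exactly one partner are assumptions of this sketch.
It relies on the host engine's toUpperCase, so results for some code points may differ from
the tables generated in 2012.

```javascript
// ES5.1 Canonicalize for a single UTF-16 code unit under ignoreCase.
function canonicalize(codeUnit) {
  const upper = String.fromCharCode(codeUnit).toUpperCase();
  if (upper.length !== 1) return codeUnit;          // e.g. U+00DF uppercases to "SS"
  const cu = upper.charCodeAt(0);
  if (codeUnit >= 128 && cu < 128) return codeUnit; // non-ASCII never folds into ASCII
  return cu;
}

// Group all 65536 code units by canonical value (done once, brute force).
const classes = new Map();
for (let c = 0; c <= 0xffff; c++) {
  const key = canonicalize(c);
  if (!classes.has(key)) classes.set(key, []);
  classes.get(key).push(c);
}
const classOf = (c) => classes.get(canonicalize(c));

// True if no other code unit is canonically equal to c.
const isCanonicallyUnique = (c) => classOf(c).length === 1;

// True if a and b match each other under case-insensitive comparison.
const areCanonicallyEquivalent = (a, b) => canonicalize(a) === canonicalize(b);

// The single partner of c, or null (sketch assumption) if c has zero or several.
function getCanonicalPair(c) {
  const cls = classOf(c);
  return cls.length === 2 ? cls.find((x) => x !== c) : null;
}

console.log(isCanonicallyUnique(0x03c3), areCanonicallyEquivalent(0x03c2, 0x03c3));
```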

LayoutTests:

* fast/regex/script-tests/unicodeCaseInsensitive.js: Added.
(shouldBeTrue.ucs2CodePoint):
* fast/regex/unicodeCaseInsensitive-expected.txt: Added.
* fast/regex/unicodeCaseInsensitive.html: Added.
    - Added test cases for case-insensitive matches of non-ASCII characters.

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@112143 268f45cc-cd09-0410-ab3c-d52691b4dbfc
17 files changed:
LayoutTests/ChangeLog
LayoutTests/fast/regex/script-tests/unicodeCaseInsensitive.js [new file with mode: 0644]
LayoutTests/fast/regex/unicodeCaseInsensitive-expected.txt [new file with mode: 0644]
LayoutTests/fast/regex/unicodeCaseInsensitive.html [new file with mode: 0644]
Source/JavaScriptCore/CMakeLists.txt
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/GNUmakefile.list.am
Source/JavaScriptCore/JavaScriptCore.gypi
Source/JavaScriptCore/JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp [new file with mode: 0644]
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.h [new file with mode: 0644]
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.js [new file with mode: 0644]
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
Source/JavaScriptCore/yarr/YarrJIT.cpp
Source/JavaScriptCore/yarr/YarrPattern.cpp
Source/JavaScriptCore/yarr/yarr.pri