Add optimized UTF-8 validation and transcoding APIs, hook them up to UTF8Encoding...
author Levi Broderick <GrabYourPitchforks@users.noreply.github.com>
Fri, 12 Apr 2019 03:50:16 +0000 (20:50 -0700)
committer GitHub <noreply@github.com>
Fri, 12 Apr 2019 03:50:16 +0000 (20:50 -0700)
commit 4c8ab124ff122a23457141541020a54b30ed8624
tree b7a623a71d467d1d4d55a71df10f03c5bdfb7814
parent 52bfad5c8046a2eee52f3a87bc15cd8b92ee0f8f
parent 0c6934f1fa37fe17b8829d8d3c7a1ad035c0dbe3
Add optimized UTF-8 validation and transcoding APIs, hook them up to UTF8Encoding (dotnet/coreclr#21948)

* Add optimized UTF-8 validation and transcoding logic
- Hook it up through the existing Utf8 public static APIs
- Move some shared methods out of ASCIIUtility
- Hook it up through the Utf8String ctor

* Hook up new UTF-8 logic through UTF8Encoding
- Add vectorized UTF-16 validation and transcoded byte counts
- Move Utf16Utility into Unicode namespace alongside Utf8Utility
- Fix some bugs in DecoderNLS's draining logic

* Improve perf of "is ASCII?" inner loop in UTF-8 validation.

* Remove SSE41.X64 optimization from ASCIIUtility
- RyuJIT now handles this optimally

* Clarify that vector read is unaligned

* Simplify vectorized logic; remove unnecessary adjustment

* PR feedback: GetElement(0) -> Sse2.StoreLow

* PR feedback
- Simplify CountNumberOfLeadingAsciiBytesFrom24BitInteger
- Extract some consts out to top of file w/ comments

* PR feedback: Enable SSE2 in Utf16Utility code

* Expand masks in Utf8Utility, fix const in fallback path

* Temporarily disable failing CoreFX tests

* Fix incorrect Debug.Assert statements

* Add comments tracking JIT workarounds.

* Rename DWORD -> UInt32 throughout API surface

* Re-flow Utf8Utility.Helpers

* PR feedback: Fix typos

* PR feedback: CountNumberOfLeadingAsciiBytesFrom24BitInteger

* PR feedback: Remove redundant endianness checks

* PR feedback: Validate nint definitions

* PR feedback: Clarify charIsNonAscii vector usage

* PR feedback: document tempUtf8CodeUnitCountAdjustment usage

* Fix compilation failure in Utf16Utility

* PR feedback: Clarify 3-byte sequence processing

* Add missing check to 3-byte processing logic

* Clarify comment in 3-byte processing
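
Illustration of the "is ASCII?" check mentioned above. This is a minimal scalar sketch in C, not the code from this change: the function names `all_bytes_ascii` and `count_leading_ascii` are hypothetical, and the real implementation is C# and also uses vectorized paths. It only shows the general idea of testing four bytes at once by masking the high bit of each byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if all four bytes packed into `word` have their high bit clear,
 * i.e. every byte is ASCII. The mask is symmetric, so the result does not
 * depend on byte order. (Hypothetical helper name.) */
static int all_bytes_ascii(uint32_t word) {
    return (word & 0x80808080u) == 0;
}

/* Returns the number of leading ASCII bytes in buf[0..len).
 * (Hypothetical helper name; assumed for illustration only.) */
size_t count_leading_ascii(const unsigned char *buf, size_t len) {
    size_t i = 0;

    /* Fast path: examine four bytes per iteration. */
    while (len - i >= 4) {
        uint32_t word;
        memcpy(&word, buf + i, sizeof word); /* safe for unaligned reads */
        if (!all_bytes_ascii(word)) {
            break; /* a non-ASCII byte is somewhere in this word */
        }
        i += 4;
    }

    /* Slow path: finish byte by byte, which also locates the exact
     * non-ASCII byte inside the word that stopped the fast path. */
    while (i < len && buf[i] < 0x80) {
        i++;
    }
    return i;
}
```

A validator built this way would typically skip the ASCII prefix with a routine like this and fall back to multi-byte sequence handling only from the first non-ASCII byte onward.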

Commit migrated from https://github.com/dotnet/coreclr/commit/77a09eb013b7e2d66b84fc0a0008f31caee58c76
src/coreclr/tests/CoreFX/CoreFX.issues.json