Improve Regex performance (mainly interpreted) (#449)
author Stephen Toub <stoub@microsoft.com>
Tue, 3 Dec 2019 00:09:19 +0000 (19:09 -0500)
committer GitHub <noreply@github.com>
Tue, 3 Dec 2019 00:09:19 +0000 (19:09 -0500)
commit 2cc926e6fde9dca2909e43aee5de72eb0d2de396
tree 78f504db31b7c4846d4dae91f13c30816948ddf7
parent 457e69143f4feef2dd0ecc7a1ab02fc26b3175ea

* Remove branches from tight inner interpreter loop in FindFirstChar

* Tweak RegexBoyerMoore.IsMatch

Reduce the checks needed and eliminate unnecessary layers of function calls.

* Remove IsSingleton optimization

This doesn't show up in real regexes and just adds unnecessary complication to the code.  No one writes `[a]`... they just write `a`.  SingletonInverse is more useful, as you can search for any character except a specific one, e.g. find the first character that's not a dash.

* Cache CharInClass results for ASCII lookups

* Improve codegen in a few places (and a little cleanup)

* Mark RegexInterpreter.SetOperator aggressive inlining

It's small but isn't getting inlined. It's only called in 4 places, but they are on hot paths, and inlining it nets a throughput win of around 8%.
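
The "Cache CharInClass results for ASCII lookups" item above can be illustrated with a rough sketch. This is not the code from this commit: the `AsciiClassCache` type, its delegate parameter, and the two-bits-per-character layout are hypothetical, chosen only to show how a lazily filled ASCII lookup might avoid repeating an expensive class-membership check.

```csharp
using System;

// Hypothetical illustration only; names and layout are assumptions, not the commit's code.
public sealed class AsciiClassCache
{
    // Two bits per ASCII character: the lower bit means "result is known",
    // the upper bit means "character is in the class".
    // 128 characters * 2 bits = 256 bits = 8 ints.
    private readonly int[] _bits = new int[8];

    // Stand-in for the expensive, general-purpose membership check.
    private readonly Func<char, bool> _slowCharInClass;

    public AsciiClassCache(Func<char, bool> slowCharInClass)
    {
        _slowCharInClass = slowCharInClass ?? throw new ArgumentNullException(nameof(slowCharInClass));
    }

    public bool CharInClass(char ch)
    {
        if (ch >= 128)
        {
            // Non-ASCII input bypasses the cache entirely.
            return _slowCharInClass(ch);
        }

        int index = ch >> 4;            // 16 characters per int
        int shift = (ch & 0xF) << 1;    // 2 bits per character
        int knownBit = 1 << shift;
        int valueBit = knownBit << 1;

        int current = _bits[index];
        if ((current & knownBit) != 0)
        {
            // Cached: answer directly from the stored bit.
            return (current & valueBit) != 0;
        }

        // First time this character is seen: compute once and remember the answer.
        bool result = _slowCharInClass(ch);
        _bits[index] = current | knownBit | (result ? valueBit : 0);
        return result;
    }
}
```

With a cache like this, repeated lookups of the same ASCII character call the slow check only once. The sketch ignores thread safety: concurrent writers could lose an update, which would cost a recomputation rather than give a wrong answer, but a real implementation would need to consider this more carefully.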
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Match.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexBoyerMoore.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCode.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexRunner.cs