Factor common prefix text out of Regex alternations (#2171)
authorStephen Toub <stoub@microsoft.com>
Mon, 27 Jan 2020 19:56:04 +0000 (14:56 -0500)
committerGitHub <noreply@github.com>
Mon, 27 Jan 2020 19:56:04 +0000 (14:56 -0500)
commit9023aa2ecca472c66b6c3c4af5aafeec5c7a97ea
tree81197bb3dd63d73db39ac4854e87901296472c1a
parent526dfa81917e9a5f8052f82b874c051e8879cfa6
Factor common prefix text out of Regex alternations (#2171)

* Factor common prefix text out of Regex alternations

Given a regex like "this|that|there", we will now factor out the common prefix into a new node concatenated with the alternation, e.g. "th(?:is|at|ere)".  This has a few benefits, including exposing more text to FindFirstChar if this is at the beginning of the sequence, reducing backtracking, and enabling further reduction/optimization opportunities in the alternation.

* Address PR feedback

* Update RegexBoyerMoore.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexBoyerMoore.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCode.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexFCD.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexRunner.cs
src/libraries/System.Text.RegularExpressions/tests/Regex.UnicodeChar.Tests.cs
src/libraries/System.Text.RegularExpressions/tests/RegexReductionTests.cs