Refactor CharSetSolver and friends to avoid unbounded caching (#67673)
authorStephen Toub <stoub@microsoft.com>
Thu, 7 Apr 2022 20:59:05 +0000 (16:59 -0400)
committerGitHub <noreply@github.com>
Thu, 7 Apr 2022 20:59:05 +0000 (16:59 -0400)
commit2e1935739c321b413a3f0c10720aea56f00ea8fb
tree2ae63f16a795848fdcb46075f24a2fbcbb5cfddd
parentdc030a0362e3177389f4d4e73e1332c345eab891
Refactor CharSetSolver and friends to avoid unbounded caching (#67673)

* Refactor CharSetSolver and friends to avoid unbounded caching

CharSetSolver today is a singleton and holds on to a thread-safe cache that is augmented with every BDD produced by every regex processed.  Over time, then, the cache grows unbounded and represents a significant leak.  This change makes CharSetSolver no longer a singleton, with every regex creating its own CharSetSolver, with caches no longer shared across instances.  In doing so, we also remove the need for CharSetSolver's cache to be thread-safe, making it cheaper with a `Dictionary<>` instead of a `ConcurrentDictionary<>`.

As part of this refactoring, other code has been refactored and cleaned up:
- CharSetSolver was the only type deriving from the abstract BDDAlgebra.  The base type has been removed and its functionality merged up into CharSetSolver.
- ICharAlgebra was the only type inheriting IBooleanAlgebra.  The latter has been removed and its APIs moved up into ICharAlgebra, which has also been renamed to ISolver.
- Similarly, UInt64Algebra and BitVectorAlgebra have been renamed to UInt64Solver and BitVectorSolver, respectively.
- Several files were renamed to match the types they contain.
- Several functions were parameterized with an integer value but the same value was passed in from every call site; that value has been moved to be a const in the method and the parameter removed.
- CharSetSolver had a complicated PrettyPrint method.  Most of it has been deleted in favor of using the set description functionality already present in RegexCharClass.
- Generic type parameter names for the implementation of sets (BDD, ulong, BitVector) were all over the map.  I've standardized on TSet.
- Several members of ISolver (formerly ICharAlgebra/IBooleanAlgebra) have been renamed: True is now Full, False is now Empty, and CharConstraint is now CreateFromChar.  IsSatisfiable and AreEquivalent were removed in favor of adding IsEmpty and IsFull members (AreEquivalent was only ever used with Full, and IsSatisfiable calls were replaced with !IsEmpty calls).
- Use of "predicate" has largely been removed.
- Replaced a few recursive algorithms with loops.
- Added some more comments.

* Address PR feedback
27 files changed:
src/libraries/System.Text.RegularExpressions/src/System.Text.RegularExpressions.csproj
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BDD.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BDDAlgebra.cs [deleted file]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BDDRangeConverter.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BitVectorSolver.cs [moved from src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BitVectorAlgebra.cs with 52% similarity]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BooleanClassifier.cs [deleted file]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/CharSetSolver.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/DfaMatchingState.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/DgmlWriter.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/IBooleanAlgebra.cs [deleted file]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/ICharAlgebra.cs [deleted file]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/ISolver.cs [new file with mode: 0644]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/MintermClassifier.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/MintermGenerator.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/RegexNodeConverter.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicNFA.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexBuilder.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexRunnerFactory.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexSampler.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexSet.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/TransitionRegex.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/UInt64Solver.cs [moved from src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/BitVector64Algebra.cs with 57% similarity]
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/UnicodeCategoryConditions.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/UnicodeCategoryRangesGenerator.cs