Remove partitioning from CancellationTokenSource (#48251)
When CancellationTokenSource was originally created, the expectation was that the dominant use case would be lots of threads registering and unregistering handlers in parallel. That led to a design in which a CTS internally partitions its registrations to minimize contention among threads accessing its internal data structures. While that certainly comes up in practice, a much more common case is a single thread registering and unregistering at a time: a CancellationToken unique to a particular operation (e.g. one from a linked token source) is passed down through that operation, and various levels of the call chain register and unregister with that non-concurrently-used token source. Partitioning also incurs non-trivial allocation overhead, particularly for a short-lived CTS that sees only one or a few registrations in its lifetime. This change removes the partitioning scheme: all scenarios now allocate less memory, and non-concurrent scenarios are measurably faster. Scenarios with contention do take a measurable hit, but since that's the rare case, it's believed to be the right trade-off (and when in doubt, it's also the simpler implementation).
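To make the trade-off concrete, here is a minimal, language-neutral sketch in Python (not the actual C# implementation) of the unpartitioned shape: one lock guarding one doubly linked list of registrations, so a lone thread registering and unregistering pays for a single small node allocation and an uncontended lock acquire. The names `SimpleCancellationSource`, `register`, `unregister`, and `cancel` are hypothetical, and `threading.Lock` stands in for the instance's spin lock.

```python
import threading
from typing import Callable, Optional


class _Node:
    """One registered callback; doubly linked so unregistration is O(1)."""
    __slots__ = ("callback", "prev", "next")

    def __init__(self, callback: Callable[[], None]) -> None:
        self.callback: Optional[Callable[[], None]] = callback
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class SimpleCancellationSource:
    """Hypothetical single-lock registration list (no partitioning)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # one lock for all registrations
        self._head: Optional[_Node] = None
        self._canceled = False

    def register(self, callback: Callable[[], None]) -> Optional[_Node]:
        node = _Node(callback)  # allocate *before* taking the lock
        with self._lock:
            if not self._canceled:
                node.next = self._head
                if self._head is not None:
                    self._head.prev = node
                self._head = node
                return node
        callback()  # already canceled: run the callback inline
        return None

    def unregister(self, node: Optional[_Node]) -> bool:
        if node is None:
            return False
        with self._lock:
            if self._canceled or node.callback is None:
                return False  # too late, or already unregistered
            if node.prev is not None:
                node.prev.next = node.next
            else:
                self._head = node.next
            if node.next is not None:
                node.next.prev = node.prev
            node.callback = None  # mark as removed
            node.prev = node.next = None
            return True

    def cancel(self) -> None:
        with self._lock:
            if self._canceled:
                return
            self._canceled = True
            node, self._head = self._head, None  # detach the whole list
        # Invoke callbacks outside the lock; this sketch runs them
        # most-recently-registered first.
        while node is not None:
            cb, nxt = node.callback, node.next
            node.callback = None
            if cb is not None:
                cb()
            node = nxt
```

Every register and unregister goes through the same lock, which is why contended scenarios get slower; a partitioned design spreads that traffic over several locks at the cost of the extra allocations described above.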
Since I was refactoring a lot of this code anyway, I fixed a few other things along the way:
- Avoided allocating while holding the instance's spin lock
- Made WaitForCallbackAsync into a polling async method rather than an async-over-sync method
- Made the state values 0-based so that _state doesn't need to be initialized to a nonzero value in the common case
- Used existing throw helpers in a few more cases
- Renamed a few methods and turned a few others into local functions
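The 0-based state and polling-wait changes can be illustrated with another hedged Python sketch. It assumes (rather than quotes from the PR) three states in which "not canceled" is 0, so a freshly constructed object is already in its common-case state, and a waiter that polls asynchronously instead of blocking a thread on a synchronous wait. `CallbackTracker`, `wait_for_callback`, the state constants, and the 1 ms poll interval are all hypothetical.

```python
import asyncio

# Hypothetical state values. Making the common "not canceled" state 0 mirrors
# the 0-based change: in C#, an int field defaults to 0, so no initializer is
# needed. (Python still needs the explicit assignment below.)
NOT_CANCELED = 0
NOTIFYING = 1
NOTIFYING_COMPLETE = 2


class CallbackTracker:
    """Hypothetical holder for cancellation state and the running callback."""

    def __init__(self) -> None:
        self.state = NOT_CANCELED
        self.executing_id = 0  # 0 means no callback is currently running


async def wait_for_callback(tracker: CallbackTracker, callback_id: int,
                            poll_interval: float = 0.001) -> None:
    """Wait until the callback with `callback_id` is no longer executing.

    Polls with a short asynchronous sleep instead of tying up a thread in a
    blocking wait (the async-over-sync pattern).
    """
    while tracker.state == NOTIFYING and tracker.executing_id == callback_id:
        await asyncio.sleep(poll_interval)
```

Polling trades a little latency for never dedicating a thread to the wait; if the target callback isn't the one currently executing, the loop exits immediately.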