Improve thread pool worker thread's spinning for work (#13921)
Closes https://github.com/dotnet/coreclr/issues/5928
Replaced UnfairSemaphore with a new implementation in CLRLifoSemaphore
- UnfairSemaphore had some benefits:
  - It tracked the number of spinners and avoided waking up waiters as long as the signal count could be satisfied by spinners
  - Spinners get priority over waiters. This is the main "unfair" part of it, and it allows hot threads to remain hot and cold threads to remain cold. However, waiters are still released in FIFO order.
  - Spinning helps with throughput when incoming work is bursty
- All of the above benefits were retained in CLRLifoSemaphore and some were improved:
  - As in UnfairSemaphore, the number of spinners is tracked, and spinners are given preference to avoid waking up waiters
  - For waiting on Windows, an I/O completion port is used, since it releases waiters in LIFO order. On Unix, a prioritized wait function was added to the PAL that registers waiters in reverse order, giving LIFO release behavior. This allows cold waiters to time out more easily, since they will be used less frequently.
  - As in SemaphoreSlim, the number of waiters that were signaled to wake but have not yet woken is tracked, to help avoid waking up an excessive number of waiters
  - Added some YieldProcessorNormalized() calls to the spin loop. They add a delay between iterations, which avoids thrashing on Sleep(0) and makes spinning more effective when there are no threads to switch to, or when the only other threads to switch to are similar spinners.
  - Removed the processor count multiplier on the max spin count and retuned the default max spin count. The multiplier was causing excessive CPU usage on machines with many processors.
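The counting rules described above (tracking spinners, preferring them over waiters, releasing waiters in LIFO order, and tracking waiters that were signaled but have not yet woken) can be illustrated with a simplified C++ sketch. This is an assumption-laden model, not the CLRLifoSemaphore code: a single `std::mutex` guards all counts, and each blocked thread waits on its own condition variable pushed onto a stack, standing in for the I/O completion port or PAL prioritized wait. `LifoSemaphoreSketch`, `Wait`, `Release`, and `BlockedWaiters` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Simplified, mutex-based sketch (assumed design, not the actual
// CLRLifoSemaphore implementation, which manages its counts differently).
class LifoSemaphoreSketch {
 public:
  // Acquire one signal: spin up to spin_count times, then block.
  void Wait(int spin_count) {
    std::unique_lock<std::mutex> lock(m_);
    ++spinners_;  // Release() assumes spinners can absorb signals
    for (int i = 0; i < spin_count; ++i) {
      if (signals_ > 0) {
        --signals_;
        --spinners_;
        return;
      }
      lock.unlock();
      std::this_thread::yield();  // stand-in for the real spin delay
      lock.lock();
    }
    --spinners_;
    ++waiters_;
    Slot self;
    while (signals_ == 0) {
      stack_.push_back(&self);  // newest waiter on top: released first (LIFO)
      self.cv.wait(lock, [&self] { return self.signaled; });
      self.signaled = false;
      --signaled_to_wake_;  // this waiter was signaled and has now woken
    }
    --signals_;
    --waiters_;
  }

  void Release(int count) {
    assert(count > 0);
    std::lock_guard<std::mutex> lock(m_);
    signals_ += count;
    // Do not wake a waiter for signals that spinners can take, or that
    // already-signaled (not yet woken) waiters will consume.
    int needed = std::min(signals_, waiters_ + spinners_) - spinners_ -
                 signaled_to_wake_;
    int to_wake = std::min({needed, count, static_cast<int>(stack_.size())});
    for (int i = 0; i < to_wake; ++i) {
      Slot* w = stack_.back();  // most recently blocked waiter
      stack_.pop_back();
      w->signaled = true;
      ++signaled_to_wake_;
      w->cv.notify_one();
    }
  }

  // Number of blocked, not-yet-signaled threads (useful for testing).
  std::size_t BlockedWaiters() {
    std::lock_guard<std::mutex> lock(m_);
    return stack_.size();
  }

 private:
  struct Slot {
    std::condition_variable cv;
    bool signaled = false;
  };
  std::mutex m_;
  int signals_ = 0;
  int spinners_ = 0;
  int waiters_ = 0;           // blocked threads, including signaled ones
  int signaled_to_wake_ = 0;  // signaled but not yet woken
  std::vector<Slot*> stack_;  // unsignaled blocked waiters, newest at back
};
```

If a spinner takes a signal before a woken waiter runs, the waiter finds no signal and pushes itself back onto the stack, so no signal is lost. The single mutex trades scalability for readability; it only models the bookkeeping, not the performance characteristics.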
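The spin-loop changes (a delay between attempts so that yielding does not thrash, and a max spin count that is not scaled by processor count) can be sketched in isolation. `SpinTryAcquire` and `PauseSketch` are hypothetical names: `PauseSketch` is only a busy-wait stand-in for YieldProcessorNormalized(), `std::this_thread::yield()` stands in for Sleep(0), and the parameter values used with it are arbitrary, not the retuned defaults from this change.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for YieldProcessorNormalized(): a short busy delay.
// A real implementation would issue CPU pause instructions calibrated so a
// call takes roughly the same time on different processors.
inline void PauseSketch(int iterations) {
  volatile int sink = 0;
  for (int k = 0; k < iterations; ++k) sink = sink + 1;
}

// Try to take one unit from `count`, delaying between attempts and yielding
// (the Sleep(0) analogue) only every `yield_every` iterations.
// max_spin_count is a fixed value, not multiplied by the processor count.
bool SpinTryAcquire(std::atomic<int>& count, int max_spin_count,
                    int yield_every, int pause_iterations) {
  assert(yield_every > 0);
  for (int i = 0; i < max_spin_count; ++i) {
    int c = count.load(std::memory_order_relaxed);
    while (c > 0) {
      // On failure, compare_exchange_weak reloads c; retry while units remain.
      if (count.compare_exchange_weak(c, c - 1, std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
        return true;
      }
    }
    if ((i + 1) % yield_every == 0) {
      std::this_thread::yield();  // let other ready threads run
    } else {
      PauseSketch(pause_iterations);  // delay without giving up the CPU
    }
  }
  return false;  // caller would now block as a waiter
}
```

Interleaving short pauses with occasional yields keeps a spinner from repeatedly yielding when nothing else is runnable, while still giving other threads a chance to run.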
12 files changed: