[threading] Switch to a native implementation of LowLevelLifoSemaphore (#2098)
This change is motivated on two main fronts. The first is maintainability - the current managed implementation is difficult to understand and I worry diagnosing any potential issues would be a massive time sink. The second is performance - the current implementation appears to suffer from significant lock contention when running the TechEmpower plaintext benchmark. My hope is that the simpler, cleaner native implementation here will avoid both problems, but I don't want to merge it until we have benchmarking data. However, even if the numbers are similar, I still think it's worth merging just from a maintainability perspective.
The native LifoSemaphore implementation has only ever been tested on posix-like platforms, so Windows behavior is unknown. Currently the Windows implementation of LowLevelLifoSemaphore is very different, so unless we have need for the LifoSemaphore elsewhere in the runtime this isn't a concern.
Many thanks to @filipnavara for the initial implementation.