Improve performance of cgroup access (#21229)
Currently, we create a CGroup instance on each request for getting
used or total physical memory. This has an extra cost of finding
filesystem paths of the current cgroup files. I have found that
if we do the initialization just once, the performance of getting
the used or total memory in a tight loop improves 22 times.
Accidentally, I was also looking into a perf regression in the
ByteMark.BenchBitOps test that was observed in the past, seemingly
related to the recent change to the way we get the used memory.
And I've found that the benchmark results improve two fold with
the change in this commit.
This change was made both in PAL and in the standalone GC.