From: Ludovic Henry <luhenry@microsoft.com>
Date: Fri, 5 Apr 2019 16:42:47 +0000 (-0700)
Subject: Partially improve support for `--cpus` from Docker CLI (#23747)
X-Git-Tag: accepted/tizen/unified/20190813.215958~53^2
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=aea3b1a80d6c114e3e67bc9521bf39a8a17371d1;p=platform%2Fupstream%2Fcoreclr.git

Partially improve support for `--cpus` from Docker CLI (#23747)

* Round up the value of the CPU limit

In the case where `--cpus` is set to a fractional value (e.g.
1.499999999), it would previously be rounded down to the smaller
integer. This would mean that the runtime would only try to take
advantage of 1 CPU in this example, leading to underutilization.

By rounding it up, we increase the pressure on the OS thread scheduler,
but even in the worst-case scenario (`--cpus=1.000000001`, previously
rounded to 1, now rounded to 2), we do not observe any overutilization
of the CPU leading to performance degradation.
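
As a rough illustration of the rounding described above, here is a
minimal standalone sketch. `CpuCountFromQuota` is a hypothetical helper
name, not a function in the runtime, and the `UINT32_MAX` fallback for
very large ratios is an assumption about the untouched `else` branch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): turn a CFS quota/period
// pair into a whole number of CPUs, rounding fractional values up.
uint32_t CpuCountFromQuota(long long quota, long long period)
{
    // The real code returns early when either value is non-positive.
    assert(quota > 0 && period > 0);

    // Floating-point division keeps the fractional part that the old
    // integer division discarded.
    double cpu_count = (double) quota / period;

    if (cpu_count < UINT32_MAX - 1)
    {
        // Round up: 1.5 becomes 2, while an exact 1.0 stays 1.
        return (uint32_t)(cpu_count + 0.999999999);
    }

    // Assumed fallback for ratios too large to fit in 32 bits.
    return UINT32_MAX;
}
```

With a period of 100000, a quota of 150000 (`--cpus=1.5`) yields 2,
where the previous integer division yielded 1.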

* Teach the ThreadPool about CPU limits

By making sure we take the CPU limits into account when computing the
CPU busy time, we ensure that the various heuristics of the threadpool
do not compete with each other: one trying to allocate more threads to
increase the CPU busy time, and the other trying to allocate fewer
threads because adding more doesn't improve the throughput.

Let's take the example of a system with 20 cores, and a docker container
with `--cpus=2`. It would mean the total CPU capacity of the machine is
2000%, while the CPU limit is 200%. Because the OS scheduler would never
allocate more than 200% of its total CPU budget to the docker container,
the CPU busy time would never get over 200%, i.e. 10% of the machine.
From `PAL_GetCPUBusyTime`, this would indicate that the threadpool
threads are mostly doing non-CPU-bound work, meaning we could launch
more threads.
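
To make the arithmetic of this example concrete, here is a simplified
sketch of the normalization. `BusyPercent` and its parameters are
hypothetical and only approximate what `PAL_GetCPUBusyTime` computes;
a `cpuLimit` of 0 is used here to mean "no limit":

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: percentage of the available CPU budget that the
// process consumed over a sampling interval.
int BusyPercent(double cpuSeconds, double wallSeconds,
                uint32_t processors, uint32_t cpuLimit)
{
    assert(wallSeconds > 0 && processors > 0);

    // Mirror the patch: only shrink the processor count when a limit
    // exists and is lower than the number of processors.
    uint32_t effective =
        (cpuLimit > 0 && cpuLimit < processors) ? cpuLimit : processors;

    double pct = 100.0 * cpuSeconds / (wallSeconds * effective);
    return pct > 100.0 ? 100 : (int) pct;
}
```

For a container saturating its 2 CPUs on the 20-core machine (2 CPU
seconds per wall-clock second), `BusyPercent(2.0, 1.0, 20, 0)` gives 10,
while `BusyPercent(2.0, 1.0, 20, 2)` gives 100.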
---

diff --git a/src/gc/unix/cgroup.cpp b/src/gc/unix/cgroup.cpp
index 88eb415..10b9b86 100644
--- a/src/gc/unix/cgroup.cpp
+++ b/src/gc/unix/cgroup.cpp
@@ -102,7 +102,7 @@ public:
     {
         long long quota;
         long long period;
-        long long cpu_count;
+        double cpu_count;
 
         quota = ReadCpuCGroupValue(CFS_QUOTA_FILENAME);
         if (quota <= 0)
@@ -119,10 +119,11 @@ public:
             return true;
         }
         
-        cpu_count = quota / period;
-        if (cpu_count < UINT32_MAX)
+        cpu_count = (double) quota / period;
+        if (cpu_count < UINT32_MAX - 1)
         {
-            *val = cpu_count;
+            // round up
+            *val = (uint32_t)(cpu_count + 0.999999999);
         }
         else
         {
diff --git a/src/pal/src/misc/cgroup.cpp b/src/pal/src/misc/cgroup.cpp
index 144ac66..97b2cb2 100644
--- a/src/pal/src/misc/cgroup.cpp
+++ b/src/pal/src/misc/cgroup.cpp
@@ -90,7 +90,7 @@ public:
     {
         long long quota;
         long long period;
-        long long cpu_count;
+        double cpu_count;
 
         quota = ReadCpuCGroupValue(CFS_QUOTA_FILENAME);
         if (quota <= 0)
@@ -106,11 +106,12 @@ public:
             *val = 1;
             return true;
         }
-        
-        cpu_count = quota / period;
-        if (cpu_count < UINT_MAX)
+
+        cpu_count = (double) quota / period;
+        if (cpu_count < UINT_MAX - 1)
         {
-            *val = cpu_count;
+            // round up
+            *val = (UINT)(cpu_count + 0.999999999);
         }
         else
         {
diff --git a/src/pal/src/thread/process.cpp b/src/pal/src/thread/process.cpp
index 35ac2bf..74ec02f 100644
--- a/src/pal/src/thread/process.cpp
+++ b/src/pal/src/thread/process.cpp
@@ -2524,6 +2524,12 @@ PAL_GetCPUBusyTime(
         {
             return 0;
         }
+
+        UINT cpuLimit;
+        if (PAL_GetCpuLimit(&cpuLimit) && cpuLimit < dwNumberOfProcessors)
+        {
+            dwNumberOfProcessors = cpuLimit;
+        }
     }
 
     if (getrusage(RUSAGE_SELF, &resUsage) == -1)