Documentation/botr/garbage-collection.md

   1 Garbage Collection Design
   2 =========================
   3 Author: Maoni Stephens ([@maoni0](https://github.com/maoni0)) - 2015
   4
   5 Note: See _The Garbage Collection Handbook_ (referenced in the resources section at the end of this document) to learn more about garbage collection topics.
   6
   7 Component Architecture
   8 ======================
   9
  10 The 2 components that belong to GC are the allocator and the
  11 collector. The allocator is responsible for getting more memory and triggering the collector when appropriate. The collector reclaims garbage, or the memory of objects that are no longer in use by the program.
  12
  13 There are other ways that the collector can get called, such as manually calling GC.Collect or the finalizer thread receiving an asynchronous notification of the low memory (which triggers the collector).
  14
  15 Design of Allocator
  16 ===================
  17
  18 The allocator gets called by the allocation helpers in the Execution Engine (EE), with the following information:
  19
  20 - Size requested
  21 - Thread allocation context
  22 - Flags that indicate things like whether this is a finalizable object or not
  23
  24 The GC does not have special treatment for different kinds of object types. It consults the EE to get the size of an object.
  25
  26 Based on the size, the GC divides objects into 2 categories: small
  27 objects (< 85,000 bytes) and large objects (>= 85,000 bytes). In
  28 principle, small and large objects can be treated the same way but
  29 since compacting large objects is more expensive GC makes this
  30 distinction.
  31
  32 When the GC gives out memory to the allocator, it does so in terms of allocation contexts. The size of an allocation context is defined by the allocation quantum.
  33
  34 - **Allocation contexts** are smaller regions of a given heap segment that are each dedicated for use by a given thread. On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.
  35 - The **Allocation quantum** is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The allocation is typically 8k and the average size of managed objects are around 35 bytes, enabling a single allocation quantum to be used for many object allocations.
  36
  37 Large objects do not use allocation contexts and quantums. A single large object can itself be larger than these smaller regions of memory. Also, the benefits (discussed below) of these regions are specific to smaller objects. Large objects are allocated directly to a heap segment.
  38
  39 The allocator is designed to achieve the following:
  40
  41 - **Triggering a GC when appropriate:** The allocator triggers a GC when the allocation budget (a threshold set by the collector) is exceeded or when the allocator can no longer allocate on a given segment. The allocation budget and managed segments are discussed in more detail later.
  42 - **Preserving object locality:** Objects allocated together on the same heap segment will be stored at virtual addresses close to each other.
  43 - **Efficient cache usage:** The allocator allocates memory in _allocation quantum_ units, not on an object-by-object basis. It zeroes out that much memory to warm up the CPU cache because there will be objects immediately allocated in that memory. The allocation quantum is usually 8k.
  44 - **Efficient locking:** The thread affinity of allocation contexts and quantums guarantee that there is only ever a single thread writing to a given allocation quantum. As a result, there is no need to lock for object allocations, as long as the current allocation context is not exhausted.
  45 - **Memory integrity:** The GC always zeroes out the memory for newly allocated objects to prevent object references pointing at random memory.
  46 - **Keeping the heap crawlable:** The allocator makes sure to make a free object out of left over memory in each allocation quantum. For example, if there is 30 bytes left in an allocation quantum and the next object is 40 bytes, the allocator will make the 30 bytes a free object and get a new allocation quantum.
  47
  48 Allocation APIs
  49 ---------------
  50
  51      Object* GCHeap::Alloc(size_t size, DWORD flags);
  52      Object* GCHeap::Alloc(alloc_context* acontext, size_t size, DWORD flags);
  53
  54 The above functions can be used to allocate both small objects and
  55 large objects. There is also a function to allocate directly on LOH:
  56
  57      Object* GCHeap::AllocLHeap(size_t size, DWORD flags);
  58
  59 Design of the Collector
  60 =======================
  61
  62 Goals of the GC
  63 ---------------
  64
  65 The GC strives to manage memory extremely efficiently and
  66 require very little effort from people who write "managed code". Efficient means:
  67
  68 - GCs should occur often enough to avoid the managed heap containing a significant amount (by ratio or absolute count) of unused but allocated objects (garbage), and therefore use memory unnecessarily.
  69 - GCs should happen as infrequently as possible to avoid using otherwise useful CPU time, even though frequent GCs would result in lower memory usage.
  70 - A GC should be productive. If GC reclaims a small amount of memory, then the GC (including the associated CPU cycles) was wasted.
  71 - Each GC should be fast. Many workloads have low latency requirements.
  72 - Managed code developers shouldn’t need to know much about the GC to achieve good memory utilization (relative to their workload).
  73 – The GC should tune itself to satisfy different memory usage patterns.
  74
  75 Logical representation of the managed heap
  76 ------------------------------------------
  77
  78 The CLR GC is a generational collector which means objects are
  79 logically divided into generations. When a generation _N_ is collected,
  80 the survived objects are now marked as belonging to generation _N+1_. This
  81 process is called promotion. There are exceptions to this when we
  82 decide to demote or not promote.
  83
  84 For small objects the heap is divided into 3 generations: gen0, gen1
  85 and gen2. For large objects there’s one generation – gen3. Gen0 and gen1 are referred to as ephemeral (objects lasting for a short time) generations.
  86
  87 For the small object heap, the generation number represents the age – gen0
  88 being the youngest generation. This doesn’t mean all objects in gen0
  89 are younger than any objects in gen1 or gen2. There are exceptions
  90 which will be explained below. Collecting a generation means collecting
  91 objects in that generation and all its younger generations.
  92
  93 In principle large objects can be handled the same way as small
  94 objects but since compacting large objects is very expensive, they are treated differently. There is only one generation for large objects and
  95 they are always collected with gen2 collections due to performance
  96 reasons. Both gen2 and gen3 can be big, and collecting ephemeral generations (gen0 and gen1) needs to have a bounded cost.
  97
  98 Allocations are made in the youngest generation – for small objects this means always gen0 and for large objects this means gen3 since there’s only one generation.
  99
 100 Physical representation of the managed heap
 101 -------------------------------------------
 102
 103 The managed heap is a set of managed heap segments. A heap segment is a contiguous block of memory that is acquired by the GC from the OS. The heap segments are
 104 partitioned into small and large object segments, given the distinction of small and large objects. On each heap the heap segments are chained together. There is at least one small object segment and one large segment - they are reserved when CLR is loaded.
 105
 106 There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live. This segment may or may not include gen2
 107 objects. In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects.
 108
 109 There are 1 or more segments on the large object heap.
 110
 111 A heap segment is consumed from the lower address to the higher
 112 address, which means objects of lower addresses on the segment are
 113 older than those of higher addresses. Again there are exceptions that
 114 will be described below.
 115
 116 Heap segments can be acquired as needed. They are  deleted when they
 117 don’t contain any live objects, however the initial segment on the heap
 118 will always exist. For each heap, one segment at a time is acquired,
 119 which is done during a GC for small objects and during allocation time
 120 for large objects. This design provides better performance because large objects are only collected with gen2 collections (which are relatively expensive).
 121
 122 Heap segments are chained together in order of when they were acquired. The last segment in the chain is always the ephemeral segment. Collected segments (no live objects) can be reused instead of deleted and instead become the new ephemeral segment. Segment reuse is only implemented for small object heap. Each time a large object is allocated, the whole large object heap is considered. Small object allocations only consider the ephemeral segment.
 123
 124 The allocation budget
 125 ---------------------
 126
 127 The allocation budget is a logical concept associated with each
 128 generation. It is a size limit that triggers a GC for that
 129 generation when exceeded.
 130
 131 The budget is a property set on the generation mostly based on the
 132 survival rate of that generation. If the survival rate is high, the budget is made larger with the expectation that there will be a better ratio of dead to live objects next time there is a GC for that generation.
 133
 134 Determining which generation to collect
 135 ---------------------------------------
 136
 137 When a GC is triggered, the GC must first determine which generation to collect. Besides the allocation budget there are other factors that must be considered:
 138
 139 - Fragmentation of a generation – if a generation has high fragmentation, collecting that generation is likely to be productive.
 140 - If the memory load on the machine is too high, the GC may collect
 141   more aggressively if that’s likely to yield free space. This is important to
 142   prevent unnecessary paging (across the machine).
 143 - If the ephemeral segment is running out of space, the GC may do more aggressive ephemeral collections (meaning doing more gen1’s) to avoid acquiring a new heap segment.
 144
 145 The flow of a GC
 146 ----------------
 147
 148 Mark phase
 149 ----------
 150
 151 The goal of the mark phase is to find all live objects.
 152
 153 The benefit of a generational collector is the ability to collect just part of
 154 the heap instead of having to look at all of the objects all the
 155 time. When  collecting the ephemeral generations, the GC needs to find out which objects are live in these generations, which is information reported by the EE. Besides the objects kept live by the EE, objects in older generations
 156 can also keep objects in younger generations live by making references
 157 to them.
 158
 159 The GC uses cards for the older generation marking. Cards are set by JIT
 160 helpers during assignment operations. If the JIT helper sees an
 161 object in the ephemeral range it will set the byte that contains the
 162 card representing the source location. During ephemeral collections, the GC can look at the set cards for the rest of the heap and only look at the objects that these cards correspond to.
 163
 164 Plan phase
 165 ---------
 166
 167 The plan phase simulates a compaction to determine the effective result. If compaction is productive the GC starts an actual compaction; otherwise it sweeps.
 168
 169 Relocate phase
 170 --------------
 171
 172 If the GC decides to compact, which will result in moving objects, then  references to these objects must be updated. The relocate phase needs to find all references that point to objects that are in the
 173 generations being collected. In contrast, the mark phase only consults live objects so it doesn’t need to consider weak references.
 174
 175 Compact phase
 176 -------------
 177
 178 This phase is very straight forward since the plan phase already
 179 calculated the new addresses the objects should move to. The compact
 180 phase will copy the objects there.
 181
 182 Sweep phase
 183 -----------
 184
 185 The sweep phase looks for the dead space in between live objects. It creates free objects in place of these dead spaces. Adjacent dead objects are made into one free object. It places all of these free objects onto the _freelist_.
 186
 187 Code Flow
 188 =========
 189
 190 Terms:
 191
 192 - **WKS GC:** Workstation GC
 193 - **SVR GC:** Server GC
 194
 195 Functional Behavior
 196 -------------------
 197
 198 ### WKS GC with concurrent GC off
 199
 200 1. User thread runs out of allocation budget and triggers a GC.
 201 2. GC calls SuspendEE to suspend managed threads.
 202 3. GC decides which generation to condemn.
 203 4. Mark phase runs.
 204 5. Plan phase runs and decides if a compacting GC should be done.
 205 6. If so relocate and compact phase runs. Otherwise, sweep phase runs.
 206 7. GC calls RestartEE to resume managed threads.
 207 8. User thread resumes running.
 208
 209 ### WKS GC with concurrent GC on
 210
 211 This illustrates how a background GC is done.
 212
 213 1. User thread runs out of allocation budget and triggers a GC.
 214 2. GC calls SuspendEE to suspend managed threads.
 215 3. GC decides if background GC should be run.
 216 4. If so background GC thread is woken up to do a background
 217    GC. Background GC thread calls RestartEE to resume managed threads.
 218 5. Managed threads continue allocating while the background GC does its work.
 219 6. User thread may run out of allocation budget and trigger an
 220    ephemeral GC (what we call a foreground GC). This is done in the same
 221    fashion as the "WKS GC with concurrent GC off" flavor.
 222 7. Background GC calls SuspendEE again to finish with marking and then
 223    calls RestartEE to start the concurrent sweep phase while user threads
 224    are running.
 225 8. Background GC is finished.
 226
 227 ### SVR GC with concurrent GC off
 228
 229 1. User thread runs out of allocation budget and triggers a GC.
 230 2. Server GC threads are woken up and call SuspendEE to suspend
 231    managed threads.
 232 3. Server GC threads do the GC work (same phases as in workstation GC
 233    without concurrent GC).
 234 4. Server GC threads call RestartEE to resume managed threads.
 235 5. User thread resumes running.
 236
 237 ### SVR GC with concurrent GC on
 238
 239 This scenario is the same as WKS GC with concurrent GC on, except the non background GCs are done on SVR GC threads.
 240
 241 Physical Architecture
 242 =====================
 243
 244 This section is meant to help you follow the code flow.
 245
 246 User thread runs out of quantum and gets a new quantum via try_allocate_more_space.
 247
 248 try_allocate_more_space calls GarbageCollectGeneration when it needs to trigger a GC.
 249
 250 Given WKS GC with concurrent GC off, GarbageCollectGeneration is done all
 251 on the user thread that triggerred the GC. The code flow is:
 252
 253      GarbageCollectGeneration()
 254      {
 255          SuspendEE();
 256          garbage_collect();
 257          RestartEE();
 258      }
 259
 260      garbage_collect()
 261      {
 262          generation_to_condemn();
 263          gc1();
 264      }
 265
 266      gc1()
 267      {
 268          mark_phase();
 269          plan_phase();
 270      }
 271
 272      plan_phase()
 273      {
 274          // actual plan phase work to decide to
 275          // compact or not
 276          if (compact)
 277          {
 278              relocate_phase();
 279              compact_phase();
 280          }
 281          else
 282              make_free_lists();
 283      }
 284
 285 Given WKS GC with concurrent GC on (default case), the code flow for a background GC is
 286
 287      GarbageCollectGeneration()
 288      {
 289          SuspendEE();
 290          garbage_collect();
 291          RestartEE();
 292      }
 293
 294      garbage_collect()
 295      {
 296          generation_to_condemn();
 297          // decide to do a background GC
 298          // wake up the background GC thread to do the work
 299          do_background_gc();
 300      }
 301
 302      do_background_gc()
 303      {
 304          init_background_gc();
 305          start_c_gc ();
 306
 307          //wait until restarted by the BGC.
 308          wait_to_proceed();
 309      }
 310
 311      bgc_thread_function()
 312      {
 313          while (1)
 314          {
 315              // wait on an event
 316              // wake up
 317              gc1();
 318          }
 319      }
 320
 321      gc1()
 322      {
 323          background_mark_phase();
 324          background_sweep();
 325      }
 326
 327 Resources
 328 =========
 329
 330 - [.NET CLR GC Implementation](https://raw.githubusercontent.com/dotnet/coreclr/master/src/gc/gc.cpp)
 331 - [The Garbage Collection Handbook: The Art of Automatic Memory Management](http://www.amazon.com/Garbage-Collection-Handbook-Management-Algorithms/dp/1420082795)
 332 - [Garbage collection (Wikipedia)](http://en.wikipedia.org/wiki/Garbage_collection_(computer_science))