# JIT Testing

We would like to ensure that the CoreCLR contains sufficient test collateral
and tooling to enable high-quality contributions to RyuJIT or LLILC's JIT.

JIT testing is somewhat specialized and can't rely solely on the general
framework tests or end-to-end application tests.

This document describes some of the work needed to bring existing JIT tests and
technology into the CoreCLR, and touches on some areas that are open for
innovation.

We expect to evolve this document into a road map for the overall JIT testing
effort, and to spawn a set of issues in the CoreCLR and LLILC repos for
implementing the needed capabilities.

## Requirements and Assumptions

1. It must be easy to add new tests.
2. Tests must execute with high throughput. We anticipate needing to run
   thousands of tests to provide baseline-level testing for JIT changes.
3. Tests should generally run on all supported/future chip architectures and
   all OS platforms.
4. Tests must be partitionable so CI latency is tolerable (test latency goal
   TBD).
5. Tests in CI can be run on private changes (currently tied to PRs; this may
   be sufficient).
6. The test strategy must be harmonious with other .NET repo test strategies.
7. The test harness must behave reasonably on test failure. It must be easy to
   get at repro steps for subsequent debugging.
8. Tests must allow fine-grained inspection of JIT outputs, for instance
   comparing the generated code versus a baseline JIT.
9. Tests must support collection of various quantitative measurements, e.g.
   time spent in the JIT, memory used by the JIT, etc.
10. For now, JIT test assets belong in the CoreCLR repo.
11. JIT tests use the same basic xunit test harness as existing CoreCLR tests.
12. JIT special-needs testing will rely on extensions/hooks. Examples below.

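To illustrate requirement 4, here is a minimal sketch of one way tests could be partitioned across CI machines. It is written in Python purely for illustration and is not part of any existing CoreCLR script; the function name `partition_tests` and the test names are hypothetical. It assumes each test has a unique, stable name and assigns it to a shard by hashing that name, so the assignment does not change between runs or machines.

```python
import hashlib


def partition_tests(test_names, shard_count):
    """Assign each test to one of shard_count shards by hashing its name."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for name in sorted(test_names):
        # Use a stable hash rather than Python's built-in hash(), which is
        # randomized per process and would reshuffle tests on every run.
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % shard_count
        shards[index].append(name)
    return shards


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    tests = [f"JIT/Sample/test{i:04d}" for i in range(1000)]
    for i, shard in enumerate(partition_tests(tests, 4)):
        print(f"shard {i}: {len(shard)} tests")
```

Hashing gives roughly even shard sizes without any shared state, but it ignores test duration. A scheme that balances by measured run time would likely give better CI latency once timing data is available.
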
## Tasks

Below are some broad task areas that we should consider as part of this plan.
It seems sensible for Microsoft to focus on opening up the JIT self-host
(aka JITSH) tests first. A few other tasks are also Microsoft-specific and are
marked with (MS) below.

Other than that, the priority, task list, and possibly assignments are open to
discussion.

### (MS) Bring up equivalent of the JITSH tests

JITSH is a set of roughly 8000 tests that have been traditionally used by
Microsoft JIT developers as the frontline JIT test suite.

We'll need to subset these tests for various reasons:

1. Some have shallow desktop CLR dependence (e.g. missing cases in string
   formatting).
2. Some have deep desktop CLR dependence (testing a desktop CLR feature that
   is not present in CoreCLR).
3. Some require tools not yet available in CoreCLR (ilasm in particular).
4. Some test Windows features and won't be relevant to other OS platforms.
5. Some tests may not be able to be freely redistributed.

We have done an internal inventory and identified roughly 1000 tests that
should be straightforward to port into CoreCLR, and have already started
moving these.

### Test script capabilities

We need to ensure that the CoreCLR repo contains a suitably
hookable test script. Core testing is driven by xunit, but there's typically a
wrapper around this (runtest.cmd today) to facilitate test execution.

The proposal is to implement a platform-neutral variant of runtest.cmd that
contains all the existing functionality plus some additional capabilities for
JIT testing. Initially this will mean:

1. Ability to execute tests with a JIT specified by the user (either as alt
   JIT or as the only JIT).
2. Ability to pass options through to the JIT (e.g. for dumping assembly or IR)
   or to the CoreCLR (e.g. to disable use of ngen images).

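As a rough sketch of the two capabilities above, a platform-neutral wrapper could translate its command-line choices into environment variables for the test process. The Python below is illustrative only and does not describe runtest.cmd. The function `build_test_env` is hypothetical, and the knob names (`AltJit`, `AltJitName`, `JitName`, `JitDump`, `ZapDisable`) and the `COMPlus_` prefix are assumptions that should be checked against the runtime under test.

```python
import os


def build_test_env(base_env, jit_path=None, as_alt_jit=True,
                   jit_options=None, runtime_options=None, prefix="COMPlus_"):
    """Return a copy of base_env with JIT selection and pass-through options."""
    env = dict(base_env)
    if jit_path is not None:
        if as_alt_jit:
            # Assumed knobs: route every method ("*") to the alternate JIT.
            env[prefix + "AltJit"] = "*"
            env[prefix + "AltJitName"] = jit_path
        else:
            # Assumed knob: use the given JIT as the only JIT.
            env[prefix + "JitName"] = jit_path
    # Options are passed through verbatim; the wrapper does not interpret them.
    for options in (jit_options, runtime_options):
        for name, value in (options or {}).items():
            env[prefix + name] = str(value)
    return env


if __name__ == "__main__":
    env = build_test_env(
        os.environ,
        jit_path="LLILCJit.dll",            # hypothetical JIT binary name
        jit_options={"JitDump": "Main"},     # assumed knob: dump IR for Main
        runtime_options={"ZapDisable": 1},   # assumed knob: skip ngen images
    )
    for key in sorted(k for k in env if k.startswith("COMPlus_")):
        print(f"{key}={env[key]}")
```

Keeping the wrapper ignorant of what each option means lets new JIT or runtime switches be used without changing the script.
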
### Cache prebuilt test assets

In general we want JIT tests to be built from sources. But given the volume
of tests it can take a significant amount of time to compile those sources into
assemblies. This in turn slows down the ability to test the JIT.

Given the volume of tests, we might reach a point where the default CoreCLR
build does not build all the tests.

It would therefore be good to have a regularly scheduled build of CoreCLR that
prebuilds a matching set of tests and makes them available.

### Round out JITSH suite, filling in missing pieces

We need some way to run ILASM. Some suggestions here are to port the existing
ILASM or find some equivalent we could run instead. We could also prebuild
IL-based tests and deploy them as a package. Around 2400 JITSH tests are
blocked by this.

There are also some VB tests which presumably can be brought over now that VB
projects can build.

Native/interop tests may or may not require platform-specific adaptation.

### (MS) Port the devBVT tests

devBVT is a broader part of CLR SelfHost that is useful for second-tier testing.
It is not yet clear what porting this entails.

### Leverage peer repo test suites

We should be able to directly leverage tests provided in peer repo suites, once
they can run on top of CoreCLR. In particular CoreFx and Roslyn test cases
could be good initial targets.

Note that LLILC is currently working through the remaining issues that prevent
it from being able to compile all of Roslyn. See the "needed for Roslyn" tags
on the open LLILC issues.

### Look for other CoreCLR hosted projects

Similar to the above, as other projects are able to host on CoreCLR we can
potentially use their tests for JIT testing.

### Porting of existing suites/tools over to our repos

Tools developed to test JVM JITs might be interesting to port over to .NET.
Suggestions for best practices or effective techniques are welcome.

### Bring up quantitative measurements

For JIT testing we'll need various quantitative assessments of JIT behavior:

1. Time spent jitting
2. Speed of jitted code
3. Size of jitted code
4. Memory utilization by the JIT (+ leak detection)
5. Debug info fidelity
6. Coverage?

There will likely be work going on elsewhere to address some of these same
measurement capabilities, so we should make sure to keep it all in sync.

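Once such measurements are collected, one simple use is to compare them against a baseline JIT and flag regressions. The Python sketch below shows the idea under stated assumptions: every metric is a number where lower is better, the metric names and values are made up for illustration, and the 5% default tolerance is arbitrary. The function `find_regressions` is hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.05):
    """Return metrics whose value grew by more than tolerance versus baseline.

    Both arguments map a metric name to a number where lower is better.
    The result maps each regressed metric to its relative increase.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in current:
            continue  # metric not collected in this run; nothing to compare
        new_value = current[name]
        if base_value == 0:
            # Avoid dividing by zero: any growth from zero is a regression.
            ratio = float("inf") if new_value > 0 else 0.0
        else:
            ratio = (new_value - base_value) / base_value
        if ratio > tolerance:
            regressions[name] = ratio
    return regressions


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    baseline = {"jit_time_ms": 120.0, "code_size_bytes": 4096,
                "jit_peak_memory_kb": 2048}
    current = {"jit_time_ms": 150.0, "code_size_bytes": 4100,
               "jit_peak_memory_kb": 2000}
    print(find_regressions(baseline, current))  # {'jit_time_ms': 0.25}
```

Timing metrics are noisy, so a real harness would probably need repeated runs and a statistical test rather than a single fixed threshold.
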
### Bring up alternate codegen capabilities

For LLILC, implementing support for crossgen would provide the ability to drive
lots of IL through the JIT. There is enough similarity between the JIT and
crossgen paths that this would likely surface issues in both.

Alternatively one can imagine simple test drivers that load up assemblies,
use reflection to enumerate methods, and ask for method bodies to force the JIT
to generate code for all the methods.

### Bring up stress testing

The value of existing test assets can be leveraged through various stress
testing modes. These modes use non-standard code generation or runtime
mechanisms to try to flush out bugs.

1. GC stress. Here the runtime will GC with much higher frequency in an attempt
   to maximize the dependence on the GC info reported by the JIT.
2. Internal modes in the JIT to try to flush out bugs, e.g. randomized
   inlining, register allocation stress, volatile stress, randomized block
   layout, etc.

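One way to reuse existing tests under these modes is to rerun each test once per combination of stress settings. The Python sketch below enumerates such combinations as environment-variable overlays. The helper `stress_configurations` is hypothetical, and the knob names and values in the example (`COMPlus_GCStress`, `COMPlus_JitStress`) are assumptions to be checked against the runtime under test.

```python
import itertools


def stress_configurations(modes):
    """Yield one environment overlay per combination of stress settings.

    modes maps a knob name to the list of values to try. A value of None
    means "leave the knob unset", so the all-None combination is the
    ordinary, unstressed run.
    """
    names = sorted(modes)
    for values in itertools.product(*(modes[n] for n in names)):
        yield {n: str(v) for n, v in zip(names, values) if v is not None}


if __name__ == "__main__":
    # Assumed knob names and values, for illustration only.
    modes = {
        "COMPlus_GCStress": [None, "0xC"],
        "COMPlus_JitStress": [None, 1, 2],
    }
    for overlay in stress_configurations(modes):
        print(overlay or "(baseline)")
```

The full cross product grows quickly, so in practice the expensive modes (GC stress in particular) would probably run on a subset of tests or on a less frequent schedule.
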
### Bring up custom testing frameworks and tools

We should invest in things like random program or IL generation tools.
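
As a toy illustration of random program generation, the Python sketch below builds random integer expressions and computes the value each should produce under wrapping 32-bit arithmetic. It is an assumption-laden sketch, not an existing tool: all names are hypothetical, only `+`, `-` and `*` are generated, and turning the rendered text into a compilable test for the JIT is left out. The intent is that the jitted result would be compared against the reference value.

```python
import random

OPS = ("+", "-", "*")


def random_expr(rng, depth):
    """Build a random expression tree of int literals combined with OPS."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-100, 100)
    return (rng.choice(OPS),
            random_expr(rng, depth - 1),
            random_expr(rng, depth - 1))


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit two's-complement range."""
    return (value + 2**31) % 2**32 - 2**31


def evaluate(expr):
    """Reference evaluator using wrapping 32-bit arithmetic."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    result = a + b if op == "+" else a - b if op == "-" else a * b
    return wrap_int32(result)


def render(expr):
    """Render the tree as fully parenthesized source text."""
    if isinstance(expr, int):
        return f"({expr})"
    op, left, right = expr
    return f"({render(left)} {op} {render(right)})"


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a failing case can be replayed
    for _ in range(3):
        expr = random_expr(rng, depth=4)
        print(render(expr), "=>", evaluate(expr))
```

Seeding the generator makes every generated program reproducible from a single number, which keeps repro steps short when a generated test exposes a JIT bug.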