kernel version is equal or newer than 3.11. Currently Only IVB is supported right now.
Actually, the code was run on IVB GT2/GT1, and both system are well supported now.
+Math Function precision
+-----------------------
+
+Gen does not currently provide native support for the high precision math
+functions required by OpenCL. We provide a software implementation that meets
+the precision requirement, which you can turn on with
+`export OCL_STRICT_CONFORMANCE=1`. Be aware that this makes your CL kernels
+take a little longer to run.
+
TODO
----
Environment variables are used all over the code. Most important ones are:
-- `OCL_SIMD_WIDTH` `(8 or 16)`. Change the number of lanes per hardware thread
+- `OCL_STRICT_CONFORMANCE` `(0 or 1)`. Gen does not provide native high
+ precision math instructions that comply with the OpenCL spec, so we provide
+ a software version to meet the high precision requirement. The software
+ version is slower than the native version supported by Gen hardware, and
+ most graphics applications don't need this high precision, so the default
+ value is 0. This way OpenCL apps do not pay the performance penalty of the
+ high precision math functions unless they ask for it.
+
+- `OCL_SIMD_WIDTH` `(8 or 16)`. Select the number of lanes per hardware thread.
+ Normally you don't need to set it; we select a suitable SIMD width for
+ a given kernel. Default value is 16.
- `OCL_OUTPUT_GEN_IR` `(0 or 1)`. Output Gen IR (scalar intermediate
representation) code
- `OCL_OUTPUT_ASM` `(0 or 1)`. Output Gen ISA
-- `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations
+- `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations, including
+ the virtual register to physical register mapping and live ranges.
+
+- `OCL_OUTPUT_BUILD_LOG` `(0 or 1)`. Output error messages, if there are any,
+ during CL kernel compiling and linking.
+
+- `OCL_OUTPUT_CFG` `(0 or 1)`. Output the control flow graph to a .dot file.
+
+- `OCL_OUTPUT_CFG_ONLY` `(0 or 1)`. Output the control flow graph to a .dot
+ file, but without the instructions in each BasicBlock.
+
+- `OCL_PRE_ALLOC_INSN_SCHEDULE` `(0 or 1)`. The instruction scheduler in
+ beignet is currently split into two passes: before and after register
+ allocation. The pre-alloc scheduler tends to decrease register pressure.
+ This variable disables/enables the pre-alloc scheduler. This pass is
+ currently disabled because of some bugs.
+
+- `OCL_POST_ALLOC_INSN_SCHEDULE` `(0 or 1)`. Disable/enable the post-alloc
+ instruction scheduler. The post-alloc scheduler tends to reduce instruction
+ latency. It is currently enabled by default.
+
+- `OCL_SIMD16_SPILL_THRESHOLD` `(0 to 256)`. Tune how many registers can be
+ spilled under SIMD16. Default value is 16. We found that spilling too many
+ registers under SIMD16 performs worse than falling back to SIMD8 mode, so
+ this variable limits the number of spilled registers under SIMD16.
+
+- `OCL_USE_PCH` `(0 or 1)`. The default value is 1. If it is enabled, we use
+ a precompiled header file which includes all basic OCL headers. This
+ reduces compile time.
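
As a usage sketch for the variables listed above: they are ordinary environment variables, so they can be set for a single command instead of being exported for the whole shell session. The helper name `run_with_gen_dump` and the application `./my_cl_app` are hypothetical and only serve as illustration; the variable names are the ones documented in the list.

```shell
# Hypothetical helper (not part of beignet): run one command with the
# Gen ISA dump, register allocation dump and build log enabled.
# The assignments apply only to the wrapped command's environment.
run_with_gen_dump() {
  OCL_OUTPUT_ASM=1 OCL_OUTPUT_REG_ALLOC=1 OCL_OUTPUT_BUILD_LOG=1 "$@"
}

# Example call, assuming ./my_cl_app is your own OpenCL program:
#   run_with_gen_dump ./my_cl_app
```

Scoping the variables to one command keeps the debug output from appearing in every other OpenCL program started from the same shell.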
Implementation details
----------------------