From: Ruiling Song
Date: Thu, 19 Jun 2014 07:20:54 +0000 (+0800)
Subject: update docs on environment variables.
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=6bd3f96e41983c7189c967bde007c4e26c7c44a7;p=contrib%2Fbeignet.git

update docs on environment variables.

Signed-off-by: Ruiling Song
Reviewed-by: Zhigang Gong
---

diff --git a/docs/Beignet.mdwn b/docs/Beignet.mdwn
index 7870c12..1a56a6f 100644
--- a/docs/Beignet.mdwn
+++ b/docs/Beignet.mdwn
@@ -142,6 +142,14 @@ The code was tested on IVB GT2 with ubuntu and fedora core distribution. The rec
 kernel version is equal or newer than 3.11. Currently Only IVB is supported right now. Actually,
 the code was run on IVB GT2/GT1, and both system are well supported now.
 
+Math Function precision
+-----------------------
+
+Currently Gen does not provide native support of high precision math functions
+required by OpenCL. We provide a software version to achieve high precision,
+which you can turn on through `export OCL_STRICT_CONFORMANCE=1`.
+But be careful, this would make your CL kernel run a little longer.
+
 TODO
 ----
 
diff --git a/docs/Beignet/Backend.mdwn b/docs/Beignet/Backend.mdwn
index 99d678e..be6081b 100644
--- a/docs/Beignet/Backend.mdwn
+++ b/docs/Beignet/Backend.mdwn
@@ -30,7 +30,17 @@ Various environment variables
 
 Environment variables are used all over the code. Most important ones are:
 
-- `OCL_SIMD_WIDTH` `(8 or 16)`. Change the number of lanes per hardware thread
+- `OCL_STRICT_CONFORMANCE` `(0 or 1)`. Gen does not provide native high
+  precision math instructions compliant with OpenCL Spec. So we provide a
+  software version to meet the high precision requirement. Obviously the
+  software version's performance is not as good as native version supported by
+  GEN hardware. What's more, most graphics application don't need this high
+  precision, so we choose 0 as the default value. So OpenCL apps do not suffer
+  the performance penalty for using high precision math functions.
+
+- `OCL_SIMD_WIDTH` `(8 or 16)`. Select the number of lanes per hardware thread,
+  Normally, you don't need to set it, we will select suitable simd width for
+  a given kernel. Default value is 16.
 
 - `OCL_OUTPUT_GEN_IR` `(0 or 1)`. Output Gen IR (scalar intermediate
   representation) code
@@ -42,7 +52,35 @@ Environment variables are used all over the code. Most important ones are:
 
 - `OCL_OUTPUT_ASM` `(0 or 1)`. Output Gen ISA
 
-- `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations
+- `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations, including
+  virtual register to physical register mapping, live ranges.
+
+- `OCL_OUTPUT_BUILD_LOG` `(0 or 1)`. Output error messages if there is any
+  during CL kernel compiling and linking.
+
+- `OCL_OUTPUT_CFG` `(0 or 1)`. Output control flow graph in .dot file.
+
+- `OCL_OUTPUT_CFG_ONLY` `(0 or 1)`. Output control flow graph in .dot file,
+  but without instructions in each BasicBlock.
+
+- `OCL_PRE_ALLOC_INSN_SCHEDULE` `(0 or 1)`. The instruction scheduler in
+  beignet are currently splitted into two passes: before and after register
+  allocation. The pre-alloc scheduler tend to decrease register pressure.
+  This variable is used to disable/enable pre-alloc scheduler. This pass is
+  disabled now for some bugs.
+
+- `OCL_POST_ALLOC_INSN_SCHEDULE` `(0 or 1)`. Disable/enable post-alloc
+  instruction scheduler. The post-alloc scheduler tend to reduce instruction
+  latency. By default, this is enabled now.
+
+- `OCL_SIMD16_SPILL_THRESHOLD` `(0 to 256)`. Tune how much registers can be
+  spilled under SIMD16. Default value is 16. We find spill too much register
+  under SIMD16 is not as good as fall back to SIMD8 mode. So we set the
+  variable to control spilled register number under SIMD16.
+
+- `OCL_USE_PCH` `(0 or 1)`. The default value is 1. If it is enabled, we use
+  a pre compiled header file which include all basic ocl headers. This would
+  reduce the compile time.
 
 Implementation details
 ----------------------