contrib/beignet.git
Yang Rong [Mon, 29 Sep 2014 05:38:35 +0000 (13:38 +0800)]
BDW: Add BDW device ID to the gen binary generator and binary serialization in the backend.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:34 +0000 (13:38 +0800)]
BDW: BDW doesn't need the SLM offset added, so remove it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:33 +0000 (13:38 +0800)]
BDW: Refine BDW's int 32*32 multiply.

BDW supports int32 * int32 directly, so add a flag to the selection for it.
BDW uses int32*int16 when using acc. Because int32*int16 also works on IVB,
change to int32*int16 when using acc.
Need to refine int32*int32 to long later.
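
As a rough illustration of why a 32x16 multiply is enough to build a 32x32 one, the low 32 bits of the product can be assembled from two partial products. This is a plain C sketch of the arithmetic only; the name mul32_via_16 is hypothetical, and it does not model the accumulator or the actual Gen instructions.

```c
#include <stdint.h>

/* Low 32 bits of a * b, built from two 32x16 partial products.
 * Hypothetical helper for illustration; not beignet code. */
static uint32_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t lo = a * (b & 0xffffu);  /* a times the low 16 bits of b  */
    uint32_t hi = a * (b >> 16);      /* a times the high 16 bits of b */
    return lo + (hi << 16);           /* wraps modulo 2^32             */
}
```

The upper partial product only contributes its low 16 bits after the shift, which is why the result matches a plain 32-bit multiply.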

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:32 +0000 (13:38 +0800)]
BDW: Fix unsample bug.

When setting hstride to 2, vstride also needs to be set to 16.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:31 +0000 (13:38 +0800)]
BDW: enable SLM in BDW.

BDW's SLM control register changes to L3CNTLREG, at offset 0x7034.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:30 +0000 (13:38 +0800)]
BDW: Fix pointer argument curbe alloc size.

Because the kernel writes a 64-bit address when relocating, a pointer argument
relocated in the curbe bo needs 8 bytes of curbe.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:12 +0000 (13:38 +0800)]
BDW: add some BDW functions.

Add intel_gpgpu_load_vfe_state_gen8, intel_gpgpu_walker_gen8, intel_gpgpu_build_idrt_gen8.
Reloc Dynamic State Base Address in gen7's intel_gpgpu_set_base_address, to unify intel_gpgpu_load_curbe_buffer
and intel_gpgpu_load_idrt.
Part of the utest builtin_global_id now passes.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:11 +0000 (13:38 +0800)]
BDW: Refine intel_gpgpu_setup_bti and add intel_gpgpu_set_base_address for BDW.

Because the size of the surface state struct changes in BDW, remove gen6_surface_state and
use gen_surface_state as the union of gen7_surface_state and gen8_surface_state.
Use gen_surface_state in surface_heap_t.
Reloc the Dynamic State Base and Instruction Address in intel_gpgpu_set_base_address_gen8.
BDW uses 48-bit GPU addresses, and the kernel relocates 64 bits in the command batch, so make
sure each address has a 64-bit slot there, with the high 32 bits following the low 32 bits.
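
The dword layout this implies can be sketched as below. This is a minimal illustration assuming the batch is an array of 32-bit dwords and that the low dword comes first; emit_addr64 is a hypothetical helper, not a beignet or libdrm function.

```c
#include <stdint.h>

/* Write a 64-bit GPU address into a command batch as two 32-bit dwords,
 * low dword first, high dword second. Returns the next free dword index.
 * Hypothetical helper for illustration only. */
static unsigned emit_addr64(uint32_t *batch, unsigned idx, uint64_t addr)
{
    batch[idx]     = (uint32_t)(addr & 0xffffffffu);  /* bits 31:0  */
    batch[idx + 1] = (uint32_t)(addr >> 32);          /* bits 63:32 */
    return idx + 2;
}
```

Reserving both dwords even when the upper bits are zero keeps the slot large enough for a 64-bit relocation write.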

v2:
remove binary .swp file.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:38:10 +0000 (13:38 +0800)]
BDW: Pass Jip and Uip when patchJMPI.

Unlike GEN7, BDW's Jip is in bits4 and Uip is in bits3, so Jip and Uip
should be set independently.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Mon, 29 Sep 2014 05:37:49 +0000 (13:37 +0800)]
BDW: Add function intel_gpgpu_bind_buf for gen8.

Must call cl_bind_buf instead of intel_gpgpu_bind_buf directly in intel_gpgpu.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Mon, 29 Sep 2014 05:37:48 +0000 (13:37 +0800)]
BDW: Correct surface base address set in setup bti.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Mon, 29 Sep 2014 05:37:47 +0000 (13:37 +0800)]
BDW: Add function intel_gpgpu_setup_bti for gen8.

Also set the correct surface cache control.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Mon, 29 Sep 2014 05:37:46 +0000 (13:37 +0800)]
BDW: refine the gen8_surface_state_t.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Junyan He [Mon, 29 Sep 2014 05:37:45 +0000 (13:37 +0800)]
BDW: Add gen8 surface state struct.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Yang Rong [Mon, 29 Sep 2014 05:37:19 +0000 (13:37 +0800)]
BDW: Add class Gen8Context.

Gen8Context is currently almost the same as Gen75Context, but it is still derived from GenContext for clarity.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:37:18 +0000 (13:37 +0800)]
BDW: Add Gen8Encoder and Gen7Encoder.

Classes Gen8Encoder and Gen7Encoder derive from GenEncoder, and Gen75Encoder derives from Gen7Encoder.
GenNativeInstruction is handled in class GenEncoder, Gen7NativeInstruction is handled in classes
Gen7Encoder and Gen75Encoder, and Gen8NativeInstruction is handled in class Gen8Encoder.
Disable Gen8's instruction compaction temporarily; compaction and disassembly should be added later.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:37:17 +0000 (13:37 +0800)]
BDW: Add BDW instruction definitions.

Separate GEN7 instructions and GEN8 instructions. GenNativeInstruction will become a union of
Gen7NativeInstruction and Gen8NativeInstruction.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:37:16 +0000 (13:37 +0800)]
BDW: Add BDW pci ids and BDW device struct.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Mon, 29 Sep 2014 05:37:15 +0000 (13:37 +0800)]
Avoid using GenNativeInstruction directly outside of GenEncoder and gen_insn_compact.

Use void* instead when doing instruction compaction/decompaction.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Zhigang Gong [Tue, 23 Sep 2014 06:15:46 +0000 (14:15 +0800)]
GBE: structured loop exit needs an extra branching instruction when reordering.

When we want to reorder the BBs and move the unstructured BB out of the
structured block, we need to add a BRA to the block. If the exit of the
structured block is a loop, we need to append an unconditional BRA right
after the predicated BRA. Otherwise, we may lose the correct successor
if an unstructured BB is moved next to this BB.

After this patch, with loop optimization enabled, there is no regression
on both utests and piglit. But there are still a few regressions in opencv
test suite:
[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (40041 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] OCL_Photo/FastNlMeansDenoising.Mat/2, where GetParam() = (Channels(2), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoising.Mat/3, where GetParam() = (Channels(2), true)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/0, where GetParam() = (Channels(3), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/1, where GetParam() = (Channels(3), true)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/2, where GetParam() = (Channels(4), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/3, where GetParam() = (Channels(4), true)

So let's keep this optimization disabled. I will enable it once I have fixed all
the known issues.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
Zhigang Gong [Fri, 19 Sep 2014 01:00:11 +0000 (09:00 +0800)]
GBE: fix a loop header file including bug.

function.hpp doesn't need to include the structural_analysis.hpp.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
Luo Xionghu [Mon, 15 Sep 2014 00:23:39 +0000 (08:23 +0800)]
Use instruction WHILE to manipulate structure.

1. The WHILE instruction should be non-schedulable.
2. If this WHILE instruction jumps to an ELSE instruction, 2 needs to be
added to the distance.

v2:
We also need to take care of HSW for while instruction.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Luo Xionghu [Mon, 15 Sep 2014 00:23:38 +0000 (08:23 +0800)]
add handleSelfLoopNode to insert while instruction on Gen IR level.

v2:
disable loop optimization by default because it is still buggy.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Luo Xionghu [Mon, 15 Sep 2014 00:23:37 +0000 (08:23 +0800)]
Add Gen IR WHILE.

Add Gen IR WHILE to mark the structured region.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 18 Sep 2014 06:42:01 +0000 (14:42 +0800)]
GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.

Gen provides the tm0 register for intra-kernel profiling.
Here we provide an API __gen_ocl_get_timestamp() to return
the timestamp in TM.

The return type is defined as:
struct time_stamp {
  ulong tick;
  uint event;
};

'tick' is a 64-bit time tick. 'event' stores a value indicating
whether a tmEvent has occurred (non-zero) or not (0). tmEvent covers
time-impacting events such as a context switch or frequency change
since the last time tm0 was read.
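
One way to consume two such readings is sketched below in plain C. It assumes a host-side mirror of the struct (ulong as uint64_t, uint as uint32_t) and a hypothetical helper ticks_between; it is not the sample from the kernels directory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side mirror of the struct described above. */
struct time_stamp {
    uint64_t tick;
    uint32_t event;
};

/* Stores end.tick - start.tick in *elapsed and returns true when the
 * interval looks trustworthy, i.e. the second read reported no tmEvent.
 * Hypothetical helper for illustration. */
static bool ticks_between(struct time_stamp start, struct time_stamp end,
                          uint64_t *elapsed)
{
    if (end.event != 0)
        return false;  /* e.g. a context switch happened between the reads */
    *elapsed = end.tick - start.tick;
    return true;
}
```

Because 'event' reports what happened since the previous read of tm0, checking it on the second reading covers the measured interval.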

I added a sample in kernels/compiler_time_stamp.cl. I hope it
helps you understand how to use it.

V2:
Introduce ir::ARFRegister to avoid direct use of nr/subnr in Gen IR.
Rename __gen_ocl_extract_reg to __gen_ocl_region.
Rename beignet_get_time_stamp to __gen_ocl_get_timestamp.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 18 Sep 2014 00:33:46 +0000 (08:33 +0800)]
GBE/libocl: fix build dependency issue.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Junyan He [Thu, 18 Sep 2014 04:39:15 +0000 (12:39 +0800)]
Add long support for printf

V2:
    Replace all the long and ulong with int64_t

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 17 Sep 2014 03:33:49 +0000 (11:33 +0800)]
GBE: Output linkModules's error message.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Luo Xionghu [Tue, 16 Sep 2014 21:58:17 +0000 (05:58 +0800)]
fix utest memory leak.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Tue, 16 Sep 2014 21:58:17 +0000 (05:58 +0800)]
fix one bug at cl_get_kernel_workgroup_info.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 17 Sep 2014 03:49:30 +0000 (11:49 +0800)]
Revert "improve the build performance of vector type built-in function."

This patch has to stay pending until the wide integer issue is fixed completely.
Although we have a fallback mechanism which will try to build the module again
by ignoring some passes to avoid the wide integer issue, it's broken now on the
master branch, because all the builtin functions are now built statically
and that bitcode may already contain i128/i512 etc.

This reverts commit 565d1eb00d9a5219c2848b3674e40ac07cb48b89.

Luo Xionghu [Tue, 16 Sep 2014 03:24:48 +0000 (11:24 +0800)]
improve the build performance of vector type built-in function.

This patch was lost during the libocl merge. Resubmit it to improve the
vector function performance.

Please refer to e2db890596eea0a6eb741e11e576a38952f1ed1e for details.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Tue, 16 Sep 2014 01:40:09 +0000 (09:40 +0800)]
remove the LinkOnceAnyLinkage since the libocl is introduced.

no need to set the LinkOnceAnyLinkage for global variables and functions
to avoid redefinition.

v2:
also enable the VerifierPass.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Tue, 16 Sep 2014 03:12:10 +0000 (11:12 +0800)]
Fix the bug where LLVM_LFLAGS fails to be set

LLVM_LFLAGS is used before finding the LLVM package,
which causes CMake to fail to set the correct -L flags and
leads to a linkage error.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 12 Sep 2014 09:38:06 +0000 (17:38 +0800)]
GBE/libocl: fix a regression after libocl change.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 12 Sep 2014 09:18:16 +0000 (17:18 +0800)]
GBE/libocl: add missing vector builtin definition for fma.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Junyan He [Mon, 15 Sep 2014 08:04:10 +0000 (16:04 +0800)]
Modify the CMakeList to use the internal PCH first.

Because we removed the validation of the PCH file, the PCH in the
system dir is sometimes incompatible with clang and causes
a crash.
We always need to use the internal PCH when compiling.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 15 Sep 2014 08:13:37 +0000 (16:13 +0800)]
Update NEWS.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Mon, 15 Sep 2014 06:45:24 +0000 (14:45 +0800)]
Remove out-of-date document.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Ruiling Song [Mon, 15 Sep 2014 03:14:05 +0000 (11:14 +0800)]
GBE/libocl: Fix sub_sat corner case.

It seems that the hardware returns a wrong result when y is equal to 0x80000000
in sub_sat(int x, int y). So in that case we rewrite it as:
add_sat(add_sat(0x7fffffff, x), 1)
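
The identity can be checked with a small host-side sketch. The helpers add_sat32 and sub_sat32 below are hypothetical reference versions written in plain C with 64-bit intermediates; they are not the libocl implementation.

```c
#include <stdint.h>

/* Reference saturating add, computed in 64 bits and clamped. */
static int32_t add_sat32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Saturating subtract with the y == INT32_MIN case routed through
 * add_sat, mirroring the rewrite above. */
static int32_t sub_sat32(int32_t x, int32_t y)
{
    if (y == INT32_MIN)
        return add_sat32(add_sat32(INT32_MAX, x), 1);
    int64_t d = (int64_t)x - (int64_t)y;
    if (d > INT32_MAX) return INT32_MAX;
    if (d < INT32_MIN) return INT32_MIN;
    return (int32_t)d;
}
```

For negative x the two additions compute x + 2^31 without overflow; for non-negative x the first addition already saturates, which matches the expected result of INT32_MAX.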

Also enable corresponding utest.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Sun, 14 Sep 2014 22:33:12 +0000 (06:33 +0800)]
fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.

work_group_size_hint should define another variable.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 15 Sep 2014 02:21:18 +0000 (10:21 +0800)]
Update readme.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Lv Meng [Fri, 22 Aug 2014 08:26:37 +0000 (16:26 +0800)]
Enable ICC and CLANG compiler for beignet

The 'COMPILER' option selects the specific compiler; the default is GCC.

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 12 Sep 2014 05:45:40 +0000 (13:45 +0800)]
GBE: fix multiple files compilation bugs.

If we want to link multiple files together, and one kernel
function needs to refer to kernel functions in other files,
we must not give those functions the link-once attribute.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Luo [Fri, 12 Sep 2014 03:53:41 +0000 (11:53 +0800)]
fix piglit get kernel info FUNCTION ATTRIBUTE fail.

The backend needs to return the kernel FUNCTION ATTRIBUTE message to
clGetKernelInfo.
There are 3 kinds of function attributes so far; the vec_type_hint parameter
cannot be returned because LLVM lacks such info.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 12 Sep 2014 06:29:23 +0000 (14:29 +0800)]
runtime: fix build status handling.

According to the spec, the build status query:
  Returns the build, compile or link status,
  whichever was performed last on program for
  device.

The previous implementation only considered clBuildProgram and
didn't consider compile. Fix that now.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Fri, 12 Sep 2014 05:47:25 +0000 (13:47 +0800)]
runtime: fix program binary type bug.

If the binary is an executable type, the first byte is zero and
we need to set the binary type correctly to CL_PROGRAM_BINARY_TYPE_EXECUTABLE.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Fri, 5 Sep 2014 02:25:44 +0000 (10:25 +0800)]
Update license disclaimer.

LunarGLASS has updated its copyright, so update the copyright in llvm_scalarize.cpp.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 11 Sep 2014 07:04:44 +0000 (15:04 +0800)]
GBE: don't enable double by default.

Actually, we don't fully support double currently.
Let's disable it for now. This introduces a small incompatibility with the 1.2 spec,
which doesn't require the kernel to use the following pragma to enable fp64.
 #pragma OPENCL EXTENSION cl_khr_fp64 : enable

If the application wants to try the partially supported double with beignet
under opencl 1.2, the application will still need to add the above pragma.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 11 Sep 2014 05:45:33 +0000 (13:45 +0800)]
GBE: fix a potential memory leak bug.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 11 Sep 2014 05:44:16 +0000 (13:44 +0800)]
GBE: Fix a potential segfault.

When we fail to compile a module, fileName may be NULL, so we can't
access it unconditionally.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 11 Sep 2014 05:43:29 +0000 (13:43 +0800)]
GBE: don't return error if we get an empty module.

When compiling an empty string, we may get an empty module, which is not
an error.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Luo Xionghu [Thu, 11 Sep 2014 00:43:54 +0000 (08:43 +0800)]
fix piglit cl-api-set-kernel-arg fail.

The memory object should be checked for validity against the context's buffers before being set as a kernel argument.

v2: rename the function from mem_in_buffers to is_valid_mem, move the
magic header check into it.
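
A check of this shape could look roughly like the sketch below. The struct layouts, the MEM_MAGIC value and the name is_valid_mem_sketch are all assumptions made for illustration; the real runtime types are not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_MAGIC 0x4d454d4fu  /* hypothetical magic value */

struct mem {
    uint32_t magic;      /* header tag identifying a memory object */
    struct mem *next;    /* next buffer owned by the same context  */
};

struct context {
    struct mem *buffers; /* head of the context's buffer list */
};

/* True only if m is on ctx's buffer list and carries the magic header.
 * The list is walked first, so m is dereferenced only once it is known
 * to be one of the context's own objects. */
static bool is_valid_mem_sketch(const struct context *ctx, const struct mem *m)
{
    if (ctx == NULL || m == NULL)
        return false;
    for (const struct mem *it = ctx->buffers; it != NULL; it = it->next)
        if (it == m)
            return m->magic == MEM_MAGIC;
    return false;
}
```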

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Wed, 10 Sep 2014 03:31:32 +0000 (11:31 +0800)]
fix clGetKernelWorkGroupInfo built-in kernel fail.

add CL_KERNEL_GLOBAL_WORK_SIZE option for clGetKernelWorkGroupInfo.

v2: should return the max global work size instead of the current work size.
This function needs to return CL_INVALID_VALUE if the device is not a custom
device or the kernel is not a built-in kernel.
We have 3 kinds of built-in kernels for 1d/2d/3d memories; the max global
work size is decided by the dimension and memory type.
The piglit failure is caused by calling non-built-in kernels, so a patch
needs to be sent to piglit later.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 11 Sep 2014 02:33:07 +0000 (10:33 +0800)]
GBE/libocl: Added one missing prototype fma().

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 11 Sep 2014 01:51:19 +0000 (09:51 +0800)]
GBE: fix bugs when handling -cl-std option.

Actually, CLANG does take this option and we should not
filter it out. We also change the default option for creating
the PCH file to -cl-std=CL1.2. And if the user passes in CL1.1,
we will have to disable PCH.

Another change is that if we are on CL1.2, then we should enable
cl_khr_fp64 by default, since from CL1.2 on this extension should
be enabled by default.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 10 Sep 2014 08:10:28 +0000 (16:10 +0800)]
Add the switch logic for math conformance fast path

Modify the __ocl_math_fastpath_flag init value in the
backend link stage to switch between fast path and
conformance path.

V2:
    Rename the function prototype parameter name.

V3:
    Modify the parameter to boolean and correct some comment words.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 10 Sep 2014 08:21:12 +0000 (16:21 +0800)]
GBE/libocl: fix the wrong prototype of scalar native_powr.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Wed, 10 Sep 2014 07:39:41 +0000 (15:39 +0800)]
Fix the issue of -cl-std=CLX.X option.

The -cl-std= option specifies the minimum version required to compile
the source code provided to our API. So we need to
check it early and return failure if our platform's
version cannot meet the request. In the backend, we
just ignore this command line option.
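
An early check along these lines might be sketched as follows. The function cl_std_supported and its parameters are hypothetical; it only illustrates parsing the option and comparing it with the platform version, and it treats a malformed request as a failure.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if options carry no -cl-std= request, or if the requested
 * CL<major>.<minor> is not newer than the platform version.
 * Hypothetical helper for illustration. */
static bool cl_std_supported(const char *options, int plat_major, int plat_minor)
{
    const char *p = options ? strstr(options, "-cl-std=") : NULL;
    int major = 0, minor = 0;

    if (p == NULL)
        return true;                 /* nothing requested */
    if (sscanf(p, "-cl-std=CL%d.%d", &major, &minor) != 2)
        return false;                /* malformed request */
    if (major != plat_major)
        return major < plat_major;
    return minor <= plat_minor;
}
```

A caller on a 1.2 platform would reject a CL2.0 request before any compilation work starts.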

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Sat, 30 Aug 2014 08:34:44 +0000 (16:34 +0800)]
Runtime: Implement clGetExtensionFunctionAddressForPlatform.

It seems that this function is required by latest PyOpenCL.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 4 Sep 2014 14:53:00 +0000 (22:53 +0800)]
Add copyright header for all libocl files.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yichao Yu [Wed, 10 Sep 2014 00:31:21 +0000 (20:31 -0400)]
Use ${PYTHON_EXECUTABLE} to run python scripts.

Signed-off-by: Yichao Yu <yyc1992@gmail.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Yang Rong [Fri, 5 Sep 2014 03:25:05 +0000 (11:25 +0800)]
Update README for the command parser in drm kernel.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 8 Sep 2014 22:42:07 +0000 (06:42 +0800)]
fix piglit cl-api-get-program-info fail.

add pointer check.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 5 Sep 2014 08:54:27 +0000 (16:54 +0800)]
Add incompatible PCH options to avoid compile failures.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 5 Sep 2014 08:33:44 +0000 (16:33 +0800)]
GBE: fallback if we get a wider than i64 constant.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
Zhigang Gong [Fri, 5 Sep 2014 08:19:30 +0000 (16:19 +0800)]
GBE: fix a bug with LLVM 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
Junyan He [Fri, 5 Sep 2014 08:27:30 +0000 (16:27 +0800)]
Add the missing function prototypes of any() and atom_add()

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 5 Sep 2014 06:04:38 +0000 (14:04 +0800)]
GBE: prevent one optimization pass from generating wide integers.

Integer types wider than 64 bits are hard to handle on Gen.
Let's try to prevent the ScalarReplAggregates pass from generating
such integer types.

v2:
fix compilation error with LLVM 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 4 Sep 2014 04:30:44 +0000 (12:30 +0800)]
GBE: remove the user defined macro cl_khr_fp64.

This is not a predefined macro according to the spec. Let's not
define it by default. This patch also disables fp64 when entering
user kernels.

v2:
Some internal .cl files require cl_khr_fp64 enabled. Fixed that issue
by moving the enable macro to ocl_types.h.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Junyan He [Mon, 1 Sep 2014 02:22:54 +0000 (10:22 +0800)]
Delete all the unused files of old huge header.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Mon, 1 Sep 2014 07:28:18 +0000 (15:28 +0800)]
Use the PCH to accelerate the parsing speed of the ocl.h

We disable the validity check for the PCH to avoid the path
and modification-time checks, which cause us some trouble.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Mon, 1 Sep 2014 07:28:02 +0000 (15:28 +0800)]
Enable libocl and disable the usage of the old huge header.

The LLVM IR printout is modified.
OCL_OUTPUT_LLVM_BEFORE_EXTRA_PASS and
OCL_OUTPUT_LLVM are replaced by:
OCL_OUTPUT_LLVM_BEFORE_LINK
OCL_OUTPUT_LLVM_AFTER_LINK
OCL_OUTPUT_LLVM_AFTER_GEN
The first one prints the IR before linking the bitcode lib.
The second one prints the IR after linking.
The last one prints the IR after the Gen translation.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Junyan He [Mon, 1 Sep 2014 09:16:22 +0000 (17:16 +0800)]
Add the bit code linker into the module pass.

The bitcode linker will load beignet.bc as a
lib module and link it together with the kernel's module.
Then we filter out all the dead bitcode by creating
an InternalizePass for the module.
After this stage, the IR includes only the bitcode
actually used by the CL kernel.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Junyan He [Mon, 1 Sep 2014 02:19:42 +0000 (10:19 +0800)]
Add memcpy, memset and barrier bitcode files into libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Mon, 1 Sep 2014 02:16:17 +0000 (10:16 +0800)]
Add the ocl_defines header file into libocl

This file will be used to define some common defines
for both CL and the backend source code.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the relational module into libocl as template
Junyan He [Mon, 1 Sep 2014 02:15:35 +0000 (10:15 +0800)]
Add the relational module into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the math function into libocl as template
Junyan He [Mon, 1 Sep 2014 02:14:45 +0000 (10:14 +0800)]
Add the math function into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the integer module into libocl as template
Junyan He [Mon, 1 Sep 2014 02:13:04 +0000 (10:13 +0800)]
Add the integer module into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the common module into the libocl as template
Junyan He [Mon, 1 Sep 2014 02:12:42 +0000 (10:12 +0800)]
Add the common module into the libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the gen_vector script into the libocl
Junyan He [Mon, 1 Sep 2014 02:12:26 +0000 (10:12 +0800)]
Add the gen_vector script into the libocl

This script generates the function definitions and function
prototypes for all the vector functions.
Some modules need very verbose vector functions after their
scalar versions. We write a template for all the scalar
versions and use this script to generate the vector versions
and append them to the template to generate the header or
source file.
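
The idea can be illustrated with a minimal sketch. It is not the
gen_vector script itself: the function names, the x0/x1 argument names
and the OVERLOADABLE qualifier are assumptions made for illustration,
and every vector overload simply calls the scalar version once per
component.

```python
VECTOR_WIDTHS = (2, 3, 4, 8, 16)  # vector widths defined by OpenCL C


def vector_prototype(ret_type, name, arg_types, width, qualifier="OVERLOADABLE"):
    """Build the prototype of the width-wide overload of a scalar function.

    The qualifier and the x0, x1, ... argument names are hypothetical.
    """
    args = ", ".join(f"{t}{width} x{i}" for i, t in enumerate(arg_types))
    return f"{qualifier} {ret_type}{width} {name}({args})"


def vector_definition(ret_type, name, arg_types, width, qualifier="OVERLOADABLE"):
    """Build a definition that calls the scalar version once per component."""
    calls = []
    for lane in range(width):
        comp = f"s{lane:x}"  # OpenCL numeric component selectors: s0 .. sf
        scalar_args = ", ".join(f"x{i}.{comp}" for i in range(len(arg_types)))
        calls.append(f"{name}({scalar_args})")
    body = f"  return ({ret_type}{width})({', '.join(calls)});"
    proto = vector_prototype(ret_type, name, arg_types, width, qualifier)
    return f"{proto}\n{{\n{body}\n}}"


def generate(ret_type, name, arg_types):
    """Return (header_text, source_text) covering every vector width."""
    header = "\n".join(
        vector_prototype(ret_type, name, arg_types, w) + ";" for w in VECTOR_WIDTHS
    )
    source = "\n\n".join(
        vector_definition(ret_type, name, arg_types, w) for w in VECTOR_WIDTHS
    )
    return header, source
```

Calling generate("float", "pow", ["float", "float"]) returns the text
that would be appended to the scalar template: five prototypes for the
header and five definitions for the source file.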

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the convert and as modules into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:52 +0000 (10:11 +0800)]
Add the convert and as modules into the libocl

The convert and as function suites have a very similar
format for all the types and vectors, and they are
really verbose. So the two scripts will generate the
code for Convert and AS separately.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the workitem module into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:39 +0000 (10:11 +0800)]
Add the workitem module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd vload module into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:24 +0000 (10:11 +0800)]
Add vload module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd printf module into libocl
Junyan He [Mon, 1 Sep 2014 02:10:05 +0000 (10:10 +0800)]
Add printf module into libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the sync module into the libocl
Junyan He [Mon, 1 Sep 2014 02:09:28 +0000 (10:09 +0800)]
Add the sync module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the misc module into the libocl
Junyan He [Mon, 1 Sep 2014 02:08:34 +0000 (10:08 +0800)]
Add the misc module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the image module into the libocl
Junyan He [Tue, 2 Sep 2014 13:28:13 +0000 (21:28 +0800)]
Add the image module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the geometric module into the libocl
Junyan He [Mon, 1 Sep 2014 02:07:04 +0000 (10:07 +0800)]
Add the geometric module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the atomic module into the libocl
Junyan He [Mon, 1 Sep 2014 02:06:35 +0000 (10:06 +0800)]
Add the atomic module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the async module into the libocl
Junyan He [Mon, 1 Sep 2014 02:06:09 +0000 (10:06 +0800)]
Add the async module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd common define header files to initialize the libocl
Junyan He [Mon, 1 Sep 2014 02:05:54 +0000 (10:05 +0800)]
Add common define header files to initialize the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: fixup/refine a bug for image1D array's extra binding index handling.
Zhigang Gong [Thu, 4 Sep 2014 00:01:25 +0000 (08:01 +0800)]
GBE: fixup/refine a bug for image1D array's extra binding index handling.

Due to a hardware limitation on Gen7/Gen75 when sampling
with clamp address mode and nearest filter mode on an
integer image1Darray type surface, we have to bind
one buffer to an extra bti. The previous implementation hard
coded it to 128 + original index, and when the driver layer
checked whether a bti was of this type, it assumed that the
number of reserved bti is 3, which is wrong now.

This patch fixes those hard coded functions to use the
macros defined in program.h.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: refine the unaligned data gathering.
Zhigang Gong [Thu, 28 Aug 2014 01:26:06 +0000 (09:26 +0800)]
GBE: refine the unaligned data gathering.

Avoid some unnecessary duplicate instructions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: relax the batch byte/short load vector size restriction.
Zhigang Gong [Thu, 28 Aug 2014 00:44:58 +0000 (08:44 +0800)]
GBE: relax the batch byte/short load vector size restriction.

The previous restriction was that the vector size must be a multiple
of DWORD. This restriction prevents the vload2/3 of char or
vload3 of ushort from being optimized. This patch relaxes this
restriction on the vload path.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: optimize unaligned char and short data vector's load.
Zhigang Gong [Wed, 27 Aug 2014 03:13:15 +0000 (11:13 +0800)]
GBE: optimize unaligned char and short data vector's load.

Gathering contiguous short/char loads into a single load instruction
gives us a good opportunity to use untyped loads to optimize them.

This patch enables the short/char load gathering in the load store optimize
pass. Then, in the backend, it loads the corresponding DWORDs and converts them
to short/char accordingly by applying shift and bitwise operations.

The benchmark shows that, for vload4/8/16 char or vload2/4/8/16 short, this
patch brings about 80%-100% improvement.
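
The shift and bitwise step can be modelled with a short sketch. This is
only an illustration of the arithmetic, not the backend code: memory is
modelled as a list of little-endian DWORDs, and gather_elements and its
parameters are hypothetical names.

```python
import struct


def gather_elements(dwords, byte_offset, count, elem_bytes):
    """Read `count` elements of `elem_bytes` bytes starting at `byte_offset`.

    Only whole DWORDs are loaded; the elements are then extracted with
    shifts and masks. `dwords` models memory as little-endian 32-bit words.
    """
    first = byte_offset // 4
    last = (byte_offset + count * elem_bytes - 1) // 4

    # One DWORD load for every DWORD that the vector touches.
    loaded = [dwords[i] & 0xFFFFFFFF for i in range(first, last + 1)]

    # Pack the loaded DWORDs into one wide integer, lowest address first.
    packed = 0
    for k, word in enumerate(loaded):
        packed |= word << (32 * k)

    mask = (1 << (8 * elem_bytes)) - 1
    base_shift = 8 * (byte_offset - 4 * first)  # skip bytes before the vector
    return [
        (packed >> (base_shift + 8 * elem_bytes * i)) & mask
        for i in range(count)
    ]


# Memory holding the bytes 0x00 .. 0x0f, viewed as four DWORDs.
memory = struct.unpack("<4I", bytes(range(16)))
chars = gather_elements(memory, 1, 4, 1)   # a char4 load at byte offset 1
shorts = gather_elements(memory, 2, 4, 2)  # a short4 load at byte offset 2
```

Here the char4 load touches two DWORDs and the short4 load touches
three, instead of issuing one byte or short load per element.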

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoAdd new vload benchmark/test case.
Zhigang Gong [Wed, 27 Aug 2014 02:33:42 +0000 (10:33 +0800)]
Add new vload benchmark/test case.

v2:
refine the benchmark case and don't mix it with normal
unit test cases.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: fix error in the rootn fastpath function for some special input.
Zhigang Gong [Fri, 29 Aug 2014 02:04:38 +0000 (10:04 +0800)]
GBE: fix error in the rootn fastpath function for some special input.

The fastpath trades some accuracy for speed. It must not
generate wrong results. rootn has many special inputs, and they
need to be handled before we call the native pow directly.
This patch fixes all the pow related failures in the OpenCV 3.0 test
suite.
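
A minimal sketch of such a guarded fastpath follows. The special cases
listed are the edge cases that the OpenCL C specification defines for
rootn; which of them this patch handles, and how, is not stated here, so
treat the code as an assumption. math.pow stands in for the native pow.

```python
import math


def native_pow(x, y):
    """Stand-in for the fast native pow (assumption: overflow gives +inf)."""
    try:
        return math.pow(x, y)
    except OverflowError:
        return math.inf


def rootn(x, n):
    """Compute x**(1/n), handling the special inputs before calling pow."""
    if n == 0 or math.isnan(x):
        return math.nan
    odd = n % 2 != 0
    if x == 0.0:
        if n < 0:
            # +-inf for odd n, +inf for even n
            return math.copysign(math.inf, x) if odd else math.inf
        # +-0 for odd n, +0 for even n
        return math.copysign(0.0, x) if odd else 0.0
    if x < 0.0:
        if not odd:
            return math.nan  # even root of a negative number
        # pow cannot take a negative base with a fractional exponent,
        # so take the root of |x| and restore the sign.
        return -native_pow(-x, 1.0 / n)
    return native_pow(x, 1.0 / n)
```

Without the guards, rootn(-8.0, 3) would reach pow with a negative base
and a fractional exponent, which is the kind of input that yields a
wrong result rather than a merely less accurate one.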

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoutests: fix two utest bugs.
Zhigang Gong [Tue, 2 Sep 2014 02:34:33 +0000 (10:34 +0800)]
utests: fix two utest bugs.

Similar to the bug found by Junyan, some events are
accessed before they are assigned.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoFix a bug for runtime_barrier_list.cpp, event array out of bounds
Junyan He [Tue, 2 Sep 2014 02:37:02 +0000 (10:37 +0800)]
Fix a bug for runtime_barrier_list.cpp, event array out of bounds

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>