contrib/beignet.git
9 years agoFix the compare_image_2d_and_1d_array test case bug
Junyan He [Mon, 27 Oct 2014 07:34:11 +0000 (15:34 +0800)]
Fix the compare_image_2d_and_1d_array test case bug

The test case use OCL_MAP_BUFFER_GTT to map the image
buffers and then do the result comparison, which may
cause problems.
On IVB and HSW, the slice pitch is equal but on BDW,
because we change the slice pitch of image array, it
cause this bug.
Modify it by using the standard clEnqueueReadImage API.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoadd utest compiler_overflow for llvm intrinsic function.
Luo Xionghu [Mon, 27 Oct 2014 03:14:50 +0000 (11:14 +0800)]
add utest compiler_overflow for llvm intrinsic function.

this case only runs for uadd_with_over_flow function so far.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoadd llvm Intrinsic call support.
Luo Xionghu [Mon, 27 Oct 2014 03:14:49 +0000 (11:14 +0800)]
add llvm Intrinsic call support.

so far this patch only support uadd_with_overflow.
sadd_with_overflow, usub_with_overflow, smul_with_overflow, umul_with_overflow, bswap are not supported yet.
this funtion should be implemented by carrier flag later.

v2: update comments.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: fix a wrong type of cl_device_info.
Zhigang Gong [Fri, 24 Oct 2014 09:55:26 +0000 (17:55 +0800)]
GBE: fix a wrong type of cl_device_info.

Per OpenCL spec 1.2:
CL_DEVICE_IMAGE_MAX_BUFFER_SIZE should be size_t type rather
than cl_ulong.

This bug will cause problems on i386 platform.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
9 years agoGBE: set default address space to -1 to avoid incorrect unroll hint.
Zhigang Gong [Mon, 27 Oct 2014 00:32:44 +0000 (08:32 +0800)]
GBE: set default address space to -1 to avoid incorrect unroll hint.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
9 years agoFix AUX buffer for page alignment
Zhenyu Wang [Thu, 23 Oct 2014 07:19:26 +0000 (15:19 +0800)]
Fix AUX buffer for page alignment

Apply ALIGN() for aux buffer size from beginning has no effect.
Move to the end of all state offsets set for page alignment.

v2: Update comments

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
9 years agoUse pread/pwrite for buffer enqueue read/write
Zhenyu Wang [Thu, 23 Oct 2014 07:19:25 +0000 (15:19 +0800)]
Use pread/pwrite for buffer enqueue read/write

Instead of mmap, use pread/write interface for bo read/write with
optimized operations.

Result on one mem bandwidth benchmark for buffer enqueue read/write on HSW.

**** Host to device copy - workgroup_size=512

                     old code with mmap       new code with pread/pwrite

WG=512  SZ=  1 KiB     2325 MB/s               3336 MB/s
WG=512  SZ=  2 KiB     4479 MB/s        6270 MB/s
WG=512  SZ=  4 KiB     8215 MB/s       11808 MB/s
WG=512  SZ=  8 KiB    14271 MB/s       19493 MB/s
WG=512  SZ= 16 KiB    16456 MB/s       20079 MB/s
WG=512  SZ= 32 KiB    21136 MB/s       22334 MB/s
WG=512  SZ= 64 KiB    24785 MB/s       24792 MB/s
WG=512  SZ=128 KiB    24590 MB/s       24908 MB/s
WG=512  SZ=256 KiB    17928 MB/s       21435 MB/s
WG=512  SZ=512 KiB    18346 MB/s       20583 MB/s
WG=512  SZ=  1 MiB    18558 MB/s       20808 MB/s
WG=512  SZ=  2 MiB    18582 MB/s       20939 MB/s
WG=512  SZ=  4 MiB    15382 MB/s       18230 MB/s
WG=512  SZ=  8 MiB     7737 MB/s       11558 MB/s
WG=512  SZ= 16 MiB     7073 MB/s        8962 MB/s
WG=512  SZ= 32 MiB     6984 MB/s        8302 MB/s
WG=512  SZ= 64 MiB     6938 MB/s        8308 MB/s
WG=512  SZ=128 MiB     6950 MB/s        8271 MB/s
WG=512  SZ=256 MiB     6941 MB/s        8034 MB/s

**** Device to host copy - workgroup_size=512

                     old code with mmap       new code with pread/pwrite

WG=512  SZ=  1 KiB     2165 MB/s               2596 MB/s
WG=512  SZ=  2 KiB     4453 MB/s        6154 MB/s
WG=512  SZ=  4 KiB     8211 MB/s       11198 MB/s
WG=512  SZ=  8 KiB    14001 MB/s       11319 MB/s
WG=512  SZ= 16 KiB    16218 MB/s       14394 MB/s
WG=512  SZ= 32 KiB    21310 MB/s       22438 MB/s
WG=512  SZ= 64 KiB    24738 MB/s       25237 MB/s
WG=512  SZ=128 KiB    22784 MB/s       25113 MB/s
WG=512  SZ=256 KiB    18018 MB/s       22265 MB/s
WG=512  SZ=512 KiB    18429 MB/s       21687 MB/s
WG=512  SZ=  1 MiB    18654 MB/s       21856 MB/s
WG=512  SZ=  2 MiB    18655 MB/s       21998 MB/s
WG=512  SZ=  4 MiB    15098 MB/s       17563 MB/s
WG=512  SZ=  8 MiB     7737 MB/s       11170 MB/s
WG=512  SZ= 16 MiB     6989 MB/s        8812 MB/s
WG=512  SZ= 32 MiB     6891 MB/s        7597 MB/s
WG=512  SZ= 64 MiB     6868 MB/s        7798 MB/s
WG=512  SZ=128 MiB     6838 MB/s        7630 MB/s
WG=512  SZ=256 MiB     6842 MB/s        7466 MB/s

The benchmark can be got from http://cgit.freedesktop.org/~zhen/MPBenchmarks/.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoClear batch buffer pointer after unmap
Zhenyu Wang [Thu, 23 Oct 2014 07:19:23 +0000 (15:19 +0800)]
Clear batch buffer pointer after unmap

Fix libdrm warning about unmap the unmapped buffer, as when
batch terminate we will try to do unmap again, but batch has
already been unmapped.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoMake use of write enable flag for mem bo map
Zhenyu Wang [Thu, 23 Oct 2014 07:19:22 +0000 (15:19 +0800)]
Make use of write enable flag for mem bo map

Use drm/intel optimization for mem bo mapping in case of read or write.
So we could be possibly waiting less.

This also adds 'map_flags' check in clEnqueueMapBuffer/clEnqueueMapImage
for actual read or write mapping.

But currently leave clMapBufferIntel untouched which might break ABI/API.

v2: Fix write_map flag in clEnqueueMapBuffer/clEnqueueMapImage.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: handle dead loop BBs in liveness analysis.
Zhigang Gong [Fri, 24 Oct 2014 01:55:34 +0000 (09:55 +0800)]
GBE: handle dead loop BBs in liveness analysis.

Considering the following CFG, our previous liveness analysis will only
back traverse from the exit point BB, thus the BB 6 and 7 will not be handled.

    4---
    |   |
    |   |
    5 --|---> 10 ---> ret
    |   |
    |   |
    6<--
    |
    |<--
    7   |
    |   |
     ---

Although the CFG looks not a normal application as once it goes to
block 6, it will enter a dead loop and will never return, we still need
to compile it successfully.

This patch is to fix the bug at:
https://bugs.freedesktop.org/show_bug.cgi?id=85362

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agocreate GIT_SHA1 without any dependency
Meng Mengmeng [Fri, 24 Oct 2014 19:10:02 +0000 (03:10 +0800)]
create GIT_SHA1 without any dependency

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoAdd the disasm support for Gen8
Junyan He [Thu, 23 Oct 2014 05:09:00 +0000 (13:09 +0800)]
Add the disasm support for Gen8

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agoGBE: fix regression caused by simple block optimization.
Zhigang Gong [Thu, 23 Oct 2014 06:22:38 +0000 (14:22 +0800)]
GBE: fix regression caused by simple block optimization.

Almost all 64bit related instructions and unaligned load
instruction should be complex instruction. We need to exclude
them from simple block.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Xionghu Luo <xionghu.luo@intel.com>
9 years agoadd beignet GIT_HAL1 if there is .git directory
Meng Mengmeng [Thu, 23 Oct 2014 01:09:21 +0000 (09:09 +0800)]
add beignet GIT_HAL1 if there is .git directory

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: add basic PHINode support in legalize pass.
Zhigang Gong [Tue, 21 Oct 2014 13:03:46 +0000 (21:03 +0800)]
GBE: add basic PHINode support in legalize pass.

Lack of the incomplete PHINode support currently.
After this patch, we could fix all the crash cases in piglit.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
9 years agoGBE: increase maximum src/dst operands to 32.
Zhigang Gong [Tue, 21 Oct 2014 13:00:23 +0000 (21:00 +0800)]
GBE: increase maximum src/dst operands to 32.

As we may bitcast a <16 * i64> to/from <32 * i32> due to
the legalize pass, we have to increase the maximum operands
number to 32 and fix some assertions accordingly.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
9 years agoGBE: add Selection instruction handler at legalize pass.
Zhigang Gong [Tue, 21 Oct 2014 08:04:28 +0000 (16:04 +0800)]
GBE: add Selection instruction handler at legalize pass.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
9 years agoAdd the test case for image 2d array fill
Junyan He [Wed, 22 Oct 2014 07:51:06 +0000 (15:51 +0800)]
Add the test case for image 2d array fill

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoAdd the test case for image 1d array fill
Junyan He [Wed, 22 Oct 2014 07:50:59 +0000 (15:50 +0800)]
Add the test case for image 1d array fill

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix the bug of 1D array slice pitch
Junyan He [Tue, 21 Oct 2014 13:02:27 +0000 (21:02 +0800)]
Fix the bug of 1D array slice pitch

For BDW, the vertical align is 4 at least.
This cause the slice pitch twice as big as
the Gen7 for 1D buffer array.
Because the buffer tiling alignment may change
for different GENs, we move it from run time to
intel driver.

V2:
  Fix all the bugs about 1d and 2d image array.
  And delete the tile align size which is useless.
  Also integrate two image array test cases into
  this patch set.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agouse global flag 0.0 to control unstructured simple block.
Luo Xionghu [Fri, 17 Oct 2014 03:33:23 +0000 (11:33 +0800)]
use global flag 0.0 to control unstructured simple block.

filter the simple block out and replace the if/endif with global flag
to control.

v2: fix the luxmark sala performance degression due to extern flag in a
BRA instruction.

v3: fix compiler_switch regression, LOAD/STORE instruction could
call replaceSrc/replaceDst to generate 2 extra MOV instruction; exclude
the scalar instructions since they don't have prediction.

this patch is somewhat dangerous to change the instruction structure of block,
will add sanity check after emitInstructionStream to assert if illegally
modified.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoGBE: disable custom loop unroll for LLVM 3.3/3.4.
Zhigang Gong [Mon, 20 Oct 2014 01:32:13 +0000 (09:32 +0800)]
GBE: disable custom loop unroll for LLVM 3.3/3.4.

To fix a build error with LLVM 3.3/3.4.

v2:
should include llvm-config before check llvm version.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agoGBE: Add a customized loop unrolling handling mechanism.
Zhigang Gong [Wed, 8 Oct 2014 04:58:59 +0000 (12:58 +0800)]
GBE: Add a customized loop unrolling handling mechanism.

By default, the unrolling threshold is relatively small.
Thus some relative large loops which access private array
will not be unrolled, thus those private array can't
be scalarized latter. And the private array is allocated
in stack which is extreme slow for Gen backend currently.

To increase the unrolling threshold for all loops is not
a good idea, as most of the loops don't need to do unrolling
for this purpose and a large unrolling threshold will cause
a big code size and unecessary big register pressure which
may lead to register spilling.

So this patch introduce a trade-off pass to identify those
loops which still have private load/store in the outer most
of the loop. Then add a metadata to it to indicate aggresive
unrolling on those loops. Then do another round loop unrolling.

This patch with the previous small patch, can bring significant
performance improvement for some cases. I just tested with some
opencv test cases, and observed it can bring 2x to 10x improvement.

v2:
refine the parent loop unroll analysis method.

v3:
disable this pass for LLVM 3.3/3.4.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agoGBE: fix a legalize pass bug when bitcast wide integer to incompaitble vector.
Zhigang Gong [Fri, 17 Oct 2014 04:10:15 +0000 (12:10 +0800)]
GBE: fix a legalize pass bug when bitcast wide integer to incompaitble vector.

Our wide integer legalize pass will assume the source type of a wide
integer must be the same as the final use type. But this is not always
true. The following case is a real example:

      %conv.i.i.14 = sext i8 %usrc0.i.sroa.0.14.extract.trunc to i32
      %call.i.i2.14 = tail call i32 @__gen_ocl_abs(i32 %conv.i.i.14) #5
      %conv1.i.i.14.mask = and i32 %call.i.i2.14, 255
      %uret.i.sroa.0.14.insert.ext = zext i32 %conv1.i.i.14.mask to i128
      %uret.i.sroa.0.14.insert.shift = shl nuw nsw i128 %uret.i.sroa.0.14.insert.ext, 112
      ......
      %uret.i.sroa.0.15.insert.mask = or i128 %uret.i.sroa.0.14.insert.mask.masked, %uret.i.sroa.0.14.insert.shift
      %uret.i.sroa.0.15.insert.insert = or i128 %uret.i.sroa.0.15.insert.mask, %uret.i.sroa.0.15.insert.shift
      %2 = bitcast i128 %uret.i.sroa.0.15.insert.insert to <16 x i8>

The wide integer i128 %uret.i.sroa.0.16.insert.insert is from an
i32 integer %conv1.i.i.14.mask. But the use of it is i8 vector
which breaks our assumption.

According to ruiling's good suggestion, we always bitcast the wide integer to
a compatible vector, take the above example:
  %3 = bitcast i128 %uret.i.sroa.0.15.insert.insert to <4 x i32>
Then insert a bit cast instruction to convert it to the original destination
  %2 = bitcast <4 x i32> %3 to <16 x i8>

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agoRefine the the error handling in function cl_command_queue_ND_range_gen7.
Yang Rong [Fri, 17 Oct 2014 03:04:16 +0000 (11:04 +0800)]
Refine the the error handling in function cl_command_queue_ND_range_gen7.

Return error code when work group error, and remove printf and exit(-1) when internal error..

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix a HSW regression.
Yang Rong [Thu, 16 Oct 2014 03:07:37 +0000 (11:07 +0800)]
Fix a HSW regression.

slmoffset use as DWORD register, but declare WORD, fix it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: do intrinsics lowering pass earlier.
Zhigang Gong [Tue, 30 Sep 2014 04:16:34 +0000 (12:16 +0800)]
GBE: do intrinsics lowering pass earlier.

After the intrinsics lowering pass, it may introduce some new loops
which could be optimized nicely with the previous loop unrolling and
SROA PASS which especially may elminiate private data access.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
9 years agoFix a upsample regression.
Yang Rong [Thu, 16 Oct 2014 08:51:12 +0000 (16:51 +0800)]
Fix a upsample regression.

In GenEncoder, unpack is not good, so move the upsample int/short from GenEncoder to instruction selection.
Still handle upsample long in GenContext.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoBDW: Also need set Shader Channel Select for constant buffer in BDW.
Yang Rong [Thu, 16 Oct 2014 07:11:02 +0000 (15:11 +0800)]
BDW: Also need set Shader Channel Select for constant buffer in BDW.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoBDW: Fix load/store half error.
Yang Rong [Thu, 16 Oct 2014 07:10:35 +0000 (15:10 +0800)]
BDW: Fix load/store half error.

BDW support HF data type, so use mov directly to convert between Half Float/Float.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoBDW: Change BDW's max work group size to 512.
Yang Rong [Wed, 15 Oct 2014 08:26:14 +0000 (16:26 +0800)]
BDW: Change BDW's max work group size to 512.

Opencv only query and use device max work group size, when SLM/Barrier enable, BDW
can't fill 1024 work group in one subslice, even in SIMD16. Change device's max work
group size temp.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix a HSW constant buffer regression.
Yang Rong [Wed, 15 Oct 2014 06:39:44 +0000 (14:39 +0800)]
Fix a HSW constant buffer regression.

HSW's constant buffer read use ld message, also need set Shader Channel Select in surface state.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
9 years agoGBE: fix a bug in legalize pass.
Zhigang Gong [Wed, 15 Oct 2014 05:18:12 +0000 (13:18 +0800)]
GBE: fix a bug in legalize pass.

The Shr/Shl llvm instructions require to have identical type for
the two operands, so we can't set the shift count to i32 always.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Zhu, BingbingX <bingbingx.zhu@intel.com>
9 years agoFit the printf bug in loop
Junyan He [Tue, 14 Oct 2014 07:52:04 +0000 (15:52 +0800)]
Fit the printf bug in loop

The static analyse for printf can not totally work
when the printf inst is within the loop and the loop
can not be unrolled. This causes the printf just to
print one info for a loop and to lose all the others.
We now increment the exec number every time the printf
inst is triggered. The number is stored for output all
the message later.
The problem is that we can not caculate the exact loops
number for each printf inst. The wrong loop number will
cause the data overwritten. We now assume all the printf
inst are in loop and store the data like this:
| PRINTF1_DATA  PRINTF2_DATA ... | PRINTF1_DATA  PRINTF2_DATA ... | ...
|       DATA_LOOP_ONE            |          DATA_LOOP_TWO         | ...
Although this may cause some space wasted.

Another problem is that we need to decide the size of printf buffer
because the loop upbound can not be caculated. We just set
it yo 1M for small info slot request and 4M for big one.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix HSW thread_n <= 64 assert.
Yang Rong [Tue, 14 Oct 2014 07:18:39 +0000 (15:18 +0800)]
Fix HSW thread_n <= 64 assert.

In function cl_get_kernel_max_wg_sz, hsw's thread count may large than 64,
add a max limit.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoadd utest popcount for all types.
Luo Xionghu [Tue, 14 Oct 2014 00:08:46 +0000 (08:08 +0800)]
add utest popcount for all types.

v2: add all types to test.
v3: fix signed type count bits error.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoadd opencl-1.2 builtin function popcount.
Luo [Fri, 10 Oct 2014 03:05:04 +0000 (11:05 +0800)]
add opencl-1.2 builtin function popcount.

the popcount function returns the number of non-zero bits in input.
use GEN instruction cbit(Count Bits Set) to implement it.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoLet the failure of intel_drm lib's check as a FATAL_ERROR
Junyan He [Tue, 14 Oct 2014 05:58:02 +0000 (13:58 +0800)]
Let the failure of intel_drm lib's check as a FATAL_ERROR

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix memcpy and memset bug.
Yang Rong [Tue, 14 Oct 2014 03:48:20 +0000 (11:48 +0800)]
Fix memcpy and memset bug.

In ocl_memcpy.ll and ocl_memset.ll, index+4 should be less than size when use int in
memcpy and memset, and need consider alignment.

V3: For performance, provide two versions of memcpy and memset, decide call which one when lowering intrinsic.
V4: add these new functions in the bitcode link filter list.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoAdd the libdrm version check.
Junyan He [Mon, 13 Oct 2014 14:23:42 +0000 (22:23 +0800)]
Add the libdrm version check.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoRefine the shared function ID define.
Junyan He [Mon, 13 Oct 2014 13:54:22 +0000 (21:54 +0800)]
Refine the shared function ID define.

The old sfid define is obsolete and confusing.
Correct the names and delete the invalid ones.
The math shared function is no longer included in sfid,
and some modification for disasm to print the math
related infomation correctly.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix a HSW regression.
Yang Rong [Mon, 13 Oct 2014 06:48:06 +0000 (14:48 +0800)]
Fix a HSW regression.

HSW's JMPI instruction has 32bits JIP, can't merge JIP and UIP as other jump instruction.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoRe-apply "improve the build performance of vector type built-in function."
Ruiling Song [Fri, 10 Oct 2014 07:01:27 +0000 (15:01 +0800)]
Re-apply "improve the build performance of vector type built-in function."

This reverts commit c65c0087166a2194ece457d8739d06e86a857dbe.

As we have handled wide integers, we can enable it now.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: add legalize pass to handle wide integers
Ruiling Song [Sat, 11 Oct 2014 06:09:37 +0000 (14:09 +0800)]
GBE: add legalize pass to handle wide integers

This legalize pass will break wider integers like i128/i256/... into shorter ones.
The problem is how to choose the shorter type? From my observation,
wide integer type always comes from shorter ones through 'zext' on small type
or 'bitcast' on vectors, so we simply choose the type where it comes from.
Then we can split wide integer operations into operations on shorter interger.

v2:
  add an assert on the wide integer bit-width, should be power of 2.
  use rpo_iterator to make sure traverse Value def before its use.

v3:
  drop all references before erase processed instruction.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: Fix a bug when setting flag register
Ruiling Song [Fri, 10 Oct 2014 07:01:25 +0000 (15:01 +0800)]
GBE: Fix a bug when setting flag register

we should use simd1, instead of simd8/simd16.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoFix a segment fault.
Yang Rong [Thu, 9 Oct 2014 05:56:52 +0000 (13:56 +0800)]
Fix a segment fault.

llvm::CallInst::CallInst::getCalledFunction may return NULL, can't call getIntrinsicID
directly.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
9 years agoBDW: Add gen8 into intel_driver_init
Junyan He [Mon, 29 Sep 2014 05:37:44 +0000 (13:37 +0800)]
BDW: Add gen8 into intel_driver_init

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
9 years agoModify the bind sampler logic for gen8
Junyan He [Thu, 9 Oct 2014 08:27:44 +0000 (16:27 +0800)]
Modify the bind sampler logic for gen8

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
9 years agoAdd sampler state and tile define for gen8.
Junyan He [Thu, 9 Oct 2014 08:27:35 +0000 (16:27 +0800)]
Add sampler state and tile define for gen8.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
9 years agoBDW: Correct stack setting in BDW.
Yang Rong [Thu, 9 Oct 2014 06:07:56 +0000 (14:07 +0800)]
BDW: Correct stack setting in BDW.

Remove special fftid handle for HSW in Gen8Context, and change stack buffer address
to QWORD, for curbe. Because it only waste 4 bytes register in other platform, change
to QWORD for all platform.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Need not restore SLM setting in BDW.
Yang Rong [Thu, 9 Oct 2014 06:07:55 +0000 (14:07 +0800)]
BDW: Need not restore SLM setting in BDW.

Restore SLM setting may cause some test random fail, remove it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Fix a scaler int 32*32 bug.
Yang Rong [Thu, 9 Oct 2014 06:07:54 +0000 (14:07 +0800)]
BDW: Fix a scaler int 32*32 bug.

Seems BDW's scalar mul need QWROD dst, otherwise will touch the dst's follow register

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Correct BDW device name.
Yang Rong [Thu, 9 Oct 2014 06:07:53 +0000 (14:07 +0800)]
BDW: Correct BDW device name.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Forgot to set UIP of else in BDW.
Yang Rong [Thu, 9 Oct 2014 06:07:52 +0000 (14:07 +0800)]
BDW: Forgot to set UIP of else in BDW.

GEN_OPCODE_ELSE also need set UIP, add it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Correct scratch buffer of BDW.
Yang Rong [Mon, 29 Sep 2014 05:38:37 +0000 (13:38 +0800)]
BDW: Correct scratch buffer of BDW.

BDW's scratch buffer change to power 2 alignment from 1024.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add device's sub slice field, for cl_get_kernel_max_wg_sz.
Yang Rong [Mon, 29 Sep 2014 05:38:36 +0000 (13:38 +0800)]
BDW: Add device's sub slice field, for cl_get_kernel_max_wg_sz.

When SLM enable, get kernal max workgroup size should return the a sub slice's max thread * simdwidth.
So need the sub slice information.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add BDW Device id to gen binary generater and binary serialize in backend.
Yang Rong [Mon, 29 Sep 2014 05:38:35 +0000 (13:38 +0800)]
BDW: Add BDW Device id to gen binary generater and binary serialize in backend.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: BDW don't need add slm offset, remove it.
Yang Rong [Mon, 29 Sep 2014 05:38:34 +0000 (13:38 +0800)]
BDW: BDW don't need add slm offset, remove it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Refine BDW's int 32*32 multiply.
Yang Rong [Mon, 29 Sep 2014 05:38:33 +0000 (13:38 +0800)]
BDW: Refine BDW's int 32*32 multiply.

BDW support int32 * int32 directly. So add a flag to selection for it.
BDW use int32*int16 when use acc. Because int32*int16 also work in IVB,
change to int32*int16 when use acc.
Need refine int32*int32 to long later.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Fix unsample bug.
Yang Rong [Mon, 29 Sep 2014 05:38:32 +0000 (13:38 +0800)]
BDW: Fix unsample bug.

When set the hstride to 2, also need set vstride to 16.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: enable SLM in BDW.
Yang Rong [Mon, 29 Sep 2014 05:38:31 +0000 (13:38 +0800)]
BDW: enable SLM in BDW.

BDW's SLM control register change to L3CNTLREG, offset is 0x7034.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Fix Pointer argument curbe alloce size.
Yang Rong [Mon, 29 Sep 2014 05:38:30 +0000 (13:38 +0800)]
BDW: Fix Pointer argument curbe alloce size.

Because kernel will write 64bits address when reloc, so when reloc argument
in the curbe bo, the pointer need 8 byte curbe.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: add some BDW function.
Yang Rong [Mon, 29 Sep 2014 05:38:12 +0000 (13:38 +0800)]
BDW: add some BDW function.

Add intel_gpgpu_load_vfe_state_gen8, intel_gpgpu_walker_gen8, intel_gpgpu_build_idrt_gen8.
Reloc Dynamic State Base Address in gen7's intel_gpgpu_set_base_address, to unify intel_gpgpu_load_curbe_buffer
and intel_gpgpu_load_idrt.
Now can pass part of utest builtin_global_id.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Refine intel_gpgpu_setup_bti and add intel_gpgpu_set_base_address for BDW.
Yang Rong [Mon, 29 Sep 2014 05:38:11 +0000 (13:38 +0800)]
BDW: Refine intel_gpgpu_setup_bti and add intel_gpgpu_set_base_address for BDW.

Because the sizeof struct surface state change in BDW, remove gen6_surface_state, and
use gen_surface_state as the unoin of gen7_surface_state and gen8_surface_state.
Use gen_surface_state in surface_heap_t.
Reloc the Dynamic State Base and Instruction Address in intel_gpgpu_set_base_address_gen8.
BDW use 48 bits GPU address, so when reloc address, remember that kernel will reloc 64 bits in
command batch, so make sure there are 64 bits address, the high 64bits follow by low 32bits in command batch.

v2:
remove binary .swp file.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Pass Jip and Uip when patchJMPI.
Yang Rong [Mon, 29 Sep 2014 05:38:10 +0000 (13:38 +0800)]
BDW: Pass Jip and Uip when patchJMPI.

Do not like GEN7, BDW's Jip is in bits4 and Uip is in bits3, so should set Jip
and Uip independently.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add function intel_gpgpu_bind_buf for gen8.
Junyan He [Mon, 29 Sep 2014 05:37:49 +0000 (13:37 +0800)]
BDW: Add function intel_gpgpu_bind_buf for gen8.

Must call cl_bind_buf instead of intel_gpgpu_bind_buf directly in intel_gpgpu.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Correct surface base address set in setup bti.
Junyan He [Mon, 29 Sep 2014 05:37:48 +0000 (13:37 +0800)]
BDW: Correct surface base address set in setup bti.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add function intel_gpgpu_setup_bti for gen8.
Junyan He [Mon, 29 Sep 2014 05:37:47 +0000 (13:37 +0800)]
BDW: Add function intel_gpgpu_setup_bti for gen8.

Also set the correct surface cache control.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: refine the gen8_surface_state_t.
Junyan He [Mon, 29 Sep 2014 05:37:46 +0000 (13:37 +0800)]
BDW: refine the gen8_surface_state_t.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
9 years agoBDW: Add gen8 surface state struct.
Junyan He [Mon, 29 Sep 2014 05:37:45 +0000 (13:37 +0800)]
BDW: Add gen8 surface state struct.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
9 years agoBDW: Add class Gen8Context.
Yang Rong [Mon, 29 Sep 2014 05:37:19 +0000 (13:37 +0800)]
BDW: Add class Gen8Context.

Now Gen8Context is almost same as Gen75Context, but still derive Gen8Context from GenContext for clearly.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add Gen8Encoder and Gen7Encoder.
Yang Rong [Mon, 29 Sep 2014 05:37:18 +0000 (13:37 +0800)]
BDW: Add Gen8Encoder and Gen7Encoder.

Class Gen8Encoder and Gen7Encoder derive from GenEncoder, and Gen75Encoder derive from Gen7Encode.
GenNativeInstruction is handled in class GenEncoder, Gen7NativeInstruction is handled in class
Gen7Encoder and Gen75Encoder, and Gen8NativeInstruction is handled in classe Gen8Encoder.
Disable Gen8's instruction compact temporary, should add compact and disassemble later.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add BDW instruction define.
Yang Rong [Mon, 29 Sep 2014 05:37:17 +0000 (13:37 +0800)]
BDW: Add BDW instruction define.

Seperate GEN7 instruction and GEN8 instrunction. GenNativeInstruction will become a union of
Gen7NativeInstruction and Gen8NativeInstruction.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoBDW: Add BDW pci ids and BDW device struct.
Yang Rong [Mon, 29 Sep 2014 05:37:16 +0000 (13:37 +0800)]
BDW: Add BDW pci ids and BDW device struct.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoAvoid use GenNativeInstruction directly out of GenEncode and gen_insn_compact.
Yang Rong [Mon, 29 Sep 2014 05:37:15 +0000 (13:37 +0800)]
Avoid use GenNativeInstruction directly out of GenEncode and gen_insn_compact.

Use the void* instead of when do instruction compact/decompact.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agoGBE: structurized loop exit need an extra branching instruction when do reordering.
Zhigang Gong [Tue, 23 Sep 2014 06:15:46 +0000 (14:15 +0800)]
GBE: structurized loop exit need an extra branching instruction when do reordering.

When we want to reorder the BBs and move the unstructured BB out-of the
structured block, we need to add a BRA to the block. If the exit of the
structured block is a loop, we need to append a unconditional BRA right
after the predicated BRA. Otherwise, we may lost the correct successor
if an unstructured BB is moved next to this BB.

After this patch, with loop optimization enabled, there is no regression
on both utests and piglit. But there are still a few regressions in opencv
test suite:
[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (40041 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] OCL_Photo/FastNlMeansDenoising.Mat/2, where GetParam() = (Channels(2), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoising.Mat/3, where GetParam() = (Channels(2), true)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/0, where GetParam() = (Channels(3), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/1, where GetParam() = (Channels(3), true)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/2, where GetParam() = (Channels(4), false)
[  FAILED  ] OCL_Photo/FastNlMeansDenoisingColored.Mat/3, where GetParam() = (Channels(4), true)

So let's keep this optimizaion disabled. Will enable it when I fixed all
the known issues.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
9 years agoGBE: fix a loop header file including bug.
Zhigang Gong [Fri, 19 Sep 2014 01:00:11 +0000 (09:00 +0800)]
GBE: fix a loop header file including bug.

function.hpp doesn't need to include the structural_analysis.hpp.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Luo <xionghu.luo@intel.com>
9 years agoUse instruction WHILE to manipulate structure.
Luo Xionghu [Mon, 15 Sep 2014 00:23:39 +0000 (08:23 +0800)]
Use instruction WHILE to manipulate structure.

1. WHILE instruction should be non-schedulable.
2. if this WHILE instruction jumps to an ELSE instruction, the distance
need add 2.

v2:
We also need to take care of HSW for while instruction.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoadd handleSelfLoopNode to insert while instruction on Gen IR level.
Luo Xionghu [Mon, 15 Sep 2014 00:23:38 +0000 (08:23 +0800)]
add handleSelfLoopNode to insert while instruction on Gen IR level.

v2:
disable loop optimization by default due to still buggy.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoAdd Gen IR WHILE.
Luo Xionghu [Mon, 15 Sep 2014 00:23:37 +0000 (08:23 +0800)]
Add Gen IR WHILE.

Add Gen IR WHILE to mark the strucutred region.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.
Ruiling Song [Thu, 18 Sep 2014 06:42:01 +0000 (14:42 +0800)]
GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.

Gen provide tm0 register for intra-kernel profiling.
Here we provide an API __gen_ocl_get_timestamp() to return
the timestamp in TM.

The return type is defined as:
struct time_stamp {
  ulong tick;
  uint event;
};

'tick' is a 64bit time tick. 'event' stores a value which means
whether a tmEvent has occured (non-zero) or not (0). tmEvent includes
time-impacting event such as context switch or frequency change
since last time tm0 was read.

I add a sample in the kernels/compiler_time_stamp.cl. Hope it
would help you understand how to use it.

V2:
Introduce ir::ARFRegister to avoid directly use of nr/subnr in Gen IR.
Rename __gen_ocl_extract_reg to __gen_ocl_region.
Rename beignet_get_time_stamp to __gen_ocl_get_timestamp.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE/libocl: fix build dependency issue.
Zhigang Gong [Thu, 18 Sep 2014 00:33:46 +0000 (08:33 +0800)]
GBE/libocl: fix build dependency issue.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
9 years agoAdd long support for printf
Junyan He [Thu, 18 Sep 2014 04:39:15 +0000 (12:39 +0800)]
Add long support for printf

V2:
    Replace all the long and ulong to int64_t

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: Output linkModules's error message.
Ruiling Song [Wed, 17 Sep 2014 03:33:49 +0000 (11:33 +0800)]
GBE: Output linkModules's error message.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
9 years agofix utest memory leak.
Luo Xionghu [Tue, 16 Sep 2014 21:58:17 +0000 (05:58 +0800)]
fix utest memory leak.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agofix one bug at cl_get_kernel_workgroup_info.
Luo Xionghu [Tue, 16 Sep 2014 21:58:17 +0000 (05:58 +0800)]
fix one bug at cl_get_kernel_workgroup_info.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoRevert "improve the build performance of vector type built-in function."
Zhigang Gong [Wed, 17 Sep 2014 03:49:30 +0000 (11:49 +0800)]
Revert "improve the build performance of vector type built-in function."

This patch still has to be pending to fix the wide integer issue completely.
Although we have a fallback mechanism which will try to build the module again
by ignoring some passes to avoid the wide integer issue, it's broken now on
master branch. As now all the builtin functions have been built statically,
and those bitcode may already have i128/i512 etc.

This reverts commit 565d1eb00d9a5219c2848b3674e40ac07cb48b89.

9 years agoimprove the build performance of vector type built-in function.
Luo Xionghu [Tue, 16 Sep 2014 03:24:48 +0000 (11:24 +0800)]
improve the build performance of vector type built-in function.

this patch was lost during the libocl merge. resubmit it to improve the
vector function performance.

please refer to e2db890596eea0a6eb741e11e576a38952f1ed1e for detail.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoremove the LinkOnceAnyLinkage since the libocl is introduced.
Luo Xionghu [Tue, 16 Sep 2014 01:40:09 +0000 (09:40 +0800)]
remove the LinkOnceAnyLinkage since the libocl is introduced.

no need to set the LinkOnceAnyLinkage for global variables and functions
to avoid redefinition.

v2:
also enable the VerifierPass.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoFix the bug of LLVM_LFLAGS fail to set
Junyan He [Tue, 16 Sep 2014 03:12:10 +0000 (11:12 +0800)]
Fix the bug of LLVM_LFLAGS fail to set

The LLVM_LFLAGS is used before finding the LLVM package,
which causes the CMake fails to set correct -L flags and
cause linkage error.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE/libocl: fix a regression after libocl change.
Zhigang Gong [Fri, 12 Sep 2014 09:38:06 +0000 (17:38 +0800)]
GBE/libocl: fix a regression after libocl change.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
9 years agoGBE/libocl: add missing vector builtin definition for fma.
Zhigang Gong [Fri, 12 Sep 2014 09:18:16 +0000 (17:18 +0800)]
GBE/libocl: add missing vector builtin definition for fma.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
9 years agoModify the CMakeList to use the internal PCH first.
Junyan He [Mon, 15 Sep 2014 08:04:10 +0000 (16:04 +0800)]
Modify the CMakeList to use the internal PCH first.

Because we delete the validation of the PCH file, sometimes
the PCH in the system dir is not compatible with the clang
and cause crash.
Anytime, we need to use internal PCH when compiling.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoUpdate NEWS.
Zhigang Gong [Mon, 15 Sep 2014 08:13:37 +0000 (16:13 +0800)]
Update NEWS.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoRemove out-of-date document.
Zhigang Gong [Mon, 15 Sep 2014 06:45:24 +0000 (14:45 +0800)]
Remove out-of-date document.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoGBE/libocl: Fix sub_sat corner case.
Ruiling Song [Mon, 15 Sep 2014 03:14:05 +0000 (11:14 +0800)]
GBE/libocl: Fix sub_sat corner case.

It seems that hw return wrong result when y is equal to 0x80000000
in sub_sat(int x, int y). So we re-write it as:
add_sat(add_sat(0x7fffffff, x), 1)

Also enable corresponding utest.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agofix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.
Luo Xionghu [Sun, 14 Sep 2014 22:33:12 +0000 (06:33 +0800)]
fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.

work_group_size_hint should define another variable.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoUpdate readme.
Zhigang Gong [Mon, 15 Sep 2014 02:21:18 +0000 (10:21 +0800)]
Update readme.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
9 years agoEnable ICC and CLANG compiler for beignet
Lv Meng [Fri, 22 Aug 2014 08:26:37 +0000 (16:26 +0800)]
Enable ICC and CLANG compiler for beignet

the 'COMPILER' is to choose the detail compiler,the default is GCC.

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
9 years agoGBE: fix multiple files compilation bugs.
Zhigang Gong [Fri, 12 Sep 2014 05:45:40 +0000 (13:45 +0800)]
GBE: fix multiple files compilation bugs.

If we want to link multiple files together, and one kernel
function need refer other kernel functions in other files,
we must not set those functions as linked once attribute.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>