Kevron Rees [Mon, 9 Feb 2015 23:36:27 +0000 (15:36 -0800)]
Added requires on ocl-icd
Kevron Rees [Fri, 14 Nov 2014 18:09:54 +0000 (10:09 -0800)]
updated version to 1.0
Kevron Rees [Fri, 14 Nov 2014 17:40:25 +0000 (09:40 -0800)]
add smack manifest
Kevron Rees [Tue, 4 Nov 2014 17:34:39 +0000 (09:34 -0800)]
initial packaging
Zhigang Gong [Fri, 14 Nov 2014 01:58:08 +0000 (09:58 +0800)]
Bump to 1.0.0.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Fri, 14 Nov 2014 00:02:57 +0000 (08:02 +0800)]
GBE: work around error reporting for unresolved symbols
Beignet currently doesn't have a good error reporting mechanism
for internal passes. Currently, almost all error will cause an
user unfriendly assert. This is ok for most of the cases, but
it is really not good for unresolve symbols error, as it is not
a real compiler internal bug and we should not assert. We should
report it to help user to identify the error and fix that in the
cl kernel.
This patch is just a work around. We will implement a better
error handling in the future to consolidate all this type of
error into the normal error log buffer.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Thu, 13 Nov 2014 09:40:46 +0000 (17:40 +0800)]
Fix the bug of multi-thread crash
The cl_thread has a potential problem.
If the threads are created and destroyed very fast,
while the queue remain avaible, the resource of
destroyed thread will not be free correctly and will
be wrongly reused by later created thread.
V2:
Use a easy way to handle this case. We do not clear
the resource and just keep it. The later thread will
not wrongly reuse it. The thread number will not be
very huge, so it is reasonable to clear all the
resource when the command queue is destroyed.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 13 Nov 2014 06:17:23 +0000 (14:17 +0800)]
runtime: disable userptr due to random fail.
We get a random fail when enable userptr.
Disable it before we root cause and fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Zhigang Gong [Thu, 13 Nov 2014 05:39:20 +0000 (13:39 +0800)]
runtime: fix bug in cl_enqueue_read_buffer.
If the buffer is a userptr buffer, we should copy it directly.
Otherwise, it fails in libdrm. As drm_intel_gem_bo_subdata() refuses
to read a userptr buffer object.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Zhigang Gong [Thu, 13 Nov 2014 05:37:15 +0000 (13:37 +0800)]
runtime: refine version handling.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Wed, 12 Nov 2014 23:55:43 +0000 (07:55 +0800)]
Update documents.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Wed, 12 Nov 2014 06:09:35 +0000 (14:09 +0800)]
runtime: fix one bug in BDW image.
As we still have the image 1d array workaround, we need to
fix it for BDW as well.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Zhigang Gong [Wed, 12 Nov 2014 05:01:02 +0000 (13:01 +0800)]
update some documents.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Tue, 11 Nov 2014 10:09:58 +0000 (18:09 +0800)]
GBE: fix one double related bugs for post register scheduling.
We need to set the temporary register to U64 type, otherwise
latter post register scheduling will do some bad things.
Although we don't support double currently, this bug still could
be triggerred easily if you use printf("%f", foo).
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 12 Nov 2014 03:44:18 +0000 (11:44 +0800)]
Revert "BDW: Change the default tiling mode to TILING_Y on BDW."
This reverts commit
f2c57a46de4f51fa5d4c8e02cc751fce7ff417c8.
Zhigang Gong [Tue, 11 Nov 2014 05:55:37 +0000 (13:55 +0800)]
GBE: fix relocatable issue for pch file.
When we use pch file, we need to provide the orignal header files.
Otherwise, clang may complain errors. Clang provide a compile option
--relocatable-pch to enable pch/header files relocation. We have to
use this option when we want to install the header files/pch file
into system diretory.
v2:
exchange beignet.pch and beignet.local.pch.
v3:
fix comment.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Zhigang Gong [Wed, 5 Nov 2014 09:22:21 +0000 (17:22 +0800)]
License: adjust all license version to LGPL v2.1+.
To make the license statement consistent to each other, adjust
all license versions to v2.1+. Thus beignet should have a pure
LGPL v2.1+ license.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Wed, 5 Nov 2014 09:22:20 +0000 (17:22 +0800)]
utests: remove all shader toy test cases.
As we can't find the original license of these test cases, we
have to remove them from beignet's unit test cases.
Reported by "Rebecca N. Palmer" <rebecca_palmer@zoho.com>.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Ruiling Song [Tue, 11 Nov 2014 01:30:15 +0000 (09:30 +0800)]
docs: update mixed_buffer_pointer document.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 11 Nov 2014 02:12:14 +0000 (10:12 +0800)]
Revert "fix issue to create cl image from libva with non-zero offset"
We found this patch cause some serious regressions. Considering it is not
part of the OCL standard API, we choose to revert it for 1.0 release.
This reverts commit
b6660fa343e4e80231123695834cc24e3fc5487b.
Zhigang Gong [Tue, 11 Nov 2014 02:13:38 +0000 (10:13 +0800)]
Revert "add test for clCreateImageFromLibvaIntel"
This reverts commit
9e236b18542f2564e399bf13d4d1fbcc48a5ec9f.
Guo Yejun [Mon, 10 Nov 2014 08:02:37 +0000 (16:02 +0800)]
use posix_memalign instead of aligned_alloc to be more compatible
At some systems, function aligned_alloc is not supported.
From Linux Programmer's Manual:
The function aligned_alloc() was added to glibc in version 2.16.
The function posix_memalign() is available since glibc 2.1.91.
V2: add check for return value of posix_memalign
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 10 Nov 2014 08:00:25 +0000 (16:00 +0800)]
BDW: Change the default tiling mode to TILING_Y on BDW.
TILING_Y's performance is better than TILING_X'S on BDW, but almost same
on IVB/HSW. Using the TILING_Y as default tiling mode temporary, still need
to find out the root cause why different behavior between BDW and IVB/HSW.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Mon, 10 Nov 2014 05:49:19 +0000 (13:49 +0800)]
add test for clCreateImageFromLibvaIntel
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Wed, 24 Sep 2014 23:45:35 +0000 (07:45 +0800)]
fix issue to create cl image from libva with non-zero offset
Beignet accepts buffer object name to share data between libva,
it is supposed to support to create cl image from the bo name
with a non-zero offset, but it does not work at some platforms.
The driver calls intel_bo_gem_create_from_name to retrieve the
dri_bo, and the offset of dri_bo is changed by the non-zero offset.
At some platforms, the change of the offset has side effect when
the kernel is executed again and so intel_bo_gem_create_from_name
is called for the second time.
So, do not change the offset of dri_bo, but maintain the non-zero
offset in cl_image, and use the non-zero offset until we fill the
surface state.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 10 Nov 2014 06:58:40 +0000 (14:58 +0800)]
fix a bug in clCompileProgram().
passing a binary program to clCompileProgram() should return
CL_INVALID_OPERATION.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 10 Nov 2014 06:58:40 +0000 (14:58 +0800)]
fix piglit clCreateProgramWithBinary fail.
the program should be deserialized and loaded when created from a
EXECUTABLE binary.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 10 Nov 2014 03:31:34 +0000 (11:31 +0800)]
GBE: Do topological sorting of basicblocks.
Toplogical sorting have two big advantages:
1. Sorted basicblocks will reduce unneccesary register pressure.
2. Sorted basicblocks will make liveness analysis easier.
This patch fix opencv failures:
./opencv_test_video --gtest_filter=OCL_OCL_Video/Mog2_Update.Accuracy/1
./opencv_test_imgproc --gtest_filter=OCL_ImageProc/Filter2D.Mat/482
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 10 Nov 2014 03:31:21 +0000 (11:31 +0800)]
GBE: Fix a bitcast from float vector to wide interger issue in legalize pass.
When bitcast from <4 x float> to i128, we should not use extractelement directly.
Instead, we cast <4 x float> to <4 x i32>, then use extractelement to get
individual element.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Fri, 7 Nov 2014 08:21:05 +0000 (16:21 +0800)]
add test for cl buffer created with CL_MEM_USE_HOST_PTR
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Guo Yejun [Fri, 7 Nov 2014 08:19:46 +0000 (16:19 +0800)]
enable CL_DEVICE_HOST_UNIFIED_MEMORY when userptr is supported
userptr is firstly checked at compile time with libdrm version, but
it does not ensure the system has such capability (for exmaple, with
old linux kernel), so also take a check at run time for the device info.
V2: add runtime check to see if userptr is really supported
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Guo Yejun [Fri, 7 Nov 2014 08:18:54 +0000 (16:18 +0800)]
support CL_MEM_USE_HOST_PTR with userptr for cl buffer
userptr is used to wrap a memory pointer (page aligned) supplied
by user space into a buffer object accessed by GPU, and so no extra
copy is needed. It is supported starting from linux kernel 3.16
and libdrm 2.4.58.
This patch is originally finished by Zhenyu Wang <zhenyuw@linux.intel.com>,
I did a little change and some code clean.
No regression issue found on IVB+Ubuntu14.10 with libdrm upgraded with tests:
beignet/utests, piglit, OpenCV/test&perf, conformance/basic&mem_host_flags&buffers
V2: add page align limit for data size, add comments for kernel without MMU_NOTIFIER
V3: add runtime check with host_unified_memory, return CL_MEM_OBJECT_ALLOCATION_FAILURE if failed
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Yang Rong [Fri, 7 Nov 2014 07:02:38 +0000 (15:02 +0800)]
BDW: Set the URB/REST size to 384K/384K when SLM disable.
If application switch between SLM enable and disable, will cause random fail.
The fail occure only when URB/REST partition changed when enable and disable SLM.
Set the same REST size when disable SLM to workaround.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
Ruiling Song [Thu, 6 Nov 2014 07:44:49 +0000 (15:44 +0800)]
utests: add a test to trigger cl_float3 bug in clSetKernelArg.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 6 Nov 2014 07:44:48 +0000 (15:44 +0800)]
GBE: Fix kernel argument size for vector3
per OpenCL Spec 1.2, see 6.1.5 Alignment of Types.
For 3-component vector data types, the size of the data type
is 4 * sizeof(component).
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 3 Nov 2014 08:26:17 +0000 (16:26 +0800)]
GBE: remove useless debug info.
This debug information is not useful now. Remove them to
avoid confusing.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Rebecca Palmer [Wed, 5 Nov 2014 04:27:16 +0000 (12:27 +0800)]
utests: fix bugs in builtin_tgamma().
This patch is based on Rebecca's patch at:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090.
And fixed another bug which we should not use an absolute error checking.
We should use ULP and considering the strict conformance or non strict
conformance state.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Rebecca Palmer [Wed, 5 Nov 2014 04:24:20 +0000 (12:24 +0800)]
utests: fix bugs in builtin_pow().
This patch is based on Rebecca's patch at:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090.
And fixed another bug which we should not use an absolute error checking.
We should use ULP and considering the strict conformance or non strict
conformance state.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Rebecca Palmer [Wed, 5 Nov 2014 05:11:22 +0000 (13:11 +0800)]
GBE: fix bug in tgamma().
tgamma is actually lgamma, a related but very different function.
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Rebecca Palmer [Wed, 5 Nov 2014 05:08:31 +0000 (13:08 +0800)]
GBE: fix bug in erf()/erfc().
erf/erfc diverge (instead of converging to 1 or 0) for arguments above
about 2.
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Rebecca Palmer [Wed, 5 Nov 2014 05:30:26 +0000 (13:30 +0800)]
GBE: fix bug in pow()/pown().
pow/pown ignore the sign of their first argument (e.g. pow(-2,3) gives
8 instead of -8)
This patch is from:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=Fix-pow-erf-tgamma.patch;att=3;bug=768090
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 5 Nov 2014 04:14:40 +0000 (12:14 +0800)]
GBE: Support more instructions for constant expression handling.
Add support for the following OPs:
FCmp/ICmp/FPToSI/FPToUI/SIToFP/UIToFP.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 5 Nov 2014 07:36:23 +0000 (15:36 +0800)]
GBE: fix a bool handling bug when SEL on a uniform bool variable.
If a SEL uses a bool variable which is a uniform bool, even
we can get a dag node within the same BB, we still need to
set the externFlag bit. The reason is that we don't know how
to generate a scalar physical flag.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Chuanbo Weng [Mon, 3 Nov 2014 08:15:39 +0000 (16:15 +0800)]
utest: add new test that trigger an assignment operation bug in if.
This test case shows that assignment operation in if block seems
does not affect lvalue.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Meng Mengmeng [Wed, 5 Nov 2014 23:47:18 +0000 (07:47 +0800)]
add building dependency GIT_SHA1
Signed-off-by: Meng Mengmeng <mengmeng.meng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Wed, 5 Nov 2014 03:10:40 +0000 (11:10 +0800)]
fix bswap kernel function type issue.
use MACRO to define the corresponding function.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 31 Oct 2014 10:41:11 +0000 (18:41 +0800)]
GBE: fix disassembly bug.
Those math attributes are not valid on Gen7/7.5.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 31 Oct 2014 02:22:17 +0000 (10:22 +0800)]
utests: replace the nodistriutable picture.
According to https://bugs.debian.org/758442, we should not use
Len(n)a standard test image in our package. I just select a picture
took by myself. Thanks Rebecca for pointing this out.
v2:
forgot to add sample.bmp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 31 Oct 2014 02:04:50 +0000 (10:04 +0800)]
utest: change the box_blur_image to be identical to box_blur.
Change box_blur_image to read integer type surface thus
it could be totally identical to the box_blur thus they
can share the same reference image.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Luo Xionghu [Mon, 3 Nov 2014 22:42:36 +0000 (06:42 +0800)]
add utest function bswap.
this llvm instrincs bswap function is generated by calling
__builtin_bswap.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 3 Nov 2014 22:42:35 +0000 (06:42 +0800)]
enable llvm intrinsic call bswap function.
this intrinsic call is implemented at the GEN IR level currently,
should be optimazed later.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 4 Nov 2014 07:22:40 +0000 (15:22 +0800)]
GBE: Fix live range for temporary register in replaceReg
previously it is simply assigned as [insnID, insnID], But it is used in 2 instruction:
[1] MOV tmp, replacedReg
[2] send null, addr, tmp,...
As minID maxID is equal, it will be treated as temporary register during spill,
and no scratch memory allocated. But scratch register is allocated per instruction,
if tmp in [1] is assigned g100 from spill register pool, instruction [2] would have no idea of that,
it will assign another register like g102 from spill register pool. As no scratch memory allocated,
we cannnot do any spill/unspill operation between them.
To fix this issue, I change the liveness range according to isSrc, so spill/unspill
could work as normally.
This patch fix the cos16 test case failure in piglit under strict conformance.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 3 Nov 2014 07:52:25 +0000 (15:52 +0800)]
GBE: workaround register allocation fail caused by custom loop unroll.
As this issue only occurs under strict math, we disable custom
loop unroll if strict math is enabled.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Sun, 2 Nov 2014 21:22:04 +0000 (05:22 +0800)]
add utest for llvm intrinsic call usub_with_overflow funtion.
as llvm couldn't recognize the pattern of usub overflow, this usub
with is generated by calling the intrinsic function __builtin_usub_overflow;
also this type of uadd intrinsic funtion couldn't support short/byte type
overflow, we choose another way for the uadd kernel to generate short/byte
overflow.
will send patch to llvm later to fix the 2 issues.
v2: split the patch.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Sun, 2 Nov 2014 21:22:03 +0000 (05:22 +0800)]
enable llvm intrinsic call usub_with_overflow funtion.
implement usub_with_overflow without any optimization.
v2:
fix the wrong implementation:
change ctx.LT(dst0Type, overflow, dst0, src1)
to ctx.GT(unsignedType, overflow, dst0, src0)
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Andreas Beckmann [Fri, 31 Oct 2014 20:45:58 +0000 (21:45 +0100)]
use env to set environment variables for GBE_BIN_GENERATER
cmake interprets OCL_PCM_PATH=... as a command and will enclose it in
quotes in case it contains characters requiring protection, e.g. ~
a quoted "FOO=bar" is interpreted by /bin/sh as a command (that does not
exist), not a variable setting for a following command
use env to set the variables unambiguously
Signed-off-by: Andreas Beckmann <anbe@debian.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Andreas Beckmann [Fri, 31 Oct 2014 15:00:20 +0000 (16:00 +0100)]
fix some typos
Signed-off-by: Andreas Beckmann <anbe@debian.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhenyu Wang [Fri, 31 Oct 2014 07:11:59 +0000 (15:11 +0800)]
Remove intel_gpgpu_check_binded_buf_address()
On recent kernel with full PPGTT support, we can possibly bind buffer
offset with 0, but intel_gpgpu_check_binded_buf_address() always thinks
it's invalid, which is not true. So simply remove the check.
v2: Add comment on why this was added and leads to failure. Use FIXME tag
on issue that needs to be fixed otherwise.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 29 Oct 2014 08:31:48 +0000 (16:31 +0800)]
IVB/HSW/BYT: Revert the Dynamic state Base Addr and relative buffers address setting.
I have changed the setting of the curbe/sampler state/idrt buffer address, whith set
the Dynamic State Base Addr in intel_gpgpu_set_base_address and then set the relative
offset of these address when set these buffer address.
But it may cause some some image tests random fail in IVB/HSW/BYT.
Revert to the previous manner to set these address which always set absolute address.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 29 Oct 2014 07:37:38 +0000 (15:37 +0800)]
BDW: Fix bwd 32*32 scalar multiplication bug.
When scalar multiplication, must disable predicate and don't need specail handle.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: Zhu, BingbingX <bingbingx.zhu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 20 Oct 2014 07:46:17 +0000 (15:46 +0800)]
Fix a size assert when setup bti.
Global constant buffer size is not align to 4 byte, will cause assert in BDW when set bti.
Per spec, the low two bits of surface state's width must be 11 if SURFACE_BUFFER's format is RAW.
Align the global constant buffer size to 4.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 27 Oct 2014 08:16:51 +0000 (16:16 +0800)]
Refine the intel gpgpu delete.
The intel gpgpu struct is destroyed when a new gpgpu struct needed. But in that time,
the command batch relative with the destroyed gpgpu may have not finish, and the resource
in gpgpu still used by gpgpu, can't be destroyed.
So, when delete a gpgpu, check the batch status, if have not complete, insert to list in intel driver,
and delete all finished gpgpu in that list.
V2: V1 assume all the gpgpu list would complete in order, but it is not true. The insert order in is the
gpgpu delete order, not the flush command batch order. So visit and delete all complete gpgpu in the list.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Mon, 27 Oct 2014 07:34:11 +0000 (15:34 +0800)]
Fix the compare_image_2d_and_1d_array test case bug
The test case use OCL_MAP_BUFFER_GTT to map the image
buffers and then do the result comparison, which may
cause problems.
On IVB and HSW, the slice pitch is equal but on BDW,
because we change the slice pitch of image array, it
cause this bug.
Modify it by using the standard clEnqueueReadImage API.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 27 Oct 2014 03:14:50 +0000 (11:14 +0800)]
add utest compiler_overflow for llvm intrinsic function.
this case only runs for uadd_with_over_flow function so far.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Mon, 27 Oct 2014 03:14:49 +0000 (11:14 +0800)]
add llvm Intrinsic call support.
so far this patch only support uadd_with_overflow.
sadd_with_overflow, usub_with_overflow, smul_with_overflow, umul_with_overflow, bswap are not supported yet.
this funtion should be implemented by carrier flag later.
v2: update comments.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 24 Oct 2014 09:55:26 +0000 (17:55 +0800)]
GBE: fix a wrong type of cl_device_info.
Per OpenCL spec 1.2:
CL_DEVICE_IMAGE_MAX_BUFFER_SIZE should be size_t type rather
than cl_ulong.
This bug will cause problems on i386 platform.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Zhigang Gong [Mon, 27 Oct 2014 00:32:44 +0000 (08:32 +0800)]
GBE: set default address space to -1 to avoid incorrect unroll hint.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng Mengmeng <mengmeng.meng@intel.com>
Zhenyu Wang [Thu, 23 Oct 2014 07:19:26 +0000 (15:19 +0800)]
Fix AUX buffer for page alignment
Apply ALIGN() for aux buffer size from beginning has no effect.
Move to the end of all state offsets set for page alignment.
v2: Update comments
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Zhenyu Wang [Thu, 23 Oct 2014 07:19:25 +0000 (15:19 +0800)]
Use pread/pwrite for buffer enqueue read/write
Instead of mmap, use pread/write interface for bo read/write with
optimized operations.
Result on one mem bandwidth benchmark for buffer enqueue read/write on HSW.
**** Host to device copy - workgroup_size=512
old code with mmap new code with pread/pwrite
WG=512 SZ= 1 KiB 2325 MB/s 3336 MB/s
WG=512 SZ= 2 KiB 4479 MB/s 6270 MB/s
WG=512 SZ= 4 KiB 8215 MB/s 11808 MB/s
WG=512 SZ= 8 KiB 14271 MB/s 19493 MB/s
WG=512 SZ= 16 KiB 16456 MB/s 20079 MB/s
WG=512 SZ= 32 KiB 21136 MB/s 22334 MB/s
WG=512 SZ= 64 KiB 24785 MB/s 24792 MB/s
WG=512 SZ=128 KiB 24590 MB/s 24908 MB/s
WG=512 SZ=256 KiB 17928 MB/s 21435 MB/s
WG=512 SZ=512 KiB 18346 MB/s 20583 MB/s
WG=512 SZ= 1 MiB 18558 MB/s 20808 MB/s
WG=512 SZ= 2 MiB 18582 MB/s 20939 MB/s
WG=512 SZ= 4 MiB 15382 MB/s 18230 MB/s
WG=512 SZ= 8 MiB 7737 MB/s 11558 MB/s
WG=512 SZ= 16 MiB 7073 MB/s 8962 MB/s
WG=512 SZ= 32 MiB 6984 MB/s 8302 MB/s
WG=512 SZ= 64 MiB 6938 MB/s 8308 MB/s
WG=512 SZ=128 MiB 6950 MB/s 8271 MB/s
WG=512 SZ=256 MiB 6941 MB/s 8034 MB/s
**** Device to host copy - workgroup_size=512
old code with mmap new code with pread/pwrite
WG=512 SZ= 1 KiB 2165 MB/s 2596 MB/s
WG=512 SZ= 2 KiB 4453 MB/s 6154 MB/s
WG=512 SZ= 4 KiB 8211 MB/s 11198 MB/s
WG=512 SZ= 8 KiB 14001 MB/s 11319 MB/s
WG=512 SZ= 16 KiB 16218 MB/s 14394 MB/s
WG=512 SZ= 32 KiB 21310 MB/s 22438 MB/s
WG=512 SZ= 64 KiB 24738 MB/s 25237 MB/s
WG=512 SZ=128 KiB 22784 MB/s 25113 MB/s
WG=512 SZ=256 KiB 18018 MB/s 22265 MB/s
WG=512 SZ=512 KiB 18429 MB/s 21687 MB/s
WG=512 SZ= 1 MiB 18654 MB/s 21856 MB/s
WG=512 SZ= 2 MiB 18655 MB/s 21998 MB/s
WG=512 SZ= 4 MiB 15098 MB/s 17563 MB/s
WG=512 SZ= 8 MiB 7737 MB/s 11170 MB/s
WG=512 SZ= 16 MiB 6989 MB/s 8812 MB/s
WG=512 SZ= 32 MiB 6891 MB/s 7597 MB/s
WG=512 SZ= 64 MiB 6868 MB/s 7798 MB/s
WG=512 SZ=128 MiB 6838 MB/s 7630 MB/s
WG=512 SZ=256 MiB 6842 MB/s 7466 MB/s
The benchmark can be got from http://cgit.freedesktop.org/~zhen/MPBenchmarks/.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhenyu Wang [Thu, 23 Oct 2014 07:19:23 +0000 (15:19 +0800)]
Clear batch buffer pointer after unmap
Fix libdrm warning about unmap the unmapped buffer, as when
batch terminate we will try to do unmap again, but batch has
already been unmapped.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhenyu Wang [Thu, 23 Oct 2014 07:19:22 +0000 (15:19 +0800)]
Make use of write enable flag for mem bo map
Use drm/intel optimization for mem bo mapping in case of read or write.
So we could be possibly waiting less.
This also adds 'map_flags' check in clEnqueueMapBuffer/clEnqueueMapImage
for actual read or write mapping.
But currently leave clMapBufferIntel untouched which might break ABI/API.
v2: Fix write_map flag in clEnqueueMapBuffer/clEnqueueMapImage.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: "Guo, Yejun" <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 24 Oct 2014 01:55:34 +0000 (09:55 +0800)]
GBE: handle dead loop BBs in liveness analysis.
Considering the following CFG, our previous liveness analysis will only
back traverse from the exit point BB, thus the BB 6 and 7 will not be handled.
4---
| |
| |
5 --|---> 10 ---> ret
| |
| |
6<--
|
|<--
7 |
| |
---
Although the CFG looks not a normal application as once it goes to
block 6, it will enter a dead loop and will never return, we still need
to compile it successfully.
This patch is to fix the bug at:
https://bugs.freedesktop.org/show_bug.cgi?id=85362
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Meng Mengmeng [Fri, 24 Oct 2014 19:10:02 +0000 (03:10 +0800)]
create GIT_SHA1 without any dependency
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Thu, 23 Oct 2014 05:09:00 +0000 (13:09 +0800)]
Add the disasm support for Gen8
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 23 Oct 2014 06:22:38 +0000 (14:22 +0800)]
GBE: fix regression caused by simple block optimization.
Almost all 64bit related instructions and unaligned load
instruction should be complex instruction. We need to exclude
them from simple block.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Xionghu Luo <xionghu.luo@intel.com>
Meng Mengmeng [Thu, 23 Oct 2014 01:09:21 +0000 (09:09 +0800)]
add beignet GIT_HAL1 if there is .git directory
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 21 Oct 2014 13:03:46 +0000 (21:03 +0800)]
GBE: add basic PHINode support in legalize pass.
Lack of the incomplete PHINode support currently.
After this patch, we could fix all the crash cases in piglit.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Zhigang Gong [Tue, 21 Oct 2014 13:00:23 +0000 (21:00 +0800)]
GBE: increase maximum src/dst operands to 32.
As we may bitcast a <16 * i64> to/from <32 * i32> due to
the legalize pass, we have to increase the maximum operands
number to 32 and fix some assertions accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Zhigang Gong [Tue, 21 Oct 2014 08:04:28 +0000 (16:04 +0800)]
GBE: add Selection instruction handler at legalize pass.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Junyan He [Wed, 22 Oct 2014 07:51:06 +0000 (15:51 +0800)]
Add the test case for image 2d array fill
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Wed, 22 Oct 2014 07:50:59 +0000 (15:50 +0800)]
Add the test case for image 1d array fill
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 21 Oct 2014 13:02:27 +0000 (21:02 +0800)]
Fix the bug of 1D array slice pitch
For BDW, the vertical align is 4 at least.
This cause the slice pitch twice as big as
the Gen7 for 1D buffer array.
Because the buffer tiling alignment may change
for different GENs, we move it from run time to
intel driver.
V2:
Fix all the bugs about 1d and 2d image array.
And delete the tile align size which is useless.
Also integrate two image array test cases into
this patch set.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Fri, 17 Oct 2014 03:33:23 +0000 (11:33 +0800)]
use global flag 0.0 to control unstructured simple block.
filter the simple block out and replace the if/endif with global flag
to control.
v2: fix the luxmark sala performance degression due to extern flag in a
BRA instruction.
v3: fix compiler_switch regression, LOAD/STORE instruction could
call replaceSrc/replaceDst to generate 2 extra MOV instruction; exclude
the scalar instructions since they don't have prediction.
this patch is somewhat dangerous to change the instruction structure of block,
will add sanity check after emitInstructionStream to assert if illegally
modified.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Mon, 20 Oct 2014 01:32:13 +0000 (09:32 +0800)]
GBE: disable custom loop unroll for LLVM 3.3/3.4.
To fix a build error with LLVM 3.3/3.4.
v2:
should include llvm-config before check llvm version.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 8 Oct 2014 04:58:59 +0000 (12:58 +0800)]
GBE: Add a customized loop unrolling handling mechanism.
By default, the unrolling threshold is relatively small.
Thus some relative large loops which access private array
will not be unrolled, thus those private array can't
be scalarized latter. And the private array is allocated
in stack which is extreme slow for Gen backend currently.
To increase the unrolling threshold for all loops is not
a good idea, as most of the loops don't need to do unrolling
for this purpose and a large unrolling threshold will cause
a big code size and unecessary big register pressure which
may lead to register spilling.
So this patch introduce a trade-off pass to identify those
loops which still have private load/store in the outer most
of the loop. Then add a metadata to it to indicate aggresive
unrolling on those loops. Then do another round loop unrolling.
This patch with the previous small patch, can bring significant
performance improvement for some cases. I just tested with some
opencv test cases, and observed it can bring 2x to 10x improvement.
v2:
refine the parent loop unroll analysis method.
v3:
disable this pass for LLVM 3.3/3.4.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 17 Oct 2014 04:10:15 +0000 (12:10 +0800)]
GBE: fix a legalize pass bug when bitcast wide integer to incompaitble vector.
Our wide integer legalize pass will assume the source type of a wide
integer must be the same as the final use type. But this is not always
true. The following case is a real example:
%conv.i.i.14 = sext i8 %usrc0.i.sroa.0.14.extract.trunc to i32
%call.i.i2.14 = tail call i32 @__gen_ocl_abs(i32 %conv.i.i.14) #5
%conv1.i.i.14.mask = and i32 %call.i.i2.14, 255
%uret.i.sroa.0.14.insert.ext = zext i32 %conv1.i.i.14.mask to i128
%uret.i.sroa.0.14.insert.shift = shl nuw nsw i128 %uret.i.sroa.0.14.insert.ext, 112
......
%uret.i.sroa.0.15.insert.mask = or i128 %uret.i.sroa.0.14.insert.mask.masked, %uret.i.sroa.0.14.insert.shift
%uret.i.sroa.0.15.insert.insert = or i128 %uret.i.sroa.0.15.insert.mask, %uret.i.sroa.0.15.insert.shift
%2 = bitcast i128 %uret.i.sroa.0.15.insert.insert to <16 x i8>
The wide integer i128 %uret.i.sroa.0.16.insert.insert is from an
i32 integer %conv1.i.i.14.mask. But the use of it is i8 vector
which breaks our assumption.
According to ruiling's good suggestion, we always bitcast the wide integer to
a compatible vector, take the above example:
%3 = bitcast i128 %uret.i.sroa.0.15.insert.insert to <4 x i32>
Then insert a bit cast instruction to convert it to the original destination
%2 = bitcast <4 x i32> %3 to <16 x i8>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Fri, 17 Oct 2014 03:04:16 +0000 (11:04 +0800)]
Refine the the error handling in function cl_command_queue_ND_range_gen7.
Return error code when work group error, and remove printf and exit(-1) when internal error..
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 16 Oct 2014 03:07:37 +0000 (11:07 +0800)]
Fix a HSW regression.
slmoffset use as DWORD register, but declare WORD, fix it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 30 Sep 2014 04:16:34 +0000 (12:16 +0800)]
GBE: do intrinsics lowering pass earlier.
After the intrinsics lowering pass, it may introduce some new loops
which could be optimized nicely with the previous loop unrolling and
SROA PASS which especially may elminiate private data access.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Thu, 16 Oct 2014 08:51:12 +0000 (16:51 +0800)]
Fix a upsample regression.
In GenEncoder, unpack is not good, so move the upsample int/short from GenEncoder to instruction selection.
Still handle upsample long in GenContext.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 16 Oct 2014 07:11:02 +0000 (15:11 +0800)]
BDW: Also need set Shader Channel Select for constant buffer in BDW.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 16 Oct 2014 07:10:35 +0000 (15:10 +0800)]
BDW: Fix load/store half error.
BDW support HF data type, so use mov directly to convert between Half Float/Float.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 15 Oct 2014 08:26:14 +0000 (16:26 +0800)]
BDW: Change BDW's max work group size to 512.
Opencv only query and use device max work group size, when SLM/Barrier enable, BDW
can't fill 1024 work group in one subslice, even in SIMD16. Change device's max work
group size temp.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 15 Oct 2014 06:39:44 +0000 (14:39 +0800)]
Fix a HSW constant buffer regression.
HSW's constant buffer read use ld message, also need set Shader Channel Select in surface state.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Zhigang Gong [Wed, 15 Oct 2014 05:18:12 +0000 (13:18 +0800)]
GBE: fix a bug in legalize pass.
The Shr/Shl llvm instructions require to have identical type for
the two operands, so we can't set the shift count to i32 always.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Zhu, BingbingX <bingbingx.zhu@intel.com>
Junyan He [Tue, 14 Oct 2014 07:52:04 +0000 (15:52 +0800)]
Fit the printf bug in loop
The static analyse for printf can not totally work
when the printf inst is within the loop and the loop
can not be unrolled. This causes the printf just to
print one info for a loop and to lose all the others.
We now increment the exec number every time the printf
inst is triggered. The number is stored for output all
the message later.
The problem is that we can not caculate the exact loops
number for each printf inst. The wrong loop number will
cause the data overwritten. We now assume all the printf
inst are in loop and store the data like this:
| PRINTF1_DATA PRINTF2_DATA ... | PRINTF1_DATA PRINTF2_DATA ... | ...
| DATA_LOOP_ONE | DATA_LOOP_TWO | ...
Although this may cause some space wasted.
Another problem is that we need to decide the size of printf buffer
because the loop upbound can not be caculated. We just set
it yo 1M for small info slot request and 4M for big one.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 14 Oct 2014 07:18:39 +0000 (15:18 +0800)]
Fix HSW thread_n <= 64 assert.
In function cl_get_kernel_max_wg_sz, hsw's thread count may large than 64,
add a max limit.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo Xionghu [Tue, 14 Oct 2014 00:08:46 +0000 (08:08 +0800)]
add utest popcount for all types.
v2: add all types to test.
v3: fix signed type count bits error.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Fri, 10 Oct 2014 03:05:04 +0000 (11:05 +0800)]
add opencl-1.2 builtin function popcount.
the popcount function returns the number of non-zero bits in input.
use GEN instruction cbit(Count Bits Set) to implement it.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 14 Oct 2014 05:58:02 +0000 (13:58 +0800)]
Let the failure of intel_drm lib's check as a FATAL_ERROR
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>