Yang Rong [Mon, 28 Oct 2013 06:02:17 +0000 (14:02 +0800)]
Rebuild the program when the build options change.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:16 +0000 (14:02 +0800)]
Remove CL_FP_DENORM in clGetDeviceInfo.
IVB doesn't support single-precision float denorms, so the compiler option -cl-denorms-are-zero should be ignored.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:15 +0000 (14:02 +0800)]
Add preprocessor #defines that match the extension name strings.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Yi Sun [Thu, 24 Oct 2013 07:56:58 +0000 (15:56 +0800)]
utest: add test case for builtin function exp/exp2/exp10/expm1.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Yangwei Shui <yangweix.shui@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Mon, 9 Sep 2013 08:55:47 +0000 (16:55 +0800)]
utest: Add test case for built-in function pow.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 08:22:09 +0000 (16:22 +0800)]
support saturated-rounding converting
passed piglit test case:
piglit/bin/cl-program-tester tests/cl/program/execute/vector-conversion.cl
version 2:
includes an update to ocl_convert.h
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 18 Oct 2013 05:47:54 +0000 (13:47 +0800)]
support converting with rounding mode
support built-in functions of converting with rounding mode,
such as "convert_DSTTYPE_rtz, rte, rtp, rtn".
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
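For illustration only, a minimal C sketch of what the four rounding suffixes mean when converting a float to an int. The enum and function names are hypothetical and are not taken from the patch; the sketch assumes a finite input whose rounded value fits in an int.

```c
/* Hypothetical names for the four OpenCL rounding-mode suffixes. */
enum round_mode { RTZ, RTE, RTP, RTN };

/* Convert x to int with the given rounding mode.
 * Assumption: x is finite and its rounded value fits in an int. */
int convert_int_rounded(float x, enum round_mode mode)
{
    int t = (int)x;                      /* a C cast truncates toward zero */
    int lo = ((float)t > x) ? t - 1 : t; /* floor(x) */
    int hi = ((float)t < x) ? t + 1 : t; /* ceil(x)  */
    float frac;

    switch (mode) {
    case RTZ: return t;                  /* round toward zero */
    case RTP: return hi;                 /* round toward +infinity */
    case RTN: return lo;                 /* round toward -infinity */
    case RTE:                            /* round to nearest, ties to even */
        frac = x - (float)lo;            /* in [0, 1) */
        if (frac > 0.5f) return hi;
        if (frac < 0.5f) return lo;
        return (lo % 2 == 0) ? lo : hi;  /* tie: pick the even neighbour */
    }
    return t;
}
```

With this sketch, 2.5f rounds to 2 and 3.5f rounds to 4 under RTE, while -2.7f gives -2 under RTZ and -3 under RTN.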
Homer Hsing [Mon, 28 Oct 2013 02:45:07 +0000 (10:45 +0800)]
initialize GenRegister::subphysical
GenRegister::subphysical should have the same initial value as GenRegister::physical
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 24 Oct 2013 02:44:41 +0000 (10:44 +0800)]
add scalar type builtin function "dot"
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Homer Hsing [Mon, 21 Oct 2013 01:12:42 +0000 (09:12 +0800)]
implement __builtin_* functions
backend does not support __builtin_* functions,
so they are implemented in ocl_stdlib.tmpl.h
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 22 Oct 2013 07:06:32 +0000 (15:06 +0800)]
Bump to version 0.3.
Also update some documents accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 08:30:47 +0000 (16:30 +0800)]
add a semaphore for clang lib
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Mon, 21 Oct 2013 11:12:56 +0000 (19:12 +0800)]
Disable instruction scheduling temporarily.
Enabling scheduling causes failures. It will be re-enabled after these failures are fixed.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Chuanbo Weng [Tue, 22 Oct 2013 09:11:56 +0000 (17:11 +0800)]
Fix two memory leaks.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Ruiling Song [Tue, 22 Oct 2013 04:02:56 +0000 (12:02 +0800)]
GBE: Fix a bo->offset assert
scratchSize was missing from the binary, which caused a random value
when a kernel was loaded from binary. Add it to the binary format.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Yang Rong [Mon, 21 Oct 2013 11:12:55 +0000 (19:12 +0800)]
Fix zeroinitializer load/store vector assert.
After moving the newValueProxy of vector load/store to the genWriter pass, genWriter
gets a ConstantAggregateZero for a vector load/store with zeroinitializer. The function processConstant
doesn't handle the correct types of ConstantAggregateZero, which causes an assert. Add handling for these types.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Yang Rong [Mon, 21 Oct 2013 07:20:41 +0000 (15:20 +0800)]
Add a test for vector argument deallocate assert.
V2: Add result compare.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:08 +0000 (13:50 +0800)]
Change -O3 to -O2 again because of a typo in my previous change.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:07 +0000 (13:50 +0800)]
Refine vector register deallocation.
Split the vector register block so that the registers can be freed separately.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:06 +0000 (13:50 +0800)]
Fix a vector argument deallocation assert.
Vector arguments are allocated together but deallocated separately, which triggers
an assert on deallocation. Split each allocatedBlock in the register partitioner to fix it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Yang Rong [Mon, 21 Oct 2013 07:47:56 +0000 (15:47 +0800)]
Add more types to the async copy test case.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 07:26:02 +0000 (15:26 +0800)]
use int64_t to express "long" in a test case
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Mon, 21 Oct 2013 07:48:13 +0000 (15:48 +0800)]
runtime: Simply return success in clUnloadCompiler.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 18 Oct 2013 07:11:29 +0000 (15:11 +0800)]
GBE: Handle all-zero constant.
Also refine Undef value support.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 05:38:59 +0000 (13:38 +0800)]
support vectorized saturated converting builtin functions
version 2: skip convert_float_sat(*)
version 3:
scalar converting is moved from "ocl_stdlib.tmpl.h" to "gen_convert.sh",
because scalar converting should be before vectorized version.
"ocl_convert.h" is updated.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 02:26:56 +0000 (10:26 +0800)]
support saturated converting from narrower type to wider type
This patch supports saturated converting from narrower type to wider type.
It simply returns the parameter.
version 2: convert_float_sat(*) is not needed
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 02:26:55 +0000 (10:26 +0800)]
support saturated converting from 64-bit int
This patch supports saturated converting from 64-bit int to shorter int,
and from 32-bit float to 64-bit int.
This patch also contains test case.
version 2: ulong is already declared on some platforms
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 18 Oct 2013 05:29:12 +0000 (13:29 +0800)]
Runtime: correct some image related maximum values for IVB.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Thu, 17 Oct 2013 05:36:56 +0000 (13:36 +0800)]
Add test case for newValueProxy of InsertElementInst.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 17 Oct 2013 05:36:55 +0000 (13:36 +0800)]
Move newValueProxy from the scalarize pass to the genWriter pass.
If newValueProxy is called in the scalarize pass, the realValue may be deleted by
a following pass, causing an assert. Moving it to the genWriter pass fixes this bug and
makes the code cleaner.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Thu, 17 Oct 2013 05:11:05 +0000 (13:11 +0800)]
add clCreateImageFromLibvaIntel() api
We can pass in libva's buffer object with other info and then create an image
in our CL code.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Thu, 17 Oct 2013 05:11:01 +0000 (13:11 +0800)]
add clCreateBufferFromLibvaIntel() api
We can pass in libva's buffer object name and then create the cl buffer from
it, thus we can share the buffer between libva and our opencl.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 18 Oct 2013 02:19:57 +0000 (10:19 +0800)]
Implement the CL api for clGetEventProfilingInfo
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 18 Oct 2013 02:19:51 +0000 (10:19 +0800)]
Use PIPE_CONTROL to implement getting time stamps in the gen backend
We use PIPE_CONTROL to get the time stamps from the GPU just after the batch
starts and just before the batch flush. The first one is used to calculate the
CL_PROFILING_COMMAND_START time and the second one to calculate
the CL_PROFILING_COMMAND_END time.
There are 2 limitations here:
1. The end time stamp is taken just before the flush, so the flush time
is not included, which loses some accuracy. Because
we do not know, when an event is created, whether it will be used for
profiling, adding another flush for the end time stamp may
add some overhead.
2. The CPU and GPU times cannot be synchronized correctly now. So the
CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT times,
which are taken on the CPU side, cannot be calculated correctly against the
same time base as the GPU. So we simply set them to
CL_PROFILING_COMMAND_START for now. For events not involving GPU
operations, such as ReadBuffer, all the times are 0 for now.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Ruiling Song [Wed, 16 Oct 2013 07:38:08 +0000 (15:38 +0800)]
utests: add test cases for function call.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Wed, 16 Oct 2013 07:38:07 +0000 (15:38 +0800)]
GBE: Skip non-kernel functions in backend passes.
As non-kernel functions hit many asserts in the backend, simply
skip them, since we already inline all function calls.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Wed, 16 Oct 2013 07:38:06 +0000 (15:38 +0800)]
GBE: Inline all function calls.
use an extra large value for llvm flag -inline-threshold to inline all functions.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Tue, 15 Oct 2013 10:39:54 +0000 (18:39 +0800)]
Add type long/ulong/double's async copy.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 15 Oct 2013 10:36:03 +0000 (18:36 +0800)]
Fix a read64/write64 schedule bug.
Set the correct data type for read64/write64; otherwise, the dependency will be wrong.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 11 Oct 2013 02:43:41 +0000 (10:43 +0800)]
Delete the redundant intel_batchbuffer_t init in intel_gpgpu_new
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 10 Oct 2013 07:13:51 +0000 (15:13 +0800)]
GBE: Update program binary format.
1. Remove useless 'reg' field of constant.
2. Add slmSize for local variables defined in kernel function.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 10 Oct 2013 07:13:50 +0000 (15:13 +0800)]
GBE: Support local variable inside kernel function.
As Clang treats local variables in a similar way to global constants
(they are treated as global variables in their own address space),
we refine the previous constant implementation in order to
share the same code between local variables and global constants.
We will allocate an address register for each GlobalVariable
(constant or local) by calling newRegister().
In a later step, through getRegister(), we will get a proper
register derived from the allocated address register.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Tue, 24 Sep 2013 02:10:46 +0000 (10:10 +0800)]
support LLVM 3.4
LLVM 3.3 and earlier versions don't support unary increment of vectors,
such as "++ int2". This patch supports LLVM 3.4.
Tested by PIGLIT, no regression.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 9 Oct 2013 07:55:43 +0000 (15:55 +0800)]
Add the test case for clEnqueueCopyBuffer
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:47 +0000 (12:28 +0800)]
Implement the clEnqueueCopyBuffer API using internal binary kernel
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:41 +0000 (12:28 +0800)]
Add the internally used kernels for buffer copy
Add internally used kernels for buffer copy. The alignments
1, 4 and 16 are separated into three kernels to improve
performance. The CMakeList is also updated.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 9 Oct 2013 07:55:23 +0000 (15:55 +0800)]
Add the string format support for gbe_bin_generater
The string format of kernel serialization will be useful for
generating the code that loads internal kernel binaries.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Thu, 10 Oct 2013 03:44:27 +0000 (11:44 +0800)]
Implement api clCreateKernelsInProgram.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 10 Oct 2013 03:15:56 +0000 (11:15 +0800)]
utests: put compiler_vector_inc into known issue list.
because ++/-- needs LLVM 3.4
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 10 Oct 2013 02:13:41 +0000 (10:13 +0800)]
saturated conversion of native GPU data type, larger to narrower
This patch supports saturated conversion of
native GPU data type (char/short/int/float),
from a larger-range data type to a narrower-range data type.
For instance, convert_uchar_sat(int)
Several test cases are in this patch.
v2: add uint->int, int->uint
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
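For illustration only, a minimal C model of the saturated narrowing semantics, using the convert_uchar_sat(int) case named above plus the uint->int and int->uint cases added in v2. The function names are hypothetical; this models the expected results, not the backend implementation.

```c
#include <stdint.h>

/* Model of convert_uchar_sat(int): clamp to [0, 255], then narrow. */
uint8_t convert_uchar_sat_int(int32_t x)
{
    if (x < 0) return 0;
    if (x > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)x;
}

/* Model of convert_int_sat(uint): only the upper bound can be exceeded. */
int32_t convert_int_sat_uint(uint32_t x)
{
    return (x > (uint32_t)INT32_MAX) ? INT32_MAX : (int32_t)x;
}

/* Model of convert_uint_sat(int): negative values clamp to 0. */
uint32_t convert_uint_sat_int(int32_t x)
{
    return (x < 0) ? 0u : (uint32_t)x;
}
```

In this model an input of 300 saturates to 255 and -5 saturates to 0, where a plain cast would wrap around instead.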
Homer Hsing [Wed, 9 Oct 2013 08:14:48 +0000 (16:14 +0800)]
fix isnan (builtin function)
this patch passes following piglit test case
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/relational/builtin-float-isnan-1.0.generated.cl
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Wed, 9 Oct 2013 08:44:47 +0000 (16:44 +0800)]
Add definitions of the preprocessor macros __IMAGE_SUPPORT__ and __FAST_RELAXED_MATH__.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 9 Oct 2013 06:36:27 +0000 (14:36 +0800)]
Remove blocking asserts in clEnqueueXXX apis.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 9 Oct 2013 06:36:26 +0000 (14:36 +0800)]
Change the optimization level to -O2 to avoid the loop-unswitch optimization.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 9 Oct 2013 06:34:41 +0000 (14:34 +0800)]
GBE: sampler_t should always be a const int.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Simon Richter [Sat, 28 Sep 2013 04:39:50 +0000 (06:39 +0200)]
ICD dispatch table must be first
The ICD loader expects the first member of any dispatchable object to be
the dispatch table.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
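For illustration only, a minimal C sketch of why the dispatch table has to be the first member. The struct and function names are hypothetical stand-ins, not the real ICD or Beignet types; the point is that a loader which knows nothing about the object layout can still read a pointer stored at offset 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dispatch table (a table of function pointers). */
struct icd_dispatch {
    int (*get_info)(void *object);
};

/* Hypothetical dispatchable object: the dispatch pointer comes first. */
struct cl_object_sketch {
    struct icd_dispatch *dispatch;  /* must stay the first member */
    int ref_count;
};

/* What a loader can do without knowing the full layout: treat the
 * object pointer as a pointer to the dispatch pointer. C guarantees
 * that a struct pointer converts to a pointer to its first member. */
struct icd_dispatch *loader_get_dispatch(void *object)
{
    return *(struct icd_dispatch **)object;
}

/* Hypothetical entry point reached through the table. */
int sample_get_info(void *object)
{
    (void)object;
    return 7;
}
```

If any other member were placed before `dispatch`, `loader_get_dispatch` would read unrelated data as the table pointer.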
Zhigang Gong [Wed, 25 Sep 2013 09:00:16 +0000 (17:00 +0800)]
GBE: Refine the curbe entry allocation for sampler/image information.
After the previous patch, we can move the image information curbe
entry allocation to before the instruction selection.
Then we can concentrate all curbe allocation before we do the
normal register allocation. This brings two advantages:
1. It avoids allocating the image information curbe entries among the normal registers.
2. The register interval analysis can handle the image/sampler information correctly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 25 Sep 2013 03:20:45 +0000 (11:20 +0800)]
GBE: refactor the curbe register payload allocation.
As we already handle all the used curbe registers when we build
the patchlist, we can easily create a set to store all the required
curbe registers, and then later, at the register allocation stage, we
can simply insert the registers in that set without any other checking.
This way, at the register allocation stage, we don't need to know anything
about those CURBE magic numbers. We only need to use the virtual register
as the key. This makes the code a little clearer.
Most importantly, this change supports dynamic curbe register
allocation, for example for the image attributes. Each image may have 6 DWs,
and we may have many images but access only some of the images and some
of the image attributes. So we can't simply allocate a special
register for all the image attributes. We need to dynamically allocate
curbe registers on demand. The previous implementation does not
satisfy this requirement, so I have to make this change.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 24 Sep 2013 11:34:44 +0000 (19:34 +0800)]
GBE: Fix the out-of-box checking for normalized coord clamping.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 25 Sep 2013 10:26:49 +0000 (18:26 +0800)]
GBE/Runtime: implement workaround for IVB sampler bug
Per the IVB spec,
if the surface format of the associated surface is UINT or SINT,
the Surface Type cannot be SURFTYPE_3D or SURFTYPE_CUBE and the Address
Control Mode cannot be CLAMP_BORDER or HALF_BORDER.
Besides this bug, there is another undocumented issue. If the surface
data type is IEEE float, then when we use the sampler to sample a pixel
and the value is between -1p-20 and 0, the sampler rounds it to
zero. This also causes problems when we are using the clamp mode.
This patch works around the above two hardware issues.
It introduces a new intrinsic, get_sampler_info, to get the sampler type
at runtime. When read_image is called, it checks whether it
hits one of the above two cases. If it hits case 1, we force it to
use clamp to edge for the pixels within the box, and for the
pixels outside the box we manually set the border color. To achieve this,
we have to prepare two sampler slots for each CL_ADDRESS_CLAMP
sampler. The first uses slot_1, which uses CL_ADDRESS_CLAMP, and
the second uses slot_1 + 8. Thus we can only use half of the 16 samplers.
Fortunately, 8 samplers comply with OpenCL's minimum requirement.
If it hits case 2, we subtract an epsilon from the coordinate so that
it does not round to zero.
If possible, programmers should avoid using float coordinates and/or int/uint
format images. Otherwise, they will hit the very slow path.
With this workaround, compiler_copy_image1 passes now.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
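For illustration only, a simplified C model of the case 1 workaround described above: pixels inside the box are fetched with clamp-to-edge addressing, and pixels outside the box get the border color assigned manually. The image type, the function names, and the use of unnormalized integer coordinates on a single-channel image are all assumptions made to keep the sketch short; the real kernel-side code differs.

```c
#include <stdint.h>

/* Hypothetical single-channel integer image, stored row-major. */
struct image2d {
    int width, height;
    const int32_t *texels;
};

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Stand-in for a sampler fetch with clamp-to-edge addressing. */
int32_t sample_clamp_to_edge(const struct image2d *img, int x, int y)
{
    x = clamp_int(x, 0, img->width - 1);
    y = clamp_int(y, 0, img->height - 1);
    return img->texels[y * img->width + x];
}

/* Emulated clamp-to-border: inside the box use the clamp-to-edge
 * fetch, outside the box return the border color directly. */
int32_t sample_clamp_to_border(const struct image2d *img, int x, int y,
                               int32_t border)
{
    int inside = x >= 0 && x < img->width && y >= 0 && y < img->height;
    return inside ? sample_clamp_to_edge(img, x, y) : border;
}
```

The extra bounds test on every read is one reason the commit message calls this the slow path.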
Zhigang Gong [Wed, 25 Sep 2013 09:44:47 +0000 (17:44 +0800)]
clCopyImage: fix up all the surface type to int type.
Per the OpenCL spec, using read_imagei on a float image may cause
undefined behaviour. We fix up all types to the int type.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Mon, 23 Sep 2013 01:11:58 +0000 (09:11 +0800)]
support 64-bit division and remainder
support both unsigned and signed type,
and division ("/") and remainder ("%") arithmetic
tested by piglit
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
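For illustration only, a C sketch of 64-bit division and remainder built from shifts and subtractions. This is the textbook restoring-division technique, not necessarily the algorithm the patch uses; the function names are hypothetical, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Unsigned 64-bit quotient and remainder by shift-and-subtract.
 * Assumption: d != 0. */
uint64_t udivrem64(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        uint64_t carry = r >> 63;          /* bit about to be shifted out */
        r = (r << 1) | ((n >> i) & 1);     /* bring down the next bit */
        if (carry || r >= d) {             /* partial remainder >= d */
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    *rem = r;
    return q;
}

/* Signed variant: divide magnitudes, then fix the signs. The quotient
 * truncates toward zero and the remainder takes the dividend's sign.
 * INT64_MIN / -1 is out of range and not handled by this sketch. */
int64_t sdivrem64(int64_t n, int64_t d, int64_t *rem)
{
    uint64_t un = (n < 0) ? 0 - (uint64_t)n : (uint64_t)n;
    uint64_t ud = (d < 0) ? 0 - (uint64_t)d : (uint64_t)d;
    uint64_t ur;
    uint64_t uq = udivrem64(un, ud, &ur);
    *rem = (int64_t)((n < 0) ? 0 - ur : ur);
    return (int64_t)(((n < 0) != (d < 0)) ? 0 - uq : uq);
}
```

For example, -7 divided by 2 gives quotient -3 and remainder -1 in this model, matching C's "/" and "%".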
Homer Hsing [Sun, 22 Sep 2013 06:18:03 +0000 (14:18 +0800)]
add 64-bit version of "sub_sat"
passed PIGLIT test cases:
bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-sub_sat-1.0.generated.cl
bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-sub_sat-1.0.generated.cl
version 2:
temp flag register is allocated by RA now
version 3:
subnr of temp flag reg is divided by typesize
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Homer Hsing [Sun, 22 Sep 2013 06:18:02 +0000 (14:18 +0800)]
support 64-bit version "add_sat"
tested by piglit:
piglit/bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-add_sat-1.0.generated.cl
piglit/bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-add_sat-1.0.generated.cl
version 2:
temp flag register is now allocated by RA
version 3:
divide subnr of temp flag reg by typesize
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
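For illustration only, a minimal C model of the results add_sat is expected to produce for 64-bit operands. The function names are hypothetical, and the sketch uses plain comparisons rather than the flag register mentioned in the version notes.

```c
#include <stdint.h>

/* Unsigned: a wrapped sum is smaller than either operand. */
uint64_t add_sat_u64(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;              /* wraps modulo 2^64 */
    return (sum < a) ? UINT64_MAX : sum;
}

/* Signed: test for overflow before adding, because signed overflow
 * is undefined behaviour in C. */
int64_t add_sat_s64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b) return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b) return INT64_MIN;
    return a + b;
}
```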
Homer Hsing [Sun, 22 Sep 2013 06:18:01 +0000 (14:18 +0800)]
add 64-bit version of "mad_sat"
tested by piglit:
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-mad_sat-1.0.generated.cl
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-mad_sat-1.0.generated.cl
version 2:
temp flag register is allocated by RA
version 3:
divide subnr of flag register by typesize
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Homer Hsing [Thu, 26 Sep 2013 05:42:46 +0000 (13:42 +0800)]
add 64-bit version of "mul_hi"
passed piglit test cases:
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-mul_hi-1.0.generated.cl
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-mul_hi-1.0.generated.cl
version 2:
temp flag register is allocated by RA
version 3:
divide subnr of flag register by typesize
version 4:
fix a typo
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
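For illustration only, a C sketch of the unsigned 64-bit mul_hi result computed from 32-bit halves, which is a common way to get the upper half of a 128-bit product without a wider type. The function name is hypothetical, the signed variant is omitted, and nothing here is taken from the patch itself.

```c
#include <stdint.h>

/* Upper 64 bits of the 128-bit product a * b. */
uint64_t mul_hi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;      /* contributes to bits 0..63   */
    uint64_t hi_lo = a_hi * b_lo;      /* contributes to bits 32..95  */
    uint64_t lo_hi = a_lo * b_hi;      /* contributes to bits 32..95  */
    uint64_t hi_hi = a_hi * b_hi;      /* contributes to bits 64..127 */

    /* Sum everything that lands in bits 32..63; its upper half is the
     * carry into bit 64. Three 32-bit terms cannot overflow 64 bits. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu)
                 + (lo_hi & 0xFFFFFFFFu);

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}
```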
Homer Hsing [Mon, 23 Sep 2013 07:19:42 +0000 (15:19 +0800)]
fix 64bit writing
fix 64bit writing when data register is scalar
this patch makes some piglit test cases pass
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 06:57:45 +0000 (14:57 +0800)]
refactor the api of intel_driver_share_buffer
so that we can use this API in later patches for the integration of opencl and libva.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 07:06:27 +0000 (15:06 +0800)]
fix the missing assignment for offset
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 07:06:23 +0000 (15:06 +0800)]
we should check the 'err' parameter
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 23 Sep 2013 06:04:08 +0000 (14:04 +0800)]
Remove the restriction that the global offset must be divisible by the local size.
Set the global offset to 0 in the walker, and add the global offset in get_global_id.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
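For illustration only, a minimal C model of the approach described above: ids are computed as if the global offset were 0, and the offset is added when get_global_id is evaluated. The struct and function names are hypothetical and model one dimension only.

```c
#include <stddef.h>

/* Hypothetical per-dimension state of one work-item. */
struct work_item_dim {
    size_t group_id;       /* work-group index, counted from 0 */
    size_t local_size;     /* work-group size */
    size_t local_id;       /* index inside the work-group */
    size_t global_offset;  /* offset passed at enqueue time */
};

/* The dispatch sees offset 0; the offset is applied only here, so it
 * does not need to be a multiple of local_size. */
size_t get_global_id_sketch(const struct work_item_dim *d)
{
    return d->group_id * d->local_size + d->local_id + d->global_offset;
}
```

With group 2, local size 16, local id 5 and offset 3, the model returns 40, even though 3 is not a multiple of 16.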
Yang Rong [Mon, 23 Sep 2013 05:16:24 +0000 (13:16 +0800)]
Fix clEnqueueMapImage error.
Correct the map size calculation and remove ptr + offset because it is already done in _cl_map_mem.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Homer Hsing [Sun, 22 Sep 2013 05:58:04 +0000 (13:58 +0800)]
64-bit-int: allocate flag register by RA
64-bit integer arithmetic now uses flag register allocated by register
allocator.
version 2: divide subnr by typesize
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Tue, 24 Sep 2013 07:39:36 +0000 (15:39 +0800)]
GBE: Fix a constant bug which overwrites memory.
Previously it always wrote 8 bytes no matter what the size of the integer was.
Fix it by copying only the necessary data.
Reported by Homer Hsing.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
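For illustration only, a minimal C sketch of the idea behind this fix: write only as many bytes as the integer type occupies, so neighbouring data is not overwritten. The function name is hypothetical, and storing the bytes in little-endian order is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low `size` bytes of `value` at dst, least significant
 * byte first (little-endian order is an assumption of this sketch).
 * Only dst[0..size-1] is touched; size is expected to be 1, 2, 4 or 8. */
void store_int_constant(uint8_t *dst, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++)
        dst[i] = (uint8_t)(value >> (8 * i));
}
```

Storing a 2-byte constant this way leaves the six bytes after it untouched, whereas an unconditional 8-byte write would overwrite them.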
Homer Hsing [Tue, 24 Sep 2013 03:11:21 +0000 (11:11 +0800)]
fix scalarizing of llvm phi node
an llvm phi node can have an odd number of args.
this patch also contains a test case.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 18 Sep 2013 09:09:29 +0000 (17:09 +0800)]
Unmap the cl_mem in the driver when an application maps a cl_mem and releases it without unmapping.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 17 Sep 2013 09:35:29 +0000 (17:35 +0800)]
Refine cmake script file.
Remove GBM, which is not needed. Adjust the header file include
order to avoid including the incorrect cl header file when compiling
with the mesa source code package.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Wed, 18 Sep 2013 07:01:44 +0000 (15:01 +0800)]
Fix store undef value assert.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 13 Sep 2013 10:29:30 +0000 (18:29 +0800)]
GBE: fixed the store3 bug.
As llvm will convert a type3 pointer to a type4 pointer
completely, we can't check whether a store is a type3 or a type4 store.
We have to do this in the front end.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 08:54:12 +0000 (16:54 +0800)]
Runtime: Implement CL_MEM_USE_HOST_PTR flag for image.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 08:52:30 +0000 (16:52 +0800)]
Runtime: prepare for CL_MEM_USE_HOST_PTR for image support.
To support CL_MEM_USE_HOST_PTR for image, we need to add back
the removed data elements.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 05:56:54 +0000 (13:56 +0800)]
Utests: refine the previous fake 3D test cases.
All the previous 3D test cases only use depth 1, and do not
really touch the 3D read/write code path. Now fix them.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 13 Sep 2013 05:56:17 +0000 (13:56 +0800)]
Runtime/driver : implement 3D image support.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 13 Sep 2013 05:52:26 +0000 (13:52 +0800)]
GBE: fixed the broken 3d image support.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:07 +0000 (13:43 +0800)]
GBE: check the correct register for whether coord z exists.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:06 +0000 (13:43 +0800)]
Runtime: enable border color state support.
Also fix the wrong clamp mode for CL_ADDRESS_CLAMP.
According to the Gen Bspec, when the surface format is
int/uint, it doesn't support clamp border. We need
to work around it on the kernel side.
v2: move compiler_copy_image1 to the set of utests with known issues,
as this patch actually enables the clamp to border
mode for an int/uint surface. We have issues with this
combination, which need to be fixed.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:05 +0000 (13:43 +0800)]
Runtime: fix a bug when set sampler value.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:04 +0000 (13:43 +0800)]
Runtime: disable some unnecessary image formats.
Per OpenCL, the minimum list of supported formats is as below:
CL_RGBA:
CL_UNORM_INT8
CL_UNORM_INT16
CL_SIGNED_INT8
CL_SIGNED_INT16
CL_SIGNED_INT32
CL_UNSIGNED_INT8
CL_UNSIGNED_INT16
CL_UNSIGNED_INT32
CL_HALF_FLOAT
CL_FLOAT
CL_BGRA:
CL_UNORM_INT8
Let's only support these types and CL_R currently.
Also remove an unnecessary assertion, and fix the CL_Rx's type size.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Ruiling Song [Wed, 18 Sep 2013 02:18:43 +0000 (10:18 +0800)]
utests: add more constant test cases for composite type.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 18 Sep 2013 02:18:42 +0000 (10:18 +0800)]
GBE: Support composite type constant.
struct/vector/array of vector/struct of array/array of struct.
Also fix a bug 'constant index into constant array get wrong result'
brought in by patch 'Fix non-4byte program global constant issue'.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Yang Rong [Tue, 17 Sep 2013 08:10:01 +0000 (16:10 +0800)]
Implement clEnqueueMarker and clEnqueueBarrier.
Add some event info to cl_command_queue.
One is the non-complete user events, used to block the marker event and the barrier.
After these events become CL_COMPLETE, the events blocked by them also
become CL_COMPLETE, so the marker event will also be set to CL_COMPLETE. If there are no
user events, we need to wait for the last event to complete and then set the marker event to complete.
Add barrier_index for clEnqueueBarrier; it points into the user events and indicates how many
user events the enqueue APIs that follow clEnqueueBarrier should wait on.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:10:00 +0000 (16:10 +0800)]
Refine and fix some event bugs.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:09:59 +0000 (16:09 +0800)]
Remove unused data in clEnqueueMapImage, and fix a clGetEventInfo bug.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:09:58 +0000 (16:09 +0800)]
Fix cl_mem_kernel_copy_image typo.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 11 Sep 2013 07:22:04 +0000 (15:22 +0800)]
change constant test case to cover short/long type.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 11 Sep 2013 07:22:03 +0000 (15:22 +0800)]
Fix non-4byte program global constant issue.
We put array elements simply one after another, that is packed.
So, constant memory address should be calculated using real type size.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
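For illustration only, a minimal C sketch of addressing a packed constant array: the byte offset of element i is the base offset plus i times the real element size, rather than a fixed 4-byte stride. The function names and the byte-buffer representation of constant memory are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element `index` in a packed array that starts at
 * byte offset `base` and whose elements are `elem_size` bytes each. */
size_t packed_element_offset(size_t base, size_t index, size_t elem_size)
{
    return base + index * elem_size;
}

/* Read one short from a packed array of shorts stored in a byte
 * buffer; memcpy avoids any alignment assumptions. */
int16_t load_packed_short(const uint8_t *constant_data, size_t base,
                          size_t index)
{
    int16_t v;
    memcpy(&v, constant_data + packed_element_offset(base, index, sizeof v),
           sizeof v);
    return v;
}
```

For an array of shorts, element 2 sits at byte offset 4, not at offset 8 as a 4-byte stride would suggest.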
Boqun Feng [Tue, 17 Sep 2013 03:41:50 +0000 (11:41 +0800)]
GBE: define python interpreter by cmake variable
In some distros, python is linked to python3 not
python2, and GBE can't be built on such distros
without modification.
CMake provides a variable PYTHON_EXECUTABLE.
By default, this variable is the same as
`/usr/bin/env python`, and if another python2
interpreter is needed, just add this definition to the
`cmake` command.
-DPYTHON_EXECUTABLE:FILEPATH=/path/to/python2
And this will change PYTHON_EXECUTABLE to
/path/to/python2
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 11 Sep 2013 03:21:37 +0000 (11:21 +0800)]
add 64-bit version of "rhadd"
v2:
keep highest carry bit
tested by piglit test cases:
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-rhadd-1.0.generated.cl
piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-rhadd-1.0.generated.cl
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
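For illustration only, a minimal C model of the unsigned 64-bit rhadd result, (a + b + 1) >> 1, computed without losing the carry out of bit 63. The function name is hypothetical and the signed variant is omitted; this is a model of the expected result, not the patch's instruction sequence.

```c
#include <stdint.h>

/* Rounded-up average of two unsigned 64-bit values. Halving each
 * operand first keeps the sum within 64 bits; the rounding bit is 1
 * whenever at least one operand is odd. */
uint64_t rhadd_u64(uint64_t a, uint64_t b)
{
    return (a >> 1) + (b >> 1) + ((a | b) & 1);
}
```

A naive `(a + b + 1) >> 1` in 64-bit arithmetic would return 0 for two operands of UINT64_MAX, while this form returns UINT64_MAX.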
Homer Hsing [Fri, 13 Sep 2013 01:41:02 +0000 (09:41 +0800)]
support converting 64-bit integer to 32-bit float
version 2:
improve algorithm to convert signed integer
fix source operand type in llvm_gen_backend
enable predicate in addWithCarry
change test case to test signed integer
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Mon, 9 Sep 2013 08:10:23 +0000 (16:10 +0800)]
Implement api clEnqueueCopyBufferToImage.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>