Ruiling Song [Tue, 12 Nov 2013 01:10:01 +0000 (09:10 +0800)]
GBE: handle half type size
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Zhigang Gong [Mon, 11 Nov 2013 08:20:26 +0000 (16:20 +0800)]
Runtime: complete the api clGetKernelWorkGroupInfo.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Homer Hsing [Mon, 11 Nov 2013 02:45:42 +0000 (10:45 +0800)]
ignore a build option that clang does not support
IVB does not support denormal float values,
so the build option "-cl-denorms-are-zero" can be safely ignored.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Mon, 11 Nov 2013 02:33:28 +0000 (10:33 +0800)]
gbe_bin_generator: do not use the append option when creating a new binary.
We should use the trunc option rather than app when we need to create a new
binary file.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
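The difference between the two open modes can be sketched in plain C. This is only an illustration: the log does not show the generator's code, and the helper names write_new_binary and file_size are hypothetical. The "wb" mode plays the role of trunc here and "ab" the role of app.

```c
#include <assert.h>
#include <stdio.h>

/* Write `len` bytes to `path`, discarding any previous contents.
 * "wb" truncates an existing file; "ab" would append to it instead. */
int write_new_binary(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Return the size of `path` in bytes, or -1 on error. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}
```

Writing the same blob twice with write_new_binary leaves a file of the blob's size, whereas an append mode would leave it twice as large.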
Junyan He [Fri, 8 Nov 2013 10:41:59 +0000 (18:41 +0800)]
Fix the CL_PROGRAM_BINARIES problem in the clGetProgramInfo API
Using CL_PROGRAM_BINARIES with clGetProgramInfo to get the binary gave the
wrong result because the binary returned was not the serialized one.
Add the serialization there to fix this bug.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 8 Nov 2013 07:35:41 +0000 (15:35 +0800)]
fix builtin function "fmax"
If one parameter is NaN, return the other parameter.
v2: no need to test for NaN on integer types
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
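The NaN rule described in this commit can be sketched on the host in C. This is not the code from ocl_stdlib; the name fmax_nan_aware is hypothetical and only shows the intended behaviour for float arguments.

```c
#include <assert.h>
#include <math.h>

/* NaN-aware maximum: if exactly one argument is NaN, return the other one.
 * If both are NaN, the result is NaN. */
float fmax_nan_aware(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a > b ? a : b;
}
```

A plain "a > b ? a : b" would not be enough, because every comparison with NaN is false and the result would then depend on argument order.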
Ruiling Song [Fri, 8 Nov 2013 03:20:09 +0000 (11:20 +0800)]
GBE: Fix alignment for private variables
Private variables allocated on the stack should be aligned according to OCL spec.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 8 Nov 2013 03:16:47 +0000 (11:16 +0800)]
GBE: Fix alignment according to OCL spec
The patch simply stores an 'align' value for each kernel argument.
Then the runtime could align the kernel argument address to 'align'.
This patch works for constant and local address space.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
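Aligning an argument address to a stored 'align' value usually comes down to rounding an offset up to a multiple of that value. A minimal sketch, assuming the alignment is a power of two; align_up is a hypothetical helper, not a function taken from the runtime:

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align`.
 * Assumption: `align` is a non-zero power of two. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

For example, align_up(17, 4) gives 20, while an offset that is already aligned is returned unchanged.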
Ruiling Song [Fri, 8 Nov 2013 03:12:44 +0000 (11:12 +0800)]
GBE: Remove max_limit for struct alignment
A struct may have vector fields (like int8/16), so max_limit is meaningless.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 8 Nov 2013 02:57:42 +0000 (10:57 +0800)]
release context in runtime_createcontextfromtype
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Thu, 7 Nov 2013 16:58:00 +0000 (00:58 +0800)]
Move the gpgpu struct from cl_command_queue to thread specific context
We found that some cases use multiple threads to run the same kernel on
the same queue. This can destroy the gpgpu struct, which is very important
for GPU context setup, because we do not implement any synchronization
protection on it yet.
Moving the gpgpu struct into thread-specific storage fixes this problem
because lib_drm does the GPU command serialization for us.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Zou, Nanhai" <nanhai.zou@intel.com>
Junyan He [Thu, 7 Nov 2013 08:44:53 +0000 (16:44 +0800)]
Add the clGetMemObjectInfo options for sub-buffer and update the utest case
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Thu, 7 Nov 2013 08:44:46 +0000 (16:44 +0800)]
Add the test case for sub buffer check
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Thu, 7 Nov 2013 08:44:39 +0000 (16:44 +0800)]
Implement the clCreateSubBuffer API
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Thu, 7 Nov 2013 08:44:31 +0000 (16:44 +0800)]
Add support for the bo's internal offset when calling drm_intel_bo_emit_reloc
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 7 Nov 2013 07:13:13 +0000 (15:13 +0800)]
GBE: fix a 64bit scalar register issue.
For a scalar register, stride 0 should be used.
Also change the unit test to cover this case.
v2: fix h2()
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Thu, 7 Nov 2013 07:32:56 +0000 (15:32 +0800)]
improve multithread calling of llvm
Call the llvm multithread function instead of using a semaphore.
Also exit llvm multithread mode at the end of life.
v2: do not call llvm::shutdown() if llvm is older than 3.4
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 7 Nov 2013 06:55:23 +0000 (14:55 +0800)]
fix builtin function "fract"
v2: return nan for nan, +zero for +inf, -zero for -inf.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Thu, 7 Nov 2013 06:31:36 +0000 (14:31 +0800)]
fix builtin function "copysign"
using better algorithm
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Wed, 6 Nov 2013 01:04:53 +0000 (09:04 +0800)]
fix builtin function 'frexp'
returns correct value for nan or inf.
also returns correct value for very small float value.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Tue, 5 Nov 2013 05:08:15 +0000 (13:08 +0800)]
release previous program in cl_kernel_init
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Mon, 4 Nov 2013 08:29:02 +0000 (16:29 +0800)]
release previous kernel in cl_kernel_init
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 7 Nov 2013 01:47:11 +0000 (09:47 +0800)]
Runtime: fix some max/alignment values.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 7 Nov 2013 01:44:20 +0000 (09:44 +0800)]
Runtime: fix one bug in clGetProgramInfo.
The CL_PROGRAM_BINARIES query forgot to return the param value size.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 7 Nov 2013 01:31:37 +0000 (09:31 +0800)]
GBE: Don't modify argument 0 of the get image information instruction.
When the first round of compilation fails, GBE recompiles the
program using another profile. If we changed argument
0 of the get image information instruction, the second round of
compilation would fail. But argument 1 is fine to change, as we never change
the first instruction's argument, and argument 1 of all the subsequent
instructions is free to change.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 7 Nov 2013 01:29:20 +0000 (09:29 +0800)]
Runtime: fix the length of properties.
The last zero should also be counted.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
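A property list in OpenCL is a sequence of (name, value) pairs terminated by a single 0, so its length in entries has to include that terminator. The sketch below assumes the entries are stored as intptr_t (as cl_context_properties is on common platforms); properties_length is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the entries of a zero-terminated (name, value) property list,
 * including the terminating 0 itself. */
size_t properties_length(const intptr_t *props)
{
    assert(props != NULL);
    size_t n = 0;
    while (props[n] != 0)
        n += 2;          /* skip one name/value pair */
    return n + 1;        /* +1 for the trailing zero */
}
```

A list holding one pair therefore has length 3, and an empty list (just the terminator) has length 1.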
Zhigang Gong [Wed, 6 Nov 2013 04:49:30 +0000 (12:49 +0800)]
Runtime: implement clGetSamplerInfo.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 5 Nov 2013 05:56:06 +0000 (13:56 +0800)]
Runtime: fix some max values.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 5 Nov 2013 05:34:54 +0000 (13:34 +0800)]
Runtime: fix the incorrect device info string size.
sizeof(str) already includes the '\0', so we don't need to add
1 to it.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
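The off-by-one is easy to reproduce in isolation. In this sketch the string "Example Device" and the function device_name_size are made up for illustration; the real device strings are not shown in the log.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device string, defined as a char array (not a pointer). */
static const char device_name[] = "Example Device";

/* Size to report for the string, terminator included. */
size_t device_name_size(void)
{
    /* sizeof on a char array already counts the '\0'; strlen does not. */
    assert(sizeof(device_name) == strlen(device_name) + 1);
    return sizeof(device_name);
}
```

Note that this only holds for arrays. If the string were held through a char pointer, sizeof would give the pointer size and strlen(str) + 1 would be the right expression.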
Ruiling Song [Fri, 1 Nov 2013 06:16:08 +0000 (14:16 +0800)]
GBE: disable MulAdd pattern in instruction selection temporarily.
The story starts from 'FP_CONTRACT'. The C99 spec describes a contracted
expression as:
"A floating expression may be contracted, that is, evaluated as though it
were an atomic operation, thereby omitting rounding errors implied by the
source code and the expression evaluation method."
But the user can use 'pragma FP_CONTRACT OFF' to disable float contraction,
in which case we should not do contraction such as the mad optimization.
In SPIR 1.2, the named metadata 'opencl.enable.FP_CONTRACT' will be used to do this.
When Clang is ready, we need to refine the backend logic.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Lu Guanqun [Tue, 5 Nov 2013 05:55:27 +0000 (13:55 +0800)]
utests: add test case for structure argument
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Tue, 5 Nov 2013 05:55:23 +0000 (13:55 +0800)]
fix the error that structure would be pushed twice
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Ruiling Song [Tue, 5 Nov 2013 08:37:13 +0000 (16:37 +0800)]
GBE: use ISA mad for mad() builtin function.
Directly map mad() to ISA mad, so mad will have better performance and
less precision loss.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 5 Nov 2013 08:37:12 +0000 (16:37 +0800)]
utests: use mad which will get better precision.
Normal mul/add could not meet the precision requirement of this case.
Previously it passed because we did the mad optimization in the backend.
Use mad directly, so the test case does not depend on backend optimization.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 4 Nov 2013 07:05:41 +0000 (15:05 +0800)]
Add a necessary include path for building with mesa.
Reported by Lv Meng.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Homer Hsing [Mon, 4 Nov 2013 01:39:33 +0000 (09:39 +0800)]
fix operators for 64 bit integer
if the operand is a signed 64-bit integer, emit -1 for SExt casting
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 1 Nov 2013 05:53:47 +0000 (13:53 +0800)]
fix pointer bugs in linked list
update the head of the linked list if the head was deleted
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 31 Oct 2013 08:36:32 +0000 (16:36 +0800)]
fix ill-coded utest_run::main
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 31 Oct 2013 03:12:54 +0000 (11:12 +0800)]
add same type converting
converting a data type to same type ...
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Ruiling Song [Thu, 31 Oct 2013 03:01:21 +0000 (11:01 +0800)]
runtime: Fix a dangling pointer issue
ctx->events points to the head of 'event list' under the ctx.
When deleting an event from the list, we should also update
the head pointer besides updating its neighbours' next & prev.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
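The fix described here is the usual head update when unlinking a node from a doubly linked list. A minimal sketch follows; struct event and event_list_remove are simplified stand-ins, not the runtime's actual cl_event definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an event node in the context's event list. */
struct event {
    struct event *prev;
    struct event *next;
};

/* Unlink `ev` from the doubly linked list whose head pointer is `*head`. */
void event_list_remove(struct event **head, struct event *ev)
{
    assert(head != NULL && ev != NULL);
    if (ev->prev != NULL)
        ev->prev->next = ev->next;
    if (ev->next != NULL)
        ev->next->prev = ev->prev;
    if (*head == ev)          /* removing the head: advance the head pointer */
        *head = ev->next;
    ev->prev = ev->next = NULL;
}
```

Without the "*head == ev" branch, the list owner would keep pointing at the removed node, which is the dangling pointer the commit refers to.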
Zhigang Gong [Tue, 29 Oct 2013 07:02:15 +0000 (15:02 +0800)]
GBE: fixed one bug for vector relational builtin functions.
For most vector relational builtin functions, we need to
return -1 if the element result is true and 0 if the element
result is false. So we can simply put a - in front of the scalar
version of the function for each element.
Reported by Yang Rong.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
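The negation trick can be shown with isless as an example. This is a host-side sketch with hypothetical names (isless_scalar, isless_vec4) and a fixed width of 4; the actual builtins are generated from the def file and are not shown in the log.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version: 1 if a < b, otherwise 0. */
int isless_scalar(float a, float b)
{
    return a < b;
}

/* 4-wide version: -1 (all bits set) per true lane, 0 per false lane. */
void isless_vec4(const float a[4], const float b[4], int out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int i = 0; i < 4; ++i)
        out[i] = -isless_scalar(a[i], b[i]);
}
```

Negating the scalar result turns 1 into -1 and leaves 0 unchanged, which matches the vector convention described above.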
Homer Hsing [Tue, 29 Oct 2013 03:12:46 +0000 (11:12 +0800)]
fix built-in function "normalize"
divide the parameter by its length
ver 2: scalar typed function returns NaN if parameter is NaN.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 28 Oct 2013 01:02:33 +0000 (09:02 +0800)]
fix built-in function "fast_normalize"
if the parameter is zero, then return zero
if the parameter is positive, then return 1.
for other cases, return -1.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 29 Oct 2013 05:57:51 +0000 (13:57 +0800)]
GBE: Give a zero-initialized register for Undef value.
For instructions that reference an undef value, we simply
allocate a register to the undef operand and set as 0.
v2:
handle float and double type. also fix some typos about double type.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Yang Rong [Tue, 29 Oct 2013 05:59:41 +0000 (13:59 +0800)]
Refine the build option checking.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Tue, 29 Oct 2013 05:39:35 +0000 (13:39 +0800)]
Fix an event segmentation fault.
If the event type is CL_COMMAND_USER, event->queue is NULL, which causes a segmentation fault.
Change the order to fix it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Zhigang Gong [Tue, 29 Oct 2013 05:31:09 +0000 (13:31 +0800)]
GBE: enable bitselect vector builtin functions.
Now we have the scalar version of bitselect, so we
enable the vector version in the def file. Also remove
some comments.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 29 Oct 2013 03:19:36 +0000 (11:19 +0800)]
Runtime: fixed an incorrect error checking for CL_INVALID_GLOBAL_OFFSET.
According to OpenCL spec:
CL_INVALID_GLOBAL_OFFSET if the value specified in global_work_size + the
corresponding values in global_work_offset for any dimensions is greater than the
sizeof(size_t) for the device on which the kernel execution will be enqueued.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
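One way to read the quoted rule is that size plus offset must not exceed the largest value a size_t can hold. The sketch below checks that without overflowing; it assumes the host size_t has the same width as the device one, and global_range_fits is a hypothetical helper. A runtime would return CL_INVALID_GLOBAL_OFFSET where this function returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if size + offset fits in size_t for every dimension, else 0. */
int global_range_fits(const size_t *global_work_size,
                      const size_t *global_work_offset,
                      unsigned work_dim)
{
    assert(global_work_size != NULL);
    if (global_work_offset == NULL)   /* no offset means offset 0 */
        return 1;
    for (unsigned i = 0; i < work_dim; ++i)
        if (global_work_offset[i] > SIZE_MAX - global_work_size[i])
            return 0;                 /* the sum would wrap around */
    return 1;
}
```

Comparing the offset against SIZE_MAX minus the size avoids computing the sum itself, which could silently wrap.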
Zhigang Gong [Mon, 28 Oct 2013 09:33:06 +0000 (17:33 +0800)]
GBE: fix 3-component vector's astype macros.
According to OpenCL spec,
For 3-component vector data types, the size of the data type is 4 * sizeof(component). This
means that a 3-component vector data type will be aligned to a 4 * sizeof(component)
boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively,
3-component vector data types from an array of packed scalar data type.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
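The size rule quoted from the spec can be written as a one-line computation. vector_storage_size is a hypothetical helper used only to illustrate the rule; it is not part of the astype macros.

```c
#include <assert.h>
#include <stddef.h>

/* Storage size of an n-component vector whose components are
 * comp_size bytes each. A 3-component vector occupies 4 components. */
size_t vector_storage_size(unsigned n, size_t comp_size)
{
    assert(n == 1 || n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return (n == 3 ? 4u : n) * comp_size;
}
```

So a float3 and a float4 have the same size (16 bytes with 4-byte floats), which is why reinterpreting between 3-component and 4-component types needs special care.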
Zhigang Gong [Mon, 28 Oct 2013 08:59:20 +0000 (16:59 +0800)]
GBE: fix a bug for the cast(FPToUI) instruction.
We need to choose unsigned dst type for this case.
v2:
fix a typo.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Homer Hsing [Mon, 28 Oct 2013 01:02:32 +0000 (09:02 +0800)]
fix built-in function "length"
if vector is zero, then returns zero.
if vector is very large, then do a scaling first.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 28 Oct 2013 01:02:31 +0000 (09:02 +0800)]
delete vec-8 or vec-16 typed geometric built-in
per OpenCL spec (ver 1.1 or 1.2), geometric built-in
should not have vec-8 type or vec-16 type versions.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 24 Oct 2013 03:22:58 +0000 (11:22 +0800)]
not use "mad" in vector type "dot"
the purpose is just to make code more readable, for float16 case
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:18 +0000 (14:02 +0800)]
Per OpenCL spec, set p->is_built to 1 when the build fails.
Also correct the error code when the build fails.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:17 +0000 (14:02 +0800)]
Re-build the program when the build options change.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:16 +0000 (14:02 +0800)]
Remove CL_FP_DENORM in clGetDeviceInfo.
IVB doesn't support single-precision float denorms, so the compiler option -cl-denorms-are-zero should be ignored.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 28 Oct 2013 06:02:15 +0000 (14:02 +0800)]
Add preprocessor #defines that match the extension name strings.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Yi Sun [Thu, 24 Oct 2013 07:56:58 +0000 (15:56 +0800)]
utest: add test case for builtin function exp/exp2/exp10/expm1.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Yangwei Shui <yangweix.shui@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Mon, 9 Sep 2013 08:55:47 +0000 (16:55 +0800)]
utest: Add test case for built-in function pow.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 08:22:09 +0000 (16:22 +0800)]
support saturated-rounding converting
passed piglit test case:
piglit/bin/cl-program-tester tests/cl/program/execute/vector-conversion.cl
version 2:
contains updating of ocl_convert.h
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 18 Oct 2013 05:47:54 +0000 (13:47 +0800)]
support converting with rounding mode
support built-in functions of converting with rounding mode,
such as "convert_DSTTYPE_rtz, rte, rtp, rtn".
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Mon, 28 Oct 2013 02:45:07 +0000 (10:45 +0800)]
initialize GenRegister::subphysical
GenRegister::subphysical should have same init value as GenRegister::physical
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 24 Oct 2013 02:44:41 +0000 (10:44 +0800)]
add scalar type builtin function "dot"
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Homer Hsing [Mon, 21 Oct 2013 01:12:42 +0000 (09:12 +0800)]
implement __builtin_* functions
backend does not support __builtin_* functions,
so they are implemented in ocl_stdlib.tmpl.h
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 22 Oct 2013 07:06:32 +0000 (15:06 +0800)]
Bump to version 0.3.
Also update some documents accordingly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 08:30:47 +0000 (16:30 +0800)]
add a semaphore for clang lib
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Mon, 21 Oct 2013 11:12:56 +0000 (19:12 +0800)]
Disable instruction scheduling temporarily.
Enabling scheduling causes failures. It will be re-enabled after these failures are fixed.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Chuanbo Weng [Tue, 22 Oct 2013 09:11:56 +0000 (17:11 +0800)]
Fix two memory leaks.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Ruiling Song [Tue, 22 Oct 2013 04:02:56 +0000 (12:02 +0800)]
GBE: Fix a bo->offset assert
scratchSize was missing from the binary, which caused a random value
when a kernel was loaded from a binary. Add it to the binary format.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Yang Rong [Mon, 21 Oct 2013 11:12:55 +0000 (19:12 +0800)]
Fix zeroinitializer load/store vector assert.
After moving the newValueProxy of vector load/store to the genWriter pass, genWriter
gets a ConstantAggregateZero for a vector load/store with zeroinitializer. The function processConstant
did not handle every type of ConstantAggregateZero, which caused an assert. Add handling for these types.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Yang Rong [Mon, 21 Oct 2013 07:20:41 +0000 (15:20 +0800)]
Add a test for vector argument deallocate assert.
V2: Add result compare.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:08 +0000 (13:50 +0800)]
Change -O3 to -O2 again because of a typo in my previous change.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:07 +0000 (13:50 +0800)]
Refine vector register deallocate.
Split the vector register block so that the registers can be freed separately.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Yang Rong [Fri, 11 Oct 2013 05:50:06 +0000 (13:50 +0800)]
Fix a vector argument deallocate assert.
Vector arguments are allocated together but deallocated separately, so deallocation
asserts. Split each allocatedBlock in the register partitioner to fix it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Yang Rong [Mon, 21 Oct 2013 07:47:56 +0000 (15:47 +0800)]
Add more type for async copy test case.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 21 Oct 2013 07:26:02 +0000 (15:26 +0800)]
use int64_t to express "long" in a test case
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Mon, 21 Oct 2013 07:48:13 +0000 (15:48 +0800)]
runtime: Simply return success in clUnloadCompiler.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 18 Oct 2013 07:11:29 +0000 (15:11 +0800)]
GBE: Handle all-zero constant.
Also refine Undef value support.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 05:38:59 +0000 (13:38 +0800)]
support vectorized saturated converting builtin functions
version 2: skip convert_float_sat(*)
version 3:
scalar converting is moved from "ocl_stdlib.tmpl.h" to "gen_convert.sh",
because scalar converting should be before vectorized version.
"ocl_convert.h" is updated.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 02:26:56 +0000 (10:26 +0800)]
support saturated converting from narrower type to wider type
This patch supports saturated converting from narrower type to wider type.
It simply returns the parameter.
version 2: no need for convert_float_sat(*)
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Fri, 18 Oct 2013 02:26:55 +0000 (10:26 +0800)]
support saturated converting from 64-bit int
This patch supports saturated converting from 64-bit int to shorter int,
and from 32-bit float to 64-bit int.
This patch also contains test case.
version 2: ulong is already declared on some platforms
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
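Saturated conversion clamps out-of-range values to the limits of the destination type instead of wrapping. The following host-side sketch covers the 64-bit to 32-bit signed case and the trivial narrower-to-wider case; the function names are hypothetical and do not mirror the generated convert_*_sat builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 64-bit to 32-bit signed conversion: clamp, then narrow. */
int32_t convert_int_sat_from_long(int64_t x)
{
    if (x > INT32_MAX)
        return INT32_MAX;
    if (x < INT32_MIN)
        return INT32_MIN;
    return (int32_t)x;
}

/* Narrower to wider: every input is representable, so the value is
 * returned unchanged. */
int64_t convert_long_sat_from_int(int32_t x)
{
    return x;
}
```

The second function shows why the narrower-to-wider patch can simply return its parameter.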
Zhigang Gong [Fri, 18 Oct 2013 05:29:12 +0000 (13:29 +0800)]
Runtime: correct some image related maximum values for IVB.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Thu, 17 Oct 2013 05:36:56 +0000 (13:36 +0800)]
Add test case for newValueProxy of InsertElementInst.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 17 Oct 2013 05:36:55 +0000 (13:36 +0800)]
Move newValueProxy from the scalarize pass to the genWriter pass.
If newValueProxy is called in the scalarize pass, the realValue may be deleted by
a following pass, which causes an assert. Moving it to the genWriter pass fixes this bug and
makes the code cleaner.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Thu, 17 Oct 2013 05:11:05 +0000 (13:11 +0800)]
add clCreateImageFromLibvaIntel() api
We can pass in libva's buffer object with other info and then create an image
in our CL code.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Thu, 17 Oct 2013 05:11:01 +0000 (13:11 +0800)]
add clCreateBufferFromLibvaIntel() api
We can pass in libva's buffer object name and then create the cl buffer from
it, thus we can share the buffer between libva and our opencl.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 18 Oct 2013 02:19:57 +0000 (10:19 +0800)]
Implement the CL api for clGetEventProfilingInfo
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 18 Oct 2013 02:19:51 +0000 (10:19 +0800)]
Use PIPE_CONTROL to implement getting time stamps in the gen backend
We use PIPE_CONTROL to get the time stamps from the GPU just after batch
start and before batch flush. The first one is used to calculate the
CL_PROFILING_COMMAND_START time and the second one to calculate
the CL_PROFILING_COMMAND_END time.
There are 2 limitations here:
1. The end time stamp is taken just before the FLUSH, so the flush time
is not included, which loses some accuracy. Because
we do not know which event will be used for profiling
when it is created, adding another flush for the end time stamp may
add some overhead.
2. The CPU and GPU times cannot be synchronized correctly yet. So the
CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT times,
which happen on the CPU side, cannot be calculated correctly against the
same base time as the GPU. So we simply set them to
CL_PROFILING_COMMAND_START for now. For events not involving GPU
operations, such as ReadBuffer, all the times are 0 for now.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
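The mapping from the two GPU time stamps to the four profiling values can be summarized in a few lines. This sketch assumes the time stamps have already been converted to nanoseconds and that the counter did not wrap between the two reads; struct profiling_times and make_profiling_times are hypothetical names, not the runtime's own.

```c
#include <assert.h>
#include <stdint.h>

/* The four values reported through clGetEventProfilingInfo. */
struct profiling_times {
    uint64_t queued, submit, start, end;
};

/* Fill the profiling values from two GPU time stamps (in ns). */
struct profiling_times make_profiling_times(uint64_t batch_start_ns,
                                            uint64_t before_flush_ns,
                                            int uses_gpu)
{
    struct profiling_times t = {0, 0, 0, 0};
    if (!uses_gpu)
        return t;                 /* e.g. ReadBuffer: all times are 0 */
    /* Assumption: no counter wrap between the two time stamps. */
    assert(before_flush_ns >= batch_start_ns);
    t.start  = batch_start_ns;
    t.end    = before_flush_ns;
    t.queued = t.start;           /* CPU-side times cannot be synced yet */
    t.submit = t.start;
    return t;
}
```

With this scheme the difference end minus start is the only interval that carries real information; queued and submit merely repeat the start value.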
Ruiling Song [Wed, 16 Oct 2013 07:38:08 +0000 (15:38 +0800)]
utests: add test cases for function call.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Wed, 16 Oct 2013 07:38:07 +0000 (15:38 +0800)]
GBE: Skip non-kernel functions in backend passes.
As non-kernel functions hit many asserts in the backend, simply
skip them, since we already inline all function calls.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Wed, 16 Oct 2013 07:38:06 +0000 (15:38 +0800)]
GBE: Inline all function calls.
Use an extra-large value for the LLVM flag -inline-threshold to inline all functions.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Tue, 15 Oct 2013 10:39:54 +0000 (18:39 +0800)]
Add type long/ulong/double's async copy.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 15 Oct 2013 10:36:03 +0000 (18:36 +0800)]
Fix a read64/write64 schedule bug.
Set the correct data type for read64/write64; otherwise, the dependency will be wrong.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 11 Oct 2013 02:43:41 +0000 (10:43 +0800)]
Delete the redundant intel_batchbuffer_t init in intel_gpgpu_new
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 10 Oct 2013 07:13:51 +0000 (15:13 +0800)]
GBE: Update program binary format.
1. Remove useless 'reg' field of constant.
2. Add slmSize for local variables defined in kernel function.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 10 Oct 2013 07:13:50 +0000 (15:13 +0800)]
GBE: Support local variable inside kernel function.
As Clang treats local variables in a similar way to global constants
(each is treated as a global variable in its own address space),
we refine the previous constant implementation in order to
share the same code between local variables and global constants.
We will allocate an address register for each GlobalVariable
(constant or local) through calling newRegister().
In later step, through getRegister() we will get a proper
register derived from the allocated address register.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Tue, 24 Sep 2013 02:10:46 +0000 (10:10 +0800)]
support LLVM 3.4
LLVM 3.3 and earlier versions don't support unary increment of vectors,
such as "++ int2". This patch supports LLVM 3.4.
Tested by PIGLIT, no regression.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 9 Oct 2013 07:55:43 +0000 (15:55 +0800)]
Add the test case for clEnqueueCopyBuffer
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:47 +0000 (12:28 +0800)]
Implement the clEnqueueCopyBuffer API using internal binary kernel
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:41 +0000 (12:28 +0800)]
Add the internally used kernels for buffer copy
Add internally used kernels for buffer copy. The alignment
cases 1, 4 and 16 are separated into three kernels to improve
performance. The CMakeList is also updated.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>