contrib/beignet.git
Ruiling Song [Thu, 10 Oct 2013 07:13:50 +0000 (15:13 +0800)]
GBE: Support local variable inside kernel function.

As Clang treats local variables in a similar way to global constants
(both are treated as global variables, each in its own address space),
we refine the previous constant implementation so that local
variables and global constants share the same code.

We allocate an address register for each GlobalVariable
(constant or local) by calling newRegister().
In a later step, getRegister() returns a proper
register derived from the allocated address register.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Tue, 24 Sep 2013 02:10:46 +0000 (10:10 +0800)]
support LLVM 3.4

LLVM 3.3 and earlier versions don't support unary increment of vectors,
such as "++" applied to an int2. This patch adds support for LLVM 3.4.
Tested with piglit; no regressions.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 9 Oct 2013 07:55:43 +0000 (15:55 +0800)]
Add the test case for clEnqueueCopyBuffer

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:47 +0000 (12:28 +0800)]
Implement the clEnqueueCopyBuffer API using internal binary kernel

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 10 Oct 2013 04:28:41 +0000 (12:28 +0800)]
Add the internal used kernels for buffer copy

Add internally used kernels for buffer copy. The 1-, 4- and
16-byte alignment cases are separated into three kernels to improve
performance. The CMakeLists are also updated.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
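The commit only says that the 1-, 4- and 16-byte alignment cases get separate kernels. A minimal C sketch of how a host-side dispatcher might choose among them, assuming the choice depends on the source offset, destination offset and copy size all being multiples of the alignment; `pick_copy_kernel` and the `COPY_KERNEL_*` names are hypothetical, not beignet's identifiers.

```c
#include <stddef.h>

/* Hypothetical identifiers for the three internal copy kernels. */
typedef enum {
    COPY_KERNEL_ALIGN1,   /* byte-wise copy, always valid          */
    COPY_KERNEL_ALIGN4,   /* one 32-bit word per work item         */
    COPY_KERNEL_ALIGN16   /* one 128-bit (uint4) per work item     */
} copy_kernel_t;

/* Pick the widest kernel whose alignment divides both offsets and the size. */
copy_kernel_t pick_copy_kernel(size_t src_offset, size_t dst_offset, size_t cb)
{
    size_t bits = src_offset | dst_offset | cb;  /* any low bit set => misaligned */
    if ((bits & 15) == 0)
        return COPY_KERNEL_ALIGN16;
    if ((bits & 3) == 0)
        return COPY_KERNEL_ALIGN4;
    return COPY_KERNEL_ALIGN1;
}
```

OR-ing the three values lets a single mask test cover all of them; the 16-byte kernel then needs only cb / 16 work items instead of cb.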
Junyan He [Wed, 9 Oct 2013 07:55:23 +0000 (15:55 +0800)]
Add the string format support for gbe_bin_generater

The string format of the kernel serialization is useful for
generating the code that loads internal kernel binaries.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Thu, 10 Oct 2013 03:44:27 +0000 (11:44 +0800)]
Implement api clCreateKernelsInProgram.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 10 Oct 2013 03:15:56 +0000 (11:15 +0800)]
utests: put compiler_vector_inc into known issue list.

Because ++/-- on vectors needs LLVM 3.4.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 10 Oct 2013 02:13:41 +0000 (10:13 +0800)]
saturated conversion of native GPU data type, larger to narrower

This patch supports saturated conversion of
native GPU data types (char/short/int/float)
from a larger-range data type to a narrower-range data type,
for instance convert_uchar_sat(int).

Several test cases are included in this patch.

v2: add uint->int, int->uint

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
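Saturated conversion clamps to the destination range instead of wrapping. A minimal C sketch of the scalar semantics for the cases named above, using hypothetical C function names in place of the overloaded OpenCL C built-ins (convert_uchar_sat, convert_int_sat, convert_uint_sat); it illustrates the semantics, not the GPU instruction sequence the patch emits.

```c
#include <stdint.h>

/* int -> uchar: below 0 clamps to 0, above 255 clamps to 255. */
uint8_t convert_uchar_sat_int(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return (uint8_t)x;
}

/* uint -> int (v2): only the upper bound can be exceeded. */
int32_t convert_int_sat_uint(uint32_t x)
{
    return x > (uint32_t)INT32_MAX ? INT32_MAX : (int32_t)x;
}

/* int -> uint (v2): negative inputs clamp to 0. */
uint32_t convert_uint_sat_int(int32_t x)
{
    return x < 0 ? 0u : (uint32_t)x;
}
```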
Homer Hsing [Wed, 9 Oct 2013 08:14:48 +0000 (16:14 +0800)]
fix isnan (builtin function)

This patch passes the following piglit test case:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/relational/builtin-float-isnan-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
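For reference, the IEEE-754 definition that isnan has to honour: a single-precision value is a NaN when its exponent bits are all ones and its mantissa is non-zero. A minimal C sketch of that bit test (`isnan_bits` is a hypothetical name; the patch itself is not shown here). OpenCL's scalar isnan returns 1 for true, while the vector forms return -1 (all bits set) per lane.

```c
#include <stdint.h>
#include <string.h>

/* NaN <=> exponent all ones and mantissa non-zero, i.e. the magnitude
 * bits compare greater than the bit pattern of +Inf (0x7f800000). */
int isnan_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);              /* reinterpret without aliasing UB */
    return (u & 0x7fffffffu) > 0x7f800000u;
}
```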
Yang Rong [Wed, 9 Oct 2013 08:44:47 +0000 (16:44 +0800)]
Add some preprocessor macros __IMAGE_SUPPORT__ and __FAST_RELAXED_MATH__ define.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 9 Oct 2013 06:36:27 +0000 (14:36 +0800)]
Remove blocking asserts in clEnqueueXXX apis.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 9 Oct 2013 06:36:26 +0000 (14:36 +0800)]
Change optimize level to -O2, to avoid loopunswitch opt.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 9 Oct 2013 06:34:41 +0000 (14:34 +0800)]
GBE: sampler_t should always be a const int.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Simon Richter [Sat, 28 Sep 2013 04:39:50 +0000 (06:39 +0200)]
ICD dispatch table must be first

The ICD loader expects the first member of any dispatchable object to be
the dispatch table.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
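A minimal C sketch of the layout rule, assuming a heavily trimmed, hypothetical dispatch table (`icd_dispatch`, `my_cl_context` and `loader_get_dispatch` are illustrative names, not the ICD headers' or beignet's types): the loader knows nothing about a vendor's object layout except that the first pointer-sized field of every handle points to the dispatch table.

```c
#include <stddef.h>

/* Hypothetical, trimmed table: a real one holds one function pointer per
 * OpenCL entry point, in a fixed order shared by loader and drivers. */
struct icd_dispatch {
    void *(*get_platform_info)(void);
    void *(*get_device_ids)(void);
};

/* Every dispatchable object must start with the dispatch pointer. */
struct my_cl_context {
    const struct icd_dispatch *dispatch;  /* MUST be the first member */
    int ref_count;
    /* ... vendor-private fields follow ... */
};

/* What the loader effectively does with any handle it receives. */
const struct icd_dispatch *loader_get_dispatch(void *handle)
{
    return *(const struct icd_dispatch **)handle;
}
```

If any field were placed before `dispatch`, the loader would read that field as a table pointer and jump through garbage.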
Zhigang Gong [Wed, 25 Sep 2013 09:00:16 +0000 (17:00 +0800)]
GBE: Refine the curbe entry allocation for sampler/image information.

After the previous patch, we can move the image information curbe
entry allocation ahead of instruction selection.

Then we can concentrate all curbe allocation before the normal
register allocation. This brings two advantages:
1. The image information curbe entries are no longer allocated among the normal registers.
2. The register interval analysis can handle the image/sampler information correctly.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 25 Sep 2013 03:20:45 +0000 (11:20 +0800)]
GBE: refact the curbe register payload allocation.

As we already handle all the used curbe registers when we build
the patchlist, we can easily create a set to store all the required
curbe registers, and then later, at the register allocation stage,
insert the registers in that set without any other checking.

This way, the register allocation stage doesn't need to know anything
about those CURBE magic numbers. It only needs to use the virtual register
as the key, which makes the code a little clearer.

Most importantly, this change supports dynamic curbe register
allocation, for example for the image attributes. Each image may have 6 DWs,
and we may have many images but access only some of them and only some
of their attributes. So we can't simply allocate a special
register for all the image attributes; we need to allocate curbe
registers dynamically, on demand. The previous implementation did not
satisfy this requirement, hence this change.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 24 Sep 2013 11:34:44 +0000 (19:34 +0800)]
GBE: Fix the out-of-box checking for normalized coord clamping.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 25 Sep 2013 10:26:49 +0000 (18:26 +0800)]
GBE/Runtime: implement workaround for IVB sampler bug

Per IVB spec,

If the surface format of the associated surface is UINT or SINT,
the Surface Type cannot be SURFTYPE_3D or SURFTYPE_CUBE and Address
Control Mode cannot be CLAMP_BORDER or HALF_BORDER.

Besides this bug, there is another undocumented issue. If a surface's
data type is IEEE float and we use the sampler to sample a pixel,
a value between -1p-20 and 0 is rounded to zero by the sampler.
This also causes problems when we use the clamp mode.

This patch works around the above two hardware issues.
It introduces a new intrinsic, get_sampler_info, to get a sampler's type
at runtime. Each read_image call checks whether it
hits either of the above cases. If it hits case 1, we force
clamp-to-edge for the pixels within the box, and for the pixels
outside the box we manually set the border color. To achieve this,
we have to prepare two sampler slots for each CL_ADDRESS_CLAMP
sampler: the first, slot_1, uses CL_ADDRESS_CLAMP,
and the second uses slot_1 + 8. Thus only half of the 16 samplers are usable.
Fortunately, 8 samplers comply with OpenCL's minimum requirement.

If it hits case 2, we subtract an epsilon from the coordinate so
that it does not round to zero.

If possible, programmers should avoid using float coordinates and/or
int/uint format images; otherwise, they will hit the very slow path.

With this workaround, compiler_copy_image1 now passes.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
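A minimal C sketch of the case-1 decision described above, assuming nearest filtering on normalized coordinates and assuming the slot_1 + 8 copy is the clamp-to-edge twin; `clamp_twin_slot`, `outside_box` and `clamp_workaround` are hypothetical names, and the real kernel-side check (which a later commit in this log fixes) may differ.

```c
#include <stdbool.h>

#define MAX_HW_SAMPLERS   16
#define CLAMP_TWIN_OFFSET (MAX_HW_SAMPLERS / 2)   /* slot_1 + 8 */

/* Slot of the clamp-to-edge twin backing a CL_ADDRESS_CLAMP sampler at
 * `slot`; only slots 0..7 can be handed out to kernels. */
int clamp_twin_slot(int slot)
{
    return slot + CLAMP_TWIN_OFFSET;
}

/* Nearest filtering, normalized coordinates: a texel lies inside the image
 * iff each coordinate is in [0, 1). */
bool outside_box(float u, float v)
{
    return u < 0.0f || u >= 1.0f || v < 0.0f || v >= 1.0f;
}

/* int/uint surface + CL_ADDRESS_CLAMP sampler. Returns true when the
 * border colour must be written by hand; otherwise stores the twin slot
 * to sample through (with clamp-to-edge) in *slot_out. */
bool clamp_workaround(int slot, float u, float v, int *slot_out)
{
    if (outside_box(u, v))
        return true;
    *slot_out = clamp_twin_slot(slot);
    return false;
}
```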
Zhigang Gong [Wed, 25 Sep 2013 09:44:47 +0000 (17:44 +0800)]
clCopyImage: fix up all the surface type to int type.

Per the OpenCL spec, using read_imagei on a float image may cause
undefined behaviour. We fix up all the surface types to int types.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Mon, 23 Sep 2013 01:11:58 +0000 (09:11 +0800)]
support 64-bit division and remainder

Supports both unsigned and signed types,
for both division ("/") and remainder ("%").

Tested by piglit.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
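The commit does not describe its algorithm. A minimal C sketch of one common way to emulate 64-bit division in software, restoring shift-subtract long division, with signed division built on top of it; `udivmod64` and `sdivmod64` are hypothetical names, and the conversion back to int64_t assumes the two's-complement behaviour GCC and Clang provide.

```c
#include <stdint.h>

/* Unsigned 64-bit division by restoring (shift-subtract) long division,
 * one quotient bit per iteration. The divisor d must be non-zero. */
uint64_t udivmod64(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; --i) {
        /* r is a remainder of n's top (63 - i) bits, so r < 2^(63 - i)
         * and the shift below never pushes a set bit out. */
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next dividend bit */
        if (r >= d) {                    /* divisor fits: emit a 1 bit */
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}

/* Signed division with C / OpenCL C semantics: the quotient truncates
 * toward zero and the remainder takes the dividend's sign.
 * INT64_MIN / -1 overflows and is not handled specially. */
int64_t sdivmod64(int64_t a, int64_t b, int64_t *rem)
{
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;  /* |a|, safe for INT64_MIN */
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t ur;
    uint64_t uq = udivmod64(ua, ub, &ur);
    if (rem)
        *rem = (int64_t)(a < 0 ? 0 - ur : ur);
    return (int64_t)(((a < 0) != (b < 0)) ? 0 - uq : uq);
}
```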
Homer Hsing [Sun, 22 Sep 2013 06:18:03 +0000 (14:18 +0800)]
add 64-bit version of "sub_sat"

passed PIGLIT test cases:
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-sub_sat-1.0.generated.cl
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-sub_sat-1.0.generated.cl

version 2:
  temp flag register is allocated by RA now

version 3:
  subnr of temp flag reg is divided by typesize

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
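A minimal C sketch of the 64-bit sub_sat semantics (hypothetical names `sub_sat_u64` and `sub_sat_s64`). The GPU code tracks the overflow condition through a flag register, which is not modelled here; the sketch assumes the two's-complement conversion from uint64_t to int64_t that GCC and Clang provide.

```c
#include <stdint.h>

/* ulong: anything below zero clamps to 0. */
uint64_t sub_sat_u64(uint64_t x, uint64_t y)
{
    return x > y ? x - y : 0;
}

/* long: subtract in unsigned arithmetic (wrap-around is well defined),
 * then detect overflow. It occurred iff x and y have different signs
 * and the result's sign differs from x's. */
int64_t sub_sat_s64(int64_t x, int64_t y)
{
    int64_t res = (int64_t)((uint64_t)x - (uint64_t)y);
    if (((x ^ y) & (x ^ res)) < 0)      /* sign bit set => overflow */
        return x < 0 ? INT64_MIN : INT64_MAX;
    return res;
}
```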
Homer Hsing [Sun, 22 Sep 2013 06:18:02 +0000 (14:18 +0800)]
support 64-bit version "add_sat"

tested by piglit:
  piglit/bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-add_sat-1.0.generated.cl
  piglit/bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-add_sat-1.0.generated.cl

version 2:
  temp flag register is now allocated by RA

version 3:
  divide subnr of temp flag reg by typesize

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
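A minimal C sketch of the 64-bit add_sat semantics, assuming two's-complement conversion from uint64_t to int64_t (hypothetical names `add_sat_u64` and `add_sat_s64`): unsigned overflow shows up as a wrapped sum smaller than an operand, and signed overflow as two same-signed operands producing a result of the opposite sign.

```c
#include <stdint.h>

/* ulong: a wrapped sum is smaller than either operand. */
uint64_t add_sat_u64(uint64_t x, uint64_t y)
{
    uint64_t r = x + y;               /* wraps modulo 2^64 */
    return r < x ? UINT64_MAX : r;
}

/* long: overflow iff x and y share a sign and the result's sign differs. */
int64_t add_sat_s64(int64_t x, int64_t y)
{
    int64_t res = (int64_t)((uint64_t)x + (uint64_t)y);
    if ((~(x ^ y) & (x ^ res)) < 0)
        return x < 0 ? INT64_MIN : INT64_MAX;
    return res;
}
```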
Homer Hsing [Sun, 22 Sep 2013 06:18:01 +0000 (14:18 +0800)]
add 64-bit version of "mad_sat"

tested by piglit:
   piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-mad_sat-1.0.generated.cl
   piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-mad_sat-1.0.generated.cl

version 2:
   temp flag register is allocated by RA

version 3:
   divide subnr of flag register by typesize

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Homer Hsing [Thu, 26 Sep 2013 05:42:46 +0000 (13:42 +0800)]
add 64-bit version of "mul_hi"

passed piglit test cases:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-mul_hi-1.0.generated.cl
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-mul_hi-1.0.generated.cl

version 2:
  temp flag register is allocated by RA

version 3:
  divide subnr of flag register by typesize

version 4:
  fix a typo

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
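mul_hi returns the upper 64 bits of the full 128-bit product. A minimal C sketch that needs only 32x32->64-bit multiplies (the schoolbook split into halves), plus the standard correction that derives the signed result from the unsigned one; the function names are hypothetical and the patch's instruction sequence is not shown.

```c
#include <stdint.h>

/* High 64 bits of x * y using four 32x32->64 partial products. */
uint64_t mul_hi_u64(uint64_t x, uint64_t y)
{
    uint64_t xl = x & 0xffffffffu, xh = x >> 32;
    uint64_t yl = y & 0xffffffffu, yh = y >> 32;

    uint64_t ll = xl * yl;   /* weight 2^0  */
    uint64_t lh = xl * yh;   /* weight 2^32 */
    uint64_t hl = xh * yl;   /* weight 2^32 */
    uint64_t hh = xh * yh;   /* weight 2^64 */

    /* Sum the middle 32-bit column; its upper half carries into bit 64. */
    uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Signed variant: multiply the unsigned reinterpretations, then subtract
 * the extra terms the two's-complement sign bits contribute (mod 2^64). */
int64_t mul_hi_s64(int64_t x, int64_t y)
{
    uint64_t hi = mul_hi_u64((uint64_t)x, (uint64_t)y);
    if (x < 0)
        hi -= (uint64_t)y;
    if (y < 0)
        hi -= (uint64_t)x;
    return (int64_t)hi;
}
```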
Homer Hsing [Mon, 23 Sep 2013 07:19:42 +0000 (15:19 +0800)]
fix 64bit writing

Fix 64-bit writing when the data register is scalar.
This patch makes some piglit test cases pass.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 06:57:45 +0000 (14:57 +0800)]
refactor the api of intel_driver_share_buffer

so that we can use this API in later patches for the integration of opencl and libva.

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 07:06:27 +0000 (15:06 +0800)]
fix the missing assignment for offset

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 23 Sep 2013 07:06:23 +0000 (15:06 +0800)]
we should check the 'err' parameter

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 23 Sep 2013 06:04:08 +0000 (14:04 +0800)]
Remove global offset need divide by local size restriction.

Set the global offset to 0 in the walker, and add the global offset in get_global_id.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
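With the walker started at offset 0, each work item's global id in a dimension becomes group_id * local_size + local_id + global_offset, so the offset no longer has to be a whole number of work-groups. A minimal C sketch of that per-dimension formula (`global_id` is a hypothetical helper, not the built-in's implementation).

```c
#include <stddef.h>

/* Global id in one dimension; the offset is applied per work item
 * instead of being folded into the walker's starting group. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id,
                 size_t global_offset)
{
    return group_id * local_size + local_id + global_offset;
}
```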
Yang Rong [Mon, 23 Sep 2013 05:16:24 +0000 (13:16 +0800)]
Fix clEnqueueMapImage error.

Correct the map size calculation, and remove the ptr + offset adjustment because it is already done in _cl_map_mem.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Homer Hsing [Sun, 22 Sep 2013 05:58:04 +0000 (13:58 +0800)]
64-bit-int: allocate flag register by RA

64-bit integer arithmetic now uses flag register allocated by register
allocator.

version 2: divide subnr by typesize

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Tue, 24 Sep 2013 07:39:36 +0000 (15:39 +0800)]
GBE: Fix a constant bug which over-write memory.

Previously it always wrote 8 bytes regardless of the integer's size.
Fix it by copying only the necessary data.

Reported by Homer Hsing.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
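A minimal C sketch of the fixed behaviour, assuming integer constants of 1, 2, 4 or 8 bytes are written into a byte buffer; `store_int_constant` is a hypothetical helper. Narrowing to the exact-width type before memcpy copies only `size` bytes and picks the right bytes regardless of host endianness.

```c
#include <stdint.h>
#include <string.h>

/* Write an integer constant of `size` bytes at dst without touching
 * the bytes that follow it. */
void store_int_constant(uint8_t *dst, uint64_t value, size_t size)
{
    switch (size) {
    case 1: { uint8_t  v = (uint8_t)value;  memcpy(dst, &v, 1); break; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(dst, &v, 2); break; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(dst, &v, 4); break; }
    case 8: memcpy(dst, &value, 8); break;
    default: break;  /* unsupported size: write nothing */
    }
}
```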
Homer Hsing [Tue, 24 Sep 2013 03:11:21 +0000 (11:11 +0800)]
fix scalarizing of llvm phi node

An LLVM phi node can have an odd number of arguments.
This patch also contains a test case.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 18 Sep 2013 09:09:29 +0000 (17:09 +0800)]
Unmap the cl_mem in driver when application map a cl_mem and release without unmap.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 17 Sep 2013 09:35:29 +0000 (17:35 +0800)]
Refine cmake script file.

Remove GBM, which is not needed. Adjust the header file include
order to avoid including an incorrect CL header file when compiling
with the mesa source code package.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Wed, 18 Sep 2013 07:01:44 +0000 (15:01 +0800)]
Fix store undef value assert.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 13 Sep 2013 10:29:30 +0000 (18:29 +0800)]
GBE: fixed the store3 bug.

As LLVM completely converts a type3 pointer to a type4 pointer,
the backend can't check whether a store is a type3 or a type4 store.
We have to do this in the front end.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 08:54:12 +0000 (16:54 +0800)]
Runtime: Implement CL_MEM_USE_HOST_PTR flag for image.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 08:52:30 +0000 (16:52 +0800)]
Runtime: prepare for CL_MEM_USE_HOST_PTR for image support.

To support CL_MEM_USE_HOST_PTR for images, we need to add back
the previously removed data elements.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 13 Sep 2013 05:56:54 +0000 (13:56 +0800)]
Utests: refine the previous fake 3D test cases.

All the previous 3D test cases only used depth 1 and did not
really touch the 3D read/write code path. Fix them.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 13 Sep 2013 05:56:17 +0000 (13:56 +0800)]
Runtime/driver : implement 3D image support.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 13 Sep 2013 05:52:26 +0000 (13:52 +0800)]
GBE: fixed the broken 3d image support.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:07 +0000 (13:43 +0800)]
GBE: check the correct register for whether coord z exists.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:06 +0000 (13:43 +0800)]
Runtime: enable border color state support.

Also fix the wrong clamp mode for CL_ADDRESS_CLAMP.
According to the Gen BSpec, clamp-to-border is not supported
when the surface format is int/uint, so we need
to work around it on the kernel side.

v2: move compiler_copy_image1 to the known-issue utest set.
As this patch really enables clamp-to-border mode
for an int/uint surface, this combination has issues
that still need to be fixed.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:05 +0000 (13:43 +0800)]
Runtime: fix a bug when set sampler value.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Fri, 6 Sep 2013 05:43:04 +0000 (13:43 +0800)]
Runtime: disable some unnecessary image formats.

Per OpenCL, the minimum list of supported formats is as follows:
CL_RGBA:
  CL_UNORM_INT8
  CL_UNORM_INT16
  CL_SIGNED_INT8
  CL_SIGNED_INT16
  CL_SIGNED_INT32
  CL_UNSIGNED_INT8
  CL_UNSIGNED_INT16
  CL_UNSIGNED_INT32
  CL_HALF_FLOAT
  CL_FLOAT

CL_BGRA:
  CL_UNORM_INT8

Currently, we support only these formats and CL_R.

Also remove an unnecessary assertion, and fix the type size of CL_Rx.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Ruiling Song [Wed, 18 Sep 2013 02:18:43 +0000 (10:18 +0800)]
utests: add more constant test cases for composite type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 18 Sep 2013 02:18:42 +0000 (10:18 +0800)]
GBE: Support composite type constant.

Supported composite types: struct, vector, array of vector, struct of array and array of struct.

Also fix a bug 'constant index into constant array get wrong result'
brought in by patch 'Fix non-4byte program global constant issue'.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Yang Rong [Tue, 17 Sep 2013 08:10:01 +0000 (16:10 +0800)]
Implement clEnqueueMarker and clEnqueueBarrier.

Add some event info to cl_command_queue.
The first is the list of non-complete user events, used to block marker events and barriers.
Once these user events become CL_COMPLETE, the events they block also become
CL_COMPLETE, so the marker event is set to CL_COMPLETE as well. If there are no
user events, we need to wait for the last event to complete and then set the marker event to complete.
Also add barrier_index for clEnqueueBarrier: it points into the user events and indicates how many
user events the enqueue APIs following clEnqueueBarrier should wait on.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:10:00 +0000 (16:10 +0800)]
Refine and fix some event bugs.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:09:59 +0000 (16:09 +0800)]
Remove non-used data in clEnqueueMapImage to fix, and fix a clGetEventInfo bug.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 17 Sep 2013 08:09:58 +0000 (16:09 +0800)]
Fix cl_mem_kernel_copy_image typo.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 11 Sep 2013 07:22:04 +0000 (15:22 +0800)]
change constant test case to cover short/long type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 11 Sep 2013 07:22:03 +0000 (15:22 +0800)]
Fix non-4byte program global constant issue.

We simply place array elements one after another, i.e. packed.
So the constant memory address should be calculated using the real type size.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Boqun Feng [Tue, 17 Sep 2013 03:41:50 +0000 (11:41 +0800)]
GBE: define python interpreter by cmake variable

In some distros, python is linked to python3, not
python2, and GBE can't be built on such distros
without modification.

CMake provides a variable, PYTHON_EXECUTABLE.
By default, this variable is the same as
`/usr/bin/env python`. If another python2
interpreter is needed, just add this definition to
the `cmake` command:

-DPYTHON_EXECUTABLE:FILEPATH=/path/to/python2

This changes PYTHON_EXECUTABLE to
/path/to/python2.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 11 Sep 2013 03:21:37 +0000 (11:21 +0800)]
add 64-bit version of "rhadd"

v2:
  keep highest carry bit

tested by piglit test cases:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-rhadd-1.0.generated.cl
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-rhadd-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
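rhadd(x, y) is (x + y + 1) >> 1 evaluated without intermediate overflow, so for 64-bit operands the 65th bit of the sum must survive the shift, which is what "keep highest carry bit" refers to. A minimal C sketch for ulong (hypothetical name `rhadd_u64`); a signed version needs the same idea plus sign handling, which is not shown.

```c
#include <stdint.h>

/* rhadd(x, y) = (x + y + 1) >> 1 computed on the 65-bit sum. */
uint64_t rhadd_u64(uint64_t x, uint64_t y)
{
    uint64_t s = x + y;
    uint64_t carry = s < x;      /* carry out of x + y */
    uint64_t s1 = s + 1;
    carry += s1 < s;             /* carry out of the +1 (only if s == UINT64_MAX) */
    /* At most one of the two carries can occur, so carry is 0 or 1. */
    return (s1 >> 1) | (carry << 63);
}
```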
Homer Hsing [Fri, 13 Sep 2013 01:41:02 +0000 (09:41 +0800)]
support converting 64-bit integer to 32-bit float

version 2:
  improve algorithm to convert signed integer
  fix source operand type in llvm_gen_backend
  enable predicate in addWithCarry
  change test case to test signed integer

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Mon, 9 Sep 2013 08:10:23 +0000 (16:10 +0800)]
Implement api clEnqueueCopyBufferToImage.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Sep 2013 08:10:22 +0000 (16:10 +0800)]
Implement api clEnqueueCopyImageToBuffer.

Also fix a 3D image error in the function cl_mem_kernel_copy_image.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 13 Sep 2013 06:06:59 +0000 (14:06 +0800)]
Implement api clEnqueueTask and clEnqueueNativeKernel.

Also refine the whole-memcpy condition in the functions
cl_enqueue_read_buffer_rect and cl_enqueue_write_buffer_rect.

V2: Add a mem_list to enqueue_data to fix a utest error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 13 Sep 2013 02:22:48 +0000 (10:22 +0800)]
add built-in function "atan2pi"

version 2: fix a typo, and add corner cases

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 12 Sep 2013 02:52:47 +0000 (10:52 +0800)]
Add the virtual dctr function of Serialization to kill warning.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Thu, 12 Sep 2013 06:06:18 +0000 (14:06 +0800)]
Add a test case for binary load.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 11 Sep 2013 10:07:51 +0000 (18:07 +0800)]
Implement clCreateProgramWithBinary to deserialize the binary.

For now we do not check the format of the binary.
We need to check the binary file format to handle the internal binary,
the LLVM binary, and invalid formats differently.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 11 Sep 2013 10:07:44 +0000 (18:07 +0800)]
Add one tool program to build and serialize the program.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Wed, 11 Sep 2013 10:07:39 +0000 (18:07 +0800)]
Add the serialization support for backend

The Serializable class defines the interface of the serialize_to/deserialize_from
functions for the internal binary and the LLVM binary, as well as a print-status
function for debugging.
Classes that need serialization support derive from it;
these include Program, Kernel, ConstantSet, ImageSet and SamplerSet.
This patch only adds serialize_to/deserialize_from support for the
internal binary to all these classes.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Wed, 11 Sep 2013 03:04:56 +0000 (11:04 +0800)]
add 64-bit version of "hadd"

v2:
  keep top carry bit

passed piglit test cases:

 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-hadd-1.0.generated.cl
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-hadd-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
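hadd(x, y) is (x + y) >> 1 computed without overflow. A minimal C sketch for ulong, showing both the carry-preserving form that "keep top carry bit" describes and the carry-free bit identity; both function names are hypothetical.

```c
#include <stdint.h>

/* The carry out of the 64-bit add becomes the new top bit after the shift. */
uint64_t hadd_u64(uint64_t x, uint64_t y)
{
    uint64_t s = x + y;            /* low 64 bits of the sum */
    uint64_t carry = s < x;        /* bit 64 of the true sum */
    return (s >> 1) | (carry << 63);
}

/* Carry-free identity: the common bits plus half of the differing bits. */
uint64_t hadd_u64_alt(uint64_t x, uint64_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}
```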
Homer Hsing [Mon, 2 Sep 2013 01:25:10 +0000 (09:25 +0800)]
support converting 64-bit integer to shorter integer

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Homer Hsing [Thu, 29 Aug 2013 05:41:24 +0000 (13:41 +0800)]
add built-in function "atan2"

Also improve the accuracy of the built-in function "atan".
Also add a test case.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yi Sun [Mon, 9 Sep 2013 08:54:12 +0000 (16:54 +0800)]
utest.cpp: run the cases with issues separately.

We should run both passed cases and failed cases via option '-c'.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Yang Rong [Mon, 9 Sep 2013 08:10:09 +0000 (16:10 +0800)]
Add api clEnqueueCopyImage.

Also make some minor changes:
1. Add an image variable name to the CHECK_IMAGE macro.
2. Fix a local size error in cl_mem_copy_buffer_rect.
3. Fix a cl_enqueue_write_image typo.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Sep 2013 08:10:08 +0000 (16:10 +0800)]
Add clEnqueueCopyBufferRect api.

Use an ND range enqueue to copy between the two buffers. For now the kernel is compiled
from a source string; once binary loading is ready, it should use a static binary.

V2: Add a comment for the function check_copy_overlap and rename CL_INVALID to CL_INTERNAL_KERNEL_MAX.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
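A minimal C sketch of a conservative overlap test for two rectangular regions of the same buffer; `copy_may_overlap` and `region_span` are hypothetical and are not beignet's check_copy_overlap. It compares the first and last bytes each region touches (origin[0] and region[0] in bytes, as in the buffer-rect APIs, with all region sizes non-zero), so disjoint spans are certainly safe, while intersecting spans are reported as overlapping even when the rows merely interleave.

```c
#include <stddef.h>

/* First and last byte offsets touched by a rectangular region. */
static void region_span(const size_t origin[3], const size_t region[3],
                        size_t row_pitch, size_t slice_pitch,
                        size_t *first, size_t *last)
{
    *first = origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
    *last  = (origin[2] + region[2] - 1) * slice_pitch
           + (origin[1] + region[1] - 1) * row_pitch
           + origin[0] + region[0] - 1;
}

/* Returns 1 if the byte spans of the source and destination intersect. */
int copy_may_overlap(const size_t src_origin[3], const size_t dst_origin[3],
                     const size_t region[3],
                     size_t src_row_pitch, size_t src_slice_pitch,
                     size_t dst_row_pitch, size_t dst_slice_pitch)
{
    size_t s0, s1, d0, d1;
    region_span(src_origin, region, src_row_pitch, src_slice_pitch, &s0, &s1);
    region_span(dst_origin, region, dst_row_pitch, dst_slice_pitch, &d0, &d1);
    return !(s1 < d0 || d1 < s0);
}
```

For example, copying the left 8 bytes of each 16-byte row onto the right 8 bytes is reported as overlapping although no byte is shared; an exact test has to compare rows (and slices) individually.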
Yang Rong [Wed, 4 Sep 2013 08:58:08 +0000 (16:58 +0800)]
Add clEnqueueWriteBufferRect api.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 4 Sep 2013 08:58:07 +0000 (16:58 +0800)]
Add clEnqueueReadBufferRect api.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 26 Aug 2013 14:45:48 +0000 (22:45 +0800)]
CL: Enable GL sharing with a new EGL extension.

The previous implementation only supports 2D/3D texture sharing and
is implemented in a hacky fashion. We need to replace it with a
clean and complete one. We introduce a new EGL extension to export
low-level layout information of a buffer object/texture/render buffer
from the mesa DRI driver to the CL driver layer. As the extension has
not been accepted by mesa, we have to implement this new extension
internally in beignet.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@inbox.com>
10 years agoRuntime: Only return the format allowed in the spec.
Zhigang Gong [Wed, 4 Sep 2013 07:04:03 +0000 (15:04 +0800)]
Runtime: Only return the format allowed in the spec.

For CL_INTENSITY and CL_LUMINANCE, the spec only allows the channel
data types CL_UNORM_INT8, CL_UNORM_INT16, CL_SNORM_INT8, CL_SNORM_INT16,
CL_HALF_FLOAT and CL_FLOAT.
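A minimal sketch of such a filter. It uses hypothetical stand-in enums in place of the real cl_channel_order and cl_channel_type constants from CL/cl.h, and format_allowed is likewise an invented name.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; real code would use the constants from <CL/cl.h>. */
enum channel_order { ORDER_R, ORDER_RGBA, ORDER_INTENSITY, ORDER_LUMINANCE };
enum channel_type {
    TYPE_UNORM_INT8, TYPE_UNORM_INT16, TYPE_SNORM_INT8, TYPE_SNORM_INT16,
    TYPE_HALF_FLOAT, TYPE_FLOAT, TYPE_SIGNED_INT32, TYPE_UNSIGNED_INT8
};

/* Rejects the data types the spec forbids for INTENSITY/LUMINANCE.
 * Other channel orders are not restricted by this particular check. */
bool format_allowed(enum channel_order order, enum channel_type type)
{
    if (order != ORDER_INTENSITY && order != ORDER_LUMINANCE)
        return true;
    switch (type) {
    case TYPE_UNORM_INT8: case TYPE_UNORM_INT16:
    case TYPE_SNORM_INT8: case TYPE_SNORM_INT16:
    case TYPE_HALF_FLOAT: case TYPE_FLOAT:
        return true;
    default:
        return false;
    }
}
```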

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: silence the compilation warning when generating the pch file.
Zhigang Gong [Tue, 3 Sep 2013 10:01:32 +0000 (18:01 +0800)]
GBE: silence the compilation warning when generating the pch file.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
10 years agofix 64-bit "clz" if parameter is "long4" or "ulong4"
Homer Hsing [Wed, 4 Sep 2013 02:33:39 +0000 (10:33 +0800)]
fix 64-bit "clz" if parameter is "long4" or "ulong4"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoImplement constant buffer based on constant cache.
Ruiling Song [Wed, 4 Sep 2013 06:24:54 +0000 (14:24 +0800)]
Implement constant buffer based on constant cache.

Currently, simply allocate enough graphics memory as the constant memory space
and bind it to BTI 2. Constant cache reads are backed by DWord scatter reads.
Unlike other data port messages, the addresses need to be DWord aligned,
and they are expressed in units of DWords.

The constant address space data are placed in order: first the global constants,
then the constant buffer kernel arguments.
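A sketch of the layout and addressing rule described above, assuming each item starts on a 4-byte boundary. layout_constant_space and byte_to_dword_address are hypothetical helpers, not beignet functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 4 bytes (assumed item alignment). */
static size_t align_up4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/* Global constants first, then __constant buffer arguments.
 * Fills offsets[i] with the byte offset of item i; returns the total size. */
size_t layout_constant_space(const size_t *global_sizes, size_t n_globals,
                             const size_t *arg_sizes, size_t n_args,
                             size_t *offsets)
{
    size_t cur = 0, k = 0;
    for (size_t i = 0; i < n_globals; i++, k++) {
        offsets[k] = cur;
        cur = align_up4(cur + global_sizes[i]);
    }
    for (size_t i = 0; i < n_args; i++, k++) {
        offsets[k] = cur;
        cur = align_up4(cur + arg_sizes[i]);
    }
    return cur;
}

/* The DWord scatter read takes DWord-aligned addresses in units of DWords. */
uint32_t byte_to_dword_address(size_t byte_offset)
{
    assert(byte_offset % 4 == 0);
    return (uint32_t)(byte_offset / 4);
}
```

With a 6-byte and a 4-byte global constant followed by an 8-byte constant argument, the items land at byte offsets 0, 8 and 12, and the argument is addressed as DWord 3.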

v2: rename functions and variables to make the distinction between 'curbe' and 'constant buffer' clear.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix atomic_xchg float type error.
Yang Rong [Wed, 4 Sep 2013 05:55:23 +0000 (13:55 +0800)]
Fix atomic_xchg float type error.

Also tidy up the "\" line continuations in some atomic macros.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUtests: Enable bool_cross_basic_block.
Zhigang Gong [Wed, 4 Sep 2013 06:35:18 +0000 (14:35 +0800)]
Utests: Enable bool_cross_basic_block.

Put it in the category with known issues, so it only runs
when the utests are invoked as below:
./utest_run -a

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUtests_run: Add known issue cases support.
Yi Sun [Wed, 4 Sep 2013 02:48:48 +0000 (10:48 +0800)]
Utests_run: Add known issue cases support.

Add some arguments:
  -c <casename>: run sub-case named 'casename'
  -l           : list all the available case name
  -a           : run all test cases
  -n           : run all test cases without known issue
  -h           : display this usage

Add an alternate macro named MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE to register a new test case that has a known issue which is not yet fixed.
When utest_run runs, only cases registered by MAKE_UTEST_FROM_FUNCTION are involved by default.
To run all the test cases, including those with known issues, use the argument '-a'.
Besides, you can use the option '-c' to run any single test case.
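A sketch of how such filtering could work. The registry struct, run_cases and the example case bodies are hypothetical; only the macro split, the option letters and the bool_cross_basic_block case come from this log.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical registry entry; known_issue mirrors the choice between
 * MAKE_UTEST_FROM_FUNCTION and MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE. */
struct utest_case {
    const char *name;
    void (*fn)(void);
    bool known_issue;
};

enum run_mode { RUN_DEFAULT, RUN_ALL, RUN_ONE };

static void passing_case(void) { /* hypothetical test body */ }
static void issue_case(void) { /* hypothetical test with a known issue */ }

static const struct utest_case registry[] = {
    { "passing_case", passing_case, false },
    { "bool_cross_basic_block", issue_case, true },
};

/* RUN_DEFAULT (-n): skip known-issue cases; RUN_ALL (-a): run everything;
 * RUN_ONE (-c name): run only the named case, known issue or not.
 * Returns the number of cases run. */
int run_cases(const struct utest_case *cases, size_t n,
              enum run_mode mode, const char *only)
{
    int ran = 0;
    for (size_t i = 0; i < n; i++) {
        if (mode == RUN_ONE && strcmp(cases[i].name, only) != 0)
            continue;
        if (mode == RUN_DEFAULT && cases[i].known_issue)
            continue;
        printf("running %s\n", cases[i].name);
        cases[i].fn();
        ran++;
    }
    return ran;
}
```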

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: fix the incorrect global mem size.
Zhigang Gong [Fri, 30 Aug 2013 05:23:06 +0000 (13:23 +0800)]
Runtime: fix the incorrect global mem size.

The max_mem_alloc_size is 128MB, so we should set the global mem size
less than or equal to it. Maybe both can be set much larger than
128MB in the future; for now, just set it to 128MB.
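For reference, a small C check of how the two device queries relate. It encodes the OpenCL 1.2 rule that CL_DEVICE_MAX_MEM_ALLOC_SIZE is at least max(global / 4, 128MB) and assumes a single allocation should not exceed the global memory size; mem_sizes_consistent is a hypothetical name. With both values set to 128MB, as this commit does, the check passes.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB (1024ull * 1024ull)

/* Hypothetical consistency check between global_mem_size and
 * max_mem_alloc_size, both in bytes. */
bool mem_sizes_consistent(uint64_t global_mem_size, uint64_t max_mem_alloc_size)
{
    uint64_t quarter = global_mem_size / 4;
    uint64_t min_alloc = quarter > 128 * MB ? quarter : 128 * MB;
    return max_mem_alloc_size >= min_alloc && max_mem_alloc_size <= global_mem_size;
}
```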

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoChange constant unit test to cover 4 byte data type.
Ruiling Song [Tue, 3 Sep 2013 07:42:37 +0000 (15:42 +0800)]
Change constant unit test to cover 4 byte data type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Enable DWord scatter gather message for constant cache read.
Ruiling Song [Tue, 3 Sep 2013 07:42:35 +0000 (15:42 +0800)]
GBE: Enable DWord scatter gather message for constant cache read.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix GPU data type for 16-bit moving
Homer Hsing [Wed, 4 Sep 2013 01:18:20 +0000 (09:18 +0800)]
fix GPU data type for 16-bit moving

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutest: memset the output buffer to fix random fail.
Ruiling Song [Tue, 3 Sep 2013 07:39:56 +0000 (15:39 +0800)]
utest: memset the output buffer to fix random fail.

Inactive lanes do not modify their corresponding output elements,
so the output buffer needs to be initialized to 0.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Support builtin vector functions for select() autogeneration.
Zhigang Gong [Tue, 3 Sep 2013 06:30:46 +0000 (14:30 +0800)]
GBE: Support builtin vector functions for select() autogeneration.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
10 years agoadd same type "convert_*(*)"
Homer Hsing [Tue, 3 Sep 2013 03:13:18 +0000 (11:13 +0800)]
add same type "convert_*(*)"

Add versions of "convert_*(*)" that convert a parameter to its own type.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix 32-bit signed version of "sub_sat"
Homer Hsing [Tue, 3 Sep 2013 00:41:28 +0000 (08:41 +0800)]
fix 32-bit signed version of "sub_sat"

This patch makes the following piglit test case pass:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-int-sub_sat-1.0.generated.cl
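For reference, the saturating results the piglit test checks can be written in portable C by widening to 64 bits. This sketches the expected semantics, not necessarily how beignet's builtin computes them; sub_sat_i32 is a hypothetical name.

```c
#include <stdint.h>

/* x - y clamped to [INT32_MIN, INT32_MAX]. Widening to 64 bits avoids
 * signed overflow, which is undefined behaviour in C. */
int32_t sub_sat_i32(int32_t x, int32_t y)
{
    int64_t r = (int64_t)x - (int64_t)y;
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}
```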

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd 64-bit version of "rotate"
Homer Hsing [Tue, 3 Sep 2013 00:20:35 +0000 (08:20 +0800)]
add 64-bit version of "rotate"

Tested by piglit; the following test cases pass:
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-rotate-1.0.generated.cl
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-rotate-1.0.generated.cl
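A portable C sketch of 64-bit rotate semantics: rotate left, with the shift count taken modulo 64. rotate_u64 is a hypothetical name; the signed long variant operates on the same bit pattern.

```c
#include <stdint.h>

/* Rotate v left by (i mod 64) bits. The "& 63" on the right shift keeps
 * the count below 64, since shifting by 64 is undefined in C. */
uint64_t rotate_u64(uint64_t v, uint64_t i)
{
    unsigned s = (unsigned)(i & 63);
    return (v << s) | (v >> ((64 - s) & 63));
}
```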

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd 64-bit version of "clz"
Homer Hsing [Mon, 2 Sep 2013 08:33:23 +0000 (16:33 +0800)]
add 64-bit version of "clz"

This patch makes the following piglit test cases pass:

 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-clz-1.0.generated.cl
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-clz-1.0.generated.cl
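One common way to build a 64-bit clz from 32-bit counts is sketched below. Whether beignet uses this exact split is an assumption; clz32 and clz64 are hypothetical helper names.

```c
#include <stdint.h>

/* Leading zeros of a 32-bit value; 32 for zero. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* If the high half is non-zero its count is the answer;
 * otherwise the result is 32 plus the count of the low half. */
unsigned clz64(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    return hi != 0 ? clz32(hi) : 32 + clz32(lo);
}
```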

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix 8-bit version of "clz"
Homer Hsing [Mon, 2 Sep 2013 08:21:25 +0000 (16:21 +0800)]
fix 8-bit version of "clz"

Fix a typo in ocl_stdlib.tmpl.h.
Fix the instruction type of 8-bit moves.

This patch is tested by piglit;
the following two test cases pass:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-char-clz-1.0.generated.cl
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-uchar-clz-1.0.generated.cl
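The commit does not say what the typo was. As a general illustration, an 8-bit clz can reuse a 32-bit count as long as the operand is zero-extended first; clz8 and clz32 are hypothetical names.

```c
#include <stdint.h>

/* Leading zeros of a 32-bit value; 32 for zero. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* Zero-extend, then discount the 24 extra high bits. Sign-extending a
 * negative char instead would set those bits and break the subtraction. */
unsigned clz8(int8_t c)
{
    uint32_t widened = (uint8_t)c;
    return clz32(widened) - 24;
}
```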

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd 64-bit version of "shuffle", "shuffle2"
Homer Hsing [Mon, 2 Sep 2013 05:42:35 +0000 (13:42 +0800)]
add 64-bit version of "shuffle", "shuffle2"
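For reference, the OpenCL semantics of shuffle and shuffle2 for 4-element 64-bit vectors can be sketched in C with arrays standing in for long4/ulong4. The function names are hypothetical.

```c
#include <stdint.h>

/* shuffle: for a 4-element input only the low 2 bits of each mask
 * element select a component of x. */
void shuffle_u64x4(const uint64_t x[4], const uint64_t mask[4], uint64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = x[mask[i] & 3];
}

/* shuffle2: the low 3 bits index the 8-element concatenation of x and y. */
void shuffle2_u64x4(const uint64_t x[4], const uint64_t y[4],
                    const uint64_t mask[4], uint64_t out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned m = (unsigned)(mask[i] & 7);
        out[i] = m < 4 ? x[m] : y[m - 4];
    }
}
```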

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd scalar version of "convert_*(*)"
Homer Hsing [Mon, 2 Sep 2013 05:21:47 +0000 (13:21 +0800)]
Add scalar version of "convert_*(*)"

The scalar versions of "convert_*(*)" were missing.
This patch adds them.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix scalar type built-in function "select"
Homer Hsing [Mon, 2 Sep 2013 04:32:40 +0000 (12:32 +0800)]
fix scalar type built-in function "select"

Add some missing scalar type versions.

v2: third parameter of "select" cannot be "float"
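The scalar and vector forms of select test c differently, which is easy to get wrong when adding scalar overloads. A C sketch of the reference semantics, with arrays standing in for float4/int4 and hypothetical function names:

```c
#include <stdint.h>

/* Scalar form: any non-zero c selects b. */
float select_scalar(float a, float b, int32_t c)
{
    return c ? b : a;
}

/* Vector form: each lane is chosen by the most significant bit of c[i].
 * For int32_t, "MSB set" is the same as "negative". */
void select_vec4(const float a[4], const float b[4], const int32_t c[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (c[i] < 0) ? b[i] : a[i];
}
```

Note that c = 1 picks b in the scalar form but a in the vector form. In both forms c is an integer type, never float.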

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd 64-bit version of "bitselect"
Homer Hsing [Mon, 2 Sep 2013 02:59:51 +0000 (10:59 +0800)]
add 64-bit version of "bitselect"
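The reference semantics of bitselect for 64-bit operands fit in a one-line C sketch (bitselect_u64 is a hypothetical name):

```c
#include <stdint.h>

/* Each result bit comes from b where the matching bit of c is 1,
 * otherwise from a. */
uint64_t bitselect_u64(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & ~c) | (b & c);
}
```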

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: fix the max group size for GT2.
Zhigang Gong [Fri, 30 Aug 2013 03:16:24 +0000 (11:16 +0800)]
Runtime: fix the max group size for GT2.

We should keep the max group size and CL_KERNEL_WORK_GROUP_SIZE
consistent with each other. Otherwise, the conformance test will trigger
an error.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
10 years agoGBE: We should set no predication/mask for EOT preparation.
Zhigang Gong [Thu, 29 Aug 2013 06:35:01 +0000 (14:35 +0800)]
GBE: We should set no predication/mask for EOT preparation.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>