contrib/beignet.git
11 years agotest case for function "rotate"
Homer Hsing [Thu, 27 Jun 2013 02:58:16 +0000 (10:58 +0800)]
test case for function "rotate"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
11 years agosupport build-in function "rotate"
Homer Hsing [Thu, 27 Jun 2013 02:58:15 +0000 (10:58 +0800)]
support build-in function "rotate"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
11 years agoRefine the get_local_id/... builtins.
Zhigang Gong [Tue, 25 Jun 2013 06:15:09 +0000 (14:15 +0800)]
Refine the get_local_id/... builtins.

As we could prepare correct value on runtime library side and give
a correct value in the payload for dim 0, 1 and 2. So for these 3
dim argument, we don't need to check it whether in the valid range,
we just read the payload's value.

This way, we can avoid any unecessary branching for normal usage of
these builtin functions. And could avoid a known bool related bug.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Sun, Yi <yi.sun@intel.com>
11 years agoutests: Add basic arithmetic test case
Ruiling Song [Wed, 26 Jun 2013 07:52:13 +0000 (15:52 +0800)]
utests: Add basic arithmetic test case

test case for + - * / % of data type (u)int8/16/32
remove duplicated cases.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Add more support of char and short arithmetic
Ruiling Song [Wed, 26 Jun 2013 07:52:12 +0000 (15:52 +0800)]
GBE: Add more support of char and short arithmetic

add * / % support of char and short type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agotest case for function "clz"
Homer Hsing [Wed, 26 Jun 2013 05:11:54 +0000 (13:11 +0800)]
test case for function "clz"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
11 years agosupport zero bit counting
Homer Hsing [Wed, 26 Jun 2013 05:11:53 +0000 (13:11 +0800)]
support zero bit counting

support OpenCL built-in function "clz", returning number of leading zero bits

add GEN GPU "fbh", "fbl" instructions, for counting zero bits

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
11 years agoAdd the test case for clGetProgramInfo API
Junyan He [Wed, 26 Jun 2013 03:34:20 +0000 (11:34 +0800)]
Add the test case for clGetProgramInfo API

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd the test case for clGetCommandQueueInfo API
Junyan He [Tue, 25 Jun 2013 10:15:32 +0000 (18:15 +0800)]
Add the test case for clGetCommandQueueInfo API

Because all the get clGetXXXInfo API have similar
structure in function type, we will integrate them
together, and rename the get_program_info.cpp to get_cl_info.cpp

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd the support of the API: clGetCommandQueueInfo
Junyan He [Tue, 25 Jun 2013 10:15:25 +0000 (18:15 +0800)]
Add the support of the API: clGetCommandQueueInfo

Though we support get the CL_QUEUE_PROPERTIES,
but because the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
and CL_QUEUE_PROFILING_ENABLE will never be set when
create the queue, it just return a all 0 bitfield now.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
11 years agoDocs: Rearrange documents according to wiki page on fd.o.
Zhigang Gong [Tue, 25 Jun 2013 10:04:53 +0000 (18:04 +0800)]
Docs: Rearrange documents according to wiki page on fd.o.

We use fd.o wiki to host our documents. To make the maintainence easier,
I change the direcotry and some links in the markdown files. And make
them the same as fd.o's.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Enable the local memory barrier test case.
Zhigang Gong [Tue, 25 Jun 2013 06:24:44 +0000 (14:24 +0800)]
utests: Enable the local memory barrier test case.

And fix one typo. Without the local memory barrier work around,
this test case will fail.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: work around the local memory barrier fence issue.
Zhigang Gong [Tue, 25 Jun 2013 06:22:43 +0000 (14:22 +0800)]
GBE: work around the local memory barrier fence issue.

According to the spec, we do not need to add a memory fence
for SLM's access. But, in pratice, we have to add it to make
sure get correct result. Don't know the root cause yet, just
use this work around now.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fix a bug in encoding MATH instruction
Ruiling Song [Mon, 24 Jun 2013 02:53:57 +0000 (10:53 +0800)]
GBE: Fix a bug in encoding MATH instruction

For std::vector, a push_back may cause memory relocation if no enough
memory in the vector pool. And iterator or pointer got previously will
become invalid after relocation.

Here in GenEncoder::next(), which will call push_back(), memory
relocation may occur. Then relocation will make 'insn' point to
invalid memory that does not belong to GenEncoder::store anymore.
Then, the setting of execution_width will fail.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Fix a typo.
Zhigang Gong [Mon, 24 Jun 2013 06:57:45 +0000 (14:57 +0800)]
utests: Fix a typo.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd one test case for clGetProgramInfo
Junyan He [Thu, 20 Jun 2013 05:58:14 +0000 (13:58 +0800)]
Add one test case for clGetProgramInfo

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix several typos in unit test.
Ruiling Song [Wed, 19 Jun 2013 02:04:54 +0000 (10:04 +0800)]
Fix several typos in unit test.

compiler_sub_bytes and compiler_sub_shorts

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoEnable generation of convert_ and as_ functions for double
Simon Richter [Wed, 19 Jun 2013 10:17:58 +0000 (12:17 +0200)]
Enable generation of convert_ and as_ functions for double

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoDefine double vector types
Simon Richter [Wed, 19 Jun 2013 10:17:57 +0000 (12:17 +0200)]
Define double vector types

Add the definition for the "doubleN" vector types

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoEnable cl_khr_fp64 extension for OpenCL stdlib header
Simon Richter [Wed, 19 Jun 2013 10:17:56 +0000 (12:17 +0200)]
Enable cl_khr_fp64 extension for OpenCL stdlib header

This allows the stdlib header to define overloads for doubles.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd long and ulong types to generated functions.
Simon Richter [Thu, 13 Jun 2013 11:15:52 +0000 (13:15 +0200)]
Add long and ulong types to generated functions.

This enables all generated functions for 64 bit integers.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@linux.intel.com>
11 years agoDefine all convert_* functions.
Simon Richter [Thu, 13 Jun 2013 11:15:51 +0000 (13:15 +0200)]
Define all convert_* functions.

These functions convert between vectors of the same length by casting each
member in turn.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@linux.intel.com>
11 years agoGenerate all supported as_* functions
Simon Richter [Thu, 13 Jun 2013 11:15:50 +0000 (13:15 +0200)]
Generate all supported as_* functions

This adds support for all reinterpreting type conversions currently
possible.

The conversion functions can be updated by invoking the
update_as.sh script.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@linux.intel.com>
11 years agoadd a lost special register name
Homer Hsing [Fri, 21 Jun 2013 01:48:32 +0000 (09:48 +0800)]
add a lost special register name

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agotest cases for 64-bit float
Homer Hsing [Fri, 21 Jun 2013 04:26:32 +0000 (12:26 +0800)]
test cases for 64-bit float

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
11 years agoSupport 64-bit float
Homer Hsing [Fri, 21 Jun 2013 04:26:31 +0000 (12:26 +0800)]
Support 64-bit float

support:
  arithmetic(+ - *)
  store load
  immediate_value
  if else
  select

other change:
  add "nib control" field in machine instruction format
  support "nib control"
  fix "directly store float-64 after load float-64".
  change hard coded store size (4) to flexible size (4 or 8)
  when using float-64 load(store), change to SIMD8

example:

/* support arithmetic store load immediate_value */
kernel void f(global double *src, global double *dst) {
  int i = get_global_id(0);
  double d = 1.234567890123456789;
  dst[i] = d * (src[i] + d);
}

/* support if else */
kernel void f(global float *src, global double *dst) {
  int i = get_global_id(0);
  float d = 1.234567890123456789f;
  if (i < 14)
    dst[i] = d * (d + src[i]);
  else
    dst[i] = 14;
}

/* support select */
kernel void f(global float *src, global double *dst) {
  int i = get_global_id(0);
  float d = 1.234567890123456789f;
  dst[i] = i < 14 ? d : 14;
}

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: Fixed one bug in scalarize pass
Zhigang Gong [Wed, 19 Jun 2013 10:30:47 +0000 (18:30 +0800)]
GBE: Fixed one bug in scalarize pass

I met segfault at void Scalarize::dce() randomly when I integrate
a openCL kernel to the chromium's GPU process. After discuss with
Yang Rong, I found one bug in this function. As it use two loops
to erase the dead instructions, but it doesn't set the pointer to
NULL at the first loop when it already erased the instruction. Thus
at the second loop, when it call (*i)->getParent, the (*i) may
already be deleted then it may refer a freed region and may cause
segfault.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoFix some piglit constant buffer tests fail.
Yang Rong [Wed, 19 Jun 2013 07:36:34 +0000 (15:36 +0800)]
Fix some piglit constant buffer tests fail.

If indirect move's source is scalrar reg, such as using cb[0] in kernel,
should not unpack.

Change test case compiler_function_constant0 to trigger this bug.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoupdate to OpenCL 1.1 header
Homer Hsing [Wed, 19 Jun 2013 06:49:51 +0000 (14:49 +0800)]
update to OpenCL 1.1 header

Based on our current progress, we plan to implement OpenCL 1.1 rather
than OpenCL 1.2 for the next release. Thus we downgrade the header file
to 1.1 in this commit.

put OpenCL 1.1 header in include/CL/

add OpenCL 1.2 defs, used by some code, by
 "#ifndef CL_VERSION_1.2"
 " some OpenCL 1.2 defs"
 "#endif"

add OCL_CREATE_IMAGE2D, OCL_CREATE_IMAGE3D,
    OCL_CREATE_GL_IMAGE2D, OCL_CREATE_GL_IMAGE3D

update test cases

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd the support of clGetProgramBuildInfo and clGetProgramInfo
Junyan He [Tue, 18 Jun 2013 08:44:49 +0000 (16:44 +0800)]
add the support of clGetProgramBuildInfo and clGetProgramInfo

For clGetProgramBuildInfo,
CL_BUILD_IN_PROGRESS not support now
and CL_PROGRAM_BUILD_LOG need do add the info collection
logic in backend and not support too, just return null
string now.
clGetProgramInfo all are fully supported.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
11 years agoSupport multi-source CL program
Ruiling Song [Wed, 19 Jun 2013 02:30:21 +0000 (10:30 +0800)]
Support multi-source CL program

Also fix several errors on clBuildProgram.

v2: minor refine of cl_program_release_sources()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@linux.intel.com>
11 years agoutests: Add a new local memory barrier case
Yang Rong [Tue, 18 Jun 2013 09:31:10 +0000 (17:31 +0800)]
utests: Add a new local memory barrier case

We fail this case right now, disalbe it. And need more work
to check the root cause. Change the local memory barrier can
pass it, but it doesn't comply with BSPEC which says SLM doesn't
need a memory fence.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoUtest: Add a test case for validating built-in function get_global_size()
Yi Sun [Thu, 13 Jun 2013 07:12:44 +0000 (15:12 +0800)]
Utest: Add a test case for validating built-in function get_global_size()

v1:
According to the OpenCL v1.1 & v1.2 chapter 6.11, the behavior of function get_global_size should be as following:

get_global_size(-1) = 1 (dimension:1)
get_global_size(0) = 3 (dimension:1)
get_global_size(1) = 1 (dimension:1)
get_global_size(2) = 1 (dimension:1)

get_global_size(-1) = 1 (dimension:2)
get_global_size(0) = 3 (dimension:2)
get_global_size(1) = 4 (dimension:2)
get_global_size(2) = 1 (dimension:2)
get_global_size(3) = 1 (dimension:2)

get_global_size(-1) = 1 (dimension:3)
get_global_size(0) = 3 (dimension:3)
get_global_size(1) = 4 (dimension:3)
get_global_size(2) = 5 (dimension:3)
get_global_size(3) = 1 (dimension:3)
get_global_size(4) = 1 (dimension:3)

if defined:
  globals[0] = 3;
  globals[1] = 4;
  globals[2] = 5;

v2:
    Handle different version.

    Add #if and #elif to make the test case be suitable to different version.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed GBE: Fix some builtin functions' return value.
Zhigang Gong [Mon, 17 Jun 2013 09:50:35 +0000 (17:50 +0800)]
GBE: Fixed GBE: Fix some builtin functions' return value.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Sun, Yi <yi.sun@intel.com>
11 years agoutests: enable test case for global memory barrier.
Zhigang Gong [Mon, 17 Jun 2013 09:35:44 +0000 (17:35 +0800)]
utests: enable test case for global memory barrier.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd memory fence before barrier to support global memory barrier.
Yang Rong [Mon, 17 Jun 2013 07:13:05 +0000 (15:13 +0800)]
Add memory fence before barrier to support global memory barrier.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd test case for null kernel arg of global/constant buffer
Ruiling Song [Fri, 14 Jun 2013 08:32:54 +0000 (16:32 +0800)]
add test case for null kernel arg of global/constant buffer

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in SetKernelArg() and support NULL buffer argument
Ruiling Song [Fri, 14 Jun 2013 08:32:53 +0000 (16:32 +0800)]
Refine error check in SetKernelArg() and support NULL buffer argument

1. refine error check in clSetKernelArg() to follow spec.
2. add support NULL buffer as argument, so user could write like below:
    __kernel void func(__global int * p1, constant int* p2) {
      if(p1) {
        //do some thing if p1 is not NULL.
      } else {
        //do other things if p1 is NULL
      }
    }
    Then calling clSetKernelArg(k, 0, sizeof(cl_mem), NULL);
3. as there maybe small possibility that we get a 0 starting address for bo,
   we add an assert() to make sure we do not get a 0 addressed bo.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clCreateProgramFromSource() and clCreateKernel()
Ruiling Song [Thu, 13 Jun 2013 01:40:17 +0000 (09:40 +0800)]
Refine error check in clCreateProgramFromSource() and clCreateKernel()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clCreateCommandQueue()
Ruiling Song [Thu, 13 Jun 2013 01:40:16 +0000 (09:40 +0800)]
Refine error check in clCreateCommandQueue()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Add test case for global memory barrier
Dag Lem [Mon, 10 Jun 2013 06:55:48 +0000 (08:55 +0200)]
utests: Add test case for global memory barrier

Disable this test case currently, as we don't support
global memory barrier now.

Signed-off-by: Dag Lem <dag@nimrod.no>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Added one test case for clFinish().
Edward Ching [Thu, 13 Jun 2013 07:05:13 +0000 (15:05 +0800)]
utests: Added one test case for clFinish().

Reviewed-by: Zou, Nanhai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCL: Fix the bug in clfinish.
Zou, Nanhai [Thu, 13 Jun 2013 06:39:11 +0000 (14:39 +0800)]
CL: Fix the bug in clfinish.

The previous implementation forget to set the last batch
buffer, so it always does nothing at clFinish(). Now we
move the gpu sync to driver side, and set the last batch
to proper buffer object and make clFinish work as expected.

Reported and tested by: Edward Ching <edward.k.ching@gmail.com>

Signed-off-by: Zou, Nanhai <nanhai.zou@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix two tests fail when OCL_SIMD_WIDTH=8.
Yang Rong [Sat, 8 Jun 2013 04:33:37 +0000 (12:33 +0800)]
Fix two tests fail when OCL_SIMD_WIDTH=8.

Add barrier for compiler_local_memory and compiler_local_memory_two_ptr,
otherwise tests may fail if work group size bigger than thread's simd size.

After add barrier, the test compiler_local_memory is same as
compiler_local_memory_barrier, so delete test compiler_local_memory.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd mem_fence built-in.
Yang Rong [Sat, 8 Jun 2013 04:33:36 +0000 (12:33 +0800)]
Add mem_fence built-in.

According openCL spec, mem_fence is used to orders loads and stores of a work-item.
This is guarantee by hardware, so add empty mem_fence functions.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed a bug in register expieration.
Zhigang Gong [Sun, 9 Jun 2013 07:27:20 +0000 (15:27 +0800)]
GBE: Fixed a bug in register expieration.

Some special register which are preallocated at g0 and not
managed by our register allocator. So we need to ignore
them when process the register expieration. Otherwise,
it will trigger an assert latter.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoRefine error check in clCreateContext()
Ruiling Song [Thu, 6 Jun 2013 08:44:27 +0000 (16:44 +0800)]
Refine error check in clCreateContext()

v2: check is also needed in clCreateContextFromType()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clGetDeviceIDs()
Ruiling Song [Thu, 6 Jun 2013 07:07:17 +0000 (15:07 +0800)]
Refine error check in clGetDeviceIDs()

move error check to api level. correctly handle mixed device type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clGetPlatformIDs() and clGetPlatformInfo()
Ruiling Song [Thu, 6 Jun 2013 07:07:16 +0000 (15:07 +0800)]
Refine error check in clGetPlatformIDs() and clGetPlatformInfo()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoimprove disassembling GPU binary code
Homer Hsing [Thu, 6 Jun 2013 06:28:44 +0000 (14:28 +0800)]
improve disassembling GPU binary code

support 64-bit float data type
not print prefix zero for unsigned hex immediate constant
display SIMD mode, category, message type in data port message

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCheck for exhaustion of local memory
Dag Lem [Wed, 5 Jun 2013 17:57:26 +0000 (19:57 +0200)]
Check for exhaustion of local memory

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoPad instruction stream with 8 nops;
Zou Nan hai [Wed, 5 Jun 2013 01:09:57 +0000 (09:09 +0800)]
Pad instruction stream with 8 nops;

   According to Bspec, EU may access 8 instrunctions beyond the
   kernel program because of prefetch.
   Pad the instruction stream with 8 nop to avoid access into an
   invalide page.

Signed-off-by: Zou Nanhai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdaptions for LLVM 3.3 / SPIR
Dag Lem [Mon, 3 Jun 2013 09:09:56 +0000 (11:09 +0200)]
Adaptions for LLVM 3.3 / SPIR

Handle the fact that several include files are moved from llvm/ to
llvm/IR/ in LLVM 3.3.

"__attribute__((always_inline)) inline" no longer works as intended,
and is replaced by "inline __attribute__((always_inline))".

For LLVM 3.3, the target is changed from "nvptx" to "spir", and
built-in address space qualifiers are used. For now, the built-in
types image2d_t, image3d_t, sampler_t, and event_t are overridden by
defines.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
Tested-by: Yang, Rong R <rong.r.yang@intel.com>
11 years agoutests: change all kernels to unix style text.
Zhigang Gong [Wed, 5 Jun 2013 03:07:43 +0000 (11:07 +0800)]
utests: change all kernels to unix style text.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
Tested-by: Yang, Rong R <rong.r.yang@intel.com>
11 years agoutest: Add test case for build-in function get_work_dim
Yi Sun [Tue, 4 Jun 2013 09:34:30 +0000 (17:34 +0800)]
utest: Add test case for build-in function get_work_dim

v2: Refine the case, verifying the result of function get_work_dim.

v3: Since the 16 work group size limitation is fixed, re-side the global size and local size with 1.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Add two builtin functions get_work_dim / get_global_offset.
Zhigang Gong [Fri, 31 May 2013 04:56:57 +0000 (12:56 +0800)]
GBE: Add two builtin functions get_work_dim / get_global_offset.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yi Sun <yi.sun@intel.com>
11 years agotest case for DW multiplication
Homer Hsing [Tue, 4 Jun 2013 07:52:13 +0000 (15:52 +0800)]
test case for DW multiplication

This case tests whether a predication bug of DW multiplication
had been fixed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoEnable built-in sub_sat/add_sat vector type.
Ruiling Song [Tue, 4 Jun 2013 06:45:34 +0000 (14:45 +0800)]
Enable built-in sub_sat/add_sat vector type.

also fix a typo in DECL_VECTOR_20P, DECL_VECTOR_30P macro.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: fixed a predication bug for DW multiplication.
Zhigang Gong [Tue, 4 Jun 2013 03:00:23 +0000 (11:00 +0800)]
GBE: fixed a predication bug for DW multiplication.

Per bspec:
  mul (8) acc0:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
  mach (8) rTemp<1>:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
  mov (8) r5.0<1>:d rTemp<8;8,1>:d // High 32 bits
  mov (8) r6.0<1>:d acc0:d // Low 32 bits

  The mul and mach instructions must have all channels enabled.
  The first mov should have channel enable from the destHI of IMUL,
  the second mov should have the channel enable from the destLO of IMUL.

We need to disable the predication and the mask rather than only set noMask to 1.
The strange thing here is for the first quarter, it seems we don't need to do so.
As change both quarter to this style will waste some registers which cause some
kernels fail to compile (compiler_box_blur.cl), I just change the second quater
to fully comply with bspec here. And in practice, it works fine with all unit
test cases and Homer's specific test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
11 years agoAdd test case for group size
Ruiling Song [Fri, 31 May 2013 07:58:31 +0000 (15:58 +0800)]
Add test case for group size

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoSupport non-16 multiple group size.
Ruiling Song [Fri, 31 May 2013 07:58:08 +0000 (15:58 +0800)]
Support non-16 multiple group size.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed a 3 elements vector load/store bug.
Zhigang Gong [Fri, 31 May 2013 02:09:56 +0000 (10:09 +0800)]
GBE: Fixed a 3 elements vector load/store bug.

Per OpenCL spec, for 3-component vector data types,the
size of the data type is 4 * sizeof(component). And llvm
FE really cast a type3 data to type4 data for load/store
instruction, thus break our implementation. We need to
fixup it to the actual element size.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoFix a random assert caused by scalarize pass.
Yang Rong [Fri, 31 May 2013 07:19:09 +0000 (15:19 +0800)]
Fix a random assert caused by scalarize pass.

Revome the dead values in unit.valueMap at each begin of pass to avoid
the new value have some address.
Also fix a typo

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Add test case for box blur on image buffer
Dag Lem [Sat, 25 May 2013 08:14:17 +0000 (10:14 +0200)]
utests: Add test case for box blur on image buffer

This test demonstrates box blur on an image buffer, using an extremely
simple kernel.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect sampler address clamping for read image functions.
Dag Lem [Wed, 29 May 2013 21:49:40 +0000 (23:49 +0200)]
Correct sampler address clamping for read image functions.

Integer coordinates were converted to unsigned integers, yielding
incorrect clamping of negative coordinates.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed a hang issue on 64bit system.
Zhigang Gong [Thu, 30 May 2013 03:38:07 +0000 (11:38 +0800)]
GBE: Fixed a hang issue on 64bit system.

As the npos is a (size_t)-1, we have to change the start and
end to size_t type. Otherwise, the comparison will always
be true, and it will hang there forever.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Boqun Feng <boqun.feng@intel.com>
11 years agoPass user options to clang.
Yang Rong [Wed, 29 May 2013 03:20:47 +0000 (11:20 +0800)]
Pass user options to clang.

clang do not support all options in opencl.
clang unsupport options:
       -cl-denorms-are-zero, -cl-strict-aliasing
       -cl-no-signed-zeros, -cl-fp32-correctly-rounded-divide-sqrt
all support options, refer to clang/include/clang/Driver/Options.inc
Can ignore these options to avoid warning.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix int div/rem assert in SIMD8 mode.
Yang Rong [Wed, 29 May 2013 02:09:38 +0000 (10:09 +0800)]
Fix int div/rem assert in SIMD8 mode.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: test vector load and store.
Zhigang Gong [Fri, 24 May 2013 09:42:18 +0000 (17:42 +0800)]
utests: test vector load and store.

Add float4/short4/char4 test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: support load/store of char/short vector.
Zhigang Gong [Fri, 24 May 2013 09:40:54 +0000 (17:40 +0800)]
GBE: support load/store of char/short vector.

We just load/store those vector elements one by one.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoUpdate documents.
Zhigang Gong [Fri, 24 May 2013 04:02:15 +0000 (12:02 +0800)]
Update documents.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: use the simd width environment variable if set.
Zhigang Gong [Fri, 24 May 2013 04:02:14 +0000 (12:02 +0800)]
GBE: use the simd width environment variable if set.

If the user set OCL_SIMD_WIDTH to 8 or 16, we force the backend
to use it. Without this patch, the OCL_SIMD_WIDTH never have a
chance to take effect, as we always set the functions simdWidth
before we create a gen context.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: fixed a bug on simd8 mode for typed_write instruction.
Zhigang Gong [Fri, 24 May 2013 04:02:13 +0000 (12:02 +0800)]
GBE: fixed a bug on simd8 mode for typed_write instruction.

On simd8 mode, the total src length equal to 17 thus, we have
to adjust the max src num here, and adjust the bit width of
the srcNum from 4 to 5. After this fix, all the image/gl
related test case could pass on both simd8 and simd16 mode.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: Fixed a bug in byte gather/scatter.
Zhigang Gong [Thu, 23 May 2013 03:10:34 +0000 (11:10 +0800)]
GBE: Fixed a bug in byte gather/scatter.

We can't just use the alignment to determine whether use
the gather/scatter or not. As for a short type, the compiler
may also generate a 4 alignment, thus it will trigger this
bug.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGracefully handle unsupported systems
Mario Kicherer [Tue, 28 May 2013 18:25:39 +0000 (20:25 +0200)]
Gracefully handle unsupported systems

Gracefully handle cases with unsupported or unreachable GPUs.
Improved error handling in cl_get_device_ids.

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer
Dag Lem [Sat, 25 May 2013 09:34:26 +0000 (11:34 +0200)]
Correct clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer

This implements handling of the offset parameter, and adds sanity
checks according to spec.

A bug is fixed in clEnqueueReadBuffer, where the buffer was not
unmapped after copying.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Correct box blur
Dag Lem [Sat, 25 May 2013 08:14:16 +0000 (10:14 +0200)]
utests: Correct box blur

The box blur test kernel incorrectly calculates the bottom extents of
the image. This yields visible blocking artifacts in the top of the
test image (since BMP images are stored bottom to top).

These calculations are corrected, other extent calculations are
simplified, and some dead code is removed.

The reference image is corrected accordingly, and is now identical to the
reference image for the box blur float buffer test.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoChange CMake to support LLVM 3.1.
Yang Rong [Thu, 23 May 2013 05:48:47 +0000 (13:48 +0800)]
Change CMake to support LLVM 3.1.

Also fix LLVM 3.1 build errors caused by my vector scalarize commit.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoChange clang system call to libclang api call.
Yang Rong [Thu, 23 May 2013 05:48:46 +0000 (13:48 +0800)]
Change clang system call to libclang api call.

The original call clang command directly as frontend. The implement is not very flexible.
I change to call libclang apis, support both clang 3.1 and clang 3.2.
Now still write the intermediate to the file, for code simply.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd ICD dispatch table in cloned kernels
Simon Richter [Wed, 22 May 2013 14:38:16 +0000 (16:38 +0200)]
Add ICD dispatch table in cloned kernels

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix several CL error code return bugs
Dag Lem [Wed, 22 May 2013 19:02:20 +0000 (21:02 +0200)]
Fix several CL error code return bugs

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix a scalarize pass bug.
Yang Rong [Wed, 22 May 2013 08:49:06 +0000 (16:49 +0800)]
Fix a scalarize pass bug.

Need to clear vectors and maps after finish one function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)
Dag Lem [Wed, 22 May 2013 06:36:14 +0000 (08:36 +0200)]
Correct clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)

The current implementation of clCreateImage initializes images from
host memory using one of the functions cl_mem_copy_data_linear,
cl_mem_copy_data_tilex and cl_mem_copy_data_tiley.

This yields garbled images on some platforms, since the tiled formats
do not always correspond to the formats assumed by the functions
above.

This is fixed by replacing the functions above with the new function
cl_mem_copy_image, whichs maps tiled images in GTT mode for copying.

cl_mem_copy_image also implements missing 3D image copy functionality
(image buffers should also be copied correctly, if/when they are
allowed).

Signed-off-by: Dag Lem <dag@nimrod.no>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: extent get_image_size cases to other informations..
Zhigang Gong [Mon, 20 May 2013 08:41:28 +0000 (16:41 +0800)]
utests: extent get_image_size cases to other informations..

Extent it to test all the supported image informations.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoAdd more get image info functions.
Zhigang Gong [Mon, 20 May 2013 08:38:51 +0000 (16:38 +0800)]
Add more get image info functions.

Add get image depth/channel data type/channel order/dim support.
Now, only those functions for the unsupported image type have not
been implemented. The unsupported image types are as below:
image1d_t,image1d_buffer_t,image1d_array_t,image2d_array_t.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoCorrected return of error code in clCreateImage.
Dag Lem [Mon, 20 May 2013 13:08:39 +0000 (15:08 +0200)]
Corrected return of error code in clCreateImage.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImplement clEnqueueReadImage and clEnqueueWriteImage.
Dag Lem [Mon, 20 May 2013 10:28:28 +0000 (12:28 +0200)]
Implement clEnqueueReadImage and clEnqueueWriteImage.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImplement clEnqueueMapImage.
Dag Lem [Mon, 20 May 2013 10:28:27 +0000 (12:28 +0200)]
Implement clEnqueueMapImage.

For correct mapping and unmapping of tiled/untiled buffers, the new
functions cl_mem_map_auto and cl_mem_unmap_auto are used in
clEnqueueMapBuffer, clEnqueueMapImage, and clEnqueueUnmapMemObject.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCL: Fixed a get image info bug on 64 bit system.
Zhigang Gong [Mon, 20 May 2013 07:48:16 +0000 (15:48 +0800)]
CL: Fixed a get image info bug on 64 bit system.

Forgot to merge it with previous patch. Now commit this one
separately to fix the bug on 64 bit system.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <simon.richter@hogyros.de>
11 years agoAdd CL/*.hpp to installing files
Boqun Feng [Mon, 20 May 2013 03:55:58 +0000 (11:55 +0800)]
Add CL/*.hpp to installing files

The orignal CMake config only installed CL/*.h. As C++ binding is added,
CL/*.hpp also need to be installing files.

Signed-off-by: Boqun Feng <boqun.feng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoSave depth and slice_pitch in cl_mem images.
Dag Lem [Fri, 17 May 2013 16:06:13 +0000 (18:06 +0200)]
Save depth and slice_pitch in cl_mem images.

Signed-off-by: Dag Lem <dag@nimrod.no>
11 years agoutests: add one test case to test get_image_width/height.
Zhigang Gong [Fri, 17 May 2013 08:37:31 +0000 (16:37 +0800)]
utests: add one test case to test get_image_width/height.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCL: complete get_image_width/height support at runtime side.
Zhigang Gong [Fri, 17 May 2013 08:34:39 +0000 (16:34 +0800)]
CL: complete get_image_width/height support at runtime side.

We get each image's information's curbe slot whcih is
actually used at kernel side, then fill the corresponding
value.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Add support for get_image_width/height.
Zhigang Gong [Fri, 17 May 2013 08:31:17 +0000 (16:31 +0800)]
GBE: Add support for get_image_width/height.

Implement the new GEN IR instruction GetImageInfo. This new instruction
will allocate curbe slot for a given image surface only if it really
referred by a get_image_xxx function. As one image information element
only takes 4 bytes, it's relatively cheap to keep it in the curbe region.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: preare for get_image_xxx functions support.
Zhigang Gong [Thu, 16 May 2013 03:12:54 +0000 (11:12 +0800)]
GBE: preare for get_image_xxx functions support.

Only create the prototypes and the corresponding __gen_ocl_xxx
intrinsics. After this commit, the get_image_width/height could
pass the build, but will not generate code.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRemove useless vector check in GenWriter after scalarize pass.
Yang Rong [Fri, 17 May 2013 07:11:52 +0000 (15:11 +0800)]
Remove useless vector check in GenWriter after scalarize pass.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd a scalarize llvm pass.
Yang Rong [Fri, 17 May 2013 07:11:51 +0000 (15:11 +0800)]
Add a scalarize llvm pass.

In previous implementation expand vector ops in GenWrite,
it is hard to optimize. Now, I add new llvm pass to scalarize.
This pass will expand all normal vector ops to scalar ops,
except store/load, image read/write and function's argument.
Add fake ExtractElement/InsertElement instructions to avoid
dead instruction elimination, and unit valueMap hold the
relationship between these fake instructions and real
load/store instructions.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Refine the fill image0 test case to use map gtt.
Zhigang Gong [Thu, 16 May 2013 02:58:51 +0000 (10:58 +0800)]
utests: Refine the fill image0 test case to use map gtt.

Now we don't fill the whole image to a const color. we
fill it according to the coords. Then we can use map gtt
to get the result and verify the result easily on cpu
side.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoCL: Tell the kernel an image bo's tiling mode.
Zhigang Gong [Thu, 16 May 2013 02:56:20 +0000 (10:56 +0800)]
CL: Tell the kernel an image bo's tiling mode.

For an image bo allocation, we need to set its tiling mode thus
latter when we use map gtt to map its to a linear address space, the
kernel can do correct mapping. Otherwise, kernel will treat it as
not a tiled suface.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoStubs for C++ Bindings
Dag Lem [Thu, 16 May 2013 21:38:49 +0000 (23:38 +0200)]
Stubs for C++ Bindings

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>