contrib/beignet.git
11 years agoGBE: Fixed a bug in register expieration.
Zhigang Gong [Sun, 9 Jun 2013 07:27:20 +0000 (15:27 +0800)]
GBE: Fixed a bug in register expieration.

Some special register which are preallocated at g0 and not
managed by our register allocator. So we need to ignore
them when process the register expieration. Otherwise,
it will trigger an assert latter.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoRefine error check in clCreateContext()
Ruiling Song [Thu, 6 Jun 2013 08:44:27 +0000 (16:44 +0800)]
Refine error check in clCreateContext()

v2: check is also needed in clCreateContextFromType()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clGetDeviceIDs()
Ruiling Song [Thu, 6 Jun 2013 07:07:17 +0000 (15:07 +0800)]
Refine error check in clGetDeviceIDs()

move error check to api level. correctly handle mixed device type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRefine error check in clGetPlatformIDs() and clGetPlatformInfo()
Ruiling Song [Thu, 6 Jun 2013 07:07:16 +0000 (15:07 +0800)]
Refine error check in clGetPlatformIDs() and clGetPlatformInfo()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoimprove disassembling GPU binary code
Homer Hsing [Thu, 6 Jun 2013 06:28:44 +0000 (14:28 +0800)]
improve disassembling GPU binary code

support 64-bit float data type
not print prefix zero for unsigned hex immediate constant
display SIMD mode, category, message type in data port message

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCheck for exhaustion of local memory
Dag Lem [Wed, 5 Jun 2013 17:57:26 +0000 (19:57 +0200)]
Check for exhaustion of local memory

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoPad instruction stream with 8 nops;
Zou Nan hai [Wed, 5 Jun 2013 01:09:57 +0000 (09:09 +0800)]
Pad instruction stream with 8 nops;

   According to Bspec, EU may access 8 instrunctions beyond the
   kernel program because of prefetch.
   Pad the instruction stream with 8 nop to avoid access into an
   invalide page.

Signed-off-by: Zou Nanhai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdaptions for LLVM 3.3 / SPIR
Dag Lem [Mon, 3 Jun 2013 09:09:56 +0000 (11:09 +0200)]
Adaptions for LLVM 3.3 / SPIR

Handle the fact that several include files are moved from llvm/ to
llvm/IR/ in LLVM 3.3.

"__attribute__((always_inline)) inline" no longer works as intended,
and is replaced by "inline __attribute__((always_inline))".

For LLVM 3.3, the target is changed from "nvptx" to "spir", and
built-in address space qualifiers are used. For now, the built-in
types image2d_t, image3d_t, sampler_t, and event_t are overridden by
defines.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
Tested-by: Yang, Rong R <rong.r.yang@intel.com>
11 years agoutests: change all kernels to unix style text.
Zhigang Gong [Wed, 5 Jun 2013 03:07:43 +0000 (11:07 +0800)]
utests: change all kernels to unix style text.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
Tested-by: Yang, Rong R <rong.r.yang@intel.com>
11 years agoutest: Add test case for build-in function get_work_dim
Yi Sun [Tue, 4 Jun 2013 09:34:30 +0000 (17:34 +0800)]
utest: Add test case for build-in function get_work_dim

v2: Refine the case, verifying the result of function get_work_dim.

v3: Since the 16 work group size limitation is fixed, re-side the global size and local size with 1.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Add two builtin functions get_work_dim / get_global_offset.
Zhigang Gong [Fri, 31 May 2013 04:56:57 +0000 (12:56 +0800)]
GBE: Add two builtin functions get_work_dim / get_global_offset.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yi Sun <yi.sun@intel.com>
11 years agotest case for DW multiplication
Homer Hsing [Tue, 4 Jun 2013 07:52:13 +0000 (15:52 +0800)]
test case for DW multiplication

This case tests whether a predication bug of DW multiplication
had been fixed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoEnable built-in sub_sat/add_sat vector type.
Ruiling Song [Tue, 4 Jun 2013 06:45:34 +0000 (14:45 +0800)]
Enable built-in sub_sat/add_sat vector type.

also fix a typo in DECL_VECTOR_20P, DECL_VECTOR_30P macro.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: fixed a predication bug for DW multiplication.
Zhigang Gong [Tue, 4 Jun 2013 03:00:23 +0000 (11:00 +0800)]
GBE: fixed a predication bug for DW multiplication.

Per bspec:
  mul (8) acc0:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
  mach (8) rTemp<1>:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
  mov (8) r5.0<1>:d rTemp<8;8,1>:d // High 32 bits
  mov (8) r6.0<1>:d acc0:d // Low 32 bits

  The mul and mach instructions must have all channels enabled.
  The first mov should have channel enable from the destHI of IMUL,
  the second mov should have the channel enable from the destLO of IMUL.

We need to disable the predication and the mask rather than only set noMask to 1.
The strange thing here is for the first quarter, it seems we don't need to do so.
As change both quarter to this style will waste some registers which cause some
kernels fail to compile (compiler_box_blur.cl), I just change the second quater
to fully comply with bspec here. And in practice, it works fine with all unit
test cases and Homer's specific test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>
11 years agoAdd test case for group size
Ruiling Song [Fri, 31 May 2013 07:58:31 +0000 (15:58 +0800)]
Add test case for group size

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoSupport non-16 multiple group size.
Ruiling Song [Fri, 31 May 2013 07:58:08 +0000 (15:58 +0800)]
Support non-16 multiple group size.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed a 3 elements vector load/store bug.
Zhigang Gong [Fri, 31 May 2013 02:09:56 +0000 (10:09 +0800)]
GBE: Fixed a 3 elements vector load/store bug.

Per OpenCL spec, for 3-component vector data types,the
size of the data type is 4 * sizeof(component). And llvm
FE really cast a type3 data to type4 data for load/store
instruction, thus break our implementation. We need to
fixup it to the actual element size.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoFix a random assert caused by scalarize pass.
Yang Rong [Fri, 31 May 2013 07:19:09 +0000 (15:19 +0800)]
Fix a random assert caused by scalarize pass.

Revome the dead values in unit.valueMap at each begin of pass to avoid
the new value have some address.
Also fix a typo

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Add test case for box blur on image buffer
Dag Lem [Sat, 25 May 2013 08:14:17 +0000 (10:14 +0200)]
utests: Add test case for box blur on image buffer

This test demonstrates box blur on an image buffer, using an extremely
simple kernel.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect sampler address clamping for read image functions.
Dag Lem [Wed, 29 May 2013 21:49:40 +0000 (23:49 +0200)]
Correct sampler address clamping for read image functions.

Integer coordinates were converted to unsigned integers, yielding
incorrect clamping of negative coordinates.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Fixed a hang issue on 64bit system.
Zhigang Gong [Thu, 30 May 2013 03:38:07 +0000 (11:38 +0800)]
GBE: Fixed a hang issue on 64bit system.

As the npos is a (size_t)-1, we have to change the start and
end to size_t type. Otherwise, the comparison will always
be true, and it will hang there forever.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Boqun Feng <boqun.feng@intel.com>
11 years agoPass user options to clang.
Yang Rong [Wed, 29 May 2013 03:20:47 +0000 (11:20 +0800)]
Pass user options to clang.

clang do not support all options in opencl.
clang unsupport options:
       -cl-denorms-are-zero, -cl-strict-aliasing
       -cl-no-signed-zeros, -cl-fp32-correctly-rounded-divide-sqrt
all support options, refer to clang/include/clang/Driver/Options.inc
Can ignore these options to avoid warning.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix int div/rem assert in SIMD8 mode.
Yang Rong [Wed, 29 May 2013 02:09:38 +0000 (10:09 +0800)]
Fix int div/rem assert in SIMD8 mode.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: test vector load and store.
Zhigang Gong [Fri, 24 May 2013 09:42:18 +0000 (17:42 +0800)]
utests: test vector load and store.

Add float4/short4/char4 test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: support load/store of char/short vector.
Zhigang Gong [Fri, 24 May 2013 09:40:54 +0000 (17:40 +0800)]
GBE: support load/store of char/short vector.

We just load/store those vector elements one by one.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoUpdate documents.
Zhigang Gong [Fri, 24 May 2013 04:02:15 +0000 (12:02 +0800)]
Update documents.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: use the simd width environment variable if set.
Zhigang Gong [Fri, 24 May 2013 04:02:14 +0000 (12:02 +0800)]
GBE: use the simd width environment variable if set.

If the user set OCL_SIMD_WIDTH to 8 or 16, we force the backend
to use it. Without this patch, the OCL_SIMD_WIDTH never have a
chance to take effect, as we always set the functions simdWidth
before we create a gen context.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: fixed a bug on simd8 mode for typed_write instruction.
Zhigang Gong [Fri, 24 May 2013 04:02:13 +0000 (12:02 +0800)]
GBE: fixed a bug on simd8 mode for typed_write instruction.

On simd8 mode, the total src length equal to 17 thus, we have
to adjust the max src num here, and adjust the bit width of
the srcNum from 4 to 5. After this fix, all the image/gl
related test case could pass on both simd8 and simd16 mode.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGBE: Fixed a bug in byte gather/scatter.
Zhigang Gong [Thu, 23 May 2013 03:10:34 +0000 (11:10 +0800)]
GBE: Fixed a bug in byte gather/scatter.

We can't just use the alignment to determine whether use
the gather/scatter or not. As for a short type, the compiler
may also generate a 4 alignment, thus it will trigger this
bug.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoGracefully handle unsupported systems
Mario Kicherer [Tue, 28 May 2013 18:25:39 +0000 (20:25 +0200)]
Gracefully handle unsupported systems

Gracefully handle cases with unsupported or unreachable GPUs.
Improved error handling in cl_get_device_ids.

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer
Dag Lem [Sat, 25 May 2013 09:34:26 +0000 (11:34 +0200)]
Correct clEnqueueReadBuffer, clEnqueueWriteBuffer and clEnqueueMapBuffer

This implements handling of the offset parameter, and adds sanity
checks according to spec.

A bug is fixed in clEnqueueReadBuffer, where the buffer was not
unmapped after copying.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Correct box blur
Dag Lem [Sat, 25 May 2013 08:14:16 +0000 (10:14 +0200)]
utests: Correct box blur

The box blur test kernel incorrectly calculates the bottom extents of
the image. This yields visible blocking artifacts in the top of the
test image (since BMP images are stored bottom to top).

These calculations are corrected, other extent calculations are
simplified, and some dead code is removed.

The reference image is corrected accordingly, and is now identical to the
reference image for the box blur float buffer test.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoChange CMake to support LLVM 3.1.
Yang Rong [Thu, 23 May 2013 05:48:47 +0000 (13:48 +0800)]
Change CMake to support LLVM 3.1.

Also fix LLVM 3.1 build errors caused by my vector scalarize commit.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoChange clang system call to libclang api call.
Yang Rong [Thu, 23 May 2013 05:48:46 +0000 (13:48 +0800)]
Change clang system call to libclang api call.

The original call clang command directly as frontend. The implement is not very flexible.
I change to call libclang apis, support both clang 3.1 and clang 3.2.
Now still write the intermediate to the file, for code simply.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd ICD dispatch table in cloned kernels
Simon Richter [Wed, 22 May 2013 14:38:16 +0000 (16:38 +0200)]
Add ICD dispatch table in cloned kernels

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix several CL error code return bugs
Dag Lem [Wed, 22 May 2013 19:02:20 +0000 (21:02 +0200)]
Fix several CL error code return bugs

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix a scalarize pass bug.
Yang Rong [Wed, 22 May 2013 08:49:06 +0000 (16:49 +0800)]
Fix a scalarize pass bug.

Need to clear vectors and maps after finish one function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)
Dag Lem [Wed, 22 May 2013 06:36:14 +0000 (08:36 +0200)]
Correct clCreateImage(context, CL_MEM_COPY_HOST_PTR, ...)

The current implementation of clCreateImage initializes images from
host memory using one of the functions cl_mem_copy_data_linear,
cl_mem_copy_data_tilex and cl_mem_copy_data_tiley.

This yields garbled images on some platforms, since the tiled formats
do not always correspond to the formats assumed by the functions
above.

This is fixed by replacing the functions above with the new function
cl_mem_copy_image, whichs maps tiled images in GTT mode for copying.

cl_mem_copy_image also implements missing 3D image copy functionality
(image buffers should also be copied correctly, if/when they are
allowed).

Signed-off-by: Dag Lem <dag@nimrod.no>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: extent get_image_size cases to other informations..
Zhigang Gong [Mon, 20 May 2013 08:41:28 +0000 (16:41 +0800)]
utests: extent get_image_size cases to other informations..

Extent it to test all the supported image informations.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoAdd more get image info functions.
Zhigang Gong [Mon, 20 May 2013 08:38:51 +0000 (16:38 +0800)]
Add more get image info functions.

Add get image depth/channel data type/channel order/dim support.
Now, only those functions for the unsupported image type have not
been implemented. The unsupported image types are as below:
image1d_t,image1d_buffer_t,image1d_array_t,image2d_array_t.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoCorrected return of error code in clCreateImage.
Dag Lem [Mon, 20 May 2013 13:08:39 +0000 (15:08 +0200)]
Corrected return of error code in clCreateImage.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImplement clEnqueueReadImage and clEnqueueWriteImage.
Dag Lem [Mon, 20 May 2013 10:28:28 +0000 (12:28 +0200)]
Implement clEnqueueReadImage and clEnqueueWriteImage.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImplement clEnqueueMapImage.
Dag Lem [Mon, 20 May 2013 10:28:27 +0000 (12:28 +0200)]
Implement clEnqueueMapImage.

For correct mapping and unmapping of tiled/untiled buffers, the new
functions cl_mem_map_auto and cl_mem_unmap_auto are used in
clEnqueueMapBuffer, clEnqueueMapImage, and clEnqueueUnmapMemObject.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCL: Fixed a get image info bug on 64 bit system.
Zhigang Gong [Mon, 20 May 2013 07:48:16 +0000 (15:48 +0800)]
CL: Fixed a get image info bug on 64 bit system.

Forgot to merge it with previous patch. Now commit this one
separately to fix the bug on 64 bit system.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <simon.richter@hogyros.de>
11 years agoAdd CL/*.hpp to installing files
Boqun Feng [Mon, 20 May 2013 03:55:58 +0000 (11:55 +0800)]
Add CL/*.hpp to installing files

The orignal CMake config only installed CL/*.h. As C++ binding is added,
CL/*.hpp also need to be installing files.

Signed-off-by: Boqun Feng <boqun.feng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoSave depth and slice_pitch in cl_mem images.
Dag Lem [Fri, 17 May 2013 16:06:13 +0000 (18:06 +0200)]
Save depth and slice_pitch in cl_mem images.

Signed-off-by: Dag Lem <dag@nimrod.no>
11 years agoutests: add one test case to test get_image_width/height.
Zhigang Gong [Fri, 17 May 2013 08:37:31 +0000 (16:37 +0800)]
utests: add one test case to test get_image_width/height.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCL: complete get_image_width/height support at runtime side.
Zhigang Gong [Fri, 17 May 2013 08:34:39 +0000 (16:34 +0800)]
CL: complete get_image_width/height support at runtime side.

We get each image's information's curbe slot whcih is
actually used at kernel side, then fill the corresponding
value.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: Add support for get_image_width/height.
Zhigang Gong [Fri, 17 May 2013 08:31:17 +0000 (16:31 +0800)]
GBE: Add support for get_image_width/height.

Implement the new GEN IR instruction GetImageInfo. This new instruction
will allocate curbe slot for a given image surface only if it really
referred by a get_image_xxx function. As one image information element
only takes 4 bytes, it's relatively cheap to keep it in the curbe region.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: preare for get_image_xxx functions support.
Zhigang Gong [Thu, 16 May 2013 03:12:54 +0000 (11:12 +0800)]
GBE: preare for get_image_xxx functions support.

Only create the prototypes and the corresponding __gen_ocl_xxx
intrinsics. After this commit, the get_image_width/height could
pass the build, but will not generate code.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRemove useless vector check in GenWriter after scalarize pass.
Yang Rong [Fri, 17 May 2013 07:11:52 +0000 (15:11 +0800)]
Remove useless vector check in GenWriter after scalarize pass.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd a scalarize llvm pass.
Yang Rong [Fri, 17 May 2013 07:11:51 +0000 (15:11 +0800)]
Add a scalarize llvm pass.

In previous implementation expand vector ops in GenWrite,
it is hard to optimize. Now, I add new llvm pass to scalarize.
This pass will expand all normal vector ops to scalar ops,
except store/load, image read/write and function's argument.
Add fake ExtractElement/InsertElement instructions to avoid
dead instruction elimination, and unit valueMap hold the
relationship between these fake instructions and real
load/store instructions.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Refine the fill image0 test case to use map gtt.
Zhigang Gong [Thu, 16 May 2013 02:58:51 +0000 (10:58 +0800)]
utests: Refine the fill image0 test case to use map gtt.

Now we don't fill the whole image to a const color. we
fill it according to the coords. Then we can use map gtt
to get the result and verify the result easily on cpu
side.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoCL: Tell the kernel an image bo's tiling mode.
Zhigang Gong [Thu, 16 May 2013 02:56:20 +0000 (10:56 +0800)]
CL: Tell the kernel an image bo's tiling mode.

For an image bo allocation, we need to set its tiling mode thus
latter when we use map gtt to map its to a linear address space, the
kernel can do correct mapping. Otherwise, kernel will treat it as
not a tiled suface.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoStubs for C++ Bindings
Dag Lem [Thu, 16 May 2013 21:38:49 +0000 (23:38 +0200)]
Stubs for C++ Bindings

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoUpdated cl.hpp from http://www.khronos.org/registry/cl/api/1.2/cl.hpp
Dag Lem [Thu, 16 May 2013 21:38:48 +0000 (23:38 +0200)]
Updated cl.hpp from khronos.org/registry/cl/api/1.2/cl.hpp

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd samplerless read image functions for 2D and 3D images.
Dag Lem [Thu, 16 May 2013 21:07:34 +0000 (23:07 +0200)]
Add samplerless read image functions for 2D and 3D images.

Signed-off-by: Dag Lem <dag@nimrod.no>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd missing include of <limits>
Simon Richter [Wed, 15 May 2013 17:39:58 +0000 (19:39 +0200)]
Add missing include of <limits>

For std::numeric_limits<T>, the <limits> header is required.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix a negative number alignment bug in RegisterFilePartitioner allocate.
Yang Rong [Tue, 14 May 2013 02:39:12 +0000 (10:39 +0800)]
Fix a negative number alignment bug in RegisterFilePartitioner allocate.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoMake libgbm optional without EGL support
Simon Richter [Mon, 13 May 2013 21:07:06 +0000 (23:07 +0200)]
Make libgbm optional without EGL support

If EGL or GBM cannot be found, the EGL support is disabled, and then
neither library is required.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoUpdate gitignore files
Simon Richter [Mon, 13 May 2013 18:21:18 +0000 (20:21 +0200)]
Update gitignore files

 - Ignore CMake built files
 - Ignore .so files only in the subdirectory
 - Ignore generated .bmp files
 - Ignore generated config headers
 - Ignore generated source for OCL

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCorrect type of device properties
Simon Richter [Mon, 13 May 2013 18:21:17 +0000 (20:21 +0200)]
Correct type of device properties

 - CL_DEVICE_MAX_PARAMETER_SIZE is of type size_t
 - CL_DEVICE_MAX_WORK_GROUP_SIZE is of type size_t

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd clGetDeviceInfo(..., CL_BUILT_IN_KERNELS, ...)
Simon Richter [Mon, 13 May 2013 18:21:16 +0000 (20:21 +0200)]
Add clGetDeviceInfo(..., CL_BUILT_IN_KERNELS, ...)

Currently, there are no built-in kernels, so this function returns an empty
string.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoDefine clamp(value, lower, upper)
Simon Richter [Mon, 13 May 2013 18:21:15 +0000 (20:21 +0200)]
Define clamp(value, lower, upper)

The clamp(value, lower, upper) function is part of the standard library.

 - Define the function, using min() and max() on the lower level
 - Remove private definitions from kernels

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix out-of-date math macros
Homer Hsing [Thu, 16 May 2013 02:54:17 +0000 (10:54 +0800)]
Fix out-of-date math macros

Gen math function "tan" was replaced by "fdiv", and "sincos" was removed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang, Gong <zhigang.gong@linux.intel.com>
11 years agoTest new math built-in functions
Homer Hsing [Thu, 16 May 2013 02:54:16 +0000 (10:54 +0800)]
Test new math built-in functions

Use random test data.
Test 1000 times.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Feng, Boqun <boqun.feng@intel.com>
11 years agoEnable 39 math built-in functions
Homer Hsing [Thu, 16 May 2013 02:54:15 +0000 (10:54 +0800)]
Enable 39 math built-in functions

Enable math built-in functions, including:
  cospi, cbrt, tan, ilogb, nan, sincos, asin, asinpi, acos, acospi,
  atan, atanpi, copysign, erf, erfc, maxmag, minmag, hypot, sinpi,
  log1p, logb, tanpi, sinh, cosh, tanh, asinh, acosh, atanh, remainder,
  rint, fdim, fract, native_divide, pown, rootn, ldexp, exp2, exp10, expm1

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Feng, Boqun <boqun.feng@intel.com>
11 years agoFix instruction scheduler ScheduleDAGNode
Homer Hsing [Mon, 13 May 2013 07:05:29 +0000 (15:05 +0800)]
Fix instruction scheduler ScheduleDAGNode

When we add dependency for ScheduleDAGNode, we use
ScheduleDAGNode(node1)->children.push_back(node0) to express
node0 depends on node1.

To check this dependency, we should check whether node0 is inside
node1->children.

Original code checks whether node1 is inside node0->children.

So we fix this.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoRename CBMove to IndirectMove
Yang Rong [Thu, 9 May 2013 03:18:50 +0000 (11:18 +0800)]
Rename CBMove to IndirectMove

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutests: Fix a bug in movforphi test case.
Zhigang Gong [Wed, 15 May 2013 10:15:34 +0000 (18:15 +0800)]
utests: Fix a bug in movforphi test case.

This test case is to trigger a old MovForPHI bug, and it
just use read/write_image. But it has a bug in itself.
As in the kernel, the write image will only write the
first lane data not all the 16 lanes. As the previous
patch fix the write_image bug, thus now the write_image
work correctly and thus it only touch the first data element
thus it trigger the bug in this test case. Now fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: fixed a prediction bug in typed write instruction.
Zhigang Gong [Wed, 15 May 2013 03:47:08 +0000 (11:47 +0800)]
GBE: fixed a prediction bug in typed write instruction.

We need to put the header initialization and the LOD initialization
to no mask and no predication state. For all the other parts, we need
to enable mask and use the current predication state and need to set
the quater control properly. Otherwise, when write_image is called in
a condition-branch block, it will trigger this bug and doesn't write
data out correctly.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoGBE/Runtime: Optimize Sample/TypedWrite instruction.
Zhigang Gong [Mon, 13 May 2013 03:16:06 +0000 (11:16 +0800)]
GBE/Runtime: Optimize Sample/TypedWrite instruction.

This commit does two major things as below:
1. Allocate image surface at compile time, and add new gbe interfaces to let runtime know
how many image surfaces we have, and the image allocation informations. Thus the runtime
library know how to bind those image surfaces.

2. As now for both image and sampler, at compile time, we know the eaxct binding table
index. We no longer need to get those index from the input argument(curbe) and prepare
the desc to the architecture register. We can use imm as the desc thus we can save
4 out of 4 instructions for SampleInstruction and save 2 out of 12 instructions for
the TypedWriteInstruction.

This patch is also a major prepartion for the get_image_width/height/... functions.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoGBE: concentrate all samplers' allocation at compile time.
Zhigang Gong [Mon, 13 May 2013 03:00:38 +0000 (11:00 +0800)]
GBE: concentrate all samplers' allocation at compile time.

This is the first step to do image/sampler allocation fully
at compile time. Thus we can determine all the sampler id and image
bti index at compile time. So it can make the following things
easier or faster:

1. After we finish both image/sampler, we can treat all image bti and sampler
   as constant and can get their value when we encode the Sampler and TypedWrite
   instructions. Then we don't need to compute the message header at runtime which
   cost 3 instructions each call.

2. get image width/height/depth. As we know the surface bti at compile time,
   we can put those data at specified curbe entry and generate correct indirect
   register access to get those information at compile time.

This is the first step. And just finish the sampler part. Now all the
samplers including those defeined in kernel arguments will be allocated
at compile time. At runtime, it just need to fill in the sampler value
into the proper slot which map to the specified input argument. Then the
driver will create and bind the sampler to the correct slot.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoGBE: add scalar register support in loadImmInstruction.
Zhigang Gong [Thu, 9 May 2013 11:30:58 +0000 (19:30 +0800)]
GBE: add scalar register support in loadImmInstruction.

There is a slight possibility that the destination register
is a scalar register. We need to check it here.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoGBE: remove sampler address space.
Zhigang Gong [Thu, 9 May 2013 06:37:09 +0000 (14:37 +0800)]
GBE: remove sampler address space.

As now sampler_t is a normal integer data type, we don't
need the sampler address space any more.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoutests: Add one test cases for sampler support.
Zhigang Gong [Tue, 7 May 2013 06:29:44 +0000 (14:29 +0800)]
utests: Add one test cases for sampler support.

This new case tests define sampler in kernel side and in the
kernel argument.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoCL: Support kernel side defined samplers.
Zhigang Gong [Tue, 7 May 2013 06:26:54 +0000 (14:26 +0800)]
CL: Support kernel side defined samplers.

We changed the way to handle samplers. We gather all the kernel side
defined samplers and those sampler in kernel argument into one samplers
array. And don't allocate one single sampler each time.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agoGBE: refine the sampler implementation to comply with spec.
Zhigang Gong [Tue, 7 May 2013 06:15:45 +0000 (14:15 +0800)]
GBE: refine the sampler implementation to comply with spec.

The previous implementation is to use a new address space pointer to
represent a sampler. The reason is that there is no specified data
type for sampler_t in LLVM front end thus we can't determine the
sampler argument type if we use a normal interger to represnet the
sampler. But that breaks the OCL spec, the spec allows the kernel
to define and initialize sampler variables in kernel side.

Now I use a little tricky way to fix this problem. First, I decide
to use normal unsigned interger to represent sampler_t in kernel side.
Then at compile time, I check read_imagexxx function's sampler
arguments. If the argument is a constant value, then it should be a
kernel side defined sampler, then I insert the sampler type into a
global sampler set for the current kernel function. If the argument
is not a constant value, then I will check whether it's a kernel
argument, if it is, then I fix up the corresponding kernel arg type
to SAMPLER there.

To unify the kernel side defined sampler and kernel argument sampler,
I add two new gbe API. To export all the kernel side defined sampler
data and size to the runtime library. Then latter, the runtime library
can use this information to append new sampler to the unified sampler
buffer and bind all the sampler at one time.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
11 years agotest cases for image3d_t
Homer Hsing [Mon, 6 May 2013 00:45:52 +0000 (08:45 +0800)]
test cases for image3d_t

test cases for image3d_t

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoenable image3d_t
Homer Hsing [Mon, 6 May 2013 00:45:51 +0000 (08:45 +0800)]
enable image3d_t

enable image3d_t

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd third coord in backend
Homer Hsing [Wed, 8 May 2013 08:05:16 +0000 (16:05 +0800)]
add third coord in backend

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd backend symbol alias for image3d read write
Homer Hsing [Mon, 6 May 2013 00:45:49 +0000 (08:45 +0800)]
add backend symbol alias for image3d read write

add backend symbol alias for image3d read write

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd image3d read/write in stdlib.h
Homer Hsing [Mon, 6 May 2013 00:45:48 +0000 (08:45 +0800)]
add image3d read/write in stdlib.h

add image3d read/write in stdlib.h

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoclEnqueueNDRangeKernel: fix for segfault caused by empty local_work_size
Mario Kicherer [Tue, 7 May 2013 19:57:37 +0000 (21:57 +0200)]
clEnqueueNDRangeKernel: fix for segfault caused by empty local_work_size

Without this fix, an empty local_work_size that is allowed by specification
causes a segfault. Merged the block with a further check below.

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoCMakeLists.txt enable custom C/CXXFLAGS
Mario Kicherer [Tue, 7 May 2013 19:57:36 +0000 (21:57 +0200)]
CMakeLists.txt enable custom C/CXXFLAGS

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoGBE: fixed the hard coded implementation for sampler/typedwrite.
Zhigang Gong [Wed, 8 May 2013 05:00:02 +0000 (13:00 +0800)]
GBE: fixed the hard coded implementation for sampler/typedwrite.

This is one preparation for the image3d support.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Homer Hsing <homer.xing@intel.com>
11 years agoenable clGetContextInfo with CL_CONTEXT_DEVICES
Mario Kicherer [Mon, 6 May 2013 15:49:50 +0000 (17:49 +0200)]
enable clGetContextInfo with CL_CONTEXT_DEVICES

The following patch enables the clGetContextInfo query with CL_CONTEXT_DEVICES.
Applications can query the required size of the result buffer and then request
the cl_device_id of the available devices in this context.

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoFix compile error when use llvm-3.1 and InstVisitor.h path for llvm-3.3
Chuanbo Weng [Fri, 3 May 2013 15:56:24 +0000 (23:56 +0800)]
Fix compile error when use llvm-3.1 and InstVisitor.h path for llvm-3.3

DataLayout.h only exist in llvm-3.2. InstVisitor.h is also contained
in 'llvm/Support/InstVisitor.h' directory in llvm-3.0, 3.1 and 3.2,
but is contained in 'llvm/InstVisitor.h'. argID and PAL is needed in
llvm-3.1. After applying this patch, beignet can build successfully
and all test cases can run successfully using llvm-3.1.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agotest cases for global constant arrays
Homer Hsing [Thu, 2 May 2013 03:12:20 +0000 (11:12 +0800)]
test cases for global constant arrays

version 4

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoSupport global constant arrays
Homer Hsing [Thu, 2 May 2013 01:00:31 +0000 (09:00 +0800)]
Support global constant arrays

Version 3.
Support global constant arrays defined outside any kernel.

Example:

constant int h[] = {71,72,73,74,75,76,77};
kernel void k(global int *dst) {
  int i = get_global_id(0);
  dst[i] = h[i % 7];
}

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoir::unit can return its constantSet
Homer Hsing [Thu, 2 May 2013 01:00:30 +0000 (09:00 +0800)]
ir::unit can return its constantSet

Version 3.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd special register constoffst expressing curbe offset
Homer Hsing [Thu, 2 May 2013 01:00:29 +0000 (09:00 +0800)]
add special register constoffst expressing curbe offset

Version 3.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agodisable buggy old code doing global constant
Homer Hsing [Thu, 2 May 2013 01:00:28 +0000 (09:00 +0800)]
disable buggy old code doing global constant

Version 3.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd helper functions in gbe::Program
Homer Hsing [Thu, 2 May 2013 01:00:27 +0000 (09:00 +0800)]
add helper functions in gbe::Program

Version 3.
for get global constant related information

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd helper functions in ir::Constant and ir::ConstantSet
Homer Hsing [Thu, 2 May 2013 01:00:26 +0000 (09:00 +0800)]
add helper functions in ir::Constant and ir::ConstantSet

Version 3.
Deleted empty destructor of ConstantSet.
Updated ConstantSet::getConstant()

And I am afraid I cannot delete the empty constructor of ConstantSet.
That will break the constructor of ir::Unit. See unit.cpp, line 30.

add ir::Constant.reg field
add helper functions in ir::ConstantSet to get specified ir::Constant
and binary packed constant array

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoNew test case of ceil()
Homer Hsing [Sat, 27 Apr 2013 06:13:32 +0000 (14:13 +0800)]
New test case of ceil()

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoMake ceil() work
Homer Hsing [Sat, 27 Apr 2013 06:13:26 +0000 (14:13 +0800)]
Make ceil() work

Delete an extra comma in ocl_stdlib.h, then ceil() works. Fun

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd the INCLUDE_DIRECTORIES directive to all the FindXXX modules.
Junyan He [Fri, 26 Apr 2013 23:56:55 +0000 (07:56 +0800)]
Add the INCLUDE_DIRECTORIES directive to all the FindXXX modules.

If the header files are not at the standard location, we will FAIL
to compile because FIND_PATH directive will not add the Include Dir
to CXX flags. Add the INCLUDE_DIRECTORIES to handle this if we find
the headers we neeeds.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImplement clGetDeviceInfo(..., CL_DRIVER_VERSION, ...)
Simon Richter [Fri, 19 Apr 2013 11:39:12 +0000 (13:39 +0200)]
Implement clGetDeviceInfo(..., CL_DRIVER_VERSION, ...)

This returns the library major/minor version. As it does not follow the
usual naming scheme, the output code is duplicated.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoHandle size queries in clGetDeviceInfo better
Simon Richter [Fri, 19 Apr 2013 11:39:11 +0000 (13:39 +0200)]
Handle size queries in clGetDeviceInfo better

This allows size queries on all fields.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>