contrib/beignet.git
Yang Rong [Wed, 4 Sep 2013 08:58:07 +0000 (16:58 +0800)]
Add clEnqueueReadBufferRect api.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 26 Aug 2013 14:45:48 +0000 (22:45 +0800)]
CL: Enable gl sharing with new egl extension.

The previous implementation only supports 2d/3d texture sharing and
is implemented in a hacky fashion. We need to replace it with a
clean and complete one. We introduce a new egl extension to export
low level layout information of a buffer object/texture/render buffer
from the mesa dri driver to the cl driver layer. As the extension has
not been accepted by mesa, we have to implement this new extension in
beignet internally.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Wed, 4 Sep 2013 07:04:03 +0000 (15:04 +0800)]
Runtime: Only return the format allowed in the spec.

For CL_INTENSITY and CL_LUMINANCE, the spec only allows
CL_UNORM_INT8, CL_UNORM_INT16, CL_SNORM_INT8, CL_SNORM_INT16,
CL_HALF_FLOAT or CL_FLOAT.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 3 Sep 2013 10:01:32 +0000 (18:01 +0800)]
GBE: silence the compilation warning when generating the pch file.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Homer Hsing [Wed, 4 Sep 2013 02:33:39 +0000 (10:33 +0800)]
fix 64-bit "clz" if parameter is "long4" or "ulong4"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 4 Sep 2013 06:24:54 +0000 (14:24 +0800)]
Implement constant buffer based on constant cache.

Currently, we simply allocate enough graphics memory as the constant memory space
and bind it to bti 2. Constant cache reads are backed by dword scatter reads.
Unlike other data port messages, the address needs to be dword aligned,
and the addresses are in units of dwords.

The constant address space data are placed in order: first the global constants,
then the constant buffer kernel arguments.
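
The addressing rule above can be sketched in C as follows. This is only an
illustration: the helper names are hypothetical and are not beignet functions,
and placing the kernel argument at the next dword boundary after the global
constants is an assumption, not something this patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a byte size up to the next dword (4-byte) boundary. */
static size_t align_to_dword(size_t bytes) {
    return (bytes + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: convert a dword-aligned byte offset into the
 * dword-unit address used by a dword scatter read. */
static uint32_t byte_offset_to_dword_addr(size_t byte_offset) {
    assert(byte_offset % 4 == 0); /* the address must be dword aligned */
    return (uint32_t)(byte_offset / 4);
}

/* Assumed layout: global constants first, then the constant buffer
 * kernel argument, starting at the next dword boundary. */
static size_t const_arg_byte_offset(size_t global_const_bytes) {
    return align_to_dword(global_const_bytes);
}
```

For example, with 10 bytes of global constants the kernel argument would start
at byte offset 12, which is dword address 3.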

v2: rename functions and variables to distinguish 'curbe' from 'constant buffer'

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 4 Sep 2013 05:55:23 +0000 (13:55 +0800)]
Fix atomic_xchg float type error.

Also tidy up the "\" line continuations of some atomic macros.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 4 Sep 2013 06:35:18 +0000 (14:35 +0800)]
Utests: Enable bool_cross_basic_block.

And put it in the category with known issues. It will be run
when invoking the utests as below:
./utest_run -a

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Wed, 4 Sep 2013 02:48:48 +0000 (10:48 +0800)]
Utests_run: Add known issue cases support.

Add some arguments:
  -c <casename>: run sub-case named 'casename'
  -l           : list all the available case names
  -a           : run all test cases
  -n           : run all test cases without known issues
  -h           : display this usage

Add an alternate macro named MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE to register a new test case that has a known issue not yet fixed.
When utest_run runs, only cases registered with MAKE_UTEST_FROM_FUNCTION are included by default.
If you want to run all the test cases, including those with known issues, use argument '-a'.
Besides, you can use option '-c' to run any single test case.
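
The selection rule described above can be sketched in C as follows. The types
and function names are hypothetical and only model the behaviour of the options;
they are not the real utest_run implementation or the real registration macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test descriptor; the real macros register cases differently. */
typedef struct {
    const char *name;
    int has_known_issue; /* 1 if registered through the *_WITH_ISSUE variant */
} utest_case;

enum run_mode { RUN_DEFAULT, RUN_ALL };

/* Decide whether one case runs. `only` models "-c <casename>",
 * RUN_ALL models "-a", and RUN_DEFAULT skips known-issue cases. */
static int should_run(const utest_case *t, enum run_mode mode, const char *only) {
    if (only != NULL)
        return strcmp(t->name, only) == 0;
    if (mode == RUN_ALL)
        return 1;
    return !t->has_known_issue;
}

/* Count how many cases a given invocation would run. */
static size_t count_selected(const utest_case *tests, size_t n,
                             enum run_mode mode, const char *only) {
    size_t i, count = 0;
    assert(tests != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (should_run(&tests[i], mode, only))
            count++;
    return count;
}

/* Two sample entries, used only for illustration. */
static const utest_case sample_cases[] = {
    { "compiler_abs_diff", 0 },
    { "bool_cross_basic_block", 1 },
};
```

With these two sample entries, the default mode selects one case, RUN_ALL
selects both, and naming a case explicitly selects it even if it has a known issue.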

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 30 Aug 2013 05:23:06 +0000 (13:23 +0800)]
Runtime: fix the incorrect global mem size.

The max_mem_alloc_size is 128MB, so we should set the global mem size
to be less than or equal to it. Maybe we can set both of them much
larger than 128MB in the future. For now, just set it to 128MB.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 3 Sep 2013 07:42:37 +0000 (15:42 +0800)]
Change constant unit test to cover 4 byte data type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 3 Sep 2013 07:42:35 +0000 (15:42 +0800)]
GBE: Enable DWord scatter gather message for constant cache read.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 4 Sep 2013 01:18:20 +0000 (09:18 +0800)]
fix GPU data type for 16-bit moving

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 3 Sep 2013 07:39:56 +0000 (15:39 +0800)]
utest: memset the output buffer to fix random fail.

The inactive lanes will not modify their corresponding output,
so the output buffer needs to be initialized to 0.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 3 Sep 2013 06:30:46 +0000 (14:30 +0800)]
GBE: Support builtin vector functions for select() autogeneration.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Homer Hsing [Tue, 3 Sep 2013 03:13:18 +0000 (11:13 +0800)]
add same type "convert_*(*)"

add some versions of "convert_*(*)" that convert a parameter to its own type

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 3 Sep 2013 00:41:28 +0000 (08:41 +0800)]
fix 32-bit signed version of "sub_sat"

This patch makes the following piglit test case pass:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-int-sub_sat-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 3 Sep 2013 00:20:35 +0000 (08:20 +0800)]
add 64-bit version of "rotate"

Tested by piglit. The following test cases pass:
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-rotate-1.0.generated.cl
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-rotate-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 08:33:23 +0000 (16:33 +0800)]
add 64-bit version of "clz"

This patch passes the following piglit test cases:

 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-clz-1.0.generated.cl
 piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-long-clz-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 08:21:25 +0000 (16:21 +0800)]
fix 8-bit version of "clz"

fix a typo in ocl_stdlib.tmpl.h
fix the instruction type of 8-bit moves

This patch is tested by piglit.
The following two test cases have passed:
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-char-clz-1.0.generated.cl
  piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-uchar-clz-1.0.generated.cl

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 05:42:35 +0000 (13:42 +0800)]
add 64-bit version of "shuffle", "shuffle2"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 05:21:47 +0000 (13:21 +0800)]
Add scalar version of "convert_*(*)"

Scalar version of "convert_*(*)" was missing.
This patch adds scalar version.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 04:32:40 +0000 (12:32 +0800)]
fix scalar type built-in function "select"

add some missing scalar type versions

v2: third parameter of "select" cannot be "float"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 2 Sep 2013 02:59:51 +0000 (10:59 +0800)]
add 64-bit version of "bitselect"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 30 Aug 2013 03:16:24 +0000 (11:16 +0800)]
Runtime: fix the max group size for GT2.

We should keep the max group size and CL_KERNEL_WORK_GROUP_SIZE
consistent with each other. Otherwise, the conformance test will trigger
an error.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Zhigang Gong [Thu, 29 Aug 2013 06:35:01 +0000 (14:35 +0800)]
GBE: We should set no predication/mask for EOT preparation.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 29 Aug 2013 02:47:35 +0000 (10:47 +0800)]
Runtime: initialize single fp mode correctly.

According to the OpenCL spec,
the mandated minimum single precision floating-point capability given by
CL_DEVICE_SINGLE_FP_CONFIG is CL_FP_ROUND_TO_ZERO or CL_FP_ROUND_TO_NEAREST.
We set the single float mode to IEEE 754 and set the rounding mode
to RTN.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 29 Aug 2013 02:47:34 +0000 (10:47 +0800)]
Runtime: vendor specified information is required for CL_DEVICE_VERSION/OPENCL_C_VERSION.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Fri, 30 Aug 2013 09:19:40 +0000 (17:19 +0800)]
Runtime: clEnqueueMapImage also needs to maintain the mapped images.

v3: Use cl_mem_unmap_gtt rather than cl_mem_unmap_auto in function _cl_map_mem.
v2: merge with:

commit 0237652c579123436e5f48514f733e36c8b5264a
Author: Yang Rong <rong.r.yang@intel.com>
Date:   Fri Aug 23 11:04:21 2013 +0800

    Add clEnqueueMapBuffer and clEnqueueMapImage non-blocking map support.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Mon, 2 Sep 2013 04:48:04 +0000 (12:48 +0800)]
GBE: null register could be used as src1.

We should not assert if null register is used as src1.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 30 Aug 2013 03:16:23 +0000 (11:16 +0800)]
GBE: add some macros for atom_xxx builtin functions.

The atom_xxx APIs are from OpenCL spec 1.0, but the conformance test suite
will test them anyway.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Zhigang Gong [Thu, 29 Aug 2013 06:27:05 +0000 (14:27 +0800)]
GBE: don't use flag register as src 1 for xor instruction.

Gen doesn't support using ARF as src1. This bug was reported by
Edward Ching <edward.k.ching@gmail.com>.

v2: add an assert in setSrc1 to check whether we encode an instruction that
uses ARF as SRC1.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Edward Ching <edward.k.ching@gmail.com>
Yang Rong [Thu, 29 Aug 2013 05:07:38 +0000 (13:07 +0800)]
Correct event type' typo.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Lu Guanqun [Wed, 28 Aug 2013 02:16:57 +0000 (10:16 +0800)]
add a space to make the error more readable

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 28 Aug 2013 08:53:36 +0000 (16:53 +0800)]
Runtime: fix the incorrect platform info size (conformance).

As sizeof(str) already includes the '\0', we should not add 1
to the returned size. The conformance case computeinfo passes with
this patch.

(28-Aug 16:51:00)     BEGIN  Compute Info                            :
           ==>  CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
           ==>  CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
           ==>  CL_DEVICE_ERROR_CORRECTION_SUPPORT == 0
               PASSED computeinfo.
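
The sizeof point made in this message can be illustrated with a minimal C
sketch. The string and function names are hypothetical and are not taken from
the beignet source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform string, used only for illustration. */
static const char platform_name[] = "Example Platform";

/* sizeof on a char array already counts the terminating '\0',
 * so no "+ 1" is needed when reporting the size. */
static size_t info_size_from_sizeof(void) {
    return sizeof(platform_name);
}

/* strlen does not count the terminator, so here the "+ 1" is required. */
static size_t info_size_from_strlen(void) {
    return strlen(platform_name) + 1;
}

/* Both ways of computing the size must agree. */
static void check_info_size(void) {
    assert(info_size_from_sizeof() == info_size_from_strlen());
}
```

Adding 1 to the sizeof result would report one byte more than the string
actually occupies.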

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 30 Aug 2013 08:29:32 +0000 (16:29 +0800)]
Fix utest compiler_group_size4 error.

Per the OpenCL spec, bit-fields are not supported.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 23 Aug 2013 03:04:22 +0000 (11:04 +0800)]
Change event test case to cover clEnqueueMapBuffer.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 23 Aug 2013 03:04:21 +0000 (11:04 +0800)]
Add clEnqueueMapBuffer and clEnqueueMapImage non-blocking map support.

There is an unsynchronized map function, drm_intel_gem_bo_map_unsynchronized, in drm that can
be used to do a non-blocking map. But this function only maps gtt, so force the use of gtt
mapping for all clEnqueueMapBuffer and clEnqueueMapImage calls.

V2: refine comments, and use map_gtt_unsync in clEnqueueMapBuffer/Image
    instead of map_auto to avoid confusion.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 26 Aug 2013 07:44:55 +0000 (15:44 +0800)]
Add pfn_notify support in clCreateContext.

Remove the assert in cl_create_context that fires when pfn_notify is not NULL,
and save the callback, although it is not used yet.
Per the spec, the driver should call it when a device becomes unavailable.
For now the driver doesn't check the device status.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 26 Aug 2013 04:51:53 +0000 (12:51 +0800)]
add built-in function "lgamma", "lgamma_r"

also include test cases

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 26 Aug 2013 02:20:33 +0000 (10:20 +0800)]
add built-in function "tgamma"

also include a test case

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 30 Aug 2013 07:24:42 +0000 (15:24 +0800)]
improve built-in function "sinpi"

"sinpi" was calculated as "sin(pi * x)".
But that was not a very good way.
This patch improves the function and also includes a test case.

v2: fix compiling warning

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 26 Aug 2013 14:45:47 +0000 (22:45 +0800)]
CL: Refactor cl_mem's implementation.

The buffer object is much simpler than the image object.
We had better not use the same big data structure for
both objects.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Chuanbo Weng [Thu, 22 Aug 2013 11:23:38 +0000 (19:23 +0800)]
Add a test case that triggers a known bug.

This unit test case triggers a known bug:
ASSERTION FAILED: TODO Boolean values cannot escape their definition
basic block.

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 22 Aug 2013 08:52:05 +0000 (16:52 +0800)]
utests: Add a unit test for non-aligned group size.

To hit the predication logic.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 22 Aug 2013 08:52:04 +0000 (16:52 +0800)]
GBE: Clear Flag register to fix a gpu hang.

When the group size is not aligned to simdWidth, predication with any8h/any16h
will calculate pmask using the flag register bits mapped to non-active
lanes as well. As the flag register is not cleared by default, any8h/any16h used
for a jmpi instruction may cause a wrong jump, and possibly an infinite loop.

So, we clear the flag register to 0 to make any8h/any16h predication work correctly.
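
The failure mode can be modelled in plain C. This is a simplified software
model with hypothetical names, assuming one flag bit per SIMD lane and that only
active lanes update their bit; it is not the actual Gen hardware behaviour or
the beignet code.

```c
#include <assert.h>
#include <stdint.h>

/* Only active lanes write their flag bit; inactive lanes keep the old value. */
static uint16_t write_flags(uint16_t flag, uint16_t active_mask, uint16_t cond_bits) {
    return (uint16_t)((flag & ~active_mask) | (cond_bits & active_mask));
}

/* "any16h"-style predicate in this model: true if any of the 16 bits is set. */
static int any16(uint16_t flag) {
    return flag != 0;
}

/* Evaluate a condition that is false on every active lane and report
 * whether a jump predicated on any16 would still be taken. */
static int jump_taken(uint16_t initial_flag, unsigned group_size) {
    uint16_t active, flag;
    assert(group_size >= 1 && group_size <= 16);
    active = (uint16_t)((1u << group_size) - 1u);
    flag = write_flags(initial_flag, active, 0u);
    return any16(flag);
}
```

In this model, a group of 10 work items starting from a stale flag value of
0xFFFF still takes the jump, while starting from a cleared flag does not.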

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 21 Aug 2013 09:18:04 +0000 (17:18 +0800)]
GBE: disable cl_khr_fp64.

As the double support is incomplete currently, we disable it.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Lu Guanqun [Tue, 20 Aug 2013 07:01:22 +0000 (15:01 +0800)]
list all available utests' names

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 20 Aug 2013 06:45:15 +0000 (14:45 +0800)]
rename ulong to ulong64 to avoid the conflicts in <sys/types.h>

[ 31%] Building CXX object utests/CMakeFiles/utests.dir/compiler_abs_diff.cpp.o
/home/q/beignet.git/utests/compiler_abs_diff.cpp:201:18: error: conflicting declaration ‘typedef uint64_t ulong’
/usr/include/i386-linux-gnu/sys/types.h:151:27: error: ‘ulong’ has a previous declaration as ‘typedef long unsigned int ulong’

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 19 Aug 2013 06:23:56 +0000 (14:23 +0800)]
fix warning when egl is not there

[ 32%] Building CXX object utests/CMakeFiles/utests.dir/utest_helper.cpp.o
/home/q/beignet.git/utests/utest_helper.cpp: In function ‘int cl_ocl_init()’:
/home/q/beignet.git/utests/utest_helper.cpp:314:8: warning: variable ‘hasGLExt’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 19 Aug 2013 06:23:55 +0000 (14:23 +0800)]
fix left shift warning

/home/q/beignet.git/utests/compiler_long.cpp: In function ‘void compiler_long()’:
/home/q/beignet.git/utests/compiler_long.cpp:33:32: warning: left shift count >= width of type [enabled by default]
/home/q/beignet.git/utests/compiler_long.cpp:34:32: warning: left shift count >= width of type [enabled by default]

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 19 Aug 2013 04:29:17 +0000 (12:29 +0800)]
fix left shift warnings in utests

We should use the explicit 64 bit types. Otherwise we would have warnings.
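
A minimal C sketch of the idea, with hypothetical helper names that are not from
the utests: the left operand is widened to a 64 bit type before shifting, so the
shift count is smaller than the width of the type.

```c
#include <assert.h>
#include <stdint.h>

/* Widen to 64 bits first; shifting a plain int by 32 or more is what
 * triggers "left shift count >= width of type". */
static uint64_t bit64(unsigned n) {
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* Build a 64 bit value from two 32 bit halves without shifting an int. */
static int64_t make_i64(int32_t hi, uint32_t lo) {
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | lo);
}
```

Writing `1 << 40` instead would shift a 32 bit int and produce the warning
quoted in the related commits.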

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 16 Aug 2013 07:28:52 +0000 (15:28 +0800)]
Utests: enable long/ulong for abs_diff test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 06:55:29 +0000 (14:55 +0800)]
enable signed 64-bit version of "abs_diff"

fixed operand type in IR instruction "move".
used one less flag register in 64-bit integer comparing.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 02:38:00 +0000 (10:38 +0800)]
enable unsigned 64bit version of "abs_diff"

tested by piglit,

piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-abs_diff-1.0.generated.cl

piglit test case passed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 01:43:32 +0000 (09:43 +0800)]
GBE: skip instruction pattern match for 64 bit sel_cmp.

CPU instruction "sel_cmp" doesn't support 64bit int.
Do not emit SelectModifierInstructionPattern in that case.
Tested by piglit. piglit test cases "long(ulong)-max(min,clamp)" all passed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 01:41:17 +0000 (09:41 +0800)]
fix a typo

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 16 Aug 2013 08:24:09 +0000 (16:24 +0800)]
Add async copy and async stride copy test case.

Just hard code the int2 and char4 types. Other types have been tested using
the conformance test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 16 Aug 2013 08:24:08 +0000 (16:24 +0800)]
Implement async and prefetch built-in.

Use normal load & store to implement async copy,
so wait_group_events uses a barrier.
Prefetch is just defined as an empty function.

V2: fix llvm build error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 16 Aug 2013 07:54:24 +0000 (15:54 +0800)]
test 64bit version of "upsample"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 15 Aug 2013 09:10:15 +0000 (17:10 +0800)]
Fix unit test compiler_load_bool_imm error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 16 Aug 2013 01:45:17 +0000 (09:45 +0800)]
add 64bit version of "upsample"

since simple 64bit integers are supported,
add 64bit version of "upsample".

to test this patch, in piglit, run
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-int-upsample-1.0.generated.cl
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-uint-upsample-1.0.generated.cl

piglit test cases all pass.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 15 Aug 2013 06:53:27 +0000 (14:53 +0800)]
add empty 64bit-integer version built-in functions

also change the vector built-in generator to auto generate
64bit-integer versions of built-in functions

Function bodies are empty for now. Details will be added in the future.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 08:23:18 +0000 (16:23 +0800)]
support built-in function mad_sat(int) and mad_sat(uint)

this patch has been tested by piglit.
piglit test cases "int_mad_sat" and "uint_mad_sat" passed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zou Nan hai [Thu, 15 Aug 2013 23:56:08 +0000 (07:56 +0800)]
use r112 as source of EOT message

Fix random hang cases.
use r112 as source of EOT message.
Bspec requires r112-r127 as EOT message source.

Signed-off-by: Zou Nanhai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 15 Aug 2013 02:51:33 +0000 (10:51 +0800)]
GBE: fix an illegal instruction.

Per Gen ISA spec:
When ExecSize = Width, VertStride must be set to Width * HorzStride.

For horizontal stride 2 in bottom_half, we always use it in simd8 mode,
so we need to set the vertstride to 16 according to the above restriction.
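
The arithmetic behind the value 16 can be written out as a small C sketch. The
function names are hypothetical, and the sketch covers only the single rule
quoted above, not the full set of Gen region restrictions.

```c
#include <assert.h>

/* Rule quoted above: when ExecSize == Width,
 * VertStride must equal Width * HorzStride (all in elements). */
static unsigned required_vert_stride(unsigned width, unsigned horz_stride) {
    return width * horz_stride;
}

/* Check a region against that one rule. Regions with ExecSize != Width
 * are outside this sketch and are not checked. */
static int obeys_quoted_rule(unsigned exec_size, unsigned width,
                             unsigned horz_stride, unsigned vert_stride) {
    if (exec_size != width)
        return 1;
    return vert_stride == required_vert_stride(width, horz_stride);
}

/* The case from this patch: simd8 with horizontal stride 2. */
static void check_bottom_half_case(void) {
    assert(required_vert_stride(8, 2) == 16);
}
```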

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 14 Aug 2013 08:07:15 +0000 (16:07 +0800)]
GBE: I64CMP should be treated as CMP in reg allocation and insn scheduling.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 06:23:51 +0000 (14:23 +0800)]
test 64bit-integer comparing

only works when OCL_POST_ALLOC_INSN_SCHEDULE=0
because the post alloc scheduler puts CMP after SEL, but in IR,
CMP is before SEL, like this
 GT.int64 %34 %31 %33
 LOADI.int64 %38 3
 LOADI.int64 %39 4
 SEL.int64 %35 %34 %38 %39

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 01:40:33 +0000 (09:40 +0800)]
support 64bit-integer comparing

support 64bit-integer comparing,
including EQ(==), NEQ(!=), G(>), GE(>=), L(<), LE(<=)
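
One conventional way to build such comparisons from 32 bit halves is sketched
below in C. This message does not describe how the patch lowers the comparison,
so the sketch is an assumption for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64 bit "greater than": compare the high halves first and
 * use the low halves only to break a tie. */
static int u64_gt(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo) {
    return (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
}

/* Signed variant: the high halves compare as signed values,
 * the low halves still compare as unsigned. */
static int i64_gt(int32_t a_hi, uint32_t a_lo, int32_t b_hi, uint32_t b_lo) {
    return (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
}

/* Split two native 64 bit values and cross-check against the native result. */
static int i64_gt_from_halves(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int r = i64_gt((int32_t)(uint32_t)(ua >> 32), (uint32_t)ua,
                   (int32_t)(uint32_t)(ub >> 32), (uint32_t)ub);
    assert(r == (a > b));
    return r;
}
```

The other operators follow from the same pieces: EQ compares both halves for
equality, and GE, L, LE can be derived from G and EQ.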

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zou Nan hai [Tue, 13 Aug 2013 23:29:18 +0000 (07:29 +0800)]
Flush the queue after enqueue.

Flush the queue after enqueue.
This can fix some random fails in unit tests.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yi Sun <yi.sun@intel.com>
Yang Rong [Wed, 14 Aug 2013 03:08:01 +0000 (11:08 +0800)]
Fix event pthread_mutex_lock dead lock.

In function cl_event_set_status, cl_event_delete is called between pthread_mutex_lock
and pthread_mutex_unlock. It also requires the same lock, which causes a deadlock.
Unlock the mutex before calling cl_event_delete.
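
The pattern of the fix can be sketched in C with pthreads. The struct and
function names below are hypothetical stand-ins, not the real cl_event code, and
treating status 0 as "complete" is an assumption made for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical event type, not beignet's cl_event. */
typedef struct {
    pthread_mutex_t lock;
    int status;
    int deleted;
} event_t;

/* Takes the lock itself, like the delete path described above. */
static void event_delete(event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->deleted = 1;
    pthread_mutex_unlock(&e->lock);
}

/* Update the status under the lock, but release it before calling
 * event_delete(). Calling it with the lock still held would relock a
 * default mutex from the same thread, which typically deadlocks. */
static void event_set_status(event_t *e, int status) {
    int complete;
    pthread_mutex_lock(&e->lock);
    e->status = status;
    complete = (status == 0); /* 0 stands for "complete" in this sketch */
    pthread_mutex_unlock(&e->lock);
    if (complete)
        event_delete(e);
}

/* Complete one event and report whether the delete path ran. */
static int demo_complete_event(void) {
    event_t e;
    int rc = pthread_mutex_init(&e.lock, NULL);
    assert(rc == 0);
    e.status = 1;
    e.deleted = 0;
    event_set_status(&e, 0);
    rc = e.deleted;
    pthread_mutex_destroy(&e.lock);
    return rc;
}
```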

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 12 Aug 2013 07:53:43 +0000 (15:53 +0800)]
GBE: set temporary address register for read64 to U64.

Actually, we really use it as two DWORD rather than U64. But if
we don't set it to U64, in post scheduler, it doesn't know this
is a QWORD register and may cause incorrect scheduling.

We can easily trigger this bug when running compiler_vector_double16_load_store
in SIMD8 mode. This patch fixes the bug.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 13 Aug 2013 03:05:28 +0000 (11:05 +0800)]
support 64bit-integer multiplication

also add test case

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 13 Aug 2013 09:10:07 +0000 (17:10 +0800)]
Add a load bool imm test case.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 13 Aug 2013 07:06:30 +0000 (15:06 +0800)]
Add bool move imm support.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 13 Aug 2013 00:32:54 +0000 (08:32 +0800)]
test 64bit-integer shifting

v2: put shifting in branch

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 08:45:12 +0000 (16:45 +0800)]
support 64bit-integer shifting

support left-shifting (<<), logical right-shifting (>>),
and arithmetic right-shifting (>>).
v2: define temp reg as dest reg of instructions
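
A conventional way to build the three shifts from 32 bit halves is sketched
below in C. This message does not show how the patch emits them, so this is an
illustrative assumption with hypothetical names. The shift count is reduced
modulo 64 here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64_parts;

static u64_parts split64(uint64_t v) {
    u64_parts p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join64(u64_parts p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Left shift: bits leaving the low half move into the high half. */
static u64_parts shl64(u64_parts v, unsigned n) {
    u64_parts r = v;
    n &= 63u;
    if (n >= 32) { r.hi = v.lo << (n - 32); r.lo = 0; }
    else if (n > 0) { r.hi = (v.hi << n) | (v.lo >> (32 - n)); r.lo = v.lo << n; }
    return r;
}

/* Right shift. `fill` is 0 for a logical shift, or 0xFFFFFFFF when an
 * arithmetic shift has to replicate a set sign bit. */
static u64_parts shr64_fill(u64_parts v, unsigned n, uint32_t fill) {
    u64_parts r = v;
    assert(fill == 0u || fill == 0xFFFFFFFFu);
    n &= 63u;
    if (n >= 32) {
        unsigned m = n - 32;
        r.lo = m ? ((v.hi >> m) | (fill << (32 - m))) : v.hi;
        r.hi = fill;
    } else if (n > 0) {
        r.lo = (v.lo >> n) | (v.hi << (32 - n));
        r.hi = (v.hi >> n) | (fill << (32 - n));
    }
    return r;
}

static u64_parts lshr64(u64_parts v, unsigned n) {
    return shr64_fill(v, n, 0u);
}

static u64_parts ashr64(u64_parts v, unsigned n) {
    return shr64_fill(v, n, (v.hi & 0x80000000u) ? 0xFFFFFFFFu : 0u);
}
```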

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 02:12:16 +0000 (10:12 +0800)]
support converting shorter int to 64bit int

converting byte/word/dword to int64
also add test case
v2: define temporary reg as dest reg of instruction

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 02:26:44 +0000 (10:26 +0800)]
Define temporary reg as dest reg of instruction

I had defined temporary regs as source regs of the instruction.
But the instruction scheduler treats source regs as read-only regs.
So I define them as dest regs now.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:21 +0000 (16:07 +0800)]
Add event unit test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:20 +0000 (16:07 +0800)]
Add openCL event support.

Now use deferred execution to wait for events.
If no user event is waited on, then use wait rendering to wait for the
GPU events to complete and call the enqueue api immediately.
If user events are waited on, then prepare the enqueue
data, and resume the enqueue when all the waited user events complete.
To achieve this, add an enqueue callback to the user event, and add all the
user events and the other waited events to the enqueue callback. When a user event
is set to complete, check all enqueue callbacks waiting on this event.
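
The deferred path described above can be sketched in C roughly as follows. All
types and functions are hypothetical and heavily simplified (no GPU events, no
locking, a fixed-size wait list); they only illustrate the idea of resuming an
enqueue once every waited user event is complete.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAIT 4

/* Hypothetical user event: only tracks completion. */
typedef struct { int complete; } user_event;

/* Hypothetical enqueue callback: the prepared enqueue plus its wait list. */
typedef struct {
    user_event *wait_list[MAX_WAIT];
    size_t num_wait;
    int flushed; /* 1 once the prepared enqueue has been resumed */
} enqueue_callback;

static int all_waits_complete(const enqueue_callback *cb) {
    size_t i;
    for (i = 0; i < cb->num_wait; i++)
        if (!cb->wait_list[i]->complete)
            return 0;
    return 1;
}

/* Enqueue immediately if nothing is pending, otherwise leave it deferred. */
static void enqueue_or_defer(enqueue_callback *cb) {
    assert(cb->num_wait <= MAX_WAIT);
    cb->flushed = all_waits_complete(cb);
}

/* Mark a user event complete and resume every callback that is now ready. */
static void set_user_event_complete(user_event *ev, enqueue_callback *cbs, size_t n) {
    size_t i;
    ev->complete = 1;
    for (i = 0; i < n; i++)
        if (!cbs[i].flushed && all_waits_complete(&cbs[i]))
            cbs[i].flushed = 1;
}

/* Returns 1 if the enqueue is resumed only after both events complete. */
static int demo_deferred_enqueue(void) {
    user_event a = { 0 }, b = { 0 };
    enqueue_callback cb = { { &a, &b }, 2, 0 };
    int after_submit, after_a, after_b;
    enqueue_or_defer(&cb);
    after_submit = cb.flushed;
    set_user_event_complete(&a, &cb, 1);
    after_a = cb.flushed;
    set_user_event_complete(&b, &cb, 1);
    after_b = cb.flushed;
    return after_submit == 0 && after_a == 0 && after_b == 1;
}
```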

Now, clEnqueueMark/clEnqueueBarrier still not impletement, and clEnqueueMapBuffer
/clEnqueueMapImage is not consistency with spec.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
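
Note on the commit above: the deferred-enqueue scheme can be sketched roughly as below. This is a simplified model under stated assumptions: it covers only user events (the wait on GPU events is omitted), and `UserEvent`, `EnqueueCallback` and `enqueue` are hypothetical names rather than the runtime's actual structures.

```python
class UserEvent:
    def __init__(self):
        self.complete = False
        self.callbacks = []  # enqueue callbacks waiting on this event

    def set_complete(self):
        self.complete = True
        # Check every enqueue callback that waits on this event.
        for cb in list(self.callbacks):
            cb.try_resume()

class EnqueueCallback:
    def __init__(self, wait_events, run):
        self.wait_events = list(wait_events)
        self.run = run        # the prepared enqueue work
        self.done = False
        for ev in self.wait_events:
            ev.callbacks.append(self)

    def try_resume(self):
        # Resume only once all awaited user events are complete.
        if not self.done and all(ev.complete for ev in self.wait_events):
            self.done = True
            self.run()

def enqueue(wait_events, run):
    """Run immediately if nothing is pending, otherwise defer."""
    pending = [ev for ev in wait_events if not ev.complete]
    if not pending:
        run()
        return None
    return EnqueueCallback(pending, run)
```

A caller would create the events, call `enqueue([e1, e2], work)`, and the work runs when the last of `e1.set_complete()` and `e2.set_complete()` is called.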
10 years agoAdd function cl_command_queue_flush to flush a command
Yang Rong [Mon, 12 Aug 2013 08:07:19 +0000 (16:07 +0800)]
Add function cl_command_queue_flush to flush a command

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd some functions to support event in intel gpgpu.
Yang Rong [Mon, 12 Aug 2013 08:07:18 +0000 (16:07 +0800)]
Add some functions to support event in intel gpgpu.

The runtime now prepares the command batch first. If the batch cannot be
flushed immediately, it calls cl_gpgpu_event_pending to append the command
to the event; once the command batch's wait events have completed, it calls
cl_gpgpu_event_resume to flush.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd a struct and a function to handle all implemented enqueue api.
Yang Rong [Mon, 12 Aug 2013 08:07:17 +0000 (16:07 +0800)]
Add a struct and a function to handle all implemented enqueue api.

Event and non-blocking enqueue api may use this function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the empty functions of cl_enqueueXXX.
Yang Rong [Mon, 12 Aug 2013 08:07:16 +0000 (16:07 +0800)]
Add the empty functions of cl_enqueueXXX.

Copied from the cl_enqueueXXX functions and commented out. This change is for trace only.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agono "div by zero" in smoothstep test case
Homer Hsing [Mon, 12 Aug 2013 05:51:26 +0000 (13:51 +0800)]
no "div by zero" in smoothstep test case

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoDriver: Fix the incorrect size of surface 1.
Zhigang Gong [Mon, 12 Aug 2013 05:34:15 +0000 (13:34 +0800)]
Driver: Fix the incorrect size of surface 1.

According to Ben's comments, surfaces 0 and 1 should match each other
exactly, and the only reason we need two surfaces rather than one is
fulsim usage. Thus we should give surfaces 1 and 0 the same memory
size.

This patch fixes the flat_address_space unit test case and also
a random failure reported by Yang Rong.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoutest: Add test case for function acos/acosh/asin/asinh.
Yi Sun [Mon, 12 Aug 2013 02:35:39 +0000 (10:35 +0800)]
utest: Add test case for function acos/acosh/asin/asinh.

The cases cover illegal, boundary and legal values.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoHandle boundary and illegal values.
Yi Sun [Mon, 12 Aug 2013 02:35:38 +0000 (10:35 +0800)]
Handle boundary and illegal values.

Such as |x| = 1.0, |x| < 2**-27 and |x| > 1.

v2: Replace some constant variables with existing macro values.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
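
Note on the commit above: the message lists the input ranges but not the results chosen for them. The sketch below shows one conventional treatment for asin only, assuming the usual libm behaviour (NaN outside the domain, the exact value at the boundary, and the identity for tiny inputs). `asin_checked` is a hypothetical name, and the commit's real code may differ.

```python
import math

def asin_checked(x):
    ax = abs(x)
    if math.isnan(x) or ax > 1.0:
        return math.nan                       # illegal: outside [-1, 1]
    if ax == 1.0:
        return math.copysign(math.pi / 2, x)  # boundary: exactly +/- pi/2
    if ax < 2.0 ** -27:
        return x                              # tiny: asin(x) rounds to x
    return math.asin(x)
```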
10 years agoSkip spill/unspill instruction when trying to do spill.
Ruiling Song [Fri, 9 Aug 2013 05:23:41 +0000 (13:23 +0800)]
Skip spill/unspill instruction when trying to do spill.

We can only spill virtual registers, so physical registers should be skipped.
This fixes a random failure of compiler_box_blur when spilling.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a re-schedule issue of scratch write
Ruiling Song [Fri, 9 Aug 2013 05:23:40 +0000 (13:23 +0800)]
Fix a re-schedule issue of scratch write

scratchMsgHeader+1 is reused as the scratch write payload, so
scratchMsgHeader+1 is spilled out first.
Add a scratch write dependency to keep scratch writes in order.
This fixes a failure (compiler_box_blur_float) when spilling.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Fixed a bug and release 2 or 3 simdWidth register space.
Zhigang Gong [Fri, 9 Aug 2013 02:36:00 +0000 (10:36 +0800)]
GBE: Fixed a bug and release 2 or 3 simdWidth register space.

This patch fixes two issues. The first is in the sel_cmp pattern matching.
We should not set the sel_cmp instruction state to physicalFlag, as
sel_cmp never uses a flag register. Because it set physicalFlag
and left flagIndex at zero, it extended virtual register 0's
interval to reach the last sel_cmp instruction,
so virtual register 0 was never freed.

The second issue is in how we allocate special registers. We
do not always allocate them on demand. For example, the 3
local id registers are always allocated, so some of those
registers may not be used at all. In that case the interval's end
point never gets set to a proper value and the register is never
released. Now the end point is initialized to 0. Later, if the register
is used, the end point is set to a proper value. Otherwise it stays
zero, and the register is deallocated during expiring.

This patch could fix (work around) a long-standing bug.
When the pre-allocation instruction scheduling is disabled with
export OCL_PRE_ALLOC_INSN_SCHEDULE=0
and this case is run:
utests/utest_run compiler_menger_sponge_no_shadow
it fails.

I spent almost a day tracking it down to the register
allocation, but I haven't yet found the actual buggy
code. I suspected the register allocation, but I reviewed that code very
carefully and found nothing wrong. The last suspect is now
the register interval handling.

Anyway, applying this patch releases two registers to the pool,
which may change the register allocation/expiration and thus works around
that bug. We may still need to spend some time investigating the root
cause of the failure in the future.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoUtests: enable long/ulong in vector load/store test case.
Zhigang Gong [Fri, 9 Aug 2013 02:35:59 +0000 (10:35 +0800)]
Utests: enable long/ulong in vector load/store test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Fix one bug in instruction scheduling.
Zhigang Gong [Thu, 8 Aug 2013 07:15:44 +0000 (15:15 +0800)]
GBE: Fix one bug in instruction scheduling.

Now that we may use 8-byte registers (long and double), one
register may take two (SIMD8) or four (SIMD16) physical registers.
Thus when we meet a register of long or double type, we need to
handle the immediately following index at the same time.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
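
Note on the commit above: the two/four figures follow from simple arithmetic if a physical register is assumed to hold 32 bytes. That size is an assumption of this sketch and is not stated in the commit; `physical_regs` is a hypothetical helper.

```python
GRF_BYTES = 32  # assumed size of one physical register, in bytes

def physical_regs(simd_width, elem_bytes):
    """Number of physical registers needed for one virtual register."""
    total = simd_width * elem_bytes
    return max(1, -(-total // GRF_BYTES))  # ceiling division

long_simd8 = physical_regs(8, 8)    # 64 bytes
long_simd16 = physical_regs(16, 8)  # 128 bytes
```

With these assumptions `long_simd8` is 2 and `long_simd16` is 4, matching the counts in the commit message.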
10 years agoGBE: fix instruction scheduling related bugs in read64/write64.
Zhigang Gong [Thu, 8 Aug 2013 07:15:43 +0000 (15:15 +0800)]
GBE: fix instruction scheduling related bugs in read64/write64.

In read64 and write64 we allocate some temporary registers, and
all of the temporary registers that may be modified must be put
in the instruction's dst array. Otherwise, the later post instruction
scheduling may reorder the instructions incorrectly.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: enable double vector load/store support.
Zhigang Gong [Tue, 6 Aug 2013 04:01:46 +0000 (12:01 +0800)]
GBE: enable double vector load/store support.

We have some accuracy problems with double calculation
on the GPU side. I had to change the test case for the double
type to allow an error tolerance when checking the double
results.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
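
Note on the commit above: the tolerance used by the test case is not given in the message. A tolerant comparison of this kind might look like the sketch below, where `results_match` and the `rel_tol` default are hypothetical choices for illustration.

```python
import math

def results_match(expected, actual, rel_tol=1e-6):
    """Compare two sequences of doubles, allowing a small relative error."""
    if len(expected) != len(actual):
        return False
    return all(math.isclose(e, a, rel_tol=rel_tol, abs_tol=rel_tol)
               for e, a in zip(expected, actual))
```

The `abs_tol` argument keeps the check meaningful for expected values at or near zero, where a purely relative tolerance would reject any difference.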
10 years agotest 64bit-integer selection operator
Homer Hsing [Wed, 7 Aug 2013 08:28:34 +0000 (16:28 +0800)]
test 64bit-integer selection operator

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agosupport 64bit-integer selection operator "?:"
Homer Hsing [Wed, 7 Aug 2013 08:28:33 +0000 (16:28 +0800)]
support 64bit-integer selection operator "?:"

v2: reuse MOV to move 64-bit integers instead of adding a MOV_INT64 instruction.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoImplement spill/unspill
Ruiling Song [Wed, 7 Aug 2013 07:15:50 +0000 (15:15 +0800)]
Implement spill/unspill

The current implementation works as follows:
I reserve a pool of registers for spill/reload. Currently 6 registers
are reserved: 5 handle a SelectionVector with at most 5 elements, and
the remaining one is used as the scratch message header register. The
register after the header register is used as the payload for scratch
writes.

To spill, just iterate over the instructions. If the virtual register
is used as src, insert a reload instruction before the instruction. If
the virtual register is used as dst, insert a spill instruction to write
the register content to scratch memory.

Current limitations:
64-bit is not supported.
SelectionVector > 5 is not handled.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
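
Note on the commit above: the iteration it describes can be sketched as follows. This is a simplified model, assuming instructions are `(opcode, dst, srcs)` tuples and that each spilled register already has a scratch slot; the reserved register pool and the scratch message header are left out. `insert_spill_code` and the tuple layout are hypothetical.

```python
def insert_spill_code(insns, slot_of):
    """Insert reload/spill pseudo-instructions for spilled registers.

    insns   -- list of (opcode, dst, srcs) tuples
    slot_of -- maps each spilled virtual register to its scratch slot
    """
    out = []
    for op, dst, srcs in insns:
        reloaded = set()
        for s in srcs:
            # A spilled source must be read back before the instruction.
            if s in slot_of and s not in reloaded:
                out.append(("reload", s, (f"scratch[{slot_of[s]}]",)))
                reloaded.add(s)
        out.append((op, dst, srcs))
        if dst in slot_of:
            # A spilled destination is written to scratch memory afterwards.
            out.append(("spill", f"scratch[{slot_of[dst]}]", (dst,)))
    return out

program = [
    ("add", "v2", ("v0", "v1")),
    ("mul", "v3", ("v2", "v2")),
]
rewritten = insert_spill_code(program, {"v2": 0})
```

For this input, `rewritten` holds the `add`, a spill of `v2` to `scratch[0]`, a single reload of `v2`, and then the `mul`.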
10 years agoenable scratch memory allocation and read/write
Ruiling Song [Wed, 7 Aug 2013 07:07:40 +0000 (15:07 +0800)]
enable scratch memory allocation and read/write

v2: refine function naming.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>