contrib/beignet.git
10 years agofix piglit get kernel info FUNCTION ATTRIBUTE fail.
Luo [Fri, 12 Sep 2014 03:53:41 +0000 (11:53 +0800)]
fix piglit get kernel info FUNCTION ATTRIBUTE fail.

the backend need return the kernel FUNCTION ATTRIBUTE message to the
clGetKernelInfo.
there are 3 kind of function attribute so far, vec_type_hint parameter
is not available to return due to llvm lack of such info.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: fix build status handling.
Zhigang Gong [Fri, 12 Sep 2014 06:29:23 +0000 (14:29 +0800)]
runtime: fix build status handling.

According to the spec:
The build status is to
  Returns the build, compile or link status,
  whichever was performed last on program for
  device.

The previous implementation only consider the clProgramBuild and
doesn't consider the compile. Now fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoruntime: fix program binary type bug.
Zhigang Gong [Fri, 12 Sep 2014 05:47:25 +0000 (13:47 +0800)]
runtime: fix program binary type bug.

If the binary is a executable type, the first byte is zero and
we need to set the binary type correctly to CL_PROGRAM_BINARY_TYPE_EXECUTABLE.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Tested-by: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoUpdate license disclaimer.
Yang Rong [Fri, 5 Sep 2014 02:25:44 +0000 (10:25 +0800)]
Update license disclaimer.

LunarGLASS have update his copyright, so update the copyright in llvm_scalarize.cpp.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: don't enable double by default.
Zhigang Gong [Thu, 11 Sep 2014 07:04:44 +0000 (15:04 +0800)]
GBE: don't enable double by default.

Actually, we don't support double completely currently.
Let's disable it now. This bring a little incompatible point with the 1.2 spec
which doesn't require the kernel to use the following pragma to enable fp64.
 #pragma OPENCL EXTENSION cl_khr_fp64 : enable

If the application wants to try the partially supported double with beignet
under opencl 1.2, the application will still need to add the above pragma.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fix a potential memory leak bug.
Zhigang Gong [Thu, 11 Sep 2014 05:45:33 +0000 (13:45 +0800)]
GBE: fix a potential memory leak bug.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: Fix a potential segfault.
Zhigang Gong [Thu, 11 Sep 2014 05:44:16 +0000 (13:44 +0800)]
GBE: Fix a potential segfault.

And when we fail to compile a module, the fileName may be NULL, we can't
access it unconditionally.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: don't return error if we get an empty module.
Zhigang Gong [Thu, 11 Sep 2014 05:43:29 +0000 (13:43 +0800)]
GBE: don't return error if we get an empty module.

When compile a empty string, we may get an empty module which is not
an error.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix piglit cl-api-set-kernel-arg fail.
Luo Xionghu [Thu, 11 Sep 2014 00:43:54 +0000 (08:43 +0800)]
fix piglit cl-api-set-kernel-arg fail.

the memory object should be checked whether valid in context buffers before being set as kernel arguments.

v2: rename the function from mem_in_buffers to is_valid_mem, move the
magic header check into it.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix clGetKernelWorkGroupInfo built-in kernel fail.
Luo Xionghu [Wed, 10 Sep 2014 03:31:32 +0000 (11:31 +0800)]
fix clGetKernelWorkGroupInfo built-in kernel fail.

add CL_KERNEL_GLOBAL_WORK_SIZE option for clGetKernelWorkGroupInfo.

v2: should return the max global work size instead of current work size.
This funtion need return CL_INVALID_VALUE if the device is not a custom
device or kernel is not a built-in kernel.
we have 3 kind of built-in kernels for 1d/2d/3d memories, the max global
work size are decided by the dimension and memory type.
the piglit fail is caused by calling NON built-in kernels, so need send
patch to piglit later.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE/libocl: Added one missing prototype fma().
Zhigang Gong [Thu, 11 Sep 2014 02:33:07 +0000 (10:33 +0800)]
GBE/libocl: Added one missing prototype fma().

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fix bugs when handling -cl-std option.
Zhigang Gong [Thu, 11 Sep 2014 01:51:19 +0000 (09:51 +0800)]
GBE: fix bugs when handling -cl-std option.

Actually, CLANG does take this option and we should not
filter it out. We also change the default option to create
PCH file to -cl-std=CL1.2. And if the user pass in a CL1.1
we will have to disable PCH.

Another change is that if we are CL1.2, then we should enable
the cl_khr_fp64 by default. As from CL1.2, this extension should
be enabled by default.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoAdd the switch logic for math conformance fast path
Junyan He [Wed, 10 Sep 2014 08:10:28 +0000 (16:10 +0800)]
Add the switch logic for math conformance fast path

Modify the __ocl_math_fastpath_flag init value in the
backend link stage to switch between fast path and
conformance path.

V2:
    Rename the function prototype parameter name.

V3:
    Modify the parameter to boolean and correct some comment words.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE/libocl: fix the wrong prototype of scalar native_powr.
Zhigang Gong [Wed, 10 Sep 2014 08:21:12 +0000 (16:21 +0800)]
GBE/libocl: fix the wrong prototype of scalar native_powr.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
10 years agoFix the issue of -cl-std=CLX.X option.
Junyan He [Wed, 10 Sep 2014 07:39:41 +0000 (15:39 +0800)]
Fix the issue of -cl-std=CLX.X option.

The -cl-std= will specify the least version to compile
the source code providing to our API. So we need to
check it early, and return failure if our platform's
version can not meet the request. In the backend, we
just ignore this cmd line option.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRuntime: Implement clGetExtensionFunctionAddressForPlatform.
Zhigang Gong [Sat, 30 Aug 2014 08:34:44 +0000 (16:34 +0800)]
Runtime: Implement clGetExtensionFunctionAddressForPlatform.

It seems that this function is required by latest PyOpenCL.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoAdd copyright header for all libocl files.
Junyan He [Thu, 4 Sep 2014 14:53:00 +0000 (22:53 +0800)]
Add copyright header for all libocl files.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse ${PYTHON_EXECUTABLE} to run python scripts.
Yichao Yu [Wed, 10 Sep 2014 00:31:21 +0000 (20:31 -0400)]
Use ${PYTHON_EXECUTABLE} to run python scripts.

Signed-off-by: Yichao Yu <yyc1992@gmail.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoUpdate README for the command parser in drm kernel.
Yang Rong [Fri, 5 Sep 2014 03:25:05 +0000 (11:25 +0800)]
Update README for the command parser in drm kernel.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix piglit cl-api-get-program-info fail.
Luo Xionghu [Mon, 8 Sep 2014 22:42:07 +0000 (06:42 +0800)]
fix piglit cl-api-get-program-info fail.

add pointer check.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd uncompatible PCH Options to avoid compiling failure.
Junyan He [Fri, 5 Sep 2014 08:54:27 +0000 (16:54 +0800)]
Add uncompatible PCH Options to avoid compiling failure.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fallback if we get a wider than i64 constant.
Zhigang Gong [Fri, 5 Sep 2014 08:33:44 +0000 (16:33 +0800)]
GBE: fallback if we get a wider than i64 constant.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
10 years agoGBE: fix a bug with LLVM 3.3.
Zhigang Gong [Fri, 5 Sep 2014 08:19:30 +0000 (16:19 +0800)]
GBE: fix a bug with LLVM 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Meng, Mengmeng <mengmeng.meng@intel.com>
10 years agoAdd the missing function prototypes of any() and atom_add()
Junyan He [Fri, 5 Sep 2014 08:27:30 +0000 (16:27 +0800)]
Add the missing function prototypes of any() and atom_add()

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: avoid one optimization pass to generate wide integer.
Zhigang Gong [Fri, 5 Sep 2014 06:04:38 +0000 (14:04 +0800)]
GBE: avoid one optimization pass to generate wide integer.

Integer type wider than 64 bit is hard to handle on Gen.
Let's try to prevent ScalarReplAggregates pass to generate
such type of integer.

v2:
fix compilation error with LLVM 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: remove the user defined macro cl_khr_fp64.
Zhigang Gong [Thu, 4 Sep 2014 04:30:44 +0000 (12:30 +0800)]
GBE: remove the user defined macro cl_khr_fp64.

This is not a predefined macro according to the spec. Let's not
define it by default. This patch also disable the fp64 when enter
user kernels.

v2:
Some internal .cl files require cl_khr_fp64 enabled. Fixed that issue
by move the enable macro to ocl_types.h.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
10 years agoDelete all the unused files of old huge header.
Junyan He [Mon, 1 Sep 2014 02:22:54 +0000 (10:22 +0800)]
Delete all the unused files of old huge header.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse the PCH to accelerate the parsing speed of the ocl.h
Junyan He [Mon, 1 Sep 2014 07:28:18 +0000 (15:28 +0800)]
Use the PCH to accelerate the parsing speed of the ocl.h

We disable the valid check for the PCH to avoid path
and modified time check, which brings us some trouble.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoEnable libocl and disable the usage of the old huge header.
Junyan He [Mon, 1 Sep 2014 07:28:02 +0000 (15:28 +0800)]
Enable libocl and disable the usage of the old huge header.

The llvm ir print out is modified.
From the OCL_OUTPUT_LLVM_BEFORE_EXTRA_PASS and
OCL_OUTPUT_LLVM, we change to
OCL_OUTPUT_LLVM_BEFORE_LINK
OCL_OUTPUT_LLVM_AFTER_LINK
OCL_OUTPUT_LLVM_AFTER_GEN
The first one print out the IR before link the bitcode lib.
The second one print out the IR result after linking.
Then last one print out the IR after gen translating.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoAdd the bit code linker into the module pass.
Junyan He [Mon, 1 Sep 2014 09:16:22 +0000 (17:16 +0800)]
Add the bit code linker into the module pass.

The bit code linker will load the beignet.bc as a
lib module and link the module of the kernel together.
Then we will filter out all the dead bit code by create
an InternalizePass for the module.
After this stage, the ir will include the bitcode just
used by the cl kernel.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoAdd memcpy, memset and barrier bitcode files into libocl
Junyan He [Mon, 1 Sep 2014 02:19:42 +0000 (10:19 +0800)]
Add memcpy, memset and barrier bitcode files into libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the ocl_defines header file into libocl
Junyan He [Mon, 1 Sep 2014 02:16:17 +0000 (10:16 +0800)]
Add the ocl_defines header file into libocl

This file will be used to define some common defines
for both CL and the backend source code.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the relational module into libocl as template
Junyan He [Mon, 1 Sep 2014 02:15:35 +0000 (10:15 +0800)]
Add the relational module into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the math function into libocl as template
Junyan He [Mon, 1 Sep 2014 02:14:45 +0000 (10:14 +0800)]
Add the math function into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the integer module into libocl as template
Junyan He [Mon, 1 Sep 2014 02:13:04 +0000 (10:13 +0800)]
Add the integer module into libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the common module into the libocl as template
Junyan He [Mon, 1 Sep 2014 02:12:42 +0000 (10:12 +0800)]
Add the common module into the libocl as template

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the gen_vector script into the libocl
Junyan He [Mon, 1 Sep 2014 02:12:26 +0000 (10:12 +0800)]
Add the gen_vector script into the libocl

This script will genenrate function defines and function
prototypes for all the vector functions.
Some modules need very verbose vector functions after their
scalar version. We will write a template for all the scalar
version and use this script the generate the vector version
and append them to the template to generate the header or
source file.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the convert and as modules into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:52 +0000 (10:11 +0800)]
Add the convert and as modules into the libocl

The convert and as function suites have very similar
format for all tye types and vectors, and they are
really verbose. So the two scripts will generate the
code for Convert and AS separatedlly.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd thw workitem module into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:39 +0000 (10:11 +0800)]
Add thw workitem module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd vload module into the libocl
Junyan He [Mon, 1 Sep 2014 02:11:24 +0000 (10:11 +0800)]
Add vload module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd printf module into libocl
Junyan He [Mon, 1 Sep 2014 02:10:05 +0000 (10:10 +0800)]
Add printf module into libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the sync module into the libocl
Junyan He [Mon, 1 Sep 2014 02:09:28 +0000 (10:09 +0800)]
Add the sync module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the misc module into the libocl
Junyan He [Mon, 1 Sep 2014 02:08:34 +0000 (10:08 +0800)]
Add the misc module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the image module into the libocl
Junyan He [Tue, 2 Sep 2014 13:28:13 +0000 (21:28 +0800)]
Add the image module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the geometric module into the libocl
Junyan He [Mon, 1 Sep 2014 02:07:04 +0000 (10:07 +0800)]
Add the geometric module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the atomic module into the libocl
Junyan He [Mon, 1 Sep 2014 02:06:35 +0000 (10:06 +0800)]
Add the atomic module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd the async module into the libocl
Junyan He [Mon, 1 Sep 2014 02:06:09 +0000 (10:06 +0800)]
Add the async module into the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd common define header files to initialize the libocl
Junyan He [Mon, 1 Sep 2014 02:05:54 +0000 (10:05 +0800)]
Add common define header files to initialize the libocl

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: fixup/refine a bug for image1D array's extra binding index handling.
Zhigang Gong [Thu, 4 Sep 2014 00:01:25 +0000 (08:01 +0800)]
GBE: fixup/refine a bug for image1D array's extra binding index handling.

Due to hardware limitation on Gen7/Gen75 when sampling a
surface with clamp address mode and nearest filter mode
on a integer image1Darray type surface, we have to bind
one buffer to to bti. The previous implementation hard
coded it to 128 + original index and when check whether
it is such type bti in driver layer, assume the bti reserved
is 3 which is wrong now.

This patch fixed those hard coded functions and use the
macros defined in the program.h.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: refine the unaligned data gathering.
Zhigang Gong [Thu, 28 Aug 2014 01:26:06 +0000 (09:26 +0800)]
GBE: refine the unaligned data gathering.

Save some unecessary duplicate instructions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: relax the batch byte/short load vector size restrication.
Zhigang Gong [Thu, 28 Aug 2014 00:44:58 +0000 (08:44 +0800)]
GBE: relax the batch byte/short load vector size restrication.

Previous restrication is that the vector size must be multiple
of DWORD. This restrication prevent the vload2/3 of char or
vload3 of ushort to be optimized. This patch relax this restrication
on the vload path.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: optimize unaligned char and short data vector's load.
Zhigang Gong [Wed, 27 Aug 2014 03:13:15 +0000 (11:13 +0800)]
GBE: optimize unaligned char and short data vector's load.

The gather the contiguous short/char loads into a single load instruction
could give us a good pportunity to use untyped load to optimize them.

This patch enable the short/char load gathering at the load store optimize
pass. Then at the backend, it will load corresponding DWORDs then covert to
short/char accordingly by applying shift and bitwise operations.

The benchmark shows, for vload4/8/16 char or vload/2/4/8/16 short, this patch brings
about 80%-100% improvement.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoAdd new vload benchmark/test case.
Zhigang Gong [Wed, 27 Aug 2014 02:33:42 +0000 (10:33 +0800)]
Add new vload benchmark/test case.

v2:
refine the benchmark case and don't mix it with normal
unit test cases.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: fix error in the rootn fastpath function for some special input.
Zhigang Gong [Fri, 29 Aug 2014 02:04:38 +0000 (10:04 +0800)]
GBE: fix error in the rootn fastpath function for some special input.

The fastpath is to lose some accuracy but get fast speed. It is not
to generate error result. The rootn has many special input and need
to be taken care before we call the native pow directly.
This patch fix all the pow related failures at the OpenCV 3.0 test
suite.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoutests: fix two utest bugs.
Zhigang Gong [Tue, 2 Sep 2014 02:34:33 +0000 (10:34 +0800)]
utests: fix two utest bugs.

Similar as the bug found by junyan, some events are
accessed before assigned.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoFix a bug for runtime_barrier_list.cpp, event array out of bound
Junyan He [Tue, 2 Sep 2014 02:37:02 +0000 (10:37 +0800)]
Fix a bug for runtime_barrier_list.cpp, event array out of bound

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix the global string bug for printf.
Junyan He [Mon, 1 Sep 2014 08:18:45 +0000 (16:18 +0800)]
Fix the global string bug for printf.

When there are multi printf statements in multi kernel
fucntions within the same translate unit, if they have
the same sting parameter, the Clang will just generate
one global string named .strXXX to represent that string.
So when translating the kernel to gen, we can not unref
that global var. Just ignore it to avoid assert.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix two bugs.
Yang Rong [Mon, 1 Sep 2014 05:05:06 +0000 (13:05 +0800)]
Fix two bugs.

1. A INSERT_REGINSERT_REG typo.
2. Release main_buf in utest sub_buffer_check.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoTwo minor fix.
Yang Rong [Mon, 1 Sep 2014 05:05:05 +0000 (13:05 +0800)]
Two minor fix.

1. Some systems don't define ulong type, use unsigned long instead of..
2. Use sA, sB... instead of sa, sb... to access vector 16, because sometimes sa, sb will cause clang error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoremove dependency for non-X runtime environment
Guo Yejun [Thu, 28 Aug 2014 23:51:49 +0000 (07:51 +0800)]
remove dependency for non-X runtime environment

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoUpdate Beignet.mdwn X11 dependency.
Yang Rong [Fri, 29 Aug 2014 03:26:35 +0000 (11:26 +0800)]
Update Beignet.mdwn X11 dependency.

And also remove libgbe in the external dependencies section.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRevert "GBE: refine post register allocation scheduling for global buffers."
Zhigang Gong [Thu, 28 Aug 2014 05:58:31 +0000 (13:58 +0800)]
Revert "GBE: refine post register allocation scheduling for global buffers."

Different BTI buffer may point to the same memory. Let's not
change the load/store sequence. This fix a regression at the
opencv test suite OCL_Channels/MixChannels.Accuracy/8.

This reverts commit 435f63e9fde93c38331bf0231df5ee8625f88a62.

Singed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoOnly compiler X11 files and do X11 operations when found X11.
Yang Rong [Thu, 28 Aug 2014 06:37:44 +0000 (14:37 +0800)]
Only compiler X11 files and do X11 operations when found X11.

Add a build flag HAS_X11 for it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: refine the llvm multi-thread related code.
Ruiling Song [Tue, 26 Aug 2014 07:39:24 +0000 (15:39 +0800)]
GBE: refine the llvm multi-thread related code.

LLVM 3.5 remove llvm_start/stop_multithreaded() API, instead multi-thread
support is determined when build llvm(build option LLVM_ENABLE_THREADS).
llvm_is_multithreaded() is used to check whether llvm is built with
muti-thread support.

If multi-thread is not support(LLVM3.3/3.4 or 3.5 built with LLVM_ENABLE_THREADS off),
we simply add a mutex when calling clang/llvm related API.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: clear deadprintfs when current function is done.
Ruiling Song [Tue, 26 Aug 2014 07:39:11 +0000 (15:39 +0800)]
GBE: clear deadprintfs when current function is done.

It should be cleared, to prevent invalid pointers staying there
when processing next Function.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agofix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.
Luo Xionghu [Tue, 26 Aug 2014 02:12:28 +0000 (10:12 +0800)]
fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.

This regression is caused by structural analysis when check the if-then
node, acturally there are four types of if-then node according to the
topology and fallthrough information. fallthrough check is added in this
patch.

v2: add inversePredicate member and function for BranchInstruction;
print the exact meanning of IF instruction in GEN_IR.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix compile warnings for CLANG compiler
Lv Meng [Fri, 15 Aug 2014 01:16:33 +0000 (09:16 +0800)]
Fix compile warnings for CLANG compiler

1.fix data structure redefine warnings.
2.fix 'data' with variable sized type 'union<*>' not at the end of a class warning(in immediate.hpp).
3.fix implicitly conversion warning.
4.fix explicitly assigning a variable type warning.
5.fix comparison of unsigned expression < 0 is always false warning(in cl_api.c).

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoFix compile warnings for ICC compiler
Lv Meng [Thu, 14 Aug 2014 03:33:06 +0000 (11:33 +0800)]
Fix compile warnings for ICC compiler

1.the "const" associated functions' modification is to fix "type qualifier on return type is meaningless" for ICC compile warning.
2.the "operator new" shoud have the corresponding "operator delete" function.
3.In C++0x std::auto_ptr will be deprecated in favor of std::unique_ptr.

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agocmake: Fix a license issue.
Ruiling Song [Wed, 13 Aug 2014 01:53:33 +0000 (09:53 +0800)]
cmake: Fix a license issue.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoFix compile errors for CLANG compiler
Lv Meng [Fri, 8 Aug 2014 08:10:03 +0000 (16:10 +0800)]
Fix compile errors for CLANG compiler

Use vector to fix "variable length array of non-POD element type" compiler error.
The /beignet/backend/src/./ir/context.hpp "fn->immediates[index] = imm" would call a private func
'operator=' which would trigger error, and it is not being used.
The undefined reference to `check_copy_overlap' would occur in the following calling.

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoFix compile error for ICC compiler
Lv Meng [Fri, 8 Aug 2014 08:08:16 +0000 (16:08 +0800)]
Fix compile error for ICC compiler

fix the pthread_mutex_t undefine compile error and some undefined error would occur when
using math.h in C++ file.for C++ file,it is better using cmath instead off math.h.

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Fix a bug in gatherBTI.
Ruiling Song [Mon, 11 Aug 2014 05:58:26 +0000 (13:58 +0800)]
GBE: Fix a bug in gatherBTI.

The needNewBTI is a state that only valid for the current candidate.
So need to reset to default value for each candidate.

This fix the regression in opencv 3.0:
./opencv_perf_objdetect OCL_Cascade_Image_MinSize_CascadeClassifier.CascadeClassifier

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: initialize BTI structure to zero.
Ruiling Song [Mon, 11 Aug 2014 05:49:01 +0000 (13:49 +0800)]
GBE: initialize BTI structure to zero.

Clear to zero to avoid garbage data, as we do not
assign it later for local/constant memory access.

v2:
  move initialization code into constructor.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: Fix type size for vector3
Ruiling Song [Mon, 11 Aug 2014 05:48:49 +0000 (13:48 +0800)]
GBE: Fix type size for vector3

According to OCL spec, size of vector3 are aligned to vector4.
And for memory load/store, clang already aligned it to vector4.
If we do not calculate private/local memory size as vector4,
out of range memory access will appear.

This can fix the failure of opencv 3.0 case:
OCL_Arithm/MeanStdDev.Mat_Mask

v2:
  vec3 constant data should be aligned to vec4.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoFail gracefully on unsupported hardware
Rebecca Palmer [Fri, 8 Aug 2014 11:07:24 +0000 (12:07 +0100)]
Fail gracefully on unsupported hardware

If no compatible hardware is present, clGetDeviceIDs is supposed to
report CL_DEVICE_NOT_FOUND to the caller, but in Beignet this currently
ends the whole program with exit(-1) or assert(0).  This fixes this.

This is required to have a "just works" OpenCL in Debian/Ubuntu, as
their package manager doesn't know the hardware and hence commonly will
install Beignet on hardware that doesn't support it; returning an error
allows the caller to try other ICDs until it finds the right one, or to
run without using OpenCL.  Previous discussion:
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20140217/000096.html
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-20140217/000100.html

Testing if you only have supported hardware: use a chroot, the GPU
isn't visible from inside.

Identical patch in case line wrap mangles this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=27;filename=fail_gracefully_without_hardware;att=1;bug=745363

Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: Fix a warning in getConstantPointerRegister.
Ruiling Song [Mon, 11 Aug 2014 02:15:14 +0000 (10:15 +0800)]
GBE: Fix a warning in getConstantPointerRegister.

compiler complains "warning: control reaches end of non-void function"

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agofix the relational built-in vector function regression.
Luo Xionghu [Wed, 6 Aug 2014 01:36:31 +0000 (09:36 +0800)]
fix the relational built-in vector function regression.

the relational vector function need return -1 instead of 1 according to
the spec.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoFix a utest compiler_async_stride_copy typo.
Yang Rong [Mon, 4 Aug 2014 07:04:16 +0000 (15:04 +0800)]
Fix a utest compiler_async_stride_copy typo.

And need to convert to char when compare.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoimprove the build performance of vector type built-in function.
LuoXionghu [Fri, 1 Aug 2014 01:40:09 +0000 (09:40 +0800)]
improve the build performance of vector type built-in function.

expand the gentypen with loop to reduce the redundant inline for more
than 4 components type.

v2: add the  greater than 4 componets conditon to avoid performace
degration.

Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
10 years agoGBE: remove some useless code for getting printf buffer address.
Ruiling Song [Thu, 31 Jul 2014 08:48:01 +0000 (16:48 +0800)]
GBE: remove some useless code for getting printf buffer address.

This is not used anymore.

Also fix an annoying warning.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
10 years agoGBE: adjust preferred vector length.
Zhigang Gong [Tue, 29 Jul 2014 04:55:44 +0000 (12:55 +0800)]
GBE: adjust preferred vector length.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Reduce random behaviour of the code generation
Zhigang Gong [Tue, 29 Jul 2014 04:36:42 +0000 (12:36 +0800)]
GBE: Reduce random behaviour of the code generation

There are two major types of random behviour source. One is the
register spill tick. Now we fix it to increase from 0 for
each new code generation.
The second random source is the register sorting. When two
register has the same startID or endID,  their sorting order
is not determined and maybe random in different machine. This
patch mitigate this random source by introduce another comparison
if the main key is identical.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoutest: add new test for constant expression processing.
Zhigang Gong [Thu, 24 Jul 2014 05:13:32 +0000 (13:13 +0800)]
utest: add new test for constant expression processing.

If we use 3-component vector in a union, it may introduce
some complex constant expression as below:

float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32 1065353216, i32 1073741824, i32 1077936128, i32 undef> to i128) to i32) to float).

To test the constant expression processing function.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: enable constant expression processing.
Zhigang Gong [Wed, 23 Jul 2014 09:02:52 +0000 (17:02 +0800)]
GBE: enable constant expression processing.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: complete constant expression processing.
Zhigang Gong [Tue, 22 Jul 2014 10:36:28 +0000 (18:36 +0800)]
GBE: complete constant expression processing.

The target is to process all possible complex nested constant expression as below:

const = type0 OP0 (const0)
const0 = type1 OP1 (const1, const2)
const1 = ...

The supported OPs are as below:
    BITCAST,
    ADD,
    SUB,
    MUL,
    DIV,
    REM,
    SHL,
    ASHR,
    LSHR,
    AND,
    OR,
    XOR

We also add support for array/vector type of immediate. Some possible examples are as below:
float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128) to i32) to float)
float bitcast (i32 trunc (i128 lshr (i128 bitcast (<4 x i32> <i32 1064178811, i32 1064346583, i32 1062836634, i32 undef> to i128), i128 32) to i32) to float)

v2:
separate all private method implementations to immediate.cpp.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: simplify processConstant.
Zhigang Gong [Tue, 22 Jul 2014 09:01:06 +0000 (17:01 +0800)]
GBE: simplify processConstant.

Preparation to support generic constant expression.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: refactor the immediate class to support vector data type.
Zhigang Gong [Tue, 22 Jul 2014 07:56:08 +0000 (15:56 +0800)]
GBE: refactor the immediate class to support vector data type.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: refine post register allocation scheduling for global buffers.
Zhigang Gong [Wed, 30 Jul 2014 07:49:29 +0000 (15:49 +0800)]
GBE: refine post register allocation scheduling for global buffers.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoGBE: cleanup image base index related code.
Zhigang Gong [Wed, 30 Jul 2014 07:36:01 +0000 (15:36 +0800)]
GBE: cleanup image base index related code.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoGBE: Handle bti allocation for internal buffer used by printf.
Ruiling Song [Wed, 30 Jul 2014 05:59:30 +0000 (13:59 +0800)]
GBE: Handle bti allocation for internal buffer used by printf.

1. Move the bti/Register map from gbe::Context to ir::Function.
2. use GlobalVariable instead of 'call' to get internal buffer (used for printf) base address.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Refine bti usage in backend & runtime.
Ruiling Song [Wed, 30 Jul 2014 05:59:29 +0000 (13:59 +0800)]
GBE: Refine bti usage in backend & runtime.

Previously, we simply map 2G surface for memory access,
which has obvious security issue, user can easily read/write graphics
memory that does not belong to him. To prevent such kind of behaviour,
We bind each surface to a dedicated bti. HW provides automatic
bounds check. For out-of-bound write, it will be ignored. And for read
out-of-bound, hardware will simply return zero value.

The idea behind the patch is for a load/store instruction, it will search
through the LLVM use-def chain until finding out where the address
comes from. Then the bti is saved in ir::Instruction and used for
the later code generation. And for mixed pointer case, a load/store
will access more than one bti.

To simplify some code, '0' is reserved for constant address space,
'1' is reserved for private address space. Other btis are assigned
automatically by backend.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: set correct state for constant buffer on hsw.
Ruiling Song [Tue, 29 Jul 2014 07:41:38 +0000 (15:41 +0800)]
runtime: set correct state for constant buffer on hsw.

According to spec, should set I965_SURCHAN_SELECT_XXX on hsw.
Then we can use sampler message to read constant surface.

This fix the regression in unit test brought by:
'GBE: Optimize constant load with sampler.'

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutests: Fix a bug in image_1D_buffer.
Ruiling Song [Mon, 28 Jul 2014 01:19:30 +0000 (09:19 +0800)]
utests: Fix a bug in image_1D_buffer.

Should use buffer_sz to clCreateBuffer().

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: align the fields in union ImageInfoKey.
Ruiling Song [Mon, 28 Jul 2014 01:19:29 +0000 (09:19 +0800)]
GBE: align the fields in union ImageInfoKey.

To avoid possible garbage data.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agodelete GEPInst when it is no longer used
Guo Yejun [Thu, 24 Jul 2014 22:00:27 +0000 (06:00 +0800)]
delete GEPInst when it is no longer used

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoclean llvm resource in compiler (libgbe.so)
Guo Yejun [Thu, 17 Jul 2014 23:16:34 +0000 (07:16 +0800)]
clean llvm resource in compiler (libgbe.so)

since we have separated the compiler (libgbe.so) and the interpreter
(libgbeinterp.so), the LLVM resource cleanup task should be done in
the compiler instead of the GenProgram::~GenProgram which has no way
to clean llvm resources in libgbeinterp.so

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
10 years agofix three memory leaks
Guo Yejun [Wed, 23 Jul 2014 23:51:18 +0000 (07:51 +0800)]
fix three memory leaks

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agofree build_log when the cl program is released
Guo Yejun [Thu, 17 Jul 2014 18:38:33 +0000 (02:38 +0800)]
free build_log when the cl program is released

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoNEWS: update for 0.9.2.
Zhigang Gong [Thu, 17 Jul 2014 02:37:09 +0000 (10:37 +0800)]
NEWS: update for 0.9.2.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agodocs: add a NEWS document to point to the release notes pages.
Zhigang Gong [Thu, 17 Jul 2014 02:14:12 +0000 (10:14 +0800)]
docs: add a NEWS document to point to the release notes pages.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>