contrib/beignet.git
10 years agoGBE: improve precision of exp
Lv Meng [Wed, 18 Dec 2013 06:29:02 +0000 (14:29 +0800)]
GBE: improve precision of exp

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: we should allocate register for ExtractElement insn.
Zhigang Gong [Fri, 20 Dec 2013 01:31:04 +0000 (09:31 +0800)]
GBE: we should allocate register for ExtractElement insn.

We should allocate a register when we first visit an ExtractElement
instruction, as the value may be referenced before we visit that
instruction in the emit instruction pass.

The case that triggers this corner case is shown below.
Clang/LLVM may generate code similar to the following IR:

... (there is no definition of %7)
  br label2

label1:
  %10 = add  i32 %7, %6
  ...
  ret

label2:
  %8 = load <4 x i8> addrspace(1)* %3, align 4, !tbaa !1
  %7 = extractelement <4 x i8> %8, i32 0
  ...
  br label1

The value %7 is assigned in label2 but is referenced in label1.
In terms of control flow the IR is valid, as the reference is
executed after the assignment. But the previous implementation
did not allocate a proxy value for %7, which is the root cause of
the assert triggered when visiting the instruction
%10 = add i32 %7, %6.
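
The ordering problem can be illustrated with a minimal sketch. This is
not the GBE code: the Insn struct and both function names are
hypothetical. Registers are assigned to every defined value in a first
pass, so a later emit pass can resolve %7 even though its defining
instruction comes after its use in block layout order.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: one defined value and its operands.
struct Insn {
  std::string def;
  std::vector<std::string> uses;
};

// Pass 1: give every defined value a register before anything is emitted.
std::map<std::string, int> allocateRegisters(const std::vector<Insn>& insns) {
  std::map<std::string, int> regs;
  int next = 0;
  for (const Insn& insn : insns)
    if (!insn.def.empty() && regs.count(insn.def) == 0) regs[insn.def] = next++;
  return regs;
}

// Pass 2 (stand-in for emission): every operand must already have a register.
bool allUsesHaveRegisters(const std::vector<Insn>& insns,
                          const std::map<std::string, int>& regs) {
  for (const Insn& insn : insns)
    for (const std::string& use : insn.uses)
      if (regs.count(use) == 0) return false;
  return true;
}
```

With the instructions listed in the order label1, label2, the second
pass only succeeds because the first pass has already seen %7.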

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fix a corner case when allocate registers for local buffer.
Zhigang Gong [Tue, 17 Dec 2013 05:07:19 +0000 (13:07 +0800)]
GBE: fix a corner case when allocate registers for local buffer.

We use a simple way to find an instruction which refers to the local
data, so that we can identify the parent function. We found a corner
case where the instruction may be modified by an optimization pass,
for example the GVN pass. When all of the instruction's operands are
replaced with constants, the whole instruction appears to be a
constant as well.

If that is the case, we fail to get a valid instruction and may trigger
an assert. This patch changes the code to check another use of the
local data to avoid this assert.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Fix logb implementation.
Ruiling Song [Wed, 11 Dec 2013 06:37:46 +0000 (14:37 +0800)]
GBE: Fix logb implementation.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fix clang's "incorrect" optimization for barrier call.
Zhigang Gong [Fri, 13 Dec 2013 06:37:58 +0000 (14:37 +0800)]
GBE: fix clang's "incorrect" optimization for barrier call.

Clang may duplicate one barrier call into multiple branches, which
breaks the OpenCL spec and may cause a GPU hang. To fix this issue,
we have to implement the barrier in an LLVM module file and set
the noduplicate function attribute. We also have to link this
pre-compiled module before we compile the user kernel, so we set
the pcm lib file in the LinkBitCodeFile field of the clang
instance.

v2: fix one typo.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoAccelerate utest.
Zhigang Gong [Wed, 11 Dec 2013 05:40:51 +0000 (13:40 +0800)]
Accelerate utest.

For some test cases which include more than one kernel, the current
implementation always builds the program again for each new sub test
case.

That wastes a lot of time. This patch introduces a new macro
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM which has an extra parameter
to specify whether to keep the previous program and avoid the extra
build. The normal usage is:

MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn1, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn2, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn3, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn4, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn5, false);

The scenario is that the above fn1-5 are included in the same kernel
file and we define the sub cases in the same cpp file. We already
have some examples of this usage in compiler_abs.cpp, compiler_abs_diff.cpp,
compiler_basic_arithmetic.cpp, compiler_vector_load_store.cpp, etc.

This patch reduces the utest execution time by 2/3.
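
The effect of the keep-program flag can be sketched as follows. This is
only an illustration under assumptions: ProgramCache and runCase are
hypothetical names, and the real utest harness is not shown in this
message. A build happens only when no matching program is cached, and
the program is dropped after a case that passes false.

```cpp
#include <string>

// Hypothetical stand-in for the utest harness state; not Beignet's real code.
struct ProgramCache {
  std::string builtFrom;    // kernel file the cached program was built from
  bool hasProgram = false;  // whether a built program is currently kept
  int buildCount = 0;       // number of (expensive) builds performed

  // Run one sub test case; rebuild only when no matching program is cached.
  void runCase(const std::string& kernelFile, bool keepProgram) {
    if (!hasProgram || builtFrom != kernelFile) {
      ++buildCount;  // stands in for the real program build
      builtFrom = kernelFile;
      hasProgram = true;
    }
    if (!keepProgram) hasProgram = false;  // destroy the program afterwards
  }
};
```

Running five sub cases from one kernel file with the flags true, true,
true, true, false performs a single build instead of five.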

v2: should always destroy the program when running one specific test case.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoDisable the PCH valid check to save a lot of compiling time.
Junyan He [Wed, 11 Dec 2013 03:09:27 +0000 (11:09 +0800)]
Disable the PCH valid check to save a lot of compiling time.

In clang, the PCH file is used as an AST source, so
the check is strict. The macro definitions are also checked,
and if anything differs, the PCH is considered invalid and
the build starts from scratch.
Disable clang's PCH validity check and do the compatibility
check ourselves.

This patch does not solve the clang version problem,
because the AST representation is an internal clang
data structure and may change between two clang versions.
We will address this issue later.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Improve precision of log2
Ruiling Song [Tue, 10 Dec 2013 08:33:16 +0000 (16:33 +0800)]
GBE: Improve precision of log2

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve precision of log10
Ruiling Song [Tue, 10 Dec 2013 08:23:02 +0000 (16:23 +0800)]
GBE: Improve precision of log10

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of log/log1p
Ruiling Song [Tue, 10 Dec 2013 08:23:01 +0000 (16:23 +0800)]
GBE: improve precision of log/log1p

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: fixed the region check for three rect region related APIs.
Zhigang Gong [Tue, 3 Dec 2013 07:26:46 +0000 (15:26 +0800)]
Runtime: fixed the region check for three rect region related APIs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve asin/acos precision
Ruiling Song [Fri, 29 Nov 2013 08:03:42 +0000 (16:03 +0800)]
GBE: improve asin/acos precision

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: register width should not exceed execution width
Ruiling Song [Wed, 20 Nov 2013 05:51:32 +0000 (13:51 +0800)]
GBE: register width should not exceed execution width

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoGBE: Do not change vertical stride when it is 0
Ruiling Song [Wed, 20 Nov 2013 05:51:31 +0000 (13:51 +0800)]
GBE: Do not change vertical stride when it is 0

It will change scalar register g3<0,1,0> into g3<16,1,0> which illegally
crosses more than 2 adjacent rows.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoGBE: Fix null register to integer type
Ruiling Song [Wed, 20 Nov 2013 05:51:30 +0000 (13:51 +0800)]
GBE: Fix null register to integer type

The GEN 'mach' instruction only supports integer type registers.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoFix float to ulong/long fail.
Yang Rong [Mon, 2 Dec 2013 09:10:26 +0000 (17:10 +0800)]
Fix float to ulong/long fail.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix signed to unsigned type sat convert.
Yang Rong [Mon, 2 Dec 2013 02:47:43 +0000 (10:47 +0800)]
Fix signed to unsigned type sat convert.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRefine isnan builtin.
Yang Rong [Mon, 25 Nov 2013 07:08:09 +0000 (15:08 +0800)]
Refine isnan builtin.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd FCMP UNO support.
Yang Rong [Mon, 2 Dec 2013 04:50:13 +0000 (12:50 +0800)]
Add FCMP UNO support.

Also correct some UXX compares.
v2: Do not use OCL_OPTIMIZE_IMMEDIATE for XOR and ORD compares.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: filter the unsupported cl compile arguments out.
Zhigang Gong [Thu, 28 Nov 2013 02:54:47 +0000 (10:54 +0800)]
GBE: filter the unsupported cl compile arguments out.

As unsupported arguments may trigger unexpected compilation
errors, we just remove them from the arglist.

If clang's CL frontend later supports these arguments, we need
to revisit this.

This patch also adds a new environment variable,
OCL_OUTPUT_BUILD_LOG.
If this variable is set to 1, GBE prints the compile log to
the standard error channel (llvm::errs()). By default it is off
and GBE does not print any build log.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoWhen local_work_size is null, try to choose a local_work_size.
Yang Rong [Fri, 29 Nov 2013 02:59:59 +0000 (10:59 +0800)]
When local_work_size is null, try to choose a local_work_size.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoComplete the feature of clGetEventProfilingInfo API
Junyan He [Fri, 29 Nov 2013 02:55:54 +0000 (10:55 +0800)]
Complete the feature of clGetEventProfilingInfo API

The profiling feature is now fully supported. We use
drm_intel_reg_read to get the current GPU time
when the event is queued and submitted, and use the
PIPE_CONTROL cmd to get the GPU time at which the
kernel starts and ends.
One minor problem remains:
The GPU timer counter is 36 bits wide with a resolution of
80ns, so it wraps after 2^36 * 80ns = about 5500s, roughly an
hour and a half. Some tests may last about 2~5 min, and if one
starts shortly before the counter wraps, the wrap-around may
cause the case to fail.
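
The wrap arithmetic can be checked with a small sketch. The constants
come from the message above (36-bit counter, 80ns per tick); the
function names are hypothetical and this is not the runtime's code.
Masking the difference to 36 bits gives the right elapsed time across
one wrap, provided the measured interval is shorter than one full
counter period.

```cpp
#include <cstdint>

constexpr unsigned kTimerBits = 36;                                 // counter width
constexpr std::uint64_t kTimerMask = (1ULL << kTimerBits) - 1;
constexpr std::uint64_t kNsPerTick = 80;                            // timer resolution

// Time until the counter wraps: 2^36 ticks * 80ns, about 5497.6 seconds.
constexpr std::uint64_t wrapPeriodNs() { return (kTimerMask + 1) * kNsPerTick; }

// Elapsed nanoseconds between two raw counter reads, tolerating one wrap.
std::uint64_t elapsedNs(std::uint64_t startTicks, std::uint64_t endTicks) {
  return ((endTicks - startTicks) & kTimerMask) * kNsPerTick;
}
```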

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix B/UB compare fail.
Yang Rong [Thu, 28 Nov 2013 08:37:22 +0000 (16:37 +0800)]
Fix B/UB compare fail.

Because B/UB is treated as W/UW, src1's type cannot be set when the types mismatch.
Set the correct type before getRegisterFromImmediate.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse -O1 when -cl-opt-disable, for inline function.
Yang Rong [Thu, 28 Nov 2013 03:00:43 +0000 (11:00 +0800)]
Use -O1 when -cl-opt-disable, for inline function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRemove test cl_create_kernel.
Yang Rong [Wed, 27 Nov 2013 08:40:30 +0000 (16:40 +0800)]
Remove test cl_create_kernel.

This test only tries to allocate a buffer larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE, and
asserts if the return status is not CL_INVALID_BUFFER_SIZE. But the OpenCL spec says:
Implementations may return CL_INVALID_BUFFER_SIZE if size is greater than
CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in table 4.3 for all devices in context.

Returning CL_INVALID_BUFFER_SIZE is not mandatory, so remove the test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: implement the get build log function and fix one build error check issue.
Zhigang Gong [Tue, 26 Nov 2013 10:39:59 +0000 (18:39 +0800)]
Runtime: implement the get build log function and fix one build error check issue.

According to the spec, we need to support CL_PROGRAM_BUILD_LOG, which is
used to get the build log of a CL kernel. We also need to check
whether a build failure is a generic build failure or a build option
error. This commit also fixes the piglit case:
API/clBuildProgram.

Another change in this commit is that it reroutes all the output
of the clang execution to an internal buffer instead of printing to the
console directly. If the user wants the detailed build log,
CL_PROGRAM_BUILD_LOG can be used.

v2: include both the clang error messages and the llvm-to-gen error
messages. Also refine the checking of the error buffer parameter.
If no error buffer is specified, always flush the build log
to llvm::errs().

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoCL/Runtime: workaround the unused sampler_t kernel argument.
Zhigang Gong [Fri, 22 Nov 2013 06:09:28 +0000 (14:09 +0800)]
CL/Runtime: workaround the unused sampler_t kernel argument.

The current implementation uses a normal integer to represent
a sampler_t; later, when the sampler is used in read_image
or get_sampler_info, the backend fixes up its type to SAMPLER.

But some test cases in piglit define a sampler_t kernel argument
with an empty kernel body. Then we have no chance to fix up
the kernel argument type to sampler, and we fail on the runtime side.

To work around this issue, we change sampler_t to the short type.
Then when the user calls clSetKernelArg to set a sampler, it passes
a pointer-sized value for an argument of short type. This fails
the size checking logic, and we fix up the type to sampler there.

As this workaround only takes effect when the error occurs, it does
not have much side effect on the normal cases, and it
passes the existing test cases.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fix some piglit failures.
Zhigang Gong [Thu, 21 Nov 2013 09:04:54 +0000 (17:04 +0800)]
Runtime: fix some piglit failures.

compiler_available should be true. And when a program is retained, we should
not call build on it again.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fixed one missing case for clGetKernelWorkGroupInfo.
Zhigang Gong [Wed, 20 Nov 2013 09:53:34 +0000 (17:53 +0800)]
Runtime: fixed one missing case for clGetKernelWorkGroupInfo.

CL_KERNEL_PRIVATE_MEM_SIZE was not implemented; this patch fixes
this issue and passes the piglit test case.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fixed parameter error checking in cl create buffer.
Zhigang Gong [Wed, 20 Nov 2013 07:51:41 +0000 (15:51 +0800)]
Runtime: fixed parameter error checking in cl create buffer.

This patch can pass piglit test case cl-api-create-buffer.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd the drm include and lib path for find when drm is not the system one.
Junyan He [Tue, 26 Nov 2013 09:59:54 +0000 (17:59 +0800)]
Add the drm include and lib path for find when drm is not the system one.

Add support for a DRM lib that is not in the standard system location.
In some cases, we want to debug libdrm without affecting the
whole system.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoEnlarge the global mem size.
Yang Rong [Wed, 27 Nov 2013 06:06:50 +0000 (14:06 +0800)]
Enlarge the global mem size.

When creating an image, alignment may cause the size to be larger than the max alloc size.
Enlarge the global memory size and use it to check the size on allocation.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix some get image info errors.
Yang Rong [Wed, 27 Nov 2013 06:06:51 +0000 (14:06 +0800)]
Fix some get image info errors.

Get the correct GRF offset; the image set offsets also need to be cleared.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a build problem when the llvm version has the fix version digit.
Zhigang Gong [Wed, 27 Nov 2013 02:01:56 +0000 (10:01 +0800)]
Fix a build problem when the llvm version has the fix version digit.

If the llvm version is something like 3.3.1, the previous cmake script
generates incorrect cflags such as -DLLVM_33 1, which breaks the build.
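
The intended result can be sketched in C++ rather than CMake. The
helper name llvmVersionDefine is hypothetical and this is not the
project's build script; it only shows that the define should be built
from the major and minor components, ignoring any third digit.

```cpp
#include <string>

// Build a define such as "-DLLVM_33" from a version string, ignoring any
// third (patch) component. Hypothetical helper, not the project's CMake logic.
std::string llvmVersionDefine(const std::string& version) {
  const std::string::size_type firstDot = version.find('.');
  if (firstDot == std::string::npos) return "-DLLVM_" + version;
  const std::string::size_type secondDot = version.find('.', firstDot + 1);
  const std::string major = version.substr(0, firstDot);
  const std::string minor =
      (secondDot == std::string::npos)
          ? version.substr(firstDot + 1)
          : version.substr(firstDot + 1, secondDot - firstDot - 1);
  return "-DLLVM_" + major + minor;
}
```

Both "3.3" and "3.3.1" then map to the same single-token define.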

This commit also updates the stable llvm version from 3.1 to 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd vload_half and vstore_half build in.
Yang Rong [Fri, 22 Nov 2013 11:51:57 +0000 (19:51 +0800)]
Add vload_half and vstore_half build in.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd convert between fp16 and fp32.
Yang Rong [Fri, 22 Nov 2013 11:51:56 +0000 (19:51 +0800)]
Add convert between fp16 and fp32.

Use convert instruction in ir, and ALU1 in gen selection.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a compare immediate optimize error.
Yang Rong [Fri, 15 Nov 2013 03:40:30 +0000 (11:40 +0800)]
Fix a compare immediate optimize error.

When doing the LOADI/compare -> compare optimization, the IMM src1 used the LOADI type,
but LOADI does not care about signedness. The compare type should be used instead.
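
Why the signedness of the compare matters can be shown with a small
sketch. The function lessThanImm is hypothetical and unrelated to the
real selection code: the same 32-bit immediate pattern gives different
results depending on whether the compare is signed or unsigned.

```cpp
#include <cstdint>

// Evaluate x < imm on raw 32-bit patterns. The compare's own signedness,
// not the type the immediate was loaded with, decides the interpretation.
// Note: the unsigned-to-signed casts rely on two's complement wrapping,
// which is guaranteed since C++20 and is what mainstream compilers do.
bool lessThanImm(std::uint32_t xBits, std::uint32_t immBits, bool signedCompare) {
  if (signedCompare)
    return static_cast<std::int32_t>(xBits) < static_cast<std::int32_t>(immBits);
  return xBits < immBits;
}
```

For the immediate 0xFFFFFFFF, 1 < imm is true as an unsigned compare
but false as a signed compare, where the pattern means -1.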

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoAdd FCmpInst ord support.
Yang Rong [Fri, 15 Nov 2013 05:45:53 +0000 (13:45 +0800)]
Add FCmpInst ord support.

Because Gen does not support an ordered compare directly, use (src0 == src0) && (src1 == src1).
BTW: can't use !unordered.
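
The identity behind this can be sketched on the host side. The function
names are hypothetical; the sketch relies only on the IEEE 754 rule
that NaN compares unequal to itself, and it assumes the compiler is not
run with fast-math style options that drop NaN handling.

```cpp
#include <cmath>

// Ordered: neither operand is NaN. A value equals itself unless it is NaN.
bool isOrdered(float a, float b) { return (a == a) && (b == b); }

// Unordered: at least one operand is NaN.
bool isUnordered(float a, float b) { return (a != a) || (b != b); }
```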

v2: Refine, don't need AND.
v3: Do not change getGenCompare function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "round"
Homer Hsing [Wed, 13 Nov 2013 08:49:16 +0000 (16:49 +0800)]
fix builtin function "round"

the previous implementation used round to even, so the result was wrong.
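
the difference shows up on halfway cases. A minimal host-side sketch,
using the standard library rather than the Gen builtins (the wrapper
names are hypothetical): round() rounds halfway cases away from zero,
while round to even picks the even neighbour.

```cpp
#include <cmath>

// round(): halfway cases go away from zero, e.g. 2.5 -> 3.
float roundHalfAway(float x) { return std::round(x); }

// Round to nearest even, e.g. 2.5 -> 2. std::nearbyint follows the current
// rounding mode, which is round-to-nearest-even by default.
float roundToEven(float x) { return std::nearbyint(x); }
```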

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoimprove builtin function "rint"
Homer Hsing [Wed, 13 Nov 2013 08:49:17 +0000 (16:49 +0800)]
improve builtin function "rint"

directly use __gen_ocl_rnde

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "isnormal"
Homer Hsing [Fri, 8 Nov 2013 05:27:30 +0000 (13:27 +0800)]
fix builtin function "isnormal"

fix a corner case of very small input

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoput a mutex around gbe_program_new_from_llvm
Homer Hsing [Tue, 5 Nov 2013 05:28:13 +0000 (13:28 +0800)]
put a mutex around gbe_program_new_from_llvm

because random crashes happen without the mutex

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix ASR operator for 64bit integer
Homer Hsing [Mon, 4 Nov 2013 02:13:30 +0000 (10:13 +0800)]
fix ASR operator for 64bit integer

if the operand is positive, pad the high 32 bits with zeros.
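
a sketch of a 64-bit arithmetic shift right built from 32-bit halves,
under the assumption that this mirrors the idea rather than the actual
Gen instruction sequence (asr64 is a hypothetical name). The fill
pattern for the vacated high bits is zero for a positive operand and
all ones for a negative one.

```cpp
#include <cstdint>

// Arithmetic shift right of a 64-bit value using 32-bit halves.
// shift must be in [0, 63]. The final cast assumes two's complement.
std::int64_t asr64(std::int64_t value, unsigned shift) {
  const std::uint64_t bits = static_cast<std::uint64_t>(value);
  const std::uint32_t lo = static_cast<std::uint32_t>(bits);
  const std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
  // Fill pattern: zeros for a positive operand, ones for a negative one.
  const std::uint32_t fill = (hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  std::uint32_t outLo = lo;
  std::uint32_t outHi = hi;
  if (shift == 0) {
    // nothing to do
  } else if (shift < 32) {
    outLo = (lo >> shift) | (hi << (32 - shift));
    outHi = (hi >> shift) | (fill << (32 - shift));
  } else if (shift == 32) {
    outLo = hi;
    outHi = fill;
  } else {  // 33..63
    const unsigned s = shift - 32;
    outLo = (hi >> s) | (fill << (32 - s));
    outHi = fill;
  }
  return static_cast<std::int64_t>((static_cast<std::uint64_t>(outHi) << 32) | outLo);
}
```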

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRemove boolean values cannot cross their definition basic block restrict.
Yang Rong [Thu, 14 Nov 2013 03:14:33 +0000 (11:14 +0800)]
Remove boolean values cannot cross their definition basic block restrict.

Add mov bool support.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "ilogb"
Homer Hsing [Tue, 5 Nov 2013 07:48:20 +0000 (15:48 +0800)]
fix builtin function "ilogb"

add FP_ILOGB0, FP_ILOGBNAN
return FP_ILOGB0 for zero.
return FP_ILOGBNAN for nan.
return INT_MAX for inf.
also improve function code for other cases.
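
the special cases listed above can be sketched on the host side. This is
an illustration only, not the builtin's code; ilogbSketch is a
hypothetical name and the normal path simply reuses std::frexp.

```cpp
#include <climits>
#include <cmath>

// ilogb with the special cases handled first.
int ilogbSketch(float x) {
  if (x == 0.0f) return FP_ILOGB0;         // zero (either sign)
  if (std::isnan(x)) return FP_ILOGBNAN;   // nan
  if (std::isinf(x)) return INT_MAX;       // inf (either sign)
  int exp = 0;
  std::frexp(x, &exp);  // x = m * 2^exp with 0.5 <= |m| < 1
  return exp - 1;       // unbiased exponent of x
}
```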

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix builtin function "ldexp"
Homer Hsing [Tue, 12 Nov 2013 08:38:26 +0000 (16:38 +0800)]
fix builtin function "ldexp"

fixed corner cases when input parameter has special value

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "nextafter"
Homer Hsing [Tue, 12 Nov 2013 06:36:52 +0000 (14:36 +0800)]
fix builtin function "nextafter"

fix for some corner cases

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "fdim"
Homer Hsing [Tue, 12 Nov 2013 05:12:35 +0000 (13:12 +0800)]
fix builtin function "fdim"

check whether input is NaN. fix the code if input is inf
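
a host-side sketch of these cases, assuming the usual definition of fdim
(x - y if x > y, otherwise +0); fdimSketch is a hypothetical name, not
the builtin's code. Testing x > y first avoids computing inf - inf,
which would yield nan.

```cpp
#include <cmath>
#include <limits>

// fdim with explicit NaN handling; inf inputs never reach "inf - inf".
float fdimSketch(float x, float y) {
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<float>::quiet_NaN();
  return (x > y) ? (x - y) : 0.0f;
}
```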

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: fix the constant data allocation.
Zhigang Gong [Tue, 12 Nov 2013 14:53:24 +0000 (22:53 +0800)]
GBE: fix the constant data allocation.

We need to keep the constant data allocation consistent
with the constant register allocation,
so we need to skip the unused constant data at the
constant data allocation stage.

To avoid a possible mismatch, add a new assert in
the constant register (address) allocation stage to
make sure the address register matches the exact constant
data.

Also modify the constant utest slightly to hit this
code path.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoGBE: remove all vstore macros for constant memory space.
Zhigang Gong [Tue, 12 Nov 2013 10:51:01 +0000 (18:51 +0800)]
GBE: remove all vstore macros for constant memory space.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoAdd bitcast support between vector and scalar type.
Yang Rong [Tue, 12 Nov 2013 09:17:14 +0000 (17:17 +0800)]
Add bitcast support between vector and scalar type.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd other unsigned integer types as mask type of shuffle and shuffle2.
Yang Rong [Tue, 12 Nov 2013 09:17:13 +0000 (17:17 +0800)]
Add other unsigned integer types as mask type of shuffle and shuffle2.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Add support for kernel attribute reqd_work_group_size.
Zhigang Gong [Tue, 12 Nov 2013 06:34:47 +0000 (14:34 +0800)]
GBE: Add support for kernel attribute reqd_work_group_size.

When a kernel has __attribute__((reqd_work_group_size(X, Y, Z))) qualifier,
the kernel will only accept that group size.

v2: add binary load/store support.
v3: fix the MDNode parsing according to spir spec. It's using the following
structure rather than a tbaa tree.

!spir.functions = !{!0,!1,...,!N}
; Note: The first element is always an LLVM::Function signature
!0 = metadata !{< function signature >, !01, !02, ..., , !0i}
!1 = metadata !{< function signature >, !11, !12, ..., , !1j}
...
!N = metadata !{< function signature >, !N1, !N2, ..., , !Nk}

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: handle half type size
Ruiling Song [Tue, 12 Nov 2013 01:10:01 +0000 (09:10 +0800)]
GBE: handle half type size

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
10 years agoRuntime: complete the api clGetKernelWorkGroupInfo.
Zhigang Gong [Mon, 11 Nov 2013 08:20:26 +0000 (16:20 +0800)]
Runtime: complete the api clGetKernelWorkGroupInfo.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoignore a clang unsupported building option
Homer Hsing [Mon, 11 Nov 2013 02:45:42 +0000 (10:45 +0800)]
ignore a clang unsupported building option

IVB does not support float denorm value.
So the building option "-cl-denorms-are-zero" can be safely ignored.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agogbe_bin_generator: should not use append option when create new binary.
Zhigang Gong [Mon, 11 Nov 2013 02:33:28 +0000 (10:33 +0800)]
gbe_bin_generator: should not use append option when create new binary.

We should use the trunc option rather than app when we need to create a new
binary file.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
10 years agoFixup the problem of CL_PROGRAM_BINARIES in clGetProgramInfo API
Junyan He [Fri, 8 Nov 2013 10:41:59 +0000 (18:41 +0800)]
Fixup the problem of CL_PROGRAM_BINARIES in clGetProgramInfo API

Using CL_PROGRAM_BINARIES with clGetProgramInfo to get the binary
returns the wrong result because the binary returned is not the serialized one.
Add the serialization there to fix this bug.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "fmax"
Homer Hsing [Fri, 8 Nov 2013 07:35:41 +0000 (15:35 +0800)]
fix builtin function "fmax"

if one parameter is nan, return the other parameter.
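
a minimal host-side sketch of that rule; fmaxSketch is a hypothetical
name and this is not the builtin's code.

```cpp
#include <cmath>

// If exactly one argument is nan, return the other one.
// If both are nan, the result is nan (b is returned).
float fmaxSketch(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return (a > b) ? a : b;
}
```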

v2: no need to test nan for integer
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Fix alignment for private variables
Ruiling Song [Fri, 8 Nov 2013 03:20:09 +0000 (11:20 +0800)]
GBE: Fix alignment for private variables

Private variables allocated on the stack should be aligned according to OCL spec.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Fix alignment according to OCL spec
Ruiling Song [Fri, 8 Nov 2013 03:16:47 +0000 (11:16 +0800)]
GBE: Fix alignment according to OCL spec

The patch simply stores an 'align' for each kernel argument.
Then the runtime can align the kernel argument address to 'align'.
This patch works for the constant and local address spaces.
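
Aligning an address to a stored 'align' is typically done by rounding up
to the next multiple. A minimal sketch, assuming 'align' is a power of
two; alignUp is a hypothetical helper, not the runtime's function.

```cpp
#include <cstdint>

// Round offset up to the next multiple of align (align must be a power of two).
std::uint32_t alignUp(std::uint32_t offset, std::uint32_t align) {
  return (offset + align - 1) & ~(align - 1);
}
```

For example, an argument with align 16 placed after 13 bytes of data
would start at offset 16.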

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Remove max_limit for struct alignment
Ruiling Song [Fri, 8 Nov 2013 03:12:44 +0000 (11:12 +0800)]
GBE: Remove max_limit for struct alignment

a struct may have a vector field (like int8/16), so max_limit is meaningless.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agorelease context in runtime_createcontextfromtype
Homer Hsing [Fri, 8 Nov 2013 02:57:42 +0000 (10:57 +0800)]
release context in runtime_createcontextfromtype

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoMove the gpgpu struct from cl_command_queue to thread specific context
Junyan He [Thu, 7 Nov 2013 16:58:00 +0000 (00:58 +0800)]
Move the gpgpu struct from cl_command_queue to thread specific context

We found that some cases use multiple threads to run the same kernel
on the same queue. This can destroy the gpgpu struct, which
is very important for the GPU context setup, because we
do not implement any synchronization to protect it yet.
Moving the gpgpu struct into thread-specific space fixes this problem
because libdrm does the GPU command serialization for us.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Zou, Nanhai" <nanhai.zou@intel.com>
10 years agoAdd the clGetMemObjectInfo options for sub-buffer and update the utest case
Junyan He [Thu, 7 Nov 2013 08:44:53 +0000 (16:44 +0800)]
Add the clGetMemObjectInfo options for sub-buffer and update the utest case

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the test case for sub buffer check
Junyan He [Thu, 7 Nov 2013 08:44:46 +0000 (16:44 +0800)]
Add the test case for sub buffer check

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoImplement the clCreateSubBuffer API
Junyan He [Thu, 7 Nov 2013 08:44:39 +0000 (16:44 +0800)]
Implement the clCreateSubBuffer API

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the bo's internal offset support when do drm_intel_bo_emit_reloc
Junyan He [Thu, 7 Nov 2013 08:44:31 +0000 (16:44 +0800)]
Add the bo's internal offset support when do drm_intel_bo_emit_reloc

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fix a 64bit scalar register issue.
Ruiling Song [Thu, 7 Nov 2013 07:13:13 +0000 (15:13 +0800)]
GBE: fix a 64bit scalar register issue.

For a scalar register, stride 0 should be used.
Also change the unit test to hit this case.

v2: fix h2()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoimprove multithread calling of llvm
Homer Hsing [Thu, 7 Nov 2013 07:32:56 +0000 (15:32 +0800)]
improve multithread calling of llvm

call llvm multithread function instead of using a semaphore.
also exit llvm multithread mode at the end of life.

v2: not call llvm::shutdown() if llvm is older than 3.4
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "fract"
Homer Hsing [Thu, 7 Nov 2013 06:55:23 +0000 (14:55 +0800)]
fix builtin function "fract"

v2: return nan for nan, +zero for +inf, -zero for -inf.
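
a host-side sketch of these special cases, assuming the usual OpenCL
definition of fract as min(x - floor(x), largest float below 1);
fractSketch is a hypothetical name and the builtin's second output
(the floor value) is left out.

```cpp
#include <cmath>

// fract with the special cases from the v2 note handled first.
float fractSketch(float x) {
  if (std::isnan(x)) return x;                        // nan -> nan
  if (std::isinf(x)) return std::copysign(0.0f, x);   // +inf -> +0, -inf -> -0
  const float belowOne = std::nextafter(1.0f, 0.0f);  // largest float < 1
  return std::fmin(x - std::floor(x), belowOne);
}
```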
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix builtin function "copysign"
Homer Hsing [Thu, 7 Nov 2013 06:31:36 +0000 (14:31 +0800)]
fix builtin function "copysign"

using better algorithm

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix builtin function 'frexp'
Homer Hsing [Wed, 6 Nov 2013 01:04:53 +0000 (09:04 +0800)]
fix builtin function 'frexp'

returns correct value for nan or inf.
also returns correct value for very small float value.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agorelease previous program in cl_kernel_init
Homer Hsing [Tue, 5 Nov 2013 05:08:15 +0000 (13:08 +0800)]
release previous program in cl_kernel_init

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agorelease previous kernel in cl_kernel_init
Homer Hsing [Mon, 4 Nov 2013 08:29:02 +0000 (16:29 +0800)]
release previous kernel in cl_kernel_init

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRuntime: fix some max/alignment values.
Zhigang Gong [Thu, 7 Nov 2013 01:47:11 +0000 (09:47 +0800)]
Runtime: fix some max/alignment values.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRuntime: fix one bug in clGetProgramInfo.
Zhigang Gong [Thu, 7 Nov 2013 01:44:20 +0000 (09:44 +0800)]
Runtime: fix one bug in clGetProgramInfo.

The CL_PROGRAM_BINARIES query forgot to return the param value size.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: Don't modify argument 0 of the get image information instruction.
Zhigang Gong [Thu, 7 Nov 2013 01:31:37 +0000 (09:31 +0800)]
GBE: Don't modify argument 0 of the get image information instruction.

When the first round of compilation fails, GBE recompiles the
sample program using another profile. If we changed argument 0
of the get image information instruction, the second round of
compilation would fail. Argument 1, however, is OK to change: we never
change the first instruction's argument, and argument 1 of all
subsequent instructions is free to change.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRuntime: fix the length of properties.
Zhigang Gong [Thu, 7 Nov 2013 01:29:20 +0000 (09:29 +0800)]
Runtime: fix the length of properties.

The last zero should also be counted.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
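
A minimal sketch of the counting rule in the properties fix above, assuming a zero-terminated list of (name, value) pairs stored as intptr_t, which is how OpenCL property lists are laid out. The function name prop_list_len is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the entries of a zero-terminated (name, value) property list,
 * including the terminating 0. Returns 0 for a NULL list. */
static size_t prop_list_len(const intptr_t *props)
{
  size_t n = 0;
  if (props == NULL)
    return 0;
  while (props[n] != 0)
    n += 2;          /* skip one name and its value */
  return n + 1;      /* count the terminating zero as well */
}
```

A list holding one pair therefore has length 3, not 2, so copying it with this length keeps the terminator.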
10 years agoRuntime: implement clGetSamplerInfo.
Zhigang Gong [Wed, 6 Nov 2013 04:49:30 +0000 (12:49 +0800)]
Runtime: implement clGetSamplerInfo.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRuntime: fix some max values.
Zhigang Gong [Tue, 5 Nov 2013 05:56:06 +0000 (13:56 +0800)]
Runtime: fix some max values.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRuntime: fix the incorrect device info string size.
Zhigang Gong [Tue, 5 Nov 2013 05:34:54 +0000 (13:34 +0800)]
Runtime: fix the incorrect device info string size.

sizeof(str) already includes the '\0', so we don't need to add
1 to it.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
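
A small illustration of the sizeof point in the device info fix above, assuming the string is declared as a char array (for a char pointer, sizeof would give the pointer size instead). The string content and the function name are placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder device string; the real strings live in the runtime. */
static const char device_name[] = "Example Device";

/* Size to report for the string: sizeof already counts the trailing '\0'. */
static size_t device_name_size(void)
{
  return sizeof(device_name);
}
```

Here the reported size equals strlen(device_name) + 1, so adding another 1 would over-report by one byte.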
10 years agoGBE: disable MulAdd pattern in instruction selection temporarily.
Ruiling Song [Fri, 1 Nov 2013 06:16:08 +0000 (14:16 +0800)]
GBE: disable MulAdd pattern in instruction selection temporarily.

The story starts from 'FP_CONTRACT'. The C99 spec describes a contracted
expression as:
"A floating expression may be contracted, that is, evaluated as though it
were an atomic operation, thereby omitting rounding errors implied by the
source code and the expression evaluation method."

But the user can use 'pragma FP_CONTRACT OFF' to disable floating-point
contraction, in which case we should not perform contractions such as the
mad optimization. In SPIR 1.2, the named metadata 'opencl.enable.FP_CONTRACT'
will be used for this. When Clang is ready, we need to refine the backend logic.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
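
A minimal sketch of why contraction changes results, as discussed in the MulAdd commit above. It does not use the GPU mad instruction: the contracted form is emulated in double precision, relying on the fact that the product of two floats is exact in double. Both function names are hypothetical.

```c
/* Separate multiply and add: the product is rounded to float first. */
static float mul_then_add(float a, float b, float c)
{
  volatile float p = a * b;   /* volatile forces the intermediate rounding */
  return p + c;
}

/* Contracted form, emulated in double: the product of two floats is exact
 * in double, so it is not rounded to float before the add. */
static float contracted(float a, float b, float c)
{
  return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the separate form loses the 2^-24 term when the product is rounded and returns 0, while the contracted form returns 2^-24. A test that expects one of the two results will fail if the backend silently picks the other.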
10 years agoutests: add test case for structure argument
Lu Guanqun [Tue, 5 Nov 2013 05:55:27 +0000 (13:55 +0800)]
utests: add test case for structure argument

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix the error that structure would be pushed twice
Yang Rong [Tue, 5 Nov 2013 05:55:23 +0000 (13:55 +0800)]
fix the error that structure would be pushed twice

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
10 years agoGBE: use ISA mad for mad() builtin function.
Ruiling Song [Tue, 5 Nov 2013 08:37:13 +0000 (16:37 +0800)]
GBE: use ISA mad for mad() builtin function.

Directly map mad() to the ISA mad instruction, so mad() will have better
performance and less precision loss.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutests: use mad which will get better precision.
Ruiling Song [Tue, 5 Nov 2013 08:37:12 +0000 (16:37 +0800)]
utests: use mad which will get better precision.

Normal mul/add could not meet the precision requirement of this case.
Previously it passed because we did the mad optimization in the backend.
Use mad directly, so the test case does not depend on backend optimization.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd a necessary include path for building with mesa.
Zhigang Gong [Mon, 4 Nov 2013 07:05:41 +0000 (15:05 +0800)]
Add a necessary include path for building with mesa.

Reported by Lv Meng.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agofix operators for 64 bit integer
Homer Hsing [Mon, 4 Nov 2013 01:39:33 +0000 (09:39 +0800)]
fix operators for 64 bit integer

if operand is signed 64 bit integer, emit -1 for SExt casting

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix pointer bugs in linked list
Homer Hsing [Fri, 1 Nov 2013 05:53:47 +0000 (13:53 +0800)]
fix pointer bugs in linked list

change the head of the linked list if the head was deleted

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix ill-coded utest_run::main
Homer Hsing [Thu, 31 Oct 2013 08:36:32 +0000 (16:36 +0800)]
fix ill-coded utest_run::main

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd same type converting
Homer Hsing [Thu, 31 Oct 2013 03:12:54 +0000 (11:12 +0800)]
add same type converting

converting a data type to same type ...

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoruntime: Fix a dangling pointer issue
Ruiling Song [Thu, 31 Oct 2013 03:01:21 +0000 (11:01 +0800)]
runtime: Fix a dangling pointer issue

ctx->events points to the head of 'event list' under the ctx.
When deleting an event from the list, we should also update
the head pointer besides updating its neighbour's next & prev.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
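
A minimal sketch of the head update described in the dangling pointer fix above, assuming a plain doubly linked list. The struct layouts and the function name remove_event are hypothetical stand-ins for the runtime's event and context types.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's event and context types. */
struct event {
  struct event *prev, *next;
};

struct context {
  struct event *events;   /* head of the event list */
};

/* Unlinks `e` from `ctx`'s list, moving the head when `e` is the head. */
static void remove_event(struct context *ctx, struct event *e)
{
  if (e->prev != NULL)
    e->prev->next = e->next;
  if (e->next != NULL)
    e->next->prev = e->prev;
  if (ctx->events == e)          /* without this, the head would dangle */
    ctx->events = e->next;
  e->prev = e->next = NULL;
}
```

Removing the first element leaves ctx->events pointing at the second one, and removing the last remaining element leaves it NULL.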
10 years agoGBE: fixed one bug for vector relational builtin functions.
Zhigang Gong [Tue, 29 Oct 2013 07:02:15 +0000 (15:02 +0800)]
GBE: fixed one bug for vector relational builtin functions.

For most vector relational builtin functions, we need to
return -1 if the element result is true, and 0 if the element
result is false. So we can simply put a - in front of the scalar
version of the function for each element.

Reported by Yang Rong.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
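
A minimal sketch of the negation trick from the vector relational fix above, written in plain C with a 4-element array standing in for an OpenCL vector type. The function names are hypothetical.

```c
/* Scalar-style comparison: 1 when true, 0 when false. */
static int isgreater_scalar(float x, float y)
{
  return x > y;
}

/* Element-wise version for a 4-wide vector: -1 when true, 0 when false. */
static void isgreater_vec4(const float x[4], const float y[4], int out[4])
{
  for (int i = 0; i < 4; ++i)
    out[i] = -isgreater_scalar(x[i], y[i]);
}
```

Negating the scalar result maps 1 to -1 (all bits set) and leaves 0 unchanged, which is the per-element convention for vector relational results.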
10 years agofix built-in function "normalize"
Homer Hsing [Tue, 29 Oct 2013 03:12:46 +0000 (11:12 +0800)]
fix built-in function "normalize"

divide the parameter by its length

ver 2: scalar typed function returns NaN if parameter is NaN.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix built-in function "fast_normalize"
Homer Hsing [Mon, 28 Oct 2013 01:02:33 +0000 (09:02 +0800)]
fix built-in function "fast_normalize"

if the parameter is zero, then return zero
if the parameter is positive, then return 1.
for other cases, return -1.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Lu, Guanqun" <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Give a zero-initialized register for Undef value.
Ruiling Song [Tue, 29 Oct 2013 05:57:51 +0000 (13:57 +0800)]
GBE: Give a zero-initialized register for Undef value.

For instructions that reference an undef value, we simply
allocate a register for the undef operand and set it to 0.

v2:
Handle float and double types. Also fix some typos about the double type.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoRefine the build option checking.
Yang Rong [Tue, 29 Oct 2013 05:59:41 +0000 (13:59 +0800)]
Refine the build option checking.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoFix a event segment fault.
Yang Rong [Tue, 29 Oct 2013 05:39:35 +0000 (13:39 +0800)]
Fix a event segment fault.

If the event type is CL_COMMAND_USER, event->queue is NULL, which causes a
segmentation fault. Change the order to fix it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
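
A minimal sketch of the ordering issue in the event fix above. It assumes the fix amounts to testing the event type before dereferencing event->queue, which the commit message suggests but does not spell out. All types, constants and names here are hypothetical.

```c
#include <stddef.h>

enum { COMMAND_USER = 1, COMMAND_KERNEL = 2 };  /* placeholder values */

struct queue_sketch {
  int flushed;
};

struct event_sketch {
  int type;
  struct queue_sketch *queue;   /* NULL for user events */
};

/* Returns 1 if the event's queue was flushed, 0 if there was nothing to do. */
static int flush_event_queue(struct event_sketch *e)
{
  if (e->type == COMMAND_USER)   /* check the type first ... */
    return 0;
  e->queue->flushed = 1;         /* ... and only then touch the queue */
  return 1;
}
```

With the two steps in the opposite order, a user event would dereference a NULL queue pointer.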
10 years agoGBE: enable bitselect vector builtin functions.
Zhigang Gong [Tue, 29 Oct 2013 05:31:09 +0000 (13:31 +0800)]
GBE: enable bitselect vector builtin functions.

Now we have the scalar version of bitselect, so we
enable the vector version in the def file. Also remove
some comments.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>