contrib/beignet.git
10 years agoAdd -cl-fast-relaxed-math into incompatible opts and fix the PreprocessorOptions bug
Junyan He [Wed, 15 Jan 2014 07:34:12 +0000 (15:34 +0800)]
Add -cl-fast-relaxed-math into incompatible opts and fix the PreprocessorOptions bug

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRefine the method to find pch and pcm files.
Zhigang Gong [Thu, 9 Jan 2014 09:36:37 +0000 (17:36 +0800)]
Refine the method to find pch and pcm files.

When compiling user kernels, we need to find the precompiled header
file and the precompiled module file. The previous implementation
searched the build directory first and then the system directory.

This is not elegant once beignet is packaged for a distro, where there
is no need to search the build directory. So I changed the default
search path to the system directory only. For developers, I changed
the build script to set a proper environment variable so that the
gbe bin generator and the utests find the local pch and pcm files
first.

The only visible change comes after the build: before running the
utests, the user needs to set up the environment first. Just
invoke

. utest/setenv.sh

Then everything should work as before. setenv.sh also sets
OCL_KERNEL_PATH, so you don't need to set it manually any more.

This patch also updates the document.
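
A minimal sketch of the lookup order described above, written in Python rather than the C++ of the actual patch. The variable name OCL_PCH_PATH, the file name and the directories are placeholders chosen for illustration; none of them is taken from this commit message.

```python
import posixpath

def find_pch(env, system_dir, name, exists):
    """Return the pch path to use, or None if nothing is found.

    A developer override from the environment (hypothetical variable
    OCL_PCH_PATH, as something like utest/setenv.sh might export) wins.
    Otherwise only the system directory is searched, never the build tree.
    """
    override = env.get("OCL_PCH_PATH")
    if override and exists(override):
        return override
    candidate = posixpath.join(system_dir, name)
    return candidate if exists(candidate) else None

# Illustrative run against a fake file system (all paths are hypothetical).
fake_fs = {"/build/ocl_stdlib.h.pch", "/usr/lib/beignet/ocl_stdlib.h.pch"}
dev = find_pch({"OCL_PCH_PATH": "/build/ocl_stdlib.h.pch"},
               "/usr/lib/beignet", "ocl_stdlib.h.pch", fake_fs.__contains__)
user = find_pch({}, "/usr/lib/beignet", "ocl_stdlib.h.pch", fake_fs.__contains__)
```

With the override set, the developer copy is picked; without it, only the installed copy is considered.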

v2:
add the missing setenv.sh.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: enable relocatable pch files.
Zhigang Gong [Thu, 9 Jan 2014 06:20:29 +0000 (14:20 +0800)]
GBE: enable relocatable pch files.

By default, when a pch file is included, clang needs to verify that
the original header file is untouched. This is impossible when
we want to distribute a pch file to a new system, so we need to
use the relocatable pch feature provided by clang.
We now create two pch files. One is a relocatable pch file that
is installed to the system directory. The other is a local
pch file that is used at build time. We need both pch
files because at build time we don't have an ocl_stdlib.h
in the system directory. The local pch file is used only for beignet's
build and the utests. All other applications will use
the installed pch/pcm files.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoCL: prepare to support ICD if the system has ocl-icd..
Zhigang Gong [Wed, 8 Jan 2014 10:57:31 +0000 (18:57 +0800)]
CL: prepare to support ICD if the system has ocl-icd..

v2:
Only install the intel-beignet.icd if the system has ocl-icd
support.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoCL: back port ICD support to 1.1 branch.
Zhigang Gong [Wed, 8 Jan 2014 11:10:53 +0000 (19:10 +0800)]
CL: back port ICD support to 1.1 branch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: fixed a long related bug.
Zhigang Gong [Fri, 10 Jan 2014 09:49:12 +0000 (17:49 +0800)]
GBE: fixed a long related bug.

We need to consider the situation that the 64 bit virtual register
is crossing two GRFs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRevert faulty pushed patchset
Zhigang Gong [Tue, 14 Jan 2014 01:33:00 +0000 (09:33 +0800)]
Revert faulty pushed patchset

This reverts:
Revert "GBE: fixed a long related bug."
Revert "Refine the method to find pch and pcm files."
Revert "GBE: enable relocatable pch files."
Revert "CL: prepare to support ICD if the system has ocl-icd.."
Revert "CL: back port ICD support to 1.1 branch."

The above patches were merged by accident without review comments and
are broken. Now revert them.

10 years agoGBE: fixed a long related bug.
Zhigang Gong [Fri, 10 Jan 2014 09:49:12 +0000 (17:49 +0800)]
GBE: fixed a long related bug.

We need to consider the situation that the 64 bit virtual register
is crossing two GRFs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRefine the method to find pch and pcm files.
Zhigang Gong [Thu, 9 Jan 2014 09:36:37 +0000 (17:36 +0800)]
Refine the method to find pch and pcm files.

When compiling user kernels, we need to find the precompiled header
file and the precompiled module file. The previous implementation
searched the build directory first and then the system directory.

This is not elegant once beignet is packaged for a distro, where there
is no need to search the build directory. So I changed the default
search path to the system directory only. For developers, I changed
the build script to set a proper environment variable so that the
gbe bin generator and the utests find the local pch and pcm files
first.

The only visible change comes after the build: before running the
utests, the user needs to set up the environment first. Just
invoke

. utest/setenv.sh

Then everything should work as before. setenv.sh also sets
OCL_KERNEL_PATH, so you don't need to set it manually any more.

This patch also updates the document.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: enable relocatable pch files.
Zhigang Gong [Thu, 9 Jan 2014 06:20:29 +0000 (14:20 +0800)]
GBE: enable relocatable pch files.

By default, when a pch file is included, clang needs to verify that
the original header file is untouched. This is impossible when
we want to distribute a pch file to a new system, so we need to
use the relocatable pch feature provided by clang.
We now create two pch files. One is a relocatable pch file that
is installed to the system directory. The other is a local
pch file that is used at build time. We need both pch
files because at build time we don't have an ocl_stdlib.h
in the system directory. The local pch file is used only for beignet's
build and the utests. All other applications will use
the installed pch/pcm files.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoCL: prepare to support ICD if the system has ocl-icd..
Zhigang Gong [Wed, 8 Jan 2014 10:57:31 +0000 (18:57 +0800)]
CL: prepare to support ICD if the system has ocl-icd..

v2:
Only install the intel-beignet.icd if the system has ocl-icd
support.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoCL: back port ICD support to 1.1 branch.
Zhigang Gong [Wed, 8 Jan 2014 11:10:53 +0000 (19:10 +0800)]
CL: back port ICD support to 1.1 branch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: Remove some noduplicate to let inline works
Ruiling Song [Wed, 8 Jan 2014 06:58:07 +0000 (14:58 +0800)]
GBE: Remove some noduplicate to let inline works

The LLVM inliner does not seem to inline a function if it contains calls to noduplicate functions.
So we keep noduplicate only for the barrier itself, so that barrier() can still be inlined.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoMove the memory allocate size check to the callee.
Yang Rong [Tue, 7 Jan 2014 03:30:54 +0000 (11:30 +0800)]
Move the memory allocate size check to the callee.

Because of image alignment, the allocation size may exceed CL_DEVICE_MAX_MEM_ALLOC_SIZE if the
image's size is calculated from it. So move the size check from cl_mem_allocate to the callee, and
slightly enlarge the size limit when checking in the image allocation path.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoStart looking for LLVM from version 3.3 then higher version.
Simon Richter [Mon, 2 Dec 2013 13:27:46 +0000 (14:27 +0100)]
Start looking for LLVM from version 3.3 then higher version.

When different LLVM versions are installed, look for 3.5, 3.4 and 3.3 in
order, then try the system default.

As configuring for 3.1 and 3.2 gives an error now, drop these versions from
the search.

v2:
change to use llvm 3.3 as the preferred version.
update the document accordingly.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutests/CMakeList.txt: Remove kernel files which generated by utest_generator.py.
Yi Sun [Thu, 2 Jan 2014 06:17:03 +0000 (14:17 +0800)]
utests/CMakeList.txt: Remove kernel files which generated by utest_generator.py.

v1. Remove all files which are generated automatically.
v2. Refine the dependencies of the generated test cases.
v3. Fix a bug where an error occurs when building the project outside of the source folder.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix the multi-thread crash problem of batch buffer release.
Junyan He [Mon, 6 Jan 2014 09:06:59 +0000 (17:06 +0800)]
Fix the multi-thread crash problem of batch buffer release.

The crash happens like this:
our thread holds a reference to the batch buffer, but has already called
cl_driver_delete to delete the bufmgr. So when we release
the buffer object the next time, the bufmgr's function pointer
is invalid and causes the crash.
We now release the batch buffer every time before calling
cl_set_thread_batch_buf.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRefine calculation for ULP.
Yi Sun [Mon, 6 Jan 2014 08:51:52 +0000 (16:51 +0800)]
Refine calculation for ULP.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: handle the first index of GEP correctly.
Zhigang Gong [Tue, 7 Jan 2014 04:14:54 +0000 (12:14 +0800)]
GBE: handle the first index of GEP correctly.

The first index of a GEP instruction steps over pointer[0]
to pointer[index]. We just need to calculate the size of *pointer and
step over (size of *pointer) * index to reach the position of the
data structure. Then we start to iterate over the composite data type.
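
A rough sketch of the offset computation described above, in Python for brevity. The tuple-based type encoding is made up for this example, and it assumes packed layouts with no padding or alignment, which the real backend has to handle.

```python
def size_of(ty):
    """Size in bytes of a toy type; structs are packed (assumption)."""
    kind = ty[0]
    if kind == "int":            # ("int", size_in_bytes)
        return ty[1]
    if kind == "array":          # ("array", element_type, count)
        return size_of(ty[1]) * ty[2]
    if kind == "struct":         # ("struct", [field_types])
        return sum(size_of(f) for f in ty[1])
    raise ValueError(kind)

def gep_offset(pointee, indices):
    """Byte offset of a GEP on a pointer to `pointee`."""
    # First index: step over whole objects of the pointee type.
    offset = indices[0] * size_of(pointee)
    ty = pointee
    # Remaining indices: walk into the composite type.
    for idx in indices[1:]:
        if ty[0] == "array":
            ty = ty[1]
            offset += idx * size_of(ty)
        elif ty[0] == "struct":
            offset += sum(size_of(f) for f in ty[1][:idx])
            ty = ty[1][idx]
        else:
            raise ValueError("cannot index into a scalar")
    return offset

# struct { i32; [4 x i16] } is 12 bytes when packed.
S = ("struct", [("int", 4), ("array", ("int", 2), 4)])
off = gep_offset(S, [2, 1, 3])   # 2*12 + 4 + 3*2
```

Here the first index alone contributes 24 bytes; the other two indices only select a position inside one struct.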

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: Fix a bug at constant GEP processing.
Zhigang Gong [Tue, 7 Jan 2014 02:37:55 +0000 (10:37 +0800)]
GBE: Fix a bug at constant GEP processing.

We need to initialize the offset to zero for each new operand.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: clang's FE doesn't support static, we just ignore it.
Zhigang Gong [Mon, 6 Jan 2014 08:37:36 +0000 (16:37 +0800)]
GBE: clang's FE doesn't support static, we just ignore it.

Although the OpenCL spec does support static global variables and
static non-kernel functions, clang doesn't support them currently.
We simply ignore the static keyword for now.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: optimize JMP instruction.
Zhigang Gong [Fri, 3 Jan 2014 09:15:58 +0000 (17:15 +0800)]
GBE: optimize JMP instruction.

If the pred register is not in the liveIn set, it means this register
is defined in this block. Then we don't need to validate it.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: optimize the CMP instruction.
Zhigang Gong [Fri, 3 Jan 2014 09:03:09 +0000 (17:03 +0800)]
GBE: optimize the CMP instruction.

If the dst bool value is not in the liveIn set, then we don't need
to care about those inactive lanes as they don't hold any active data.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: validate active bool value in the branching instruction.
Zhigang Gong [Fri, 3 Jan 2014 04:54:15 +0000 (12:54 +0800)]
GBE: validate active bool value in the branching instruction.

As one bool value may be used in multiple basic blocks, we have to
validate its value by ANDing it with the current flag register.

This patch is not fully optimized: we can avoid the validation
if we know the bool value has already been validated in the same basic
block. I will write another patch to do this optimization.

After this patch, all of OpenCV's filter/blur and filter/filter2D tests
pass.

v2:
The compare instruction should not touch the bool value's
inactive lanes. The previous implementation cleared those
channels to zero by default.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: use soft mask to handle the barrier call.
Zhigang Gong [Mon, 30 Dec 2013 10:26:42 +0000 (18:26 +0800)]
GBE: use soft mask to handle the barrier call.

As the GPU runs under predication control, the following IR
may cause one single barrier to be called twice at runtime.

A:
  barrier()
  instructions after barrier()

B:
  ...
  BR(cond) A

C:
  ...
  BR A

When execution reaches B's BR instruction and any of the condition bits is
true, it jumps to block A to execute the barrier. Later, if
any of the condition bits is false, it continues with
block C's code, and at the end of block C it jumps to A and executes
the barrier again.

If on another thread all the condition bits are true, this triggers
a hang.

And even if all the threads execute the barrier the same number of times, the
result may be incorrect, as the instructions after barrier() in block
A are executed before all the work items hit the barrier point.

The solution is to use a soft mask register. The register
is shared by all barrier calls. We initialize it to !emask at the beginning
of the program.

barrierMask = !emask.

Then when execution reaches the barrier call, we OR the current predication bits
into the mask register and check whether all the lanes are set. If any of
the lanes is not set, we simply jump to the next basic block. Later,
when execution reaches the barrier again, we can set more bits/lanes to 1 and
check again. If all the bits are 1, we set the predication flag 0,0
to all 1s and execute the barrier call. After the wait, we reinitialize
barrierMask to !emask and run all the other instructions after
barrier() in block A with all lanes enabled.
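
The mask bookkeeping can be illustrated with a small Python model. This is only a sketch under assumptions: a 16-lane execution width, plain integers as lane masks, and a hypothetical class name. It does not model the flag register, block ip or the actual Gen instructions.

```python
LANES = 16
ALL_LANES = (1 << LANES) - 1

class SoftBarrier:
    """Toy model of the shared soft mask register (names are illustrative)."""

    def __init__(self, emask):
        self.emask = emask & ALL_LANES
        # barrierMask = !emask: lanes that never run count as "arrived".
        self.mask = ~self.emask & ALL_LANES

    def arrive(self, pred):
        """Record the lanes in `pred`; return True when the real barrier may run."""
        self.mask |= pred & self.emask
        if self.mask != ALL_LANES:
            return False              # some lanes are missing, skip the barrier
        self.mask = ~self.emask & ALL_LANES   # reinitialize for the next barrier
        return True

b = SoftBarrier(emask=0x00FF)         # hypothetical: 8 active lanes
first = b.arrive(0x000F)              # lanes 0-3 arrive via B's branch
second = b.arrive(0x00F0)             # lanes 4-7 arrive via C's branch
```

Only the second arrival completes the mask, so the hardware barrier would be executed once instead of twice.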

After this patch, the hang seen when running OpenCV's
transpose test cases is fixed.

v2:
1. If some lanes have not reached the barrier yet, we need to set the
   block ip of all the finished lanes to FFFF, and we also need to clear
   flag0 to zero. This prevents the instructions after the barrier from
   being executed too early.
2. fix some typos.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoMove the llvm optimize pass from clang to backend.
Yang Rong [Thu, 26 Dec 2013 01:55:54 +0000 (09:55 +0800)]
Move the llvm optimize pass from clang to backend.

Call the llvm opt passes in llvmToGen. Remove the SROA pass and call the GVN pass with NoLoads set to true to avoid
large integers. Also handle the opt level in function llvmToGen: 0 is equivalent to clang -O1, and 1 is equivalent to
clang -O2.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix utest compiler_function_argument3 error after move -O2 to backend.
Yang Rong [Tue, 31 Dec 2013 07:20:52 +0000 (15:20 +0800)]
Fix utest compiler_function_argument3 error after move -O2 to backend.

After moving the optimization from clang to the backend, some passes were removed and some passes use different parameters.
This triggers a bug in building the pushmap and causes compiler_function_argument3 to fail.

One loadImm/add instruction may be used by different loads in the set sequence. So do not add an entry to the pushmap
if the same argID/offset has already been added, and do not delete a loadImm/add instruction again if it has already been deleted.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoModify the multi-thread support for queue.
Junyan He [Tue, 31 Dec 2013 07:25:57 +0000 (15:25 +0800)]
Modify the multi-thread support for queue.

The old multi-thread support for queues does not work
when threads never exit. If a thread does not exit
but the queue is re-created all the time, the
gpgpu struct resources leak, and creating the GPU bo
for the gpgpu struct finally fails.
We change it to release the GPGPU resources each time an
NDRange enqueue finishes, and we re-allocate our gpgpu struct
context the next time.

Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoprovide meaningful device names through clGetDeviceInfo
Mario Kicherer [Sun, 29 Dec 2013 22:04:04 +0000 (23:04 +0100)]
provide meaningful device names through clGetDeviceInfo

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoreport errors if opening the DRI device fails
Mario Kicherer [Sun, 29 Dec 2013 22:04:03 +0000 (23:04 +0100)]
report errors if opening the DRI device fails

Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fix the potential issue when there are inactive lanes.
Zhigang Gong [Mon, 30 Dec 2013 10:55:01 +0000 (18:55 +0800)]
GBE: fix the potential issue when there are inactive lanes.

If there are some inactive lanes, a JMPI with all16h
may fail to jump even if all the active lanes are in the false condition.

It may then execute a BB with an all-zero flag, and when the BB
has some noMask/noPredication instructions, this may produce unexpected
results. This patch fixes the problem as follows.
It uses two UW registers to fix up the flag result before each
JMP. Before an ALL16/8H JMPI, it sets the inactive lanes to 1s.
Before an ANY16/8 JMPI, it clears all the inactive lanes to 0s.

This introduces one extra instruction before each predicatable JMPI
instruction, which adds a little overhead.
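
A small Python model of the fixup, offered as a sketch only. It assumes 16 lanes, treats the flag and emask as integers, and models an "all" jump as "every flag bit set" and an "any" jump as "some flag bit set". The function names are invented for the example.

```python
LANES = 16
FULL = (1 << LANES) - 1

def fixup_for_all(flag, emask):
    """Before an ALL-style jump: force inactive lanes to 1."""
    return (flag | ~emask) & FULL

def fixup_for_any(flag, emask):
    """Before an ANY-style jump: force inactive lanes to 0."""
    return flag & emask

def jumps_all(flag):
    return flag == FULL

def jumps_any(flag):
    return flag != 0

emask = 0x00FF                      # hypothetical: only lanes 0-7 are active
agree = 0x00FF                      # every active lane wants to jump
stale = 0xFF00                      # only stale bits in inactive lanes

all_without_fix = jumps_all(agree)
all_with_fix = jumps_all(fixup_for_all(agree, emask))
any_without_fix = jumps_any(stale)
any_with_fix = jumps_any(fixup_for_any(stale, emask))
```

Without the fixup the "all" jump is missed and the "any" jump is taken spuriously; with it, only the active lanes decide.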

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: refine the register expiring handling.
Zhigang Gong [Mon, 30 Dec 2013 10:52:24 +0000 (18:52 +0800)]
GBE: refine the register expiring handling.

The previous implementation expired one register at a time, which
is not very efficient. Now expire as many registers as
possible.
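
As an illustration of expiring several registers in one step, here is a generic linear-scan style sketch in Python. It is not the allocator's actual data structure: the heap of (end point, register) pairs and the names are assumptions made for the example.

```python
import heapq

def expire(active, point):
    """Pop every interval that ends before `point`; return the freed registers."""
    freed = []
    while active and active[0][0] < point:
        _, reg = heapq.heappop(active)
        freed.append(reg)
    return freed

# Active intervals keyed by their end point (illustrative values).
active = [(3, "r1"), (5, "r2"), (9, "r3")]
heapq.heapify(active)
freed = expire(active, 6)
```

A single call at point 6 releases both r1 and r2 instead of one register per call.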

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: rewrite the liveness analysis routine.
Zhigang Gong [Fri, 20 Dec 2013 07:15:09 +0000 (15:15 +0800)]
GBE: rewrite the liveness analysis routine.

The previous implementation has two problems:

1. In the liveness analysis phase, the liveIn and liveOut computation
is incorrect. liveIn is not static information; it should be computed
along with liveOut during the backward data flow analysis.

2. At the register allocation phase, it only considers the liveOut
information. Actually, we also need to consider the liveIn information.

v2:
a. Remove calculating maxID for the liveIn register set and remove calculating
   minID for the liveOut register set.
b. Don't insert a bb to the liveness work list if it is already in the list.
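
For reference, a textbook backward liveness analysis with a work list looks roughly like the Python sketch below. It is a generic illustration, not the backend's code: block names, the (use, def) representation and the function name are assumptions. It does show the two points made above, namely that liveIn is recomputed together with liveOut, and that a block is not queued twice.

```python
from collections import deque

def liveness(blocks, succs):
    """blocks: name -> (use, defs) sets, where `use` holds upward-exposed uses.
    succs: name -> list of successor names."""
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    work = deque(blocks)
    queued = set(blocks)
    while work:
        b = work.popleft()
        queued.discard(b)
        use, defs = blocks[b]
        out = set()
        for s in succs.get(b, []):
            out |= live_in[s]            # liveOut = union of successors' liveIn
        live_out[b] = out
        new_in = use | (out - defs)      # liveIn = use + (liveOut - def)
        if new_in != live_in[b]:
            live_in[b] = new_in
            for p in preds[b]:
                if p not in queued:      # don't queue a block twice
                    work.append(p)
                    queued.add(p)
    return live_in, live_out

# A defines x; B uses x, defines y and loops on itself; C uses y.
blocks = {"A": (set(), {"x"}), "B": ({"x"}, {"y"}), "C": ({"y"}, set())}
succs = {"A": ["B"], "B": ["B", "C"], "C": []}
live_in, live_out = liveness(blocks, succs)
```

Because B loops back to itself, x stays live out of B, which a single static pass over the blocks would miss.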

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: improve precision of atanh
Lv Meng [Mon, 23 Dec 2013 04:09:13 +0000 (12:09 +0800)]
GBE: improve precision of atanh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: improve precision of ldexp
Lv Meng [Mon, 23 Dec 2013 00:21:10 +0000 (08:21 +0800)]
GBE: improve precision of ldexp

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoFix a convert typo.
Yang Rong [Fri, 27 Dec 2013 09:15:36 +0000 (17:15 +0800)]
Fix a convert typo.

The conversion should return float, but it returned long. Correct it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix some long ops bug.
Yang Rong [Fri, 27 Dec 2013 08:29:33 +0000 (16:29 +0800)]
Fix some long ops bug.

Some long ops use bool registers as dst in selection. During allocation,
if there are not enough flag registers, these bool registers are allocated in the grf.
Using these registers directly as flag registers then fails. Add a check before using
the bool register: if it is in the grf and f0.1 is not in use, use f0.1.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a build pushMap bug.
Yang Rong [Thu, 26 Dec 2013 01:55:55 +0000 (09:55 +0800)]
Fix a build pushMap bug.

Insert the pushMap entries into a set to avoid pushing them multiple times.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRevert choose local size change when local size is null in clEnqueueNDRang.
Yang Rong [Tue, 24 Dec 2013 05:40:16 +0000 (13:40 +0800)]
Revert choose local size change when local size is null in clEnqueueNDRang.

It triggers some bugs if the local size is not 1; it will be re-enabled after these bugs are fixed.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix convert long/ulong to float.
Yang Rong [Thu, 26 Dec 2013 03:03:19 +0000 (11:03 +0800)]
Fix convert long/ulong to float.

The previous implementation doesn't handle rounding. The default rounding mode should be round to nearest even.
According to the float format, separate the long/ulong into two parts: the first 23 non-zero bits are the mantissa;
add 1 when the next bit is 1, and then round to even when all the remaining bits are zero.
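
The rounding rule can be sketched in Python for the unsigned case. This is an illustration of round-to-nearest-even on integers, not the Gen instruction sequence of the patch; the function name is made up, and signed inputs would additionally need sign handling.

```python
def ulong_to_float_bits(x):
    """IEEE-754 binary32 bit pattern of an unsigned 64-bit integer,
    rounded to nearest, ties to even."""
    if x == 0:
        return 0
    msb = x.bit_length() - 1             # position of the leading 1
    if msb <= 23:
        mant = x << (23 - msb)           # fits exactly, no rounding needed
    else:
        shift = msb - 23
        mant = x >> shift                # leading 1 plus 23 fraction bits
        rest = x & ((1 << shift) - 1)    # bits that do not fit
        half = 1 << (shift - 1)
        # Round up above the halfway point, or at it when the mantissa is odd.
        if rest > half or (rest == half and (mant & 1)):
            mant += 1
            if mant == 1 << 24:          # rounding carried into a new bit
                mant >>= 1
                msb += 1
    return ((msb + 127) << 23) | (mant & 0x7FFFFF)

tie_down = ulong_to_float_bits((1 << 24) + 1)   # halfway, even mantissa: stays
tie_up = ulong_to_float_bits((1 << 24) + 3)     # halfway, odd mantissa: rounds up
biggest = ulong_to_float_bits((1 << 64) - 1)    # rounds up to 2**64
```

The two tie cases differ only in the parity of the kept mantissa, which is what "round to even" decides.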

v2: correct jmpi's jumpDistance.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix rtz, rtp, rtn when convert int/uint/long/ulong to float.
Yang Rong [Tue, 17 Dec 2013 07:32:13 +0000 (15:32 +0800)]
Fix rtz, rtp, rtn when convert int/uint/long/ulong to float.

Convert the input to float and convert the float back to the input type, as c. Compare the
input with c; if the result does not meet the rtz/rtp/rtn requirement, adjust it by +/- 1 ULP.
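
A Python sketch of this convert, convert back, compare and adjust scheme. It assumes 32-bit integer inputs (so the intermediate double is exact) and emulates float32 through the struct module; the helper names are invented for the example.

```python
import struct

def _bits(f):
    return struct.unpack("<I", struct.pack("<f", f))[0]

def _from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def _step(f, up):
    """Move a finite float32 value by one ULP upward or downward."""
    if f == 0.0:
        return _from_bits(0x00000001 if up else 0x80000001)
    away_from_zero = (f > 0) == up
    return _from_bits(_bits(f) + 1 if away_from_zero else _bits(f) - 1)

def int_to_float(x, mode):
    """Convert a 32-bit integer to float32 with rounding mode rtz, rtp or rtn."""
    f = _from_bits(_bits(float(x)))      # default: round to nearest even
    c = int(f)                           # convert back to the input type
    if mode == "rtz" and abs(c) > abs(x):
        f = _step(f, up=(x < 0))         # overshot away from zero
    elif mode == "rtp" and c < x:
        f = _step(f, up=True)            # result must not be below the input
    elif mode == "rtn" and c > x:
        f = _step(f, up=False)           # result must not be above the input
    return f

x = (1 << 24) + 1                        # not representable in float32
rtz, rtp, rtn = (int_to_float(x, m) for m in ("rtz", "rtp", "rtn"))
```

For this input the nearest-even result already satisfies rtz and rtn, and only rtp needs the one ULP correction.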

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd test cases generator.
Yi Sun [Tue, 24 Dec 2013 03:15:18 +0000 (11:15 +0800)]
Add test cases generator.

    v1:
    utest_generator.py contains the base class and functions for generating test cases.
    utest_math_gen.py can generate test cases for most math functions for all the gentypes.
    utest_math_gen.py can be run during cmake.

    v2:
    1. Put all the generated unit test cases in folder utest/generated.
    2. Delete the whole generated folder when invoking make clean.
    3. At the top of the generated test cases, add some comments.
    4. Instead of defining FLT_ULP (0.000001) as the ulp unit, calculate the float ulp before using it.
    5. Add test cases for several math functions.
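
To illustrate item 4, one way to compute the ULP of a float32 value instead of using a fixed constant is sketched below. This is not the generator's code; the helper names are hypothetical and the approach (bit pattern plus one) is just one common technique.

```python
import struct

def _bits(f):
    return struct.unpack("<I", struct.pack("<f", f))[0]

def _from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def float_ulp(x):
    """Gap between |x| and the next larger float32 (x finite, below FLT_MAX)."""
    b = _bits(abs(x))
    return _from_bits(b + 1) - _from_bits(b)

def within_ulps(got, expected, n):
    """True if `got` is at most n ULPs of `expected` away from it."""
    return abs(got - expected) <= n * float_ulp(expected)

ulp_at_1 = float_ulp(1.0)
ulp_at_1000 = float_ulp(1000.0)
```

The ULP grows with the magnitude of the value, so a single constant such as 0.000001 is too loose near 1.0 and too strict near 1000.0.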

    v3:
    1. Refine the calculation for float, and calculate each float got from cpu function.

    v4:
    Refine the calculation for float.

    The test cases for the following functions fail with input 0, 1 or 3.14:
    builtin_atan2_float
    builtin_atanh_float
    builtin_rootn_float
    builtin_cos_float
    builtin_cospi_float
    builtin_erf_float
    builtin_erfc_float
    builtin_mad_float
    builtin_nextafter_float
    builtin_pown_float
    builtin_powr_float
    builtin_rint_float
    builtin_sinpi_float
    builtin_tan_float
    builtin_tanpi_float

    v5:
    remove case builtin_mad_float

    todo:
    atan2pi
    fmax
    fmin
    sincos

Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Yangwei Shui <yangweix.shui@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: adjust instruction order for load/function call for vector.
Zhigang Gong [Wed, 18 Dec 2013 07:19:05 +0000 (07:19 +0000)]
GBE: adjust instruction order for load/function call for vector.

The previous implementation generates code as below:

  %33 = extractelement <4 x i8> %32, i32 0
  %34 = extractelement <4 x i8> %32, i32 1
  %35 = extractelement <4 x i8> %32, i32 2
  %36 = extractelement <4 x i8> %32, i32 3
  %32 = load <4 x i8> addrspace(1)* %31, align 4, !tbaa !3

It may cause problems in the subsequent optimization passes.
Now move the extractelement instructions after the load instruction:
  %32 = load <4 x i8> addrspace(1)* %31, align 4, !tbaa !3
  %33 = extractelement <4 x i8> %32, i32 0
  %34 = extractelement <4 x i8> %32, i32 1
  %35 = extractelement <4 x i8> %32, i32 2
  %36 = extractelement <4 x i8> %32, i32 3

This patch also moves the dead code elimination pass after the scalarize pass,
as after the scalarize pass there may be opportunities to remove more dead
instructions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoDefer the scalarize to the last pass before the Gen pass.
Zhigang Gong [Mon, 16 Dec 2013 02:56:10 +0000 (10:56 +0800)]
Defer the scalarize to the last pass before the Gen pass.

I found that the previous pass, the gvn pass, may generate new vector instructions.
We defer the scalarize pass to make sure the gen pass will not encounter
unsupported non-scalar instructions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of remainder
Lv Meng [Fri, 20 Dec 2013 07:54:05 +0000 (15:54 +0800)]
GBE: improve precision of remainder

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of cosh
Lv Meng [Fri, 20 Dec 2013 03:52:25 +0000 (11:52 +0800)]
GBE: improve precision of cosh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of tanh
Lv Meng [Fri, 20 Dec 2013 02:36:46 +0000 (10:36 +0800)]
GBE: improve precision of tanh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of sinh
Lv Meng [Fri, 20 Dec 2013 01:22:33 +0000 (09:22 +0800)]
GBE: improve precision of sinh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of asinh
Lv Meng [Fri, 20 Dec 2013 00:14:04 +0000 (08:14 +0800)]
GBE: improve precision of asinh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of acosh
Lv Meng [Wed, 18 Dec 2013 08:13:45 +0000 (16:13 +0800)]
GBE: improve precision of acosh

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of expm1
Lv Meng [Wed, 18 Dec 2013 07:24:14 +0000 (15:24 +0800)]
GBE: improve precision of expm1

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of fmod
Lv Meng [Wed, 18 Dec 2013 07:17:22 +0000 (15:17 +0800)]
GBE: improve precision of fmod

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve precision of exp
Lv Meng [Wed, 18 Dec 2013 06:29:02 +0000 (14:29 +0800)]
GBE: improve precision of exp

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: we should allocate register for ExtractElement insn.
Zhigang Gong [Fri, 20 Dec 2013 01:31:04 +0000 (09:31 +0800)]
GBE: we should allocate register for ExtractElement insn.

We should allocate a register when we first visit an ExtractElement
instruction, as we may refer to the value before we visit that instruction
in the emit instruction pass.

The case that triggers this corner case is shown below.
Clang/llvm may generate code similar to the following IR:

... (there is no definition of %7)
  br label2

label1:
  %10 = add  i32 %7, %6
  ...
  ret

label2:
  %8 = load <4 x i8> addrspace(1)* %3, align 4, !tbaa !1
  %7 = extractelement <4 x i8> %8, i32 0
  ...
  br label1

The value %7 is assigned in label2 but is referenced in label1.
From the control flow's point of view the IR is valid, as the reference is
executed after the assignment. But the previous implementation
doesn't allocate a proxyvalue for %7, which is the root cause of
the assert triggered when visiting the instruction %10 = add i32 %7, %6.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fix a corner case when allocate registers for local buffer.
Zhigang Gong [Tue, 17 Dec 2013 05:07:19 +0000 (13:07 +0800)]
GBE: fix a corner case when allocate registers for local buffer.

We use a simple way to find an instruction which refers to the local
data, so that we can identify the parent function. We found
a corner case where the instruction may be modified by an optimization
pass, for example the GVN pass. When all of the instruction's operands
are changed to constants, the whole instruction appears to be a
constant as well.

If that is the case, we fail to get a valid instruction and may trigger
an assert. This patch changes the code to check another use of the local data to
avoid this assert.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Fix logb implementation.
Ruiling Song [Wed, 11 Dec 2013 06:37:46 +0000 (14:37 +0800)]
GBE: Fix logb implementation.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fix clang's "incorrect" optimization for barrier call.
Zhigang Gong [Fri, 13 Dec 2013 06:37:58 +0000 (14:37 +0800)]
GBE: fix clang's "incorrect" optimization for barrier call.

Clang may duplicate one barrier call into multiple branches, which
violates the OpenCL spec and may cause a gpu hang. To fix this issue,
we have to implement the barrier in an llvm module file and set
the function attribute to noduplicate. We also have to link this
pre-compiled module before we compile the user kernel, so we set
the pcm lib file in the LinkBitCodeFile field of the clang
instance.

v2: fix one typo.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoAccelerate utest.
Zhigang Gong [Wed, 11 Dec 2013 05:40:51 +0000 (13:40 +0800)]
Accelerate utest.

For some test cases which include more than one kernel, the current
implementation always builds the program again for each new sub test case.

That wastes a lot of time. This patch introduces a new macro
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM which has an extra parameter
to specify whether to keep the previous program and avoid the extra
build. The normal usage is:

MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn1, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn2, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn3, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn4, true);
MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(fn5, false);

The scenario is that the above fn1-5 are included in the same kernel
file and we define the sub cases in the same cpp file. We already
have some examples of this usage in compiler_abs.cpp, compiler_abs_diff.cpp,
compiler_basic_arithmetic.cpp, compiler_vector_load_store.cpp, etc.

This patch reduces the utests' execution time by 2/3.

v2: should always destroy the program when running one specific test case.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoDisable the PCH valid check to save a lot of compiling time.
Junyan He [Wed, 11 Dec 2013 03:09:27 +0000 (11:09 +0800)]
Disable the PCH valid check to save a lot of compiling time.

In clang, the PCH file is used as an AST source, so
the check is strict. The macro definitions are also checked,
and if anything is different, the PCH is invalid and
the build process starts from scratch.
Disable Clang's PCH validity check and do the compatibility
check ourselves.

This patch does not solve the clang version problem,
because the AST representation is an internal Clang
data structure and may change between two clang versions.
We will address this issue later.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: Improve precision of log2
Ruiling Song [Tue, 10 Dec 2013 08:33:16 +0000 (16:33 +0800)]
GBE: Improve precision of log2

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve precision of log10
Ruiling Song [Tue, 10 Dec 2013 08:23:02 +0000 (16:23 +0800)]
GBE: Improve precision of log10

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of log/log1p
Ruiling Song [Tue, 10 Dec 2013 08:23:01 +0000 (16:23 +0800)]
GBE: improve precision of log/log1p

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: fixed the region check for three rect region related APIs.
Zhigang Gong [Tue, 3 Dec 2013 07:26:46 +0000 (15:26 +0800)]
Runtime: fixed the region check for three rect region related APIs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: improve asin/acos precision
Ruiling Song [Fri, 29 Nov 2013 08:03:42 +0000 (16:03 +0800)]
GBE: improve asin/acos precision

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: register width should not exceed execution width
Ruiling Song [Wed, 20 Nov 2013 05:51:32 +0000 (13:51 +0800)]
GBE: register width should not exceed execution width

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoGBE: Do not change vertical stride when it is 0
Ruiling Song [Wed, 20 Nov 2013 05:51:31 +0000 (13:51 +0800)]
GBE: Do not change vertical stride when it is 0

Otherwise the scalar register g3<0,1,0> is changed into g3<16,1,0>, which illegally
crosses more than 2 adjacent rows.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoGBE: Fix null register to integer type
Ruiling Song [Wed, 20 Nov 2013 05:51:30 +0000 (13:51 +0800)]
GBE: Fix null register to integer type

The GEN 'mach' instruction only supports integer type registers.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoFix float to ulong/long fail.
Yang Rong [Mon, 2 Dec 2013 09:10:26 +0000 (17:10 +0800)]
Fix float to ulong/long fail.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix signed to unsigned type sat convert.
Yang Rong [Mon, 2 Dec 2013 02:47:43 +0000 (10:47 +0800)]
Fix signed to unsigned type sat convert.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRefine isnan builtin.
Yang Rong [Mon, 25 Nov 2013 07:08:09 +0000 (15:08 +0800)]
Refine isnan builtin.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoAdd FCMP UNO support.
Yang Rong [Mon, 2 Dec 2013 04:50:13 +0000 (12:50 +0800)]
Add FCMP UNO support.

And also correct some UXX compares.
V2: Not use OCL_OPTIMIZE_IMMEDIATE for XOR and ORD compare.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: filter the unsupported cl compile arguments out.
Zhigang Gong [Thu, 28 Nov 2013 02:54:47 +0000 (10:54 +0800)]
GBE: filter the unsupported cl compile arguments out.

As unsupported arguments may trigger unexpected compilation
errors, we just remove them from the arglist.

If clang's cl frontend supports these arguments later, we need
to revisit this.

This patch also adds a new environment variable,
OCL_OUTPUT_BUILD_LOG.
If this variable is set to 1, GBE will print the compile log to
the standard error channel (llvm::errs()). By default, it is disabled
and GBE will not print any build log.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoWhen local_work_size is null, try to choose a local_work_size.
Yang Rong [Fri, 29 Nov 2013 02:59:59 +0000 (10:59 +0800)]
When local_work_size is null, try to choose a local_work_size.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoComplete the feature of clGetEventProfilingInfo API
Junyan He [Fri, 29 Nov 2013 02:55:54 +0000 (10:55 +0800)]
Complete the feature of clGetEventProfilingInfo API

The profiling feature is now fully supported. We use
drm_intel_reg_read to get the current GPU time
when the event is queued and submitted, and use the
PIPE_CONTROL cmd to get the GPU execution time
for kernel start and end.
One minor problem remains:
The GPU timer counter is 36 bits wide with a resolution of
80ns, so it wraps after 2^36 * 80ns = ~5500s, about an hour and a half.
Some tests may last about 2~5 min, and if one is running when
the counter wraps, the timestamps wrap around
and cause the case to fail.
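
The wrap period can be checked with a few lines of arithmetic: 2^36 ticks of 80ns is about 5497 seconds. The sketch below also shows one common way to tolerate a single wrap between two reads, by masking the difference to the counter width. This is an illustration only and is not what the patch does; all names are hypothetical, and only the 36-bit width and the 80ns tick come from the message above.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kCounterBits = 36;  // counter width, from the commit message
constexpr uint64_t kTickNs = 80;       // tick resolution, from the commit message
constexpr uint64_t kCounterMask = (uint64_t{1} << kCounterBits) - 1;

// Time until the counter wraps, in whole seconds.
constexpr uint64_t wrapPeriodSeconds() {
  return ((uint64_t{1} << kCounterBits) * kTickNs) / 1000000000ull;
}

// Elapsed nanoseconds between two raw counter reads. The subtraction is done
// modulo 2^36, so the result stays correct across at most one wrap.
constexpr uint64_t elapsedNs(uint64_t startTicks, uint64_t endTicks) {
  return ((endTicks - startTicks) & kCounterMask) * kTickNs;
}
```

A measurement that spans more than one full period cannot be recovered this way.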

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix B/UB compare fail.
Yang Rong [Thu, 28 Nov 2013 08:37:22 +0000 (16:37 +0800)]
Fix B/UB compare fail.

Because B/UB is treated as W/UW, src1's type can't be set when the types mismatch.
Set the correct type before getRegisterFromImmediate.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse -O1 when -cl-opt-disable, for inline function.
Yang Rong [Thu, 28 Nov 2013 03:00:43 +0000 (11:00 +0800)]
Use -O1 when -cl-opt-disable, for inline function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRemove test cl_create_kernel.
Yang Rong [Wed, 27 Nov 2013 08:40:30 +0000 (16:40 +0800)]
Remove test cl_create_kernel.

This test only tries to allocate a buffer with a size larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE, and
asserts if the return status is not CL_INVALID_BUFFER_SIZE. But the OpenCL spec says:
Implementations may return CL_INVALID_BUFFER_SIZE if size is greater than
CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in table 4.3 for all devices in context.

So returning CL_INVALID_BUFFER_SIZE is not mandatory. Remove the test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRuntime: implement the get build log function and fix one build error check issue.
Zhigang Gong [Tue, 26 Nov 2013 10:39:59 +0000 (18:39 +0800)]
Runtime: implement the get build log function and fix one build error check issue.

According to the spec, we need to support CL_PROGRAM_BUILD_LOG, which is
used to get the build log of a cl kernel. We also need to check
whether a build failure is a generic build failure or a build option
error. This commit also fixes the piglit case:
API/clBuildProgram.

Another change in this commit is that it reroutes all the output
of the clang execution to an internal buffer and doesn't print to the
console directly. If the user wants the detailed build log,
CL_PROGRAM_BUILD_LOG can be used.

v2: include both clang error messages and the llvm-to-gen error
messages. Also refine the checking for the error buffer parameter.
If there is no error buffer specified, always flush the build log
to llvm::errs().

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoCL/Runtime: workaround the unused sampler_t kernel argument.
Zhigang Gong [Fri, 22 Nov 2013 06:09:28 +0000 (14:09 +0800)]
CL/Runtime: workaround the unused sampler_t kernel argument.

The current implementation uses a normal integer to represent
a sampler_t; later, when the sampler is used in read_image
or get_sampler_info, the backend fixes up its type to SAMPLER.

But some test cases in piglit define a sampler_t kernel argument
with an empty kernel body. Then we have no chance to fix up
the kernel argument type to sampler, and we fail on the runtime side.

To work around this issue, we change the sampler_t to short type.
Then when the user calls clSetKernelArg to set a sampler, it passes
in a pointer-sized value for an argument of short type. That fails
the size checking logic, and we fix up its type to sampler there.

As this workaround only takes effect when the error occurs, it does
not bring much side effect to the normal cases. And it
passes the existing test cases.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fix some piglit failures.
Zhigang Gong [Thu, 21 Nov 2013 09:04:54 +0000 (17:04 +0800)]
Runtime: fix some piglit failures.

compiler_available should be true. And when a program is retained, we should
not call build on it again.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fixed one missing case for clGetKernelWorkGroupInfo.
Zhigang Gong [Wed, 20 Nov 2013 09:53:34 +0000 (17:53 +0800)]
Runtime: fixed one missing case for clGetKernelWorkGroupInfo.

CL_KERNEL_PRIVATE_MEM_SIZE was not implemented. This patch fixes
this issue and passes the piglit test case.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRuntime: fixed parameter error checking in cl create buffer.
Zhigang Gong [Wed, 20 Nov 2013 07:51:41 +0000 (15:51 +0800)]
Runtime: fixed parameter error checking in cl create buffer.

This patch can pass piglit test case cl-api-create-buffer.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd the drm include and lib path for find when drm is not the system one.
Junyan He [Tue, 26 Nov 2013 09:59:54 +0000 (17:59 +0800)]
Add the drm include and lib path for find when drm is not the system one.

Add support for a DRM lib that is not in the standard system location.
In some cases, we want to debug libdrm without affecting the
whole system.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoEnlarge the global mem size.
Yang Rong [Wed, 27 Nov 2013 06:06:50 +0000 (14:06 +0800)]
Enlarge the global mem size.

When creating an image, alignment can cause the size to be larger than the max alloc size.
Enlarge the global memory size and use it to check the size on allocation.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix some get image info errors.
Yang Rong [Wed, 27 Nov 2013 06:06:51 +0000 (14:06 +0800)]
Fix some get image info errors.

Get correct grf offset and need clear image set offsets.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a build problem when the llvm version has the fix version digit.
Zhigang Gong [Wed, 27 Nov 2013 02:01:56 +0000 (10:01 +0800)]
Fix a build problem when the llvm version has the fix version digit.

If the llvm version is something like 3.3.1, the previous cmake script
generates incorrect cflags such as: -DLLVM_33 1, which breaks the build.
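
The intended mapping keeps only the major and minor components, so both 3.3 and 3.3.1 should yield LLVM_33. The real fix lives in the cmake script, which is not shown here; the C++ function below is only a hypothetical illustration of that mapping.

```cpp
#include <cassert>
#include <string>

// Builds a macro name such as "LLVM_33" from "3.3" or "3.3.1" by keeping only
// the major and minor components. Returns an empty string on malformed input.
std::string llvmVersionMacro(const std::string& version) {
  const std::size_t firstDot = version.find('.');
  if (firstDot == std::string::npos || firstDot == 0) return "";
  const std::size_t secondDot = version.find('.', firstDot + 1);
  const std::string major = version.substr(0, firstDot);
  const std::string minor =
      (secondDot == std::string::npos)
          ? version.substr(firstDot + 1)
          : version.substr(firstDot + 1, secondDot - firstDot - 1);
  if (minor.empty()) return "";
  return "LLVM_" + major + minor;  // any third (fix) digit is dropped
}
```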

This commit also updates the stable llvm version from 3.1 to 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd vload_half and vstore_half build in.
Yang Rong [Fri, 22 Nov 2013 11:51:57 +0000 (19:51 +0800)]
Add vload_half and vstore_half build in.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd convert between fp16 and fp32.
Yang Rong [Fri, 22 Nov 2013 11:51:56 +0000 (19:51 +0800)]
Add convert between fp16 and fp32.

Use convert instruction in ir, and ALU1 in gen selection.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a compare immediate optimize error.
Yang Rong [Fri, 15 Nov 2013 03:40:30 +0000 (11:40 +0800)]
Fix a compare immediate optimize error.

When doing the LOADI/compare -> compare optimization, the IMM src1 took the LOADI type,
but LOADI doesn't care whether the value is signed or unsigned. The compare type should be used instead.
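
Why the type matters: the same 32-bit immediate orders differently depending on whether it is read as signed or unsigned. The host-side sketch below uses a hypothetical helper, not the backend code, to show the two interpretations.

```cpp
#include <cassert>
#include <cstdint>

// Evaluates "src < imm" on raw 32-bit patterns. The signedness comes from the
// compare instruction, since a load of an immediate carries no such meaning.
bool lessThanImm(uint32_t srcBits, uint32_t immBits, bool compareIsSigned) {
  if (compareIsSigned)
    return static_cast<int32_t>(srcBits) < static_cast<int32_t>(immBits);
  return srcBits < immBits;
}
```

For example, 0xFFFFFFFF is the largest value in an unsigned compare but -1 in a signed one, so "1 < imm" flips between the two.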

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
10 years agoAdd FCmpInst ord support.
Yang Rong [Fri, 15 Nov 2013 05:45:53 +0000 (13:45 +0800)]
Add FCmpInst ord support.

Because gen does not support an ordered compare directly, use (src0 == src0) && (src1 == src1).
BTW: can't use !unordered.
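
The identity relies on NaN never comparing equal to itself. A host-side sketch, with hypothetical function names and no relation to the GEN instruction selection code, assuming IEEE 754 semantics (that is, no fast-math style flags):

```cpp
#include <cassert>
#include <limits>

// ord(a, b): true when neither operand is NaN.
bool isOrdered(float a, float b) {
  return (a == a) && (b == b);
}

// uno(a, b): true when at least one operand is NaN.
bool isUnordered(float a, float b) {
  return (a != a) || (b != b);
}
```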

v2: Refine, don't need AND.
v3: Do not change getGenCompare function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "round"
Homer Hsing [Wed, 13 Nov 2013 08:49:16 +0000 (16:49 +0800)]
fix builtin function "round"

Previously round used round-to-even, so the result was wrong.
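
The two rounding rules differ only on exact halfway cases. The sketch below uses the C++ standard library on the host rather than the OpenCL builtins, with hypothetical wrapper names; it assumes the default rounding mode (round to nearest, ties to even) for std::nearbyint.

```cpp
#include <cassert>
#include <cmath>

// Halfway cases go away from zero, which is what round() specifies.
float roundHalfAwayFromZero(float x) { return std::round(x); }

// Halfway cases go to the even neighbour, which is what rint() gives under
// the default rounding mode.
float roundHalfToEven(float x) { return std::nearbyint(x); }
```

For 2.5 the first returns 3 and the second returns 2; for 3.5 both return 4.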

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoimprove builtin function "rint"
Homer Hsing [Wed, 13 Nov 2013 08:49:17 +0000 (16:49 +0800)]
improve builtin function "rint"

directly use __gen_ocl_rnde

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "isnormal"
Homer Hsing [Fri, 8 Nov 2013 05:27:30 +0000 (13:27 +0800)]
fix builtin function "isnormal"

fix a corner case of very small input
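
The message does not say which input failed. One plausible reading is subnormal values, which are nonzero but not normal. A bit-level sketch of the classification, with a hypothetical function name and assuming IEEE 754 single precision:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A float is normal when its 8-bit exponent field is neither all zeros
// (zero or subnormal) nor all ones (infinity or NaN).
bool isNormalBits(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun
  const uint32_t exponent = (bits >> 23) & 0xFFu;
  return exponent != 0u && exponent != 0xFFu;
}
```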

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoput a mutex around gbe_program_new_from_llvm
Homer Hsing [Tue, 5 Nov 2013 05:28:13 +0000 (13:28 +0800)]
put a mutex around gbe_program_new_from_llvm

because random crashes happen without the mutex

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix ASR operator for 64bit integer
Homer Hsing [Mon, 4 Nov 2013 02:13:30 +0000 (10:13 +0800)]
fix ASR operator for 64bit integer

if the operand is positive, pad the high 32 bits with zeros.
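
A sketch of a 64-bit arithmetic shift right built from two 32-bit halves, in the spirit of the fix: the vacated high bits are filled with zeros for a non-negative operand and with ones for a negative one. The struct and function names are hypothetical and this is not the backend code.

```cpp
#include <cassert>
#include <cstdint>

struct Int64Parts {
  uint32_t lo;
  uint32_t hi;
};

// Arithmetic shift right of a 64-bit value stored as two 32-bit halves.
Int64Parts asr64(Int64Parts v, unsigned shift) {
  shift &= 63u;
  // Fill pattern for vacated bits: zeros if non-negative, ones if negative.
  const uint32_t fill = (v.hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  if (shift == 0u) return v;
  Int64Parts r;
  if (shift < 32u) {
    r.lo = (v.lo >> shift) | (v.hi << (32u - shift));
    r.hi = (v.hi >> shift) | (fill << (32u - shift));
  } else if (shift == 32u) {
    r.lo = v.hi;
    r.hi = fill;
  } else {
    const unsigned t = shift - 32u;  // 1..31
    r.lo = (v.hi >> t) | (fill << (32u - t));
    r.hi = fill;
  }
  return r;
}
```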

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoRemove boolean values cannot cross their definition basic block restrict.
Yang Rong [Thu, 14 Nov 2013 03:14:33 +0000 (11:14 +0800)]
Remove boolean values cannot cross their definition basic block restrict.

Add mov bool support.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix builtin function "ilogb"
Homer Hsing [Tue, 5 Nov 2013 07:48:20 +0000 (15:48 +0800)]
fix builtin function "ilogb"

add FP_ILOGB0, FP_ILOGBNAN
return FP_ILOGB0 for zero.
return FP_ILOGBNAN for nan.
return INT_MAX for inf.
also improve function code for other cases.
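
The special cases listed above can be sketched on the host as follows. The function name is hypothetical, and FP_ILOGB0 and FP_ILOGBNAN here are the host C library's macros, whose values may differ from the ones the OpenCL builtin library defines.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Returns the unbiased binary exponent of x, with the special cases handled
// up front in the order described by the commit message.
int ilogbSketch(float x) {
  if (std::isnan(x)) return FP_ILOGBNAN;
  if (std::isinf(x)) return INT_MAX;
  if (x == 0.0f) return FP_ILOGB0;
  int e = 0;
  std::frexp(x, &e);  // x = m * 2^e with 0.5 <= |m| < 1
  return e - 1;       // rescale to 1 <= |m| < 2
}
```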

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agofix builtin function "ldexp"
Homer Hsing [Tue, 12 Nov 2013 08:38:26 +0000 (16:38 +0800)]
fix builtin function "ldexp"

fixed corner cases when input parameter has special value

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "nextafter"
Homer Hsing [Tue, 12 Nov 2013 06:36:52 +0000 (14:36 +0800)]
fix builtin function "nextafter"

fix for some corner cases

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agofix builtin function "fdim"
Homer Hsing [Tue, 12 Nov 2013 05:12:35 +0000 (13:12 +0800)]
fix builtin function "fdim"

check whether input is NaN. fix the code if input is inf
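
A host-side sketch of the behaviour described: NaN inputs propagate, and comparing before subtracting avoids computing inf - inf. The function name is hypothetical and this is not the builtin's source.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fdim(x, y): x - y if x > y, +0 otherwise; NaN if either input is NaN.
float fdimSketch(float x, float y) {
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<float>::quiet_NaN();
  // When x == y == inf the comparison is false, so inf - inf is never formed.
  return (x > y) ? (x - y) : 0.0f;
}
```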

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>