contrib/beignet.git
10 years agoGBE: Handle bti allocation for internal buffer used by printf.
Ruiling Song [Wed, 30 Jul 2014 05:59:30 +0000 (13:59 +0800)]
GBE: Handle bti allocation for internal buffer used by printf.

1. Move the bti/Register map from gbe::Context to ir::Function.
2. use GlobalVariable instead of 'call' to get internal buffer (used for printf) base address.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Refine bti usage in backend & runtime.
Ruiling Song [Wed, 30 Jul 2014 05:59:29 +0000 (13:59 +0800)]
GBE: Refine bti usage in backend & runtime.

Previously, we simply map 2G surface for memory access,
which has obvious security issue, user can easily read/write graphics
memory that does not belong to him. To prevent such kind of behaviour,
We bind each surface to a dedicated bti. HW provides automatic
bounds check. For out-of-bound write, it will be ignored. And for read
out-of-bound, hardware will simply return zero value.

The idea behind the patch is for a load/store instruction, it will search
through the LLVM use-def chain until finding out where the address
comes from. Then the bti is saved in ir::Instruction and used for
the later code generation. And for mixed pointer case, a load/store
will access more than one bti.

To simplify some code, '0' is reserved for constant address space,
'1' is reserved for private address space. Other btis are assigned
automatically by backend.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: set correct state for constant buffer on hsw.
Ruiling Song [Tue, 29 Jul 2014 07:41:38 +0000 (15:41 +0800)]
runtime: set correct state for constant buffer on hsw.

According to spec, should set I965_SURCHAN_SELECT_XXX on hsw.
Then we can use sampler message to read constant surface.

This fix the regression in unit test brought by:
'GBE: Optimize constant load with sampler.'

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutests: Fix a bug in image_1D_buffer.
Ruiling Song [Mon, 28 Jul 2014 01:19:30 +0000 (09:19 +0800)]
utests: Fix a bug in image_1D_buffer.

Should use buffer_sz to clCreateBuffer().

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: align the fields in union ImageInfoKey.
Ruiling Song [Mon, 28 Jul 2014 01:19:29 +0000 (09:19 +0800)]
GBE: align the fields in union ImageInfoKey.

To avoid possible garbage data.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agodelete GEPInst when it is no longer used
Guo Yejun [Thu, 24 Jul 2014 22:00:27 +0000 (06:00 +0800)]
delete GEPInst when it is no longer used

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoclean llvm resource in compiler (libgbe.so)
Guo Yejun [Thu, 17 Jul 2014 23:16:34 +0000 (07:16 +0800)]
clean llvm resource in compiler (libgbe.so)

since we have separated the compiler (libgbe.so) and the interpreter
(libgbeinterp.so), the LLVM resource cleanup task should be done in
the compiler instead of the GenProgram::~GenProgram which has no way
to clean llvm resources in libgbeinterp.so

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
10 years agofix three memory leaks
Guo Yejun [Wed, 23 Jul 2014 23:51:18 +0000 (07:51 +0800)]
fix three memory leaks

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agofree build_log when the cl program is released
Guo Yejun [Thu, 17 Jul 2014 18:38:33 +0000 (02:38 +0800)]
free build_log when the cl program is released

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoNEWS: update for 0.9.2.
Zhigang Gong [Thu, 17 Jul 2014 02:37:09 +0000 (10:37 +0800)]
NEWS: update for 0.9.2.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agodocs: add a NEWS document to point to the release notes pages.
Zhigang Gong [Thu, 17 Jul 2014 02:14:12 +0000 (10:14 +0800)]
docs: add a NEWS document to point to the release notes pages.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoremove requirment as drm master in non-x environment
Guo Yejun [Wed, 16 Jul 2014 17:26:23 +0000 (01:26 +0800)]
remove requirment as drm master in non-x environment

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoimprove the clEnqueueCopyBufferRect performance in some cases
Lv Meng [Wed, 16 Jul 2014 07:38:48 +0000 (15:38 +0800)]
improve the clEnqueueCopyBufferRect performance in some cases

Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd utest load_program_from_gen_bin.
LuoXionghu [Wed, 16 Jul 2014 01:31:06 +0000 (09:31 +0800)]
add utest load_program_from_gen_bin.

this test case would check whether genProgramSerializeToBinary in
backend can generator gen binary correctly.

rename load_program_from_bin to load_program_from_bin_file.
the difference is load_program_from_bin_file could either load program
from llvm binary or gen binary file generated by gbe_bin_generator.

Signed-off-by: LuoXionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd platform info in the gen binary code.
LuoXionghu [Wed, 16 Jul 2014 01:31:05 +0000 (09:31 +0800)]
add platform info in the gen binary code.

the size of the platform info is 3 bytes, right after the '/0GENC'.
check the header magic number and platform info before deserializeFromBin.

v2: supports IVB/BYT/HSW binary on its' platform, and BYT binary runs on IVB.
v3: fix 'BYT' overwritten by 'IVB';

Signed-off-by: LuoXionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Optimize constant load with sampler.
Ruiling Song [Mon, 14 Jul 2014 09:24:38 +0000 (17:24 +0800)]
GBE: Optimize constant load with sampler.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Use varying register to save one instruction
Ruiling Song [Mon, 14 Jul 2014 09:24:37 +0000 (17:24 +0800)]
GBE: Use varying register to save one instruction

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agogbe: add the new else instruction to the assert checking.
Zhigang Gong [Fri, 11 Jul 2014 15:36:39 +0000 (23:36 +0800)]
gbe: add the new else instruction to the assert checking.

Else is the new branch instruction. We already added it to the
gen_encoder's patchJMPI but forget it for gen75_encoder. Now
fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd some hsw missed pci ids (reserved PCI IDs).
Yang Rong [Thu, 10 Jul 2014 03:38:41 +0000 (11:38 +0800)]
Add some hsw missed pci ids (reserved PCI IDs).

v2:
modified according to Matthias's suggestion.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Matthias Sattler <Matthias.Sattler@t-online.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: fix some subtle event bugs.
Zhigang Gong [Thu, 10 Jul 2014 10:57:53 +0000 (18:57 +0800)]
runtime: fix some subtle event bugs.

This patch fix the following two bugs in event handling.
1. When it's time to call a event's user call back function, we need to
   set the executed to true before the call. As that call back function
   may call into clReleaseEvent(), and if we don't set the executed status
   to true, it will enter infinite recursive loop.

2. After the user call clEnqueueNDRangeKernel to get a valid event, the
   user set a call back function to that event, and in that call back
   function, it will release that event. This scenario is totally correct.
   But our current event handling doesn't have a deadicated timer thread to
   update those on-the-fly events' status. Thus those events will not have
   a chance to get updated, and those call back function will not executed
   forever. To introduce a complete timer style thread to maintain this type
   of events is too heavy for this fix release. This patch choose an easy
   way to work around it. It will make sure the last gpgpu event to be finished
   before current task to be enqueued.

After this patch, most of the OpenCV 3.0 cases could run smoothly without
any serious issue.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoruntime/driver: refine error handlings.
Zhigang Gong [Thu, 10 Jul 2014 08:52:48 +0000 (16:52 +0800)]
runtime/driver: refine error handlings.

We should always check whether a dri_bo_map success or fail.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: fix empty block disassemble bug.
Yongjia Zhang [Thu, 17 Jul 2014 18:16:52 +0000 (02:16 +0800)]
GBE: fix empty block disassemble bug.

If a block is empty, there will be more than one label pointing the
same instruction we need to step over all those labels.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yongjia Zhang <zhang_yong_jia@126.com>
10 years agoEnable structural analysis
Yongjia Zhang [Thu, 17 Jul 2014 18:14:41 +0000 (02:14 +0800)]
Enable structural analysis

enable structural analysis and use if, else and endif in the
generated asm.

Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse instruction if else and endif manipulate structures
Yongjia Zhang [Thu, 17 Jul 2014 18:14:40 +0000 (02:14 +0800)]
Use instruction if else and endif manipulate structures

Use instruction if, else and endif manipulate the control flow of
identified if-then and if-else structures at backend. but this
is not enabled, just add the necessary code to backend.

Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd structure identification on ir level
Yongjia Zhang [Thu, 17 Jul 2014 18:14:39 +0000 (02:14 +0800)]
Add structure identification on ir level

Add tool structures and functions for identifying if-then and
if-else structures on Gen IR level.

Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd Gen instruction 'else'
Yongjia Zhang [Thu, 17 Jul 2014 18:14:38 +0000 (02:14 +0800)]
Add Gen instruction 'else'

Add Gen instruction 'else' for future use.

Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd Gen IR IF, ELSE and ENDIF
Yongjia Zhang [Thu, 17 Jul 2014 18:14:37 +0000 (02:14 +0800)]
Add Gen IR IF, ELSE and ENDIF

Add Gen IR IF, ELSE and ENDIF to mark the strucutred region.

Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoremove lspci, gbe_bin_genenrater would generator llvm binary by default.
Luo [Tue, 8 Jul 2014 01:07:54 +0000 (09:07 +0800)]
remove lspci, gbe_bin_genenrater would generator llvm binary by default.

driver can get chipset id by ioctl instead of calling lspci in cmake;
user could generator gen binary by configuring cmake option
-DGEN_PCI_ID=xxxx or calling the gbe_bin_generater with option -t
GEN_PCI_ID.

v2: add "\0GENC" magic code for gen binary, fix typo.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Fix builtin tanpi.
Ruiling Song [Mon, 7 Jul 2014 05:59:24 +0000 (13:59 +0800)]
GBE: Fix builtin tanpi.

To meet precision requirement of OCL Spec .

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoBuild: Change versioning policy.
Zhigang Gong [Tue, 8 Jul 2014 06:34:57 +0000 (14:34 +0800)]
Build: Change versioning policy.

Don't set patch version number on master branch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRemove the generated test cases list.
Yi Sun [Fri, 4 Jul 2014 03:35:57 +0000 (11:35 +0800)]
Remove the generated test cases list.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoBuild: check whether lspci exists.
Zhigang Gong [Fri, 4 Jul 2014 02:04:33 +0000 (10:04 +0800)]
Build: check whether lspci exists.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoruntime: fix a gpgpu event and thread local gpgpu handling bug.
Zhigang Gong [Thu, 3 Jul 2014 03:33:10 +0000 (11:33 +0800)]
runtime: fix a gpgpu event and thread local gpgpu handling bug.

When pending a command queue, we need to record the whole gpgpu
structure not just the batch buffer. For the following reason:

1. We need to keep those private buffer, for example those printf buffers.
2. We need to make sure this gpgpu will not be reused by other enqueuement.

v2:
Don't try to flush all user event attached to the queue.
Just need to flush the current event when doing command queue flush.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoruntime: recover the maximum read image args to 128.
Zhigang Gong [Thu, 3 Jul 2014 04:52:17 +0000 (12:52 +0800)]
runtime: recover the maximum read image args  to 128.

To comply with the full profile.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
10 years agoRefine some event code.
Yang Rong [Thu, 3 Jul 2014 04:16:55 +0000 (12:16 +0800)]
Refine some event code.

1. Do not add user event to cb->wait_list to avoid ref this user event twice.
2. Add assert when update status.
3. Set the queue's last wait event and barrier event to NULL when remove last event.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Check family of spilled register correctly.
Ruiling Song [Wed, 2 Jul 2014 04:39:05 +0000 (12:39 +0800)]
GBE: Check family of spilled register correctly.

We only support DWORD QWORD register-spill currently.
So if we cannot spill a register, simply return false
instead of give an assert.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRefine the logic when suspend a batch buffer.
Yang Rong [Wed, 2 Jul 2014 06:29:24 +0000 (14:29 +0800)]
Refine the logic when suspend a batch buffer.

Clear the gpgpu's batch buffer when suspend to avoid potential issue.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix some event ref count error.
Yang Rong [Wed, 2 Jul 2014 05:42:52 +0000 (13:42 +0800)]
Fix some event ref count error.

Move the event add ref to function cl_event_new_enqueue_callback for clear.
Also need add the wait user events' ref count.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
10 years agoruntime: fix potential curbe allocation issue.
Zhigang Gong [Tue, 1 Jul 2014 04:50:36 +0000 (12:50 +0800)]
runtime: fix potential curbe allocation issue.

According to spec, different platforms have different curbe
allocation restrication. The previous code set the curbe
allocated size to 480 statically which is not correct.

This patch change to always set the curbe entry num to 64
which is the maximum work group size. And set proper curbe
allocation size according to the platform's hard limitation
and a relatively reasonable kernel argument usage limitation.

v3:
when we call load_vte_state, we already know the eaxctly constant urb
size used in the current kernel. We could choose a smallest valid curbe
size for this kernel. And if the size exceed the hardware limitation,
we report it as a warning here.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoruntime: fix max group size calculation issue.
Zhigang Gong [Sat, 28 Jun 2014 15:25:53 +0000 (23:25 +0800)]
runtime: fix max group size calculation issue.

If the kernel doesn't use slm/barrier, there is no hard limitation
for the max group size. And if the max work group size is more than
1024, the original 64 urb entry count will not be sufficient to hold
all the curbe payload. Change the entry count to max thread count to
fix this potential issue.

I found this bug when I tried to run phoronix test suite's juliagpu
test case on my MBA.

v2:
refine the max kernel work group size calculation mechanism.
the wg_sz should not be a device's member variable, it should be
a variable derived from kernel and device's attriute at runtime.
also fix wrong configuration for IVB GT1.

v3:
Add an important max thread limitation in the GPGPU_WALKER command.
For non-Baytrail, the max thread depth * max thread height * max thread width
should less than 64 (under either simd16 or simd8), no matter whether
SLM/barrier is used. We oversighted that limitation before, thus for
a simd8 kernel which use work group size 1024 will exceed this limitation
and half of the thread will not be executed at all.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoadd the usage of link program from llvm binary.
Luo [Fri, 27 Jun 2014 00:27:03 +0000 (08:27 +0800)]
add the usage of link program from llvm binary.

user A could compile and link kernel source to llvm binary first, then
query the binary to save to file; With the binary, user B can call
clCreateProgramWithBinary without compile the source again.
this usage could protect those who need to protect the kernel source.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: disable GVN pass when optLevel is zero.
Ruiling Song [Fri, 27 Jun 2014 06:17:57 +0000 (14:17 +0800)]
GBE: disable GVN pass when optLevel is zero.

GVN pass may generate some i256 data type, which our backend could not handle.
So, only enable it when optLevel > 0.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoBump to 0.9.1 (development version).
Zhigang Gong [Mon, 30 Jun 2014 03:48:36 +0000 (11:48 +0800)]
Bump to 0.9.1 (development version).

Bump to development version after 0.9.0.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoBump to version 0.9.
Zhigang Gong [Thu, 26 Jun 2014 05:23:23 +0000 (13:23 +0800)]
Bump to version 0.9.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoFix call cl_mem_copy_image_region bug.
Yang Rong [Thu, 26 Jun 2014 13:31:23 +0000 (21:31 +0800)]
Fix call cl_mem_copy_image_region bug.

When call cl_mem_copy_image_region, sometimes need add offset to src or dst address,
sometimes need not add. Add two parameter to indicate it.
Also fix the wrong offset when clEnqueueMapImage of CL_MEM_USE_HOST_PTR.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agodocs: fixup markup format.
Zhigang Gong [Thu, 26 Jun 2014 05:34:43 +0000 (13:34 +0800)]
docs: fixup markup format.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agodocs: fix some markdown links and correct some information.
Zhigang Gong [Thu, 26 Jun 2014 04:38:09 +0000 (12:38 +0800)]
docs: fix some markdown links and correct some information.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agodocs: update some documents.
Zhigang Gong [Thu, 26 Jun 2014 04:15:56 +0000 (12:15 +0800)]
docs: update some documents.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: fix some bugs in ocl stdlib header files.
Zhigang Gong [Thu, 26 Jun 2014 02:36:39 +0000 (10:36 +0800)]
GBE: fix some bugs in ocl stdlib header files.

The printf's prototype was added twice incorrectly.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agogbe_bin_generator: fix the incorrect type of cl_internal_built_in_kernel_str_size.
Zhigang Gong [Wed, 25 Jun 2014 15:21:54 +0000 (23:21 +0800)]
gbe_bin_generator: fix the incorrect type of cl_internal_built_in_kernel_str_size.

We should define it as size_t.

v2:
correct some extern definitions in cl_mem.c.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
10 years agoAdd optimization guide.
Yang Rong [Mon, 23 Jun 2014 16:28:51 +0000 (00:28 +0800)]
Add optimization guide.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: Remove 'Experiment' from the platform name.
Zhigang Gong [Wed, 25 Jun 2014 08:13:32 +0000 (16:13 +0800)]
runtime: Remove 'Experiment' from the platform name.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoadd how to for cross compiler
Guo Yejun [Wed, 18 Jun 2014 00:43:43 +0000 (08:43 +0800)]
add how to for cross compiler

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix clEnqueueMapImage with CL_MEM_USE_HOST_PTR bug.
Yang Rong [Wed, 25 Jun 2014 15:23:24 +0000 (23:23 +0800)]
Fix clEnqueueMapImage with CL_MEM_USE_HOST_PTR bug.

Should return host row pitch and host slice pitch.
Also should copy back to image when unmap.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd image_1d_to_1d builtin kernel name.
Luo [Wed, 25 Jun 2014 01:56:44 +0000 (09:56 +0800)]
add image_1d_to_1d builtin kernel name.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoutests: fix one bug when create image at one test case.
Zhigang Gong [Wed, 25 Jun 2014 05:53:14 +0000 (13:53 +0800)]
utests: fix one bug when create image at one test case.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoImplement the %p in the printf
Junyan He [Tue, 24 Jun 2014 15:33:20 +0000 (23:33 +0800)]
Implement the %p in the printf

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the support for vector type in printf.
Junyan He [Tue, 24 Jun 2014 08:35:58 +0000 (16:35 +0800)]
Add the support for vector type in printf.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Further optimize exp().
Ruiling Song [Tue, 24 Jun 2014 06:23:31 +0000 (14:23 +0800)]
GBE: Further optimize exp().

Use native_exp() as much as possible.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoadd cpu copy for 1Darray and 2darray related copy APIs.
Luo [Tue, 24 Jun 2014 02:09:12 +0000 (10:09 +0800)]
add cpu copy for 1Darray and 2darray related copy APIs.

detail cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray.

1d used gpu copy.

v2:
fixed 1d array to 1d array copy, don't need to switch depth and height.

Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoadd BEIGNET_INSTALL_DIR to clean code
Guo Yejun [Mon, 23 Jun 2014 22:22:07 +0000 (06:22 +0800)]
add BEIGNET_INSTALL_DIR to clean code

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoset LD_LIBRARY_PATH of libgbe.so for gbe_bin_generater
Guo Yejun [Mon, 23 Jun 2014 21:36:50 +0000 (05:36 +0800)]
set LD_LIBRARY_PATH of libgbe.so for gbe_bin_generater

it is needed for cross compiler

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoimplement API clEnqueueFillImage.
Luo [Sun, 22 Jun 2014 22:03:30 +0000 (06:03 +0800)]
implement API clEnqueueFillImage.

enqueues a command to fill an image object with a specified color.

fix typo cl_context_get_static_kernel_from_bin.

v2:
fix image 1d array bug.

Signed-off-by: Luo <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agofix crash when OCL_STRICT_CONFORMANCE is unset
Guo Yejun [Mon, 23 Jun 2014 20:14:21 +0000 (04:14 +0800)]
fix crash when OCL_STRICT_CONFORMANCE is unset

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the format and flag support for printf.
Junyan He [Mon, 23 Jun 2014 08:38:56 +0000 (16:38 +0800)]
Add the format and flag support for printf.

The format and flag such as -+# and precision request has
been added into the output.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoupdate docs on environment variables.
Ruiling Song [Thu, 19 Jun 2014 07:20:54 +0000 (15:20 +0800)]
update docs on environment variables.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: switch to non strict conformance mode by default.
Zhigang Gong [Mon, 23 Jun 2014 08:59:56 +0000 (16:59 +0800)]
GBE: switch to non strict conformance mode by default.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoutest_generator.py: add OCL_STRICT_CONFORMANCE enviroment condition.
Yi Sun [Mon, 23 Jun 2014 00:56:33 +0000 (08:56 +0800)]
utest_generator.py: add OCL_STRICT_CONFORMANCE enviroment condition.

For auto-generated math cases, when OCL_STRICT_CONFORMANCE is not set,
the expected diff increases to 1000x.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: declare correct prototype for fastpath_rootn
Ruiling Song [Mon, 23 Jun 2014 08:34:55 +0000 (16:34 +0800)]
GBE: declare correct prototype for fastpath_rootn

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: fix some builtin math function
Ruiling Song [Mon, 23 Jun 2014 08:34:54 +0000 (16:34 +0800)]
GBE: fix some builtin math function

__gen_ocl_exp stands for 2^x. So, use __gen_ocl_pow to implement native_exp().
Fix atanh implementation.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd some OpenCL1.2 parameters of function clGetDeviceInfo.
Yang Rong [Mon, 23 Jun 2014 14:38:36 +0000 (22:38 +0800)]
Add some OpenCL1.2 parameters of function clGetDeviceInfo.

Include CL_DEVICE_LINKER_AVAILABLE, CL_DEVICE_PRINTF_BUFFER_SIZE, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix a CL_MEM_HOST_PTR bug.
Yang Rong [Mon, 23 Jun 2014 14:38:35 +0000 (22:38 +0800)]
Fix a CL_MEM_HOST_PTR bug.

Can't add sub_offset if mem is image.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: replace OwningPtr with std::unique_ptr
Ruiling Song [Mon, 23 Jun 2014 06:39:26 +0000 (14:39 +0800)]
GBE: replace OwningPtr with std::unique_ptr

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve builtin exp.
Ruiling Song [Mon, 23 Jun 2014 02:33:17 +0000 (10:33 +0800)]
GBE: improve builtin exp.

Put some variables into register.
This could improve luxMark sala about 10% under strict conformance.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the test cases for 1D Image Array
Junyan He [Fri, 20 Jun 2014 10:07:40 +0000 (18:07 +0800)]
Add the test cases for 1D Image Array

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUpdate the printf test case.
Junyan He [Fri, 20 Jun 2014 09:41:31 +0000 (17:41 +0800)]
Update the printf test case.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd the support for %s in printf
Junyan He [Fri, 20 Jun 2014 09:41:26 +0000 (17:41 +0800)]
Add the support for %s in printf

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoFix a crash bug when no %d appears in the printf fmt
Junyan He [Fri, 20 Jun 2014 09:41:19 +0000 (17:41 +0800)]
Fix a crash bug when no %d appears in the printf fmt

If there no %d for all the printf statement, the curbe
will ignore the content buffer ptr because no one use it.
So when bind the buffer ptr in the run time, crash happens.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoAdd %f and %c support for printf.
Junyan He [Fri, 20 Jun 2014 09:41:13 +0000 (17:41 +0800)]
Add %f and %c support for printf.

Add the %c and %f support for printf.
Also add the int to float and int to char conversion.
Some minor errors such as wrong index flags have been fixed.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: fix some get kernel arg info bugs.
Zhigang Gong [Fri, 20 Jun 2014 11:09:35 +0000 (19:09 +0800)]
GBE: fix some get kernel arg info bugs.

Still can't handle the sampler_t which is not used actually.
Access qualifier seems broken with llvm 3.3.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoruntime: choose the actual EU number as the max compute units.
Zhigang Gong [Fri, 20 Jun 2014 10:07:23 +0000 (18:07 +0800)]
runtime: choose the actual EU number as the max compute units.

Use the EU number as compute unit make more sense.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
10 years agoGBE: Handle empty basicblock in Instruction selection
Ruiling Song [Fri, 20 Jun 2014 08:13:13 +0000 (16:13 +0800)]
GBE: Handle empty basicblock in Instruction selection

I meet a corner case which leads to empty bb.

Lable $12
add %3, %2, 1

and what's more %3 is not used anymore later, so we will not select
instruction for this line of code. Then only Label instruction left
in the bb, which leads to wrong endifLabel used. The fix simply
generate endif instruction if needed at first in matchBasicBlock().

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: tweak register expire frequency on simd16 mode.
Zhigang Gong [Tue, 17 Jun 2014 04:56:31 +0000 (12:56 +0800)]
GBE: tweak register expire frequency on simd16 mode.

According to Yongjia's test report, it's better to keep
the same freqency of expiration with both simd8 and simd16
mode.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: Yongjia Zhang <yongjia.zhang@intel.com>
10 years agoAdd some API's OpenCL 1.2 parameter support.
Yang Rong [Fri, 20 Jun 2014 16:15:44 +0000 (00:15 +0800)]
Add some API's OpenCL 1.2 parameter support.

Support CL_PROGRAM_KERNEL_NAMES and CL_PROGRAM_NUM_KERNELS in API clGetProgramInfo,
and CL_DOUBLE_FP_CONFIG in API clGetDeviceInfo.
Also fix a bug of CL_MEM_HOST_PTR in API clGetMemObjectInfo.

v2:
also fix the utest get_mem_info.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd some OpenCL1.2 new buffer flags handle.
Yang Rong [Fri, 20 Jun 2014 16:15:43 +0000 (00:15 +0800)]
Add some OpenCL1.2 new buffer flags handle.

And mem_base_addr_align' unit is bit, and origin's is byte, correct it when compare.

v2:
fix sub_buffer_check test case.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix sub buffer bug in clEnqueueReadBufferRect, clEnqueueWriteBufferRect, clEnqueueMap...
Yang Rong [Fri, 20 Jun 2014 16:15:42 +0000 (00:15 +0800)]
Fix sub buffer bug in clEnqueueReadBufferRect, clEnqueueWriteBufferRect, clEnqueueMapBuffer.

Should add sub_offset in these functions.

V2: clEnqueueMapBuffer's return ptr should not add sub offset. It will add sub offset in _cl_map_mem
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: fix image1d buffer allocation.
Zhigang Gong [Fri, 20 Jun 2014 07:45:34 +0000 (15:45 +0800)]
runtime: fix image1d buffer allocation.

Per bspec, a image should has a at least 2 line vertical alignment,
thus we can't simply attach a buffer to a 1d image surface which has the same size.
We have to create a new image, and copy the buffer data to this new image.
And replace all the buffer object's reference to this image.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoruntime: fix a slice pitch calculation bug.
Zhigang Gong [Fri, 20 Jun 2014 04:24:22 +0000 (12:24 +0800)]
runtime: fix a slice pitch calculation bug.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoutest: decrease the accuracy of tanpi.
Yi Sun [Fri, 20 Jun 2014 02:09:58 +0000 (10:09 +0800)]
utest: decrease the accuracy of tanpi.

Since some issue in tanpi, decrease the accuracy by 100 times.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoruntime: fix some get image info bugs.
Zhigang Gong [Thu, 19 Jun 2014 06:09:46 +0000 (14:09 +0800)]
runtime: fix some get image info bugs.

According to ocl spec:

Return height of the image in pixels. For a
1D image, 1D image buffer and 1D image
array object, height = 0.

Return depth of the image in pixels. For a
1D image, 1D image buffer, 2D image or
1D and 2D image array object, depth = 0.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoGBE/runtime: fixup broken 1d array image support.
Zhigang Gong [Wed, 18 Jun 2014 02:10:07 +0000 (10:10 +0800)]
GBE/runtime: fixup broken 1d array image support.

As sample LD message doesn't support array index, we have
to create a 2D array surface with the same buffer object.
Thus one 1D array image will have two surfaces binded to it
one is the index and the second is 128 + index.

And then at kernel side, we will access the corresponding
2D array surface when the LD message is required otherwise
will access the origin 1D array surface.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agocl/runtime: fixup 1D array image region and origins.
Zhigang Gong [Wed, 18 Jun 2014 06:53:06 +0000 (14:53 +0800)]
cl/runtime: fixup 1D array image region and origins.

As we treat 1D array image as a 2d array image with height 1
internally, we need to fixup region and origins passed in
from external APIs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agocl/driver: fix the incorrect handling of 1D array.
Zhigang Gong [Wed, 18 Jun 2014 02:01:15 +0000 (10:01 +0800)]
cl/driver: fix the incorrect handling of 1D array.

According to the bspec, the 1D array should be treated as a 3D like
surface which has height 1. So we need to make sure the depth is
the array_size. Thus the rt_view_extent's value should be always
the same as the depth.

According to the ocl spec, the 1D array firstly should be a 1D image rather
than a 2D image. Thus we should access different lines according to the
slice_pitch rather than the image_row_pitch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
10 years agoEnable the 1D and 2D image support in run time.
Junyan He [Tue, 17 Jun 2014 04:06:54 +0000 (12:06 +0800)]
Enable the 1D and 2D image support in run time.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd the image1d_array_t and image2d_array_t defines.
Junyan He [Tue, 17 Jun 2014 04:06:47 +0000 (12:06 +0800)]
Add the image1d_array_t and image2d_array_t defines.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd a lock in the place of printf output
Junyan He [Wed, 18 Jun 2014 06:42:15 +0000 (14:42 +0800)]
Add a lock in the place of printf output

If multi-thread run the kernel simultaneously, the output
may interlace with each other. Add a lock to avoid this.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRefine the code in llvm_printf_parser.cpp
Junyan He [Wed, 18 Jun 2014 06:42:07 +0000 (14:42 +0800)]
Refine the code in llvm_printf_parser.cpp

Fix some typo and use macro to simplify the code.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: pass compile against LLVM 3.5
Ruiling Song [Wed, 18 Jun 2014 07:09:44 +0000 (15:09 +0800)]
GBE: pass compile against LLVM 3.5

backward compatible with LLVM 3.3

merged a bug fix patch into this one.
  1. use_iterator point to 'Use' now instead of 'User'.
  2. all c-string are in constant address space now, which follows OCL Spec.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoFix an event status bug.
Yang Rong [Thu, 19 Jun 2014 14:37:42 +0000 (22:37 +0800)]
Fix an event status bug.

If event status is an Error code, the status of events wait on this event also should set to Error code.

V2: should not execute the enqueue command wait on the event whose status is error.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoTry to use drm render nodes.
Abrahm Scully [Thu, 19 Jun 2014 02:28:42 +0000 (22:28 -0400)]
Try to use drm render nodes.

Allows non-root user to run without X.
Works on Fedora 20 with render nodes enabled.

Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>