Zhigang Gong [Wed, 18 Jun 2014 02:10:07 +0000 (10:10 +0800)]
GBE/runtime: fixup broken 1d array image support.
As sample LD message doesn't support array index, we have
to create a 2D array surface with the same buffer object.
Thus one 1D array image will have two surfaces bound to it:
one at index and the second at 128 + index.
Then, at the kernel side, we access the corresponding
2D array surface when the LD message is required; otherwise
we access the original 1D array surface.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Zhigang Gong [Wed, 18 Jun 2014 06:53:06 +0000 (14:53 +0800)]
cl/runtime: fixup 1D array image region and origins.
As we treat 1D array image as a 2d array image with height 1
internally, we need to fixup region and origins passed in
from external APIs.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
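A minimal sketch of the fixup described above, not the actual runtime code. It assumes the OpenCL convention that, for a 1D image array, origin[1] is the array index and region[1] is the number of slices; the function name is hypothetical.

```python
def fixup_1d_array(origin, region):
    """Map 1D-array coordinates to the internal 2D-array layout (height 1)."""
    # External API: origin = (x, array_index, 0), region = (width, slices, 1).
    # Internal 2D array: the row component is fixed (origin 0, extent 1)
    # and the array index / slice count move to the third component.
    new_origin = (origin[0], 0, origin[1])
    new_region = (region[0], 1, region[1])
    return new_origin, new_region
```

For example, reading 3 slices starting at slice 2 becomes a read at depth offset 2 with depth 3 and height 1.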
Zhigang Gong [Wed, 18 Jun 2014 02:01:15 +0000 (10:01 +0800)]
cl/driver: fix the incorrect handling of 1D array.
According to the bspec, the 1D array should be treated as a 3D like
surface which has height 1. So we need to make sure the depth is
the array_size. Thus the rt_view_extent's value should be always
the same as the depth.
According to the ocl spec, the 1D array firstly should be a 1D image rather
than a 2D image. Thus we should access different lines according to the
slice_pitch rather than the image_row_pitch.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Junyan He [Tue, 17 Jun 2014 04:06:54 +0000 (12:06 +0800)]
Enable the 1D and 2D image support in run time.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 17 Jun 2014 04:06:47 +0000 (12:06 +0800)]
Add the image1d_array_t and image2d_array_t defines.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Wed, 18 Jun 2014 06:42:15 +0000 (14:42 +0800)]
Add a lock in the place of printf output
If multiple threads run the kernel simultaneously, their output
may interleave. Add a lock to avoid this.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
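The idea can be illustrated with a small sketch, assuming one global lock held while a kernel's whole printf output is written. The names below are hypothetical and this is Python, not the runtime's C code.

```python
import io
import threading

_print_lock = threading.Lock()  # hypothetical global lock guarding the output


def emit_kernel_output(lines, out):
    """Write all lines of one kernel run while holding the lock."""
    with _print_lock:
        for line in lines:
            out.write(line + "\n")


def run_demo(n_threads=4, n_lines=50):
    """Emit output from several threads and return the combined lines."""
    out = io.StringIO()
    threads = [
        threading.Thread(
            target=emit_kernel_output,
            args=([f"t{t}:{i}" for i in range(n_lines)], out),
        )
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out.getvalue().splitlines()
```

Because the lock covers the whole loop rather than a single write, each thread's lines stay contiguous in the output.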
Junyan He [Wed, 18 Jun 2014 06:42:07 +0000 (14:42 +0800)]
Refine the code in llvm_printf_parser.cpp
Fix some typos and use macros to simplify the code.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 18 Jun 2014 07:09:44 +0000 (15:09 +0800)]
GBE: pass compile against LLVM 3.5
backward compatible with LLVM 3.3
merged a bug fix patch into this one.
1. use_iterator points to 'Use' now instead of 'User'.
2. all C strings are in the constant address space now, which follows the OCL spec.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 19 Jun 2014 14:37:42 +0000 (22:37 +0800)]
Fix an event status bug.
If an event's status is an error code, the status of the events waiting on it should also be set to that error code.
V2: do not execute an enqueued command that waits on an event whose status is an error.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
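A rough sketch of this rule, assuming the OpenCL convention that CL_COMPLETE is 0 and error statuses are negative integers. The Event class and try_run helper are hypothetical illustrations, not the runtime's data structures.

```python
CL_COMPLETE = 0
CL_QUEUED = 3


class Event:
    def __init__(self, wait_list=()):
        self.status = CL_QUEUED
        self.wait_list = list(wait_list)
        self.executed = False


def try_run(event):
    """Run the command behind `event` unless a dependency failed or is pending."""
    for dep in event.wait_list:
        if dep.status < 0:             # negative status means an error code
            event.status = dep.status  # propagate the error to the waiter
            return False               # and do not execute its command
        if dep.status != CL_COMPLETE:
            return False               # dependency still in flight
    event.executed = True
    event.status = CL_COMPLETE
    return True
```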
Abrahm Scully [Thu, 19 Jun 2014 02:28:42 +0000 (22:28 -0400)]
Try to use drm render nodes.
Allows non-root user to run without X.
Works on Fedora 20 with render nodes enabled.
Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Abrahm Scully [Thu, 19 Jun 2014 02:28:08 +0000 (22:28 -0400)]
Fix build with mesa 10.1.
Mesa renamed some constants and a directory.
Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Abrahm Scully [Thu, 19 Jun 2014 02:26:53 +0000 (22:26 -0400)]
Fix linking to X11 libraries.
After FindXLib.cmake was removed, XLIB_LIBARY should have been
replaced with X11_LIBRARIES.
Signed-off-by: Abrahm Scully <abrahm.scully@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 18 Jun 2014 07:59:53 +0000 (15:59 +0800)]
GBE: Correctly process constant for phi instruction
Simply use getRegister which deals with various ConstantExpr.
Thanks to Abrahm Scully, who reported the bug.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Wed, 18 Jun 2014 00:17:34 +0000 (08:17 +0800)]
add binary type support for compiled object and library.
save the llvm bitcode to program->binary: insert a byte in front of the
bitcode that stands for the binary type (0 means GEN binary, 1 means COMPILED_OBJECT, 2 means LIBRARY);
load the binary to module by ParseIR.
create random directory to save compile header files.
use strncpy and strncat to replace strcpy and strcat.
v6: fix enqueue_copy_fill bug, use '\0' instead of 0 in the header.
v7 binary header format issue: fix test_load_program_from_bin bug of standalone kernel generated by gbe_bin_generater.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
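As an illustration only, the one-byte type tag could be sketched as below. The helper names are hypothetical, and the real header layout (see the v6/v7 notes above) may contain more than this single byte.

```python
GEN_BINARY, COMPILED_OBJECT, LIBRARY = 0, 1, 2


def pack_binary(binary_type, payload):
    """Prefix the payload (e.g. LLVM bitcode) with a one-byte type tag."""
    if binary_type not in (GEN_BINARY, COMPILED_OBJECT, LIBRARY):
        raise ValueError("unknown binary type")
    return bytes([binary_type]) + payload


def unpack_binary(blob):
    """Split a tagged blob back into (binary_type, payload)."""
    if not blob or blob[0] not in (GEN_BINARY, COMPILED_OBJECT, LIBRARY):
        raise ValueError("unknown binary type")
    return blob[0], blob[1:]
```

A loader can then dispatch on the tag before handing the remaining bytes to the bitcode parser or the Gen binary loader.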
Luo [Tue, 17 Jun 2014 02:59:05 +0000 (10:59 +0800)]
fix clEnqueueMarkerWithWaitList bug when input event is null.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 17 Jun 2014 03:16:55 +0000 (11:16 +0800)]
driver: fix a potential Null reference.
cl_gpgpu_flush may be called when the batch buffer has been
released. We need to check whether there is a valid buffer
before we really take the following actions.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Yang Rong [Mon, 16 Jun 2014 08:20:08 +0000 (16:20 +0800)]
Fix a clEnqueueBarrierWithWaitList event status bug.
Event's status should be CL_COMPLETE if all wait events are complete in the wait list, in function
clEnqueueBarrierWithWaitList and clEnqueueMarkerWithWaitList.
v2: revert delete the event change in v1.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 13 Jun 2014 09:50:31 +0000 (17:50 +0800)]
Bump beignet version to 0.8.99.
We are approaching the release of version 0.9, so we bump
the version to 0.8.99 now.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Fri, 13 Jun 2014 09:44:28 +0000 (17:44 +0800)]
Bump OpenCL version to 1.2.
Now all opencl 1.2 functions in the opencl 1.2 branch have been
merged into master branch. Let's bump master's ocl version to 1.2.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Fri, 13 Jun 2014 09:44:00 +0000 (17:44 +0800)]
utests: use OpenCL 1.2 API for image related test cases.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Guo Yejun [Thu, 12 Jun 2014 22:06:50 +0000 (06:06 +0800)]
use LLVM_INSTALL_DIR as the path to clang/llvm-as/llvm-link
I invented CMAKE_BINARY_PATH as the path to clang/llvm-as/llvm-link
in last patch, it is not elegant. Actually, LLVM_INSTALL_DIR is
already used in CMake file and is a better choice.
So, for cross compile case, cmake can find the binaries such as clang,
llvm-as, llvm-link and llvm-config with the help of LLVM_INSTALL_DIR.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Guo Yejun [Thu, 12 Jun 2014 18:14:10 +0000 (02:14 +0800)]
clean code to remove gbe_kernel_set_const_buffer_size
this function is no longer needed.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Luo [Thu, 5 Jun 2014 21:00:40 +0000 (05:00 +0800)]
add [opencl-1.2] clUnloadPlatformCompiler implementation
just an empty hook.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Wed, 11 Jun 2014 01:33:36 +0000 (09:33 +0800)]
Implement the clEnqueueMigrateMemObjects API
So far, we just support 1 device and no subdevices.
So all the command queues should belong to the small context.
There is no need to migrate the mem objects from one subcontext
to another by now. We just do the checks and fill the event.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 9 Jun 2014 10:37:46 +0000 (18:37 +0800)]
GBE: Enable some implemented Opencl 1.2 functions in icd table.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Junyan He [Fri, 13 Jun 2014 09:05:06 +0000 (17:05 +0800)]
Add the utest case for clGetKernelArgInfo
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 09:04:58 +0000 (17:04 +0800)]
Add the clGetKernelArgInfo api and misc help functions
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 09:04:49 +0000 (17:04 +0800)]
Add the llvm info to the function for later usage.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 09:04:39 +0000 (17:04 +0800)]
Add the -cl-kernel-arg-info into the clang build options
We always add -cl-kernel-arg-info to the options. This option only generates
the arg information for the backend; it has no other side effects and no
performance cost. So we just always add it here.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Fri, 13 Jun 2014 03:17:39 +0000 (11:17 +0800)]
add [opencl-1.2] test case runtime_compile_link.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Luo [Fri, 13 Jun 2014 03:17:38 +0000 (11:17 +0800)]
add [opencl-1.2] API clLinkProgram.
this API links a set of compiled program objects and libraries for all
the devices or a specific device(s) in the OpenCL context and creates
an executable.
the llvm bitcode in the compiled program objects are linked together and
built to Gen binary.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Conflicts:
src/cl_gbe_loader.h
Luo [Fri, 13 Jun 2014 03:17:37 +0000 (11:17 +0800)]
add [opencl-1.2] API clCompileProgram.
This API compiles a program's source for all the devices or a specific
device in the OpenCL context associated with program.
The pre-processor runs before the program sources are compiled.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Luo [Fri, 13 Jun 2014 03:17:36 +0000 (11:17 +0800)]
add [opencl-1.2] API clCreateSubDevice.
creates an array of sub-devices that each reference a non-intersecting
set of compute units within in_device, according to a partition scheme
given by properties.
Reviewed-by: He Junyan <junyan.he@inbox.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
Luo [Fri, 13 Jun 2014 03:17:35 +0000 (11:17 +0800)]
add test case runtime_barrier_list and runtime_marker_list.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
Conflicts:
utests/CMakeLists.txt
Luo [Fri, 13 Jun 2014 03:17:34 +0000 (11:17 +0800)]
add [opencl-1.2] API clEnqueueBarrierWithWaitList.
This command blocks command execution; that is, any commands
enqueued after it do not execute until it completes.
The clEnqueueMarkerWithWaitList patch that was pushed was not the latest
version, so it is updated in this patch.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
Conflicts:
src/cl_event.c
Junyan He [Fri, 13 Jun 2014 07:08:10 +0000 (15:08 +0800)]
utests: fix the image desc initialization for get_image_info.
As clCreateImage now performs more checks, we need to
set more elements to pass all the argument checks.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Fri, 13 Jun 2014 07:08:01 +0000 (15:08 +0800)]
Add the test case for 1D image from buffer
v2:
should not release the buffer, which is handled by the utest helper.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Fri, 13 Jun 2014 07:07:52 +0000 (15:07 +0800)]
Add the support for 1D image from buffer.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 07:07:44 +0000 (15:07 +0800)]
Add test cases for 1d image fill and copy
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 07:07:31 +0000 (15:07 +0800)]
Add the support for 1D image in backend
1. Delete the is3D member in instruction class. Because we need more
than 1 bit to represent 1D 2D and 3D. We now add an invalid register
in ir profile, and comparing the coords to it to judge the dimension.
2. Rename all the xxx_image to xxx_image2D to make its meaning clear.
3. Update the according Sampler and Typed_Write instruction in selection
and Gen IR generation.
v2:
fix the use of InvalidRegister. Use ir::ocl::invalid only.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Fri, 13 Jun 2014 07:07:10 +0000 (15:07 +0800)]
Add checks for clCreateImage and add 1d image creating logic
Add more checks for image creation according to the spec.
Update the corresponding image utest cases to pass them.
The 1d image creation logic is also added.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Fri, 13 Jun 2014 00:58:17 +0000 (08:58 +0800)]
add[opencl-1.2] test case for API clCreateProgramWithBuiltInKernels.
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Fri, 13 Jun 2014 00:58:16 +0000 (08:58 +0800)]
add [opencl-1.2] API clCreateProgramWithBuiltInKernels.
This API creates a built-in program object for a context, and loads the
built-in kernels into this program object.
v2:
fix the image base index handling issue.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Luo [Fri, 13 Jun 2014 00:58:15 +0000 (08:58 +0800)]
add [opencl 1.2] API clEnqueueMarkerWithWaitList.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Junyan He [Fri, 13 Jun 2014 05:30:49 +0000 (13:30 +0800)]
Add the test case for clEnqueueFillBuffer
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 13 Jun 2014 05:30:42 +0000 (13:30 +0800)]
Implement the clEnqueueFillBuffer API.
We use floatn assignment to do the copy.
The 128-byte pattern size corresponds to double16, and because
of the double problem on our platform, we use float16
to handle it.
Unaligned cases are not optimized yet; they just use char
assignment.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
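For reference, the semantics the fill kernels have to implement can be sketched on the host as below. This is a plain Python model with a hypothetical function name, assuming the OpenCL rule that offset and size are multiples of the pattern size; it does not reflect the floatn/char kernel split.

```python
def fill_buffer(buf, pattern, offset, size):
    """Fill buf[offset:offset+size] with repeated copies of pattern."""
    n = len(pattern)
    if n == 0 or offset % n or size % n:
        raise ValueError("offset and size must be multiples of the pattern size")
    if offset + size > len(buf):
        raise ValueError("fill range exceeds the buffer")
    for pos in range(offset, offset + size, n):
        buf[pos:pos + n] = pattern  # one pattern-sized store per step
```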
Junyan He [Fri, 13 Jun 2014 05:30:30 +0000 (13:30 +0800)]
Add the kernels used by clEnqueueBufferFill API
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 12 Jun 2014 06:31:00 +0000 (14:31 +0800)]
GBE: switch to ocl-1.2 header files.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Guo Yejun [Mon, 26 May 2014 22:13:12 +0000 (06:13 +0800)]
relax the build dependency on Gen GPU
currently, the Gen GPU pciid of the underlying system is queried
and then passed to gbe_bin_generater as the target option.
This does not work when building the driver on another system with
non-intel GPUs, this patch relaxes the dependency by exporting the
pciid setting at CMake level, therefore, the pciid could be given
as a CMake option besides the current real time query method.
This patch also removes the redundant code in utest/CMake by setting
PARENT_SCOPE in src/CMake.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yongjia Zhang [Mon, 23 Jun 2014 15:09:33 +0000 (23:09 +0800)]
Fix the same kernel name issue of OCL_OUTPUT_KERNEL_PERF
Now it treats kernels with same kernel name and different build
options separately. When OCL_OUTPUT_KERNEL_PERF==1, it outputs the
time summary as before, but if OCL_OUTPUT_KERNEL_PERF==2, it will
output the time details including the kernel build options and
kernels with same kernel name but different build options will
output separately.
v2: use strncmp and strncpy instead of strcmp and strcpy.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 12 Jun 2014 08:45:19 +0000 (16:45 +0800)]
utest: reduce group size to fit into baytrail platform.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Yang Rong [Thu, 12 Jun 2014 15:22:15 +0000 (23:22 +0800)]
HSW: Remove the jmpi distance limit of HSW.
Because the unit of HSW's jmpi distance is bytes, the distance in the JMPI instruction should
be S31, so remove the S16 restriction.
This fixes the luxmark failure when OCL_STRICT_CONFORMANCE=1.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Li, Peng <peng.li@intel.com>
Ruiling Song [Thu, 12 Jun 2014 07:11:52 +0000 (15:11 +0800)]
GBE: fix some bugs in 64bit bitcast.
1. set correct vstride when do int64 bitcast.
2. the condition to offset to next half should be (i%multiple) >= multiple/2.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 12 Jun 2014 11:42:12 +0000 (19:42 +0800)]
HSW: Fix potential issue of GT3 when calc stack address.
GT3 has 4 half slices, so we should shift left by 2 bits and also enlarge the stack buffer size;
otherwise, if thread generation is unbalanced, accesses may go out of bounds.
Per the bspec, the scratch size needs to be set to 2X the desired size.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 12 Jun 2014 11:04:27 +0000 (19:04 +0800)]
Handle the different timestamp count returned by drm_intel_reg_read.
On HSW and IVB, on an x86_64 system, the low 32 bits of the timestamp count are stored in the high 32 bits of the result
returned by drm_intel_reg_read, and bits 32-35 are lost; on an i386 system, the timestamp count matches the bspec.
This seems to be a kernel readq bug. So shift by 32 bits on x86_64, and keep only 32 bits of data on i386.
V2: Baytrail does not have this issue, but bits 32-35 need to be cleared.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
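One possible reading of the workaround, as a hedged sketch. The function name and parameters are hypothetical, and the right-shift direction is inferred from "the low 32 bits are stored in the high 32 bits of the result".

```python
def normalize_timestamp(raw, arch, is_baytrail=False):
    """Recover a timestamp count from the raw 64-bit register read."""
    if is_baytrail:
        # Assumption: the value is otherwise correct, only bits 32-35 are cleared.
        return raw & ~(0xF << 32)
    if arch == "x86_64":
        # Assumption: the low 32 bits of the count arrive in the high half.
        return raw >> 32
    # i386: keep only the low 32 bits.
    return raw & 0xFFFFFFFF
```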
Guo Yejun [Wed, 11 Jun 2014 18:38:22 +0000 (02:38 +0800)]
remove RTLD_DEEPBIND to avoid stdc++ issues
there are weird issues about stdc++ when dlopen .so file with flag
RTLD_DEEPBIND, remove the flag by renaming the function pointer names.
The new names in runtime begin with interp_*, meaning that they finally
go into libgbeinterp.so to interpret the meta data of binary kernel.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Guo Yejun [Tue, 10 Jun 2014 21:27:26 +0000 (05:27 +0800)]
fix utest simd_any for simd width 8 and 16
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 6 Jun 2014 07:34:18 +0000 (15:34 +0800)]
GBE: ignoring some debug related intrinsics.
We don't need to assert in the kernel if we meet some
debug related intrinsics. Just ignore them.
This patch makes beignet work well with Debug
mode clBLAS.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Wed, 11 Jun 2014 03:14:52 +0000 (11:14 +0800)]
GBE: output compact flag when output asm.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Mon, 9 Jun 2014 00:39:33 +0000 (08:39 +0800)]
fix issue when create cl image from libva with offset
to share data between libva and ocl (at drm level), it is acceptable
to create cl image from libva with offset (to drm object). Correct
the bo offset whose value will finally go to ss1.base_addr.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 10 Jun 2014 04:53:22 +0000 (12:53 +0800)]
Add the utest case for printf
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 10 Jun 2014 04:53:12 +0000 (12:53 +0800)]
Add the printf logic into the run time.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 10 Jun 2014 04:53:04 +0000 (12:53 +0800)]
Add the printfSet into the kernel Class and add misc helper functions
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 10 Jun 2014 04:52:54 +0000 (12:52 +0800)]
Add the PrintfParser llvm parser into the llvm backend.
The PrintfParser runs before the llvm gen backend.
It filters out all the printf function calls. When
a printf call is found, we analyse the print format
and the % placeholders here, and replace the printf call with
a STORE or CONV+STORE instruction if needed.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
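The format analysis step can be pictured with the following sketch. It is a simplified regular-expression scan written for illustration, not the LLVM pass itself. It ignores the OpenCL vector specifier, and the function name is hypothetical.

```python
import re

# "%%" is a literal percent sign; otherwise: flags, width, precision,
# an optional length modifier, and a conversion character.
_SPEC = re.compile(
    r"%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|hl|h|l)?[diouxXfFeEgGaAcsp])"
)


def split_format(fmt):
    """Return the conversion specifiers that each consume one printf argument."""
    return [m.group() for m in _SPEC.finditer(fmt) if m.group() != "%%"]
```

Each returned specifier corresponds to one argument whose value would have to be stored to the output buffer by the kernel.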
Junyan He [Tue, 10 Jun 2014 04:52:45 +0000 (12:52 +0800)]
Add the PrintfSet class into the ir
The PrintfSet will be used to collect all the printf information in
the kernel. After the kernel has executed, it will be used
to generate the corresponding printf output.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 10 Jun 2014 04:52:37 +0000 (12:52 +0800)]
Add two special registers for printf output buffer usage
printfiptr for printf index buffer pointer in curbe
and printfbptr for printf output buffer pointer in curbe.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 10 Jun 2014 02:45:56 +0000 (10:45 +0800)]
GBE: support SLM bool load and store.
The OCL spec does allow the use of a i1/BOOL SLM
variable, so we have to support the load and store of
it. To make things simple, I choose to use S16 to represent
i1 value.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Zhigang Gong [Mon, 9 Jun 2014 10:39:00 +0000 (18:39 +0800)]
GBE: increase batch size to relax the max reloc restriction.
The drm will restrict the max reloc to (batch size)/8.
Current batch buffer size is 8K, then the max reloc is 1024.
As the max workgroup size is 1024, if it uses simd16 channels
then the thread_n will be 1024/16 = 64. And if it needs to bind
32 buffers, then the reloc count will be 64*32, which is 2048
and exceeds the current limit. Let's increase the batch size to
16K to relax this restriction to 2048 relocs.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
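The arithmetic above can be checked with a few lines. The function names are only illustrative, and the (batch size)/8 rule is taken from the commit message.

```python
def max_relocs(batch_bytes):
    """Reloc limit imposed by drm: (batch size) / 8."""
    return batch_bytes // 8


def relocs_needed(workgroup_size, simd_width, buffers):
    """One reloc per bound buffer per hardware thread."""
    threads = workgroup_size // simd_width
    return threads * buffers
```

With an 8K batch the limit is 1024, below the 2048 relocs needed in the worst case; with a 16K batch the limit is exactly 2048.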
Luo [Fri, 6 Jun 2014 06:17:31 +0000 (14:17 +0800)]
remove the code of saving the llvm bitcode to file, replace it with llvm::Module
Save the global LLVMContext and module pointer to GenProgram, delete the
module pointer in the destructor.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Abrahm Scully [Fri, 6 Jun 2014 18:16:25 +0000 (14:16 -0400)]
Handle server IVB GT2.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@aim.com>
Ruiling Song [Mon, 9 Jun 2014 08:14:29 +0000 (16:14 +0800)]
GBE: Fix an assert on bitcast long to char8
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Mon, 9 Jun 2014 08:31:03 +0000 (16:31 +0800)]
Add some lost pci id into GetGenID.sh
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Jun 2014 15:29:50 +0000 (23:29 +0800)]
HSW: Restore L3 control register to disable SLM mode.
It seems the L3 control register is per context on IVB, but not per context
on HSW, so we need to restore the L3 control register; otherwise, it may cause screen flicker.
This patch may hurt performance somewhat.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Jun 2014 15:29:49 +0000 (23:29 +0800)]
HSW: enable the surface's cache in HSW.
HSW's surface cache control has changed; correct it. Also correct the scratch size calculation,
and disable the exec flag for SLM. When the kernel cmd parser is finished, it needs to be removed totally.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Jun 2014 15:29:48 +0000 (23:29 +0800)]
HSW: Set correct max work group size for GT2 and GT3.
v2: Return an error when can't fit work group to a single half slice.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 9 Jun 2014 15:29:47 +0000 (23:29 +0800)]
HSW: add data port 1 support in disassemble.
HSW adds a new data port; add support for it in the disassembler.
V2: separate HSW's and IVB's send msg function tables, so the deviceID needs to be passed to gen_disasm.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 6 Jun 2014 10:05:09 +0000 (18:05 +0800)]
GBE: fix one illegal instruction.
When the destination is a scalar and the execution width
is 1, we should use a scalar vec instead.
This patch fixes the following illegal instruction:
(38 ) mov(1) g124.3<1>:F acc0<8,8,1>:F
to the correct one:
(38 ) mov(1) g124.3<1>:F acc0<0,1,0>:F
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 6 Jun 2014 06:57:18 +0000 (14:57 +0800)]
GBE: Fix a jump issue in int64 to float conversion
The inactive lanes should use 32, so a later jump can jump
as desired.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 6 Jun 2014 06:26:48 +0000 (14:26 +0800)]
GBE: fix a bug in int64 to float conversion.
When copying those pure 32-bit ints to the float destination, we
should enable the mask. Otherwise, we may destroy the
value in inactive lanes.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Mon, 9 Jun 2014 07:04:46 +0000 (15:04 +0800)]
GBE: fix a typo in utests.
sub_bufffer_check ==> sub_buffer_check.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Fri, 6 Jun 2014 02:37:19 +0000 (10:37 +0800)]
utests: add a double precision check test case.
v2:
fix some bugs in test case.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Fri, 30 May 2014 10:19:04 +0000 (18:19 +0800)]
GBE: Add support double to float conversion.
Previously, double to float conversion incorrectly went down the
int64 to float code path, and gen_encoder had no real
double to float conversion support.
This patch fixes the above issues.
v2:
fix some bug on HSW platform.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Thu, 5 Jun 2014 08:16:10 +0000 (16:16 +0800)]
GBE: optimize a special case of convert INT64 to float.
We found the following instruction sequence is common
in luxmark:
CVT.int64.uin32 %75 %74
LOADI.int64 %537
16777215
AND.int64 %76 %75 %537
CVT.float.uin64 %77 %76
Actually, the immediate value is a pure 32 bit value,
and the %74 is also a uint32 bit value. The AND instruction
will not touch the high 32 bit as well. So we can simply optimize
the above instruction series to the following:
AND.uint32 %tmp %74
16777215
MOV.float %77 %tmp
This way, it will finally save about 55 instructions for each
of the above cases. This patch could bring about 8% performance
gain with sala scene in luxmark.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
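Why the rewrite is safe can be seen from a small model of the two sequences. This is a Python illustration with hypothetical helper names; float32 rounding is emulated with struct, and the 32-bit source is assumed to be zero-extended as in the IR above.

```python
import struct

MASK = 16777215  # 0xFFFFFF, fits in 24 bits


def to_f32(x):
    """Round an integer to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def slow_path(v32):
    wide = v32 & 0xFFFFFFFF   # convert uint32 to int64 (zero-extend)
    masked = wide & MASK      # AND.int64 with the immediate
    return to_f32(masked)     # 64-bit integer to float conversion


def fast_path(v32):
    tmp = (v32 & 0xFFFFFFFF) & MASK  # AND.uint32, no 64-bit arithmetic
    return to_f32(tmp)               # 32-bit integer to float
```

Since the masked value never exceeds 24 bits, the high 32 bits are always zero and both paths produce the same exactly representable float.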
Li Peng [Wed, 4 Jun 2014 06:21:44 +0000 (14:21 +0800)]
add DRM_LIBDIR path into link directory list
Then beignet can link to user preferred drm library rather than default
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 29 May 2014 16:37:30 +0000 (00:37 +0800)]
HSW: Fix a compact assert.
Also use const static int instead of const int to avoid build error
in some gcc.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 3 Jun 2014 05:53:15 +0000 (13:53 +0800)]
GBE: Optimize phi elimination
During phi elimination, we simply insert 3 MOVs for one phi instruction
to avoid the lost copy issue. But in fact, only two of them are needed
most of the time. This patch tries to see whether the move from phiCopy
to phi can be avoided.
The patch basically checks whether the phiCopy and phi have live range
interference. If no, then they can be coalesced, thus one instruction
can be optimized.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
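The interference check can be sketched as follows, under the simplifying assumption that each live range is a single half-open interval of instruction indices. The real analysis works on per-block liveness information, and the names here are hypothetical.

```python
def interferes(a, b):
    """True if two half-open live intervals (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def movs_needed(phi_range, phi_copy_range):
    """Two MOVs suffice when phi and phiCopy can be coalesced, else three."""
    return 3 if interferes(phi_range, phi_copy_range) else 2
```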
Ruiling Song [Tue, 3 Jun 2014 05:53:14 +0000 (13:53 +0800)]
Revert "GBE: No need to compute liveout again in value.cpp."
We need to transfer ValueDef from predecessors to their successors.
Consider a register defined in BB0, and used in BB3. we need to
iterate over liveout to pass the def in BB0 to BB3, so the use
in BB3 could get that correct def. Otherwise, the UD/DU graph is incomplete.
This reverts commit
89b490b5a17cfda2d9816dc1c246ce5bbff12648.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Mon, 2 Jun 2014 18:13:54 +0000 (02:13 +0800)]
refine code for the usage of set_image_base_index
In libgbe.so and libgbeinterp.so, the same function pointer name
gbe_set_image_base_index is used for a unified source code.
In libcl.so, function pointer names begin with compiler_* point to
the functions from libgbe.so, function pointer names begin with
gbe_* point to the functions from libgbeinterp.so.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 30 May 2014 08:22:30 +0000 (16:22 +0800)]
GBE: Fix bitcast between long and other type.
As we store long low/high 32bits separately, when we do bitcast
like int64 --> int16, the horizontal stride of the int64's low/high
half should be set as 2 instead of 4.
This fixes a regression in the opencv test:
Imgproc/Threshold.Mat/40, where GetParam() = (16SC1, 0, 0, false)
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Fri, 30 May 2014 03:22:33 +0000 (11:22 +0800)]
Make utest pass rate reach 100%.
1. Add more input values
2. remove case pow(0,0)
3. remove negative value tests in powr && pown
Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Yangweix Shui <yangweix.shui@intel.com>
Yi Sun [Fri, 30 May 2014 03:22:32 +0000 (11:22 +0800)]
Refine some test for math function
1. nextafter: we originally used nextafter to compute the CPU reference result. Its return value is double, so changed it to nextafterf.
2. sinpi: add a check to reduce the input data range from [-2pi,2pi] to [-pi,pi]
3. cospi: define cospi function.
4. tanpi: define tanpi function by using sinpi/cospi.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: YangweiX Shui <yangweix.shui@intel.com>
Junyan He [Fri, 30 May 2014 06:28:30 +0000 (14:28 +0800)]
Refine the cl thread implementation for queue.
Because the cl_command_queue can be used in several threads simultaneously
without adding a ref to it, we now handle it like this:
Keep one threads_slot_array; every time a thread gets the gpgpu or batch buffer, if it
does not have a slot, assign it one.
The resources are kept in the queue's private data, which is resized if needed.
When the thread exits, the slot will be set invalid.
When the queue is released, all the resources will be released. If the user still enqueues to, flushes
or finishes the queue after it has been released, the behavior is undefined.
TODO: Need to shrink the slot map.
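The slot scheme described above can be sketched roughly as follows. This is a hypothetical Python model, not the runtime's C code: the class name, method names, and the use of a dict plus a list are all assumptions made for illustration.

```python
import threading

class QueueThreadSlots:
    """Per-queue table mapping each calling thread to a resource slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slot_of = {}    # thread ident -> slot index
        self._resources = []  # per-slot resources; None marks an invalid slot

    def get(self, make_resource):
        """Return (slot, resource) for the calling thread, assigning a slot on first use."""
        tid = threading.get_ident()
        with self._lock:
            slot = self._slot_of.get(tid)
            if slot is None:
                slot = len(self._resources)  # grow the table when needed
                self._resources.append(make_resource())
                self._slot_of[tid] = slot
            return slot, self._resources[slot]

    def thread_exit(self):
        """Mark the calling thread's slot invalid; the table is not shrunk."""
        tid = threading.get_ident()
        with self._lock:
            slot = self._slot_of.pop(tid, None)
            if slot is not None:
                self._resources[slot] = None

    def release_all(self):
        """Drop every resource when the queue itself is released."""
        with self._lock:
            self._slot_of.clear()
            self._resources.clear()
```

As in the message, the table only grows while the queue is alive; shrinking it is left open, matching the TODO.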
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Li Peng [Mon, 26 May 2014 11:25:59 +0000 (19:25 +0800)]
Fix timestamp on HASWELL
The GPU timestamp should be the lower 36 bits on HASWELL
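A minimal sketch of keeping only the lower 36 bits of a raw counter value. The constant and function names are hypothetical; the driver itself does this in C.

```python
HSW_TIMESTAMP_BITS = 36
HSW_TIMESTAMP_MASK = (1 << HSW_TIMESTAMP_BITS) - 1

def hsw_timestamp(raw: int) -> int:
    """Keep only the lower 36 bits of a raw 64-bit timestamp value."""
    return raw & HSW_TIMESTAMP_MASK
```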
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Guo Yejun [Thu, 29 May 2014 22:29:09 +0000 (06:29 +0800)]
extract libgbeinterp.so from runtime (libcl.so)
Currently, the same symbol names exist in libinterp.a (inside
libcl.so) and libgbe.so (compiler), so we have to dlopen libgbe.so
with RTLD_DEEPBIND, and this flag makes std::cerr inside libgbe crash.
Extract the interp part from libcl.so as libgbeinterp.so; then
first dlopen libgbe.so without RTLD_DEEPBIND, and then dlopen libgbeinterp.so
with RTLD_DEEPBIND, to fix the std::cerr crash issue.
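The load order can be illustrated with Python's ctypes on Linux. This is a sketch under assumptions: the bare library names stand in for whatever paths the runtime really resolves, the use of RTLD_NOW is an assumption, and the runtime itself calls dlopen from C.

```python
import ctypes
import os

# RTLD_DEEPBIND is glibc-specific; fall back to 0 where it is unavailable.
DEEPBIND = getattr(os, "RTLD_DEEPBIND", 0)

# Order matters: the compiler library first, without DEEPBIND,
# then the interpreter library with DEEPBIND.
LOAD_PLAN = [
    ("libgbe.so", os.RTLD_NOW),
    ("libgbeinterp.so", os.RTLD_NOW | DEEPBIND),
]

def load_all(plan=LOAD_PLAN):
    """dlopen each library in order with its flags; raises OSError if one is missing."""
    return [ctypes.CDLL(name, mode=mode) for name, mode in plan]
```

Calling load_all() only succeeds on a system where both libraries are installed and findable by the dynamic loader.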
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 29 May 2014 04:27:02 +0000 (12:27 +0800)]
GBE: fix one illegal instruction when replacing a uniform dst.
When the dst is a uniform value and we replace it with a vector value,
copying the vector value back may generate an illegal instruction, as below
at address 18:
(14 ) mov(16) g124<1>:F g127.7<0,1,0>:F { align1 WE_all 1H };
(16 ) send(16) g122<1>:UW g124<8,8,1>:UD
data (bti: 1, rgba: 14, SIMD16, legacy, Untyped Surface Read) mlen 2 rlen 2 { align1 WE_all 1H };
(18 ) mov(1) g127.6<1>:F g122<8,8,1>:F { align1 WE_all };
This patch fixes the issue and generates the correct instruction as below:
( 14) mov(16) g124<1>:UD g127.7<0,1,0>:UD { align1 WE_all 1H };
( 16) send(16) g122<1>:UW g124<8,8,1>:UD
data (bti: 1, rgba: 14, SIMD16, legacy, Untyped Surface Read) mlen 2 rlen 2 { align1 WE_all 1H };
( 18) mov(1) g127.6<1>:UD g122<0,1,0>:UD { align1 WE_all };
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 29 May 2014 02:29:35 +0000 (10:29 +0800)]
utests: disable double test case.
As we cannot provide full double support for now,
and my patch to refine long support breaks double load/store,
we disable all double test cases.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 29 May 2014 02:29:34 +0000 (10:29 +0800)]
GBE: Pass correct register type when replaceReg
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Thu, 29 May 2014 02:29:33 +0000 (10:29 +0800)]
GBE: Change 64bit integer storage in register
Previously, we stored the low/high halves of a 64bit value together, which needed several
32bit instructions to do one 64bit instruction. Now we simply change its
storage in register: the low 32 bits of all lanes are stored together, followed by the
high 32 bits of all lanes. This makes long support cleaner and needs fewer
32bit instructions.
v2:
fix a typo when getRegAtrrib().
Refine SelectionVector alignment.
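The two storage layouts described above can be modelled as lists of 32-bit words. This is an illustrative sketch only; the function names are hypothetical and real registers are of course not Python lists.

```python
MASK32 = 0xFFFFFFFF

def pack_interleaved(values):
    """Old layout: each lane's low and high 32 bits are adjacent."""
    out = []
    for v in values:
        out += [v & MASK32, (v >> 32) & MASK32]
    return out

def pack_split(values):
    """New layout: low halves of all lanes first, then all high halves."""
    lows = [v & MASK32 for v in values]
    highs = [(v >> 32) & MASK32 for v in values]
    return lows + highs
```

With the split layout, a 32-bit instruction can operate on all low halves (or all high halves) as one contiguous block.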
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 29 May 2014 01:26:39 +0000 (09:26 +0800)]
GBE: optimize scalar data type conversion.
If the dst is scalar, the register region restriction is relaxed.
We can save one instruction as below:
(12 ) mov.sat(1) g127.24<4>:B g1.3<0,1,0>:D { align1 WE_all };
(14 ) mov(1) g127.28<1>:B g127.24<0,1,4>:D { align1 WE_all };
Optimized to:
(12 ) mov.sat(1) g128.28<4>:B g1.3<0,1,0>:D { align1 WE_all };
No need to create a temporary register g127.24.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 29 May 2014 01:26:24 +0000 (09:26 +0800)]
GBE: fix uniform/scalar related bugs.
One major fix is that even if a register is a scalar, when
we move a scalar Dword to a scalar Byte, we have to set
the hstride to 4; otherwise, it breaks the following
register restriction:
B. When the Execution Data Type is wider than the destination data type,
the destination must be aligned as required by the wider execution data
type and specify a HorzStride equal to the ratio in sizes of the two data
types. For example, a mov with a D source and B destination must use a
4-byte aligned destination and a Dst.HorzStride of 4.
The following instruction may not take effect.
mov.sat(1) g127.4<1>:B g126<0,1,0>:D
We have to change it to
mov.sat(1) g127.4<4>:B g126<0,1,0>:D
v2: keep the instruction selection stage unchanged; we fix this restriction
in setDst only.
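The quoted restriction boils down to a size ratio. Below is a small sketch of that rule; the function name and type table are hypothetical and this is not the actual setDst code. Returning 1 when the execution type is not wider is an assumption made for the sketch.

```python
# Sizes in bytes of the register types used in the examples above.
TYPE_BYTES = {"B": 1, "W": 2, "D": 4}

def dst_hstride(exec_type: str, dst_type: str) -> int:
    """Destination HorzStride required when the execution type is wider than the dst type."""
    exec_size = TYPE_BYTES[exec_type]
    dst_size = TYPE_BYTES[dst_type]
    if exec_size > dst_size:
        return exec_size // dst_size  # ratio of the two type sizes
    return 1  # assumption: packed destination otherwise
```

For a mov with a D source and a B destination this gives 4, matching the `<4>` in the corrected instruction.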
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>