Yang Rong [Thu, 28 Aug 2014 06:37:44 +0000 (14:37 +0800)]
Only compiler X11 files and do X11 operations when found X11.
Add a build flag HAS_X11 for it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 26 Aug 2014 07:39:24 +0000 (15:39 +0800)]
GBE: refine the llvm multi-thread related code.
LLVM 3.5 remove llvm_start/stop_multithreaded() API, instead multi-thread
support is determined when build llvm(build option LLVM_ENABLE_THREADS).
llvm_is_multithreaded() is used to check whether llvm is built with
muti-thread support.
If multi-thread is not support(LLVM3.3/3.4 or 3.5 built with LLVM_ENABLE_THREADS off),
we simply add a mutex when calling clang/llvm related API.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 26 Aug 2014 07:39:11 +0000 (15:39 +0800)]
GBE: clear deadprintfs when current function is done.
It should be cleared, to prevent invalid pointers staying there
when processing next Function.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Luo Xionghu [Tue, 26 Aug 2014 02:12:28 +0000 (10:12 +0800)]
fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.
This regression is caused by structural analysis when check the if-then
node, acturally there are four types of if-then node according to the
topology and fallthrough information. fallthrough check is added in this
patch.
v2: add inversePredicate member and function for BranchInstruction;
print the exact meanning of IF instruction in GEN_IR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lv Meng [Fri, 15 Aug 2014 01:16:33 +0000 (09:16 +0800)]
Fix compile warnings for CLANG compiler
1.fix data structure redefine warnings.
2.fix 'data' with variable sized type 'union<*>' not at the end of a class warning(in immediate.hpp).
3.fix implicitly conversion warning.
4.fix explicitly assigning a variable type warning.
5.fix comparison of unsigned expression < 0 is always false warning(in cl_api.c).
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Lv Meng [Thu, 14 Aug 2014 03:33:06 +0000 (11:33 +0800)]
Fix compile warnings for ICC compiler
1.the "const" associated functions' modification is to fix "type qualifier on return type is meaningless" for ICC compile warning.
2.the "operator new" shoud have the corresponding "operator delete" function.
3.In C++0x std::auto_ptr will be deprecated in favor of std::unique_ptr.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Ruiling Song [Wed, 13 Aug 2014 01:53:33 +0000 (09:53 +0800)]
cmake: Fix a license issue.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Lv Meng [Fri, 8 Aug 2014 08:10:03 +0000 (16:10 +0800)]
Fix compile errors for CLANG compiler
Use vector to fix "variable length array of non-POD element type" compiler error.
The /beignet/backend/src/./ir/context.hpp "fn->immediates[index] = imm" would call a private func
'operator=' which would trigger error, and it is not being used.
The undefined reference to `check_copy_overlap' would occur in the following calling.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Lv Meng [Fri, 8 Aug 2014 08:08:16 +0000 (16:08 +0800)]
Fix compile error for ICC compiler
fix the pthread_mutex_t undefine compile error and some undefined error would occur when
using math.h in C++ file.for C++ file,it is better using cmath instead off math.h.
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Ruiling Song [Mon, 11 Aug 2014 05:58:26 +0000 (13:58 +0800)]
GBE: Fix a bug in gatherBTI.
The needNewBTI is a state that only valid for the current candidate.
So need to reset to default value for each candidate.
This fix the regression in opencv 3.0:
./opencv_perf_objdetect OCL_Cascade_Image_MinSize_CascadeClassifier.CascadeClassifier
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Ruiling Song [Mon, 11 Aug 2014 05:49:01 +0000 (13:49 +0800)]
GBE: initialize BTI structure to zero.
Clear to zero to avoid garbage data, as we do not
assign it later for local/constant memory access.
v2:
move initialization code into constructor.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Ruiling Song [Mon, 11 Aug 2014 05:48:49 +0000 (13:48 +0800)]
GBE: Fix type size for vector3
According to OCL spec, size of vector3 are aligned to vector4.
And for memory load/store, clang already aligned it to vector4.
If we do not calculate private/local memory size as vector4,
out of range memory access will appear.
This can fix the failure of opencv 3.0 case:
OCL_Arithm/MeanStdDev.Mat_Mask
v2:
vec3 constant data should be aligned to vec4.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Rebecca Palmer [Fri, 8 Aug 2014 11:07:24 +0000 (12:07 +0100)]
Fail gracefully on unsupported hardware
If no compatible hardware is present, clGetDeviceIDs is supposed to
report CL_DEVICE_NOT_FOUND to the caller, but in Beignet this currently
ends the whole program with exit(-1) or assert(0). This fixes this.
This is required to have a "just works" OpenCL in Debian/Ubuntu, as
their package manager doesn't know the hardware and hence commonly will
install Beignet on hardware that doesn't support it; returning an error
allows the caller to try other ICDs until it finds the right one, or to
run without using OpenCL. Previous discussion:
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-
20140217/000096.html
http://lists.alioth.debian.org/pipermail/pkg-opencl-devel/Week-of-Mon-
20140217/000100.html
Testing if you only have supported hardware: use a chroot, the GPU
isn't visible from inside.
Identical patch in case line wrap mangles this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=27;filename=fail_gracefully_without_hardware;att=1;bug=745363
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Ruiling Song [Mon, 11 Aug 2014 02:15:14 +0000 (10:15 +0800)]
GBE: Fix a warning in getConstantPointerRegister.
compiler complains "warning: control reaches end of non-void function"
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Luo Xionghu [Wed, 6 Aug 2014 01:36:31 +0000 (09:36 +0800)]
fix the relational built-in vector function regression.
the relational vector function need return -1 instead of 1 according to
the spec.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Yang Rong [Mon, 4 Aug 2014 07:04:16 +0000 (15:04 +0800)]
Fix a utest compiler_async_stride_copy typo.
And need to convert to char when compare.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
LuoXionghu [Fri, 1 Aug 2014 01:40:09 +0000 (09:40 +0800)]
improve the build performance of vector type built-in function.
expand the gentypen with loop to reduce the redundant inline for more
than 4 components type.
v2: add the greater than 4 componets conditon to avoid performace
degration.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Ruiling Song [Thu, 31 Jul 2014 08:48:01 +0000 (16:48 +0800)]
GBE: remove some useless code for getting printf buffer address.
This is not used anymore.
Also fix an annoying warning.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Zhigang Gong [Tue, 29 Jul 2014 04:55:44 +0000 (12:55 +0800)]
GBE: adjust preferred vector length.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 29 Jul 2014 04:36:42 +0000 (12:36 +0800)]
GBE: Reduce random behaviour of the code generation
There are two major types of random behviour source. One is the
register spill tick. Now we fix it to increase from 0 for
each new code generation.
The second random source is the register sorting. When two
register has the same startID or endID, their sorting order
is not determined and maybe random in different machine. This
patch mitigate this random source by introduce another comparison
if the main key is identical.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 24 Jul 2014 05:13:32 +0000 (13:13 +0800)]
utest: add new test for constant expression processing.
If we use 3-component vector in a union, it may introduce
some complex constant expression as below:
float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32
1065353216, i32
1073741824, i32
1077936128, i32 undef> to i128) to i32) to float).
To test the constant expression processing function.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 23 Jul 2014 09:02:52 +0000 (17:02 +0800)]
GBE: enable constant expression processing.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 22 Jul 2014 10:36:28 +0000 (18:36 +0800)]
GBE: complete constant expression processing.
The target is to process all possible complex nested constant expression as below:
const = type0 OP0 (const0)
const0 = type1 OP1 (const1, const2)
const1 = ...
The supported OPs are as below:
BITCAST,
ADD,
SUB,
MUL,
DIV,
REM,
SHL,
ASHR,
LSHR,
AND,
OR,
XOR
We also add support for array/vector type of immediate. Some possible examples are as below:
float bitcast (i32 trunc (i128 bitcast (<4 x i32> <i32
1064178811, i32
1064346583, i32
1062836634, i32 undef> to i128) to i32) to float)
float bitcast (i32 trunc (i128 lshr (i128 bitcast (<4 x i32> <i32
1064178811, i32
1064346583, i32
1062836634, i32 undef> to i128), i128 32) to i32) to float)
v2:
separate all private method implementations to immediate.cpp.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 22 Jul 2014 09:01:06 +0000 (17:01 +0800)]
GBE: simplify processConstant.
Preparation to support generic constant expression.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 22 Jul 2014 07:56:08 +0000 (15:56 +0800)]
GBE: refactor the immediate class to support vector data type.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 30 Jul 2014 07:49:29 +0000 (15:49 +0800)]
GBE: refine post register allocation scheduling for global buffers.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Zhigang Gong [Wed, 30 Jul 2014 07:36:01 +0000 (15:36 +0800)]
GBE: cleanup image base index related code.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
Ruiling Song [Wed, 30 Jul 2014 05:59:30 +0000 (13:59 +0800)]
GBE: Handle bti allocation for internal buffer used by printf.
1. Move the bti/Register map from gbe::Context to ir::Function.
2. use GlobalVariable instead of 'call' to get internal buffer (used for printf) base address.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 30 Jul 2014 05:59:29 +0000 (13:59 +0800)]
GBE: Refine bti usage in backend & runtime.
Previously, we simply map 2G surface for memory access,
which has obvious security issue, user can easily read/write graphics
memory that does not belong to him. To prevent such kind of behaviour,
We bind each surface to a dedicated bti. HW provides automatic
bounds check. For out-of-bound write, it will be ignored. And for read
out-of-bound, hardware will simply return zero value.
The idea behind the patch is for a load/store instruction, it will search
through the LLVM use-def chain until finding out where the address
comes from. Then the bti is saved in ir::Instruction and used for
the later code generation. And for mixed pointer case, a load/store
will access more than one bti.
To simplify some code, '0' is reserved for constant address space,
'1' is reserved for private address space. Other btis are assigned
automatically by backend.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 29 Jul 2014 07:41:38 +0000 (15:41 +0800)]
runtime: set correct state for constant buffer on hsw.
According to spec, should set I965_SURCHAN_SELECT_XXX on hsw.
Then we can use sampler message to read constant surface.
This fix the regression in unit test brought by:
'GBE: Optimize constant load with sampler.'
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 28 Jul 2014 01:19:30 +0000 (09:19 +0800)]
utests: Fix a bug in image_1D_buffer.
Should use buffer_sz to clCreateBuffer().
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 28 Jul 2014 01:19:29 +0000 (09:19 +0800)]
GBE: align the fields in union ImageInfoKey.
To avoid possible garbage data.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Thu, 24 Jul 2014 22:00:27 +0000 (06:00 +0800)]
delete GEPInst when it is no longer used
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Thu, 17 Jul 2014 23:16:34 +0000 (07:16 +0800)]
clean llvm resource in compiler (libgbe.so)
since we have separated the compiler (libgbe.so) and the interpreter
(libgbeinterp.so), the LLVM resource cleanup task should be done in
the compiler instead of the GenProgram::~GenProgram which has no way
to clean llvm resources in libgbeinterp.so
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Guo Yejun [Wed, 23 Jul 2014 23:51:18 +0000 (07:51 +0800)]
fix three memory leaks
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Guo Yejun [Thu, 17 Jul 2014 18:38:33 +0000 (02:38 +0800)]
free build_log when the cl program is released
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 17 Jul 2014 02:37:09 +0000 (10:37 +0800)]
NEWS: update for 0.9.2.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 17 Jul 2014 02:14:12 +0000 (10:14 +0800)]
docs: add a NEWS document to point to the release notes pages.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Guo Yejun [Wed, 16 Jul 2014 17:26:23 +0000 (01:26 +0800)]
remove requirment as drm master in non-x environment
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lv Meng [Wed, 16 Jul 2014 07:38:48 +0000 (15:38 +0800)]
improve the clEnqueueCopyBufferRect performance in some cases
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
LuoXionghu [Wed, 16 Jul 2014 01:31:06 +0000 (09:31 +0800)]
add utest load_program_from_gen_bin.
this test case would check whether genProgramSerializeToBinary in
backend can generator gen binary correctly.
rename load_program_from_bin to load_program_from_bin_file.
the difference is load_program_from_bin_file could either load program
from llvm binary or gen binary file generated by gbe_bin_generator.
Signed-off-by: LuoXionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
LuoXionghu [Wed, 16 Jul 2014 01:31:05 +0000 (09:31 +0800)]
add platform info in the gen binary code.
the size of the platform info is 3 bytes, right after the '/0GENC'.
check the header magic number and platform info before deserializeFromBin.
v2: supports IVB/BYT/HSW binary on its' platform, and BYT binary runs on IVB.
v3: fix 'BYT' overwritten by 'IVB';
Signed-off-by: LuoXionghu <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 14 Jul 2014 09:24:38 +0000 (17:24 +0800)]
GBE: Optimize constant load with sampler.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 14 Jul 2014 09:24:37 +0000 (17:24 +0800)]
GBE: Use varying register to save one instruction
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 11 Jul 2014 15:36:39 +0000 (23:36 +0800)]
gbe: add the new else instruction to the assert checking.
Else is the new branch instruction. We already added it to the
gen_encoder's patchJMPI but forget it for gen75_encoder. Now
fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 10 Jul 2014 03:38:41 +0000 (11:38 +0800)]
Add some hsw missed pci ids (reserved PCI IDs).
v2:
modified according to Matthias's suggestion.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Matthias Sattler <Matthias.Sattler@t-online.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 10 Jul 2014 10:57:53 +0000 (18:57 +0800)]
runtime: fix some subtle event bugs.
This patch fix the following two bugs in event handling.
1. When it's time to call a event's user call back function, we need to
set the executed to true before the call. As that call back function
may call into clReleaseEvent(), and if we don't set the executed status
to true, it will enter infinite recursive loop.
2. After the user call clEnqueueNDRangeKernel to get a valid event, the
user set a call back function to that event, and in that call back
function, it will release that event. This scenario is totally correct.
But our current event handling doesn't have a deadicated timer thread to
update those on-the-fly events' status. Thus those events will not have
a chance to get updated, and those call back function will not executed
forever. To introduce a complete timer style thread to maintain this type
of events is too heavy for this fix release. This patch choose an easy
way to work around it. It will make sure the last gpgpu event to be finished
before current task to be enqueued.
After this patch, most of the OpenCV 3.0 cases could run smoothly without
any serious issue.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Thu, 10 Jul 2014 08:52:48 +0000 (16:52 +0800)]
runtime/driver: refine error handlings.
We should always check whether a dri_bo_map success or fail.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:16:52 +0000 (02:16 +0800)]
GBE: fix empty block disassemble bug.
If a block is empty, there will be more than one label pointing the
same instruction we need to step over all those labels.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yongjia Zhang <zhang_yong_jia@126.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:14:41 +0000 (02:14 +0800)]
Enable structural analysis
enable structural analysis and use if, else and endif in the
generated asm.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:14:40 +0000 (02:14 +0800)]
Use instruction if else and endif manipulate structures
Use instruction if, else and endif manipulate the control flow of
identified if-then and if-else structures at backend. but this
is not enabled, just add the necessary code to backend.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:14:39 +0000 (02:14 +0800)]
Add structure identification on ir level
Add tool structures and functions for identifying if-then and
if-else structures on Gen IR level.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:14:38 +0000 (02:14 +0800)]
Add Gen instruction 'else'
Add Gen instruction 'else' for future use.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yongjia Zhang [Thu, 17 Jul 2014 18:14:37 +0000 (02:14 +0800)]
Add Gen IR IF, ELSE and ENDIF
Add Gen IR IF, ELSE and ENDIF to mark the strucutred region.
Signed-off-by: Yongjia Zhang <yongjia.zhang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Tue, 8 Jul 2014 01:07:54 +0000 (09:07 +0800)]
remove lspci, gbe_bin_genenrater would generator llvm binary by default.
driver can get chipset id by ioctl instead of calling lspci in cmake;
user could generator gen binary by configuring cmake option
-DGEN_PCI_ID=xxxx or calling the gbe_bin_generater with option -t
GEN_PCI_ID.
v2: add "\0GENC" magic code for gen binary, fix typo.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 7 Jul 2014 05:59:24 +0000 (13:59 +0800)]
GBE: Fix builtin tanpi.
To meet precision requirement of OCL Spec .
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 8 Jul 2014 06:34:57 +0000 (14:34 +0800)]
Build: Change versioning policy.
Don't set patch version number on master branch.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Yi Sun [Fri, 4 Jul 2014 03:35:57 +0000 (11:35 +0800)]
Remove the generated test cases list.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 4 Jul 2014 02:04:33 +0000 (10:04 +0800)]
Build: check whether lspci exists.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 3 Jul 2014 03:33:10 +0000 (11:33 +0800)]
runtime: fix a gpgpu event and thread local gpgpu handling bug.
When pending a command queue, we need to record the whole gpgpu
structure not just the batch buffer. For the following reason:
1. We need to keep those private buffer, for example those printf buffers.
2. We need to make sure this gpgpu will not be reused by other enqueuement.
v2:
Don't try to flush all user event attached to the queue.
Just need to flush the current event when doing command queue flush.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Thu, 3 Jul 2014 04:52:17 +0000 (12:52 +0800)]
runtime: recover the maximum read image args to 128.
To comply with the full profile.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Yang Rong [Thu, 3 Jul 2014 04:16:55 +0000 (12:16 +0800)]
Refine some event code.
1. Do not add user event to cb->wait_list to avoid ref this user event twice.
2. Add assert when update status.
3. Set the queue's last wait event and barrier event to NULL when remove last event.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 2 Jul 2014 04:39:05 +0000 (12:39 +0800)]
GBE: Check family of spilled register correctly.
We only support DWORD QWORD register-spill currently.
So if we cannot spill a register, simply return false
instead of give an assert.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 2 Jul 2014 06:29:24 +0000 (14:29 +0800)]
Refine the logic when suspend a batch buffer.
Clear the gpgpu's batch buffer when suspend to avoid potential issue.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 2 Jul 2014 05:42:52 +0000 (13:42 +0800)]
Fix some event ref count error.
Move the event add ref to function cl_event_new_enqueue_callback for clear.
Also need add the wait user events' ref count.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Zhigang Gong [Tue, 1 Jul 2014 04:50:36 +0000 (12:50 +0800)]
runtime: fix potential curbe allocation issue.
According to spec, different platforms have different curbe
allocation restrication. The previous code set the curbe
allocated size to 480 statically which is not correct.
This patch change to always set the curbe entry num to 64
which is the maximum work group size. And set proper curbe
allocation size according to the platform's hard limitation
and a relatively reasonable kernel argument usage limitation.
v3:
when we call load_vte_state, we already know the eaxctly constant urb
size used in the current kernel. We could choose a smallest valid curbe
size for this kernel. And if the size exceed the hardware limitation,
we report it as a warning here.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Sat, 28 Jun 2014 15:25:53 +0000 (23:25 +0800)]
runtime: fix max group size calculation issue.
If the kernel doesn't use slm/barrier, there is no hard limitation
for the max group size. And if the max work group size is more than
1024, the original 64 urb entry count will not be sufficient to hold
all the curbe payload. Change the entry count to max thread count to
fix this potential issue.
I found this bug when I tried to run phoronix test suite's juliagpu
test case on my MBA.
v2:
refine the max kernel work group size calculation mechanism.
the wg_sz should not be a device's member variable, it should be
a variable derived from kernel and device's attriute at runtime.
also fix wrong configuration for IVB GT1.
v3:
Add an important max thread limitation in the GPGPU_WALKER command.
For non-Baytrail, the max thread depth * max thread height * max thread width
should less than 64 (under either simd16 or simd8), no matter whether
SLM/barrier is used. We oversighted that limitation before, thus for
a simd8 kernel which use work group size 1024 will exceed this limitation
and half of the thread will not be executed at all.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Luo [Fri, 27 Jun 2014 00:27:03 +0000 (08:27 +0800)]
add the usage of link program from llvm binary.
user A could compile and link kernel source to llvm binary first, then
query the binary to save to file; With the binary, user B can call
clCreateProgramWithBinary without compile the source again.
this usage could protect those who need to protect the kernel source.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 27 Jun 2014 06:17:57 +0000 (14:17 +0800)]
GBE: disable GVN pass when optLevel is zero.
GVN pass may generate some i256 data type, which our backend could not handle.
So, only enable it when optLevel > 0.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Mon, 30 Jun 2014 03:48:36 +0000 (11:48 +0800)]
Bump to 0.9.1 (development version).
Bump to development version after 0.9.0.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 26 Jun 2014 05:23:23 +0000 (13:23 +0800)]
Bump to version 0.9.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Yang Rong [Thu, 26 Jun 2014 13:31:23 +0000 (21:31 +0800)]
Fix call cl_mem_copy_image_region bug.
When call cl_mem_copy_image_region, sometimes need add offset to src or dst address,
sometimes need not add. Add two parameter to indicate it.
Also fix the wrong offset when clEnqueueMapImage of CL_MEM_USE_HOST_PTR.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 26 Jun 2014 05:34:43 +0000 (13:34 +0800)]
docs: fixup markup format.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 26 Jun 2014 04:38:09 +0000 (12:38 +0800)]
docs: fix some markdown links and correct some information.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 26 Jun 2014 04:15:56 +0000 (12:15 +0800)]
docs: update some documents.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Thu, 26 Jun 2014 02:36:39 +0000 (10:36 +0800)]
GBE: fix some bugs in ocl stdlib header files.
The printf's prototype was added twice incorrectly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 25 Jun 2014 15:21:54 +0000 (23:21 +0800)]
gbe_bin_generator: fix the incorrect type of cl_internal_built_in_kernel_str_size.
We should define it as size_t.
v2:
correct some extern definitions in cl_mem.c.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
Yang Rong [Mon, 23 Jun 2014 16:28:51 +0000 (00:28 +0800)]
Add optimization guide.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 25 Jun 2014 08:13:32 +0000 (16:13 +0800)]
runtime: Remove 'Experiment' from the platform name.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Guo Yejun [Wed, 18 Jun 2014 00:43:43 +0000 (08:43 +0800)]
add how to for cross compiler
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Wed, 25 Jun 2014 15:23:24 +0000 (23:23 +0800)]
Fix clEnqueueMapImage with CL_MEM_USE_HOST_PTR bug.
Should return host row pitch and host slice pitch.
Also should copy back to image when unmap.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Wed, 25 Jun 2014 01:56:44 +0000 (09:56 +0800)]
add image_1d_to_1d builtin kernel name.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 25 Jun 2014 05:53:14 +0000 (13:53 +0800)]
utests: fix one bug when create image at one test case.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Tue, 24 Jun 2014 15:33:20 +0000 (23:33 +0800)]
Implement the %p in the printf
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Tue, 24 Jun 2014 08:35:58 +0000 (16:35 +0800)]
Add the support for vector type in printf.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Tue, 24 Jun 2014 06:23:31 +0000 (14:23 +0800)]
GBE: Further optimize exp().
Use native_exp() as much as possible.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Tue, 24 Jun 2014 02:09:12 +0000 (10:09 +0800)]
add cpu copy for 1Darray and 2darray related copy APIs.
detail cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray.
1d used gpu copy.
v2:
fixed 1d array to 1d array copy, don't need to switch depth and height.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Guo Yejun [Mon, 23 Jun 2014 22:22:07 +0000 (06:22 +0800)]
add BEIGNET_INSTALL_DIR to clean code
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Mon, 23 Jun 2014 21:36:50 +0000 (05:36 +0800)]
set LD_LIBRARY_PATH of libgbe.so for gbe_bin_generater
it is needed for cross compiler
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Luo [Sun, 22 Jun 2014 22:03:30 +0000 (06:03 +0800)]
implement API clEnqueueFillImage.
enqueues a command to fill an image object with a specified color.
fix typo cl_context_get_static_kernel_from_bin.
v2:
fix image 1d array bug.
Signed-off-by: Luo <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Mon, 23 Jun 2014 20:14:21 +0000 (04:14 +0800)]
fix crash when OCL_STRICT_CONFORMANCE is unset
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Mon, 23 Jun 2014 08:38:56 +0000 (16:38 +0800)]
Add the format and flag support for printf.
The format and flag such as -+# and precision request has
been added into the output.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Thu, 19 Jun 2014 07:20:54 +0000 (15:20 +0800)]
update docs on environment variables.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 23 Jun 2014 08:59:56 +0000 (16:59 +0800)]
GBE: switch to non strict conformance mode by default.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Yi Sun [Mon, 23 Jun 2014 00:56:33 +0000 (08:56 +0800)]
utest_generator.py: add OCL_STRICT_CONFORMANCE enviroment condition.
For auto-generated math cases, when OCL_STRICT_CONFORMANCE is not set,
the expected diff increases to 1000x.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 23 Jun 2014 08:34:55 +0000 (16:34 +0800)]
GBE: declare correct prototype for fastpath_rootn
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 23 Jun 2014 08:34:54 +0000 (16:34 +0800)]
GBE: fix some builtin math function
__gen_ocl_exp stands for 2^x. So, use __gen_ocl_pow to implement native_exp().
Fix atanh implementation.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 23 Jun 2014 14:38:36 +0000 (22:38 +0800)]
Add some OpenCL1.2 parameters of function clGetDeviceInfo.
Include CL_DEVICE_LINKER_AVAILABLE, CL_DEVICE_PRINTF_BUFFER_SIZE, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 23 Jun 2014 14:38:35 +0000 (22:38 +0800)]
Fix a CL_MEM_HOST_PTR bug.
Can't add sub_offset if mem is image.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 23 Jun 2014 06:39:26 +0000 (14:39 +0800)]
GBE: replace OwningPtr with std::unique_ptr
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>