Zhigang Gong [Wed, 15 May 2013 10:15:34 +0000 (18:15 +0800)]
utests: Fix a bug in movforphi test case.
This test case is to trigger a old MovForPHI bug, and it
just use read/write_image. But it has a bug in itself.
As in the kernel, the write image will only write the
first lane data not all the 16 lanes. As the previous
patch fix the write_image bug, thus now the write_image
work correctly and thus it only touch the first data element
thus it trigger the bug in this test case. Now fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 15 May 2013 03:47:08 +0000 (11:47 +0800)]
GBE: fixed a prediction bug in typed write instruction.
We need to put the header initialization and the LOD initialization
to no mask and no predication state. For all the other parts, we need
to enable mask and use the current predication state and need to set
the quater control properly. Otherwise, when write_image is called in
a condition-branch block, it will trigger this bug and doesn't write
data out correctly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Mon, 13 May 2013 03:16:06 +0000 (11:16 +0800)]
GBE/Runtime: Optimize Sample/TypedWrite instruction.
This commit does two major things as below:
1. Allocate image surface at compile time, and add new gbe interfaces to let runtime know
how many image surfaces we have, and the image allocation informations. Thus the runtime
library know how to bind those image surfaces.
2. As now for both image and sampler, at compile time, we know the eaxct binding table
index. We no longer need to get those index from the input argument(curbe) and prepare
the desc to the architecture register. We can use imm as the desc thus we can save
4 out of 4 instructions for SampleInstruction and save 2 out of 12 instructions for
the TypedWriteInstruction.
This patch is also a major prepartion for the get_image_width/height/... functions.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Mon, 13 May 2013 03:00:38 +0000 (11:00 +0800)]
GBE: concentrate all samplers' allocation at compile time.
This is the first step to do image/sampler allocation fully
at compile time. Thus we can determine all the sampler id and image
bti index at compile time. So it can make the following things
easier or faster:
1. After we finish both image/sampler, we can treat all image bti and sampler
as constant and can get their value when we encode the Sampler and TypedWrite
instructions. Then we don't need to compute the message header at runtime which
cost 3 instructions each call.
2. get image width/height/depth. As we know the surface bti at compile time,
we can put those data at specified curbe entry and generate correct indirect
register access to get those information at compile time.
This is the first step. And just finish the sampler part. Now all the
samplers including those defeined in kernel arguments will be allocated
at compile time. At runtime, it just need to fill in the sampler value
into the proper slot which map to the specified input argument. Then the
driver will create and bind the sampler to the correct slot.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Thu, 9 May 2013 11:30:58 +0000 (19:30 +0800)]
GBE: add scalar register support in loadImmInstruction.
There is a slight possibility that the destination register
is a scalar register. We need to check it here.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Thu, 9 May 2013 06:37:09 +0000 (14:37 +0800)]
GBE: remove sampler address space.
As now sampler_t is a normal integer data type, we don't
need the sampler address space any more.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Tue, 7 May 2013 06:29:44 +0000 (14:29 +0800)]
utests: Add one test cases for sampler support.
This new case tests define sampler in kernel side and in the
kernel argument.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Tue, 7 May 2013 06:26:54 +0000 (14:26 +0800)]
CL: Support kernel side defined samplers.
We changed the way to handle samplers. We gather all the kernel side
defined samplers and those sampler in kernel argument into one samplers
array. And don't allocate one single sampler each time.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Zhigang Gong [Tue, 7 May 2013 06:15:45 +0000 (14:15 +0800)]
GBE: refine the sampler implementation to comply with spec.
The previous implementation is to use a new address space pointer to
represent a sampler. The reason is that there is no specified data
type for sampler_t in LLVM front end thus we can't determine the
sampler argument type if we use a normal interger to represnet the
sampler. But that breaks the OCL spec, the spec allows the kernel
to define and initialize sampler variables in kernel side.
Now I use a little tricky way to fix this problem. First, I decide
to use normal unsigned interger to represent sampler_t in kernel side.
Then at compile time, I check read_imagexxx function's sampler
arguments. If the argument is a constant value, then it should be a
kernel side defined sampler, then I insert the sampler type into a
global sampler set for the current kernel function. If the argument
is not a constant value, then I will check whether it's a kernel
argument, if it is, then I fix up the corresponding kernel arg type
to SAMPLER there.
To unify the kernel side defined sampler and kernel argument sampler,
I add two new gbe API. To export all the kernel side defined sampler
data and size to the runtime library. Then latter, the runtime library
can use this information to append new sampler to the unified sampler
buffer and bind all the sampler at one time.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lv, Meng <meng.lv@intel.com>
Homer Hsing [Mon, 6 May 2013 00:45:52 +0000 (08:45 +0800)]
test cases for image3d_t
test cases for image3d_t
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 6 May 2013 00:45:51 +0000 (08:45 +0800)]
enable image3d_t
enable image3d_t
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 8 May 2013 08:05:16 +0000 (16:05 +0800)]
add third coord in backend
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 6 May 2013 00:45:49 +0000 (08:45 +0800)]
add backend symbol alias for image3d read write
add backend symbol alias for image3d read write
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 6 May 2013 00:45:48 +0000 (08:45 +0800)]
add image3d read/write in stdlib.h
add image3d read/write in stdlib.h
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Mario Kicherer [Tue, 7 May 2013 19:57:37 +0000 (21:57 +0200)]
clEnqueueNDRangeKernel: fix for segfault caused by empty local_work_size
Without this fix, an empty local_work_size that is allowed by specification
causes a segfault. Merged the block with a further check below.
Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Mario Kicherer [Tue, 7 May 2013 19:57:36 +0000 (21:57 +0200)]
CMakeLists.txt enable custom C/CXXFLAGS
Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 8 May 2013 05:00:02 +0000 (13:00 +0800)]
GBE: fixed the hard coded implementation for sampler/typedwrite.
This is one preparation for the image3d support.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Homer Hsing <homer.xing@intel.com>
Mario Kicherer [Mon, 6 May 2013 15:49:50 +0000 (17:49 +0200)]
enable clGetContextInfo with CL_CONTEXT_DEVICES
The following patch enables the clGetContextInfo query with CL_CONTEXT_DEVICES.
Applications can query the required size of the result buffer and then request
the cl_device_id of the available devices in this context.
Signed-off-by: Mario Kicherer <dev@kicherer.org>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Chuanbo Weng [Fri, 3 May 2013 15:56:24 +0000 (23:56 +0800)]
Fix compile error when use llvm-3.1 and InstVisitor.h path for llvm-3.3
DataLayout.h only exist in llvm-3.2. InstVisitor.h is also contained
in 'llvm/Support/InstVisitor.h' directory in llvm-3.0, 3.1 and 3.2,
but is contained in 'llvm/InstVisitor.h'. argID and PAL is needed in
llvm-3.1. After applying this patch, beignet can build successfully
and all test cases can run successfully using llvm-3.1.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 03:12:20 +0000 (11:12 +0800)]
test cases for global constant arrays
version 4
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:31 +0000 (09:00 +0800)]
Support global constant arrays
Version 3.
Support global constant arrays defined outside any kernel.
Example:
constant int h[] = {71,72,73,74,75,76,77};
kernel void k(global int *dst) {
int i = get_global_id(0);
dst[i] = h[i % 7];
}
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:30 +0000 (09:00 +0800)]
ir::unit can return its constantSet
Version 3.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:29 +0000 (09:00 +0800)]
add special register constoffst expressing curbe offset
Version 3.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:28 +0000 (09:00 +0800)]
disable buggy old code doing global constant
Version 3.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:27 +0000 (09:00 +0800)]
add helper functions in gbe::Program
Version 3.
for get global constant related information
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 2 May 2013 01:00:26 +0000 (09:00 +0800)]
add helper functions in ir::Constant and ir::ConstantSet
Version 3.
Deleted empty destructor of ConstantSet.
Updated ConstantSet::getConstant()
And I am afraid I cannot delete the empty constructor of ConstantSet.
That will break the constructor of ir::Unit. See unit.cpp, line 30.
add ir::Constant.reg field
add helper functions in ir::ConstantSet to get specified ir::Constant
and binary packed constant array
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Sat, 27 Apr 2013 06:13:32 +0000 (14:13 +0800)]
New test case of ceil()
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Sat, 27 Apr 2013 06:13:26 +0000 (14:13 +0800)]
Make ceil() work
Delete an extra comma in ocl_stdlib.h, then ceil() works. Fun
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Fri, 26 Apr 2013 23:56:55 +0000 (07:56 +0800)]
Add the INCLUDE_DIRECTORIES directive to all the FindXXX modules.
If the header files are not at the standard location, we will FAIL
to compile because FIND_PATH directive will not add the Include Dir
to CXX flags. Add the INCLUDE_DIRECTORIES to handle this if we find
the headers we neeeds.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Fri, 19 Apr 2013 11:39:12 +0000 (13:39 +0200)]
Implement clGetDeviceInfo(..., CL_DRIVER_VERSION, ...)
This returns the library major/minor version. As it does not follow the
usual naming scheme, the output code is duplicated.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Fri, 19 Apr 2013 11:39:11 +0000 (13:39 +0200)]
Handle size queries in clGetDeviceInfo better
This allows size queries on all fields.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Mon, 22 Apr 2013 17:02:52 +0000 (19:02 +0200)]
Add missing fmin() and fmax() overloads
These functions must also exist for vector types.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Simon Richter [Mon, 22 Apr 2013 16:29:57 +0000 (18:29 +0200)]
Add missing select() overloads
Both signed and unsigned types are allowed as the condition type,
regardless of the true/false branch types.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Simon Richter [Mon, 22 Apr 2013 16:29:56 +0000 (18:29 +0200)]
Fix typo
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Homer Hsing [Tue, 23 Apr 2013 07:34:21 +0000 (15:34 +0800)]
Fix crash when output IR
When output IR, program crashes because two register names are missing.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Simon Richter [Mon, 22 Apr 2013 09:26:27 +0000 (11:26 +0200)]
Add typedefs for extension functions
This allows application programs to cast the pointer returned from
clGetExtensionFunctionAddress to the appropriate function pointer type.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Mon, 22 Apr 2013 09:26:26 +0000 (11:26 +0200)]
Add cl*Intel to extension function lookup
This allows applications to look up the extension functions when going
through the ICD loader.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Mon, 22 Apr 2013 09:26:25 +0000 (11:26 +0200)]
Rename Intel specific extension functions for ICD
The ICD extension specification requires that extension function names end
in the vendor name to allow clGetExtensionFunctionAddress to switch between
implementations.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 22 Apr 2013 01:35:00 +0000 (09:35 +0800)]
Add clIntelMapBufferGTT, clIntelUnmapBufferGTT, cl_mem_map_gtt and cl_mem_unmap_gtt
cl_mem_map_gtt calls cl_buffer_map_gtt.
cl_mem_unmap_gtt calls cl_buffer_unmap_gtt.
clIntelMapBufferGTT is a one to one mapping of drm_intel_gem_bo_map_gtt.
clIntelUnmapBufferGTT is a one to one mapping of drm_intel_gem_bo_unmap_gtt.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 22 Apr 2013 05:11:52 +0000 (13:11 +0800)]
Add constant ptr argument test case.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 22 Apr 2013 05:11:51 +0000 (13:11 +0800)]
Add constant pointer as argument support in runtime.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 22 Apr 2013 05:11:50 +0000 (13:11 +0800)]
Add constant pointer as argument support in kernel.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 22 Apr 2013 05:11:49 +0000 (13:11 +0800)]
Add register allocate from tail support for constant buffer.
By default curbe alloc from head, grf alloc from tail.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 19 Apr 2013 08:31:09 +0000 (16:31 +0800)]
utests: should set pitch to zero if host_ptr is NULL.
Per OCL spec, we should set pitch to zero if the host_ptr is
NULL.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter <Simon.Richter@hogyros.de>
Simon Richter [Fri, 19 Apr 2013 06:12:59 +0000 (08:12 +0200)]
Implement clGetMemObjectInfo
Currently, only the properties
- CL_MEM_TYPE
- CL_MEM_FLAGS
- CL_MEM_SIZE
are implemented.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Fri, 19 Apr 2013 06:12:58 +0000 (08:12 +0200)]
Fix pitch parameter in clCreateImage2D
If the host_ptr is NULL, the pitch parameter must be 0.
The pitch parameter gives the number of bytes per scanline. Specifying 0
selects the minimum possible.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 2 Apr 2013 13:01:52 +0000 (15:01 +0200)]
Use "clang" command from PATH
This assumes that LLVM is installed in the system path, but avoids
compiling the path of binaries into the library.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 16 Apr 2013 18:04:42 +0000 (20:04 +0200)]
Make EGL optional
This fixes builds if EGL is unavailable. The OpenGL sharing extension will
be disabled then.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 16 Apr 2013 18:02:08 +0000 (20:02 +0200)]
Prefer versioned llvm-config
If multiple versions are installed, prefer version 3.2 before falling back
to the default version.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Fri, 12 Apr 2013 09:21:19 +0000 (11:21 +0200)]
Accept glibc's implementation of memalign()
If the platform is not Linux, but glibc based, we assume that the
memalign() function is working satisfactorily.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Wed, 3 Apr 2013 18:32:45 +0000 (20:32 +0200)]
Implement KHR ICD extension
This adds a pointer to the dispatch table at the beginning of every object
of type
- cl_command_queue
- cl_context
- cl_device_id
- cl_event
- cl_kernel
- cl_mem
- cl_platform_id
- cl_program
- cl_sampler
as required by the ICD specification. The layout of the dispatch table
comes from the OpenCL ICD loader by Brice Videau <brice.videau@imag.fr> and
Vincent Danjean <Vincent.Danjean@ens-lyon.org>.
To avoid dispatch table entries being overwritten with the ICD loader's
implementations of the CL functions (as would be the proper behaviour for
the ELF loader), the -Bsymbolic option is given to the linker.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 2 Apr 2013 13:11:01 +0000 (15:11 +0200)]
"Implement" clGetExtensionFunctionAddress()
This function should not fail if a function entry point cannot be found --
instead we return NULL.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 16 Apr 2013 18:05:54 +0000 (20:05 +0200)]
Avoid extension names as preprocessor tokens
The Khronos Group headers define constants with the names of extensions if
the header defines the extension API. When the preprocessor sees one of
these names, it performs macro substitution, leading to compilation errors.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Tue, 2 Apr 2013 12:51:52 +0000 (14:51 +0200)]
Fix typo in cl_get_platform_info function name
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 17 Apr 2013 03:15:15 +0000 (11:15 +0800)]
write_image: Fixed a bug when use scalar data as color source.
If the color source is scalar data type for example from the
kernel input parameters, then we should use QnPhysical to get
the source, rather than hard coded the subnr to zero.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Meng Lv <meng.lv@intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:39:03 +0000 (14:39 +0800)]
GenRegister src[] is not allowed in clang, change this style.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:38:58 +0000 (14:38 +0800)]
GenRegInterval should be a struct instead of a class
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:38:54 +0000 (14:38 +0800)]
fix error in clang: variable array length is not support for non-POD element
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:38:49 +0000 (14:38 +0800)]
--no-rtti should be changed to -fno-rtti in clang++
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:38:45 +0000 (14:38 +0800)]
remove dollar sign in identifier
Therefore it removes compiler warnings.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 16 Apr 2013 06:32:45 +0000 (14:32 +0800)]
remove -ffast-math comiler flag
FC16 has the bug https://bugzilla.redhat.com/show_bug.cgi?id=795995
which prevents us from using our library on FC16.
Even though it's not our compiler's bug, we don't need fast-math flag here as
our compiler is not a computation oriented program, so we can remove this flag
safely.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 16 Apr 2013 03:12:01 +0000 (11:12 +0800)]
make raw_fd_ostream not close stdout
stdout was closed by llvm::raw_fd_ostream.
Now let llvm::raw_fd_ostream not close stdout.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@gmail.com>
Zhigang Gong [Thu, 11 Apr 2013 07:08:59 +0000 (15:08 +0800)]
utests: refine the helper macros.
The previous implementation has many duplicate code, and the FN
is used incorrectly. Now fix it and the code looks a little bit
clean and when it fails the reported function name should be
correct now.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Wed, 10 Apr 2013 11:44:19 +0000 (19:44 +0800)]
utests: add a simple test case for cl_khr_gl_sharing.
This test case creates a OCL image from a OGL texture.
Then use a OCL kernel to fill the image. Then it back
to OGL to read the pixels back and verify the color.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lu, Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Wed, 10 Apr 2013 11:39:40 +0000 (19:39 +0800)]
utests: added cl_khr_gl_sharing related helper functions.
This is the preparation to add clgl interoperate test cases.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lu, Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Wed, 10 Apr 2013 11:23:41 +0000 (19:23 +0800)]
Fixed the extension string for both platform and device.
I forgot to calculate the extensions string size. Now fix it.
And also forgot to handle the device's extension case, now add
it and duplicate the extensions from the platform directly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lu, Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Wed, 10 Apr 2013 09:38:46 +0000 (17:38 +0800)]
Enable the clFlush.
We don't need to do anything now, as current it alwasy flushs
all the commands each time. We may need to revisit here after we
optmize the clEnqueueNDRangeKernel's behaviour and don't flush
pipeline every time.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Lu, Guanqun <guanqun.lu@intel.com>
Homer Hsing [Wed, 10 Apr 2013 08:39:34 +0000 (16:39 +0800)]
Fix brw instruction field "flag"
bits2.da1.flag_subreg_nr is missing in brw_instruction.
The location of bits2.da1.flag_reg_nr is wrong. See IVB spec.
This patch fixes bugs above, also
make disassembler output correct flag_subreg_nr for conditional modifier
and prediction.
Before we change it:
(+f0.1) cmp.l(8) null g12<8,8,1>D g2.2<0,1,0>D {align1 WE_normal 1Q};
After we change it:
(+f1.1) cmp.l.f1.1(8) null g12<8,8,1>D g2.2<0,1,0>D {align1 WE_normal 1Q};
Although flag_reg_nr has moved position, other code is still right,
because if we use f0.1 before, now we use f1.0
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Lu, Guanqun <guanqun.lu@intel.com>
Lu Guanqun [Wed, 10 Apr 2013 08:12:08 +0000 (16:12 +0800)]
throw exception instead of just assert
So that we know the reason of why we're failing.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Wed, 10 Apr 2013 08:11:59 +0000 (16:11 +0800)]
release the contraint of volatile pointer
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Wed, 10 Apr 2013 08:11:53 +0000 (16:11 +0800)]
add disassembler support for message gateway
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Feng, Boqun [Wed, 10 Apr 2013 06:29:33 +0000 (14:29 +0800)]
backend: Use alignof keyword when supported
the keyword alignof of C++11 is supported after gcc 4.8, other than use
old template way to calculate the align of a class, the keyword is used.
Signed-off-by: Feng, Boqun <boqun.feng@intel.com>
Reviewed-by: Zhigang, Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 27 Mar 2013 12:22:29 +0000 (20:22 +0800)]
Update documents.
Update documents according to our latest progress, which is for
releasing version 0.1. Also add those html files back.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Lv, Meng [Tue, 26 Mar 2013 05:25:30 +0000 (05:25 +0000)]
enable CL_DEVICE_IMAGE_SUPPORT check
Signed-off-by: lv meng <meng.lv@intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 25 Mar 2013 03:33:40 +0000 (11:33 +0800)]
change the way clGetDeviceInfo() is called in cl_ocl_init()
This works as a test case for the newly implemented clGetDeviceInfo()
behaviour: pass NULL to param_value and it should return the string size.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 25 Mar 2013 03:29:31 +0000 (11:29 +0800)]
enhance clGetDeviceInfo() API to return the length of string fields
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 25 Mar 2013 03:19:05 +0000 (11:19 +0800)]
change the way clGetPlatformInfo() is called in cl_ocl_init()
This works as a test case for the newly implemented clGetPlatformInfo()
behaviour: pass NULL to param_value and it should return the string size.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Mon, 25 Mar 2013 03:04:19 +0000 (11:04 +0800)]
enhance clGetPlatformInfo() API to return the string length
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Fri, 22 Mar 2013 08:07:08 +0000 (16:07 +0800)]
add test case for clCreateContextFromType()
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Fri, 22 Mar 2013 08:07:43 +0000 (16:07 +0800)]
implement clCreateContextFromType()
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Fri, 22 Mar 2013 03:17:23 +0000 (11:17 +0800)]
fix one typo for clCreateContextFromType()
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Fri, 22 Mar 2013 03:40:55 +0000 (11:40 +0800)]
fix typo in FindLLVM.cmake
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 19 Mar 2013 12:41:49 +0000 (20:41 +0800)]
Fixed a potential null pointer reference bug.
When we failed to create a dri2 connection, we jump to the error out
path, and the driver_name may be null pointer, before set it to null,
we need to check it first.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Tue, 19 Mar 2013 11:39:13 +0000 (19:39 +0800)]
Set the initial library versions to 0.1.
We have two libraries here, one is the gen backend and the other
is libcl runtime library. We set both initial versions to 0.1.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Zhigang Gong [Tue, 19 Mar 2013 11:37:41 +0000 (19:37 +0800)]
Change the cl version to 1.0.
The first stage goal is to deliver a OCL 1.0 implementation,
so let's change the version here.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Homer Hsing [Thu, 21 Feb 2013 03:03:08 +0000 (11:03 +0800)]
Add the interface of cl_buffer_map_gtt
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 25 Feb 2013 05:16:46 +0000 (13:16 +0800)]
fix unused-result warning
fix unused-result warning, because we didn't use the return value of fread
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Sun, 17 Feb 2013 05:23:41 +0000 (13:23 +0800)]
Output the map from IR reg to ASM reg
It is hard to guess the meaning of
"mul(8) g10<1>d g0.6<0,1,0>d g3.2<0,1,0>d"
if you don't know the IR reg num of "g10" etc.
Now we can output the map from IR reg to ASM reg, such as
"%0 g10.0D"
"%1 g0.6D"
"%2 g3.2D"
So you know the meaning is
"mul %0 %1 %2"
By default, not output those message.
You can turn on by BVAR "OCL_OUTPUT_REG_ALLOC".
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 21 Feb 2013 09:09:51 +0000 (17:09 +0800)]
Use new OCL1.2 API rather than those deprecated API.
Use clCreateImage to replace the old API clCreateImage2D.
It will silent the compiler warnings.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Lu Guanqun [Tue, 9 Apr 2013 06:41:08 +0000 (14:41 +0800)]
do not use the advanced C++ feature
so that it can work on gcc 4.6 happily, otherwise, I have the following issues:
backend/src/./ir/instruction.hpp:112:7: error: type ‘gbe::ir::Instruction’ is not a direct base of ‘gbe::ir::Instruction’
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 9 Apr 2013 06:44:07 +0000 (14:44 +0800)]
add a case for MEM_INVALID to fix a warning
warning: enumeration value ‘MEM_INVALID’ not handled in switch [-Wswitch]
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Tue, 9 Apr 2013 06:22:58 +0000 (14:22 +0800)]
add linking library for gcc compiler.
Otherwise, there are lots of linking errors such as:
undefined reference to `llvm::ConstantDataSequential::getElementType() const'
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 19 Feb 2013 11:52:03 +0000 (19:52 +0800)]
Fixed a bug when expire registers.
The previous implementation forgot to change the head when
the to expired register is at the left side of the current
head. Thus the algorithm will be broken, as the algorithm need
the head has the smallest offset.
Without this patch, the register expireing doesn't work. Thus
any kernel function need more than 44 DWORD registers or 11
DWORD vec4 will fail to get registers.
The calculation is:
(Register file size / (type size * simdwidth) - special registers)
(4K / (4 * 16)) - 20 = 44
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Zhigang Gong [Tue, 19 Feb 2013 03:26:35 +0000 (11:26 +0800)]
Add utest case for movforphi's undef case.
This case will trigger MovForPhi to handle a undef vector
element.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Homer Hsing [Thu, 21 Feb 2013 02:45:42 +0000 (10:45 +0800)]
Delete temp files if compiled successfully
If IR was compiled successfully, delete temp files.
My '/tmp' was full of '*.cl', '*.ll' files.
Now the temp files are gone, world is clean :)
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Lu Guanqun <guanqun.lu@intel.com>
Homer Hsing [Sun, 17 Feb 2013 02:59:44 +0000 (10:59 +0800)]
Output meaning of special registers in dumped IR
Help debug IR. Before we change this, we feel hard to know
what "%3, %4" stuff mean in IR. Now we output their meaning.
Before we change this, dumped IR is:
.decl.dword %0
.decl.dword %1
.decl.dword %2
After we change this, dumped IR is:
.decl.dword %0 local_id_0
.decl.dword %1 local_id_1
.decl.dword %2 local_id_2
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Sun, 17 Feb 2013 01:59:18 +0000 (09:59 +0800)]
Display function argument name in IR
Help debug IR. If we see "LT.int32 %31 %30 %20" and we know
"%20" is a input argument with detailed name, we can debug
IR better.
Before we change it, dumped IR is:
decl_input.value %20
After we change it, dumped IR is:
decl_input.value %20 argument_name
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lu Guanqun [Wed, 30 Jan 2013 01:11:10 +0000 (09:11 +0800)]
fix the possible overflow in slm_sz
slm_sz in this structure has 16 bits, but we might specify 64KB which has 17
bits, it would thus cause overflow and undesired truncation.
Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zou, Nanhai [Mon, 4 Feb 2013 07:10:40 +0000 (15:10 +0800)]
Fix uninitialize value warning
Give a inital value to shut up compiler
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 1 Feb 2013 07:52:44 +0000 (15:52 +0800)]
Also make "arithmetic shift right" work
Only add a line of code ...
Looks like Mr. Ben has forgotten that line ...
Also add a test case.
Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>