contrib/beignet.git
Guo Yejun [Thu, 6 Mar 2014 16:59:38 +0000 (00:59 +0800)]
merge some state buffers into one buffer

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yang Rong [Mon, 3 Mar 2014 03:25:19 +0000 (11:25 +0800)]
Fix a convert float to long bug.

When converting certain special float values, slightly larger than LONG_MAX, to long with saturation,
the result is wrong. Simply return LONG_MAX when the float value equals LONG_MAX.
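A minimal C sketch of the intended saturating behaviour, not beignet's actual built-in (which lives in the OpenCL C backend library); the function name `float_to_long_sat` is hypothetical. The key point is that `(float)LONG_MAX` rounds up to 2^63, which is already out of range for a 64-bit signed integer, so the comparison must be `>=`:

```c
#include <math.h>
#include <stdint.h>

/* Sketch of a saturating float -> 64-bit integer conversion.
 * (float)INT64_MAX rounds up to 2^63, which is out of range for int64_t,
 * so a plain cast of that value is undefined behaviour in C. */
static int64_t float_to_long_sat(float f)
{
    const float two63 = 9223372036854775808.0f; /* 2^63, exactly representable */

    if (isnan(f))
        return 0;                 /* OpenCL _sat conversions map NaN to 0 */
    if (f >= two63)
        return INT64_MAX;         /* covers (float)LONG_MAX == 2^63 */
    if (f < -two63)
        return INT64_MIN;
    return (int64_t)f;            /* in range: truncation toward zero */
}
```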

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 7 Mar 2014 05:48:48 +0000 (13:48 +0800)]
GBE: Optimize byte/short load/store using untyped read/write

Scatter/gather messages perform much worse than untyped read/write messages. So whenever
char/short loads/stores can be packed into an untyped message, just do it.
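To illustrate what "packing" means here, a hedged host-side C sketch: one 32-bit access replaces four byte-sized ones, and the dword is then split into its bytes or shorts, roughly the role the commit gives `splitReg()`. The helper names are hypothetical and this models only the data layout (little-endian lane order), not the Gen message encoding:

```c
#include <stdint.h>
#include <string.h>

/* Split one dword into four bytes, lowest byte first. */
static void split_dword_to_bytes(uint32_t dw, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(dw >> (8 * i));
}

/* Split one dword into two shorts, low half first. */
static void split_dword_to_shorts(uint32_t dw, uint16_t out[2])
{
    out[0] = (uint16_t)(dw & 0xffffu);
    out[1] = (uint16_t)(dw >> 16);
}

/* Load four consecutive bytes with a single 32-bit access. */
static uint32_t load_dword(const uint8_t *p)
{
    uint32_t dw;
    memcpy(&dw, p, sizeof dw); /* one 32-bit read instead of four byte reads */
    return dw;
}
```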

v2:
add some asserts in splitReg()

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 7 Mar 2014 05:48:47 +0000 (13:48 +0800)]
GBE: Fix a potential issue if increase srcNum.

If MAX_SRC_NUM for ir::Instruction is increased, unpredictable behaviour may occur.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 7 Mar 2014 05:48:46 +0000 (13:48 +0800)]
GBE: make vload3 only read 3 elements.

Clang widens a vec3 load into a vec4 load, so we have to handle it in the frontend.
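A C sketch of the vload3 semantics the frontend has to preserve: exactly three elements are read, starting at `p + offset * 3`. A widened vec4 load of the last vec3 in a buffer would touch one element past its end. The struct and function names are hypothetical stand-ins for OpenCL's `float3`/`vload3`:

```c
#include <stddef.h>

typedef struct { float x, y, z; } float3_sketch;

/* Read exactly three floats at p + offset * 3; no fourth element is touched. */
static float3_sketch vload3_sketch(size_t offset, const float *p)
{
    const float *q = p + offset * 3;
    float3_sketch v = { q[0], q[1], q[2] };
    return v;
}
```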

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Ruiling Song [Fri, 28 Feb 2014 02:16:45 +0000 (10:16 +0800)]
GBE: Optimize scratch memory usage using register interval

Scratch memory is a limited hardware resource, and registers
whose live intervals do not overlap can share the same scratch memory. So
I introduce an allocator for scratch memory management.
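The sharing idea can be sketched as greedy slot assignment over live intervals. This is not the patch's `SimpleAllocator`/`ScratchAllocator`; it is a simplified model assuming every spilled register needs one equal-sized scratch slot and that intervals arrive sorted by start. All names are hypothetical:

```c
#include <stddef.h>

#define MAX_SLOTS 64

/* Hypothetical live interval of a spilled register, in instruction IDs. */
typedef struct { int start, end; } interval_t;

/* Intervals must be sorted by start. A slot whose last occupant ended
 * before the current start is reused; otherwise a new slot is opened.
 * Returns the number of slots used, or -1 if MAX_SLOTS is exceeded. */
static int assign_scratch_slots(const interval_t *iv, size_t n, int *slot_of)
{
    int slot_end[MAX_SLOTS];
    int used = 0;

    for (size_t i = 0; i < n; i++) {
        int chosen = -1;
        for (int s = 0; s < used; s++) {
            if (slot_end[s] < iv[i].start) { chosen = s; break; }
        }
        if (chosen < 0) {
            if (used == MAX_SLOTS)
                return -1;          /* scratch space exhausted */
            chosen = used++;
        }
        slot_end[chosen] = iv[i].end;
        slot_of[i] = chosen;
    }
    return used;
}
```

Processing intervals in start order keeps this greedy choice optimal: a new slot is opened only when every existing slot is still live at the current start, so the slot count equals the maximum number of simultaneously live registers.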

v2:
In order to reuse the registerFilePartitioner, I rename it to
SimpleAllocator, and derive ScratchAllocator & RegisterAllocator
from it.

v3:
fix a typo, scratch size is 12KB.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Thu, 27 Feb 2014 17:58:20 +0000 (01:58 +0800)]
GBE: show correct line number in build log

Sometimes we insert extra code into the kernel, which
makes the line numbers reported in the build log
mismatch the line numbers of the kernel from the
programmer's view. Use #line to correct this.
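A hedged C sketch of the technique: generated code is prepended, then a `#line 1` directive makes the next line, the first line of the user's kernel, count as line 1 for diagnostics. The helper name and the way the prelude is joined are assumptions, not beignet's code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prepend driver-generated code to a user kernel and reset the line
 * counter so compiler diagnostics use the user's own line numbers.
 * The caller frees the returned string. */
static char *wrap_kernel_source(const char *prelude, const char *user_src)
{
    const char *marker = "\n#line 1\n";
    size_t len = strlen(prelude) + strlen(marker) + strlen(user_src) + 1;
    char *out = malloc(len);
    if (!out)
        return NULL;
    snprintf(out, len, "%s%s%s", prelude, marker, user_src);
    return out;
}
```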

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Wed, 26 Feb 2014 22:54:26 +0000 (06:54 +0800)]
GBE: support getelementptr with ConstantExpr operand

Add support, during the LLVM IR -> Gen IR translation, for
getelementptr whose first operand is a ConstantExpr.

A utest is also added.

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Guo Yejun [Thu, 20 Feb 2014 21:51:33 +0000 (05:51 +0800)]
GBE: add fast path for more math functions

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 21 Feb 2014 05:09:20 +0000 (13:09 +0800)]
GBE: remove the useless get sampler info function.

We don't need to get the sampler info dynamically, so
remove the corresponding instruction.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 21 Feb 2014 04:50:55 +0000 (12:50 +0800)]
GBE: optimize read_image to avoid get sampler info dynamically.

Most of the time, the user uses a constant sampler value directly in the
kernel, so we don't need to get the sampler value through a function
call. This way, the compiler front end can also optimize much better
than with dynamic sampler information queries. For luxmark's
median/simple cases, this patch gives about a 30-45% performance gain.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Fri, 21 Feb 2014 02:40:08 +0000 (10:40 +0800)]
GBE: don't put a long live register to a selection vector.

If an element has a very long live interval, we don't want to put it into a
vector, as that adds more pressure on register allocation.

With this patch, spilled registers for luxmark's median scene benchmark
drop by more than 20% (from 288 to 224).

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 19 Feb 2014 02:16:48 +0000 (10:16 +0800)]
GBE: prepare to optimize generic selection vector allocation.

Move the selection vector allocation after the register interval
calculation.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Wed, 19 Feb 2014 02:47:46 +0000 (10:47 +0800)]
GBE: fixed a potential bug in 64 bit instruction.

Current selection vector handling requires the dst/src
vector to start at dst(0) or src(0).

v2:
fix an assertion.
v3:
fix a bug in gen_context.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 19 Feb 2014 08:36:33 +0000 (16:36 +0800)]
GBE: fix the overflow bug in register spilling.

Change to use int32 to represent the maxID.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 18 Feb 2014 10:32:33 +0000 (18:32 +0800)]
GBE: code cleanup for read_image/write_image.

Remove some useless instructions and make the read/write_image
more readable.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 18 Feb 2014 09:41:05 +0000 (17:41 +0800)]
GBE: fixed the incorrect max_dst_num and max_src_num.

Some I64 instructions use more than 11 dst registers,
so this patch changes the max src number to 16 and adds an assertion
to check whether we run into this type of issue again.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 18 Feb 2014 09:19:41 +0000 (17:19 +0800)]
GBE: Optimize write_image instruction for simd8 mode.

In SIMD8 mode, we can put the u,v,w,x,r,g,b,a values into
a selection vector directly and don't need to
assign those values again.

As an example, the following code for a simple image copy is
generated without this patch:

    (26      )  (+f0) mov(8)    g113<1>F        g114<8,8,1>D                    { align1 WE_normal 1Q };
    (28      )  (+f0) send(8)   g108<1>UD       g112<8,8,1>F
                sampler (3, 0, 0, 1) mlen 2 rlen 4              { align1 WE_normal 1Q };
    (30      )  mov(8)          g99<1>UD        0x0UD                           { align1 WE_all 1Q };
    (32      )  mov(1)          g99.7<1>UD      0xffffUD                        { align1 WE_all };
    (34      )  mov(8)          g103<1>UD       0x0UD                           { align1 WE_all 1Q };
    (36      )  (+f0) mov(8)    g100<1>UD       g117<8,8,1>UD                   { align1 WE_normal 1Q };
    (38      )  (+f0) mov(8)    g101<1>UD       g114<8,8,1>UD                   { align1 WE_normal 1Q };
    (40      )  (+f0) mov(8)    g104<1>UD       g108<8,8,1>UD                   { align1 WE_normal 1Q };
    (42      )  (+f0) mov(8)    g105<1>UD       g109<8,8,1>UD                   { align1 WE_normal 1Q };
    (44      )  (+f0) mov(8)    g106<1>UD       g110<8,8,1>UD                   { align1 WE_normal 1Q };
    (46      )  (+f0) mov(8)    g107<1>UD       g111<8,8,1>UD                   { align1 WE_normal 1Q };
    (48      )  (+f0) send(8)   null            g99<8,8,1>UD
                renderunsupported target 5 mlen 9 rlen 0        { align1 WE_normal 1Q };
    (50      )  (+f0) mov(8)    g1<1>UW         0x1UW                           { align1 WE_normal 1Q };
  L1:
    (52      )  mov(8)          g112<1>UD       g0<8,8,1>UD                     { align1 WE_all 1Q };
    (54      )  send(8)         null            g112<8,8,1>UD
                thread_spawnerunsupported target 7 mlen 1 rlen 0 { align1 WE_normal 1Q EOT };

With this patch, we can optimize it as below:

    (26      )  (+f0) mov(8)    g106<1>F        g111<8,8,1>D                    { align1 WE_normal 1Q };
    (28      )  (+f0) send(8)   g114<1>UD       g105<8,8,1>F
                sampler (3, 0, 0, 1) mlen 2 rlen 4              { align1 WE_normal 1Q };
    (30      )  mov(8)          g109<1>UD       0x0UD                           { align1 WE_all 1Q };
    (32      )  mov(1)          g109.7<1>UD     0xffffUD                        { align1 WE_all };
    (34      )  mov(8)          g113<1>UD       0x0UD                           { align1 WE_all 1Q };
    (36      )  (+f0) send(8)   null            g109<8,8,1>UD
                renderunsupported target 5 mlen 9 rlen 0        { align1 WE_normal 1Q };
    (38      )  (+f0) mov(8)    g1<1>UW         0x1UW                           { align1 WE_normal 1Q };
  L1:
    (40      )  mov(8)          g112<1>UD       g0<8,8,1>UD                     { align1 WE_all 1Q };
    (42      )  send(8)         null            g112<8,8,1>UD
                thread_spawnerunsupported target 7 mlen 1 rlen 0 { align1 WE_normal 1Q EOT };

This patch could save about 8 instructions per write_image.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Tue, 18 Feb 2014 06:40:59 +0000 (14:40 +0800)]
GBE: optimize sample instruction.

The U,V,W registers could be allocated to a selection vector directly.
Then we can save some MOV instructions for the read_image functions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
xiuli pan [Fri, 21 Feb 2014 08:25:20 +0000 (16:25 +0800)]
Change the order of the code

Fix the 66K problem in the OpenCV testing.
The bug was caused by incorrectly ordered
code, which made beignet calculate the local
size of the whole kernel file. Now the OpenCV
test passes.

Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
Yang Rong [Fri, 21 Feb 2014 08:54:39 +0000 (16:54 +0800)]
Fix a long DIV/REM hang.

Long DIV/REM contains a JMPI whose predication is any16/any8, so the
predicate register MUST be ANDed with the emask; otherwise the loop may never end.
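A small C model of why the AND matters, assuming 16 lanes represented as bits of a `uint16_t`: flag bits of disabled lanes may hold stale values, so an unmasked "any" test can keep the loop branch taken after every enabled lane has finished. Function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* any16 over the raw flag register: stale bits of disabled lanes count. */
static bool any16_raw(uint16_t flag)
{
    return flag != 0;
}

/* any16 restricted to enabled lanes: only lanes set in emask count. */
static bool any16_masked(uint16_t flag, uint16_t emask)
{
    return (flag & emask) != 0;
}
```

For example, with lanes 0-7 enabled (`emask = 0x00ff`) and all of them finished, stale bits `0xff00` still make `any16_raw` true, while `any16_masked` correctly reports false and lets the loop exit.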

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Lv Meng [Tue, 14 Jan 2014 03:04:57 +0000 (11:04 +0800)]
GBE: improve precision of rootn

Signed-off-by: Lv Meng <meng.lv@intel.com>
Yi Sun [Thu, 20 Feb 2014 01:32:32 +0000 (09:32 +0800)]
Remove some unreasonable input values for rootn

The man page for pow() contains the following description:
"If x is a finite value less than 0,
and y is a finite noninteger,
a domain error occurs, and a NaN is returned."
That means we can't calculate rootn on the CPU as pow(x, 1.0/y), the definition given in the OpenCL spec.
E.g. when y=3 and x=-8, rootn should return -2, but pow(x, 1.0/y) returns a NaN.
I didn't find an nth-root math function in glibc.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Yi Sun [Wed, 19 Feb 2014 06:12:03 +0000 (14:12 +0800)]
utests:add subnormal check by fpclassify.
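For readers unfamiliar with the check, a minimal C99 sketch of classifying a float result as subnormal with `fpclassify`; the helper name is hypothetical and the test values assume flush-to-zero is not enabled on the host:

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True when x is a subnormal (denormal) float. */
static bool is_subnormal_f(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```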

Signed-off-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Shui yangwei <yangweix.shui@intel.com>
Yi Sun [Wed, 19 Feb 2014 06:04:52 +0000 (14:04 +0800)]
Change %.20f to %e.

This can make the error information more readable.
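A short C illustration of the difference, with a hypothetical helper: for a tiny error such as `1.5e-10`, `%.20f` buries the significant digits behind nine leading zeros after the decimal point, while `%e` prints `1.500000e-10`:

```c
#include <stdio.h>

/* Format the same error value with fixed and scientific notation. */
static void format_error(double err, char *fixed, char *sci, size_t cap)
{
    snprintf(fixed, cap, "%.20f", err);
    snprintf(sci, cap, "%e", err);
}
```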

Signed-off-by: Yi Sun <yi.sun@intel.com>
Guo Yejun [Mon, 17 Feb 2014 21:30:27 +0000 (05:30 +0800)]
GBE: add param to switch the behavior of math func

Add OCL_STRICT_CONFORMANCE to switch the behavior of math functions.
If it is 1, the functions are high precision with a performance drop; if
it is 0, a fast path with good-enough precision is selected.

This change adds the code basis, with 'sin' and 'cos' implemented
as examples; support for other math functions will be added later.
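A hedged C sketch of reading the switch from the environment. Only the variable name comes from the commit; treating an unset or empty variable as strict (1) is an assumption of this sketch, not documented beignet behaviour, and both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Interpret a value of OCL_STRICT_CONFORMANCE: "0" selects the fast path,
 * any other number the strict path. Unset/empty -> strict (assumption). */
static bool strict_from_value(const char *v)
{
    if (v == NULL || *v == '\0')
        return true;
    return atoi(v) != 0;
}

static bool strict_math_enabled(void)
{
    return strict_from_value(getenv("OCL_STRICT_CONFORMANCE"));
}
```

Splitting the parsing from `getenv` keeps the policy testable without touching the process environment.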

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Yi Sun [Mon, 17 Feb 2014 03:32:47 +0000 (11:32 +0800)]
utests: Remove test cases for function 'tgamma' 'erf' and 'erfc'

Since OpenCL conformance doesn't cover these functions at the moment,
we remove them temporarily.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Ruiling Song [Mon, 17 Feb 2014 08:54:20 +0000 (16:54 +0800)]
Improve precision of sinpi/cospi

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Boqun Feng [Mon, 17 Feb 2014 01:49:26 +0000 (09:49 +0800)]
GBE: fix terminfo library linkage

In some distros, the terminal libraries are divided into two
libraries, tinfo and ncurses; however, other distros
have only one single ncurses library with
all functions.
In order to link the proper terminal library for LLVM, the find_library
command in cmake can be used. In this patch, tinfo is preferred,
so that it won't affect linkage behavior in distros with tinfo.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Boqun Feng [Sat, 15 Feb 2014 06:52:44 +0000 (14:52 +0800)]
utests: define python interpreter via cmake variable

The reason for this fix is in commit
5b64170ef5e3e78d038186fb1132b11a8fec308e.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 14 Feb 2014 08:11:36 +0000 (16:11 +0800)]
CL: make the scratch size as a device resource attribute.

The scratch size is much like the local memory size,
which is device-dependent information.

This patch puts the scratch memory size into the device attribute
structure. When a kernel needs more than the maximum scratch
memory, we now return an out-of-resources error rather than triggering
an assertion.
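A minimal C sketch of the check, assuming a hypothetical device-limits struct; the error constant is defined locally with the same value as OpenCL's `CL_OUT_OF_RESOURCES` (-5) so the sketch does not depend on CL headers:

```c
#include <stddef.h>

#define SKETCH_CL_SUCCESS            0
#define SKETCH_CL_OUT_OF_RESOURCES (-5)   /* same value as CL_OUT_OF_RESOURCES */

/* Hypothetical per-device limits: scratch size sits next to local memory. */
typedef struct {
    size_t local_mem_size;
    size_t scratch_mem_size;
} device_limits_t;

/* Report an error code instead of asserting when a kernel needs more
 * scratch space than the device provides. */
static int check_kernel_scratch(const device_limits_t *dev, size_t needed)
{
    if (needed > dev->scratch_mem_size)
        return SKETCH_CL_OUT_OF_RESOURCES;
    return SKETCH_CL_SUCCESS;
}
```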

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
Guo Yejun [Thu, 13 Feb 2014 03:59:48 +0000 (11:59 +0800)]
fix typo: blobTempName is assigned but not used

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 14 Feb 2014 07:04:26 +0000 (15:04 +0800)]
GBE: Support 64Bit register spill.

Now we support DWORD & QWORD register spill/fill.

v2:
  only increase poolOffset by 1 when we meet a QWord register and poolOffset is 1.

v3:
  allocate the reserved register pool in a unified way for src and dst registers.
  when spilling a qword register, the payload register should be retyped as dword following the bottom/top logic.
  put a limit on the scratch space memory size.

v4:
  fix a typo.
  increase the reserved registers from 6 to 8 for some complex instructions.
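The bottom/top handling of a QWORD spill can be modelled in C as two DWORD stores and loads. The scratch array, slot layout, and function names are hypothetical; the sketch only shows the value split, not the Gen scratch messages:

```c
#include <stdint.h>

/* Spill a QWORD as two DWORDs: bottom (low 32 bits) then top (high 32 bits). */
static void spill_qword(uint32_t *scratch, int slot, uint64_t v)
{
    scratch[slot]     = (uint32_t)v;          /* bottom half */
    scratch[slot + 1] = (uint32_t)(v >> 32);  /* top half */
}

/* Fill: reassemble the QWORD from its bottom and top DWORDs. */
static uint64_t fill_qword(const uint32_t *scratch, int slot)
{
    return (uint64_t)scratch[slot] | ((uint64_t)scratch[slot + 1] << 32);
}
```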

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Igor Gnatenko [Thu, 13 Feb 2014 07:16:35 +0000 (11:16 +0400)]
cmake: Fix linking with LLVM/Terminfo

DEBUG: [  9%] Building CXX object backend/src/CMakeFiles/gbe_bin_generater.dir/gbe_bin_generater.cpp.o
DEBUG: Linking CXX executable gbe_bin_generater
DEBUG: /usr/lib64/llvm/libLLVMSupport.a(Process.o): In function `llvm::sys::Process::FileDescriptorHasColors(int)':
DEBUG: (.text+0x717): undefined reference to `setupterm'
DEBUG: /usr/lib64/llvm/libLLVMSupport.a(Process.o): In function `llvm::sys::Process::FileDescriptorHasColors(int)':
DEBUG: (.text+0x727): undefined reference to `tigetnum'
DEBUG: /usr/lib64/llvm/libLLVMSupport.a(Process.o): In function `llvm::sys::Process::FileDescriptorHasColors(int)':
DEBUG: (.text+0x730): undefined reference to `set_curterm'
DEBUG: /usr/lib64/llvm/libLLVMSupport.a(Process.o): In function `llvm::sys::Process::FileDescriptorHasColors(int)':
DEBUG: (.text+0x738): undefined reference to `del_curterm'

Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 10 Feb 2014 08:28:37 +0000 (16:28 +0800)]
Bump to version 0.8.0.

This version brings many improvements compared to the last released version, 0.3,
so we decided to bump the version to 0.8.0 directly. Two steps are left before
1.0.0: one is performance optimization and the other is supporting OpenCL 1.2
by default.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Wed, 12 Feb 2014 07:20:45 +0000 (15:20 +0800)]
Docs: fix some markdown errors and add some new info.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Yang Rong [Wed, 12 Feb 2014 15:41:26 +0000 (23:41 +0800)]
Fix build errors in llvm3.5 only system.

Some header files are missing if only llvm3.5 is installed. If a previous llvm was installed, these
header files remain in the system even after it is uninstalled, so the problem can't be triggered.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Tue, 11 Feb 2014 09:51:50 +0000 (17:51 +0800)]
Fix the cmake problem in FindLLVM.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Mon, 10 Feb 2014 08:28:36 +0000 (16:28 +0800)]
Update document for LLVM/Clang 3.5.

Also change the README.md to link to Beignet.mdw rather than to point to the wiki page.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Sat, 8 Feb 2014 06:12:03 +0000 (14:12 +0800)]
GBE: fixed the unsafe tmpnam_r.

Use mkstemps instead.
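A C sketch of the replacement pattern. `mkstemps()` is a glibc/BSD extension that creates and opens the file atomically, keeping the last `suffixlen` characters of the template as a suffix; `tmpnam_r()` only returns a name that another process could claim before it is opened. The template path and helper name are hypothetical:

```c
#define _GNU_SOURCE          /* mkstemps() is a GNU/BSD extension */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create and open a unique temporary file; the template is updated in
 * place with the chosen name. Returns the fd, or -1 on failure. */
static int make_temp_file(char *path_template, int suffixlen)
{
    int fd = mkstemps(path_template, suffixlen);
    if (fd < 0)
        perror("mkstemps");
    return fd;
}
```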

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Sat, 8 Feb 2014 06:12:02 +0000 (14:12 +0800)]
Silent compilation warning in sampler functions.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Zhigang Gong [Sat, 8 Feb 2014 03:16:43 +0000 (11:16 +0800)]
Add clang/LLVM 3.5svn support.

Clang/llvm 3.3 has some minor bugs, such as vector ++/--, which
were fixed in 3.4. But the 3.4 version introduces more severe OCL bugs,
listed below:
http://llvm.org/bugs/show_bug.cgi?id=18119
http://llvm.org/bugs/show_bug.cgi?id=18120

It seems that the community will only fix these bugs in the ToT version
rather than in the llvm 3.4 branch. I think we'd better enable clang/llvm
3.5 in beignet. Currently, 18120 is fixed in ToT, but 18119 still
breaks us. When 18119 gets fixed, I will switch the preferred version to
3.5.

Please note that when you build clang/llvm 3.5, you need to enable
C++11 to make it compatible with beignet:

--enable-cxx11

v2:
fix the llvm3.4 issue.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Jon Nordby [Thu, 6 Feb 2014 18:50:59 +0000 (19:50 +0100)]
Make build compatible with Python 2.6

Implicit numbers for format specifiers "{}" can only be used on Py2.7+,
and Py2.6 is still in use on, for instance, CentOS 6.5 and similar distributions.

Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Sun, 26 Jan 2014 10:16:12 +0000 (18:16 +0800)]
Fix the problem by kernel file open in utest

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Zhigang Gong [Mon, 20 Jan 2014 10:44:03 +0000 (18:44 +0800)]
Update documents.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Mon, 27 Jan 2014 01:26:21 +0000 (09:26 +0800)]
GBE: fixed the out-of-range JMPI.

When a conditional jump distance is out of the S15 range [-32768, 32767],
we need to implement the jump as an inverted jmp followed by an add ip, ip, distance.
This is a little hacky, as we need to change the nop instruction
to an add instruction manually.
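The range test that decides between the short and the long form can be sketched in C; the bounds are the ones quoted in the commit, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a branch distance fits the jump field range [-32768, 32767].
 * Otherwise the branch must be emitted as an inverted jmp that skips an
 * "add ip, ip, distance". */
static bool jump_fits_s16(int32_t distance)
{
    return distance >= INT16_MIN && distance <= INT16_MAX;
}
```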

An alternative optimization is to insert the
ADD instruction on demand, but that needs extra analysis
of all the branch instructions, and the distance of every
branch instruction whose range (start point to end point) contains
the inserted instruction would need to be adjusted.

After this patch, luxrender's slg4 renders the scene "alloy"
correctly.

v2:
fix the unconditional branch too.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Yang Rong [Sun, 26 Jan 2014 08:36:58 +0000 (16:36 +0800)]
When local_work_size is null, try to choose a local_work_size.

After fixing all the failures found when local_work_size is not 1, re-enable it to
improve performance.
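The commit does not describe the selection heuristic, so here is one plausible one, sketched in C and not necessarily beignet's: take the largest local size no larger than the device maximum that evenly divides the global size, since OpenCL 1.x requires `global_work_size % local_work_size == 0`. The function name is hypothetical:

```c
#include <stddef.h>

/* Largest l <= max_local that divides global; falls back to 1. */
static size_t choose_local_size(size_t global, size_t max_local)
{
    for (size_t l = max_local < global ? max_local : global; l > 1; l--) {
        if (global % l == 0)
            return l;
    }
    return 1;
}
```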

V2: refine to skip some useless loops.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 28 Jan 2014 03:03:15 +0000 (11:03 +0800)]
Multiple register's hstride in suboffset.

When a register's hstride is not 0 or 1, suboffset gets the wrong element.
Also change some offsets that were already multiplied by hstride in hard-coded form.
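A C sketch of the corrected offset computation, assuming hstride is measured in elements and, as the commit wording implies, that an hstride of 0 (scalar region) or 1 leaves the sub-offset unscaled. The function name is hypothetical:

```c
/* Byte offset of element `sub` within a register region. */
static unsigned suboffset_bytes(unsigned sub, unsigned hstride, unsigned type_size)
{
    unsigned step = (hstride == 0) ? 1 : hstride; /* scalar region: no scaling */
    return sub * step * type_size;
}
```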

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Sun, 26 Jan 2014 06:07:14 +0000 (14:07 +0800)]
GBE: Implement complete register spill policy.

This patch implements a complete register spill policy.

When it needs to spill a register, we always choose the
register which is in the spill candidate map and has the
maximum endpoint. One trick I used here is to merge both
the register's endpoint value and the register itself
into one single key. Then I can use one map to implement a
descending-order map according to its value (the instruction
endpoint value). This patch supports spilling both vectors
and non-vectors.
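The combined-key trick can be sketched in C: with the endpoint in the high 32 bits and the register ID in the low 32 bits, ordering keys orders candidates by endpoint first, and the register is recoverable from the key. A linear scan stands in for the descending map here; all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Endpoint in the high half, register ID in the low half. */
static uint64_t spill_key(uint32_t endpoint, uint32_t reg)
{
    return ((uint64_t)endpoint << 32) | reg;
}

static uint32_t key_reg(uint64_t key)
{
    return (uint32_t)key;
}

/* Return the register with the maximum endpoint (n must be > 0); a
 * descending ordered map would hold this candidate as its first entry. */
static uint32_t pick_spill(const uint64_t *keys, size_t n)
{
    uint64_t best = keys[0];
    for (size_t i = 1; i < n; i++)
        if (keys[i] > best)
            best = keys[i];
    return key_reg(best);
}
```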

I also move the scratch memory allocation from
instruction selection to register allocation. We may later
use the internal interval information to reduce scratch
memory consumption.

Another big change is that I don't perform the real
spill on the fly. Instead, I move the real spill to the end of
register allocation, and then spill all the registers
in the spillSet in one pass. This has the following advantages:
1. It only needs to loop over all instructions once.
2. When spilling one instruction, we know all the registers' status.
   Then it's easy to know the correct scratch id for each register.
   Actually, the previous implementation has a bug here.

The last part avoids a restriction of the spill instructions.
As Ruiling pointed out, the spill instructions (scratch read/write)
don't support predication correctly for non-DW data types.

This patch avoids spilling any register of a non-supported type.

After this patch, both luxrender and opencv examples work fine on
my machine.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Zhigang Gong [Fri, 24 Jan 2014 09:31:29 +0000 (17:31 +0800)]
GBE: prepare to optimize the register spilling policy.

It's better to choose the proper register to spill
rather than always spilling the current register. This patch
is a preparation for a better spilling policy.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Zhigang Gong [Fri, 24 Jan 2014 04:33:10 +0000 (12:33 +0800)]
GBE: refine register allocation output.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Junyan He [Tue, 14 Jan 2014 08:43:42 +0000 (16:43 +0800)]
Add the device id for haswell GT.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Junyan He [Wed, 22 Jan 2014 06:02:30 +0000 (14:02 +0800)]
Fix the bug in removeLOADIs function.

The logic for replacing the dst of the instruction
wrongly used the src number and getSrc. Fix this problem.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 23 Jan 2014 06:25:55 +0000 (14:25 +0800)]
GBE: allow the bool registers to be expired.

After the extra liveness analysis added in the previous patch, we can allow bool
registers to expire now.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Thu, 23 Jan 2014 06:15:05 +0000 (14:15 +0800)]
GBE: Implement an extra liveness analysis for the Gen backend.

  Consider the following scenario: %100's normal liveness starts from Ln-1's
  position. In normal analysis, Ln-1 is not Ln's predecessor, thus the liveness
  of %100 will be passed to Ln but will not be passed to L0.

  But we are running on a multi-lane vector machine with predication.
  The unconditional BR in Ln-1 may be removed, and execution enters Ln with a subset of
  the inverse of Ln-1's predication. For example, when running Ln-1 the active lanes
  are 0-7, then at Ln the active lanes are 8-15. Then at the end of Ln, a subset of 8-15
  will jump to L0. If a register %10 is allocated the same GRF as %100, given the fact
  that their normal liveness doesn't overlap, a subset of lanes 8-15 will be
  modified. If %10 and %100 are the same vector data type, then we are fine. But if
  %100 is a float vector and %10 is a bool or short vector, then we hit a bug here.

L0:
  ...
  %10 = 5
  ...
Ln-1:
  %100 = 2
  BR Ln+1

Ln:
  ...
  BR(%xxx) L0

Ln+1:
  %101 = %100 + 2;
  ...

  The solution to this issue is to build an extra liveness analysis. We start with
  the BBs with a backward jump, pass all their liveOut registers as extra liveIn
  of the current BB, and then forward this extra liveIn to all the blocks. This is very similar
  to the normal liveness analysis, just in the reverse direction.
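A simplified C model of the analysis described above, not beignet's implementation: register sets are bits of a `uint32_t`, blocks are indexed in layout order, and a successor whose index is not greater than the block's own index counts as a backward jump. Extra liveIn is seeded from liveOut of such blocks and then propagated forward along CFG edges until a fixed point. All names and the representation are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCC 2

typedef struct {
    int nsucc;
    int succ[MAX_SUCC];
    uint32_t live_out;      /* result of the normal (backward) analysis */
} bb_t;

static void extra_liveness(const bb_t *bb, int n, uint32_t *extra_in)
{
    /* Seed: blocks with a backward jump get their liveOut as extra liveIn. */
    for (int b = 0; b < n; b++) {
        extra_in[b] = 0;
        for (int s = 0; s < bb[b].nsucc; s++)
            if (bb[b].succ[s] <= b)
                extra_in[b] |= bb[b].live_out;
    }
    /* Forward propagation to a fixed point (reverse of normal liveness). */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < n; b++)
            for (int s = 0; s < bb[b].nsucc; s++) {
                int t = bb[b].succ[s];
                uint32_t merged = extra_in[t] | extra_in[b];
                if (merged != extra_in[t]) {
                    extra_in[t] = merged;
                    changed = true;
                }
            }
    }
}
```

Applied to the example (blocks L0, Ln-1, Ln, Ln+1 with %100 as bit 0), %100 becomes extra-live in L0, so %10 can no longer share its GRF.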

  Thanks to Yang Rong, who found this bug.

v2:
  Don't remove livein when initializing the extra livein.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Wed, 22 Jan 2014 02:32:08 +0000 (10:32 +0800)]
GBE: increase the disassembly output's readability.

Add label information and the instruction address
prefix. Make the address consistent with fulsim.
And also make the register allocation output a little
bit prettier.

Now the disassembly output is as below:
compiler_ceil's disassemble begin:
  L0:
    (0       )  mov(1)          f0<1>UW         0x0UW                           { align1 WE_all };
    ....
    (32      )  (+f0) mov(16)   g1<1>UW         0x1UW                           { align1 WE_normal 1H };
  L1:
    (34      )  mov(16)         g112<1>UD       g0<8,8,1>UD                     { align1 WE_all 1H };
    ...
compiler_ceil's disassemble end.

The register allocation output is as below:
%26      g2  .8   4  B  [0        -> 0       ]
%28      g2  .12  4  B  [0        -> 6       ]
%29      g2  .16  4  B  [0        -> 9       ]
%30      g126.0   64 B  [2        -> 3       ]
%31      g124.0   64 B  [3        -> 4       ]

Please note that the register allocation output is not correct
when the register is a pure scalar (bool) register allocated
at the backend instruction selection stage. To be fixed.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Tue, 21 Jan 2014 05:15:39 +0000 (13:15 +0800)]
GBE: fixed a bug in sample instruction.

The sample instruction now has only 3 source operands, not 4.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Tue, 21 Jan 2014 04:13:04 +0000 (12:13 +0800)]
GBE: fix some incorrect gen ir output messages.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Zhigang Gong [Tue, 21 Jan 2014 00:34:29 +0000 (08:34 +0800)]
GBE: don't allocate grf for those bools which map to flag.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
Zhigang Gong [Mon, 20 Jan 2014 09:14:48 +0000 (17:14 +0800)]
build: work around an old version cmake bug.

On Fedora Core 15 with cmake 2.8.4, Yi experienced a build error.
It turns out that cmake may handle file paths with double
slashes incorrectly when the file is on a target's dependency list and
is also an output file name of a custom command.

This small patch could work around that issue.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Sun, Yi" <yi.sun@intel.com>
Guo Yejun [Mon, 20 Jan 2014 00:38:23 +0000 (08:38 +0800)]
GBE: use native exp instruction when enough precision

For input data where the native exp instruction gives enough precision, use it;
otherwise, use the software path to emulate the exp function.

Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Junyan He [Mon, 20 Jan 2014 03:28:43 +0000 (11:28 +0800)]
Fix the bug of multi deleting of load instruction in lowering

When a load instruction has multiple destination values, the
buildConstantPush function replaces that load once per destination,
i.e. several times, which can cause problems.
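A minimal C sketch of the "replace only once" idea, assuming a
heavily simplified, hypothetical model of a multi-destination load
(struct load_insn, replace_load_once and lower_constant_push are
illustrative names, not the real buildConstantPush code). The guard
flag keeps the per-destination loop from rewriting the same load again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a load with several destination values
 * (e.g. a vector load). Not the real GBE IR. */
struct load_insn {
    size_t dst_num;     /* number of destination values */
    bool replaced;      /* set once the load has been rewritten */
    int replace_count;  /* how many times it was actually rewritten */
};

/* Rewrite the load at most once, however many destinations use it. */
static void replace_load_once(struct load_insn *load)
{
    if (load->replaced)
        return;
    load->replaced = true;
    load->replace_count++;
}

/* Per-destination loop: without the guard above, every iteration would
 * replace (and delete) the same load again. */
static void lower_constant_push(struct load_insn *load)
{
    for (size_t i = 0; i < load->dst_num; ++i) {
        /* ... map destination i to its pushed constant ... */
        replace_load_once(load);
    }
}
```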

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd utest compiler_private_data_overflow
Yongjia Zhang [Fri, 17 Jan 2014 08:20:02 +0000 (16:20 +0800)]
Add utest compiler_private_data_overflow

utests: compiler_private_data_overflow is designed to need a stack
larger than 1KB. It fails with older beignet, which allocates a 1KB
stack regardless of the kernel's actual stack usage.

Signed-off-by: Yongjia Zhang <zhang_yong_jia@126.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd some native functions vector proto.
Yang Rong [Fri, 17 Jan 2014 08:22:56 +0000 (16:22 +0800)]
Add some native functions vector proto.

Native functions used to be defined as the normal functions, so they
did not need vector prototypes. Now only native_exp2 and native_sqrt are
still defined as exp2 and sqrt, so enable the vector prototypes for the others.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRemove builtin function fma from utest_math_gen.py.
Yi Sun [Thu, 9 Jan 2014 07:56:04 +0000 (15:56 +0800)]
Remove builtin function fma from utest_math_gen.py.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoutests: Put all the generated kernel files to .gitignore at runtime.
Zhigang Gong [Tue, 14 Jan 2014 03:10:00 +0000 (11:10 +0800)]
utests: Put all the generated kernel files to .gitignore at runtime.

There are so many generated kernel files that git status gets
cluttered when checking modified and newly added files. This patch
adds all of them to .gitignore to make things easier.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fixed the hacky code of 3D image read/write.
Zhigang Gong [Fri, 17 Jan 2014 05:05:20 +0000 (13:05 +0800)]
GBE: fixed the hacky code of 3D image read/write.

The previous implementation used a magic virtual register (0) to
indicate a 2D read/write. This is too hacky and may hide
bugs in the future. Now fix it without creating any dummy virtual
register.

Also clean up some useless enums.
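A minimal C sketch of why a sentinel register is fragile, assuming a
hypothetical operand layout (struct image_access, enum image_dim and
both predicates are illustrative, not GBE's actual types). If the z
coordinate of a genuine 3D access happens to live in virtual register 0,
the magic-value check misreads it as 2D; carrying the dimension
explicitly avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand layout for an image read/write. */
enum image_dim { IMAGE_2D = 2, IMAGE_3D = 3 };

struct image_access {
    enum image_dim dim;     /* explicit dimension */
    uint32_t coord_reg[3];  /* only the first `dim` entries are meaningful */
};

/* Old, fragile convention: z register 0 means "no z coordinate". */
static int is_3d_magic(uint32_t z_reg)
{
    return z_reg != 0;
}

/* Explicit convention: the dimension is part of the instruction. */
static int is_3d_explicit(const struct image_access *a)
{
    return a->dim == IMAGE_3D;
}
```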

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fix the hack code of sampler offset handling.
Zhigang Gong [Fri, 17 Jan 2014 04:26:47 +0000 (12:26 +0800)]
GBE: fix the hack code of sampler offset handling.

The previous implementation used a virtual register to pass the offset
to the back end, which is too hacky. Now fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fixed the stack allocation.
Zhigang Gong [Fri, 17 Jan 2014 02:42:25 +0000 (10:42 +0800)]
GBE: fixed the stack allocation.

Yongjia wrote a test case that hit the previous 1KB limit. Looking at
the stack pointer related code, I found that the implementation did not
comply with the OpenCL spec.

According to OpenCL spec, section 6.9:

d. Variable length arrays and structures with flexible (or unsized) arrays are not supported.

Thus all local variable sizes are compile-time constants, so the stack
pointer can be manipulated more simply: there is no need to compute
alignment at runtime, and we can get the exact stack size and then
allocate the stack on demand. A limit of 64KB is still kept.

v2:
don't add the step if the step is zero.
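Because every private object has a constant size, the stack size can
be computed once at compile time. A minimal C sketch, assuming a
hypothetical list of (size, power-of-two alignment) pairs for the
kernel's private objects; the 64KB cap comes from the text above,
while skipping zero-sized entries is only one possible reading of the
v2 note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_LIMIT (64u * 1024u)  /* cap stated in the commit message */

/* Hypothetical descriptor of one private (stack) object. */
struct alloca_desc {
    size_t size;   /* constant: OpenCL C has no VLAs (spec 6.9.d) */
    size_t align;  /* power of two */
};

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Lay out the objects in order and return the exact stack size in *out.
 * Returns false if the result exceeds STACK_LIMIT. */
static bool compute_stack_size(const struct alloca_desc *v, size_t n,
                               size_t *out)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; ++i) {
        if (v[i].size == 0)
            continue;  /* assumed reading of v2: no step for size 0 */
        offset = align_up(offset, v[i].align);
        offset += v[i].size;
    }
    *out = offset;
    return offset <= STACK_LIMIT;
}
```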

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: move the image info register allocation to GEN IR stage.
Zhigang Gong [Thu, 16 Jan 2014 03:56:15 +0000 (11:56 +0800)]
GBE: move the image info register allocation to GEN IR stage.

If we allocate the image info registers at the code generation stage,
they miss the liveness calculation. There is thus a risk that some
image info register's liveness data is incorrect, which may cause
very subtle bugs. Now fix it.
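To illustrate why a register created after liveness analysis is
dangerous, here is a minimal C sketch of a linear live-interval
computation over a toy instruction list. The one-dst/one-src format and
MAX_REGS are hypothetical simplifications, not GEN IR. A register that
is introduced after this pass runs has no interval at all, so an
allocator relying on these intervals could hand its GRF to another
value while it is still live.

```c
#include <assert.h>

#define MAX_REGS 8

/* Toy instruction: one destination and one source register; -1 = none. */
struct insn {
    int dst;
    int src;
};

/* Record, for every register, the index of its first and last use.
 * Registers that never appear keep [-1, -1]: no liveness information. */
static void compute_intervals(const struct insn *code, int n,
                              int first[MAX_REGS], int last[MAX_REGS])
{
    for (int r = 0; r < MAX_REGS; ++r) {
        first[r] = -1;
        last[r] = -1;
    }
    for (int i = 0; i < n; ++i) {
        const int regs[2] = { code[i].src, code[i].dst };
        for (int k = 0; k < 2; ++k) {
            int r = regs[k];
            if (r < 0 || r >= MAX_REGS)
                continue;
            if (first[r] < 0)
                first[r] = i;
            last[r] = i;
        }
    }
}
```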

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: move the image allocation to the GEN IR stage.
Zhigang Gong [Thu, 16 Jan 2014 02:16:36 +0000 (10:16 +0800)]
GBE: move the image allocation to the GEN IR stage.

Image registers should be translated to constants at the GEN IR
stage so that the register allocator does not allocate an
unnecessary register for the image id.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE/Sampler: Simplfy the sampler handling.
Zhigang Gong [Wed, 15 Jan 2014 11:50:55 +0000 (19:50 +0800)]
GBE/Sampler: Simplfy the sampler handling.

Move the sampler allocation to the Gen stage. Then we no longer need to
maintain a fake key register, which could also confuse the later
register allocation phase.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoGBE: fixed a register liveness bug for getsamplerinfo instrution.
Zhigang Gong [Wed, 15 Jan 2014 07:26:07 +0000 (15:26 +0800)]
GBE: fixed a register liveness bug for getsamplerinfo instrution.

The previous implementation inserted the ocl::samplerinfo into the
instruction after the liveness calculation stage, so the liveness
information for that register was incorrect and could make some
test cases fail. Now fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agotypo: bsically to basically
Igor Gnatenko [Mon, 13 Jan 2014 21:31:39 +0000 (01:31 +0400)]
typo: bsically to basically

Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agocmake: use libdir macros
Igor Gnatenko [Thu, 16 Jan 2014 07:19:53 +0000 (11:19 +0400)]
cmake: use libdir macros

Don't hardcode ${prefix}/lib; it is better to let the maintainer choose where to install libs.
We now use ${LIB_INSTALL_DIR}, which by default points to
${CMAKE_INSTALL_PREFIX}/lib, but a maintainer can override it with
-DLIB_INSTALL_DIR=/usr/lib64 or similar.
Let's use libdir macros.

Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoChange compiler_function_argument3 to cover llvm.memcpy.
Yang Rong [Wed, 15 Jan 2014 08:31:06 +0000 (16:31 +0800)]
Change compiler_function_argument3 to cover llvm.memcpy.

We found that clang emits llvm.memcpy when assigning one struct to another
if sizeof(struct) > 64. Add such an assignment to produce llvm.memcpy.
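A minimal host-side C illustration of the pattern the updated test
relies on: a struct larger than 64 bytes copied by plain assignment.
The struct layout is hypothetical (17 ints, 68 bytes on common ABIs),
and whether clang actually emits llvm.memcpy here depends on the clang
version and target; the 64-byte threshold is the observation reported
above, not a guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct: 17 ints, i.e. 68 bytes with 4-byte int. */
struct big {
    int v[17];
};

static void copy_big(struct big *dst, const struct big *src)
{
    /* Plain struct assignment; per the commit message, clang may lower
     * this to a call to llvm.memcpy when sizeof(struct big) > 64. */
    *dst = *src;
}
```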

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd llvm instrinsic function llvm.memset and llvm.memcpy support.
Yang Rong [Thu, 16 Jan 2014 07:38:30 +0000 (15:38 +0800)]
Add llvm instrinsic function llvm.memset and llvm.memcpy support.

SPIR 1.2 requires llvm.memcpy support, and LLVM sometimes emits llvm.memset.
So add a pass that lowers these two intrinsic functions and then inlines them.

The intrinsic lowering pass finds all llvm.memset and llvm.memcpy calls and replaces
them with calls to __gen_memset_x and __gen_memcpy_xx, where x and xx encode the address space(s).

This pass runs after clang, and by then unused functions appear to have been stripped, so
the __gen_memset_x and __gen_memcpy_xx functions are implemented in the precompiled module
and linked in.
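A minimal C sketch of what the lowered helpers can look like: plain
byte loops with no library calls, so they can live in a precompiled
module and be inlined. The names gen_memcpy_sketch and
gen_memset_sketch are hypothetical stand-ins for __gen_memcpy_xx and
__gen_memset_x; C has no OpenCL address spaces, so only one variant of
each is shown instead of one per address-space combination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __gen_memcpy_xx: byte-wise copy. */
static void gen_memcpy_sketch(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Hypothetical stand-in for __gen_memset_x: byte-wise fill. */
static void gen_memset_sketch(unsigned char *dst, unsigned char val, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = val;
}
```

Byte loops are the simplest correct lowering; a real implementation
may prefer wider accesses when size and alignment allow it.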

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoUse OCL_USE_PCH to control the using pch or not.
Yang Rong [Wed, 15 Jan 2014 08:31:04 +0000 (16:31 +0800)]
Use OCL_USE_PCH to control the using pch or not.

Junyan added the environment variable OCL_USE_PCH, but it was never used.
Enable it.
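A minimal C sketch of reading such a switch from the environment. The
parsing rules are assumptions for illustration only (unset or empty
means "use the PCH", "0" disables it); the actual default and accepted
values in beignet may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed semantics: NULL or "" -> use PCH; "0" -> don't; anything else -> use. */
static bool parse_use_pch(const char *val)
{
    if (val == NULL || *val == '\0')
        return true;
    return strcmp(val, "0") != 0;
}

static bool use_pch_from_env(void)
{
    return parse_use_pch(getenv("OCL_USE_PCH"));
}
```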

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of remquo
Lv Meng [Mon, 13 Jan 2014 05:50:25 +0000 (13:50 +0800)]
GBE: improve precision of remquo

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of hypot
Lv Meng [Mon, 13 Jan 2014 01:17:35 +0000 (09:17 +0800)]
GBE: improve precision of hypot

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of exp10
Lv Meng [Mon, 13 Jan 2014 00:54:02 +0000 (08:54 +0800)]
GBE: improve precision of exp10

Signed-off-by: Lv Meng <meng.lv@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve precision of cbrt
Ruiling Song [Fri, 10 Jan 2014 05:39:43 +0000 (13:39 +0800)]
GBE: Improve precision of cbrt

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve precision of atan2
Ruiling Song [Fri, 10 Jan 2014 05:39:42 +0000 (13:39 +0800)]
GBE: Improve precision of atan2

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve atan precision
Ruiling Song [Fri, 10 Jan 2014 05:39:41 +0000 (13:39 +0800)]
GBE: Improve atan precision

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: improve precision of tan
Ruiling Song [Fri, 10 Jan 2014 05:39:40 +0000 (13:39 +0800)]
GBE: improve precision of tan

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoGBE: Improve precision of sin/cos/sincos
Ruiling Song [Fri, 10 Jan 2014 05:39:39 +0000 (13:39 +0800)]
GBE: Improve precision of sin/cos/sincos

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoAdd -cl-fast-relaxed-math into incompatible opts and fix the PreprocessorOptions bug
Junyan He [Wed, 15 Jan 2014 07:34:12 +0000 (15:34 +0800)]
Add -cl-fast-relaxed-math into incompatible opts and fix the PreprocessorOptions bug

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoRefine the method to find pch and pcm files.
Zhigang Gong [Thu, 9 Jan 2014 09:36:37 +0000 (17:36 +0800)]
Refine the method to find pch and pcm files.

When compiling user kernels, we need to find the precompiled header
file and the precompiled module file. The previous implementation
searched the build directory first, then the system directory.

This is not elegant for a distro package, which has no reason to
search the build directory. So I changed the default search path to
the system directory only. For developers, I changed the build script
to set a proper environment variable so that the gbe bin generator
and the utests find the local pch and pcm files first.

The only visible change comes after the build: before running the
utests, the user needs to set up the environment first. Just
invoke

. utest/setenv.sh

Then everything works as before. setenv.sh also sets
OCL_KERNEL_PATH, so you no longer need to set it manually.

This patch also updates the documentation.
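A minimal C sketch of the search order described above: an environment
override (as set by utest/setenv.sh for developers) wins, otherwise only
the system directory is used. Both LOCAL_PCH_ENV and SYSTEM_PCH are
hypothetical: the commit names neither the variable nor the install path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; not taken from beignet. */
#define LOCAL_PCH_ENV "OCL_PCH_PATH_SKETCH"
#define SYSTEM_PCH    "/usr/lib/beignet/ocl_stdlib.h.pch"

/* Developer override first, system directory otherwise; the build
 * directory is never searched by default. */
static const char *pick_pch(const char *env_value)
{
    if (env_value != NULL && env_value[0] != '\0')
        return env_value;  /* local pch exported by setenv.sh */
    return SYSTEM_PCH;     /* packaged install */
}

static const char *find_pch(void)
{
    return pick_pch(getenv(LOCAL_PCH_ENV));
}
```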

v2:
add the missing setenv.sh.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: enable relocatable pch files.
Zhigang Gong [Thu, 9 Jan 2014 06:20:29 +0000 (14:20 +0800)]
GBE: enable relocatable pch files.

By default, when including a pch file, clang needs to make sure
the original header file is unchanged. This is impossible when
we want to distribute a pch file to a new system, so we need to
use the relocatable pch feature provided by clang here.
We now create two pch files: a relocatable pch file that is
installed to the system directory, and a local pch file that is
used at build time. We need both pch files because at build time
there is no ocl_stdlib.h in the system directory yet. The local
pch file is used only for beignet's own build and the utests. All
other applications use the installed pch/pcm files.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoCL: prepare to support ICD if the system has ocl-icd..
Zhigang Gong [Wed, 8 Jan 2014 10:57:31 +0000 (18:57 +0800)]
CL: prepare to support ICD if the system has ocl-icd..

v2:
Only install the intel-beignet.icd if the system has ocl-icd
support.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoCL: back port ICD support to 1.1 branch.
Zhigang Gong [Wed, 8 Jan 2014 11:10:53 +0000 (19:10 +0800)]
CL: back port ICD support to 1.1 branch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Tested-by: "Song, Ruiling" <ruiling.song@intel.com>
10 years agoGBE: fixed a long related bug.
Zhigang Gong [Fri, 10 Jan 2014 09:49:12 +0000 (17:49 +0800)]
GBE: fixed a long related bug.

We need to handle the case where a 64-bit virtual register
crosses two GRFs.
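A minimal C sketch of the boundary check involved, assuming 32-byte
(256-bit) GRFs as on the Gen generations beignet targeted and treating
a register as a raw byte range; the real backend works on register
regions, so crosses_grf is a hypothetical helper, not GBE code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRF_SIZE 32u  /* one GRF = 256 bits (assumption stated above) */

/* True when the byte range [offset, offset + size) touches more than one
 * GRF, e.g. a 64-bit value that does not start on a GRF boundary. */
static bool crosses_grf(uint32_t offset, uint32_t size)
{
    assert(size > 0);
    return offset / GRF_SIZE != (offset + size - 1) / GRF_SIZE;
}
```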

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
10 years agoRevert faulty pushed patchset
Zhigang Gong [Tue, 14 Jan 2014 01:33:00 +0000 (09:33 +0800)]
Revert faulty pushed patchset

This reverts:
Revert "GBE: fixed a long related bug."
Revert "Refine the method to find pch and pcm files."
Revert "GBE: enable relocatable pch files."
Revert "CL: prepare to support ICD if the system has ocl-icd.."
Revert "CL: back port ICD support to 1.1 branch."

The above patches were merged by accident, without review, and
are broken. Revert them.

10 years agoGBE: fixed a long related bug.
Zhigang Gong [Fri, 10 Jan 2014 09:49:12 +0000 (17:49 +0800)]
GBE: fixed a long related bug.

We need to handle the case where a 64-bit virtual register
crosses two GRFs.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoRefine the method to find pch and pcm files.
Zhigang Gong [Thu, 9 Jan 2014 09:36:37 +0000 (17:36 +0800)]
Refine the method to find pch and pcm files.

When compiling user kernels, we need to find the precompiled header
file and the precompiled module file. The previous implementation
searched the build directory first, then the system directory.

This is not elegant for a distro package, which has no reason to
search the build directory. So I changed the default search path to
the system directory only. For developers, I changed the build script
to set a proper environment variable so that the gbe bin generator
and the utests find the local pch and pcm files first.

The only visible change comes after the build: before running the
utests, the user needs to set up the environment first. Just
invoke

. utest/setenv.sh

Then everything works as before. setenv.sh also sets
OCL_KERNEL_PATH, so you no longer need to set it manually.

This patch also updates the documentation.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: enable relocatable pch files.
Zhigang Gong [Thu, 9 Jan 2014 06:20:29 +0000 (14:20 +0800)]
GBE: enable relocatable pch files.

By default, when including a pch file, clang needs to make sure
the original header file is unchanged. This is impossible when
we want to distribute a pch file to a new system, so we need to
use the relocatable pch feature provided by clang here.
We now create two pch files: a relocatable pch file that is
installed to the system directory, and a local pch file that is
used at build time. We need both pch files because at build time
there is no ocl_stdlib.h in the system directory yet. The local
pch file is used only for beignet's own build and the utests. All
other applications use the installed pch/pcm files.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoCL: prepare to support ICD if the system has ocl-icd..
Zhigang Gong [Wed, 8 Jan 2014 10:57:31 +0000 (18:57 +0800)]
CL: prepare to support ICD if the system has ocl-icd..

v2:
Only install the intel-beignet.icd if the system has ocl-icd
support.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoCL: back port ICD support to 1.1 branch.
Zhigang Gong [Wed, 8 Jan 2014 11:10:53 +0000 (19:10 +0800)]
CL: back port ICD support to 1.1 branch.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
10 years agoGBE: Remove some noduplicate to let inline works
Ruiling Song [Wed, 8 Jan 2014 06:58:07 +0000 (14:58 +0800)]
GBE: Remove some noduplicate to let inline works

The LLVM inliner apparently won't inline a function that contains noduplicate function calls.
So we keep noduplicate only on barrier itself; then barrier() can still be inlined.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
10 years agoMove the memory allocate size check to the callee.
Yang Rong [Tue, 7 Jan 2014 03:30:54 +0000 (11:30 +0800)]
Move the memory allocate size check to the callee.

Because of image alignment, the allocation size may exceed CL_DEVICE_MAX_MEM_ALLOC_SIZE if the
image's size is calculated from it. So move the size check from cl_mem_allocate to the callee, and
slightly enlarge the limit when checking in the image allocation path.
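A minimal C sketch of why buffers and images need different checks.
All numbers are hypothetical (a 256MB device limit and 2MB of slack);
the commit only says the image limit is "slightly" enlarged. An image
whose user-visible size fits can exceed the strict limit once its pitch
and height are padded for alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for illustration only. */
#define MAX_MEM_ALLOC_SIZE (256u * 1024u * 1024u)
#define IMAGE_SLACK        (2u * 1024u * 1024u)

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) / a * a;
}

/* Buffers: strict check against the device limit. */
static bool buffer_size_ok(size_t size)
{
    return size <= MAX_MEM_ALLOC_SIZE;
}

/* Images: check the padded size against a slightly enlarged limit. */
static bool image_size_ok(size_t pitch, size_t height,
                          size_t pitch_align, size_t height_align)
{
    size_t padded = align_up(pitch, pitch_align) * align_up(height, height_align);
    return padded <= (size_t)MAX_MEM_ALLOC_SIZE + IMAGE_SLACK;
}
```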

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>