contrib/beignet.git
Lu Guanqun [Mon, 19 Aug 2013 04:29:17 +0000 (12:29 +0800)]
fix left shift warnings in utests

We should use explicit 64-bit types; otherwise the compiler emits left-shift warnings.
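For illustration (the affected utest code is not reproduced here), the warning comes from shifting a 32-bit `int` by 32 or more bits, which is undefined in C. The hypothetical helper `bit64` below shows the fix: widen to an explicit 64-bit type before shifting.

```c
#include <stdint.h>

/* Hypothetical helper: returns a 64-bit value with only bit n set (n < 64). */
static uint64_t bit64(unsigned n)
{
    /* return 1 << n;  would shift a 32-bit int and is undefined for n >= 32 */
    return (uint64_t)1 << n;   /* explicit 64-bit type: the shift happens in 64 bits */
}
```

`UINT64_C(1) << n` from `<stdint.h>` is an equivalent spelling.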

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 16 Aug 2013 07:28:52 +0000 (15:28 +0800)]
Utests: enable long/ulong for abs_diff test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 06:55:29 +0000 (14:55 +0800)]
enable signed 64-bit version of "abs_diff"

Fixed the operand type in IR instruction "move".
Use one fewer flag register in 64-bit integer comparison.
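As a host-side reference for what the 64-bit `abs_diff` overloads must return (not the GBE lowering, which these commits don't show), here is a sketch with hypothetical function names. The key point is that `|x - y|` for `long` inputs can exceed `LONG_MAX`, so the result is `ulong` and the subtraction is done in unsigned arithmetic.

```c
#include <stdint.h>

/* ulong abs_diff(long x, long y): |x - y| without signed overflow. */
static uint64_t abs_diff_s64(int64_t x, int64_t y)
{
    /* Subtract larger minus smaller in unsigned arithmetic (wraps mod 2^64),
       so the result is exact even when x - y would overflow int64_t. */
    return x > y ? (uint64_t)x - (uint64_t)y
                 : (uint64_t)y - (uint64_t)x;
}

/* ulong abs_diff(ulong x, ulong y) */
static uint64_t abs_diff_u64(uint64_t x, uint64_t y)
{
    return x > y ? x - y : y - x;
}
```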

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 02:38:00 +0000 (10:38 +0800)]
enable unsigned 64bit version of "abs_diff"

tested by piglit,

piglit/framework/../bin/cl-program-tester generated_tests/cl/builtin/int/builtin-ulong-abs_diff-1.0.generated.cl

piglit test case passed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 01:43:32 +0000 (09:43 +0800)]
GBE: skip instruction pattern match for 64 bit sel_cmp.

The GPU instruction "sel_cmp" doesn't support 64-bit integers,
so don't emit SelectModifierInstructionPattern in that case.
Tested by piglit: the piglit test cases "long(ulong)-max(min,clamp)" all passed.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 19 Aug 2013 01:41:17 +0000 (09:41 +0800)]
fix a typo

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 16 Aug 2013 08:24:09 +0000 (16:24 +0800)]
Add async copy and async stride copy test case.

Only the int2 and char4 types are hard-coded; other types have been tested using
the conformance test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Fri, 16 Aug 2013 08:24:08 +0000 (16:24 +0800)]
Implement async and prefetch built-in.

Use normal loads and stores to implement async copy,
so wait_group_events just uses a barrier.
Prefetch is defined as an empty function.

V2: fix llvm build error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 16 Aug 2013 07:54:24 +0000 (15:54 +0800)]
test 64bit version of "upsample"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Thu, 15 Aug 2013 09:10:15 +0000 (17:10 +0800)]
Fix unit test compiler_load_bool_imm error.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Fri, 16 Aug 2013 01:45:17 +0000 (09:45 +0800)]
add 64bit version of "upsample"

Since simple 64-bit integer operations are supported,
add a 64-bit version of "upsample".

to test this patch, in piglit, run
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-int-upsample-1.0.generated.cl
  bin/cl-program-tester generated_tests/cl/builtin/int/builtin-uint-upsample-1.0.generated.cl

piglit test cases all pass.
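The 64-bit `upsample` overloads concatenate two 32-bit halves, `((ulong)hi << 32) | lo`. A host-side sketch with hypothetical names, assuming two's-complement reinterpretation for the signed form (true on all mainstream targets):

```c
#include <stdint.h>

/* ulong upsample(uint hi, uint lo) */
static uint64_t upsample_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* long upsample(int hi, uint lo): build the bit pattern unsigned,
   then reinterpret it as signed. */
static int64_t upsample_s64(int32_t hi, uint32_t lo)
{
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | lo);
}
```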

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Thu, 15 Aug 2013 06:53:27 +0000 (14:53 +0800)]
add empty 64bit-integer version built-in functions

Also change the vector built-in generator to auto-generate
64-bit integer versions of the built-in functions.

The function bodies are empty for now; details will be added in the future.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 08:23:18 +0000 (16:23 +0800)]
support built-in function mad_sat(int) and mad_sat(uint)

this patch has been tested by piglit.
piglit test cases "int_mad_sat" and "uint_mad_sat" passed.
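A reference model of `mad_sat`, assuming the OpenCL definition saturate(a * b + c). It computes in a 64-bit type so the intermediate cannot overflow; the GBE implementation in this commit is not shown and may differ. Function names are hypothetical.

```c
#include <stdint.h>

/* int mad_sat(int a, int b, int c) = saturate(a * b + c) */
static int32_t mad_sat_i32(int32_t a, int32_t b, int32_t c)
{
    int64_t r = (int64_t)a * b + c;     /* exact: |a * b| <= 2^62 */
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}

/* uint mad_sat(uint a, uint b, uint c) */
static uint32_t mad_sat_u32(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t r = (uint64_t)a * b + c;   /* exact: at most 2^64 - 2^32 */
    return r > UINT32_MAX ? UINT32_MAX : (uint32_t)r;
}
```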

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zou Nan hai [Thu, 15 Aug 2013 23:56:08 +0000 (07:56 +0800)]
use r112 as source of EOT message

Fix random hang cases.
Use r112 as the source of the EOT message:
the Bspec requires the EOT message source to be in r112-r127.

Signed-off-by: Zou Nanhai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Thu, 15 Aug 2013 02:51:33 +0000 (10:51 +0800)]
GBE: fix an illegal instruction.

Per Gen ISA spec:
When ExecSize = Width, VertStride must be set to Width * HorzStride.

For horizontal stride 2 in bottom_half, we always use it in SIMD8 mode,
so we need to set the vertical stride to 16 according to the above restriction.
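The restriction quoted above can be written as a small check. `region_vstride_ok` is hypothetical and models only this one rule as stated in the commit, not the full set of Gen region restrictions; with ExecSize = Width = 8 and HorzStride = 2 it requires VertStride = 16.

```c
#include <stdbool.h>

/* Checks only the restriction quoted from the Gen ISA spec:
   when ExecSize == Width, VertStride must equal Width * HorzStride. */
static bool region_vstride_ok(unsigned exec_size, unsigned width,
                              unsigned hstride, unsigned vstride)
{
    if (exec_size == width)
        return vstride == width * hstride;
    return true;   /* other region rules are not modelled here */
}
```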

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 14 Aug 2013 08:07:15 +0000 (16:07 +0800)]
GBE: I64CMP should be treated as CMP in reg allocation and insn scheduling.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 06:23:51 +0000 (14:23 +0800)]
test 64bit-integer comparing

This only works when OCL_POST_ALLOC_INSN_SCHEDULE=0,
because the post-allocation scheduler puts CMP after SEL, whereas in the IR
CMP comes before SEL, like this:
 GT.int64 %34 %31 %33
 LOADI.int64 %38 3
 LOADI.int64 %39 4
 SEL.int64 %35 %34 %38 %39

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 14 Aug 2013 01:40:33 +0000 (09:40 +0800)]
support 64bit-integer comparing

support 64bit-integer comparing,
including EQ(==), NEQ(!=), G(>), GE(>=), L(<), LE(<=)
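The commit doesn't show how GBE lowers these comparisons. One common decomposition on a 32-bit ALU, sketched below with hypothetical names, compares the high halves first (signed for `long`) and, only when they are equal, the low halves as unsigned. The other relations follow from `<` and `==`; for `ulong`, the high halves would be compared unsigned as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* A signed 64-bit value split into the halves a 32-bit ALU works on. */
struct i64_halves { int32_t hi; uint32_t lo; };

static bool eq64(struct i64_halves a, struct i64_halves b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* Signed a < b: high halves decide (signed); on a tie, low halves (unsigned). */
static bool lt_s64(struct i64_halves a, struct i64_halves b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi;
    return a.lo < b.lo;
}

static bool gt_s64(struct i64_halves a, struct i64_halves b) { return lt_s64(b, a); }
static bool le_s64(struct i64_halves a, struct i64_halves b) { return !lt_s64(b, a); }
static bool ge_s64(struct i64_halves a, struct i64_halves b) { return !lt_s64(a, b); }
```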

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zou Nan hai [Tue, 13 Aug 2013 23:29:18 +0000 (07:29 +0800)]
Flush the queue after enqueue.

Flush the queue after enqueue.
This can fix some random failures in unit tests.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Yi Sun <yi.sun@intel.com>
Yang Rong [Wed, 14 Aug 2013 03:08:01 +0000 (11:08 +0800)]
Fix event pthread_mutex_lock dead lock.

In cl_event_set_status, cl_event_delete is called between pthread_mutex_lock
and pthread_mutex_unlock, but it requires the same lock, causing a deadlock.
Unlock before calling cl_event_delete.
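The pattern behind this fix, sketched with hypothetical names (`event_set_status` and `event_delete` stand in for `cl_event_set_status` and `cl_event_delete`, whose bodies the commit does not show): a default pthread mutex is not recursive, so a function holding it must release it before calling anything that locks it again.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical event type, for illustration only. */
struct event {
    int status;
    int refcount;
};

static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

static void event_delete(struct event *e)
{
    pthread_mutex_lock(&event_lock);      /* same, non-recursive mutex */
    e->refcount--;
    int last = (e->refcount == 0);
    pthread_mutex_unlock(&event_lock);
    if (last)
        free(e);
}

static void event_set_status(struct event *e, int status)
{
    pthread_mutex_lock(&event_lock);
    e->status = status;
    /* Calling event_delete(e) here would try to relock event_lock: deadlock. */
    pthread_mutex_unlock(&event_lock);
    event_delete(e);                      /* safe: the lock is already released */
}
```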

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 12 Aug 2013 07:53:43 +0000 (15:53 +0800)]
GBE: set temporary address register for read64 to U64.

Actually, we use it as two DWORDs rather than a U64. But if
we don't set it to U64, the post scheduler doesn't know that this
is a QWORD register, which may cause incorrect scheduling.

We can easily trigger this bug when running compiler_vector_double16_load_store
in SIMD8 mode. This patch fixes the bug.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 13 Aug 2013 03:05:28 +0000 (11:05 +0800)]
support 64bit-integer multiplication

also add test case
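The commit doesn't describe its lowering. A standard way to get the low 64 bits of a 64-bit product from 32-bit pieces is shown below (hypothetical helper). The `a_hi * b_hi` term would land at bit 64 and is dropped, and only the low 32 bits of the cross terms matter. The same bits result for signed operands in two's complement.

```c
#include <stdint.h>

/* Low 64 bits of a * b using one 32x32->64 multiply plus two 32x32->32 ones. */
static uint64_t mul64_lo(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t lo_lo = (uint64_t)a_lo * b_lo;       /* full 64-bit partial product */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;   /* only its low 32 bits survive */
    return lo_lo + ((uint64_t)cross << 32);
}
```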

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 13 Aug 2013 09:10:07 +0000 (17:10 +0800)]
Add a load bool imm test case.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Tue, 13 Aug 2013 07:06:30 +0000 (15:06 +0800)]
Add bool move imm support.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 13 Aug 2013 00:32:54 +0000 (08:32 +0800)]
test 64bit-integer shifting

v2: put shifting in branch

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 08:45:12 +0000 (16:45 +0800)]
support 64bit-integer shifting

Support left shift (<<), logical right shift (>>),
and arithmetic right shift (>>).
v2: define temp reg as dest reg of instructions
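A sketch of 64-bit shifts built from 32-bit halves, assuming shift counts in [0, 63]; the helper names are hypothetical and the commit's actual instruction sequence is not shown. Bits crossing the half boundary are carried over with the complementary shift, and `n == 0` is special-cased because shifting a 32-bit value by 32 is undefined in C. For brevity, the arithmetic shift reuses the logical one and then fills the vacated top bits with the sign bit.

```c
#include <stdint.h>

static uint64_t join32(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* x << n */
static uint64_t shl64(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    if (n == 0) return x;
    if (n < 32) return join32((hi << n) | (lo >> (32 - n)), lo << n);
    return join32(lo << (n - 32), 0);
}

/* logical x >> n */
static uint64_t shr64(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    if (n == 0) return x;
    if (n < 32) return join32(hi >> n, (lo >> n) | (hi << (32 - n)));
    return join32(0, hi >> (n - 32));
}

/* arithmetic x >> n on the two's-complement bit pattern */
static uint64_t sar64(uint64_t x, unsigned n)
{
    uint64_t r = shr64(x, n);
    if (n != 0 && (x >> 63))
        r |= ~(UINT64_MAX >> n);   /* set the vacated top n bits */
    return r;
}
```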

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 02:12:16 +0000 (10:12 +0800)]
support converting shorter int to 64bit int

converting byte/word/dword to int64
also add test case
v2: define temporary reg as dest reg of instruction
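When a 64-bit value is held as two 32-bit halves, widening a byte/word/dword only has to produce the high half: all ones or all zeros (a copy of the sign bit) for signed sources, zero for unsigned ones. A hypothetical host-side sketch; byte and word inputs are first widened to 32 bits, which C does implicitly at the call.

```c
#include <stdint.h>

/* Sign-extend: the high half is the sign bit replicated 32 times. */
static uint64_t sext_to_64(int32_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = v < 0 ? 0xFFFFFFFFu : 0u;
    return ((uint64_t)hi << 32) | lo;
}

/* Zero-extend: the high half is simply 0. */
static uint64_t zext_to_64(uint32_t v)
{
    return (uint64_t)v;
}
```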

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 02:26:44 +0000 (10:26 +0800)]
Define temporary reg as dest reg of instruction

I had defined the temporary regs as source regs of the instruction,
but the instruction scheduler treats source regs as read-only.
So define them as dest regs now.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:21 +0000 (16:07 +0800)]
Add event unit test.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:20 +0000 (16:07 +0800)]
Add OpenCL event support.

Use deferred execution to wait for events.
If no user event is waited on, use wait-rendering to wait for the
GPU events to complete and call the enqueue API immediately.
If user events are waited on, prepare the enqueue data
and resume the enqueue once all the waited user events have completed.
To achieve this, add an enqueue callback to each user event, and add all the
user events and other waited events to the enqueue callback's wait list. When a
user event is set to complete, check all the enqueue callbacks waiting on it.
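The user-event bookkeeping described above can be sketched as a countdown: each deferred enqueue counts the events it still waits on, and completing a user event decrements the counter of every waiting callback, resuming the enqueue when it reaches zero. All names here are hypothetical, not Beignet's actual structures; a callback whose counter is still zero after registration has nothing to wait for and can be enqueued immediately.

```c
#include <stddef.h>

#define MAX_WAITERS 8

/* Deferred enqueue: runs once every event it waits on has completed. */
struct enqueue_cb {
    int pending;                 /* incomplete events still waited on */
    void (*resume)(void *data);  /* performs the deferred enqueue */
    void *data;
};

struct user_event {
    int complete;
    struct enqueue_cb *waiters[MAX_WAITERS];
    int n_waiters;
};

/* Registers cb as waiting on ev. Returns -1 if the waiter table is full. */
static int user_event_add_waiter(struct user_event *ev, struct enqueue_cb *cb)
{
    if (ev->complete)
        return 0;                /* already complete: nothing to wait for */
    if (ev->n_waiters == MAX_WAITERS)
        return -1;
    cb->pending++;
    ev->waiters[ev->n_waiters++] = cb;
    return 0;
}

/* Marks ev complete and resumes every callback no longer waiting on anything. */
static void user_event_set_complete(struct user_event *ev)
{
    ev->complete = 1;
    for (int i = 0; i < ev->n_waiters; i++) {
        struct enqueue_cb *cb = ev->waiters[i];
        if (--cb->pending == 0)
            cb->resume(cb->data);
    }
    ev->n_waiters = 0;
}

/* Example resume function: counts how many times it ran. */
static void count_resume(void *data)
{
    ++*(int *)data;
}
```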

clEnqueueMarker/clEnqueueBarrier are still not implemented, and clEnqueueMapBuffer
/clEnqueueMapImage are not yet consistent with the spec.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:19 +0000 (16:07 +0800)]
Add function cl_command_queue_flush to flush a command

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:18 +0000 (16:07 +0800)]
Add some functions to support event in intel gpgpu.

The runtime now prepares the command batch first. If the command can't be flushed
immediately, it calls cl_gpgpu_event_pending to attach the command to the event;
when the events the command batch waits on have completed, it calls cl_gpgpu_event_resume
to flush it.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:17 +0000 (16:07 +0800)]
Add a struct and a function to handle all implemented enqueue APIs.

Events and non-blocking enqueue APIs may use this function.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yang Rong [Mon, 12 Aug 2013 08:07:16 +0000 (16:07 +0800)]
Add the empty functions of cl_enqueueXXX.

Copied from the cl_enqueueXXX functions and commented out. This change is for tracing only.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 12 Aug 2013 05:51:26 +0000 (13:51 +0800)]
no "div by zero" in smoothstep test case

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 12 Aug 2013 05:34:15 +0000 (13:34 +0800)]
Driver: Fix the incorrect size of surface 1.

According to Ben's comments, surfaces 0 and 1 should exactly
match each other, and the only reason we need two surfaces rather
than one is fulsim usage. Thus we should set surfaces
0 and 1 to the same memory size.

This patch fixes the flat_address_space unit test case and also
a random failure reported by Yang Rong.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: "Yang, Rong R" <rong.r.yang@intel.com>
Yi Sun [Mon, 12 Aug 2013 02:35:39 +0000 (10:35 +0800)]
utest: Add test case for function acos/acosh/asin/asinh.

The case contains illegal, boundary, and legal values.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Mon, 12 Aug 2013 02:35:38 +0000 (10:35 +0800)]
Handle boundary and illegal values.

Such as |x| = 1.0, |x| < 2**-27 and |x| > 1.

v2: Replace some constant variables with existing macro values.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 9 Aug 2013 05:23:41 +0000 (13:23 +0800)]
Skip spill/unspill instruction when trying to do spill.

We can only spill virtual registers, so physical registers must be skipped.
This fixes a random failure of compiler_box_blur when spilling.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Fri, 9 Aug 2013 05:23:40 +0000 (13:23 +0800)]
Fix a re-schedule issue of scratch write

As scratchMsgHeader+1 is reused as the scratch write payload,
scratchMsgHeader+1 will be spilled out first.
Add a scratch write dependency to keep scratch writes in order.
This fixes a failure (compiler_box_blur_float) when spilling.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 9 Aug 2013 02:36:00 +0000 (10:36 +0800)]
GBE: Fixed a bug and release 2 or 3 simdWidth register space.

This patch fixes two issues. One is in the sel_cmp pattern matching.
We should not set the sel_cmp instruction state to physicalFlag, as
sel_cmp never uses a flag register. Because it set physicalFlag
and left flagIndex at zero, it extended virtual register 0's
interval to the last sel_cmp instruction, so virtual register 0
was never freed.

The other issue is with special registers: we don't always
allocate them on demand. For example, the 3 local id registers
are always allocated, so some of them may never be used at all.
Their intervals' end points then never get a chance to be set to a
proper value, so they are never released. Now we just initialize the
end point to 0. Later, if a register is used, its end point is set to a
proper value; otherwise it stays zero, and the register is deallocated
during expiration.

This patch fixes (works around) a long-standing bug:
with pre-allocation instruction scheduling disabled by
export OCL_PRE_ALLOC_INSN_SCHEDULE=0
running the case
utests/utest_run compiler_menger_sponge_no_shadow
fails.

I spent almost a day tracking it down to the register
allocation, but I haven't root-caused where the actual buggy
code is. I suspected the register allocation, but I reviewed the code very
carefully and haven't found anything wrong. The last suspect now is
the register interval handling.

Anyway, applying this patch releases two registers to the pool,
which changes the register allocation/expiration and thus works around
that bug. We may still need to spend some time investigating the root
cause of the failure in the future.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Fri, 9 Aug 2013 02:35:59 +0000 (10:35 +0800)]
Utests: enable long/ulong in vector load/store test case.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 8 Aug 2013 07:15:44 +0000 (15:15 +0800)]
GBE: Fix one bug in instruction scheduling.

Now that we may use 8-byte registers (long and double), one
register may take two (SIMD8) or four (SIMD16) physical registers.
Thus, when we meet a register of long or double type, we need to
handle the immediately following index at the same time.
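The "two or four" figure is plain arithmetic: a Gen general register (GRF) is 256 bits, i.e. 32 bytes, so one 8-byte value per lane needs 8 × 8 = 64 bytes (2 GRFs) in SIMD8 and 16 × 8 = 128 bytes (4 GRFs) in SIMD16. A tiny hypothetical helper:

```c
/* A Gen general register (GRF) is 256 bits = 32 bytes. */
#define GRF_BYTES 32u

/* Physical registers needed to hold one elem_bytes-sized value per SIMD lane. */
static unsigned grf_count(unsigned simd_width, unsigned elem_bytes)
{
    unsigned bytes = simd_width * elem_bytes;
    return (bytes + GRF_BYTES - 1) / GRF_BYTES;   /* round up */
}
```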

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Thu, 8 Aug 2013 07:15:43 +0000 (15:15 +0800)]
GBE: fix instruction scheduling related bugs in read64/write64.

In read64 and write64, we allocate some temporary registers, and
we should put all of the temporary registers that may be modified
into the instruction's dst array. Otherwise, the later post-allocation
instruction scheduling may reorder the instructions incorrectly.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
Zhigang Gong [Tue, 6 Aug 2013 04:01:46 +0000 (12:01 +0800)]
GBE: enable double vector load/store support.

We have some accuracy problems with double calculation
on the GPU side, so I had to change the test case for the double
type to tolerate some error when checking the double
results.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
Homer Hsing [Wed, 7 Aug 2013 08:28:34 +0000 (16:28 +0800)]
test 64bit-integer selection operator

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 7 Aug 2013 08:28:33 +0000 (16:28 +0800)]
support 64bit-integer selection operator "?:"

v2: reuse MOV to move 64-bit integers instead of adding a MOV_INT64 instruction.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 7 Aug 2013 07:15:50 +0000 (15:15 +0800)]
Implement spill/unspill

The current implementation works as follows:
I reserve a pool of registers for spill/reload. Currently 6 registers
are reserved: 5 to handle a SelectionVector with at most 5 elements,
and the other one as the scratch message header register. The register
after the header register is used as the payload for scratch writes.

To spill, just iterate over the instructions. If the virtual register
is used as a src, insert a reload instruction before it. If the virtual
register is used as a dst, insert a spill instruction to write the register
content to scratch memory.
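The rewrite pass described above can be sketched on a toy IR (all types and names hypothetical). It reloads before every read of the spilled virtual register and spills after every write. For brevity the reload targets the virtual register number itself; the commit describes reloading into one of the reserved pool registers instead.

```c
/* Toy IR: one destination and up to two sources; -1 means unused. */
enum op { OP_ALU, OP_RELOAD, OP_SPILL };
struct insn { enum op op; int dst; int src[2]; };

/* Copies `in` to `out` (capacity at least 3 * n), inserting a reload before
   each instruction that reads `vreg` and a spill after each one that writes
   it. Returns the new instruction count. */
static int insert_spill_code(const struct insn *in, int n, int vreg,
                             struct insn *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (in[i].src[0] == vreg || in[i].src[1] == vreg)
            out[m++] = (struct insn){ OP_RELOAD, vreg, { -1, -1 } };  /* scratch -> reg */
        out[m++] = in[i];
        if (in[i].dst == vreg)
            out[m++] = (struct insn){ OP_SPILL, -1, { vreg, -1 } };   /* reg -> scratch */
    }
    return m;
}
```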

Current limitations:
64-bit is not supported.
SelectionVector with more than 5 elements is not handled.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Wed, 7 Aug 2013 07:07:40 +0000 (15:07 +0800)]
enable scratch memory allocation and read/write

v2: refine function naming.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 7 Aug 2013 05:19:25 +0000 (13:19 +0800)]
keep address space qualifier of pointers

For built-in functions with address-space-qualified pointers,
keep the address space when accessing the pointers.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Homer Hsing [Wed, 7 Aug 2013 04:19:54 +0000 (12:19 +0800)]
test 64bit-integer immediate value, and "and", "or", "xor" arithmetic

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 7 Aug 2013 04:19:53 +0000 (12:19 +0800)]
support 64bit-integer AND(&), OR(|), XOR(^) arithmetic

v3: folded similar code into a "for-loop"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 7 Aug 2013 04:19:52 +0000 (12:19 +0800)]
support 64bit-integer immediate value

v3: folded similar code into a "for-loop"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Tue, 6 Aug 2013 14:02:49 +0000 (22:02 +0800)]
Improve the accuracy of built-in function asin.

Method: asin(x) = x + (1 * x^3)/(2 * 3) + (1*3 * x^5)/(2*4 * 5)
                    + (1*3*5 * x^7)/(2*4*6 * 7) + ...
Sum the first 30 terms of this series.
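A host-side sketch of that series, with a hypothetical name and double precision assumed. The n-th term is [1·3···(2n−1)] / [2·4···(2n)] · x^(2n+1) / (2n+1), and both the coefficient and the power are updated incrementally from the previous term. Convergence slows as |x| approaches 1, which is why the boundary values need separate handling.

```c
/* Partial sum of asin(x) = sum_{n>=0} c_n * x^(2n+1) / (2n+1), where
   c_n = (1*3*...*(2n-1)) / (2*4*...*(2n)) and c_0 = 1. */
static double asin_series(double x, int terms)
{
    double x2 = x * x;
    double power = x;      /* x^(2n+1) */
    double coeff = 1.0;    /* c_n */
    double sum = 0.0;
    for (int n = 0; n < terms; n++) {
        sum += coeff * power / (2 * n + 1);
        coeff *= (2.0 * n + 1.0) / (2.0 * n + 2.0);   /* c_{n+1} from c_n */
        power *= x2;
    }
    return sum;
}
```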

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Homer Hsing [Tue, 6 Aug 2013 07:41:40 +0000 (15:41 +0800)]
support 64bit-integer addition, subtraction

also enable GPU command "subb" (subtract with borrow)

also add test cases

v2: renamed GEN_TYPE_UQ/GEN_TYPE_Q to GEN_TYPE_UL/GEN_TYPE_L
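How 64-bit addition and subtraction decompose into 32-bit halves: the low halves are combined first, and the carry (or borrow) they produce feeds into the high halves. The commit enables the GPU's `subb` (subtract with borrow) for this purpose; the sketch below models the same idea in plain C with hypothetical names, and is not the GBE code.

```c
#include <stdint.h>

struct u64pair { uint32_t hi, lo; };

static struct u64pair add64(struct u64pair a, struct u64pair b)
{
    struct u64pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;        /* unsigned wrap-around means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

static struct u64pair sub64(struct u64pair a, struct u64pair b)
{
    struct u64pair r;
    uint32_t borrow = a.lo < b.lo;       /* the borrow a subtract-with-borrow reports */
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```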

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Ruiling Song [Mon, 5 Aug 2013 07:14:39 +0000 (15:14 +0800)]
Fix a bug in stack calculation.

1. The thread_id is located in r0.5[0-8], so we need to extract the correct bits.
2. Also, we don't need so much stack space: max_compute_unit has already
   been computed as #EU * max_thread_per_eu.
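Extracting bits 0-8 of the r0.5 dword is a 9-bit mask, `r0_5 & 0x1FF`. The sketch below models that on the host; `stack_offset`, which gives each hardware thread a fixed-size slice, is a hypothetical illustration of how such an id might be used, not the commit's code.

```c
#include <stdint.h>

/* Bits 0..8 (nine bits) of r0.5, where the commit says the thread id lives. */
static uint32_t thread_id_from_r0_5(uint32_t r0_5)
{
    return r0_5 & 0x1FFu;
}

/* Hypothetical layout: stack_size bytes per hardware thread. */
static uint32_t stack_offset(uint32_t r0_5, uint32_t stack_size)
{
    return thread_id_from_r0_5(r0_5) * stack_size;
}
```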

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Xing, Homer <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Tue, 6 Aug 2013 06:24:34 +0000 (14:24 +0800)]
support 64bit-integer reading(writing)

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 31 Jul 2013 01:36:57 +0000 (09:36 +0800)]
test if register allocation and 64-bit reading are fixed

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Song, Ruiling <ruiling.song@intel.com>
Zhigang Gong [Fri, 2 Aug 2013 17:53:44 +0000 (01:53 +0800)]
GBE: refactor double support.

There are two major issues in double support:
1. It doesn't work in SIMD16 mode.
2. Vectors are used incorrectly. We only need to allocate
those temporary registers as contiguous registers.

If you look at the previous implementation of
READ_FLOAT64/WRITE_FLOAT64 in gen_encoder.cpp, you can easily
see that it contains a lot of duplicate code, and since the SIMD16
code path never worked correctly, it is very difficult to build
on that code. So I chose to refactor those two major functions,
and to refine other parts of the instruction selection stage to fix
the above two major problems with cleaner code.

Now it works well in both SIMD16 and SIMD8 modes.
Another minor improvement is for READ_FLOAT64 in SIMD8 mode:
this patch saves one send instruction when reading all
8 doubles into registers.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Homer Hsing <homer.xing@intel.com>
Homer Hsing [Wed, 31 Jul 2013 02:39:12 +0000 (10:39 +0800)]
test built-in function "shuffle2"

v3: tested two implementations of "shuffle2"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Homer Hsing [Wed, 31 Jul 2013 02:39:11 +0000 (10:39 +0800)]
add built-in function "shuffle2"

v3: convert address of "x" to a pointer, then select element by mask
v3: add two-component return-value overloaded version
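A host-side model of `shuffle2(x, y, mask)`, assuming the OpenCL semantics for n-component inputs with n a power of two: only the low log2(2n) bits of each mask element are used, and indices 0..n-1 select from x while n..2n-1 select from y. The function name is hypothetical, and x and y are passed as separate arrays rather than through the pointer trick the v3 note describes.

```c
#include <stddef.h>
#include <stdint.h>

static void shuffle2_u32(const uint32_t *x, const uint32_t *y, size_t n,
                         const uint32_t *mask, uint32_t *out, size_t out_n)
{
    for (size_t i = 0; i < out_n; i++) {
        uint32_t idx = mask[i] & (uint32_t)(2 * n - 1);   /* low log2(2n) bits */
        out[i] = idx < n ? x[idx] : y[idx - n];
    }
}
```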

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang, Rong R <rong.r.yang@intel.com>
Zhigang Gong [Tue, 30 Jul 2013 08:44:47 +0000 (16:44 +0800)]
Need to define local to __local.

It seems that clang 3.3 already supports the local/global/private
memory space qualifiers directly. But previous versions don't
support them, so we still need to define local as __local here.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Sun, Yi <yi.sun@intel.com>
Zhigang Gong [Thu, 25 Jul 2013 06:52:26 +0000 (14:52 +0800)]
Added memory space parameters support at the autogeneration script.

Enhance the Python script to support pointers with a memory space
qualifier, such as:

gentype fract (gentype x, __global gentype *iptr)
gentype fract (gentype x, __local gentype *iptr)
gentype fract (gentype x, __private gentype *iptr)

This enables the following functions in the builtin function spec file:
fract/frexp/modf/nextafter/remquo/sincos.

Remove the duplicates in ocl_stdlib.tmp.h.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Yi Sun [Tue, 30 Jul 2013 05:11:04 +0000 (13:11 +0800)]
utests: Add a test case for built-in functions get_num_groups.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Mon, 29 Jul 2013 15:19:53 +0000 (23:19 +0800)]
utest: Add test for built-in function get_local_id.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Yi Sun [Mon, 29 Jul 2013 06:52:34 +0000 (14:52 +0800)]
utest: Add test for built-in function get_local_size.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Mon, 29 Jul 2013 05:11:04 +0000 (13:11 +0800)]
Enable islessgreater/isordered/isunordered builtin vector functions.
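Scalar reference models of the three relational built-ins, assuming the OpenCL definitions: ordered means neither argument is NaN, and islessgreater is `(x < y) || (x > y)`, which is false whenever a NaN is involved. The names are hypothetical because C99 `<math.h>` already defines `isunordered` and `islessgreater` macros. In OpenCL the scalar forms return 1 for true, while the vector forms return -1 (all bits set) per component.

```c
#include <math.h>

static int ocl_isordered(float x, float y)     { return !isnan(x) && !isnan(y); }
static int ocl_isunordered(float x, float y)   { return isnan(x) || isnan(y); }
static int ocl_islessgreater(float x, float y) { return (x < y) || (x > y); }
```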

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 29 Jul 2013 02:25:18 +0000 (10:25 +0800)]
add built-in function "isordered", "isunordered"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Mon, 29 Jul 2013 02:20:33 +0000 (10:20 +0800)]
add built-in function "islessgreater"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Thu, 25 Jul 2013 08:31:43 +0000 (10:31 +0200)]
Add generated header and PCH to gitignore

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Simon Richter [Thu, 25 Jul 2013 08:31:42 +0000 (10:31 +0200)]
Use access() instead of fopen() to search for PCH

Since all we want to do is find out whether the file exists and can be
read, use the appropriate function.
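A minimal sketch of the check, with a hypothetical helper name and path: `access(path, R_OK)` returns 0 when the file exists and is readable. Unlike `fopen()`, it opens no `FILE` handle, so there is nothing to close. Note that `access()` checks permissions against the real, not the effective, user and group IDs.

```c
#include <stdbool.h>
#include <unistd.h>

/* True if path exists and is readable by the calling process. */
static bool pch_readable(const char *path)
{
    return access(path, R_OK) == 0;
}
```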

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Wed, 24 Jul 2013 10:06:06 +0000 (18:06 +0800)]
Fix the indentation handling in the vector builtin function generator.

As OpenCL uses .sa rather than .s10 to represent element 10
of a vector, we don't need to adjust by one space when we cross the
element 10 (.sa) boundary.
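Why no adjustment is needed: OpenCL numbers vector components with a single hex digit (.s0 through .s9, then .sa through .sf, or .sA through .sF), so every selector has the same width. A hypothetical generator-side helper, written in C for illustration although the actual generator is a Python script:

```c
/* Writes the OpenCL component selector for element i (0..15) into buf. */
static void component_selector(unsigned i, char buf[4])
{
    static const char hex[] = "0123456789abcdef";
    buf[0] = '.';
    buf[1] = 's';
    buf[2] = hex[i & 15u];
    buf[3] = '\0';
}
```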

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 24 Jul 2013 07:27:10 +0000 (15:27 +0800)]
add address space qualifier to "modf"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 24 Jul 2013 07:20:56 +0000 (15:20 +0800)]
add address space qualifier to "remquo"

Renamed the original "remquo" to "__gen_ocl_remquo",
and added a new "remquo" with the address space qualifier.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Homer Hsing [Wed, 24 Jul 2013 06:58:28 +0000 (14:58 +0800)]
revise built-in function "shuffle"

v2 from Zhigang:
Delete the 3-component vectors from the shuffle functions according
to the OpenCL spec.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Zhigang Gong [Fri, 19 Jul 2013 10:44:08 +0000 (18:44 +0800)]
Add misc builtin vector functions.

Even though we don't support them yet.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Simon Richter <Simon.Richter@hogyros.de>
Zhigang Gong [Fri, 19 Jul 2013 09:59:42 +0000 (17:59 +0800)]
check whether python is installed.

We need Python for some source code generation.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Simon Richter <Simon.Richter@hogyros.de>
Junyan He [Wed, 17 Jul 2013 08:38:41 +0000 (16:38 +0800)]
Add the PCH support when building the source.

As the utest grows bigger, runtime compilation has become a big
problem that makes utest_run very slow. Most of the time is wasted
re-parsing big header files such as ocl_stdlib.h.
Using Clang's PCH feature can shorten the compile time a lot.
However, we build the PCH file at build time, while some sources
pass runtime build options to LLVM, and these options may cause
compatibility problems between the PCH header and the CL source code.
We fall back to building the whole header and source when we find the
options are not compatible.

v2 from Zhigang:
Based on junyan's patch, I made the following changes and fixes:
1. Change the way to find the correct pch file: we now just define
two possible locations of the header file at build time, so we don't
need to query the linker to get the correct pch file directory.

2. Use python to generate a big blob of all ocl standard headers
into one file.

3. Fix those broken dependencies in the CMakeLists.txt, now it
works fine.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoSplit the thousands of lines of autogenerated code out of the ocl_stdlib header file.
Zhigang Gong [Fri, 19 Jul 2013 06:04:48 +0000 (14:04 +0800)]
Split the thousands of lines of autogenerated code out of the ocl_stdlib header file.

This patch splits the three autogenerated code segments out of the
huge ocl_stdlib.h, and renames ocl_stdlib.h to ocl_stdlib.tmpl.h.

The final ocl_stdlib.h will be generated at runtime by inserting
ocl_as.h/ocl_convert.h/ocl_vector (which are also autogenerated at
runtime) into the proper positions in ocl_stdlib.tmpl.h.

After this patch, the header file is kept at a maintainable size.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Simon Richter <Simon.Richter@hogyros.de>
11 years agoImplement a python script to autogenerate the builtin vector functions.
Zhigang Gong [Wed, 17 Jul 2013 12:36:36 +0000 (20:36 +0800)]
Implement a python script to autogenerate the builtin vector functions.

As Beignet uses the SOA model, we need to lower all the vector
builtin functions to their scalar versions. This kind of work is
ideal for a script that generates all the code according to the
OpenCL spec. I copied/pasted most of the prototypes from the OpenCL
spec into builtin_vector_proto.def, then wrote a python script to
parse them and generate all the vector inline functions, and removed
all the existing duplicate functions from ocl_stdlib.h.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Simon Richter <Simon.Richter@hogyros.de>
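To illustrate what lowering a vector builtin to its scalar version means, the C sketch below uses a macro to expand one prototype into a 4-component wrapper that calls the scalar function once per component. The int4_t type, the min_i/max_i helpers and the DEF_VEC4_BINARY macro are hypothetical; the real generator is a python script that emits OpenCL C from builtin_vector_proto.def.

```c
/* Hypothetical 4-wide int vector used only for this illustration. */
typedef struct { int s[4]; } int4_t;

/* Scalar builtins that the vector versions are lowered onto. */
static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/* Expand one binary prototype into a vector wrapper that applies the
 * scalar builtin to each component in turn. */
#define DEF_VEC4_BINARY(vname, sname)               \
    static int4_t vname(int4_t a, int4_t b)         \
    {                                               \
        int4_t r;                                   \
        for (int i = 0; i < 4; i++)                 \
            r.s[i] = sname(a.s[i], b.s[i]);         \
        return r;                                   \
    }

DEF_VEC4_BINARY(min4, min_i)
DEF_VEC4_BINARY(max4, max_i)
```

Generating the wrappers mechanically keeps every vector width consistent with the scalar implementation, which is the motivation the commit gives for using a script instead of hand-written copies.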
11 years agotest builtin function "shuffle"
Homer Hsing [Wed, 24 Jul 2013 03:05:11 +0000 (11:05 +0800)]
test builtin function "shuffle"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoadd builtin function "shuffle"
Homer Hsing [Wed, 24 Jul 2013 03:04:57 +0000 (11:04 +0800)]
add builtin function "shuffle"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoMake frexp support global memory directly
Zhigang Gong [Wed, 24 Jul 2013 03:09:06 +0000 (11:09 +0800)]
Make frexp support global memory directly

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd address_space modifier to builtin functions' pointer
Homer Hsing [Mon, 22 Jul 2013 08:06:51 +0000 (16:06 +0800)]
add address_space modifier to builtin functions' pointer

I forgot that builtin functions' pointer arguments should be qualified
by an address space such as "global", "local" or "private".

Signed-off-by: Homer Hsing <homer.xing@intel.com>
11 years agotest built-in function "remquo"
Homer Hsing [Thu, 18 Jul 2013 06:46:33 +0000 (14:46 +0800)]
test built-in function "remquo"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agoadd built-in function "remquo"
Homer Hsing [Thu, 18 Jul 2013 06:46:32 +0000 (14:46 +0800)]
add built-in function "remquo"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
11 years agotest builtin function "modf"
Homer Hsing [Wed, 17 Jul 2013 03:00:25 +0000 (11:00 +0800)]
test builtin function "modf"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd builtin function "modf"
Homer Hsing [Wed, 17 Jul 2013 03:00:24 +0000 (11:00 +0800)]
add builtin function "modf"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agotest built-in function "nextafter"
Homer Hsing [Tue, 16 Jul 2013 08:55:17 +0000 (16:55 +0800)]
test built-in function "nextafter"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd built-in function "nextafter"
Homer Hsing [Tue, 16 Jul 2013 08:55:16 +0000 (16:55 +0800)]
add built-in function "nextafter"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoutest: add built-in test case for get_global_id.
Yi Sun [Mon, 22 Jul 2013 07:59:42 +0000 (15:59 +0800)]
utest: add built-in test case for get_global_id.

v2: Remove the unused argument in the kernel.

Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoAdd the clang build option -fno-builtin to disable intrinsics.
Yang Rong [Mon, 22 Jul 2013 05:24:18 +0000 (13:24 +0800)]
Add the clang build option -fno-builtin to disable intrinsics.

Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agotest built-in function "frexp"
Homer Hsing [Tue, 16 Jul 2013 06:34:20 +0000 (14:34 +0800)]
test built-in function "frexp"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoadd built-in function "frexp"
Homer Hsing [Tue, 16 Jul 2013 06:34:19 +0000 (14:34 +0800)]
add built-in function "frexp"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agoImprove the clGetMemObjectInfo API, add more info options
Junyan He [Fri, 12 Jul 2013 09:52:59 +0000 (17:52 +0800)]
Improve the clGetMemObjectInfo API, add more info options

Improve the clGetMemObjectInfo API by adding more info options.
CL_MEM_ASSOCIATED_MEMOBJECT and CL_MEM_OFFSET require sub-buffer
creation to be implemented first.
The test case is added to get_cl_info.cpp.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
11 years agoAdd the support for clSetMemObjectDestructorCallback API
Junyan He [Fri, 12 Jul 2013 08:02:32 +0000 (16:02 +0800)]
Add the support for clSetMemObjectDestructorCallback API

Reviewed-by: "Xing, Homer" <homer.xing@intel.com>
11 years agoImprove the clEnqueueMapBuffer and clCreateBuffer API
Junyan He [Fri, 12 Jul 2013 06:31:14 +0000 (14:31 +0800)]
Improve the clEnqueueMapBuffer and clCreateBuffer API

In the clCreateBuffer API, add support for the CL_MEM_ALLOC_HOST_PTR
and CL_MEM_USE_HOST_PTR flags.
The CL_MEM_ALLOC_HOST_PTR flag seems to need nothing special.
The CL_MEM_USE_HOST_PTR flag imposes two requirements on the clEnqueueMapBuffer API:
1> The host_ptr specified in clCreateBuffer is guaranteed to
contain the latest bits in the region being mapped when the
clEnqueueMapBuffer command has completed.
2> The pointer value returned by clEnqueueMapBuffer will be
derived from the host_ptr specified when the buffer object is created.

We improve clEnqueueMapBuffer to set up a record for each mapped
address, and to synchronize the data based on that address when the
buffer is mapped and unmapped.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
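The two CL_MEM_USE_HOST_PTR requirements can be modeled in plain C as below, assuming the driver keeps its own copy of the data that must be synchronized with the user's host_ptr. Mapping copies the region into host_ptr and returns a pointer derived from it; unmapping recovers the offset from the mapped address and writes the region back. The buffer_t struct and the map_region/unmap_region functions are hypothetical and are not the clEnqueueMapBuffer implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a buffer created with CL_MEM_USE_HOST_PTR:
 * "device" is the driver's own storage, "host_ptr" is the user's memory. */
typedef struct {
    unsigned char *device;
    unsigned char *host_ptr;
    size_t size;
} buffer_t;

/* Map: make host_ptr hold the latest bits for the region, then return
 * a pointer derived from host_ptr (requirements 1> and 2>). */
static void *map_region(buffer_t *buf, size_t offset, size_t len)
{
    if (offset > buf->size || len > buf->size - offset)
        return NULL; /* region out of bounds */
    memcpy(buf->host_ptr + offset, buf->device + offset, len);
    return buf->host_ptr + offset;
}

/* Unmap: derive the offset from the mapped address and write the region
 * back. Assumes "mapped" was returned by map_region with the same len. */
static void unmap_region(buffer_t *buf, void *mapped, size_t len)
{
    size_t offset = (size_t)((unsigned char *)mapped - buf->host_ptr);
    memcpy(buf->device + offset, buf->host_ptr + offset, len);
}
```

Because the returned pointer lies inside host_ptr, the address alone is enough to locate the region again at unmap time.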
11 years agofix vectorial built-in functions "min, max, clamp"
Homer Hsing [Wed, 10 Jul 2013 04:38:58 +0000 (12:38 +0800)]
fix vectorial built-in functions "min, max, clamp"

The vector versions of "min", "max" and "clamp" were missing.

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
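For reference, the OpenCL spec defines the integer clamp(x, minval, maxval) as min(max(x, minval), maxval), with undefined results when minval > maxval, and provides vector forms taking either per-component bounds or scalar bounds. The C sketch below models both 4-component forms; the int4_t type and the clamp_i, clamp4 and clamp4_s names are hypothetical.

```c
/* Hypothetical 4-wide int vector, standing in for OpenCL's int4. */
typedef struct { int s[4]; } int4_t;

/* Scalar clamp: min(max(x, lo), hi); undefined by the spec if lo > hi. */
static int clamp_i(int x, int lo, int hi)
{
    int t = x > lo ? x : lo;  /* max(x, minval) */
    return t < hi ? t : hi;   /* min(..., maxval) */
}

/* Vector form with per-component bounds. */
static int4_t clamp4(int4_t x, int4_t lo, int4_t hi)
{
    int4_t r;
    for (int i = 0; i < 4; i++)
        r.s[i] = clamp_i(x.s[i], lo.s[i], hi.s[i]);
    return r;
}

/* Vector form with scalar bounds shared by every component. */
static int4_t clamp4_s(int4_t x, int lo, int hi)
{
    int4_t r;
    for (int i = 0; i < 4; i++)
        r.s[i] = clamp_i(x.s[i], lo, hi);
    return r;
}
```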
11 years agotest built-in function "sign"
Homer Hsing [Fri, 12 Jul 2013 04:10:26 +0000 (12:10 +0800)]
test built-in function "sign"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
11 years agobuilt-in function "sign"
Homer Hsing [Fri, 12 Jul 2013 04:10:25 +0000 (12:10 +0800)]
built-in function "sign"

Signed-off-by: Homer Hsing <homer.xing@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>