GBE: Optimize read_image performance for CL_ADDRESS_CLAMP.
author Zhigang Gong <zhigang.gong@intel.com>
Wed, 9 Apr 2014 10:25:22 +0000 (18:25 +0800)
committer Zhigang Gong <zhigang.gong@intel.com>
Wed, 16 Apr 2014 01:45:50 +0000 (09:45 +0800)
commit add15cb38aa2ae0dc8576cb653c8d05584087c5d
tree fa05c84a3243e65947cf5d7dc1b83ef17e00c028
parent d7ad5ee6f79fc28cf82321c8b527ae73da9f10f2

The previous workaround (due to a hardware restriction) was to use
CL_ADDRESS_CLAMP_TO_EDGE to implement CL_ADDRESS_CLAMP, which is
not very efficient, mainly because of the boundary checking overhead:
every pixel's coordinates have to be checked against the image bounds.

Now we use the LD message to implement CL_ADDRESS_CLAMP instead. For
integer coordinates, no boundary checking is needed at all. For float
coordinates, we only need to check whether a coordinate is less than
zero, which is much simpler than before.
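
The difference can be illustrated with a small host-side C model. This is
only a sketch, not the code in this patch: the 1D image, the BORDER value
and all function names are hypothetical, and fetch_texel() merely assumes a
fetch that returns the border value for any out-of-range integer
coordinate. Under that assumption the float path needs only the
negative check, because converting a float in (-1, 0) to int truncates
to 0, which would otherwise look like a valid texel.

```c
#include <assert.h>

#define WIDTH  4
#define BORDER 0   /* assumed border value for this model */

static const int image[WIDTH] = {10, 20, 30, 40};

/* Hypothetical stand-in for an integer texel fetch that yields the
 * border value whenever the coordinate is outside the image. */
static int fetch_texel(int x)
{
    return (x >= 0 && x < WIDTH) ? image[x] : BORDER;
}

/* Model of the old approach: sample with clamp-to-edge, then check
 * every coordinate against both bounds to substitute the border. */
static int read_clamp_old(int x)
{
    int edge = x < 0 ? 0 : (x >= WIDTH ? WIDTH - 1 : x);
    int texel = image[edge];
    return (x < 0 || x >= WIDTH) ? BORDER : texel;
}

/* Model of the new approach, integer coordinates: no extra check. */
static int read_clamp_int(int x)
{
    return fetch_texel(x);
}

/* Model of the new approach, float coordinates: (int)-0.5f is 0, so a
 * negative coordinate is forced out of range before the fetch. */
static int read_clamp_float(float x)
{
    int ix = (x < 0.0f) ? -1 : (int)x;
    return fetch_texel(ix);
}

/* Small usage example: both approaches agree on these inputs. */
static void clamp_examples(void)
{
    assert(read_clamp_int(2) == read_clamp_old(2));
    assert(read_clamp_int(-1) == BORDER);
    assert(read_clamp_int(WIDTH) == BORDER);
    assert(read_clamp_float(1.5f) == 20);
    assert(read_clamp_float(-0.5f) == BORDER);
}
```

In this model the coordinates above the upper bound need no explicit
handling in either new path, since the fetch itself rejects them.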

This patch can bring a performance gain of about 20% to 30% on LuxMark's
medium and simple scenes.

v2:
simplify READ_IMAGE0.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
12 files changed:
backend/src/backend/gen_context.cpp
backend/src/backend/gen_defs.hpp
backend/src/backend/gen_encoder.cpp
backend/src/backend/gen_encoder.hpp
backend/src/backend/gen_insn_selection.cpp
backend/src/backend/gen_insn_selection.hpp
backend/src/llvm/llvm_gen_backend.cpp
backend/src/llvm/llvm_gen_ocl_function.hxx
backend/src/llvm/llvm_scalarize.cpp
backend/src/ocl_stdlib.tmpl.h
src/intel/intel_driver.c
src/intel/intel_gpgpu.c