review.tizen.org Git - contrib/beignet.git/commit

author	Zhigang Gong <zhigang.gong@intel.com>
	Wed, 8 Oct 2014 04:58:59 +0000 (12:58 +0800)
committer	Zhigang Gong <zhigang.gong@intel.com>
	Fri, 17 Oct 2014 07:10:44 +0000 (15:10 +0800)
commit	0ccfdf53f80782b29835cea867fa1db891bcdcc5
tree	dbcad32f44ffd643dce181a6c3fbb88c7dcfbaef
parent	74ea659e2ba624cbb07a6a99f1c7edc5bc144435

GBE: Add a customized loop unrolling handling mechanism.

By default, the unrolling threshold is relatively small.
As a result, some relatively large loops that access private
arrays are not unrolled, so those private arrays can't be
scalarized later. Such a private array is allocated on the
stack, which is currently extremely slow for the Gen backend.
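
To illustrate why unrolling matters here, the sketch below uses plain
C++ rather than an OpenCL kernel and hand-writes the effect of the
optimization. The function names and the array size are assumptions
made for illustration; they do not come from the patch.

```cpp
#include <cassert>

// Rolled form: tmp[] is indexed by the loop variable, so it has to stay
// an addressable array (stack memory for a private array).
int sum_rolled(const int *in) {
  int tmp[4];
  for (int i = 0; i < 4; ++i)
    tmp[i] = in[i] * 2;
  int acc = 0;
  for (int i = 0; i < 4; ++i)
    acc += tmp[i];
  return acc;
}

// Hand-written equivalent of what full unrolling makes possible: every
// index is a constant, so each element can become an independent scalar
// that is free to live in a register.
int sum_unrolled(const int *in) {
  int t0 = in[0] * 2;
  int t1 = in[1] * 2;
  int t2 = in[2] * 2;
  int t3 = in[3] * 2;
  return t0 + t1 + t2 + t3;
}
```

Both functions compute the same value; only the second form lets a
compiler drop the array entirely.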

Increasing the unrolling threshold for all loops is not a
good idea: most loops don't need to be unrolled for this
purpose, and a large unrolling threshold causes large code
size and unnecessarily high register pressure, which may
lead to register spilling.

So this patch introduces a trade-off pass that identifies
loops which still have private loads/stores at the outermost
level of the loop. It then adds metadata to those loops to
request aggressive unrolling, and another round of loop
unrolling is run.
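
The selection logic described above can be sketched with a toy model.
This is not the code in llvm_unroll.cpp and it does not use the LLVM
API: ToyLoop, both threshold constants, and all function names are
hypothetical, and a boolean flag stands in for the loop metadata.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop; not an LLVM class.
struct ToyLoop {
  unsigned unrolledSize;   // estimated size if the loop is fully unrolled
  bool hasPrivateAccess;   // private-array load/store still in the loop
  bool aggressiveUnroll;   // stands in for the unroll metadata
};

// Illustrative values only; the real thresholds are not stated here.
constexpr unsigned kDefaultThreshold = 150;
constexpr unsigned kAggressiveThreshold = 1024;

// First round: only loops under the small default threshold unroll.
bool unrolledByDefault(const ToyLoop &l) {
  return l.unrolledSize <= kDefaultThreshold;
}

// Trade-off pass: mark only the loops that survived the first round
// and still touch a private array. Returns how many were marked.
unsigned markLoops(std::vector<ToyLoop> &loops) {
  unsigned marked = 0;
  for (ToyLoop &l : loops) {
    if (!unrolledByDefault(l) && l.hasPrivateAccess) {
      l.aggressiveUnroll = true;
      ++marked;
    }
  }
  return marked;
}

// Second round: marked loops get the larger threshold, all other
// loops keep the default one.
bool unrolledAfterSecondRound(const ToyLoop &l) {
  if (unrolledByDefault(l))
    return true;
  return l.aggressiveUnroll && l.unrolledSize <= kAggressiveThreshold;
}
```

In this model a large loop without private accesses is never marked,
so it keeps the default threshold and does not add code size or
register pressure.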

This patch, together with the previous small patch, can bring
a significant performance improvement in some cases. I tested
with some OpenCV test cases and observed a 2x to 10x improvement.

v2:
refine the parent loop unroll analysis method.

v3:
disable this pass for LLVM 3.3/3.4.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>

backend/src/CMakeLists.txt
backend/src/llvm/llvm_gen_backend.hpp
backend/src/llvm/llvm_to_gen.cpp
backend/src/llvm/llvm_unroll.cpp	[new file with mode: 0644]