review.tizen.org Git - platform/upstream/tvm.git/commit

author	wpan11nv <60017475+wpan11nv@users.noreply.github.com>
	Sat, 18 Jan 2020 02:58:11 +0000 (18:58 -0800)
committer	Wuwei Lin <wuwei@apache.org>
	Sat, 18 Jan 2020 02:58:11 +0000 (21:58 -0500)
commit	2630ffcbc52973aaf86fd6b7000a6f2f30d5f25c
tree	e28e41cad92b701004735b2360e0358d74bfc0a7
parent	2738eddf4ad7aded6760466dff36b15e6503050d

[CodeGen][CUDA] Improve CUDA vectorizer (#4736)

- Fix issues that blocked the fp16 vectorizer. Correct CUDA packing and
  unpacking code is now emitted, and more unit tests are enabled.

- Do not emit code that reads the first lane from an uninitialized variable:

  int _3;
  _3 = _3 & ~(0x000000ff << 0) | ...

  and emit the following code instead:

  _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);

  Note that nvcc 10.2 is forgiving and emits the same code for both cases.
  A warning appears in test_codegen_cuda.py.
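
  The lane handling described above can be sketched on the host in plain
  C++. This is only an illustration, not the code TVM generates: the helper
  names get_lane and add_int8x4 are hypothetical, and masking each sum to
  8 bits is an assumption made here so that lanes stay independent. Lane 0
  is written with a plain assignment, so the result is never read before it
  is initialized. Later lanes use the read-modify-write form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract 8-bit lane `lane` (0..3) from a packed word.
inline uint32_t get_lane(uint32_t packed, int lane) {
  return 0x000000ffu & (packed >> (lane * 8));
}

// Hypothetical helper: lane-wise add of two packed 8-bit x 4 values.
inline uint32_t add_int8x4(uint32_t a, uint32_t b) {
  // Lane 0: plain assignment, no read of an uninitialized `result`.
  uint32_t result = ((get_lane(a, 0) + get_lane(b, 0)) & 0xffu) << 0;

  // Lanes 1..3: clear the target lane, then OR in the new value.
  for (int lane = 1; lane < 4; ++lane) {
    uint32_t sum = (get_lane(a, lane) + get_lane(b, lane)) & 0xffu;
    uint32_t mask = 0x000000ffu << (lane * 8);
    result = (result & ~mask) | (sum << (lane * 8));
  }
  return result;
}
```

  For example, add_int8x4(0x01020304, 0x10203040) yields 0x11223344, with
  each byte added independently of its neighbours.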

Signed-off-by: Wei Pan <weip@nvidia.com>

include/tvm/runtime/data_type.h
src/codegen/codegen_cuda.cc
src/codegen/literal/cuda_half_t.h
tests/python/unittest/test_codegen_cuda.py