[TIR][REFACTOR][API-CHANGE] Change Call.name to Call.op(RelayExpr) (#5863)
author Tianqi Chen <tqchen@users.noreply.github.com>
Tue, 23 Jun 2020 00:47:01 +0000 (17:47 -0700)
committer GitHub <noreply@github.com>
Tue, 23 Jun 2020 00:47:01 +0000 (17:47 -0700)
commit 82d157f0b83ae17fde7bbfca14110aa2f2b80b61
tree 62504e3f22f6b1c64c567ba6ab8ccfa89fb1247b
parent 6fea4bdabfb1b2a3c7ac7f27fb326810c110c333
[TIR][REFACTOR][API-CHANGE] Change Call.name to Call.op(RelayExpr) (#5863)

* [TIR][REFACTOR][API-CHANGE] Change Call.name(string) to Call.op(tvm::Op/RelayExpr)

This PR brings a major refactor to the tir::Call structure.
The current Call structure uses a string field (name) to identify the
function/intrinsic being called. This approach becomes limiting as we
make TIR more structured. In particular, we are interested in the
following aspects:

- Type a function and perform better compile-time type checking so that we
  can find errors early.
- Register additional properties about an operator, such as:
  - Whether an intrinsic can be vectorized.
  - The adjoint function of the intrinsic (for tensor expression AD).
  - Whether the operator has side effects.
- Perform intrinsic-specific code generation when necessary.
- Call into another function in the same module.

The refactor changes the Call.name field to Call.op.
The Call.op field has a RelayExpr type, and we can pass:

- A tvm::Op which represents the corresponding intrinsic.
- A tvm::GlobalVar for calling into another function in the IRModule.

All the current intrinsics are migrated by registering a tvm::Op.
Because the unified IR shares a single Op registry, we use the "tir"
namespace for TIR-related intrinsics; for example, bitwise and is now
registered under `tir.bitwise_and`.
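
As a rough illustration, a minimal Python sketch against the API after this
change (the exact node layout here is an assumption based on the description
above):

    import tvm
    from tvm import tir

    x = tir.Var("x", "int32")
    # Bitwise and on integer expressions lowers to an intrinsic call.
    expr = x & tir.IntImm("int32", 3)

    # The callee is now an operator from the unified registry (or a
    # GlobalVar when calling another function in the IRModule), not a
    # plain string name.
    assert isinstance(expr.op, tvm.ir.Op)
    assert expr.op.same_as(tvm.ir.Op.get("tir.bitwise_and"))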

To simplify the upgrade, we introduce a `tir.call_extern` intrinsic
that allows us to call into an arbitrary external function without type
checking. However, we should move towards more type-checked variants in
the system.
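
For instance, a minimal sketch using the existing `tvm.tir.call_extern`
helper (the symbol name "my_extern_func" is a placeholder that would have
to be resolvable at link time):

    from tvm import tir

    x = tir.Var("x", "float32")
    # Builds a call through the tir.call_extern intrinsic; the external
    # symbol is named by a string argument and no signature is checked.
    call = tir.call_extern("float32", "my_extern_func", x)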

Under the new op design, we should no longer try to pattern match all the
specific intrinsics. Instead, we should rely on the attributes of each Op
to drive transformations. For example, the vectorization pass depends on
the TVectorizable property of the op, which can be registered independently.

In this way, we can still grow the number of intrinsics when necessary
without having to change all the passes.
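
A hedged sketch of what querying such a property looks like from Python
(the attribute key follows the TVectorizable name above; whether a given op
actually carries it depends on its C++ registration, so the query may
return None):

    import tvm

    op = tvm.ir.Op.get("tir.exp")
    # Passes query per-op attributes instead of string-matching intrinsic
    # names, so new intrinsics only need a registration, not pass changes.
    vectorizable = op.get_attr("TVectorizable")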

The same rule applies to tensor expression AD. Currently we perform
AD by pattern matching on operators like exp, sin, and cos. We should
instead switch to an adjoint registration mechanism like the one in Relay.
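
A purely hypothetical sketch of such a registration (the attribute key
"FTIRGradient" and the adjoint signature below are not part of this PR;
they only mirror Relay's FPrimalGradient-style registration, assuming
`tvm.ir.register_op_attr` is available):

    import tvm

    # Hypothetical: attach an adjoint to tir.exp under an assumed attribute
    # key, so AD can look it up instead of pattern matching on op names.
    def exp_adjoint(orig_call, output_grad):
        # d/dx exp(x) = exp(x), so scale the incoming gradient by the output
        return [output_grad * orig_call]

    tvm.ir.register_op_attr("tir.exp", "FTIRGradient", exp_adjoint)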

Follow-up refactors need to be performed, including:
- Fold Call.call_type into the operator's attributes.
- Enrich the operator registry information.
- Refactor passes (e.g. AD, intrinsic lowering) to use attribute-based
  transformation.

* Fix nms

* Fix remaining testcase

* Address review comment
120 files changed:
include/tvm/relay/expr.h
include/tvm/tir/builtin.h [new file with mode: 0644]
include/tvm/tir/expr.h
include/tvm/tir/function.h
include/tvm/tir/op.h
include/tvm/tir/op_attr_types.h [new file with mode: 0644]
include/tvm/tir/stmt.h
python/tvm/contrib/nvcc.py
python/tvm/target/datatype.py
python/tvm/target/intrin.py
python/tvm/te/hybrid/calls.py
python/tvm/tir/expr.py
python/tvm/tir/ir_builder.py
python/tvm/tir/op.py
src/arith/const_int_bound.cc
src/arith/ir_mutator_with_analyzer.cc
src/arith/modular_set.cc
src/arith/pattern_match.h
src/arith/rewrite_simplify.cc
src/contrib/hybrid/codegen_hybrid.cc
src/ir/op.cc
src/printer/tir_text_printer.cc
src/relay/transforms/pass_util.h
src/target/intrin_rule.h
src/target/llvm/codegen_arm.cc
src/target/llvm/codegen_cpu.cc
src/target/llvm/codegen_cpu.h
src/target/llvm/codegen_llvm.cc
src/target/llvm/codegen_llvm.h
src/target/llvm/codegen_nvptx.cc
src/target/llvm/codegen_x86_64.cc
src/target/llvm/intrin_rule_llvm.cc
src/target/llvm/intrin_rule_llvm.h
src/target/llvm/intrin_rule_nvptx.cc
src/target/llvm/intrin_rule_rocm.cc
src/target/source/codegen_c.cc
src/target/source/codegen_c.h
src/target/source/codegen_c_host.cc
src/target/source/codegen_cuda.cc
src/target/source/codegen_cuda.h
src/target/source/codegen_metal.cc
src/target/source/intrin_rule_cuda.cc
src/target/source/intrin_rule_opencl.cc
src/target/spirv/codegen_spirv.cc
src/target/spirv/intrin_rule_spirv.cc
src/target/stackvm/codegen_stackvm.cc
src/target/stackvm/codegen_stackvm.h
src/te/autodiff/jacobian.cc
src/te/operation/compute_op.cc
src/te/operation/cross_thread_reduction.cc
src/te/operation/extern_op.cc
src/te/operation/tensor_compute_op.cc
src/te/operation/tensorize.cc
src/te/schedule/schedule_postproc_rewrite_for_tensor_core.cc
src/tir/analysis/verify_memory.cc
src/tir/ir/buffer.cc
src/tir/ir/expr.cc
src/tir/ir/expr_functor.cc
src/tir/ir/stmt.cc
src/tir/op/builtin.cc [new file with mode: 0644]
src/tir/op/op.cc [moved from src/tir/ir/op.cc with 82% similarity]
src/tir/op/runtime.cc [new file with mode: 0644]
src/tir/transforms/arg_binder.cc
src/tir/transforms/bf16_legalize.cc
src/tir/transforms/bound_checker.cc
src/tir/transforms/combine_context_call.cc
src/tir/transforms/coproc_sync.cc
src/tir/transforms/inject_virtual_thread.cc
src/tir/transforms/ir_util.h
src/tir/transforms/loop_partition.cc
src/tir/transforms/lower_device_storage_access_info.cc
src/tir/transforms/lower_intrin.cc
src/tir/transforms/lower_thread_allreduce.cc
src/tir/transforms/lower_tvm_builtin.cc
src/tir/transforms/lower_warp_memory.cc
src/tir/transforms/make_packed_api.cc
src/tir/transforms/narrow_datatype.cc
src/tir/transforms/rewrite_unsafe_select.cc
src/tir/transforms/split_host_device.cc
src/tir/transforms/storage_access.cc
src/tir/transforms/storage_flatten.cc
src/tir/transforms/storage_rewrite.cc
src/tir/transforms/tensorcore_infer_fragment.cc
src/tir/transforms/thread_storage_sync.cc
src/tir/transforms/vectorize_loop.cc
tests/cpp/ir_functor_test.cc
tests/python/relay/test_ir_parser.py
tests/python/unittest/test_arith_canonical_simplify.py
tests/python/unittest/test_target_codegen_c_host.py
tests/python/unittest/test_target_codegen_llvm.py
tests/python/unittest/test_target_codegen_static_init.py
tests/python/unittest/test_te_schedule_tensor_core.py
tests/python/unittest/test_tir_constructor.py
tests/python/unittest/test_tir_nodes.py
tests/python/unittest/test_tir_stmt_functor_ir_transform.py
tests/python/unittest/test_tir_transform_bf16_legalize.py
tests/python/unittest/test_tir_transform_combine_context_call.py
tests/python/unittest/test_tir_transform_coproc_sync.py
tests/python/unittest/test_tir_transform_inject_double_buffer.py
tests/python/unittest/test_tir_transform_inject_virtual_thread.py
tests/python/unittest/test_tir_transform_rewrite_unsafe_select.py
tests/python/unittest/test_tir_transform_storage_flatten.py
tests/python/unittest/test_tir_transform_thread_sync.py
tests/python/unittest/test_tir_transform_vectorize.py
topi/include/topi/detail/extern.h
topi/include/topi/elemwise.h
topi/python/topi/arm_cpu/bitserial_conv2d.py
topi/python/topi/arm_cpu/tensor_intrin.py
topi/python/topi/cuda/nms.py
topi/python/topi/cuda/rcnn/proposal.py
topi/python/topi/cuda/sort.py
topi/python/topi/cuda/tensor_intrin.py
topi/python/topi/x86/tensor_intrin.py
topi/tests/python/test_topi_basic.py
topi/tests/python/test_topi_math.py
tutorials/language/intrin_math.py
tutorials/optimize/opt_conv_tensorcore.py
vta/python/vta/environment.py
vta/python/vta/intrin.py
vta/python/vta/transform.py