Zhao Wu [Wed, 15 Jul 2020 22:23:40 +0000 (06:23 +0800)]
[clflush] Enable x86 cpu cache flush (#5914)
Mahesh Ambule [Wed, 15 Jul 2020 20:24:24 +0000 (01:54 +0530)]
[TARGET] ONNX codegen (#5052)
* Relay to ONNX converter
* Relay to ONNX op test cases
* Relay to ONNX end to end model test cases
* Add test cases to jenkins
* CI CD fixes
* ONNX codegen
* ONNX codegen
* ONNX codegen
* onnx testcases
* ONNX codegen
* test onnx
* ONNX codegen
* shape calculation
* move onnx codegen to contrib/target
* review comments
* ONNX target use visitor
* onnx fixes
* lint fixes
* doc string changes
* review comments
* review comment fixes
* review comment
* pytest skip
* rename type to node type
* test
* Fix for constant shape, add exp, fix for metadata module
* Fix cpplint
* change error tol values
Zhao Wu [Wed, 15 Jul 2020 17:06:36 +0000 (01:06 +0800)]
[Doc] update frontend tutorials to new model based runtime (#6063)
Chenfan [Wed, 15 Jul 2020 07:24:19 +0000 (15:24 +0800)]
[Ansor][AutoTVM v2.0] Part 1: Rename namespace from auto_schedule to auto_scheduler (#6059)
* Rename namespace auto_schedule to auto_scheduler
* Update
* Lint fix
Zhao Wu [Wed, 15 Jul 2020 03:07:43 +0000 (11:07 +0800)]
[RUNTIME] Support module based interface runtime (#5753)
Andrew Reusch [Wed, 15 Jul 2020 01:11:56 +0000 (18:11 -0700)]
Build crttest and cpptest separately. (#6057)
* Build crttest and cpptest separately.
* Try to fix random CI crashing, likely caused by concurrent cmake execution.
* Revert to -j8
Chenfan [Wed, 15 Jul 2020 00:16:22 +0000 (08:16 +0800)]
[Ansor][AutoTVM v2.0] Part 0: Ansor minimum system for auto schedule generating (#5962)
* Code migration Start (#1)
* Init commit: Code migration Start
* Add loop_state.cc/h
* Add ComputeDAG basic test
* Split transform_step out & Update more UTs (#3)
* Split transform_step out
* Update GetProducers & GetConsumers
* Update UTs
* Add UT for CacheReadWrite & Some bug fix
* Add search_task, measure and serialization (#4)
* Add FollowSplit & FollowFusedSplit tests
* Update dag.InferBound & its UT
* Add search_task, measure and serialization
* Update Serialization UT
* Add MetaTileRewritePolicy (#5)
* Add feature
* Add cost_model, meta_tile_rewrite_policy
* Add MetaTileRewritePolicy basic UT
* Basic Python API for State (#6)
* Add Basic Python API for State
* Add UTs for State
* Add Python API: Measure & Task (#7)
* Update the return value of state operation
* Add task
* Copy measure.py & utils.py
* Fix LocalBuilder
* Fix LocalRunner
* Add ansor.auto_schedule() API; First AutoSchedule working version(#8)
* Add basic Python support for ansor.auto_schedule
* Update AutoSchedule API
* Bug fix for getting the attach point of a fused iter
* Update UT after infer bug fix
* Bug fix & Add python serialization API (#10)
* Delete C++ UT hack since Python is ready
* Add ndarray.non_empty
* Update Serialization python API
* Improve code style, python wrapper and test cases (#11)
* Update c++ code style and unit test
* Update python State wrapper and test cases
* fix unit tests
* Add RPCRunner & OpenCL/CUDA test (#12)
* Add RPCRunner & OpenCL search test
* Add CUDA search test
* Add RPCRunner test
* rebase to upstream/master
* Add Ansor basic tutorial (#13)
* Add basic tutorial
* migrate feature extraction (#14)
* Add XGBModel & RPCRunnerWrapper (#15)
* Add XGBModel & RPCRunnerWrapper
* Revert "Add Parallel Granularity Mutation"
* Migrate workload_registry.py (#16)
* add workload registry
* update
* update
* add task scheduler (#17)
* Add conv2d cuda tutorial with workload registry (#18)
* add tune_test.py (the old tune_wkl.py) (#19)
* add tune_test.py (the old tune_wkl.py)
* update
* fix measure
* fix for gpu
* Code refine for tune_test.py & Add a pre load callback (#20)
* Bug fix for tutorials
* Add PreLoadMeasuredStates
* Add search_callback support for task tuner
* Code refine for tune_test.py
* Update
* Update
* Update
* Update
* Bug fix
* Add python custom sketch rule (#21)
* Add custom sketch rule
* Bug fix
* Ansor Relay Integration (without layout rewrite) (#22)
* relay integration
* Add tune_op_subgraph.py & Some code clean for tune_network.py (#23)
* Add single op tune scripts
* Add tune subgraph support
* Merge all op & all subgraph to one file
* Rename file
* add explicit_unroll_max_extent (#25)
* Add Index simplification & API update (#26)
* Add vectorized cooperative_fetching test
* Update math simplify for vectorized CF
* File rename
* Update tune_network
* API update
* Update PreLoadMeasuredStates & Some bug fix (#27)
* Add a threading wrapper to fix the test bug
* Set default TVM_USE_AUTO_SCHEDULER to false
* Update PreLoadMeasuredStates callback
* Add tensorize step for loop_state (#31)
* Add tensorize step
* State python api update (#33)
* Start to update api
* Add compute_dag to state
* API update
* kernel layout rewrite (#28)
* kernel layout rewrite
* remove some hacks
* add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass
* set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite
* [cache flush] port cache flush to ansor (#32)
* Improve relay integration (#34)
* tmp checkpoint
* Improve relay integration
* Improve relay integration
* Fix xgb error & Simplify dispatcher (#35)
* Rename "MetaTileRewritePolicy" to "SketchPolicy". (#36)
* Rename "MetaTileRewritePolicy" to "SketchPolicy".
* Add a new class for auto_unroll_max_step, storage_offset in StageNode
* fix tune_op_subgraph.py
* rebase
* Migrate all node::make to noderef's construct function (#37)
* Start to move xxxnode::make to noderef()
* Update
* Update
* Finish transform_step
* Finish compute dag & auto schedule
* Update
* Update
* Update
* Update
* Update
* Code refine
* Code refine
* Code refine
* Update
* Update
* Some lint fix & Recover the double constructor of tvm::PrimExpr (#39)
* lint fix
* clang-format-fix
* pylint fix
* Update
* Recover the double constructor of tvm::PrimExpr
* Fix pylint
* pylint fix
* pylint fix
* Add MutateComputeLocation and MutateParallel in evolutionary search (#40)
* Add MutateComputeLocation and MutateParallel in evolutionary search
* fix lint
* Improve loop state python API (stage_tensors -> stage_ops) (#41)
* improve loop state python API (stage_tensors -> stage_ops)
* fix
* ComputeDAG bug fix & Add Custom TensorCore Matmul Example (#42)
* Bug Fix
* Sample example of Custom TensorCore Matmul
* Revert Commits, Start to build minimum Ansor system
* Code clean for minimum Ansor system
* Bug fix & Delete AccessAnalyzer
* Delete attachmap & Code clean
* Doc update
Update statenode::stages from vector to Array
* Headfile update & Python doc update
* clang-format fix
* pylint fix
* Update
* Doc update
* Update
* Bug fix after code merge to the new master
* clang-format fix
* Update
* Update
* Update std::vector to Array; Update verbosity setting; Some comments addressed
* std::vector->Array & std::string->String
* Add init_state to ComputeDAG
* Update
* Update some unordered_map to Map
* clang-format fix
* Comments addressed
Delete ReplayAndInferBound
Delete ReplaySteps & InferBoundCommon
* Lint fix
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Rename ansor namespace to auto_schedule
* Update
* Rename ThreadPool to ParallelFor
* Add parallel_for
* Remove ThreadPool
* Update python/tvm/auto_schedule/auto_schedule.py
* trigger CI
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Minmin Sun (孙敏敏) <minmin.smm@alibaba-inc.com>
Co-authored-by: Zhao Wu <zhaowu@apache.org>
Lily Orth-Smith [Tue, 14 Jul 2020 22:43:07 +0000 (15:43 -0700)]
[RELAY][DYN] Dynamic broadcast_to, zeros, ones (#6007)
* Dynamic BroadcastTo
* fixed lint!
* add test_one_hot() back
* add one_hot registration back
* Dynamic BroadcastTo
* fixed lint!
* add one_hot registration back
* fixed lint.. again
* fixed lint
* lint
* responding to comments
* skipping cuda in dynamic test
* skipping cuda in dynamic test
* fixed i386 test and GPU test
* lint
* starting ones and zeros
* fixed dynamic ones and zeros, wrote dyn ones and zeros test
* added static version of zeros, ones and added a check for size of types to static BroadCastToRel
* added dynamic to static pass for zeros and ones, dynamic test and dynamic to static test
* removed op_str in dyn to static pass test
* fixed lint
* fix lint hopefully
* removed import const
* removed import that was actually used
* copy all attributes from broadcast_to, ones, zeros, full
* responding to comments
* fixed build error
* finishing rebase
* fix lint
Co-authored-by: Lily Orth-Smith <lorthsmith@Lilys-MacBook-Pro.local>
Krzysztof Parzyszek [Tue, 14 Jul 2020 21:10:44 +0000 (16:10 -0500)]
[Hexagon] Remove use of designated initializers from hexagon_module.cc (#6055)
They are an extension, not yet a part of the C++ standard.
MORITA Kazutaka [Tue, 14 Jul 2020 15:53:56 +0000 (00:53 +0900)]
[BYOC][COREML] Handle one symbol for each runtime (#5989)
* [BYOC][COREML] Handle one symbol for each runtime
* LOG -> DLOG
Jinyu Xie [Tue, 14 Jul 2020 06:51:54 +0000 (02:51 -0400)]
Fix pytorch frontend prim::Constant issue (#6051)
Matthew Brookhart [Tue, 14 Jul 2020 03:42:54 +0000 (20:42 -0700)]
Refactor to expose MakeOp functions to C++ (#6047)
* Initial Refactor
* add templated nn Make* functions
* fix build typo
* inline functions, fix unit tests
Liangfu Chen [Tue, 14 Jul 2020 03:16:49 +0000 (11:16 +0800)]
[IR] Fix a primitive check error (#5991)
* fix primitive check error
* assuming every Op has Type defined
* CHECK_NE -> CHECK
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Giuseppe Rossini [Tue, 14 Jul 2020 03:15:42 +0000 (04:15 +0100)]
Fix conv2_gemm after target structure update (#6037)
After target structure changed in this RFC:
https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844/42
The conv2d optimizations were broken for the following reasons:
- "target" is now called mtriple (this changes how we test if the
architecture is AArch64)
- when we invoke "clang.create_llvm" we still need to specify the
"--target" option (set to aarch64-linux-gnu)
This submission reverts those changes
Change-Id: I04c597b91ca5800ddf4471255e2a358c60bc048e
Trevor Morris [Tue, 14 Jul 2020 01:12:28 +0000 (18:12 -0700)]
[Frontend][TFLite] Fix fully_connected converter when batch size is not 1 (#6038)
* Fix fully_connected when batched
* Remove unused variable
Dmitriy Smirnov [Mon, 13 Jul 2020 23:08:56 +0000 (00:08 +0100)]
Add support for tflite arg_min and arg_max (#5992)
* [Relay][Frontend][TFLite] Add parser support for arg_min_max
* this implementation supports only the case when the axis is a scalar
* tflite 1.13 removes all dims of size 1, Relay doesn't do this
* WARNING: every newer version of tflite > 1.13 needs keepdims=TRUE
* Migrated to tflite 2.1.0
keepdims set to False and added some checks
Note the unit tests emitted the following warning:
/workspace/src/te/schedule/bound.cc:119: not in feed graph consumer = compute(T_multiply_red_temp, 0x53f5050)
* linter
* Removed quantized argmin
Removed quantized argmin due to inability to provide a proper test case
* added negative ranges
* re-trigger CI
Co-authored-by: Ina_Dobreva <Ina.Dobreva@arm.com>
Yi-Hsiang (Sean) Lai [Mon, 13 Jul 2020 22:39:10 +0000 (18:39 -0400)]
[Relay] Add pass for getting calibration data from a relay module (#5997)
* add simple pass to extract outputs
* complete pass that collects all function inputs/outputs
* add analysis pass for collecting outputs
* reorganize the files
* add the first test
* update test with tuples
* clean up Python code
* merge with upstream
* clean up transform.py
* add comments for cpp files
* fix lint issues
* update submodules
* modify files according to the review
* fix style and typo
* fix lint error
* add checks for repeated function calls
* fix lint error
* merge review comments
* small simplification
* revise the code according to the review comments
* add username in TODO
* use IRModule directly
* use better APIs according to the review
* apply comments from the reviewer
* retrigger ci
Krzysztof Parzyszek [Mon, 13 Jul 2020 22:29:52 +0000 (17:29 -0500)]
[LLVM] Create TBAA information based on the underlying buffer type (#6046)
Currently, the TBAA information is based on the access type, i.e.
the data type from the load or store instruction. When the same
memory area is accessed with different types, the corresponding
load/store instructions may end up not being marked as aliasing each other.
This could lead to incorrect code being generated.
An example of when such a situation can occur is when two different
buffer_decl's are created for the same buffer:
ba = buffer_decl(... dtype = 'int16' ...)
bb = buffer_decl(data = ba.data, dtype = 'int32x32' ...)
Then instructions
ba[x] = 0
... = bb[x]
may be reordered in the final code due to the alias info indicating
that they are not aliased.
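The ba/bb situation above can be illustrated with a plain-Python analogue (not TVM code): two differently-typed views over the same underlying buffer, where writes through one view are visible through the other, so the accesses clearly alias even though their types differ.

```python
import struct

# Two typed views over the same 8-byte buffer, mirroring the
# buffer_decl example: ba reads int16, bb reads the same bytes as int32.
buf = bytearray(8)
ba = memoryview(buf).cast('h')   # int16 view
bb = memoryview(buf).cast('i')   # int32 view

ba[0] = 1                        # two writes through the int16 view
ba[1] = 1

# Both writes land in the first four bytes, so the int32 view observes
# them: the accesses alias despite the type mismatch, which is exactly
# what purely type-based TBAA would fail to capture.
expected = struct.unpack('i', struct.pack('hh', 1, 1))[0]
assert bb[0] == expected
```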
Lianmin Zheng [Mon, 13 Jul 2020 17:46:27 +0000 (10:46 -0700)]
[CODEGEN] Fix code generation bugs for C/CUDA & Improve VerifyGPUCode pass (#6041)
Josh Fromm [Sun, 12 Jul 2020 12:26:02 +0000 (05:26 -0700)]
[Relay][Frontend][Onnx] GRU Layer Support (#6020)
* GRU debugging and testing added to onnx frontend.
* All tests working and code formatted.
* Fix lint issues.
* Add a test case and changed RNN argument parsing.
* Small refactor.
Andrew Reusch [Sun, 12 Jul 2020 09:28:31 +0000 (02:28 -0700)]
µTVM CRT modifications for on-device RPC server (#5921)
* Reorganize CRT into parts, public API, and add standalone build.
* Create a make-based build in src/runtime/crt. This is intended to
be built in build/standalone_crt (generated by running ninja
standalone_crt in build/). Its job is to build CRT without
depending on headers not explicitly allowed in CRT.
* Create a "public-facing" CRT API targeted to firmware running
alongside CRT in include/tvm/runtime/crt. Developers who are
integrating the CRT are the target of this API.
* Reorganize CRT internally into common/ and graph_runtime/
pieces. Build each pieces as a separate statically-linked library.
* Slim down TVMGraphRuntime public-facing API to just the functions
that are used externally.
* Updates to apps/bundle_deploy to make this work.
* Add TVMFuncRegistry, CRT test infrastructure, and tests.
* Also add error_codes.h, a file containing error codes returned by CRT.
* Add TVMErrorf()
* [API_CHANGE] Integrate func registry into CRT.
* NOTE: This changes the default API for functions exposed under the
CRT by the TVMFuncCall API. `resource_handle` is now always given
as a new 6th parameter.
* `resource_handle` is NULL when invoked on a global function and a
pointer to the module owning the function otherwise.
* Generalize arena-based memory manager.
* lint
* Fix git-clang-format arg parsing
* add apache header
* add mutable func registry tests
* git-clang-format
* fix more lint
* Move memory_test to crttests.
* fix tests
* checkpoint
* checkpoint
* bundle_deploy demo_static works
* rm debug printf
* git-clang-format
* fix lint
* add asf header
* pylint
* update build configs for jenkins
* make regression compiler happy
* fix build errors in regression GCC
* address comments
* git-clang-format
* fix for 32-bit cpp regression
* fix incorrect use of memcpy and tests for 32-bit
* clang-format
Krzysztof Parzyszek [Fri, 10 Jul 2020 22:36:17 +0000 (17:36 -0500)]
[LLVM/CPU] Terminate basic block after "ret" instruction (#6036)
* [LLVM/CPU] Terminate basic block after "ret" instruction
"Ret" is a terminator in LLVM IR and there should be no instructions
in the basic block following it. When generating a "ret", end the
current block and start a new one.
Matthew Brookhart [Fri, 10 Jul 2020 18:18:34 +0000 (11:18 -0700)]
[Relay][Dyn] Dynamic TopK Op (#6008)
* add dynamic topk op
* add topk to dynamic_to_static pass
* fix TF test
* fix pylint
Zhi [Fri, 10 Jul 2020 18:03:23 +0000 (11:03 -0700)]
[REFACTOR][RELAY] Move invoke_tvm_op and shape_func to vm dialect (#5958)
* [REFACTOR][RELAY] Move invoke_tvm_op and shape_func to vm dialect
* address comments
Giuseppe Rossini [Fri, 10 Jul 2020 17:58:22 +0000 (18:58 +0100)]
[Bug fix] Fix in arm_cpu/conv2d_alter_op for NHWC quantized (#6027)
* [Bug fix] Fix in arm_cpu/conv2d_alter_op for NHWC quantized
A few minor typos are fixed in topi/arm_cpu/conv2d_alter_op.py for the
NHWC quantized route:
- Kernel shape was misread (CO, IC, KH, KW) -> (KH, KW, IC, OC)
- Pad along the K dimension was misspelled: pad_k -> pad_K
- Workload name was wrong: "conv2d_NHWC_int8_without_tranform.arm_cpu"
-> "conv2d_NHWC_quantized_without_transform.arm_cpu"
This submission fixes those errors and adds a further test for conv2d_alter_op.py
Change-Id: I0622df05f1d4d15311946f6e75f1840a34815a5b
* Move -target to -mtriple
Change-Id: Ieff80c774e8ab0fa7f48d83d50a79f3a62e8fe13
* Retrigger tests
Change-Id: I5541bed54eacc5063bf4a4fda725209cc23f621e
Krzysztof Parzyszek [Fri, 10 Jul 2020 17:57:05 +0000 (12:57 -0500)]
Add creation of Hexagon device in RPC client (#6035)
lhutton1 [Fri, 10 Jul 2020 16:43:25 +0000 (17:43 +0100)]
[CI][ACL] Enable ACL installation in ci_cpu docker container (#5916)
This patch adds a cross-compiled ACL build to the ci_cpu dockerfile used for CI.
Change-Id: I66e1521ab553306bc7367b65acc0363e750f0211
Cody Yu [Fri, 10 Jul 2020 11:38:46 +0000 (04:38 -0700)]
[BYOC] JSON Runtime with DNNL End-to-End Flow (#5919)
* json runtime
* json dnnl WIP
* fix ArrayNode usages
* Support composite functions
* DNNL json runtime: conv2d/add/relu/dense/bn
* add a more complex example
* fix bias memory issue
* rebase to upstream
* merge to metadata module, remove the unused driver
* handle constant
* support composite functions
* support DNNL constant
* clean up
* Simplify dnnl user code
* GetDataSize
* fix dense bug
* improve cmake
* zero copy
* add unit test
* move json to contrib/json
* fix cmake
* lint
* max_digits10 for fp serialization
* only keep base getfunction
* fix lint
* zero copy for all data entries
* address comments
* enable ci
* address comment; fix bug
* address comment
Co-authored-by: Zhi Chen <chzhi@amazon.com>
Tianqi Chen [Fri, 10 Jul 2020 05:28:15 +0000 (22:28 -0700)]
[CI] Update ci-cpu to the latest (#6031)
Tianqi Chen [Fri, 10 Jul 2020 05:27:50 +0000 (22:27 -0700)]
[DOCKER] Pin keras version (#6032)
windclarion [Thu, 9 Jul 2020 20:12:42 +0000 (04:12 +0800)]
[TARGET] each option of target str should only contain one '=' (#5988)
In src/target/target_id.cc, ParseAttrsFromRawString (line 222), the check
if ((pos = FindUniqueSubstr(s, "=")) != -1)
requires that each option contain only one '='.
Signed-off-by: windclarion <windclarion@gmail.com>
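The "exactly one '='" rule described above can be sketched in plain Python; the function and error message below are illustrative, not the actual C++ parser:

```python
def parse_target_option(opt):
    """Split a 'key=value' target option, rejecting any option that
    does not contain exactly one '='.  A simplified analogue of the
    FindUniqueSubstr-based check; names here are illustrative."""
    if opt.count("=") != 1:
        raise ValueError(f"option {opt!r} must contain exactly one '='")
    key, value = opt.split("=")
    return key, value

assert parse_target_option("mcpu=skylake") == ("mcpu", "skylake")
try:
    parse_target_option("mattr=+avx2=on")   # two '=': rejected
    raise AssertionError("should have been rejected")
except ValueError:
    pass
```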
Zheng Jiang [Thu, 9 Jul 2020 04:54:49 +0000 (12:54 +0800)]
fix typos in comments and relay tutorial (#5999)
* [TypoFix]fix typos in comments and relay tutorial
* retrigger
windclarion [Thu, 9 Jul 2020 04:52:24 +0000 (12:52 +0800)]
[RUNTIME] if a param is not in the input, we still consume its data (#5990)
so the read pointer of stream can move forward
Signed-off-by: windclarion <windclarion@gmail.com>
Siju Samuel [Thu, 9 Jul 2020 01:11:26 +0000 (06:41 +0530)]
[PYTORCH]Gather op support added (#6013)
* [PYTORCH]Gather op support added
* retrigger
HUAN-PING SU [Wed, 8 Jul 2020 20:30:41 +0000 (04:30 +0800)]
Remove duplicate line (#6017)
Jared Roesch [Wed, 8 Jul 2020 20:04:42 +0000 (13:04 -0700)]
[Frontend][Relay] Add Parser 2.0 (#5932)
lhutton1 [Wed, 8 Jul 2020 17:14:58 +0000 (18:14 +0100)]
Option to specify alternate directory to output build to (#6016)
This is useful when you would like to manage two separate builds in the same TVM tree. You can specify a build directory when using make by adding OUTDIR=alternate-build-dir.
Change-Id: I3efed1135343f3903007115ce5dd683ef7bd9e8c
Yizhi Liu [Wed, 8 Jul 2020 15:35:28 +0000 (08:35 -0700)]
[TEST][FLAKY] test_arith_solve_linear_inequality.py::test_multi_equal (#6014)
Haibin Lin [Wed, 8 Jul 2020 05:37:10 +0000 (22:37 -0700)]
[Frontend][MXNet] MXNet frontend support for AMP cast op (#5976)
* amp_cast
* fix test
* more tests
* test more ctxs
* fix doc
* fix typo
* address CR comment
* fix lint
* revert doc change
* Revert "revert doc change"
This reverts commit a410dd5569730ac81af67ddb333c3afbe97eddd7.
* fix doc
* Update relay_pass_infra.rst
Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-138.ec2.internal>
Tianqi Chen [Wed, 8 Jul 2020 03:55:44 +0000 (20:55 -0700)]
[VTA] Move compiler related registry items to vta/build_module.py (#6012)
Krzysztof Parzyszek [Wed, 8 Jul 2020 00:34:10 +0000 (19:34 -0500)]
Cache object refs in loop partitioner instead of object pointers (#6004)
* Cache object refs in loop partitioner instead of object pointers
Loop partitioner modifies the IR, which can cause TIR objects to
become dead and be destroyed. To avoid working on junk data cache
object references instead of object pointers.
* Fix format/lint errors
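The dangling-pointer hazard the loop-partitioner fix addresses has a close Python analogue: a weak reference behaves like a cached raw pointer, while a normal reference keeps the object alive the way caching an ObjectRef does. (CPython's immediate refcount-based collection is assumed here.)

```python
import weakref

class TIRNode:
    """Stand-in for a reference-counted IR object."""

node = TIRNode()
weak = weakref.ref(node)   # like caching a raw object pointer
strong = node              # like caching an object reference

del node                   # the IR rewrite drops its own reference

# The "pointer" still resolves only because `strong` keeps the object
# alive; once the last reference goes, the weak ref dangles -- the
# junk-data hazard fixed by caching references instead of pointers.
assert weak() is strong
del strong
assert weak() is None
```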
Krzysztof Parzyszek [Wed, 8 Jul 2020 00:31:59 +0000 (19:31 -0500)]
[LLVM] Auto-convert shuffle with single index to "extract element" (#6006)
* [LLVM] Auto-convert shuffle with single index to "extract element"
Data types with a single lane are treated as scalars in TVM. On the
other hand, in LLVM there is a difference between a scalar type and
a vector type with a single lane. Because of that, a shuffle with
a single index is equivalent to extracting an element in TIR, but
not in the generated LLVM IR. This patch changes the LLVM codegen
for shuffle to auto-convert single-lane vectors to scalars.
* Try another build
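The single-index case described above can be sketched as follows; this is a plain-Python model of the TIR-level semantics, not the LLVM codegen itself:

```python
def shuffle(vectors, indices):
    """Concatenate the input vectors, then gather the given lanes.
    With a single index the result is a scalar (an element extract),
    mirroring how TIR treats one-lane values; a multi-lane result
    stays a vector."""
    flat = [lane for vec in vectors for lane in vec]
    if len(indices) == 1:          # single index: extract element
        return flat[indices[0]]
    return [flat[i] for i in indices]

assert shuffle([[10, 20], [30, 40]], [2]) == 30            # scalar
assert shuffle([[10, 20], [30, 40]], [3, 0]) == [40, 10]   # vector
```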
Jared Roesch [Wed, 8 Jul 2020 00:15:13 +0000 (17:15 -0700)]
Fix what looks like a bizarre copy-paste issue (#6010)
Tianqi Chen [Tue, 7 Jul 2020 19:53:51 +0000 (12:53 -0700)]
[DOCKER] Only pass pythonpath for ci images (#6005)
Matthew Brookhart [Tue, 7 Jul 2020 18:50:35 +0000 (11:50 -0700)]
Dynamic Tile Op (#5983)
* first working dynamic tile passes first test
* add dyn tile to dynamic_to_static
* fix cpplint
* respond to review comments. Thanks @siju-samuel
* make dynamic tile compatible with numpy API
Christian Clauss [Tue, 7 Jul 2020 16:10:24 +0000 (18:10 +0200)]
Undefined names: import os for line 324 & import re for line 308 (#6003)
Christian Clauss [Tue, 7 Jul 2020 16:10:10 +0000 (18:10 +0200)]
Update main.yml (#6002)
Lianmin Zheng [Tue, 7 Jul 2020 15:18:31 +0000 (08:18 -0700)]
Fix tune_relay_cuda.py (#6001)
Yizhi Liu [Mon, 6 Jul 2020 17:04:42 +0000 (10:04 -0700)]
[Arith] Inequalities solver (#5618)
Josh Fromm [Mon, 6 Jul 2020 02:38:31 +0000 (19:38 -0700)]
[Relay][Frontend][Onnx] Small bug fix for Conv1D imports. (#5995)
* Fix autopad bug in onnx importer for conv1d.
* Fix output shape in test.
* Undo commented out lines oops.
Junru Shao [Sun, 5 Jul 2020 16:52:08 +0000 (09:52 -0700)]
[Target] Use TargetNode::attrs for Target serialization (#5993)
Animesh Jain [Fri, 3 Jul 2020 01:13:56 +0000 (18:13 -0700)]
[TFLite] QNN support for TFLite 2.1.0 quantized models (#5848)
* [TFLite] TFLite 2.x parser quantization support.
* Address comments. Fix a bug for depthwise conv
* Added tests for relu, conv, quantize. Address comments.
* Using web-data. Minor refactoring.
* Removing TF hub package
* Trigger CI.
* Handle TFLite input layer naming.
* Addressing reviews.
* Retrigger CI.
Krzysztof Parzyszek [Fri, 3 Jul 2020 00:04:22 +0000 (19:04 -0500)]
[LLVM] VectorType::get with two parameters is deprecated in LLVM 11+ (#5984)
In LLVM 11+ the distinction between fixed and scalable vector types
has become more explicit. Before the introduction of scalable vector
types VectorType::get(e,n) created what is now a fixed vector type.
With the addition of scalable types, it is recommended to use
FixedVectorType and ScalableVectorType classes directly. Alternatively,
there is a VectorType::get that accepts a 3rd parameter indicating
whether the type should be fixed or scalable.
Using the older VectorType::get that implicitly assumes the fixed type
is deprecated and LLVM now generates a warning.
Change calls to VectorType::get to FixedVectorType::get to avoid
compilation warnings.
Josh Fromm [Thu, 2 Jul 2020 21:19:41 +0000 (14:19 -0700)]
[Tutorial] Demo showing how to run a pruned 🤗 model. (#5975)
Junru Shao [Thu, 2 Jul 2020 19:23:57 +0000 (12:23 -0700)]
[Target] Migrate data structure of TargetNode (#5960)
Krzysztof Parzyszek [Thu, 2 Jul 2020 19:19:16 +0000 (14:19 -0500)]
[LLVM] Remove redundant function CreateBufferVecPtr (#5982)
The functions CreateBufferPtr and CreateBufferVecPtr do the exact
same thing, so there is no need for both of them to exist. The
latter is only used in place, which further suggests that the
distinction is unnecessary.
Leslie-Fang [Thu, 2 Jul 2020 14:58:46 +0000 (22:58 +0800)]
Fix typo in tvm relay testing tf.py (#5977)
Lianmin Zheng [Thu, 2 Jul 2020 05:59:21 +0000 (22:59 -0700)]
[TOPI] Fix x86 conv2d template when tuning with unpacked layout (#5938)
* fix x86 conv2d and conv2d_transpose template
* address comments
Trevor Morris [Thu, 2 Jul 2020 02:14:33 +0000 (19:14 -0700)]
[Relay/TOPI][OP] Add meshgrid op in Relay, TOPI, Pytorch frontend (#5961)
* Add meshgrid op with pytorch importer
* Fix c++ lint
* Fix pylint
* Meshgrid: add scalar test for pytorch, add topi python wrapper
* Add indexing mode attr.
* Add MeshgridAttrs python binding
* c++ lint
Lianmin Zheng [Wed, 1 Jul 2020 19:17:54 +0000 (12:17 -0700)]
[RELAY] Add resnet-3d & Update network definitions for NHWC layout (#5945)
Matthew Brookhart [Wed, 1 Jul 2020 18:39:21 +0000 (11:39 -0700)]
[DYNAMIC] Add Dynamic reshape to a dynamic namespace and add DynamicToStatic Pass (#5826)
* Dynamic reshape passing tests
* Add Dynamic to Static Pass
* rename test file to prevent pytest conflicts
* fix clang build
* add nested dynamic shape test
* remove cuda tests until VM supports dynamic shapes
* rename namespace from dynamic to dyn
* fix lint
* fix lint again
* Remove incorrect doc strings
* remove dynamic behavior from standard reshape
* fix some tests
* merge dynamic and static interfaces in python
* fix missing import
* missed a reference to relay.dyn.reshape
* fix vta example
* respond to review comments
Trevor Morris [Wed, 1 Jul 2020 15:04:15 +0000 (08:04 -0700)]
Add MXNet parser for box_decode (#5967)
Andrew Reusch [Wed, 1 Jul 2020 15:03:59 +0000 (08:03 -0700)]
Improve docker/bash.sh to handle git worktrees (#5970)
* improve error code when git ls-files fails
* fix docker/bash to handle git worktrees
Krzysztof Parzyszek [Tue, 30 Jun 2020 22:25:33 +0000 (17:25 -0500)]
Print right number of parentheses for LoadNode (#5965)
Stop printing the unnecessary ')' after each LoadNode that didn't
have a matching '('.
Krzysztof Parzyszek [Tue, 30 Jun 2020 17:58:58 +0000 (12:58 -0500)]
Raise an exception when extern function does not return Stmt (#5964)
The function for tvm.te.extern should return either PrimExpr or Stmt,
however there is no check if it actually does so. If it does not, the
result may be a segmentation fault later on. Catch this case early on,
so an informative message can be shown.
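The early-check idea above can be sketched in plain Python; the classes and function name below are stand-ins, not the actual TVM API:

```python
class PrimExpr:
    """Stand-in for tvm.tir.PrimExpr."""

class Stmt:
    """Stand-in for tvm.tir.Stmt."""

def check_extern_result(result):
    """Validate the extern function's result up front: raise a clear
    TypeError instead of letting bad data crash much later."""
    if not isinstance(result, (PrimExpr, Stmt)):
        raise TypeError(
            f"extern function must return PrimExpr or Stmt, "
            f"got {type(result).__name__}")
    return result

check_extern_result(Stmt())              # accepted
try:
    check_extern_result([1, 2, 3])       # rejected early with a message
    raise AssertionError("should have raised")
except TypeError:
    pass
```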
Giuseppe Rossini [Tue, 30 Jun 2020 15:49:46 +0000 (16:49 +0100)]
Fix small typo in nn.conv2d_gemm_weight_transform (#5925)
* Fix small typo in nn.conv2d_gemm_weight_transform
Change-Id: I7844d898ebf82592f78f478982262ef95f83cc3e
* Add TOPI conv2d_gemm unit tests
Change-Id: I9ed82a68acffcf0dd9720781f8be4aada9d8e6e4
Thomas Viehmann [Tue, 30 Jun 2020 15:48:44 +0000 (17:48 +0200)]
Make first order gradient graphs more efficient (#5959)
Previously, nodes are visited as often as they are used and each time a
derivative is computed. Only at the leaves were the contributions of
everything added. This patch changes this to add at any node that is
used several times.
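The accumulate-at-shared-nodes idea can be sketched with a toy scalar reverse-mode pass; the graph encoding below is made up for illustration and is not the Relay implementation:

```python
from collections import defaultdict

def backprop(order, edges, output):
    """edges[n] = [(input, local_grad), ...]; `order` is a
    reverse-topological order starting at `output`.  Each node's
    adjoint is summed over all of its uses before being propagated,
    so a node used k times is still processed only once."""
    adjoint = defaultdict(float)
    adjoint[output] = 1.0
    for node in order:
        for inp, local in edges.get(node, []):
            adjoint[inp] += adjoint[node] * local
    return adjoint

# y = a + a, where a = 2*x: node 'a' is used twice by 'y'.
edges = {"y": [("a", 1.0), ("a", 1.0)], "a": [("x", 2.0)]}
g = backprop(["y", "a"], edges, "y")
assert g["a"] == 2.0   # both uses of 'a' are summed at 'a'
assert g["x"] == 4.0   # then propagated once: 2.0 * 2.0
```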
abergeron [Tue, 30 Jun 2020 07:05:43 +0000 (03:05 -0400)]
Fix the meaning of conv{1,2}d_transpose output_padding parameter. (#5758)
* Add output_padding to generic
* Add output_padding to the reference impl
* Add output_padding to arm_cpu
* Add output_padding to the test
* Add output_padding for cuda
* Add output_padding for x86
* Make use of the new output_padding argument in Relay
* Adjust conv2d_transpose Relay test
* Fix lint errors
* Fix the VTA declaration of conv2d_transpose
* support for output padding in conv2d transpose
* some output padding will break IR pass
* Fix new conv2d_transpose test
* Update tophub
* Fix conv1d output_padding too.
* Fix the conv1d_transpose reference function.
* Fix the cuda impl
* fix the topi test for conv1d
* format
* Add tests for conv1d_transpose output_padding and some check that the values are valid.
* Add check in the implementations
* Add checks to the implementations of conv2d
* Make use of the output_padding argument from topi in relay.
* Fix relay tests asking for invalid output_padding
* Fix line length
* Fix vta tests
* Update tophub references
* Trigger CI
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Thomas Viehmann [Tue, 30 Jun 2020 03:35:36 +0000 (05:35 +0200)]
Amendments for gradients (#5941)
* Amendments for gradients
- We fix the dtype handling of consts in generated gradients.
- We add a collapse_sum_to instruction mirroring the collapse_sum_like.
While for general definitions (potentially dynamic shapes),
collapse_sum_like is the first choice, when moving to static,
using collapse_sum_to will greatly simplify the graph.
(This simplification is not part of the PR.)
* Fix Broadcast rel description in comment
Thank you, @MarisaKirisame
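The collapse_sum behavior described above reverses broadcasting by summing over the broadcast axes. A pure-Python sketch for the 2-D case only (the real op handles arbitrary static shapes):

```python
def collapse_sum_to(rows, shape):
    """Sum a 2-D gradient back down to a target shape, undoing a
    broadcast.  Toy sketch: handles (1, n) and (m, 1) targets only."""
    nrows, ncols = len(rows), len(rows[0])
    if shape == (1, ncols):            # collapse the broadcast rows
        return [[sum(r[c] for r in rows) for c in range(ncols)]]
    if shape == (nrows, 1):            # collapse the broadcast columns
        return [[sum(r)] for r in rows]
    return [row[:] for row in rows]    # already the target shape

grad = [[1, 2], [3, 4]]
assert collapse_sum_to(grad, (1, 2)) == [[4, 6]]
assert collapse_sum_to(grad, (2, 1)) == [[3], [7]]
```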
Thomas Viehmann [Tue, 30 Jun 2020 03:34:20 +0000 (05:34 +0200)]
[RELAY][GRAD] handle Tuple/TupleGetItem in first order gradient (#5946)
* handle Tuple/TupleGetItem in first order gradient
* Unify MultiOnes/MultiZeros.
Leon Wang [Tue, 30 Jun 2020 03:28:30 +0000 (11:28 +0800)]
Fix some typo errors in license header (#5956)
Signed-off-by: leonwanghui <wanghui71leon@gmail.com>
Trevor Morris [Tue, 30 Jun 2020 00:55:22 +0000 (17:55 -0700)]
[OpenCL] Fix OpenCL get_valid_counts errors due to intrinsic atomic_add (#5857)
* [OpenCL] Fix atomic add used by get_valid_counts
* Rename l -> load, add flag to enable atomics
* Opencl doesn't do data rearrangement
Tianqi Chen [Mon, 29 Jun 2020 15:25:46 +0000 (08:25 -0700)]
[TIR][ANALYSIS] Refine side effect analysis. (#5954)
Yong Wu [Mon, 29 Jun 2020 06:18:38 +0000 (14:18 +0800)]
[Relay] symbolic max_output_size (#5844)
* symbolic max_output_size
* pylint
* fix ci
Yanming Wang [Sun, 28 Jun 2020 23:10:19 +0000 (23:10 +0000)]
[BUGFIX] Add cuda 11 to contrib.nvcc.find_libdevice_path() (#5902)
Tianqi Chen [Sun, 28 Jun 2020 23:02:06 +0000 (16:02 -0700)]
[REFACTOR][TIR][API-Change] Range/IntSet API style consistency. (#5953)
- Range::make_by_min_extent -> Range::FromMinExtent
- Update the APIs in IntSet to use CamelCase
Zhi [Sun, 28 Jun 2020 17:05:50 +0000 (10:05 -0700)]
[RELAY][VM] Add shape_of instruction (#5855)
Meteorix [Sun, 28 Jun 2020 16:28:33 +0000 (00:28 +0800)]
add rm xla attributes in tf docs (#5950)
Meteorix [Sun, 28 Jun 2020 16:28:15 +0000 (00:28 +0800)]
raise right error in tensorflow split op (#5951)
Tianqi Chen [Sun, 28 Jun 2020 16:22:11 +0000 (09:22 -0700)]
[TIR] Improve Let/LetStmt support. (#5949)
Let/LetStmt are useful primitives to create variable bindings.
While let bindings can be harmful for simplification and integer analysis,
they are useful in other cases:
- C0: LetStmt is useful to represent a step that has a side effect (e.g. calling a PRNG)
- C1: Let expression can be used to create deep nested expression for complicated functions.
This PR improves the let support in the following ways:
- Enable vectorization support for let
- Change the let simplification strategy to simplify the most trivial case
while ignoring more complicated cases (to avoid deep nest explosion)
- Enhance arith module to handle const bound and modular set for let.
The overall recommendation is to use Let only when necessary (C0, C1).
Yizhi Liu [Sun, 28 Jun 2020 08:24:38 +0000 (01:24 -0700)]
[Doc] minor fix for release doc (#5948)
Lianmin Zheng [Sun, 28 Jun 2020 00:09:39 +0000 (17:09 -0700)]
fix string argument mismatch in GraphRuntimeCodegen (#5933)
Tianqi Chen [Sat, 27 Jun 2020 21:56:13 +0000 (14:56 -0700)]
[TIR][PASS] Remove legacy HoistIfThenElse (#5944)
This pass has not been migrated to the new transform API,
and contains potential bugs per https://github.com/apache/incubator-tvm/issues/5559.
Given that it is not being actively used, this PR removes the pass
from the collection.
Followup PRs are more than welcome to land a better version that
conforms to the new transform API.
Tianqi Chen [Sat, 27 Jun 2020 19:31:47 +0000 (12:31 -0700)]
Update date in the NOTICE (#5942)
Tianqi Chen [Sat, 27 Jun 2020 17:54:26 +0000 (10:54 -0700)]
[TIR][OP][API-CHANGE] Remove CallNode.call_type in favor of attribute. (#5937)
This is a followup refactor for tir::Call.
Now that we have switched call->name to call->op, the function effect property
can be registered through the op itself, so we no longer need the call_type in the CallNode.
- Introduce CallEffectKind to provide a more fine grained categorization of calls.
- Introduce call_pure_extern and call_llvm_pure_intrin to
allow us to indicate pure calls in those cases.
- Migrate existing usecases to the new API.
Cody Yu [Fri, 26 Jun 2020 23:02:16 +0000 (16:02 -0700)]
add dnnl (#5936)
MORITA Kazutaka [Fri, 26 Jun 2020 17:34:33 +0000 (02:34 +0900)]
[CODEGEN][CONTRIB] Various update for CoreML codegen (#5934)
* [CODEGEN][CONTRIB] Various update for CoreML codegen
* fix lint error
Cody Yu [Fri, 26 Jun 2020 15:05:12 +0000 (08:05 -0700)]
[Runtime] Only initialize required module (#5926)
* init required modules
* trigger ci
* trigger ci
Baden Hughes [Fri, 26 Jun 2020 14:23:08 +0000 (00:23 +1000)]
Update code_review.rst (#5923)
editorial pass with corrections
Matthew Brookhart [Fri, 26 Jun 2020 14:22:43 +0000 (07:22 -0700)]
Add TupleGetItem to CSE (#5931)
* Add TupleGetItem to CSE
* rename a local variable
Chenfan [Fri, 26 Jun 2020 14:19:47 +0000 (22:19 +0800)]
[Arith][GPU]Rewrite simplify fix for Vectorized Cooperative Fetching (#5924)
Matthew Brookhart [Fri, 26 Jun 2020 14:15:07 +0000 (07:15 -0700)]
[PatternLang] Don't rewrite expressions used outside of the pattern (#5930)
* Don't rewrite expressions used outside of the pattern
* add comments
Lianmin Zheng [Fri, 26 Jun 2020 05:52:19 +0000 (22:52 -0700)]
[TE] Add LegalizeInvalidAttach to legalize the compute_at location after split or fuse (#5917)
* Add LegalizeInvalidAttach
* lint & typo
* lint & typo
* address comment
* fix lint
Cody Yu [Fri, 26 Jun 2020 02:26:03 +0000 (19:26 -0700)]
refine error (#5929)
Yizhi Liu [Fri, 26 Jun 2020 02:13:03 +0000 (19:13 -0700)]
[BACKPORT-0.6][Bugfix][Arith] keep div_mode during floordiv simplify (#5922)
Thomas Viehmann [Thu, 25 Jun 2020 16:59:12 +0000 (18:59 +0200)]
Two small fixes to AMDCPU codegen for LLVM 10+ and ROCm 3.5+ (#5920)
- For LLVM 10+ we need to avoid calling Align with 0, or else
we get a crash.
- For ROCm 3.5+ we need to use code object 3 (the default in LLVM 9+)
but for ROCm < 3.5 we want the code object 2.
- As we want to separate codegen from the API, we need to add
a device api query for the version.
But everyone else now wants one, too. (But I only filled it
in for CUDA for now.)
- I'm throwing in an addition of kMaxRegistersPerBlock for ROCm.
This was introduced for CUDA in #5898.
Baden Hughes [Thu, 25 Jun 2020 16:38:40 +0000 (02:38 +1000)]
Update install.rst (#5858)
* Update install.rst
minor cleanups/corrections
* Update install.rst
Fixed broken link
Haichen Shen [Thu, 25 Jun 2020 14:55:40 +0000 (07:55 -0700)]
[Relay][Vm] Some performance improvement to VM (#5901)
* make alignment constant
* tweak copyto and loadscalarint
* some safety check
* x
* lint
* fix
Chenfan [Thu, 25 Jun 2020 05:44:39 +0000 (13:44 +0800)]
CUDA device API & VerifyGPUCode pass update (#5898)
* Add kMaxRegistersPerBlock device api for cuda
* Add vectorize check to verify_gpu_code
* Lint fix
* Cast fix
Yao Wang [Thu, 25 Jun 2020 05:29:51 +0000 (22:29 -0700)]
[Thread Backend]Fix CPU Thread Binding for Multiple Sockets (#5918)
* Fix CPU Thread Binding for Multiple Sockets
* Backward compatibility