platform/upstream/tvm.git
4 years ago [DOCKER] Add DGL to {ci_gpu, demo_cpu, demo_gpu} docker images (#3692)
Yulun Yao [Fri, 2 Aug 2019 15:51:14 +0000 (08:51 -0700)]
[DOCKER] Add DGL to {ci_gpu, demo_cpu, demo_gpu} docker images (#3692)

* add dgl to docker file

* add dgl to docker file

4 years ago [TOPI] Memoize winograd matrix (#3687)
Lianmin Zheng [Fri, 2 Aug 2019 15:50:33 +0000 (23:50 +0800)]
[TOPI] Memoize winograd matrix (#3687)

* [TOPI] Memoize winograd matrix

* lint

* Fix name

5 years ago [Relay][Quantization] KL-divergence-based per-layer calibration (#3538)
Wuwei Lin [Fri, 2 Aug 2019 03:55:27 +0000 (20:55 -0700)]
[Relay][Quantization] KL-divergence-based per-layer calibration (#3538)

* [Relay][Quantization] Support floating-point scale

* [Relay][Quantization] KL-divergence calibration on dataset

* Fix unhandled LeftShift case in QuantizeRealize

* Fix lint

* drop QBias

* fix lint

* address comments

* address comments

* Update comments

* address comments

* lint

* kQIdentity = 0
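
The calibration above picks per-layer clipping thresholds by minimizing the KL divergence between the reference activation distribution and its quantized approximation. As a toy sketch of the core quantity (not TVM's calibrator; the histograms and helper name here are illustrative assumptions):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length
    lists of non-negative counts; both are normalized first."""
    ps, qs = float(sum(p)), float(sum(q))
    div = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0/q) contributes nothing
        if qi == 0:
            return float("inf")  # q assigns no mass where p does
        div += (pi / ps) * math.log((pi / ps) / (qi / qs))
    return div

# A reference activation histogram and a coarser "quantized" version of it;
# a calibrator would sweep candidate thresholds and keep the minimizer.
reference = [10, 20, 30, 40]
candidate = [12, 18, 32, 38]
print(kl_divergence(reference, reference))  # 0.0
print(kl_divergence(reference, candidate))  # small positive value
```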

5 years ago [Relay][VM] Support execution on devices (#3678)
Wei Chen [Thu, 1 Aug 2019 21:47:11 +0000 (14:47 -0700)]
[Relay][VM] Support execution on devices (#3678)

* [Relay][VM] Support execution on devices

* Reduce Copy calls

* Cleanup

* Lint

* CR comments

* Merge test into test_vm.py

5 years ago Add shuffle support to TVM (#3633)
Jian Weng [Thu, 1 Aug 2019 19:52:33 +0000 (12:52 -0700)]
Add shuffle support to TVM (#3633)

5 years ago Enable the sparse schedule (#3651)
sf-wind [Thu, 1 Aug 2019 19:49:40 +0000 (12:49 -0700)]
Enable the sparse schedule (#3651)

5 years ago Add support for Tensorflow operators log1p, cos, sin (#3614)
alexgl-github [Thu, 1 Aug 2019 19:46:39 +0000 (12:46 -0700)]
Add support for Tensorflow operators log1p, cos, sin (#3614)

The patch adds support for the Tensorflow operators log1p, cos, and sin
Tensorflow log1p is described at https://www.tensorflow.org/api_docs/python/tf/math/log1p
Tensorflow cos is described at https://www.tensorflow.org/api_docs/python/tf/math/cos
Tensorflow sin is described at https://www.tensorflow.org/api_docs/python/tf/math/sin
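
For context on why log1p exists as a separate operator rather than being composed as log(1 + x): for small x, forming 1 + x rounds away most of x's significance before the log runs, while log1p stays accurate. A quick standalone Python illustration (not TVM code):

```python
import math

x = 1e-10
# Naive form: 1 + x loses x's low-order bits to rounding before log runs.
naive = math.log(1 + x)
# log1p evaluates log(1 + x) without explicitly forming 1 + x.
accurate = math.log1p(x)

# For small x, log(1 + x) ~ x - x**2/2, so the true value is ~1e-10.
print(naive)     # off in the low significant digits
print(accurate)  # ~1e-10, accurate to double precision
```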

5 years ago [Relay] Strict mode in pattern matching (#3620)
雾雨魔理沙 [Thu, 1 Aug 2019 18:52:13 +0000 (11:52 -0700)]
[Relay] Strict mode in pattern matching (#3620)

* add fatal

lint

lint

lint

do

make completeness check an error

lint

remove fatal

* fix test

* reset parser file

* remove unneeded import

* Update python/tvm/relay/adt.py

Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update include/tvm/relay/adt.h

Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Eliminate trailing whitespace (my fault)

5 years ago [Relay][Frontend] Fix typo names in frontend (#3685)
Yifan Xiong [Thu, 1 Aug 2019 16:46:23 +0000 (00:46 +0800)]
[Relay][Frontend] Fix typo names in frontend (#3685)

Fix typo names in caffe2 and onnx frontend:
* sotrage_order -> storage_order
* OpNotInplemented -> OpNotImplemented

5 years ago Make tests multi-process friendly. (#3683)
Tim Hatch [Thu, 1 Aug 2019 16:27:58 +0000 (09:27 -0700)]
Make tests multi-process friendly. (#3683)

This side effect at module import time has a race condition between the "exists" check and the "mkdir" call. The safer thing is to just call mkdir and catch the "already exists" error, which is what makedirs does.
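
The pattern described above, sketched in Python (the directory name is illustrative, not the actual TVM test path):

```python
import errno
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "demo_cache_dir")

# Racy: another process may create `path` between the check and the mkdir,
# turning the mkdir into an unexpected FileExistsError.
#   if not os.path.exists(path):
#       os.mkdir(path)

# Race-free: just attempt the creation and tolerate "already exists".
try:
    os.makedirs(path)
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

# On Python 3.2+ the same thing is spelled:
os.makedirs(path, exist_ok=True)
```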

5 years ago Replace learnt with learned (#3684)
Alexander Pivovarov [Thu, 1 Aug 2019 15:31:09 +0000 (08:31 -0700)]
Replace learnt with learned (#3684)

5 years ago [DOC] Update ssd doc to avoid confusion. (#3677)
Leyuan Wang [Wed, 31 Jul 2019 20:45:58 +0000 (13:45 -0700)]
[DOC] Update ssd doc to avoid confusion. (#3677)

* intel graphics conv2d bugs fixed for inception_v3

* intel conv2d api updated, nn input size 4 condition added

* review addressed

* move conv_tags to attributes

* ssd doc updated

* address comment

5 years ago [Relay][VM] Relay VM serialization (#3647)
Zhi [Wed, 31 Jul 2019 16:02:15 +0000 (09:02 -0700)]
[Relay][VM] Relay VM serialization (#3647)

* relay vm serialization

* fix lint

* load params, fix stream

* lint

* fix typo

5 years ago [TEST] Compatible with python3.5 (#3675)
lixiaoquan [Wed, 31 Jul 2019 15:37:54 +0000 (23:37 +0800)]
[TEST] Compatible with python3.5 (#3675)

5 years ago [TOPI][CUDA] schedule for group_conv2d (#3663)
Wuwei Lin [Wed, 31 Jul 2019 08:26:05 +0000 (16:26 +0800)]
[TOPI][CUDA] schedule for group_conv2d (#3663)

* [TOPI][CUDA] schedule for group_conv2d

* Fix #flops

5 years ago [VTA] VTA Compilation Script for Intel FPGA (#3494)
Liangfu Chen [Wed, 31 Jul 2019 07:19:54 +0000 (15:19 +0800)]
[VTA] VTA Compilation Script for Intel FPGA (#3494)

* initial compilation script for chisel-vta;

* replace tabs with spaces;

* compile script for de10-nano;

* remove generated verilog source code;

* remove `altsource_probe`, `debounce`, `edge_detect` ip;

* replace quartus project files with a single tcl script;

* Update install.md

* improved makefile-based compilation script;

* complete makefile-based compilation of chisel-vta for de10-nano;

* install quartus;

* conversion to .rbf file;

* document chisel-vta compilation process for de10-nano;

* rename generated bitstream file;

* download and extract custom ip for de10-nano;

* minor change

* minor change

* fix indentation;

* bug fix;

* improved robustness in makefile;

* clean up;

* add `.sdc .ipx .qsys` allowance in jenkins;

* add ASF header;

* add ASF header;

* remove IntelShell.scala, update vta_hw.tcl, clean up Makefile & soc_system.qsys;

* add ASF header;

* keep sources compact;

* keep sources compact;

* it's not necessary now

* AXI4LiteClient -> AXI3Client for IntelShell

* remove connection to fpga_only_master;

* a few important bug fix: wire reset pin, and set host_r_last to high

* remove intel specific interface definition;

* add NO_DSP option in Makefile;

* AXI4Lite is not used in IntelShell;

* minor fix: disable dsp and use logic instead;

* quartus version change: 18.0 -> 18.1

* remove altera related statement;

* compose compile_design.tcl

* initial tcl script for soc_system generation;

* remove .qsys file;

* remove unused;

* .qsys can be generated by tcl script;

* remove hps_io and shrink size of soc_system;

* integrate into makefile;

* version change: 18.0 -> 18.1

* add sample config file for de10-nano;

* parameterize DEVICE and PROJECT_NAME

* remove extra lines;

* brief description on flashing sd card image for de10-nano

* docs on building additional components

* parameterize DEVICE and DEVICE_FAMILY

* parameterize DEVICE and DEVICE_FAMILY

* parameterize DEVICE and DEVICE_FAMILY

* de10-nano -> de10nano

* minor change

* add comment in code and document in order to address review comments;

5 years ago Add yolov3-tiny to the tutorial. (#3674)
Balint Cristian [Wed, 31 Jul 2019 07:10:16 +0000 (10:10 +0300)]
Add yolov3-tiny to the tutorial. (#3674)

5 years ago add reviewer - slyubomirsky (#3673)
Haichen Shen [Wed, 31 Jul 2019 01:22:51 +0000 (18:22 -0700)]
add reviewer - slyubomirsky (#3673)

5 years ago [RPC] Terminate worker's children first. (#3669)
Balint Cristian [Tue, 30 Jul 2019 22:06:50 +0000 (01:06 +0300)]
[RPC] Terminate worker's children first. (#3669)

5 years ago [VTA] Support for batched inference (#3661)
Thierry Moreau [Tue, 30 Jul 2019 21:01:31 +0000 (14:01 -0700)]
[VTA] Support for batched inference (#3661)

* fix in IR pass to support padding on 6-d tensors

* support for both N>1 and N==1 for padding

* batch size > 1 tuning and base config

* output formatting

* batch conv2d

* print all category results

* revert to single-batch config

* pick record best

* fix conv test

* improving reporting

* address batching bug in fast simulator

* fix

5 years ago removing deprecated script (#3667)
Thierry Moreau [Tue, 30 Jul 2019 21:00:38 +0000 (14:00 -0700)]
removing deprecated script (#3667)

5 years ago [TOPI] Enable standalone wheel build (#3657)
Josh Fromm [Tue, 30 Jul 2019 16:29:56 +0000 (09:29 -0700)]
[TOPI] Enable standalone wheel build (#3657)

* Fixed topi bdist_wheel build to include libraries.

* Removed unneeded imports

5 years ago [TOPI] Fix traverse function not inline zero-input op (#3623)
Wuwei Lin [Tue, 30 Jul 2019 15:25:15 +0000 (23:25 +0800)]
[TOPI] Fix traverse function not inline zero-input op (#3623)

* Fix traverse_inline not inline zero input op properly

* Add where to python and set tag to broadcast

* Fix inline

* test

* fix test target

* fix

5 years ago ROCm: Add SaveToFile and LoadFile (#3665)
Thomas Viehmann [Tue, 30 Jul 2019 14:54:16 +0000 (16:54 +0200)]
ROCm: Add SaveToFile and LoadFile (#3665)

...and add rocm module_save to the tests.

5 years ago tvm/contrib/rocm: improve finding of ld.lld (#3664)
Thomas Viehmann [Tue, 30 Jul 2019 10:40:50 +0000 (12:40 +0200)]
tvm/contrib/rocm: improve finding of ld.lld (#3664)

This refines the detection of ld.lld matching the neighbouring clang
file. This is particularly helpful on Ubuntu/Debian when either the
default ld.lld is not installed or the versioned one is preferable for
consistency.

@tqchen I think you last touched the clang equivalent in #3590 .
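
A hedged sketch of the kind of lookup this enables (the candidate version list and helper name are illustrative, not the actual tvm.contrib.rocm code): prefer an unversioned ld.lld, then fall back to the versioned binaries as Debian/Ubuntu install them.

```python
import shutil

def find_lld(preferred_versions=(10, 9, 8)):
    """Return the first ld.lld-style binary found on PATH, or None."""
    candidates = ["ld.lld"] + ["ld.lld-%d" % v for v in preferred_versions]
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

print(find_lld())  # e.g. a path ending in ld.lld-9, or None if lld is absent
```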

5 years ago Print llvm source by default in ROCMModuleNode::GetSource (#3662)
Thomas Viehmann [Tue, 30 Jul 2019 09:30:46 +0000 (11:30 +0200)]
Print llvm source by default in ROCMModuleNode::GetSource (#3662)

5 years ago [Relay] Fix typo in ChangeBatch (#3660)
雾雨魔理沙 [Tue, 30 Jul 2019 04:58:08 +0000 (21:58 -0700)]
[Relay] Fix typo in ChangeBatch (#3660)

5 years ago [Relay][VTA] Add ChangeBatch pass (#3656)
雾雨魔理沙 [Tue, 30 Jul 2019 03:18:55 +0000 (20:18 -0700)]
 [Relay][VTA] Add ChangeBatch pass  (#3656)

* init

* lint

* lint

5 years ago [VTA] [Chisel] make dram offset configurable for uops different than 4-bytes (#3654)
Luis Vega [Mon, 29 Jul 2019 18:11:53 +0000 (11:11 -0700)]
[VTA] [Chisel] make dram offset configurable for uops different than 4-bytes (#3654)

5 years ago [VTA] [CMake] hotfix tsim rules (#3650)
Luis Vega [Mon, 29 Jul 2019 07:22:06 +0000 (00:22 -0700)]
[VTA] [CMake] hotfix tsim rules (#3650)

5 years ago [VTA] Refactor to increase platform coverage (Ultra96 etc.) (#3496)
Thierry Moreau [Mon, 29 Jul 2019 01:41:10 +0000 (18:41 -0700)]
[VTA] Refactor to increase platform coverage (Ultra96 etc.) (#3496)

* hardware refactor for increased FPGA coverage, small optimizations

* fix header

* cleaning up parameters that won't be needed for now

* streamlining makefile, and simplifying tcl scripts

* moving parameter derivation into pkg_config.py, keeping tcl scripts lightweight

* refactoring tcl script to avoid global variables

* deriving AXI signals in pkg_config.py

* unifying address map definition for hardware and software drivers

* single channel design for ultra96 to simplify build

* enable alu by default, no mul opcode for now

* hardware fix

* new bitstream; vta version

* avoid error when env variable is not set

* ultra96 cleanup

* further cleaning up tcl script for bitstream generation

* preliminary rpc server support on ultra96

* rpc server tracker scripts

* ultra96 ldflag

* ultra96 support

* ultra96 support

* cleanup line

* cmake support for ultra96

* simplify memory instantiation

* cleaning up IP parameter initialization

* fix queue instantiation

* 2019.1 transition

* fix macro def

* removing bus width from config

* cleanup

* fix

* turning off testing for now

* cleanup ultra96 ps instantiation

* minor refactor

* adding comments

* upgrading to tophub v0.6

* model used in TVM target now refers to a specific version of VTA for better autoTVM scheduling

* revert change due to bug

* rename driver files to be for zynq-type devices

* streamlining address mapping

* unifying register map offset values between driver and hardware generator

* rely on cma library for cache flush/invalidation

* coherence management

* do not make buffer packing depend on data types that can be wider than 64 bits

* refactor config derivation to minimize free parameters

* fix environment/pkg config interaction

* adding cfg dump property to pkgconfig:

* fix rpc reconfig

* fix spacing

* cleanup

* fix spacing

* long line fix

* fix spacing and lint

* fix line length

* cmake fix

* environment fix

* renaming after pynq since the driver stack relies on the pynq library - see pynq.io

* update doc

* adding parameterization to  name

* space

* removing reg width

* vta RPC

* update doc on how to edit vta_config.json

* fix path

* fix path

5 years ago fix comment/doc in TensorLoad (#3646)
Luis Vega [Sun, 28 Jul 2019 23:18:34 +0000 (16:18 -0700)]
fix comment/doc in TensorLoad (#3646)

5 years ago Hotfix for issue #3641. (#3644)
Balint Cristian [Sun, 28 Jul 2019 08:05:37 +0000 (11:05 +0300)]
Hotfix for issue #3641. (#3644)

5 years ago fix case when offset is odd and size is even (#3643)
Luis Vega [Sun, 28 Jul 2019 07:20:53 +0000 (00:20 -0700)]
fix case when offset is odd and size is even (#3643)

5 years ago [VTA] [Chisel] fix tensor issue/commit in gemm (#3637)
Luis Vega [Sat, 27 Jul 2019 20:39:37 +0000 (13:39 -0700)]
[VTA] [Chisel] fix tensor issue/commit in gemm (#3637)

* fix tensor issue/commit in gemm

* remove trailing spaces

5 years ago [Relay][TF] add BatchMatMul (#3634)
Yong Wu [Sat, 27 Jul 2019 16:44:22 +0000 (09:44 -0700)]
[Relay][TF] add BatchMatMul (#3634)

5 years ago Improve the x86 auto-tune tutorial (#3609)
peterjc123 [Sat, 27 Jul 2019 16:43:34 +0000 (00:43 +0800)]
Improve the x86 auto-tune tutorial (#3609)

5 years ago Update tensorflow.py (#3632)
YPBlib [Fri, 26 Jul 2019 22:14:39 +0000 (06:14 +0800)]
Update tensorflow.py (#3632)

5 years ago Make Google Test usage configurable in CMake files (#3628)
Logan Weber [Fri, 26 Jul 2019 22:14:18 +0000 (15:14 -0700)]
Make Google Test usage configurable in CMake files (#3628)

* Add USE_GTEST as a CMake variable

* Add GTest section in installation docs

* Incorporate feedback

5 years ago [TensorFlow] Fix a bug where the output index is ignored (#3631)
lixiaoquan [Fri, 26 Jul 2019 18:05:14 +0000 (02:05 +0800)]
[TensorFlow] Fix a bug where the output index is ignored (#3631)

Enhance test to cover this case

5 years ago [TOPI][CUDA] Schedule for pool_grad (#3622)
Wuwei Lin [Fri, 26 Jul 2019 06:49:28 +0000 (14:49 +0800)]
 [TOPI][CUDA] Schedule for pool_grad (#3622)

* [TOPI][CUDA] Schedule for pool_grad

* Relay test

* Fix fused op

* doc

* Remove set scope local

5 years ago [Relay] [Training] Add numerical gradient check. (#3630)
雾雨魔理沙 [Fri, 26 Jul 2019 05:17:47 +0000 (22:17 -0700)]
[Relay] [Training] Add numerical gradient check. (#3630)

* add check_grad

* finish

* what does the fox say?

* lint lint lint lint lint lint lint lint lint

5 years ago [VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity...
Benjamin Tu [Fri, 26 Jul 2019 01:47:04 +0000 (18:47 -0700)]
[VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity (#3605)

* support for different inp/wgt bits, rewrote dot for clarity

* [VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity

* [VTA] [Chisel] support for different inp/wgt bits, rewrote DotProduct for clarity

* change back to sim

* fix index

* fix index

* fix indent

* fix indent

* fix indent

* fix trailing spaces

* fix trailing spaces

* change to more descriptive name

* matric->matrix

* fix spacing

* fix spacing & added generic name for dot

* better parameter flow

* spacing

* spacing

* spacing

* update requirement (tested) for dot, spacing

* function call convention

* small edit

5 years ago [IR] Make iterators compatible with constructors of STL containers (#3624)
Lianmin Zheng [Thu, 25 Jul 2019 22:28:59 +0000 (06:28 +0800)]
[IR] Make iterators compatible with constructors of STL containers (#3624)

5 years ago Add Winograd matrices computation. (#3553)
Balint Cristian [Thu, 25 Jul 2019 21:56:22 +0000 (00:56 +0300)]
Add Winograd matrices computation. (#3553)

5 years ago Implementation of uTVM (#3227)
Logan Weber [Thu, 25 Jul 2019 17:12:57 +0000 (10:12 -0700)]
Implementation of uTVM (#3227)

* uTVM interfaces (#14)

* some minor interface changes

* implemented HostLowLevelDevice

* added MicroDeviceAPI

* implemented micro_common and added Python interfaces

* current status, semi implemented micro session

* added micro_common implementation and python interfaces (#18)

* added micro_common implementation and python interfaces (#18)

* current status, semi implemented

* host test working

* updated interfaces for MicroSession arguments allocation

* make somewhat lint compatible

* fix based on comments

* added rounding macro

* fix minor bug

* improvements based on comments

* Clean up `binutil.py` and make Python-3-compatible

* Change argument allocation design

* Address feedback and lint errors

* Improve binutil tests

* Simplify allocator (per @tqchen's suggestions)

* Doc/style fixes

* farts

* mcgee

* rodata section werks

(and so does `test_runtime_micro_workspace.py`)

* simple graph runtime werk

* TEMP

* ResNet works, yo

* First round of cleanup

* More cleanup

* runs a dyson over the code

* Another pass

* Fix `make lint` issues

* ready to pr... probably

* final

* Undo change

* Fix rebase resolution

* Minor fixes

* Undo changes to C codegen tests

* Add `obj_path` in `create_micro_lib`

* TEMP

* Address feedback

* Add missing TODO

* Partially address feedback

* Fix headers

* Switch to enum class for `SectionKind`

* Add missing ASF header

* Fix lint

* Fix lint again

* Fix lint

* Kill lint warnings

* Address feedback

* Change Python interface to MicroTVM

All interaction with the device is now through `Session` objects, which
are used through Python's `with` blocks.

* Reorder LowLevelDevice interface

* Store shared ptr to session in all alloced objects

* Move helper functions out of `tvm.micro`

* Switch static char arr to vector

* Improve general infra and code quality

Does not yet address all of tqchen's feedback

* Forgot a rename

* Fix lint

* Add ASF header

* Fix lint

* Partially address MarisaKirisame's feedback

* Lint

* Expose `MicroSession` as a node to Python

* Revert to using `Session` constructor

* Fix compiler error

* (Maybe) fix CI error

* Debugging

* Remove

* Quell lint

* Switch to stack-based session contexts

* Make uTVM less intrusive to host codegen

And use SSA for operands of generated ternary operators

* Inline UTVMArgs into UTVMTask struct

* Remove `HostLowLevelDevice` header

* Remove `BaseAddr` class

* Address feedback

* Add "utvm" prefix to global vars in runtime

* Fix lint

* Fix CI

* Fix `test_binutil.py`

* Fix submodules

* Remove ResNet tests

* Make `test_binutil.py` work with nose

* Fix CI

* I swear this actually fixes the binutil tests

* lint

* lint

* Add fcompile-compatible cross-compile func

* Add docs for uTVM runtime files

* Move pointer patching into `MicroSession`

* Fix lint

* First attempt at unifying cross-compile APIs

* Fix lint

* Rename `cross_compile` back to `cc`

* Address feedback

* Remove commented code

* Lint

* Figure out failing function

* Remove debugging code

* Change "micro_dev" target to "micro"

* Add checks in tests for whether uTVM is enabled

* Add TODO for 32-bit support

* Rename more "micro_dev" to "micro"

* Undo rename

We already have `tvm.micro` as a namespace.  Can't have it as a method
as well.

* Fix failing CI

Thanks to @tqchen for finding this bug.  Emitting ternary operators for
`min` and `max` causes concurrency bugs in CUDA, so we're moving the
ternary op emissions from `CodeGenC` to `CodeGenCHost`.

* Address feedback

* Fix lint

5 years ago Add a missing header in cuda_device_api.cc (#3621)
Philip Hyunsu Cho [Thu, 25 Jul 2019 06:12:44 +0000 (23:12 -0700)]
Add a missing header in cuda_device_api.cc (#3621)

5 years ago [Relay][Keras] Permute, Softmax support (#3618)
Yong Wu [Thu, 25 Jul 2019 06:12:11 +0000 (23:12 -0700)]
[Relay][Keras] Permute, Softmax support (#3618)

5 years ago fix typo (#3611)
Jian Weng [Thu, 25 Jul 2019 01:07:51 +0000 (18:07 -0700)]
fix typo (#3611)

5 years ago [TOPI] Average Pool2D Bug. (#3607)
Animesh Jain [Thu, 25 Jul 2019 01:06:36 +0000 (18:06 -0700)]
[TOPI] Average Pool2D Bug. (#3607)

* [TOPI] Average Pool2D Bug.

Issue - https://github.com/dmlc/tvm/issues/3581

* Add uint16 test.

5 years ago Remove prints in `generic_op_impl.py` (#3616)
Logan Weber [Wed, 24 Jul 2019 22:39:10 +0000 (15:39 -0700)]
Remove prints in `generic_op_impl.py` (#3616)

5 years ago Hotfix pylint (#3615)
Tianqi Chen [Wed, 24 Jul 2019 20:53:22 +0000 (13:53 -0700)]
Hotfix pylint (#3615)

5 years ago [TEST] Fix testcases to make them more compatible with zero-rank (#3612)
Tianqi Chen [Wed, 24 Jul 2019 18:48:39 +0000 (11:48 -0700)]
[TEST] Fix testcases to make them more compatible with zero-rank (#3612)

5 years ago init (#3571)
雾雨魔理沙 [Wed, 24 Jul 2019 18:31:19 +0000 (11:31 -0700)]
init (#3571)

quickfix

5 years ago [TOPI][Relay] max_pool2d & avg_pool2d gradient (#3601)
Wuwei Lin [Wed, 24 Jul 2019 18:30:46 +0000 (02:30 +0800)]
[TOPI][Relay] max_pool2d & avg_pool2d gradient (#3601)

5 years ago [Relay][vm] Small bug fix for DataTypeObject (#3604)
Zhi [Wed, 24 Jul 2019 17:13:16 +0000 (10:13 -0700)]
[Relay][vm] Small bug fix for DataTypeObject (#3604)

* small bug fix for DataTypeObject

* retrigger ci

5 years ago We observe multiple groups across a range of domains (ASR, NMT, LM, etc), (#3566)
Andrew Tulloch [Tue, 23 Jul 2019 21:44:27 +0000 (14:44 -0700)]
We observe multiple groups across a range of domains (ASR, NMT, LM, etc), (#3566)

internally and externally, interested in replacing standard dense layers with
block-sparse matrix multiplication layers. The motivations are generally: higher
performance (due to reduction in FLOPs, memory bandwidth/cache footprint),
enabling larger models (e.g. fitting more layers in a given memory budget).

Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable
levels of sparsity (90%+) with marginal accuracy changes, which suggests
substantial speedups are possible (as this implies a >10x reduction in FLOPs).

It is fairly straightforward to realize these theoretical speedups; see e.g. TVM
benchmarks for Intel CPUs in
https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, and CUDA
results in https://github.com/openai/blocksparse, etc.

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SCIPY BSR representation)

This is extracted from a patch we've been using internally. There are
various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but
this is a reasonable starting point. This needs more thorough unit test coverage
however.

We follow the conventions established by scipy.sparse.bsr_matrix and other
libraries, see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc,
https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful
starting point.
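
Following the scipy.sparse.bsr_matrix convention referenced above, a block-sparse row (BSR) matrix stores dense R x C blocks plus CSR-style indptr/indices over block columns. A minimal pure-Python mat-vec sketch of that layout (illustrative only, not the TVM kernel):

```python
def bsr_matvec(data, indices, indptr, block, x):
    """y = A @ x for a matrix A stored in BSR format.

    data    -- list of R x C dense blocks (each a list of R row-lists)
    indices -- block-column index of each block in `data`
    indptr  -- CSR-style offsets into data/indices, len = n_block_rows + 1
    block   -- (R, C) block shape
    """
    R, C = block
    n_block_rows = len(indptr) - 1
    y = [0.0] * (n_block_rows * R)
    for bi in range(n_block_rows):                 # over block rows
        for k in range(indptr[bi], indptr[bi + 1]):  # blocks in this row
            bj = indices[k]
            blk = data[k]
            for r in range(R):
                for c in range(C):
                    y[bi * R + r] += blk[r][c] * x[bj * C + c]
    return y

# 4x4 matrix with 2x2 blocks: block row 0 holds one block at block col 1,
# block row 1 holds one block at block col 0.
data = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
indices = [1, 0]
indptr = [0, 1, 2]
x = [1.0, 1.0, 1.0, 1.0]
print(bsr_matvec(data, indices, indptr, (2, 2), x))  # [3.0, 7.0, 11.0, 15.0]
```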

5 years ago {relay,topi}.reinterpret support (#3599)
Andrew Tulloch [Tue, 23 Jul 2019 21:43:27 +0000 (14:43 -0700)]
{relay,topi}.reinterpret support (#3599)

= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as
this allows them to build (fused) operators leveraging the bitwise
reinterpretation of an operator. An example is approximate transcendental
functions, which can be implemented similarly to:

```python
    def C(x):
        return relay.expr.const(x, "float32")

    def approx_exp(x):
        x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
        x = C(127.0) + x * C(1.44269504)
        xf = relay.floor(x)
        i = relay.cast(xf, "int32")
        x = x - xf
        Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
        exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
        exponent = relay.reinterpret(exponent, "float32")
        return exponent * Y

    def approx_sigmoid(x):
        # <2.0e-5 absolute error over [-5, 5]
        y = approx_exp(x)
        return y / (y + C(1.0))

    def approx_tanh(x):
        # <4.0e-5 absolute error over [-5, 5]
        x = x * C(2.0)
        y = approx_exp(x)
        return (y - C(1.0)) / (y + C(1.0))
```

See unit tests for implementations of these approximate transcendentals.
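
The bit trick can also be checked outside Relay. A hedged standalone version of approx_exp (pure Python, using struct for the int32-to-float32 reinterpret; constants copied from the snippet above, error bound chosen conservatively):

```python
import math
import struct

def approx_exp(x):
    x = min(max(x, -88.0), 88.0)
    t = 127.0 + x * 1.44269504           # 127 + x * log2(e)
    xf = math.floor(t)
    i = int(xf)
    f = t - xf                           # fractional part in [0, 1)
    # Cubic approximation of 2**f on [0, 1).
    y = 0.99992522 + f * (0.69583354 + f * (0.22606716 + f * 0.078024523))
    # Reinterpret (i << 23) as float32: that bit pattern is 2**(i - 127).
    scale = struct.unpack("<f", struct.pack("<i", i << 23))[0]
    return scale * y

for v in (-3.0, 0.0, 1.0, 2.5):
    rel_err = abs(approx_exp(v) - math.exp(v)) / math.exp(v)
    assert rel_err < 1e-4, (v, rel_err)
```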

5 years agoremove tabs (#3603)
Luis Vega [Tue, 23 Jul 2019 17:23:10 +0000 (10:23 -0700)]
remove tabs (#3603)

5 years ago [Relay][Pass][Docs] Update the doc for adding a Relay pass to mention the pass infra...
Steven S. Lyubomirsky [Tue, 23 Jul 2019 17:14:53 +0000 (10:14 -0700)]
[Relay][Pass][Docs] Update the doc for adding a Relay pass to mention the pass infra (#3583)

* Update the Relay adding pass doc to reference the new pass infrastructure

* Correct pass name

Co-Authored-By: Zhi <5145158+zhiics@users.noreply.github.com>
* Align header equals signs

5 years ago Checking the correct dtypes for choosing the Intel int8 instructions. (#3516)
Animesh Jain [Tue, 23 Jul 2019 16:59:40 +0000 (09:59 -0700)]
Checking the correct dtypes for choosing the Intel int8 instructions. (#3516)

5 years ago [Relay] [Training] Allow gradient to return a tuple (#3600)
雾雨魔理沙 [Tue, 23 Jul 2019 09:52:44 +0000 (02:52 -0700)]
[Relay] [Training] Allow gradient to return a tuple (#3600)

5 years ago [Runtime] [ThreadPool] Make SpscTaskQueue::Pop(..) spin_count configurable (#3577)
Andrew Tulloch [Tue, 23 Jul 2019 02:48:55 +0000 (19:48 -0700)]
[Runtime] [ThreadPool] Make SpscTaskQueue::Pop(..) spin_count configurable (#3577)

In cases where we have multiple models or threadpools active, spinning around
`sched_yield()` may not be desirable, as it prevents the OS from effectively
scheduling other threads.

Thus, allow users to conditionally disable this behaviour (via an environment
variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for
the thread pool such as `TVM_BIND_THREADS`, etc).

This substantially improves tail latencies in some of our multi-tenant
workloads in practice.

Unit tests have been added - on my laptop, running:

```
TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
./build/threading_backend_test;
```

gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e.
97ms -> <1ms after this change)
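
The knob being exposed is the classic spin-then-yield tradeoff: busy-poll for spin_count iterations (low latency when work arrives quickly), then start yielding the core so the OS can schedule other threads. A schematic Python version (illustrative only; the real implementation lives in the C++ thread pool):

```python
import time

def spin_then_yield_wait(ready, spin_count):
    """Poll `ready()`; after `spin_count` polls, yield between polls.

    Returns the total number of polls it took for `ready()` to become true.
    """
    polls = 0
    while True:
        polls += 1
        if ready():
            return polls
        if polls >= spin_count:
            # Past the spin budget: give the scheduler a chance to run
            # someone else (os.sched_yield() is the closer Linux analogue).
            time.sleep(0)

state = {"calls": 0}

def ready():
    state["calls"] += 1
    return state["calls"] >= 7  # becomes true on the 7th poll

print(spin_then_yield_wait(ready, spin_count=3))  # 7
```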

5 years ago Add support for Tflite operator SPLIT (#3520)
Ramana Radhakrishnan [Mon, 22 Jul 2019 22:12:09 +0000 (23:12 +0100)]
Add support for Tflite operator SPLIT (#3520)

* [RFC] Initial support for Tflite operator SPLIT

This patch adds initial support for the tflite operator split. However
I am not yet sure how to handle the axis parameter for the split
operator and support it in the test infrastructure. Putting this up for
an initial review and comment.

The split operator in tflite according to
https://www.tensorflow.org/lite/guide/ops_compatibility

appears to take num_or_size_split as a 0D tensor.

I also note that tflite.split is one of the few operators that returns
multiple outputs and thus the helper routines in the tests needed some
massaging to make this work.

@apivarov , could you please review this ?

Thanks,
Ramana

* Fix the axis parameter

Add more tests

* Address review comments

* Try out frozen_gene's suggestion

* Handle split of 1 element

* int32 is only supported in tflite 1.14, let's check that version here.

* Keep this at python3.5

* Add packaging as a python package to be installed

5 years ago Update Jenkinsfile
Tianqi Chen [Mon, 22 Jul 2019 18:37:43 +0000 (11:37 -0700)]
Update Jenkinsfile

5 years ago [VTA] Runtime refactor to allow for non-shared memory FPGAs (e.g. F1) (#3554)
Thierry Moreau [Mon, 22 Jul 2019 15:31:37 +0000 (08:31 -0700)]
[VTA] Runtime refactor to allow for non-shared memory FPGAs (e.g. F1) (#3554)

* updated runtime to support non-shared memory FPGAs for instruction and micro-op kernels

* adding driver-defined memcpy function to handle F1 cases

* refactor to include flush/invalidate in memcpy driver function

* update tsim driver

* bug fixes

* cleanup

* pre-allocate fpga readable buffers to improve perf

* fix

* remove instruction stream address rewrite pass for micro op kernels

* fix:

* white spaces

* fix lint

* avoid signed/unsigned compilation warning

* avoid signed/unsigned compilation warning

* fix

* fix

* addressing comments

* whitespace

* moving flush/invalidate out of memmove

* cleanup

* fix

* cosmetic

* rename API

* comment fix

5 years ago [CI] Upgrade LLVM envs (#3590)
Tianqi Chen [Sun, 21 Jul 2019 23:30:31 +0000 (16:30 -0700)]
[CI] Upgrade LLVM envs (#3590)

5 years ago add coherent, length, and user bits option to Shell Config (#3593)
Luis Vega [Sun, 21 Jul 2019 22:45:48 +0000 (15:45 -0700)]
add coherent, length, and user bits option to Shell Config (#3593)

5 years ago bugfix function args order in alu instruction generation (#3592)
Luis Vega [Sat, 20 Jul 2019 03:33:18 +0000 (20:33 -0700)]
bugfix function args order in alu instruction generation (#3592)

5 years ago [Relay] add some check for the ad algorithm (#3585)
雾雨魔理沙 [Fri, 19 Jul 2019 22:55:28 +0000 (15:55 -0700)]
[Relay] add some check for the ad algorithm (#3585)

* do

* fix test

5 years ago [TOPI][RELAY] Add op Size (#3094)
Yong Wu [Fri, 19 Jul 2019 22:06:34 +0000 (15:06 -0700)]
[TOPI][RELAY] Add op Size (#3094)

5 years ago [AutoTVM]Improve graph tuner for multiple subgraphs (#3490)
Yao Wang [Fri, 19 Jul 2019 20:19:37 +0000 (13:19 -0700)]
[AutoTVM]Improve graph tuner for multiple subgraphs (#3490)

* Improve boundary nodes in graph tuner

* Limit output node number

* Fix test

* Improve warning.

* Fix test

5 years ago [RPC] Better handle tempdir if subprocess killed. (#3574)
Balint Cristian [Fri, 19 Jul 2019 16:22:46 +0000 (19:22 +0300)]
[RPC] Better handle tempdir if subprocess killed. (#3574)

5 years ago Add printer for Layout/BijectiveLayout (#3582)
Yizhi Liu [Fri, 19 Jul 2019 16:21:47 +0000 (09:21 -0700)]
Add printer for Layout/BijectiveLayout (#3582)

5 years ago Mention minimum version of python features one should stick to (#3588)
Ramana Radhakrishnan [Fri, 19 Jul 2019 16:21:25 +0000 (17:21 +0100)]
Mention minimum version of python features one should stick to (#3588)

5 years ago avoiding cast None to int errors (#3578)
Thierry Moreau [Fri, 19 Jul 2019 01:35:19 +0000 (18:35 -0700)]
avoiding cast None to int errors (#3578)

5 years ago fix topi c++ conv2d_nchw lambda expr issue (#3570)
zacario-li [Fri, 19 Jul 2019 01:34:44 +0000 (09:34 +0800)]
fix topi c++ conv2d_nchw lambda expr issue (#3570)

5 years ago [Relay] parser/pretty printer roundtripping (#3536)
雾雨魔理沙 [Thu, 18 Jul 2019 22:29:22 +0000 (15:29 -0700)]
[Relay] parser/pretty printer roundtripping (#3536)

5 years ago [ARITH] Simplify let (#3568)
Tianqi Chen [Thu, 18 Jul 2019 22:06:38 +0000 (15:06 -0700)]
[ARITH] Simplify let (#3568)

5 years ago Emit DWARF debug information (#3420)
Andrew Tulloch [Thu, 18 Jul 2019 21:56:14 +0000 (14:56 -0700)]
Emit DWARF debug information (#3420)

5 years ago Support additional architectures beyond x86_64 in ubuntu_install_java (#3546)
Ramana Radhakrishnan [Thu, 18 Jul 2019 17:21:03 +0000 (18:21 +0100)]
Support additional architectures beyond x86_64 in ubuntu_install_java (#3546)

* Support additional architectures beyond x86_64 in ubuntu_install_java

While attempting to get a development environment going for TVM
on my AArch64 desktop I ran into some hardcoding of relevant architectures.

5 years ago Disable MicroTVM on i386 CI (#3569)
Logan Weber [Thu, 18 Jul 2019 17:20:16 +0000 (10:20 -0700)]
Disable MicroTVM on i386 CI (#3569)

5 years ago [Community] Zhi Chen -> Committer (#3572)
Thierry Moreau [Thu, 18 Jul 2019 17:12:22 +0000 (10:12 -0700)]
[Community] Zhi Chen -> Committer (#3572)

Let's welcome Zhi as a new Apache TVM Committer!

5 years agotightening bounding box for IntSet fused in PassUpDomain (#3073)
bulanova-huawei [Thu, 18 Jul 2019 00:15:09 +0000 (20:15 -0400)]
tightening bounding box for IntSet fused in PassUpDomain (#3073)

Apply suggestions from code review

Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
5 years ago[docs] Add a tutorial for the pass manager (#3515)
Zhi [Wed, 17 Jul 2019 20:28:03 +0000 (13:28 -0700)]
[docs] Add a tutorial for the pass manager (#3515)

* [docs] Add a tutorial for the pass manager

* address comment

* address more comments

* retrigger ci

* address steven's comments

* address comments

* retrigger ci

* Update docs/dev/relay_pass_infra.rst

Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst

Co-Authored-By: Logan Weber <36520469+weberlo@users.noreply.github.com>
5 years ago[Relay][VM]Fix debug statement (#3565)
Wei Chen [Wed, 17 Jul 2019 20:17:18 +0000 (13:17 -0700)]
[Relay][VM]Fix debug statement (#3565)

* [Relay][VM]Fix debug statement

* Change debug statement

5 years agofix pynq 32-bit address pointers (#3558)
Luis Vega [Wed, 17 Jul 2019 06:49:40 +0000 (23:49 -0700)]
fix pynq 32-bit address pointers (#3558)

5 years agoFix build error (#3552)
Yinghai Lu [Wed, 17 Jul 2019 05:01:41 +0000 (22:01 -0700)]
Fix build error (#3552)

* Fix build error

* comments

5 years agofix js test load module example (#3556)
Joshua Z. Zhang [Wed, 17 Jul 2019 04:55:15 +0000 (21:55 -0700)]
fix js test load module example (#3556)

5 years agofix (#3550)
Haichen Shen [Wed, 17 Jul 2019 00:18:43 +0000 (17:18 -0700)]
fix (#3550)

5 years ago[FRONTEND][TENSORFLOW] Some bug fixes for tensorflow NCHW data_format (#3514)
zhengdi [Tue, 16 Jul 2019 17:41:10 +0000 (01:41 +0800)]
[FRONTEND][TENSORFLOW] Some bug fixes for tensorflow NCHW data_format (#3514)

5 years ago[Relay][VM] Port VM, VM compiler, and Object into python (#3391)
Haichen Shen [Tue, 16 Jul 2019 04:02:12 +0000 (21:02 -0700)]
[Relay][VM] Port VM, VM compiler, and Object into python (#3391)

* tmp

* Port vm and object to python

* clean up

* update vm build module

* update

* x

* tweak

* cleanup

* update

* fix rebase

* Rename to VMCompiler

* fix

5 years ago[Runtime] Enable set_input_zero_copy in GraphRuntime (#3416)
Yinghai Lu [Mon, 15 Jul 2019 20:21:04 +0000 (13:21 -0700)]
[Runtime] Enable set_input_zero_copy in GraphRuntime (#3416)

* Enable set_input_zero_copy in GraphRuntime

* Fix LoadParams

* Fix

* lint

* Fix remote context issue

* Fix

* Remove LOG

* Remove unused variables

* Add tests

* works

* More test scenarios

* make it simpler

* Remove unnecessary changes

* Address comments

* More comments

* Address comments

* Fix build

5 years ago[ARITH][BOUND] Fix bound inference to avoid allocating too much (#3526)
Sergei Grechanik [Sun, 14 Jul 2019 19:30:12 +0000 (22:30 +0300)]
[ARITH][BOUND] Fix bound inference to avoid allocating too much (#3526)

* [TVM] Fix bound inference to avoid allocating too much

* [ARITH][BOUND] Pass analyzer to PropBoundToInputs

5 years ago[ARITH][IR] Introduce FloorDiv/Mod (#3479)
Tianqi Chen [Sat, 13 Jul 2019 05:06:29 +0000 (22:06 -0700)]
[ARITH][IR] Introduce FloorDiv/Mod (#3479)

* [ARITH][IR] Introduce FloorDiv/Mod

* Address review comments

* address review comments, fix div sub rule
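
The FloorDiv/FloorMod pair captures division that rounds toward negative infinity, as opposed to C-style truncated division, where the quotient rounds toward zero. A minimal sketch of the two semantics (illustrative only, not TVM's actual IR implementation; Python's `//` already floors, and `int()` truncates):

```python
def trunc_divmod(a, b):
    """C-style division: quotient rounds toward zero."""
    q = int(a / b)   # truncates toward zero
    r = a - q * b    # remainder takes the sign of the dividend
    return q, r

def floor_divmod(a, b):
    """Floor division: quotient rounds toward negative infinity."""
    q = a // b       # Python's // floors
    r = a - q * b    # remainder takes the sign of the divisor
    return q, r

# The two agree for same-sign operands...
assert trunc_divmod(7, 2) == (3, 1)
assert floor_divmod(7, 2) == (3, 1)
# ...but differ when signs differ, which is why a dedicated
# FloorDiv/FloorMod pair simplifies index arithmetic:
assert trunc_divmod(-7, 2) == (-3, -1)
assert floor_divmod(-7, 2) == (-4, 1)
```

Floor semantics keep the remainder non-negative for a positive divisor, which makes bound and index simplification rules much cleaner.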

5 years ago[Relay][Quantization] Fix add_rewrite and UnifyDTypeScale (#3534)
Wuwei Lin [Fri, 12 Jul 2019 04:26:39 +0000 (12:26 +0800)]
[Relay][Quantization] Fix add_rewrite and UnifyDTypeScale (#3534)

* [Relay][Quantization] Fix issue introduced in #3135

* Recover StopFusion

* Fix fmultiref

* Fix lint

5 years ago[DEP] Remove HalideIR from submodule (#3535)
Tianqi Chen [Fri, 12 Jul 2019 00:10:57 +0000 (17:10 -0700)]
[DEP] Remove HalideIR from submodule (#3535)

5 years ago[INFA][IR] Build and Evolve Low-level IR. Remove HalideIR dep. (#3533)
Tianqi Chen [Thu, 11 Jul 2019 21:26:43 +0000 (14:26 -0700)]
[INFA][IR] Build and Evolve Low-level IR. Remove HalideIR dep. (#3533)

* [INFA][IR] Build and Evolve Low-level IR. Remove dep from HalideIR.

* Update include/tvm/node/ir_functor.h

Co-Authored-By: Jared Roesch <roeschinc@gmail.com>
* Update include/tvm/node/ir_functor.h

Co-Authored-By: Jared Roesch <roeschinc@gmail.com>
5 years agoposix_memalign appears in API 17, not 16 (#3532)
hlu1 [Thu, 11 Jul 2019 16:52:36 +0000 (09:52 -0700)]
posix_memalign appears in API 17, not 16 (#3532)

5 years agoAdd Pack operator to TFLite (#3521)
tristan-arm [Wed, 10 Jul 2019 22:03:14 +0000 (23:03 +0100)]
Add Pack operator to TFLite (#3521)
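
TFLite's Pack operator stacks N same-shape tensors along a new axis, producing a tensor of rank R+1 (the same semantics as `numpy.stack`). A pure-Python sketch for rank-1 inputs, illustrative only and not the frontend's actual lowering:

```python
def pack(tensors, axis=0):
    """Stack same-length rank-1 tensors along a new axis."""
    if axis == 0:
        # New leading axis: result shape is (N, len)
        return [list(t) for t in tensors]
    # axis == 1 for rank-1 inputs: interleave elements, shape (len, N)
    return [list(col) for col in zip(*tensors)]

a = [1, 2, 3]
b = [4, 5, 6]
assert pack([a, b], axis=0) == [[1, 2, 3], [4, 5, 6]]
assert pack([a, b], axis=1) == [[1, 4], [2, 5], [3, 6]]
```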