Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
author Bert Maher <bertrand@fb.com>
Wed, 25 Aug 2021 01:52:29 +0000 (18:52 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Wed, 25 Aug 2021 01:56:55 +0000 (18:56 -0700)
commit 8dda299d9631e0f6e121dcb9f8f94bbdd8435515
tree 32554311aea31de55cf310a5c024d805f1fc7908
parent 1787b905c4a571ff1ae09ddc56ce56cb04e52136
Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
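To illustrate what "thread level parallelism in fused kernels" means at a high level, here is a hypothetical Python sketch. It is not NNC's actual codegen (the real fuser emits native parallel loops via LLVM); it only shows the underlying idea of splitting a fused elementwise kernel's iteration space across worker threads. The function names `fused_kernel` and `parallel_launch` are invented for this sketch.

```python
# Hypothetical sketch, NOT NNC's actual implementation: partition a fused
# elementwise kernel's iteration space across a thread pool.
from concurrent.futures import ThreadPoolExecutor

def fused_kernel(a, b, out, start, stop):
    # Fused body: out = (a + b) * a, computed elementwise over [start, stop).
    # In NNC the fused body is compiled to native code; a plain Python loop
    # stands in for it here.
    for i in range(start, stop):
        out[i] = (a[i] + b[i]) * a[i]

def parallel_launch(a, b, num_threads=4):
    # Split [0, n) into contiguous chunks, one per worker thread.
    n = len(a)
    out = [0.0] * n
    chunk = (n + num_threads - 1) // num_threads
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(fused_kernel, a, b, out, i, min(i + chunk, n))
            for i in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # propagate any worker exceptions
    return out
```

Each chunk writes a disjoint slice of `out`, so no synchronization beyond the final join is needed; that disjointness is what makes thread-level parallelization of such loops safe.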
12 files changed:
test/cpp/tensorexpr/test_kernel.cpp
test/cpp/tensorexpr/test_te_fuser_pass.cpp
test/jit/test_profiler.py
test/test_jit_fuser_te.py
test/test_tensorexpr.py
torch/csrc/jit/passes/tensorexpr_fuser.cpp
torch/csrc/jit/passes/tensorexpr_fuser.h
torch/csrc/jit/python/init.cpp
torch/csrc/jit/tensorexpr/kernel.cpp
torch/csrc/jit/tensorexpr/llvm_codegen.cpp
torch/csrc/jit/tensorexpr/llvm_jit.h
torch/csrc/jit/tensorexpr/loopnest.cpp