Add support for batch_norm fusion to the JIT (#15146)
authorAdam Paszke <adam.paszke@gmail.com>
Tue, 8 Jan 2019 14:57:45 +0000 (06:57 -0800)
committerFacebook Github Bot <facebook-github-bot@users.noreply.github.com>
Tue, 8 Jan 2019 15:00:19 +0000 (07:00 -0800)
commit5e1b35bf2827d24d626739d14f462f4c87875892
tree8734738f44082f74c5b30a9f88724bccbda23087
parentc3a0000864e78513f3a7d9b3bbab0e216b783be0
Add support for batch_norm fusion to the JIT (#15146)

Summary:
We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the statistics, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.
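
For intuition, here's a hand-written sketch (not the actual fuser output) of what the decomposition enables: once the statistics are available, inference-mode batch_norm reduces to a per-channel scale-and-shift, so everything after the stats computation is pointwise and fusable with a following ReLU. This assumes NCHW input; the names below are illustrative only:

```python
import torch

# Sketch only: inference-mode batch_norm with precomputed running stats
# is just a per-channel affine transform, so it fuses with ReLU.
@torch.jit.script
def bn_relu(x, running_mean, running_var, weight, bias, eps: float):
    # Per-channel scale/shift derived from the already-reduced statistics.
    invstd = (running_var + eps).rsqrt()
    scale = (weight * invstd).reshape(1, -1, 1, 1)
    shift = (bias - running_mean * weight * invstd).reshape(1, -1, 1, 1)
    # Everything below is pointwise -- eligible for fusion.
    return (x * scale + shift).relu()
```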

Note that this is limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts of differentiated graphs.

This commit gives us a 7% end-to-end speedup on ResNet50 with batch size 32. As noted above, this applies only to inference mode at the moment, due to lack of AD support for CNN operations (I'll be adding that soon). It also doesn't apply to the standard `torchvision` models, because they use in-place ops, which the fuser doesn't support (we need a way of proving that de-inplacing them is safe).
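
To illustrate that last caveat, a hypothetical example (not code from this patch): the fuser only handles out-of-place ops, so the in-place variants that `torchvision` models use block fusion:

```python
import torch

@torch.jit.script
def fusable(x, scale, shift):
    return torch.relu(x * scale + shift)  # out-of-place: can be fused

@torch.jit.script
def not_fusable(x, scale, shift):
    y = x * scale + shift
    return y.relu_()  # in-place: the fuser has to leave this alone
```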

cc zou3519 zdevito mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15146

Differential Revision: D13548303

Pulled By: zou3519

fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a
aten/src/ATen/core/interned_strings.h
aten/src/ATen/native/Normalization.cpp
aten/src/ATen/native/cuda/Normalization.cu
aten/src/ATen/native/cuda/Normalization.cuh
aten/src/ATen/native/native_functions.yaml
test/test_jit.py
torch/csrc/jit/fuser/codegen.cpp
torch/csrc/jit/passes/graph_fuser.cpp
torch/csrc/jit/passes/utils/subgraph_utils.cpp
torch/csrc/jit/passes/utils/subgraph_utils.h