Reduce broadcasted inputs in derivative code (#14485)
authorAdam Paszke <adam.paszke@gmail.com>
Mon, 3 Dec 2018 21:41:05 +0000 (13:41 -0800)
committerFacebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 3 Dec 2018 21:44:18 +0000 (13:44 -0800)
commit68ffe469918a5eaa014e230efe81af9c298857aa
treeb90b39cca05236808a5007a1acd55a2527848d66
parentb768db081005b22b14e74907feb181360314d63f

Summary:
Previously, symbolic AD formulas assumed that no broadcasting had happened
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
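To make the shape issue concrete: when an input is broadcast in the forward pass, the gradient arriving in the backward pass has the broadcast (larger) shape and must be summed back down to the input's original shape. A minimal sketch of that reduction in NumPy; the `sum_to_shape` helper is hypothetical and only illustrates the idea, not the actual JIT implementation:

```python
import numpy as np

def sum_to_shape(grad, shape):
    # Hypothetical helper: reduce a gradient of the broadcast result
    # back to the pre-broadcast input shape.
    # 1) Sum away leading dimensions that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # 2) Sum over dimensions that were size 1 in the input but expanded.
    for i, dim in enumerate(shape):
        if dim == 1 and grad.shape[i] != 1:
            grad = grad.sum(axis=i, keepdims=True)
    return grad

# Gradient w.r.t. a result of shape (4, 2, 3), where the input had shape (3,):
g = np.ones((4, 2, 3))
print(sum_to_shape(g, (3,)).shape)    # (3,)
print(sum_to_shape(g, (1, 3)).shape)  # (1, 3)
```

Returning `g` unreduced, as the old formulas effectively did, would hand the caller a gradient of shape (4, 2, 3) for an input of shape (3,).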

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

While testing, I also found a few new problems and opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
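For context on the second issue: the backward of a chunk is a concatenation of the gradients of its outputs, and any output that was never used arrives as an undefined gradient, which has to be materialized as zeros before concatenating. A NumPy sketch, assuming `None` stands for an undefined gradient (the `chunk_backward` helper is illustrative, not the real formula):

```python
import numpy as np

def chunk_backward(grad_outputs, chunk_shapes):
    # Replace undefined (None) gradients from unused chunk outputs
    # with zeros of the matching shape, then concatenate along the
    # chunk dimension to form the input gradient.
    filled = [g if g is not None else np.zeros(s)
              for g, s in zip(grad_outputs, chunk_shapes)]
    return np.concatenate(filled, axis=0)

# Only the first of two chunks was used downstream:
g = chunk_backward([np.ones((2, 3)), None], [(2, 3), (2, 3)])
print(g.shape)  # (4, 3)
```

A formula that passes `None` straight to the concatenation instead would fail exactly in the way #14484 describes.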

This patch unfortunately deoptimizes some of our code (fusion no longer happens past chunk nodes, and we output more tensors only because we need to read their sizes). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Differential Revision: D13280899

Pulled By: soumith

fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
15 files changed:
aten/src/ATen/ExpandUtils.h
aten/src/ATen/core/interned_strings.h
test/cpp/jit/tests.h
test/expect/TestJit.test_cpp_cuda.expect
test/expect/TestScript.test_lstm_fusion_cuda-backward.expect
test/expect/TestScript.test_lstm_fusion_cuda-forward.expect
test/expect/TestScript.test_milstm_fusion_cuda-backward.expect
test/expect/TestScript.test_milstm_fusion_cuda-forward.expect
test/test_jit.py
torch/csrc/autograd/engine.cpp
torch/csrc/jit/argument_spec.h
torch/csrc/jit/autodiff.cpp
torch/csrc/jit/register_prim_ops.cpp
torch/csrc/jit/register_special_ops.cpp
torch/csrc/jit/symbolic_variable.h