[Static Runtime] Optimize memory planner initialization (#64101)
author Mike Iovine <mikeiovine@fb.com>
Sat, 28 Aug 2021 00:37:05 +0000 (17:37 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Sat, 28 Aug 2021 00:40:43 +0000 (17:40 -0700)
commit 07c5cb8c48d655ba73adc2da2b88399f3ab48638
tree 035d6272bfcac5471b770485ccf86a6bc851c4d7
parent 2d75ab0c8fe793ceddd3aee74f25c956d5d8d2ec
[Static Runtime] Optimize memory planner initialization (#64101)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101

Checking `getOutOfPlaceOperation(n)` is very expensive, especially in multithreaded environments, because it acquires a lock when querying the NNC cache. This slows down memory planner initialization and, by extension, increases the latency of the first static runtime inference.

There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid repeated calls to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType`.
* Precompute `isOptimizableContainerType` for every node during static runtime initialization, so the memory planner does not need a pass over each node's inputs (a rough sketch of both optimizations follows below).
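
As a rough illustration of the two ideas (not the actual code from this diff; `ProcessedNode`, `getOutOfPlaceOperation`, `isOptimizableContainerType`, and `initializeNodes` below are simplified stand-ins for the real Static Runtime definitions):

```cpp
// Illustrative sketch only; all types and helpers here are simplified
// stand-ins, not the actual Static Runtime implementation.
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the lock-guarded NNC cache query that makes
// getOutOfPlaceOperation(n) expensive under contention.
std::function<void()> getOutOfPlaceOperation(const std::string& op_name) {
  static std::mutex cache_mutex;
  static std::unordered_map<std::string, std::function<void()>> cache;
  std::lock_guard<std::mutex> guard(cache_mutex);  // the contended lock
  auto it = cache.find(op_name);
  if (it == cache.end()) {
    return nullptr;  // no out-variant registered for this op
  }
  return it->second;
}

// Optimization 1: pay the expensive lookup once, at node construction,
// and cache the boolean result.
class ProcessedNode {
 public:
  explicit ProcessedNode(std::string op_name)
      : op_name_(std::move(op_name)),
        has_out_variant_(
            static_cast<bool>(getOutOfPlaceOperation(op_name_))) {}

  // Callers on the memory-planner path read the cached flag instead of
  // re-acquiring the cache lock on every query.
  bool has_out_variant() const { return has_out_variant_; }

 private:
  std::string op_name_;
  bool has_out_variant_;
};

// Placeholder for the real container-type check, which walks a node's inputs.
bool isOptimizableContainerType(const ProcessedNode& node) {
  return node.has_out_variant();
}

// Optimization 2: compute the container-type flag for every node once at
// initialization, so the memory planner only reads precomputed booleans.
struct NodeRecord {
  ProcessedNode node;
  bool is_optimizable_container_type;
};

std::vector<NodeRecord> initializeNodes(std::vector<std::string> op_names) {
  std::vector<NodeRecord> records;
  records.reserve(op_names.size());
  for (auto& name : op_names) {
    ProcessedNode node(std::move(name));
    const bool optimizable = isOptimizableContainerType(node);  // one pass, up front
    records.push_back({std::move(node), optimizable});
  }
  return records;
}
```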

Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: movefast1990

Differential Revision: D30595579

fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
torch/csrc/jit/runtime/static/impl.cpp
torch/csrc/jit/runtime/static/impl.h
torch/csrc/jit/runtime/static/ops.cpp
torch/csrc/jit/runtime/static/ops.h