From: Jan Schlüter
Date: Tue, 2 Apr 2019 22:15:31 +0000 (-0700)
Subject: Add helpful information to the gradient/inplace operation exception (#18523)
X-Git-Tag: accepted/tizen/6.5/unified/20211028.231830~469
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=b77e3c2ca1790cfd4414d15f0a0952a630dc22c9;p=platform%2Fupstream%2Fpytorch.git

Add helpful information to the gradient/inplace operation exception (#18523)

Summary:
To debug a `one of the variables needed for gradient computation has been modified by an inplace operation` error, I wanted to know *which* variable had been modified, so I extended the error message with the information that is readily available at that point.

Before:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

After:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [80, 1]], which is output 0 of UnsqueezeBackward0, is at version 1, not expected version 0. Hint: enable anomaly detection to find the forward pass operation which modified it.
```

The hint to enable anomaly detection is only shown when it is not already enabled; it's meant to save people some googling. I'd even go further and reference `torch.autograd.set_detect_anomaly(True)`, but maybe we're not running Python?
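
For illustration only (not part of this patch), here is a minimal Python sketch of the situation the new message describes, together with the suggested anomaly-detection workflow; the tensor shapes and the specific operations are made up for the example:
```
import torch

# Illustrative sketch: provoke the version-counter mismatch that this patch
# reports in more detail, with anomaly detection enabled so the traceback
# also shows the forward-pass operation whose gradient later fails.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(80, requires_grad=True)
y = x.mul(2).unsqueeze(1)   # shape [80, 1]; produced by UnsqueezeBackward0
z = y * y                   # MulBackward0 saves y for the backward pass
y.add_(1)                   # in-place update bumps y's version counter

# Raises: "one of the variables needed for gradient computation has been
# modified by an inplace operation: ... which is output 0 of
# UnsqueezeBackward0 ..."; replacing add_ with y = y + 1 avoids the error.
z.sum().backward()
```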

Disclaimer: I haven't looked at other parts of the code to check whether using `std::stringstream` is acceptable practice; let me know if it isn't. Similarly, I haven't checked the indentation conventions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18523

Differential Revision: D14683249

Pulled By: soumith

fbshipit-source-id: f97a99d4aabea7461df766d66cd72300b48e2350
---

diff --git a/torch/csrc/autograd/saved_variable.cpp b/torch/csrc/autograd/saved_variable.cpp
index f9789e8..a457d87 100644
--- a/torch/csrc/autograd/saved_variable.cpp
+++ b/torch/csrc/autograd/saved_variable.cpp
@@ -3,12 +3,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
+#include

 namespace torch { namespace autograd {
@@ -39,12 +41,6 @@ Variable SavedVariable::unpack(std::shared_ptr saved_for) const {
     return Variable();
   }

-  if (saved_version_ != version_counter_.current_version()) {
-    throw std::runtime_error(
-        "one of the variables needed for gradient computation has been "
-        "modified by an inplace operation");
-  }
-
   auto grad_fn = grad_fn_;
   if (has_grad_fn_ && !grad_fn) {
     if (!saved_for) {
@@ -55,6 +51,30 @@ Variable SavedVariable::unpack(std::shared_ptr saved_for) const {
     grad_fn = std::move(saved_for);
   }

+  if (saved_version_ != version_counter_.current_version()) {
+    std::stringstream message;
+    message << "one of the variables needed for gradient computation has been "
+        "modified by an inplace operation: [" << data_.type().toString() << " "
+        << data_.sizes() << "]";
+    if (grad_fn) {
+      message << ", which is output " << output_nr_
+          << " of " << grad_fn->name() << ",";
+    }
+    message << " is at version " << version_counter_.current_version()
+        << "; expected version " << saved_version_ << " instead.";
+    if (!AnomalyMode::is_enabled()) {
+      message << " Hint: enable anomaly detection to find the operation "
+          "that failed to compute its gradient, with torch.autograd."
+          "set_detect_anomaly(True).";
+    }
+    else {
+      message << " Hint: the backtrace further above shows the operation "
+          "that failed to compute its gradient. The variable in question "
+          "was changed in there or anywhere later. Good luck!";
+    }
+    throw std::runtime_error(message.str());
+  }
+
   // NB: saved views are unpacked as normal Variables (not views) even though
   // they still share the same storage. This works only because we never call
   // in-place functions on unpacked variables.