Fix race in AtomicFetchAdd. (#13479)
author Yavuz Yetim <yyetim@fb.com>
Mon, 19 Nov 2018 23:57:28 +0000 (15:57 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Tue, 20 Nov 2018 00:11:58 +0000 (16:11 -0800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479

Widens the lock scope so that it covers the Output() calls.

These calls may allocate the underlying blob/tensor objects on
first use, so concurrent invocations of the operator race with
each other over the same output blobs/tensors. Taking the mutex
before Output() serializes both the allocation and the fetch-add.

Reviewed By: bwasti

Differential Revision: D12891629

fbshipit-source-id: a6015cfdb08e352521a1f062eb9d94a971cfbdb0

caffe2/operators/atomic_ops.cc

index 6c5f9e3..ebad7f1 100644 (file)
@@ -1,4 +1,5 @@
 #include <mutex>
+#include <thread>
 #include "caffe2/core/context.h"
 #include "caffe2/core/operator.h"
 
@@ -30,11 +31,11 @@ class AtomicFetchAddOp final : public Operator<CPUContext> {
 
   bool RunOnDevice() override {
     auto& mutex = OperatorBase::Input<std::unique_ptr<std::mutex>>(0);
+    std::lock_guard<std::mutex> lg(*mutex);
     auto& a = Input(1);
     auto& b = Input(2);
     auto* c = Output(0);
     auto* d = Output(1);
-    std::lock_guard<std::mutex> lg(*mutex);
     c->Resize();
     d->Resize();
     auto* aPtr = a.data<int32_t>();