Record Caffe2's current stream ID in c10_cuda. (#15174)
author Edward Yang <ezyang@fb.com>
Fri, 21 Dec 2018 05:51:25 +0000 (21:51 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Fri, 21 Dec 2018 05:54:05 +0000 (21:54 -0800)
commit 26b04523b1402e2ae5e0b822b9d6668e806ab8a7
tree 14ad0120ec44607b1c6cc3022536af488472b6e4
parent 335306406032489bfdfa5bf06220da5429ca3e8f

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174

Previously, Caffe2 maintained its own per-thread, per-device
current logical CUDA stream ID.  This PR switches Caffe2 over
to using c10::Stream to manage the current stream, and also to
manage the allocation of cudaStream_t objects.
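
As a minimal sketch (not code from this diff), assuming the c10
CUDA stream API in c10/cuda/CUDAStream.h, a caller now interacts
with streams roughly like this:

    #include <c10/cuda/CUDAStream.h>

    void example() {
      // Grab a stream from the per-device pool; the pool owns the
      // underlying cudaStream_t, so no explicit cudaStreamCreate
      // is needed.
      c10::cuda::CUDAStream stream = c10::cuda::getStreamFromPool();

      // Make it the thread-local current stream for its device.
      c10::cuda::setCurrentCUDAStream(stream);

      // Later code queries the current stream rather than tracking
      // a separate per-thread, per-device logical stream ID.
      cudaStream_t raw = c10::cuda::getCurrentCUDAStream().stream();
      (void)raw;
    }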

This results in a slight behavior change: previously, Caffe2
was willing to allocate an arbitrary number of CUDA streams,
depending on how high the logical stream IDs went.  The
c10::Stream pool has a fixed number of streams; once you exceed
it, stream assignment wraps around to the start of the pool.
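
For illustration, the wrap-around can be pictured as a modulo
mapping; kStreamsPerPool mirrors the fixed per-device pool size
in c10 (32 at the time of writing), but this helper function is
hypothetical, not the literal implementation:

    #include <cstddef>

    constexpr std::size_t kStreamsPerPool = 32;  // fixed per-device pool size

    // A logical stream ID past the pool size reuses an earlier
    // stream from the pool instead of allocating a new one.
    std::size_t pool_index_for(std::size_t logical_stream_id) {
      return logical_stream_id % kStreamsPerPool;
    }

    // e.g. pool_index_for(0) == 0 and pool_index_for(32) == 0:
    // logical IDs 0 and 32 share the same cudaStream_t.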

Reviewed By: dzhulgakov

Differential Revision: D13451550

fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
caffe2/core/context_gpu.h
caffe2/core/context_gpu_test.cc
caffe2/python/operator_test/recurrent_net_executor_test.py