From: Natalia Gimelshein
Date: Fri, 13 Aug 2021 16:49:15 +0000 (-0700)
Subject: [hackathon] fix benchmarking script in CONTRIBUTING (#63199)
X-Git-Tag: accepted/tizen/8.0/unified/20231005.095509~1037
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=720a7a0d81d4f2ddd3ad90cf3342aad0352ecb70;p=platform%2Fupstream%2Fpytorch.git

[hackathon] fix benchmarking script in CONTRIBUTING (#63199)

Summary:
[skip ci]
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63199

Reviewed By: mruberry

Differential Revision: D30305487

Pulled By: ngimel

fbshipit-source-id: 2704c4f08ab976a55c9f8c2fe54cd4f3f39412cf
---

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9519a07..d918e3f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -934,7 +934,7 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
    kernel.
    ```python
    import torch
-   import time
+   from torch.utils.benchmark import Timer
    size = 128*512
    nrep = 100
    nbytes_read_write = 4 # this is number of bytes read + written by a kernel. Change this to fit your kernel.
@@ -942,20 +942,16 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
    for i in range(10):
        a=torch.empty(size).cuda().uniform_()
        torch.cuda.synchronize()
-       start = time.time()
-       # dry run to alloc
        out = a.uniform_()
        torch.cuda.synchronize()
-       start = time.time()
-       for i in range(nrep):
-           out = a.uniform_()
-       torch.cuda.synchronize()
-       end = time.time()
-       timec = (end-start)/nrep
+       t = Timer(stmt="a.uniform_()", globals=globals())
+       res = t.blocked_autorange()
+       timec = res.median
        print("uniform, size, elements", size, "forward", timec, "bandwidth (GB/s)", size*(nbytes_read_write)*1e-9/timec)
        size *=2
    ```
+
 See more cuda development tips [here](https://github.com/pytorch/pytorch/wiki/CUDA-basics)

 ## Windows development tips
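
For reference, below is a sketch of how the CONTRIBUTING.md benchmarking snippet reads once the hunks above are applied. It is assembled from the diff context plus the added lines; the indentation and comments are approximate, it assumes a CUDA-capable machine, and the now-unused `nrep` variable is dropped because `Timer.blocked_autorange()` picks the number of replicates itself.

```python
import torch
from torch.utils.benchmark import Timer

size = 128 * 512
nbytes_read_write = 4  # bytes read + written by the kernel; change this to fit your kernel

for i in range(10):
    a = torch.empty(size).cuda().uniform_()
    torch.cuda.synchronize()
    # dry run so any output allocation happens before timing
    out = a.uniform_()
    torch.cuda.synchronize()
    # Timer warms up and repeats the statement; blocked_autorange() returns a
    # Measurement whose .median is the per-invocation time in seconds
    t = Timer(stmt="a.uniform_()", globals=globals())
    res = t.blocked_autorange()
    timec = res.median
    print("uniform, size, elements", size, "forward", timec,
          "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
    size *= 2
```

Compared with the old `time.time()` loop, the manual warmup loop, explicit `torch.cuda.synchronize()` bracketing of the timed region, and division by `nrep` are no longer needed; the Timer utility handles those details.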