From: Natalia Gimelshein
Date: Fri, 13 Aug 2021 16:49:15 +0000 (-0700)
Subject: [hackathon] fix benchmarking script in CONTRIBUTING (#63199)
X-Git-Tag: accepted/tizen/8.0/unified/20231005.095509~1037
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=720a7a0d81d4f2ddd3ad90cf3342aad0352ecb70;p=platform%2Fupstream%2Fpytorch.git

[hackathon] fix benchmarking script in CONTRIBUTING (#63199)

Summary:
[skip ci]
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63199

Reviewed By: mruberry

Differential Revision: D30305487

Pulled By: ngimel

fbshipit-source-id: 2704c4f08ab976a55c9f8c2fe54cd4f3f39412cf
---

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9519a07..d918e3f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -934,7 +934,7 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
    kernel.
    ```python
    import torch
-   import time
+   from torch.utils.benchmark import Timer
    size = 128*512
    nrep = 100
    nbytes_read_write = 4 # this is number of bytes read + written by a kernel. Change this to fit your kernel.
@@ -942,20 +942,16 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
    for i in range(10):
        a=torch.empty(size).cuda().uniform_()
        torch.cuda.synchronize()
-       start = time.time()
-       # dry run to alloc
        out = a.uniform_()
        torch.cuda.synchronize()
-       start = time.time()
-       for i in range(nrep):
-           out = a.uniform_()
-       torch.cuda.synchronize()
-       end = time.time()
-       timec = (end-start)/nrep
+       t = Timer(stmt="a.uniform_()", globals=globals())
+       res = t.blocked_autorange()
+       timec = res.median
        print("uniform, size, elements", size, "forward", timec, "bandwidth (GB/s)", size*(nbytes_read_write)*1e-9/timec)
        size *=2
    ```
+
 See more cuda development tips [here](https://github.com/pytorch/pytorch/wiki/CUDA-basics)

 ## Windows development tips
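
For reference, below is a sketch of how the CONTRIBUTING.md benchmarking snippet reads once the hunks above are applied. It is assembled from the diff context plus the added lines; the indentation and comments are approximate, it assumes a CUDA-capable machine, and the now-unused `nrep` variable is dropped because `Timer.blocked_autorange()` picks the number of replicates itself.

```python
import torch
from torch.utils.benchmark import Timer

size = 128 * 512
nbytes_read_write = 4  # bytes read + written by the kernel; change this to fit your kernel

for i in range(10):
    a = torch.empty(size).cuda().uniform_()
    torch.cuda.synchronize()
    # dry run so any output allocation happens before timing
    out = a.uniform_()
    torch.cuda.synchronize()
    # Timer warms up and repeats the statement; blocked_autorange() returns a
    # Measurement whose .median is the per-invocation time in seconds
    t = Timer(stmt="a.uniform_()", globals=globals())
    res = t.blocked_autorange()
    timec = res.median
    print("uniform, size, elements", size, "forward", timec,
          "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
    size *= 2
```

Compared with the old `time.time()` loop, the manual warmup loop, explicit `torch.cuda.synchronize()` bracketing of the timed region, and division by `nrep` are no longer needed; the Timer utility handles those details.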