Reverting launch bounds change in topK that induced a regression in perf (#63431)
authorRishi Puri <puririshi98@berkeley.edu>
Wed, 18 Aug 2021 16:41:37 +0000 (09:41 -0700)
committerFacebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Wed, 18 Aug 2021 16:44:07 +0000 (09:44 -0700)
commite2ddaec5cf6608b8e06667d4873505609ff1d674
treee286a27f837319398fc4a157921160fd8a8fd47a
parent383a33a0eb28ae454c0c8965650aea8ce1608943
Reverting launch bounds change in topK that induced a regression in perf (#63431)

Summary:
[topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)

Running this script on NVIDIA containers 21.08 vs 21.07, we see the following perf drops:
topk(input=(dtype=torch.float16,shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63

topk(input=(dtype=torch.float32,shape=[120000]), k=12000, dim=0, sorted=False) - 0.55

topk(input=(dtype=torch.float16,shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55

topk(input=(dtype=torch.float32,shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33

The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time
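The relative-drop metric above is simple arithmetic; a minimal sketch (the timing values below are hypothetical, chosen only to illustrate a 0.63 drop):

```python
def relative_perf_drop(time_new: float, time_old: float) -> float:
    """Relative perf drop: (new - old) / old; positive means slower."""
    return (time_new - time_old) / time_old

# Hypothetical timings (seconds): 21.08 container vs 21.07 container.
print(round(relative_perf_drop(1.63, 1.0), 2))  # 0.63, i.e. 63% slower
```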

I narrowed down the source of the regression to this commit: https://github.com/pytorch/pytorch/pull/60314
which reduced launch bounds from 1024 to 512.

The original evidence used to justify changing 1024 to 512 did not show a regression because its benchmark input shapes were much smaller than the shapes of the tensors for which I am seeing the regression. I suggest reverting back to 1024: with 512 there was no considerable improvement in perf for small inputs, and a major regression in perf for large tensors.
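For context, `__launch_bounds__(N)` declares the maximum block size a kernel will be launched with, which both caps the launchable block size and lets the compiler budget registers accordingly. A minimal illustrative sketch (not the actual TensorTopK.cu kernel; the kernel name and body are simplified placeholders):

```cuda
// Hypothetical sketch: reverting the bound from 512 back to 1024 restores
// the ability to launch 1024-thread blocks, which benefits large inputs.
__global__ void __launch_bounds__(1024)  // was reduced to 512 in #60314
topKKernelSketch(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i];  // placeholder body, not the real topK logic
}
```

With the bound at 512, any launch configuration using 1024-thread blocks would be invalid, so large-tensor paths lose occupancy headroom that the 1024 bound allows.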

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431

Reviewed By: mruberry

Differential Revision: D30384087

Pulled By: ngimel

fbshipit-source-id: 11eecbba82a069b1d4579d674c3f644ab8060ad2
aten/src/ATen/native/cuda/TensorTopK.cu