Fix sparse mm for ROCm (#18985)
author Johannes M Dieterich <johannes.dieterich@amd.com>
Mon, 8 Apr 2019 01:13:33 +0000 (18:13 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 8 Apr 2019 01:16:16 +0000 (18:16 -0700)
commit 5241e6ec5c538ed0475a880e5cbaaa8211445478
tree 6885b78e7aa65b0941fea5c49742227be86d86ff
parent 6c91610f0c86153232bf3a66f3a23e42b96e79b6

Summary:
* Also annotate the two-pass reduction with launch bounds
* Use #ifdef to work around some ROCm shortcomings with short-circuit returns; internal tickets have been filed
* While there, plug a memory leak by destroying the matrix descriptor after the sparse call (this also applies to cuSPARSE)
* While there, fix the argument types for cusparseXcoo2csr to match the cuSPARSE documentation
* Enable test_dsmm in test_sparse, which now passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985
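For context, cusparseXcoo2csr (whose argument types this change corrects) compresses the row indices of a COO matrix into CSR row pointers, the format a CSR matrix multiply consumes. The following is a minimal pure-Python sketch of that idea. It assumes the COO entries are already sorted by row, as cuSPARSE requires. `coo_rows_to_csr_rowptr` and `csr_matmul_dense` are hypothetical helpers for illustration only, not PyTorch or cuSPARSE APIs.

```python
from typing import List


def coo_rows_to_csr_rowptr(rows: List[int], num_rows: int) -> List[int]:
    """Hypothetical helper: turn COO row indices into CSR row pointers.

    Assumes `rows` is sorted (as cusparseXcoo2csr requires), so entries of
    row i occupy positions rowptr[i] .. rowptr[i + 1] - 1 in cols/vals.
    """
    rowptr = [0] * (num_rows + 1)
    # Count the nonzeros in each row (shifted by one slot).
    for r in rows:
        if not 0 <= r < num_rows:
            raise ValueError(f"row index {r} out of range")
        rowptr[r + 1] += 1
    # A prefix sum turns per-row counts into start offsets.
    for i in range(num_rows):
        rowptr[i + 1] += rowptr[i]
    return rowptr


def csr_matmul_dense(rowptr: List[int], cols: List[int], vals: List[float],
                     dense: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: multiply a CSR matrix (m x k) by a dense k x n matrix."""
    m = len(rowptr) - 1
    n = len(dense[0]) if dense else 0
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        # Walk only the stored nonzeros of row i.
        for p in range(rowptr[i], rowptr[i + 1]):
            c, v = cols[p], vals[p]
            for j in range(n):
                out[i][j] += v * dense[c][j]
    return out


# A 3 x 3 sparse matrix with entries (0,0)=1, (0,2)=2, (2,1)=3, in sorted COO form.
rows, cols, vals = [0, 0, 2], [0, 2, 1], [1.0, 2.0, 3.0]
rowptr = coo_rows_to_csr_rowptr(rows, 3)
result = csr_matmul_dense(rowptr, cols, vals, [[1, 2], [3, 4], [5, 6]])
```

Here `rowptr` is `[0, 2, 2, 3]`, and row 1, which has no nonzeros, yields a zero row in `result`.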

Differential Revision: D14822009

Pulled By: bddppq

fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833
aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu
aten/src/THC/THCReduceAll.cuh
test/test_sparse.py