Implementing cuda kernel for tril_indices and triu_indices (#15203)
author: Shen Li <shenli@fb.com>
Thu, 20 Dec 2018 18:21:02 +0000 (10:21 -0800)
committer: Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Thu, 20 Dec 2018 18:23:38 +0000 (10:23 -0800)
commit: 06a7cb59019ee57c679ba2cf7d51e36bd3710ad4
tree: f2626119c931d0665a731a1a84556793666f77a7
parent: 5c66662e58c5b87b3f39913bde056d5ecfd4e58e

Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate the coordinates in the original tensor from the column index in the result tensor. Each GPU thread handles one column (two numbers) of the output tensor.
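
As a rough illustration of the per-column mapping, the Python sketch below covers only the simplest case: the lower triangle of a square matrix with offset 0. The function names `tril_coord` and `tril_indices_square` are hypothetical and are not taken from the PR. The actual kernel also handles offsets, rectangular inputs, and `triu_indices`, and is described in the comments of TensorFactories.cu.

```python
import math

def tril_coord(i):
    """Map linear index i to (row, col) in a lower triangle (offset 0).

    Hypothetical helper: stands in for the work one GPU thread would do
    for output column i. Assumes the square, offset-0 case only.
    """
    # Row r holds r + 1 entries, so rows 0..r-1 hold r * (r + 1) // 2 entries.
    # r is the largest integer with r * (r + 1) // 2 <= i.
    r = (math.isqrt(8 * i + 1) - 1) // 2
    c = i - r * (r + 1) // 2
    return r, c

def tril_indices_square(n):
    """Return (rows, cols) for the lower triangle of an n x n matrix."""
    total = n * (n + 1) // 2
    coords = [tril_coord(i) for i in range(total)]  # one "thread" per column
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return rows, cols
```

Because each output column depends only on its own index `i`, the columns can be computed independently, which is what makes a one-thread-per-column layout possible.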

The implementation detects and handles precision loss when calculating the square root of an `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` elements.
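
The precision issue can be sketched in Python, whose `float` is an IEEE double and cannot represent every integer above `2 ** 53`. The helper `floor_sqrt_corrected` below is a hypothetical illustration of one way to detect and repair an inexact floating-point root. It is not necessarily the approach used in the kernel.

```python
import math

def floor_sqrt_corrected(x):
    """Floor of sqrt(x) for a non-negative integer x.

    Hypothetical helper: starts from a double-precision estimate and then
    repairs it with exact integer arithmetic.
    """
    # The estimate may be off by a small amount once x exceeds 2**53,
    # because converting x to a double rounds it.
    s = int(math.sqrt(x))
    # Step down while the estimate is too large.
    while s * s > x:
        s -= 1
    # Step up while the next integer still fits.
    while (s + 1) * (s + 1) <= x:
        s += 1
    return s
```

The integer comparisons are exact, so the result does not depend on how the floating-point estimate was rounded.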

Algorithm details are described in the [comments of TensorFactories.cu](https://github.com/pytorch/pytorch/blob/23ddb6f58a1c8a7a660a793f174cf014230176c6/aten/src/ATen/native/cuda/TensorFactories.cu#L109-L255).

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
aten/src/ATen/native/TensorFactories.cpp
aten/src/ATen/native/TensorFactories.h [new file with mode: 0644]
aten/src/ATen/native/cuda/TensorFactories.cu
aten/src/ATen/native/native_functions.yaml
test/common_methods_invocations.py
test/test_cuda.py
test/test_torch.py
torch/_torch_docs.py