Batched upper triangular, lower triangular (#15257)
Summary:
Changelog:
- Implement `triu` and `tril` for batches of 2D tensors.
- Remove the TH/THC binding for `tril`.
- Fix the CUDA implementation.
- Update the docstrings for `tril` and `triu`.
- Remove the mask-based `triu` and `tril` from the `cholesky` forward and backward passes.
- Remove the batched `tril` from `torch.distributions.utils`.
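The batched semantics can be sketched in plain Python. For every matrix in the batch, `triu` keeps entry (i, j) when j - i >= diagonal, and `tril` keeps it when j - i <= diagonal. This is only an illustration, not the ATen or CUDA kernels from this PR: the helper names are hypothetical, and nested lists stand in for a tensor of shape (B, n, m). `masked_triu` shows the general multiply-by-a-0/1-mask approach, which should give identical results to the direct version.

```python
from typing import List

# Hypothetical helpers: nested lists stand in for a (B, n, m) tensor.
Matrix = List[List[float]]
Batch = List[Matrix]


def batched_triu(batch: Batch, diagonal: int = 0) -> Batch:
    """Keep entries on or above the `diagonal`-th diagonal of every matrix."""
    return [
        [[x if j - i >= diagonal else 0 for j, x in enumerate(row)]
         for i, row in enumerate(mat)]
        for mat in batch
    ]


def batched_tril(batch: Batch, diagonal: int = 0) -> Batch:
    """Keep entries on or below the `diagonal`-th diagonal of every matrix."""
    return [
        [[x if j - i <= diagonal else 0 for j, x in enumerate(row)]
         for i, row in enumerate(mat)]
        for mat in batch
    ]


def masked_triu(batch: Batch, diagonal: int = 0) -> Batch:
    """Mask-based equivalent: build a 0/1 mask, then multiply elementwise."""
    results = []
    for mat in batch:
        mask = [[1 if j - i >= diagonal else 0 for j in range(len(row))]
                for i, row in enumerate(mat)]
        results.append([[x * m for x, m in zip(row, mrow)]
                        for row, mrow in zip(mat, mask)])
    return results


if __name__ == "__main__":
    batch = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    ]
    print(batched_triu(batch))
    print(batched_tril(batch, diagonal=-1))
    # The mask-based and direct versions agree.
    assert masked_triu(batch) == batched_triu(batch)
```

In PyTorch itself, `torch.triu` and `torch.tril` accept a batch of matrices and operate on the last two dimensions, so `torch.triu(x, diagonal=1)` on an input of shape `(B, n, n)` should match the per-matrix loop above.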
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257
Differential Revision: D13613888
Pulled By: mrshenli
fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4