Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
author anjali411 <chourdiaanjali123@gmail.com>
Thu, 19 Aug 2021 15:41:08 +0000 (08:41 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Thu, 19 Aug 2021 15:42:24 +0000 (08:42 -0700)
commit e1334512a3aa0f8f8a3a0a59cb868355a33b6233
tree 867cf092c495e1bdbd26623845cd3f07a69d5f6d
parent f596aa8b77d6c57dd82f33a45926fad95ab2a21e

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915

Up to 45% perf improvement on CUDA and up to 20% on CPU for dot and vdot
when the inputs have the conj bit set. Perf improves consistently in all
cases; see the perf numbers in the PR comments.
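
For context, a fastpath of this kind can lean on a few algebraic identities
that let a conjugated input be folded into the choice between dot and vdot
instead of materializing the conjugate first. The sketch below checks those
identities with plain Python stand-ins. The helper names dot, vdot and conj
are illustrative only and are not the ATen functions; how the change in
Blas.cpp actually dispatches is not shown here and may differ.

```python
# Illustrative stand-ins on lists of Python complex numbers (not ATen code).

def dot(a, b):
    """Plain dot product: sum(a_i * b_i)."""
    return sum(x * y for x, y in zip(a, b))

def vdot(a, b):
    """Conjugating dot product: sum(conj(a_i) * b_i)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def conj(v):
    """Eagerly conjugate every element (the work a fastpath would avoid)."""
    return [x.conjugate() for x in v]

a = [1 + 2j, 3 - 1j]
b = [2 - 1j, -1 + 4j]

# A conjugated first argument to dot is the same as vdot on the raw inputs.
assert dot(conj(a), b) == vdot(a, b)
# A conjugated second argument to dot is vdot with the arguments swapped.
assert dot(a, conj(b)) == vdot(b, a)
# Both conjugated: conjugate the scalar result once instead of each element.
assert dot(conj(a), conj(b)) == dot(a, b).conjugate()
# A conjugated first argument to vdot cancels out, leaving a plain dot.
assert vdot(conj(a), b) == dot(a, b)
```

In each case the right-hand side needs no elementwise conjugation pass over
the inputs, which is the kind of saving the numbers above would reflect.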

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30404006

Pulled By: anjali411

fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
aten/src/ATen/ConjugateFallback.cpp
aten/src/ATen/native/Blas.cpp
aten/src/ATen/native/cuda/Blas.cpp
torch/testing/_internal/common_methods_invocations.py