Add acc_gpu_kernel_with_scalars and port add to use it (#63884)
author Edward Yang <ezyang@fb.com>
Tue, 31 Aug 2021 02:08:45 +0000 (19:08 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Tue, 31 Aug 2021 02:10:16 +0000 (19:10 -0700)
commit ffc2612087be1ab469e5a2cd5a1106bf8ec9e753
tree 8257b85801a51a2cf50146cdda3bbdb338c68942
parent a49907f984670781a718ef6aa0046709886eae5a
Add acc_gpu_kernel_with_scalars and port add to use it (#63884)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63884

See https://dev-discuss.pytorch.org/t/cuda-loops-case-study-code-generation-vs-templates/302
for explanation of what's going on here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30545296

Pulled By: ezyang

fbshipit-source-id: f0da52153ae63599fe1d57e90e73f50ca2116939
aten/src/ATen/native/cuda/BinaryAddSubKernel.cu
aten/src/ATen/native/cuda/Loops.cuh