Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)

Summary:
When compiling with `TORCH_CUDA_ARCH_LIST=7.5`, we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). The cause was hardcoded values passed to launch_bounds in several kernels. The maximum number of threads per multiprocessor is 1024 on the Turing architecture (7.5) but 2048 on previous architectures. The hardcoded launch_bounds requested 2048 threads per multiprocessor even when compiling for Turing, which generated the warning.
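
To make the arithmetic behind the warning concrete, here is a minimal host-side sketch. The helper names and the (512, 4) pair are illustrative assumptions, not code from this commit; the only facts used are the per-multiprocessor limits quoted above (1024 on 7.5, 2048 on earlier architectures).

```cpp
#include <cstdint>

// Hypothetical helper: the number of resident threads per multiprocessor
// implied by a __launch_bounds__(max_threads_per_block, min_blocks_per_sm) pair.
constexpr std::uint32_t requested_threads_per_sm(std::uint32_t max_threads_per_block,
                                                 std::uint32_t min_blocks_per_sm) {
  return max_threads_per_block * min_blocks_per_sm;
}

// Hypothetical helper: does the request fit within the multiprocessor limit?
constexpr bool fits_on_sm(std::uint32_t requested, std::uint32_t max_threads_per_sm) {
  return requested <= max_threads_per_sm;
}

// Illustrative pair: 512 threads per block with at least 4 blocks per SM.
static_assert(requested_threads_per_sm(512, 4) == 2048, "512 * 4 resident threads");
static_assert(fits_on_sm(2048, 2048), "fits on pre-Turing architectures");
static_assert(!fits_on_sm(2048, 1024), "exceeds the Turing (7.5) limit");
```

A pair that fits the older 2048-thread limit can therefore overshoot the Turing limit by a factor of two, which is the situation the warning reports.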

This PR adds a macro that bounds-checks the values supplied to launch_bounds. The maximum number of threads per block across all architectures is 1024. If a user supplies more than 1024, I clamp it down to 512. Depending on this value, I set the minimum number of blocks per SM. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The incorrect gradient computation reported in that issue is probably due to a faulty card.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461
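
The clamping described in the summary can be sketched as follows. This is a rough illustration under stated assumptions, not the macro added to c10/macros/Macros.h: every name here is hypothetical, the logic is written as constexpr functions rather than preprocessor macros so it compiles on the host, and the rule for reducing the blocks-per-SM value (take the largest count that fits) is one plausible reading of "depending on this value".

```cpp
#include <cstdint>

// Values taken from the commit message; the names are hypothetical.
constexpr std::uint32_t kMaxThreadsPerBlock = 1024;      // per-block limit on all architectures
constexpr std::uint32_t kThreadsPerBlockFallback = 512;  // clamp target named in the summary

// Threads per multiprocessor: 1024 on Turing (arch 750), 2048 on earlier architectures.
constexpr std::uint32_t max_threads_per_sm(int cuda_arch) {
  return cuda_arch == 750 ? 1024u : 2048u;
}

// A request above the per-block limit falls back to 512; anything else passes through.
constexpr std::uint32_t clamp_threads_per_block(std::uint32_t requested) {
  return requested <= kMaxThreadsPerBlock ? requested : kThreadsPerBlockFallback;
}

// Keep the requested blocks-per-SM if threads * blocks fits on the multiprocessor;
// otherwise use the largest block count that does fit (assumed policy).
constexpr std::uint32_t clamp_min_blocks_per_sm(std::uint32_t threads_per_block,
                                                std::uint32_t blocks_per_sm,
                                                int cuda_arch) {
  return threads_per_block * blocks_per_sm <= max_threads_per_sm(cuda_arch)
             ? blocks_per_sm
             : max_threads_per_sm(cuda_arch) / threads_per_block;
}

static_assert(clamp_threads_per_block(2048) == 512, "oversized request is clamped");
static_assert(clamp_min_blocks_per_sm(512, 4, 700) == 4, "fits before Turing");
static_assert(clamp_min_blocks_per_sm(512, 4, 750) == 2, "reduced on Turing");
```

In a real kernel, the two clamped values would be passed to `__launch_bounds__`, so the same source compiles without the ptxas warning for every architecture in `TORCH_CUDA_ARCH_LIST`.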

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff

author	Syed Tousif Ahmed <syed.ahmed.emails@gmail.com>
	Fri, 11 Jan 2019 05:41:48 +0000 (21:41 -0800)
committer	Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
	Fri, 11 Jan 2019 05:44:39 +0000 (21:44 -0800)
commit	86af14b0c7a9f6fd96ad62a61ce2b6b105761fe1
tree	2ebcc3f76ba6fa7c6f52a5bb58b4be33b8849bde
parent	07ea3e035e41fc0bf92828c6cf758ee76866cc1e

aten/src/ATen/cuda/CUDAApplyUtils.cuh
aten/src/ATen/native/cuda/Dropout.cu
aten/src/ATen/native/cuda/GridSampler.cu
aten/src/ATen/native/cuda/Loops.cuh
aten/src/ATen/native/cuda/LossCTC.cu
aten/src/ATen/native/cuda/RNN.cu
aten/src/ATen/native/cuda/Reduce.cuh
aten/src/ATen/native/cuda/TensorTransformations.cu
aten/src/THC/THCReduce.cuh
aten/src/THC/THCReduceAll.cuh
aten/src/THC/THCSortUtils.cuh
aten/src/THCUNN/MultiLabelMarginCriterion.cu
aten/src/THCUNN/SpatialClassNLLCriterion.cu
aten/src/THCUNN/SpatialCrossMapLRN.cu
aten/src/THCUNN/SpatialDilatedMaxPooling.cu
aten/src/THCUNN/VolumetricConvolution.cu
aten/src/THCUNN/VolumetricUpSamplingTrilinear.cu
aten/src/THCUNN/im2col.h
c10/macros/Macros.h
torch/csrc/generic/StorageMethods.cpp