[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
author Alexander Timofeev <Alexander.Timofeev@amd.com>
Thu, 3 Nov 2016 14:37:13 +0000 (14:37 +0000)
committer Alexander Timofeev <Alexander.Timofeev@amd.com>
Thu, 3 Nov 2016 14:37:13 +0000 (14:37 +0000)
commit f867a40bf60ad813560fe4cc3d2cc100472ffef4
tree e888ef6d503dc980fc536452f72a71ab5182b7af
parent 73aba6229f7f6cdc1aa5b107518684a95da4851e

This change exploits the fact that LDS reads may be reordered even if they
access the same location.

Prior to this change, the algorithm stopped as soon as it encountered any
memory access between the loads that were expected to be merged, even
though a read-after-read conflict cannot affect execution correctness.
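
The difference in the search can be illustrated with a minimal standalone
sketch. It is not the code from SILoadStoreOptimizer.cpp: the Inst model,
findMergeCandidate, the allowInterveningReads flag and the maxOffset limit
are all hypothetical names chosen for illustration, and register
dependencies between instructions are ignored entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
enum class Kind { LdsRead, LdsWrite, Other };

struct Inst {
    Kind kind;
    unsigned addr; // illustrative LDS address; unused for Kind::Other
};

// Returns the index of a later LDS read that could be paired with the read
// at `first`, or -1 if the search is blocked or finds nothing.
// `maxOffset` is an assumed limit on the address distance of a merged pair.
int findMergeCandidate(const std::vector<Inst>& block, std::size_t first,
                       bool allowInterveningReads, unsigned maxOffset = 255) {
    const Inst& base = block[first];
    for (std::size_t i = first + 1; i < block.size(); ++i) {
        const Inst& cur = block[i];
        if (cur.kind == Kind::LdsWrite)
            return -1; // a write may alias the loads: stay conservative
        if (cur.kind == Kind::LdsRead) {
            unsigned dist = cur.addr > base.addr ? cur.addr - base.addr
                                                 : base.addr - cur.addr;
            if (dist <= maxOffset)
                return static_cast<int>(i); // close enough to pair
            if (!allowInterveningReads)
                return -1; // behaviour described as "prior to this change"
            // Read-after-read cannot change results: keep scanning.
        }
        // Kind::Other: a non-memory instruction, skipped in this sketch.
    }
    return -1;
}
```

With the flag off, an unrelated read between two mergeable loads ends the
search; with it on, the search skips that read and still finds the partner.
A store in between blocks the merge in both modes.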

Improves the performance of manually loop-unrolled hcBLAS CGEMM kernels by
44%. An improvement is also expected for any long sequence of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

llvm-svn: 285919
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
llvm/test/CodeGen/AMDGPU/ds_read2.ll