[AMDGPU] Vectorize misaligned global loads & stores
author Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
Thu, 2 Mar 2023 00:29:03 +0000 (16:29 -0800)
committer Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
Fri, 3 Mar 2023 21:18:25 +0000 (13:18 -0800)
commit b89236a96f2f2f3e9b88d198585a8eda7fb2c443
tree bca48672c2c1615c41079c389f6fbfbc8e5f1bed
parent 7442f8635b4d7363b07152e5304c9a0c660eead4

Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, wider global loads and stores outperform multiple narrower ones regardless of alignment. The effect is strongest when combining 8-bit loads and stores, where the speedup was usually 2x across all alignments.
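
For readers unfamiliar with the transformation, the following host-side C++ sketch illustrates what "combining 8-bit loads" means: four separate byte loads are replaced by one 32-bit load that makes no alignment assumption. It is not code from this patch. The function names (load_narrow, load_wide, agree_at_all_offsets) are hypothetical, and the sketch says nothing about GPU performance; it only shows that the two forms read the same value at every byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Narrow form (hypothetical helper): four independent 8-bit loads,
// reassembled in memory order.
std::uint32_t load_narrow(const unsigned char *p) {
    unsigned char bytes[4];
    for (std::size_t i = 0; i < 4; ++i)
        bytes[i] = p[i];                 // one 8-bit load per element
    std::uint32_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}

// Wide form (hypothetical helper): one 32-bit load. std::memcpy makes
// no alignment assumption, so p may point at any byte offset.
std::uint32_t load_wide(const unsigned char *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Checks that both forms agree at every offset where 4 bytes fit,
// including offsets that are not 4-byte aligned.
bool agree_at_all_offsets(const unsigned char *buf, std::size_t len) {
    for (std::size_t off = 0; off + 4 <= len; ++off)
        if (load_narrow(buf + off) != load_wide(buf + off))
            return false;
    return true;
}
```

Calling agree_at_all_offsets on an 8-byte buffer compares the two forms at offsets 0 through 4, of which only 0 and 4 are 4-byte aligned relative to the buffer start.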

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1
llvm/lib/Target/AMDGPU/AMDGPU.h
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll [new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
llvm/test/CodeGen/AMDGPU/load-global-i16.ll
llvm/test/CodeGen/AMDGPU/udiv.ll
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll