review.tizen.org Git - platform/upstream/llvm.git/commit


author	Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
	Thu, 2 Mar 2023 00:29:03 +0000 (16:29 -0800)
committer	Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
	Fri, 3 Mar 2023 21:18:25 +0000 (13:18 -0800)
commit	b89236a96f2f2f3e9b88d198585a8eda7fb2c443
tree	bca48672c2c1615c41079c389f6fbfbc8e5f1bed
parent	7442f8635b4d7363b07152e5304c9a0c660eead4

[AMDGPU] Vectorize misaligned global loads & stores

Based on experiments on gfx906, gfx908, gfx90a and gfx1030, wider global loads/stores perform better than multiple narrower ones regardless of alignment. This is especially true when combining 8-bit loads/stores, where the speedup was usually 2x across all alignments.
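To make the alignment point concrete, here is a toy Python model (not the LoadStoreVectorizer or SIISelLowering code) that counts how many memory instructions a run of contiguous 8-bit accesses turns into. It greedily merges bytes into power-of-two-wide accesses of at most `max_bytes`. One policy requires each merged access to start at a multiple of its width; the other ignores alignment, which is the behavior this commit argues for. The names `merge_byte_run` and `Access`, the greedy strategy, and the 4-byte (dword) limit are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One hypothetical merged memory access covering [offset, offset + width)."""
    offset: int
    width: int


def floor_pow2(n: int) -> int:
    """Largest power of two <= n, for n >= 1."""
    return 1 << (n.bit_length() - 1)


def merge_byte_run(begin: int, end: int, max_bytes: int,
                   require_alignment: bool) -> list[Access]:
    """Greedily merge contiguous 1-byte accesses in [begin, end) into wider ones.

    Each merged access is a power of two no larger than max_bytes (>= 1).
    If require_alignment is True, an access must also start at a multiple of
    its own width; otherwise alignment is ignored.
    """
    out = []
    off = begin
    while off < end:
        width = floor_pow2(min(end - off, max_bytes))
        if require_alignment:
            # Shrink until the access is naturally aligned (width 1 always is).
            while off % width != 0:
                width //= 2
        out.append(Access(off, width))
        off += width
    return out


# Eight contiguous i8 accesses starting at byte offset 1, i.e. misaligned.
aligned = merge_byte_run(1, 9, max_bytes=4, require_alignment=True)
unaligned = merge_byte_run(1, 9, max_bytes=4, require_alignment=False)
print(len(aligned), [(a.offset, a.width) for a in aligned])
print(len(unaligned), [(a.offset, a.width) for a in unaligned])
```

In this model, the alignment-respecting policy splits the run into four accesses of 1, 2, 4 and 1 bytes. Ignoring alignment yields two 4-byte accesses. The measured 2x speedup quoted above comes from the commit's own hardware experiments, not from this model.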

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1

Domain: System / Toolchain;


llvm/lib/Target/AMDGPU/AMDGPU.h
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll	[new file with mode: 0644]
llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
llvm/test/CodeGen/AMDGPU/load-global-i16.ll
llvm/test/CodeGen/AMDGPU/udiv.ll
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll