review.tizen.org Git - platform/upstream/llvm.git/commit

author	Tom Stellard <thomas.stellard@amd.com>
	Mon, 28 Mar 2016 16:10:13 +0000 (16:10 +0000)
committer	Tom Stellard <thomas.stellard@amd.com>
	Mon, 28 Mar 2016 16:10:13 +0000 (16:10 +0000)
commit	a76bcc2ea1474fe7df0c757d3a4ca0bfaeed8913
tree	286d0524226e8af2d487bdad1958e2600d55f567
parent	6db1dcbf6b42d1eedc070585c65e6fe7dab25e54

AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions

Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
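
To illustrate the difference between the two limits, here is a minimal
sketch. It is not the code from this patch: the function names and the
constant name are hypothetical, and the byte widths assume one dword is
4 bytes, so an SMRDx8 load (8 dwords) produces 32 bytes.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not the LLVM API.

// Old heuristic, as the commit title describes it: allow a cluster of up to
// 4 loads, no matter how wide each load is.
bool clusterByCount(unsigned numLoads) {
  return numLoads <= 4;
}

// New heuristic, as the commit title describes it: cap the total number of
// bytes loaded by the cluster at 16.
const unsigned kClusterByteLimit = 16; // assumed name for the 16-byte limit

bool clusterByBytes(unsigned numLoads, unsigned bytesPerLoad) {
  assert(bytesPerLoad > 0 && "a load must produce at least one byte");
  return numLoads * bytesPerLoad <= kClusterByteLimit;
}
```

Under these assumptions, four 4-byte loads pass both checks, which matches
the guess that 16 bytes was the original intent of the 4-instruction limit.
Four 32-byte loads pass the count check but already fail the byte check at
two loads, which is the register-pressure case described above.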

This yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/test/CodeGen/AMDGPU/ctpop.ll
llvm/test/CodeGen/AMDGPU/madak.ll
llvm/test/CodeGen/AMDGPU/schedule-kernel-arg-loads.ll