[PATCH] D18451: AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions

Thu Mar 24 09:34:05 PDT 2016

tstellarAMD created this revision.
tstellarAMD added reviewers: nhaehnle, arsenm.
tstellarAMD added a subscriber: llvm-commits.
Herald added a subscriber: arsenm.

This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen because that seems to have been the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
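
As a rough illustration of the difference between a byte budget and an
instruction-count limit (a sketch under assumptions, not the code from
this patch): the helper name fitsClusterBudget and its parameters are
hypothetical, and the sketch assumes an SMRDx8 load writes 8 dwords
(32 bytes) and that all loads in a cluster have the same size.

```cpp
// Hypothetical helper for illustration; not the code in SIInstrInfo.cpp.
// numLoads:     loads the cluster would contain, including the candidate.
// bytesPerLoad: bytes written by each load (assumed equal for all loads).
constexpr bool fitsClusterBudget(unsigned numLoads, unsigned bytesPerLoad,
                                 unsigned maxClusterBytes = 16) {
  return numLoads * bytesPerLoad <= maxClusterBytes;
}

// Four single-dword loads (4 bytes each) total 16 bytes and still cluster.
static_assert(fitsClusterBudget(4, 4), "4 x 4 bytes fits");
// A fifth dword load would exceed the budget.
static_assert(!fitsClusterBudget(5, 4), "5 x 4 bytes does not fit");
// Two SMRDx8 loads (assumed 32 bytes each) are rejected, whereas a pure
// instruction-count limit of 4 would have accepted up to four (128 bytes).
static_assert(!fitsClusterBudget(2, 32), "2 x 32 bytes does not fit");
```

With these assumptions, the byte budget behaves like the old limit for
dword loads but stops wide loads from being grouped at all.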

This yields small decreases in register usage with shader-db, and
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

http://reviews.llvm.org/D18451

Files:
  lib/Target/AMDGPU/SIInstrInfo.cpp
  test/CodeGen/AMDGPU/ctpop.ll
  test/CodeGen/AMDGPU/madak.ll
  test/CodeGen/AMDGPU/schedule-kernel-arg-loads.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D18451.51565.patch
Type: text/x-patch
Size: 4899 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160324/99ff6102/attachment.bin>