[PATCH] D28782: [AMDGPU] Do not allow register coalescer to create big superregs

Mon Jan 16 19:28:51 PST 2017

rampitec added a comment.

This is a very conservative limitation to fix bloat in clFFT, where it saves ~600 bytes of scratch per kernel by creating vreg_96 from vreg_64. I have no doubt this place will be revisited many more times to improve the heuristics as more code is analyzed. This is really just the start.

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:1484-1486
+  unsigned SrcSize = SrcRC->getSize();
+  unsigned DstSize = DstRC->getSize();
+  unsigned NewSize = NewRC->getSize();
----------------
arsenm wrote:
> This isn't being used for the spill size, so this is supposed to use getRegBitWidth
What do you mean?

```
/// getSize - Return the size of the register in bytes, which is also the size
/// of a stack slot allocated to hold a spilled copy of this register.
```
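
To make the units in this exchange concrete: the quoted comment says getSize is in bytes, while getRegBitWidth (as its name suggests) is in bits. Below is a minimal sketch using a hypothetical `RegClassSketch` struct rather than LLVM's own classes, and assuming the bit width is simply eight times the byte size.

```cpp
// Hypothetical stand-in for a register class; this is NOT the LLVM
// TargetRegisterClass API, only an illustration of the two units.
struct RegClassSketch {
  unsigned SizeInBytes; // the quantity the quoted getSize comment describes
};

// Size in bytes, as in the getSize comment quoted above.
constexpr unsigned sizeInBytes(RegClassSketch RC) { return RC.SizeInBytes; }

// Assumption: the bit width is the byte size times eight.
constexpr unsigned widthInBits(RegClassSketch RC) { return RC.SizeInBytes * 8; }

// Sizes implied by the register class names: 32, 64 and 96 bits.
constexpr RegClassSketch VReg32{4};
constexpr RegClassSketch VReg64{8};
constexpr RegClassSketch VReg96{12};

// A "dword or smaller" test is "<= 4" in bytes but "<= 32" in bits.
static_assert(sizeInBytes(VReg32) <= 4 && widthInBits(VReg32) <= 32,
              "a dword passes either form of the check");
static_assert(sizeInBytes(VReg64) > 4 && widthInBits(VReg64) > 32,
              "a 64-bit register fails either form of the check");
```

Under that assumption either call gives the same decisions, provided the thresholds are written in the matching unit.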

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:1491-1493
+  // Always allow dword and sub-dword coalescing.
+  if (SrcSize <= 4 || DstSize <= 4)
+    return true;
----------------
arsenm wrote:
> We don't have sub-dword registers, so the < and comment are misleading
This is for packed f16; we do not want to revisit this.
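
For readers following along, here is a standalone sketch of what a size-based check of this kind could look like. The first test mirrors the quoted lines. The final comparison is an assumption, since the rest of the function is not shown in this excerpt, and `shouldCoalesceSketch` is a hypothetical name. All sizes are in bytes.

```cpp
// Standalone sketch of a size-based coalescing check; sizes are in bytes.
// Not the actual patch: the second rule below is an assumed continuation.
constexpr bool shouldCoalesceSketch(unsigned SrcSize, unsigned DstSize,
                                    unsigned NewSize) {
  // Always allow coalescing when either side is a dword (4 bytes) or smaller,
  // as in the quoted lines.
  return (SrcSize <= 4 || DstSize <= 4) ||
         // Assumed rule: otherwise allow it only if the resulting register
         // class is no bigger than at least one of the two inputs.
         (NewSize <= DstSize || NewSize <= SrcSize);
}
```

Under these assumptions, `shouldCoalesceSketch(8, 8, 12)` is false (two 64-bit registers would be merged into a 96-bit one), while `shouldCoalesceSketch(4, 8, 12)` is true because one side is a dword.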

Repository:
  rL LLVM

https://reviews.llvm.org/D28782