[PATCH] D31993: [AMDGPU] Combine DS operations with offsets bigger than byte

Thu Apr 13 15:54:57 PDT 2017

rampitec added inline comments.

================
Comment at: llvm/trunk/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:387-389
+    *BuildMI(*MBB, CI.Paired, DL, TII->get(AMDGPU::V_ADD_I32_e32), BaseReg)
+                        .addImm(CI.BaseOff)
+                        .addReg(AddrReg->getReg());
----------------
arsenm wrote:
> rampitec wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > This should use the e64 version with an unused carry. We should add a helper to TII to emit this since it will change with GFX9
> > > Matt, I doubt we should use e64 version here. It does not accept immediate, which effectively would require one more SGPR and one more mov. A vcc thrashing seems to be less issue.
> > Even new no-carry variant is VOP3, so same issue.
> You can materialize the constant in a register. It will be folded and shrunk later
It is not folded though:

```
        s_movk_i32 s0, 0x960
        v_add_i32_e32 v0, vcc, s0, v4
        ds_read2_b32 v[0:1], v0 offset1:100
```
Also note that the literals in this case will be mostly unique, because each one is added to the base pointer, not to a previously incremented pointer. In the latter case there would be a high probability that many adds would share the same literal.

I would lean towards using the e64 version and an SGPR once the pass is redesigned to first collect all pairs, build a chain of adds, and only then combine. In that case we would need only a single SGPR.
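
To illustrate the point about literals, here is a minimal standalone sketch, not code from the pass. The helper names `literalsFromBase` and `literalsFromChain` are hypothetical, and the chained variant assumes the base offsets are sorted in ascending order. It only counts how many distinct constants each scheme would need to materialize:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: every add is relative to the original base pointer,
// so each distinct base offset becomes its own literal.
std::set<uint32_t> literalsFromBase(const std::vector<uint32_t> &BaseOffsets) {
  return std::set<uint32_t>(BaseOffsets.begin(), BaseOffsets.end());
}

// Hypothetical helper: every add increments the previously computed pointer,
// so the literal is the delta between consecutive base offsets.
std::set<uint32_t> literalsFromChain(const std::vector<uint32_t> &BaseOffsets) {
  std::set<uint32_t> Literals;
  uint32_t Prev = 0;
  for (uint32_t Off : BaseOffsets) {
    // Assumption of this sketch: offsets are sorted ascending.
    assert(Off >= Prev && "offsets must be sorted ascending");
    Literals.insert(Off - Prev);
    Prev = Off;
  }
  return Literals;
}
```

For evenly strided accesses, for example the made-up offsets 0x960, 0x12C0 and 0x1C20, the first helper yields three distinct literals while the second yields only 0x960, which is the case where a single SGPR would suffice.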

Repository:
  rL LLVM

https://reviews.llvm.org/D31993