[PATCH] D31993: [AMDGPU] Combine DS operations with offsets bigger than byte
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 13 15:54:57 PDT 2017
rampitec added inline comments.
================
Comment at: llvm/trunk/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:387-389
+ *BuildMI(*MBB, CI.Paired, DL, TII->get(AMDGPU::V_ADD_I32_e32), BaseReg)
+ .addImm(CI.BaseOff)
+ .addReg(AddrReg->getReg());
----------------
arsenm wrote:
> rampitec wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > This should use the e64 version with an unused carry. We should add a helper to TII to emit this, since it will change with GFX9.
> > > Matt, I doubt we should use the e64 version here. It does not accept an immediate, which would effectively require one more SGPR and one more mov. Thrashing vcc seems to be the lesser issue.
> > Even the new no-carry variant is VOP3, so it has the same issue.
> You can materialize the constant in a register. It will be folded and shrunk later.
It is not folded though:
```
s_movk_i32 s0, 0x960
v_add_i32_e32 v0, vcc, s0, v4
ds_read2_b32 v[0:1], v0 offset1:100
```
Also note that the values in this case will be mostly unique, because they are added to the base pointer, not to a previously incremented pointer. In the latter case there would be a high probability that many adds would have the same literal.
I would lean towards using the e64 version and an SGPR once the pass is redesigned to first collect all pairs, build a chain of adds, and only then combine. In that case we will use a single SGPR.
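To illustrate the point about literal uniqueness, here is a minimal standalone sketch. It is not code from the patch: the helper names `distinctLiteralsFromBase` and `distinctLiteralsChained` are hypothetical, and it assumes the combined accesses are described only by their base offsets, given in ascending order. It counts how many distinct add literals each scheme would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: number of distinct add literals when every combined
// access adds its own offset to the original base pointer.
std::size_t distinctLiteralsFromBase(const std::vector<uint32_t> &BaseOffsets) {
  std::set<uint32_t> Literals(BaseOffsets.begin(), BaseOffsets.end());
  return Literals.size();
}

// Hypothetical helper: number of distinct add literals when each add
// increments the previously computed pointer (a chain of adds).
// Assumes BaseOffsets is sorted in ascending order.
std::size_t distinctLiteralsChained(const std::vector<uint32_t> &BaseOffsets) {
  std::set<uint32_t> Literals;
  uint32_t Prev = 0;
  for (uint32_t Off : BaseOffsets) {
    assert(Off >= Prev && "offsets are expected in ascending order");
    Literals.insert(Off - Prev); // literal is the delta from the previous pointer
    Prev = Off;
  }
  return Literals.size();
}
```

For evenly strided offsets such as 0x960, 0x12c0, 0x1c20, the base-relative scheme needs three different literals, while the chained scheme reuses the single delta 0x960, which is the case where one SGPR would suffice.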
Repository:
rL LLVM
https://reviews.llvm.org/D31993