[PATCH] D75910: [AMDGPU] Improve scheduling model for VOP3b instructions

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 11 11:56:19 PDT 2020


rampitec added a comment.

Mostly LGTM, except for the name of the resource.



================
Comment at: llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll:1336
+; GFX8-NEXT:    v_readfirstlane_b32 s0, v1
+; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v4, v3
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s1
----------------
foad wrote:
> rampitec wrote:
> > foad wrote:
> > > rampitec wrote:
> > > > What about AMDGPUMacroFusion which tries to do exactly the opposite?
> > > Why do you say "the opposite"? Macro fusion tries to put the v_add next to the v_addc (but apparently it fails in this case). My patch should not stop this from working.
> > Aren't you adding a latency between vcc def and its use?
> No, I'm just replacing WriteSALU with WriteVCC, which has the same latency. But macro fusion overrides this anyway and forces the latency to 0 for any dependencies between the instructions that it fuses.
OK, makes sense.
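
For illustration only, the point above can be sketched as a toy model in Python. This is not LLVM code: `DepEdge`, `make_edge`, `fuse`, and the latency value of 1 are hypothetical names and numbers chosen for the example. The only facts taken from the thread are that WriteSALU and WriteVCC have the same latency and that macro fusion forces the latency to 0 on dependencies between the instructions it fuses.

```python
from dataclasses import dataclass

# Placeholder latencies (assumption): the thread only says the two
# resources have the SAME latency, not what the value is.
LATENCY = {"WriteSALU": 1, "WriteVCC": 1}


@dataclass(frozen=True)
class DepEdge:
    """Hypothetical dependency edge between a vcc def and its use."""
    producer: str
    consumer: str
    latency: int


def make_edge(producer: str, consumer: str, write_resource: str) -> DepEdge:
    # The edge latency comes from the resource the producer writes with.
    return DepEdge(producer, consumer, LATENCY[write_resource])


def fuse(edge: DepEdge) -> DepEdge:
    # Toy stand-in for macro fusion: the dependency between the fused
    # pair gets latency 0, whatever the scheduling model said.
    return DepEdge(edge.producer, edge.consumer, 0)


before = make_edge("v_add_u32", "v_addc_u32", "WriteSALU")
after = make_edge("v_add_u32", "v_addc_u32", "WriteVCC")
fused = fuse(after)

print(before.latency == after.latency)  # swapping the resource adds no latency
print(fused.latency)                    # fusion overrides it regardless
```

In this model, swapping the write resource leaves the edge latency unchanged, and fusing the pair zeroes it either way, which is why the rename should not interfere with fusion.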


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75910/new/

https://reviews.llvm.org/D75910




