[PATCH] D40343: AMDGPU: Do not combine loads/store across physreg defs

Tue Nov 28 00:53:16 PST 2017

nhaehnle added a comment.

In https://reviews.llvm.org/D40343#933121, @rampitec wrote:

> As far as I understand that is only a concern if defined register is M0 since it is read by the ds_* instructions. I.e. it should be better to check to M0, not just any physreg.
>  Also this should not be a concern on GFX9 since we have lds instructions which do not read M0 there (check Subtarget->ldsRequiresM0Init()).

The LDS use here is a bit of a red herring; it's really not about that. The original case where I found the bug had no "proper" LDS instructions at all, only v_interp, and the merged memory instructions were buffer instructions. You could probably construct a similar case also with e.g. only relative indexing with different indices, or with relative indexing and s_sendmsg (which uses M0).

The real problem is that the pass assumes machine-SSA form, and this assumption is broken with physreg-defs. Here's what happens:

  %vreg0 = mem-instruction-1
  M0 = def-1
  use(%vreg0, M0)
  M0 = def-2
  ...
  mem-instruction-2

Without this fix, this gets changed to:

  M0 = def-1
  M0 = def-2
  ...
  %vreg0, ... = merged-mem-instruction
  use(%vreg0, M0)

So the %vreg0-use gets moved (because it depends on mem-instruction-1), without regard for the fact that the instruction reads a register that is later overwritten by a different value.

This possible write-after-read hazard of registers is something that the pass simply hasn't tracked before, because for virtual registers in machine-SSA its unnecessary. It's only with physical registers that it becomes necessary; we don't have many of those at this stage in the flow in practice, but M0 is not a special case.

https://reviews.llvm.org/D40343