[PATCH] D40343: AMDGPU: Do not combine loads/store across physreg defs

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 28 11:51:25 PST 2017


rampitec added a comment.

In https://reviews.llvm.org/D40343#937312, @nhaehnle wrote:

> In https://reviews.llvm.org/D40343#933121, @rampitec wrote:
>
> > As far as I understand that is only a concern if defined register is M0 since it is read by the ds_* instructions. I.e. it should be better to check to M0, not just any physreg.
> >  Also this should not be a concern on GFX9 since we have lds instructions which do not read M0 there (check Subtarget->ldsRequiresM0Init()).
>
>
> The LDS use here is a bit of a red herring; it's really not about that. The original case where I found the bug had no "proper" LDS instructions at all, only v_interp, and the merged memory instructions were buffer instructions. You could probably construct a similar case also with e.g. only relative indexing with different indices, or with relative indexing and s_sendmsg (which uses M0).
>
> The real problem is that the pass assumes machine-SSA form, and this assumption is broken with physreg-defs. Here's what happens:
>
>   %vreg0 = mem-instruction-1
>   M0 = def-1
>   use(%vreg0, M0)
>   M0 = def-2
>   ...
>   mem-instruction-2
>
>
> Without this fix, this gets changed to:
>
>   M0 = def-1
>   M0 = def-2
>   ...
>   %vreg0, ... = merged-mem-instruction
>   use(%vreg0, M0)
>
>
> So the %vreg0-use gets moved (because it depends on mem-instruction-1), without regard for the fact that the instruction reads a register that is later overwritten by a different value.
>
> This possible write-after-read hazard of registers is something that the pass simply hasn't tracked before, because for virtual registers in machine-SSA its unnecessary. It's only with physical registers that it becomes necessary; we don't have many of those at this stage in the flow in practice, but M0 is not a special case.


I see. But my point is we can easily get the situation:

  m0 = def-1
  ds_read
  m0 = def-2
  ds_read

That worth nothing that defs are euqal in our case. Unless we are going to prove defs are equal we shall be unable to combine those two ds_reads. Now with your patch we will be unable to combine them.
This is fine unless you think about GFX9 versions of ds_read which do not imp-use M0. We still shall be able to combine them, but we will not after this patch.

So what I mean not just bail out on physreg def, but check if the register is actually used by any instructions you are going to move before the required def.
Does it make sense to you?


https://reviews.llvm.org/D40343





More information about the llvm-commits mailing list