[PATCH] D123835: AMDGPU/SDAG: Refine the fold to v_mad_[iu]64_[iu]32

Tue Apr 19 09:34:06 PDT 2022

nhaehnle added a comment.

In D123835#3454046 <https://reviews.llvm.org/D123835#3454046>, @arsenm wrote:

> I would have expected this to put this back together after the generic multiply expansion though.

So the original code is `(add (mul x, y), z)`. The generic expansion turns that into `(add M, z)`, where the generic multiply expansion expands `M` into something along the lines of:

  lo,hi = umul_lohi (trunc x), (trunc y)
  a = mul (trunc x), (trunc (shr y, 32))
  b = mul (trunc (shr x, 32)), (trunc y)
  M = or (shl (zext (add3 hi, a, b)), 32), (zext lo)

The `umul_lohi` becomes `v_mad_u64_u32`, but it's used only as a multiplication, as you can see in the pre-change lit tests. Reliably reassociating the code sequence above into a form where the pre-change performAddCombine would trigger seems pretty complicated.
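
To make the arithmetic concrete, here is a rough C++ model of the expansion quoted above. The helper names (`umul_lohi`, `mad_u64_u32`, `expanded_mul`, `folded_mad`) are hypothetical and exist only for this illustration; they are not LLVM APIs. `mad_u64_u32` assumes the usual `S0.u32 * S1.u32 + S2.u64` semantics of the instruction. `folded_mad` only shows why the addend `z` can in principle be absorbed into the low multiply; it is not a claim about the exact sequence the patch emits.

```cpp
#include <cstdint>

// Hypothetical model of umul_lohi: full 32x32 -> 64 product, split in halves.
struct LoHi {
  uint32_t lo;
  uint32_t hi;
};

static LoHi umul_lohi(uint32_t a, uint32_t b) {
  uint64_t p = static_cast<uint64_t>(a) * b;
  return {static_cast<uint32_t>(p), static_cast<uint32_t>(p >> 32)};
}

// Hypothetical model of v_mad_u64_u32, assuming D = S0.u32 * S1.u32 + S2.u64.
static uint64_t mad_u64_u32(uint32_t s0, uint32_t s1, uint64_t s2) {
  return static_cast<uint64_t>(s0) * s1 + s2;
}

// The generic 64-bit multiply expansion, built from 32-bit pieces only.
static uint64_t expanded_mul(uint64_t x, uint64_t y) {
  uint32_t xl = static_cast<uint32_t>(x);
  uint32_t xh = static_cast<uint32_t>(x >> 32);
  uint32_t yl = static_cast<uint32_t>(y);
  uint32_t yh = static_cast<uint32_t>(y >> 32);

  LoHi p = umul_lohi(xl, yl);
  uint32_t a = xl * yh;          // cross term, wraps mod 2^32
  uint32_t b = xh * yl;          // cross term, wraps mod 2^32
  uint32_t hi = p.hi + a + b;    // add3, wraps mod 2^32
  return (static_cast<uint64_t>(hi) << 32) | p.lo;
}

// (add M, z) as produced by the generic expansion: the multiply uses a
// zero addend and the addition of z happens separately.
static uint64_t expanded_mad(uint64_t x, uint64_t y, uint64_t z) {
  return expanded_mul(x, y) + z;
}

// Sketch of absorbing z into the low multiply instead (an assumption for
// illustration, not necessarily what the patch generates).
static uint64_t folded_mad(uint64_t x, uint64_t y, uint64_t z) {
  uint32_t xl = static_cast<uint32_t>(x);
  uint32_t xh = static_cast<uint32_t>(x >> 32);
  uint32_t yl = static_cast<uint32_t>(y);
  uint32_t yh = static_cast<uint32_t>(y >> 32);

  uint64_t acc = mad_u64_u32(xl, yl, z);  // low product plus z in one op
  uint32_t cross = xl * yh + xh * yl;     // only affects the high dword
  return acc + (static_cast<uint64_t>(cross) << 32);
}
```

Both `expanded_mad` and `folded_mad` agree with a plain wrapping `x * y + z`; the difference is that the second form never needs to recognize `M` as a multiply after the expansion has already happened.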

> Also it would be nice to get this one ported to GlobalISel

Agreed, but it looks like GlobalISel doesn't generate mad_64_32 at all at the moment. So it's more than just porting this particular tweak.

================
Comment at: llvm/test/CodeGen/AMDGPU/mad_64_32.ll:535-539
+; CI-NEXT:    v_mad_i64_i32 v[0:1], s[4:5], v0, v1, 0
+; CI-NEXT:    v_add_i32_e32 v2, vcc, v0, v2
+; CI-NEXT:    v_addc_u32_e32 v3, vcc, v1, v3, vcc
+; CI-NEXT:    v_add_i32_e32 v0, vcc, v0, v4
+; CI-NEXT:    v_addc_u32_e32 v1, vcc, v1, v5, vcc
----------------
rampitec wrote:
> arsenm wrote:
> > This is a regression? It looks to be the same cycle count for more code size
> Actually since gfx90a v_mad_u64/i64 is full rate, so it is even more cycles in that case.
Good to know, I'm going to rework that part.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D123835