[PATCH] D113679: [AMDGPU] Simplify 64-bit division/remainder expansion

Thu Nov 11 07:59:50 PST 2021

foad added reviewers: arsenm, rampitec, b-sumner.
foad added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/udiv64.ll:257-258
 ; GCN-NEXT:    v_addc_u32_e32 v9, vcc, v14, v10, vcc
-; GCN-NEXT:    v_add_i32_e64 v4, s[4:5], v4, v8
-; GCN-NEXT:    v_addc_u32_e64 v8, vcc, v5, v9, s[4:5]
-; GCN-NEXT:    v_mul_lo_u32 v10, v6, v8
----------------
This is probably the clearest place to see the effect of the patch. Here, in the old code, we save the carry-out from one add into s[4:5] in order to use it again 20-odd instructions later...

================
Comment at: llvm/test/CodeGen/AMDGPU/udiv64.ll:279-280
 ; GCN-NEXT:    v_addc_u32_e32 v7, vcc, v14, v8, vcc
-; GCN-NEXT:    v_add_i32_e32 v5, vcc, v5, v9
-; GCN-NEXT:    v_addc_u32_e64 v5, vcc, v5, v7, s[4:5]
 ; GCN-NEXT:    v_add_i32_e32 v4, vcc, v4, v6
----------------
... and here we recompute v5+v9 //without// the carry-in from the corresponding low-part addition v4+v8, only to add the missing carry back in the very next instruction!
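
To make the arithmetic behind the two comments above concrete, here is a minimal sketch that models the carry handling with plain 32-bit integers. It is only an illustration of the identity involved, not the actual expansion code in the patch; the helper names (`add32`, `old_sequence`, `simplified_sequence`) and the `extra` operand (standing in for v7) are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """Model a 32-bit add with carry-in; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def old_sequence(lo_a, hi_a, lo_b, hi_b, extra):
    # Low-part add: the carry-out is kept live (s[4:5] in the old code).
    lo, saved_carry = add32(lo_a, lo_b)
    # High-part add that consumes the carry immediately.
    hi, _ = add32(hi_a, hi_b, saved_carry)
    # Much later: recompute the high-part sum without the carry-in ...
    hi_nocarry, _ = add32(hi_a, hi_b)
    # ... and add the saved carry back in along with the next operand.
    late, _ = add32(hi_nocarry, extra, saved_carry)
    return lo, hi, late

def simplified_sequence(lo_a, hi_a, lo_b, hi_b, extra):
    # Assumed simplification: reuse the carried high sum directly,
    # so the carry does not need to stay live across the gap.
    lo, carry = add32(lo_a, lo_b)
    hi, _ = add32(hi_a, hi_b, carry)
    late, _ = add32(hi, extra)
    return lo, hi, late
```

Both functions return the same values modulo 2^32, since (hi_a + hi_b) + extra + carry and (hi_a + hi_b + carry) + extra are the same sum; the second form just avoids keeping the carry around.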

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113679/new/

https://reviews.llvm.org/D113679