[PATCH] D113679: [AMDGPU] Simplify 64-bit division/remainder expansion
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 11 07:59:50 PST 2021
foad added reviewers: arsenm, rampitec, b-sumner.
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/udiv64.ll:257-258
; GCN-NEXT: v_addc_u32_e32 v9, vcc, v14, v10, vcc
-; GCN-NEXT: v_add_i32_e64 v4, s[4:5], v4, v8
-; GCN-NEXT: v_addc_u32_e64 v8, vcc, v5, v9, s[4:5]
-; GCN-NEXT: v_mul_lo_u32 v10, v6, v8
----------------
This is probably the clearest place to see the effect of the patch. Here, in the old code, we save the carry-out from one add into s[4:5] in order to use it again 20-odd instructions later...
================
Comment at: llvm/test/CodeGen/AMDGPU/udiv64.ll:279-280
; GCN-NEXT: v_addc_u32_e32 v7, vcc, v14, v8, vcc
-; GCN-NEXT: v_add_i32_e32 v5, vcc, v5, v9
-; GCN-NEXT: v_addc_u32_e64 v5, vcc, v5, v7, s[4:5]
; GCN-NEXT: v_add_i32_e32 v4, vcc, v4, v6
----------------
... and here we recompute v5+v9 //without// the carry-in from the corresponding low-part addition v4+v8, only to add the missing carry back in with the very next instruction!
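
For readers less familiar with carry chains, the redundancy described in the two comments above can be modeled roughly as follows. This is only an illustrative sketch in Python, not code from the patch: the helper names (`add32`, `add64_direct`, `add64_deferred_carry`) are hypothetical, and the second function only loosely mirrors the old instruction sequence (the real code folds the saved carry into a later add-with-carry rather than a separate add of zero).

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """Model a 32-bit add with carry-in; return (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64_direct(lo_a, hi_a, lo_b, hi_b):
    """64-bit add as a plain add / add-with-carry pair."""
    lo, carry = add32(lo_a, lo_b)
    hi, _ = add32(hi_a, hi_b, carry)
    return lo, hi


def add64_deferred_carry(lo_a, hi_a, lo_b, hi_b):
    """Same sum, but the low-part carry is saved and applied later.

    This loosely models the old pattern: the carry-out is parked
    (in s[4:5] in the test output), the high parts are added without
    it, and the saved carry is added back in afterwards.
    """
    lo, saved_carry = add32(lo_a, lo_b)
    hi_no_carry, _ = add32(hi_a, hi_b)
    hi, _ = add32(hi_no_carry, 0, saved_carry)
    return lo, hi
```

Both functions return the same (low, high) pair for any inputs, which is why keeping the carry live across many instructions buys nothing over consuming it immediately.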
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D113679/new/
https://reviews.llvm.org/D113679