[PATCH] D149348: RFD: Do not CSE convergent calls in different basic blocks

Thu Apr 27 07:51:07 PDT 2023

foad added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/cse-convergent.ll:37
+; GCN-NEXT:    s_or_saveexec_b32 s5, -1
+; GCN-NEXT:    v_mov_b32_dpp v2, v3 row_xmask:1 row_mask:0xf bank_mask:0xf
+; GCN-NEXT:    s_mov_b32 exec_lo, s5
----------------
This is the effect of the fix: we repeat the DPP subgroup operation over a reduced set of lanes, instead of reusing the result of the first DPP subgroup operation over all lanes.
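
To illustrate why reusing the first result would be wrong, here is a toy model, not AMDGPU's actual DPP semantics. The helper `xor_shuffle`, the four-lane width, and the rule that a read from an inactive lane yields 0 are all assumptions made for the sketch; real hardware behavior for disabled source lanes depends on the DPP control bits.

```python
def xor_shuffle(values, active, xor_mask=1):
    """Toy subgroup op: each active lane reads the value held by lane ^ xor_mask.

    Assumptions (hypothetical, for illustration only):
    - a read from an inactive source lane yields 0
    - inactive lanes produce no result (None)
    """
    out = []
    for lane, is_active in enumerate(active):
        if not is_active:
            out.append(None)
            continue
        src = lane ^ xor_mask
        out.append(values[src] if active[src] else 0)
    return out


values = [10, 11, 12, 13]
all_lanes = [True, True, True, True]
reduced = [True, False, True, True]  # lane 1 did not take the branch

# Executed in the entry block, where every lane is active.
first = xor_shuffle(values, all_lanes)

# Executed again inside the branch, where only a subset of lanes is active.
second = xor_shuffle(values, reduced)

# CSE would hand `first` to the code in the branch; lane 0 would then see
# lane 1's value even though lane 1 is not participating there.
print(first)
print(second)
```

In this model lane 0 gets a different answer from the two calls, because the result of a convergent operation depends on which lanes execute it together. That is the reason the two calls cannot be merged across basic blocks.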

================
Comment at: llvm/test/Transforms/SimplifyCFG/convergent.ll:85
 ; SINK-NEXT:    [[CMP_NOT:%.*]] = icmp eq i32 [[REM]], 0
+; SINK-NEXT:    [[IDXPROM4:%.*]] = zext i32 [[TMP0]] to i64
+; SINK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[Y_COERCE:%.*]], i64 [[IDXPROM4]]
----------------
This is a completely accidental hoisting improvement due to https://reviews.llvm.org/D129370#inline-1442432. Convergent calls in the "then" and "else" branches are no longer treated as identical, which, oddly enough, allows *more* hoisting than when they were considered identical.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149348/new/

https://reviews.llvm.org/D149348