[PATCH] D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection

Fri Dec 31 03:40:35 PST 2021

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:1028
+      (i32 (V_BCNT_U32_B32_e64 (i32 (EXTRACT_SUBREG i64:$src, sub0)), (i32 0)))), sub0,
+    (i32 (COPY_TO_REGCLASS (i32 (V_MOV_B32_e32 (i32 0))), VGPR_32)), sub1)
+>;
----------------
Do you really need COPY_TO_REGCLASS here?
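
For context, here is a minimal C++ sketch of what this pattern appears to compute. It assumes the elided outer instruction is a second V_BCNT_U32_B32 that counts the sub1 half and accumulates onto the sub0 count. It models V_BCNT_U32_B32 as D = popcount(S0) + S1. The names `popcount32`, `v_bcnt_u32_b32` and `ctpop64_model` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// Portable 32-bit population count (illustrative helper).
constexpr uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Model of V_BCNT_U32_B32: D = popcount(S0) + S1.
constexpr uint32_t v_bcnt_u32_b32(uint32_t s0, uint32_t s1) {
    return popcount32(s0) + s1;
}

// Assumed shape of the divergent i64 ctpop pattern: count sub0 starting
// from 0, feed that into the count of sub1, then build the i64 result
// with the count in sub0 and a zero in sub1.
constexpr uint64_t ctpop64_model(uint64_t src) {
    const uint32_t sub0 = static_cast<uint32_t>(src);
    const uint32_t sub1 = static_cast<uint32_t>(src >> 32);
    const uint32_t lo_result = v_bcnt_u32_b32(sub1, v_bcnt_u32_b32(sub0, 0));
    const uint32_t hi_result = 0;  // the V_MOV_B32 0 operand in the pattern
    return (static_cast<uint64_t>(hi_result) << 32) | lo_result;
}
```

The count of an i64 is at most 64, so the sub1 half of the result is always zero. That is why the pattern only needs a constant 0 there.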

================
Comment at: llvm/lib/Target/AMDGPU/SOPInstructions.td:1374
 def : GCNPat <
-  (i64 (ctpop i64:$src)),
+  (i64 (UniformUnaryFrag<ctpop> i64:$src)),
     (i64 (REG_SEQUENCE SReg_64,
----------------
Do we really need both this pattern and the one on line 252? Surely one of them is redundant?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D116284