[PATCH] D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection

Thu Jan 6 05:49:39 PST 2022

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

LGTM.

================
Comment at: llvm/lib/Target/AMDGPU/SOPInstructions.td:1374
 def : GCNPat <
-  (i64 (ctpop i64:$src)),
+  (i64 (UniformUnaryFrag<ctpop> i64:$src)),
     (i64 (REG_SEQUENCE SReg_64,
----------------
alex-t wrote:
> foad wrote:
> > Do we really need both this pattern and the one on line 252? Surely one of them is redundant?
> These two are not exactly identical.
> The first one, at line 252, accepts i64 and returns i32.
> The second one accepts i64 and returns i64.
> Without the latter, no implicit zero-extension occurs.
Actually, I see now: the i64 to i32 pattern is used for GlobalISel only, and the i64 to i64 pattern is used for SelectionDAG only.
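
For readers following the thread, here is a rough C++ sketch of the difference between the two result types discussed above. It is only an illustration, not the actual selection code: the function names are hypothetical, and it assumes the REG_SEQUENCE in the i64 pattern pairs the 32-bit count (low half) with a zero high half, which is how the explicit zero-extension would come about.

```cpp
#include <cstdint>

// Hypothetical stand-in for the i64 -> i32 shape: count the set bits of a
// 64-bit value and return the count as a 32-bit result.
inline uint32_t ctpop64_to_i32(uint64_t src) {
    uint32_t count = 0;
    while (src != 0) {
        src &= src - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical stand-in for the i64 -> i64 shape: the same 32-bit count is
// placed in the low half and the high half is an explicit zero (assumed to
// mirror what the REG_SEQUENCE builds), i.e. a zero-extension to 64 bits.
inline uint64_t ctpop64_to_i64(uint64_t src) {
    const uint32_t lo = ctpop64_to_i32(src);
    const uint32_t hi = 0;
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

Since the count of a 64-bit value never exceeds 64, the high half of the i64 result is always zero; the two shapes differ only in who materializes that zero.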

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116284/new/

https://reviews.llvm.org/D116284