[PATCH] D116270: [AMDGPU] Enable divergence-driven XNOR selection

Mon Jan 10 01:13:40 PST 2022

foad added a comment.

> SITargetLowering::reassociateScalarOps exists to fix the instruction selection that is done in a wrong way.

No! It's not trying to fix anything; it just reassociates expressions to keep more of the intermediate results uniform, so that we use fewer VGPRs and fewer VALU instructions. For example:

  // v1 = (v0 + s0) + s1
  v_add v1, v0, s0
  v_add v1, v1, s1
   -->
  // v1 = (s0 + s1) + v0 ; reassociated
  s_add s2, s0, s1
  v_add v1, s2, v0
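
As a sanity check on the rewrite above, here is a minimal standalone C++ sketch. It is not LLVM code: the function names are made up for illustration, and it assumes the adds behave like wrapping 32-bit unsigned adds. It only shows that both associations compute the same value, so the choice between them is purely about which intermediate results stay uniform.

```cpp
#include <cstdint>

// Hypothetical model of the original sequence: v1 = (v0 + s0) + s1.
// Both adds consume a value derived from the divergent input v0,
// so both would have to be vector (VALU) adds.
constexpr std::uint32_t add_original(std::uint32_t v0, std::uint32_t s0,
                                     std::uint32_t s1) {
  return (v0 + s0) + s1;
}

// Hypothetical model of the reassociated sequence: v1 = (s0 + s1) + v0.
// The inner add only sees uniform inputs, so it could be a scalar add;
// only the outer add touches the divergent value.
constexpr std::uint32_t add_reassociated(std::uint32_t v0, std::uint32_t s0,
                                         std::uint32_t s1) {
  return (s0 + s1) + v0;
}

// Unsigned addition wraps modulo 2^32, so the two forms agree even when
// an intermediate sum overflows.
static_assert(add_original(0xFFFFFFF0u, 0x20u, 0x7u) ==
                  add_reassociated(0xFFFFFFF0u, 0x20u, 0x7u),
              "reassociation must preserve the result");
```
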

This is exactly the same kind of thing you need to do to restore the missed optimization in xnor.ll:

  // v1 = ~(s0 ^ v0)
  v_xor v1, s0, v0
  v_not v1, v1
   -->
  // v1 = ~s0 ^ v0
  s_not s1, s0
  v_xor v1, s1, v0
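
The xnor rewrite relies on the identity ~(a ^ b) == (~a) ^ b. A minimal standalone C++ sketch of that identity follows. Again, this is not LLVM code and the function names are hypothetical. It models s0 as the uniform operand and v0 as the divergent one.

```cpp
#include <cstdint>

// Hypothetical model of the current output: v1 = ~(s0 ^ v0).
// The xor has a divergent input, so the not is applied to a divergent
// value as well, which means two vector instructions.
constexpr std::uint32_t xnor_not_after_xor(std::uint32_t s0, std::uint32_t v0) {
  return ~(s0 ^ v0);
}

// Hypothetical model of the desired output: v1 = (~s0) ^ v0.
// The not only sees the uniform operand, so it could be a scalar
// instruction, leaving a single vector xor.
constexpr std::uint32_t xnor_not_on_uniform(std::uint32_t s0, std::uint32_t v0) {
  return (~s0) ^ v0;
}

// Inverting one xor operand flips every result bit, exactly like
// inverting the xor result itself.
static_assert(xnor_not_after_xor(0xF0F0F0F0u, 0x12345678u) ==
                  xnor_not_on_uniform(0xF0F0F0F0u, 0x12345678u),
              "moving the not onto the uniform operand must preserve the result");
```
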

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D116270