[llvm] [AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (PR #88512)

Tue Apr 16 07:54:08 PDT 2024

================
@@ -3075,20 +3075,28 @@ static bool isCttzOpc(unsigned Opc) {
 SDValue AMDGPUTargetLowering::lowerCTLZResults(SDValue Op,
                                                SelectionDAG &DAG) const {
   auto SL = SDLoc(Op);
+  auto Opc = Op.getOpcode();
   auto Arg = Op.getOperand(0u);
   auto ResultVT = Op.getValueType();
 
   if (ResultVT != MVT::i8 && ResultVT != MVT::i16)
     return {};
 
-  assert(isCtlzOpc(Op.getOpcode()));
+  assert(isCtlzOpc(Opc));
   assert(ResultVT == Arg.getValueType());
 
-  auto const LeadingZeroes = 32u - ResultVT.getFixedSizeInBits();
-  auto SubVal = DAG.getConstant(LeadingZeroes, SL, MVT::i32);
+  auto const NumBits = ResultVT.getFixedSizeInBits();
+  auto NumExtBits = DAG.getConstant(32u - NumBits, SL, MVT::i32);
   auto NewOp = DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i32, Arg);
----------------
jayfoad wrote:

For the zero-undef case it would be better to generate ANY_EXTEND here. (Later combines can probably convert the ZERO_EXTEND to ANY_EXTEND by using demanded bits info, but it would be better to generate the right node up-front.)
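
To illustrate why the extension kind should not matter on the zero-undef path, here is a minimal standalone sketch in plain C++ (not SelectionDAG code). It assumes, going by the PR title, that the zero-undef lowering shifts the extended value left by `32 - NumBits` before the 32-bit count. The helper names `ctlz32`, `ctlzViaZextSub` and `ctlzZeroUndefViaShl` and the `HighBits` parameter are hypothetical; `HighBits` stands in for whatever an ANY_EXTEND might leave in the upper bits.

```cpp
#include <cstdint>

// Reference 32-bit count-leading-zeros; returns 32 for X == 0.
static unsigned ctlz32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Model of the plain ctlz path: zero-extend, count on 32 bits, then
// subtract the number of extension bits. The zero-extend is required
// here because the high bits feed directly into the count.
static unsigned ctlzViaZextSub(uint32_t Arg, unsigned NumBits) {
  uint32_t Ext = Arg & ((1u << NumBits) - 1u); // ZERO_EXTEND
  return ctlz32(Ext) - (32u - NumBits);
}

// Model of the assumed zero-undef path: extend with arbitrary high bits
// (standing in for ANY_EXTEND), shift left by the number of extension
// bits, then count on 32 bits. The shift discards the high bits, so
// their contents cannot affect the result.
static unsigned ctlzZeroUndefViaShl(uint32_t Arg, unsigned NumBits,
                                    uint32_t HighBits) {
  uint32_t Low = Arg & ((1u << NumBits) - 1u);
  uint32_t Ext = (HighBits << NumBits) | Low; // ANY_EXTEND with junk
  return ctlz32(Ext << (32u - NumBits));
}
```

For every non-zero i8 or i16 input, `ctlzZeroUndefViaShl` gives the same answer whether `HighBits` is zero or all ones, and that answer matches `ctlzViaZextSub`. That is the demanded-bits argument in miniature: only the low `NumBits` bits of the extended value are ever observed.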

https://github.com/llvm/llvm-project/pull/88512