[llvm] [AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (PR #88512)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 16 07:54:08 PDT 2024
================
@@ -3075,20 +3075,28 @@ static bool isCttzOpc(unsigned Opc) {
 SDValue AMDGPUTargetLowering::lowerCTLZResults(SDValue Op,
                                                SelectionDAG &DAG) const {
   auto SL = SDLoc(Op);
+  auto Opc = Op.getOpcode();
   auto Arg = Op.getOperand(0u);
   auto ResultVT = Op.getValueType();
   if (ResultVT != MVT::i8 && ResultVT != MVT::i16)
     return {};
-  assert(isCtlzOpc(Op.getOpcode()));
+  assert(isCtlzOpc(Opc));
   assert(ResultVT == Arg.getValueType());
-  auto const LeadingZeroes = 32u - ResultVT.getFixedSizeInBits();
-  auto SubVal = DAG.getConstant(LeadingZeroes, SL, MVT::i32);
+  auto const NumBits = ResultVT.getFixedSizeInBits();
+  auto NumExtBits = DAG.getConstant(32u - NumBits, SL, MVT::i32);
   auto NewOp = DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i32, Arg);
----------------
jayfoad wrote:
For the zero-undef case it would be better to generate ANY_EXTEND here. (Later combines can probably convert the ZERO_EXTEND to ANY_EXTEND by using demanded bits info, but it would be better to generate the right node up-front.)
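
To illustrate why the extended bits should not matter in the zero-undef case, here is a minimal standalone sketch. It assumes the zero-undef path shifts the extended value left by 32 - NumBits before counting, as the PR title and the NumExtBits constant suggest. It models the nodes with plain integer arithmetic rather than the SelectionDAG API, and the helper names (ctlzNarrow, ctlzViaShift, highBitsAreIrrelevant) are hypothetical, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Reference count of leading zeros within the low NumBits of X.
// Returns NumBits when those bits are all zero.
unsigned ctlzNarrow(uint32_t X, unsigned NumBits) {
  unsigned N = 0;
  for (unsigned Bit = NumBits; Bit-- > 0;) {
    if ((X >> Bit) & 1u)
      break;
    ++N;
  }
  return N;
}

// Models the assumed shift-based lowering: move the narrow value to the
// top of a 32-bit register, then count leading zeros over all 32 bits.
// Ext stands for the extended operand; its bits above NumBits may hold
// anything, which is what ANY_EXTEND permits.
unsigned ctlzViaShift(uint32_t Ext, unsigned NumBits) {
  assert(NumBits >= 1 && NumBits <= 32);
  return ctlzNarrow(Ext << (32u - NumBits), 32);
}

// Checks every nonzero NumBits-wide value with the given pattern placed in
// the extended bits. Zero is skipped because the zero-undef form leaves the
// result for a zero input unspecified.
bool highBitsAreIrrelevant(unsigned NumBits, uint32_t Garbage) {
  assert(NumBits >= 1 && NumBits < 32);
  const uint32_t Mask = (1u << NumBits) - 1u;
  for (uint32_t V = 1; V <= Mask; ++V) {
    const uint32_t Ext = (Garbage & ~Mask) | V;
    if (ctlzViaShift(Ext, NumBits) != ctlzNarrow(V, NumBits))
      return false;
  }
  return true;
}
```

In this model, highBitsAreIrrelevant(8, 0xFFFFFFFFu) and highBitsAreIrrelevant(16, 0xDEADBEEFu) both hold, because the left shift discards whatever sits in the extended bits. That is the property that would make ANY_EXTEND sufficient for the zero-undef case.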
https://github.com/llvm/llvm-project/pull/88512