[PATCH] D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 15 10:17:56 PDT 2017
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2036-2042
+ if (isCtlzOpc(Op.getOpcode())) {
+ ISDOpc = ISD::CTLZ_ZERO_UNDEF;
+ AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
+ } else if (isCttzOpc(Op.getOpcode())){
+ ISDOpc = ISD::CTTZ_ZERO_UNDEF;
+ AMDGPUISDOpc = AMDGPUISD::FFBL_U32;
+ } else
----------------
Indentation wrong
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2043
+ } else
+ assert("Invalid OPCode!!!");
+
----------------
llvm_unreachable
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2061-2064
+ if (isCtlzOpc(Op.getOpcode()))
+ Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);
+ else
+ Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, One, ISD::SETEQ);
----------------
Select between Zero and One as input to getSetCC
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:3008-3009
+ unsigned Opc) const {
+ unsigned AMDGPUISDOpc = isCttzOpc(Opc) ? AMDGPUISD::FFBL_U32 :
+ AMDGPUISD::FFBH_U32;
+//unsigned AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
----------------
You could just pass in the new opcode directly rather than selecting it again
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:3010
+ AMDGPUISD::FFBH_U32;
+//unsigned AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
+
----------------
Commented out code
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:3035
// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.
-SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,
+SDValue AMDGPUTargetLowering::performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond,
SDValue LHS, SDValue RHS,
----------------
You didn't add tests for this part
================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.h:374
FFBH_I32,
+ FFBL_U32, // cttz with -1 if input is zero.
MUL_U24,
----------------
Should be name FFBL_B32 to match the instruction
================
Comment at: test/CodeGen/AMDGPU/cttz_zero_undef.ll:131
+}
+
----------------
Also should have some tests with i8/i16
https://reviews.llvm.org/D37348
More information about the llvm-commits
mailing list