[PATCH] D37348: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Wed Oct 11 11:05:09 PDT 2017

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2031
+
+SDValue AMDGPUTargetLowering:: LowerCTLZ_CTTZ(SDValue Op, SelectionDAG &DAG) const {
   SDLoc SL(Op);
----------------
Extra space after ::

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2040
+    ISDOpc = ISD::CTLZ_ZERO_UNDEF;
+    AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
+  } else if (isCttzOpc(Op.getOpcode())){
----------------
Don't include AMDGPUISD in the variable name

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2041
+    AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
+  } else if (isCttzOpc(Op.getOpcode())){
+    ISDOpc = ISD::CTTZ_ZERO_UNDEF;
----------------
Missing space before {

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2073
+    Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprLo, Bits32);
+    //// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x)
+    NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprHi);
----------------
Double // and missing closing )

================
Comment at: lib/Target/AMDGPU/AMDGPUISelLowering.cpp:2077
+    Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprHi, Bits32);
+    //// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
+    NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprLo);
----------------
Double //
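
For reference, the two identities in these code comments can be checked with a small standalone sketch in plain C++. This is not the DAG lowering from the patch. The helper names cttz32, ctlz32, cttz64 and ctlz64 are hypothetical, and the 32-bit helpers are assumed to return 32 for a zero input, which the ZERO_UNDEF nodes do not guarantee.

```cpp
#include <cstdint>

// Hypothetical reference helper: count trailing zeros of a 32-bit value.
// Assumption: returns 32 for x == 0 (the *_ZERO_UNDEF nodes leave this undefined).
static uint32_t cttz32(uint32_t x) {
  if (x == 0)
    return 32;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical reference helper: count leading zeros of a 32-bit value.
// Assumption: returns 32 for x == 0.
static uint32_t ctlz32(uint32_t x) {
  if (x == 0)
    return 32;
  uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
uint32_t ctlz64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  return hi == 0 ? ctlz32(lo) + 32 : ctlz32(hi);
}

// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
uint32_t cttz64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  return lo == 0 ? cttz32(hi) + 32 : cttz32(lo);
}
```

In both cases the select picks the half that holds the first set bit in the scan direction, and the +32 accounts for the half that was skipped because it was all zeros.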

================
Comment at: test/CodeGen/AMDGPU/cttz_zero_undef.ll:1
-; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=SI-NOSDWA -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=SI-SDWA  -check-prefix=FUNC %s
----------------
Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken.

================
Comment at: test/CodeGen/AMDGPU/cttz_zero_undef.ll:171-172
+; FUNC-LABEL: {{^}}v_cttz_zero_undef_i64_with_select:
+; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
+; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
+; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
----------------
This isn't checking the outputs or the select

================
Comment at: test/CodeGen/AMDGPU/cttz_zero_undef.ll:184
+; FUNC-LABEL: {{^}}v_cttz_i32_sel_eq_neg1:
+; SI: v_ffbl_b32_e32 [[RESULT:v[0-9]+]], [[VAL]]
+; SI: s_endpgm
----------------
Using undefined VAL

================
Comment at: test/CodeGen/AMDGPU/cttz_zero_undef.ll:198
+; FUNC-LABEL: {{^}}v_cttz_i32_sel_ne_neg1:
+; SI: v_ffbl_b32_e32 [[RESULT:v[0-9]+]], [[VAL]]
+; SI: s_endpgm
----------------
Undefined VAL

Repository:
  rL LLVM

https://reviews.llvm.org/D37348