[PATCH] D55163: AMDGPU: Add optimization patterns to combine fp32->fp16 conversions

Sun Dec 16 22:32:36 PST 2018

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2045
+
+  // no constant handling unlike SelectHi16Elt() due to lack of need
+
----------------
Capitalize and punctuate

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2080-2081
+bool AMDGPUDAGToDAGISel::SelectCvtRtzF16F32(SDValue In, SDValue &Src) const {
+  ConstantSDNode *CI = dyn_cast<ConstantSDNode>(In);
+  ConstantFPSDNode *CF = dyn_cast<ConstantFPSDNode>(In);
+  if (CI || CF) {
----------------
I think you should need only one or the other here, not both. Either way, you could move this out of the function (and I think there's a wrapper for this somewhere already)

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2091
+    bool LostInfo;
+    APFloat FVal = APFloat(APFloatBase::IEEEhalf(), APInt(16, Val));
+    FVal.convert(APFloatBase::IEEEsingle(), APFloatBase::rmNearestTiesToAway, &LostInfo);
----------------
You can use the constructor directly without the extra = APFloat

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2125
+
+bool AMDGPUDAGToDAGISel::SelectCvtRtzF16F32Impl(SDValue In, SDValue &Src, bool hi) const {
+  In = stripBitcast(In);
----------------
Capitalize hi

================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2154-2159
+  unsigned shiftOpcode = hi ? ISD::SHL : ISD::SRL;
+  int shiftOperand = hi ? 0 : 1;
+  uint32_t andMask = hi ? 0xffff0000u : 0xffffu;
+  int andOperand = hi ? 1 : 0;
+
+  if (In.getOpcode() == ISD::AND) {
----------------
computeKnownBits?

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:1597

+let SubtargetPredicate = isGCN in {
+
----------------
Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated 

================
Comment at: test/CodeGen/AMDGPU/cvt_pkrtz_f16_f32_combine.ll:23
+; ALL-LABEL: {{^}}packed_convert_high:
+; ALL: v_cvt_pkrtz_f16_f32{{(_e32|_e64)?}} v{{[0-9]*}}, v0, v1
+; ALL-NOT: v_cvt_pkrtz_f16_f32
----------------
Probably should explicitly test the different encodings for the different sub targets 

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55163/new/

https://reviews.llvm.org/D55163