[PATCH] D25975: AMDGPU/SI: Make f16 a legal type for VI subtargets

Tue Nov 8 14:20:43 PST 2016

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:2041-2044
+  if (Op.getOperand(0).getValueType() == MVT::i64)
+    return Op.getOpcode() == ISD::SINT_TO_FP ?
+        AMDGPUTargetLowering::LowerSINT_TO_FP(Op, DAG) :
+        AMDGPUTargetLowering::LowerUINT_TO_FP(Op, DAG);
----------------
Braces

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3768
     // not support denormals ever.
-    if (VT == MVT::f32 &&
-        !Subtarget->hasFP32Denormals()) {
+    if (VT == MVT::f32 && !Subtarget->hasFP32Denormals()) {
       SDValue LHS = N->getOperand(0);
----------------
Should this also handle f16? Could be separate optimization patch

================
Comment at: lib/Target/AMDGPU/SIISelLowering.h:51
+  /// floating point type \p VT, by either extending or truncating it.
+  SDValue GetFPExtOrFPTrunc(SelectionDAG &DAG,
+                            SDValue Op,
----------------
should start with lower case

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:422-423
+def : Pat <
+  (f16 (fpround f64:$src)),
+  (V_CVT_F16_F32_e32 (V_CVT_F32_F64_e32 $src))
+>;
----------------
Is this (and fpextend) correct? A correct lowering for these was just added I thought

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.td:254
 // See comments in SIInstructions.td for more info.
 def SReg_32_XM0 : RegisterClass<"AMDGPU", [i32, f32], 32,
   (add SGPR_32, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI,
----------------
Missing i16/f16 but it probably doesn't matter

================
Comment at: lib/Target/AMDGPU/SISchedule.td:29
 // Vector ALU instructions
+def Write16Bit         : SchedWrite;
 def Write32Bit         : SchedWrite;
----------------
I don't think we need this since they run at the same rat as 32-bit

================
Comment at: lib/Target/AMDGPU/VOP2Instructions.td:326
+
+// FIXME: V_LDEXP_F16 requires a change to llvm.amdgcn.ldexp intrinsic.
+defm V_LDEXP_F16 : VOP2Inst <"v_ldexp_f16", VOP_F16_F16_I16/*, AMDGPUldexp*/>;
----------------
Why is this necessary? It can already mangle the FP type. This should work if you change it to VOP_F16_F16_I32, the int type doesn't matter

https://reviews.llvm.org/D25975