[PATCH] D25975: AMDGPU/SI: Make f16 a legal type for VI subtargets

Wed Oct 26 09:52:43 PDT 2016

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:82-83

   // TODO: Subtarget feature for i16
-  if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+  if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
     addRegisterClass(MVT::i16, &AMDGPU::SReg_32RegClass);
----------------
The TODO was taken care of already

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:85-86
     addRegisterClass(MVT::i16, &AMDGPU::SReg_32RegClass);
+    addRegisterClass(MVT::f16, &AMDGPU::SReg_32RegClass);
+    addRegisterClass(MVT::f16, &AMDGPU::VGPR_32RegClass);
+  }
----------------
tstellarAMD wrote:
> There is a 1-to-1 mapping between types and register classes, so you can drop the first addRegisterClass(MVT::f16, ... ) call
I think this should only be added to SReg_32. We already have other random inconsistencies from having f32 in a VGPR class and i32 in SGPR which I've been trying to fix

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3602

-  if (VT != MVT::f32 && VT != MVT::f64)
+  if (VT != MVT::f16 && VT != MVT::f32 && VT != MVT::f64)
     return SDValue();
----------------
Does this need an f16 is legal check? If not this can probably just be an !isVector check, any of the other FP types are unusable 

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:1415-1416

-  if (Opc == AMDGPU::V_MAD_F32 || Opc == AMDGPU::V_MAC_F32_e64) {
+  if (Opc == AMDGPU::V_MAD_F16 || Opc == AMDGPU::V_MAC_F16_e64 ||
+      Opc == AMDGPU::V_MAD_F32 || Opc == AMDGPU::V_MAC_F32_e64) {
+    bool IsF16 = Opc == AMDGPU::V_MAD_F16 || Opc == AMDGPU::V_MAC_F16_e64;
----------------
I would prefer checking the more common f32 cases first

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:1505

-      if (Opc == AMDGPU::V_MAC_F32_e64) {
+      if (Opc == AMDGPU::V_MAC_F16_e64 || Opc == AMDGPU::V_MAC_F32_e64) {
         UseMI.untieRegOperand(
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:443-445
+  (f16 (fmad (VOP3NoMods0 f32:$src0, i32:$src0_modifiers, i1:$clamp, i32:$omod),
+             (VOP3NoMods f32:$src1, i32:$src1_modifiers),
+             (VOP3NoMods f32:$src2, i32:$src2_modifiers))),
----------------
The f32 patterns on the sources look wrong. Not sure how this compiles 

https://reviews.llvm.org/D25975