[PATCH] D50575: [AMDGPU] Add support for a16 modifier for gfx9

Fri Aug 10 12:15:14 PDT 2018

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPU.td:245-246

+def FeatureR128A16 : SubtargetFeature<"a16",
+  "HasR128A16",
+  "true",
----------------
Feature name doesn't match, r128-a16?

================
Comment at: lib/Target/AMDGPU/AMDGPU.td:248
+  "true",
+  "Support 16 bit address types on gfx9"
+>;
----------------
Should be more specific about what kind of address

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4613-4614
   }

+  // check for 16 bit addresses and pack if true
+  unsigned DimIdx = AddrIdx + BaseOpcode->NumExtraArgs;
----------------
Capitalize and punctuate

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4618
+  if (VAddrVT.getScalarType() == MVT::f16 &&
+        Subtarget->getGeneration() >= GCNSubtarget::GFX9) {
+    IsA16 = true;
----------------
This should be checking the subtarget feature you added?

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4620
+    IsA16 = true;
+    VAddrs.clear();
+    // push back non-packed 32 bit addresses
----------------
Why is this clearing?

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4621
+    VAddrs.clear();
+    // push back non-packed 32 bit addresses
+    for (unsigned i = 0; i < BaseOpcode->NumExtraArgs; ++i)
----------------
Capitalize etc.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4624
+      VAddrs.push_back(Op.getOperand(AddrIdx + i));
+    // pack derivatives
+    unsigned DimIdx = AddrIdx + BaseOpcode->NumExtraArgs;
----------------
Capitalize etc.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4628-4630
+        SDValue DimX = Op.getOperand(DimIdx + i);
+        SDValue DimY = DAG.getUNDEF(MVT::f16);
+        DimX = DAG.getBuildVector(MVT::v2f16, DL, {DimX, DimY});
----------------
This is a SCALAR_TO_VECTOR

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4631
+        DimX = DAG.getBuildVector(MVT::v2f16, DL, {DimX, DimY});
+        DimX = DAG.getBitcast(MVT::f32, DimX);
+        VAddrs.push_back(DimX);
----------------
Why this bitcast, and why f32? If we're coercing vector types i32 would be correct

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4648
+          VAddrs.push_back(DimZ);
+          i++;
+        }
----------------
I don't understand the overall flow of all of these sections. There 3 nearly identical looking parts for the different NumGradients, and I'm further confused by the manipulation of i inside the loop

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4660-4662
+      AddrLo = DAG.getBuildVector(MVT::v2f16, DL, {AddrLo, AddrHi});
+      AddrLo = DAG.getBitcast(MVT::f32, AddrLo);
+      VAddrs.push_back(AddrLo);
----------------
Ditto

Repository:
  rL LLVM

https://reviews.llvm.org/D50575