[PATCH] D50575: [AMDGPU] Add support for a16 modifier for gfx9

Mon Aug 13 06:06:55 PDT 2018

rtaylor added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4618
+  if (VAddrVT.getScalarType() == MVT::f16 &&
+        Subtarget->getGeneration() >= GCNSubtarget::GFX9) {
+    IsA16 = true;
----------------
arsenm wrote:
> This should be checking the subtarget feature you added?
Yes, that makes sense.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4620
+    IsA16 = true;
+    VAddrs.clear();
+    // push back non-packed 32 bit addresses
----------------
arsenm wrote:
> Why is this clearing?
VAddrs is set above, the L to LZ mapping needs VAddrs to find _L to convert to _LZ, so this is building the packed version of VAddrs, so clearing it first.  Actually, a better reasoning is that the lod could be the high 16 or low 16 depending on the number of coordinates, so I could remove this clear, do the non-a16 packing as an 'else' to this 'if' and then do the _L to _LZ after. If the lod is high 16 it will just be ignored anyway since it's a don't care. If it's in low 16 (NumCoords+LodOrClampOrMip is odd), high 16 is guaranteed to be don't care since lod  is always last and then we can just pop it. I will implement this.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4628-4630
+        SDValue DimX = Op.getOperand(DimIdx + i);
+        SDValue DimY = DAG.getUNDEF(MVT::f16);
+        DimX = DAG.getBuildVector(MVT::v2f16, DL, {DimX, DimY});
----------------
arsenm wrote:
> This is a SCALAR_TO_VECTOR
Ok.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4631
+        DimX = DAG.getBuildVector(MVT::v2f16, DL, {DimX, DimY});
+        DimX = DAG.getBitcast(MVT::f32, DimX);
+        VAddrs.push_back(DimX);
----------------
arsenm wrote:
> Why this bitcast, and why f32? If we're coercing vector types i32 would be correct
Ok.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4648
+          VAddrs.push_back(DimZ);
+          i++;
+        }
----------------
arsenm wrote:
> I don't understand the overall flow of all of these sections. There 3 nearly identical looking parts for the different NumGradients, and I'm further confused by the manipulation of i inside the loop
Fair enough, this could probably be consolidated, there is some redundancy, I suppose I could make these functions and call them and pass them, though that seems excessive also. The way the address components are packed differs depending on the dimension and the ordering makes it a bit difficult to cleanly do this. I could also do short sections with long conditions.

1D condition packs an undef with dx/dh and an undef with dy/dh.
2D condition packs dx/dh and dy/dh together and dx/dt and dy/dt together.
3D condition does 2D condition but packs an undef with dz/dh and an undef with dz/dt.

The ordering all out would be dxdh, dydh, dzdh, dxdt, dydt, dzdt.

Repository:
  rL LLVM

https://reviews.llvm.org/D50575