[PATCH] D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics

Tue Nov 7 11:36:39 PST 2017

cfang marked 11 inline comments as done.
cfang added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPU.td:289

+def FeaturePackedD16VMem : SubtargetFeature<"packed-d16-vmem",
+  "HasPackedD16VMem",
----------------
arsenm wrote:
> I think we should invert this, to HasUnpackedD16Mem. It's only the one weird target, the packed layout is the expected one and for every other subtarget.
OK. Use HasUnpackedD16VMem.

================
Comment at: lib/Target/AMDGPU/BUFInstructions.td:675
+
+let SubtargetPredicate = NotHasPackedD16VMem in { // NP means data not packed.
+defm BUFFER_LOAD_FORMAT_D16_X_NP : MUBUF_Pseudo_Loads <
----------------
arsenm wrote:
> Comment belongs somewhere else. We usually have the true opcode name as all caps, and modifiers like this as lowercase. In this case we should maybe call it _gfx81 instead.
Done. Use _gfx80 because only gfx80 has the feature "HasUnpackedD16VMem".

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:549
+
+static EVT getExtVT (EVT VT) {
+  if (VT == MVT::v2f16)
----------------
cfang wrote:
> arsenm wrote:
> > Since you really want to get the i16 equivalent vector, you can use changeTypeToInteger on the f16 type.
> For v2f16, I want something that uses 2 registers, so v2i16 doesn't work for me. 
> What should be the type that specifies the two registers that hold the two half values? Here I am using v2f32.
Right. So Remove this function definition.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:589
+    Info.ptrVal = nullptr;
+    Info.align = 0;
+
----------------
arsenm wrote:
> I think you can assume the ABI type alignment for these (at least for the scalar type) not that it probably matters
Not sure how to get the ABI type alignment here since we are using nullptr as ptrval.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3268

+static SDValue ChangeResultType(SDValue Op, EVT EquivResultT, SelectionDAG &DAG) {
+  // Change from v4f16/v2f16 to EquivResultT.
----------------
arsenm wrote:
> s/ChangeResultType/lowerIntrinsicWChain/
Changed func name to lowerIntrinsicWChain. 

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3274
+  unsigned IID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
+  if (IID == Intrinsic::amdgcn_tbuffer_load) {
+    SDValue Ops[] = {
----------------
arsenm wrote:
> switch over IID
OK switch.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3345
+    if (SDValue Res = ChangeResultType(Op, ToVT, DAG)) {
+      if (!HasPacked) { // From v2f32/v4f32 back to v2f16/v4f16.
+        SDValue RFlag = DAG.getTargetConstant(0, SL, MVT::i32);
----------------
cfang wrote:
> arsenm wrote:
> > I'm pretty sure there should be no f32 or FP_ROUND here. This is returning the half format result just expanded into multiple 32-bit registers. They are still half, you need to truncate the int type and bitcast to f16.
> Do you mean the returned type is not v2f32? so what is the type that returned? v2f16? I just could not understand how to represent the two half values in two registers!  
> 
> I hope you can be more specific on what to do here.
For v2f16, changed the intrinsic load type to v2i32 if the target has unpacked vmem instructions. After the load, 
we truncate the data to v2i16, and then bitcast it back to v2f16. v4f16 case is done similarly.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4548-4549
+        } else {// We need to unpack the packed data to store.
+          EVT ExtendedT = getExtVT(MemVT);
+          VData = DAG.getNode(ISD::FP_EXTEND, DL, ExtendedT, VData);
+        }
----------------
arsenm wrote:
> Same thing as with stores
For store of v2f16, we first bitcast the data to v2i16, and then zero_extend it to v2i32 to store. v4f16 case is done similarly. 

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.format.d16.ll:5
+
+;UNPACKED-LABEL: {{^}}buffer_load_format_d16_x:
+;UNPACKED: buffer_load_format_d16_x v0, off, s[0:3], 0
----------------
arsenm wrote:
> Spaces after ;
Thanks.

https://reviews.llvm.org/D38906