[PATCH] D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics

Mon Nov 6 02:14:40 PST 2017

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPU.td:289

+def FeaturePackedD16VMem : SubtargetFeature<"packed-d16-vmem",
+  "HasPackedD16VMem",
----------------
I think we should invert this, to HasUnpackedD16Mem. It's only the one weird target, the packed layout is the expected one and for every other subtarget.

================
Comment at: lib/Target/AMDGPU/BUFInstructions.td:675
+
+let SubtargetPredicate = NotHasPackedD16VMem in { // NP means data not packed.
+defm BUFFER_LOAD_FORMAT_D16_X_NP : MUBUF_Pseudo_Loads <
----------------
Comment belongs somewhere else. We usually have the true opcode name as all caps, and modifiers like this as lowercase. In this case we should maybe call it _gfx81 instead.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:549
+
+static EVT getExtVT (EVT VT) {
+  if (VT == MVT::v2f16)
----------------
Since you really want to get the i16 equivalent vector, you can use changeTypeToInteger on the f16 type.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:589
+    Info.ptrVal = nullptr;
+    Info.align = 0;
+
----------------
I think you can assume the ABI type alignment for these (at least for the scalar type) not that it probably matters

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3268

+static SDValue ChangeResultType(SDValue Op, EVT EquivResultT, SelectionDAG &DAG) {
+  // Change from v4f16/v2f16 to EquivResultT.
----------------
s/ChangeResultType/lowerIntrinsicWChain/

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3274
+  unsigned IID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
+  if (IID == Intrinsic::amdgcn_tbuffer_load) {
+    SDValue Ops[] = {
----------------
switch over IID

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3345
+    if (SDValue Res = ChangeResultType(Op, ToVT, DAG)) {
+      if (!HasPacked) { // From v2f32/v4f32 back to v2f16/v4f16.
+        SDValue RFlag = DAG.getTargetConstant(0, SL, MVT::i32);
----------------
I'm pretty sure there should be no f32 or FP_ROUND here. This is returning the half format result just expanded into multiple 32-bit registers. They are still half, you need to truncate the int type and bitcast to f16.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4548-4549
+        } else {// We need to unpack the packed data to store.
+          EVT ExtendedT = getExtVT(MemVT);
+          VData = DAG.getNode(ISD::FP_EXTEND, DL, ExtendedT, VData);
+        }
----------------
Same thing as with stores

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.format.d16.ll:1
+;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=UNPACKED %s
+;RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefix=PACKED %s
----------------
These should have a common GCN check prefix to avoid the duplicated -LABEL lines

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.format.d16.ll:5
+
+;UNPACKED-LABEL: {{^}}buffer_load_format_d16_x:
+;UNPACKED: buffer_load_format_d16_x v0, off, s[0:3], 0
----------------
Spaces after ;

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.tbuffer.load.d16.ll:34
+;UNPACKED: tbuffer_load_format_d16_xyzw v[0:3], off, s[0:3],  dfmt:6,  nfmt:1, 0
+;UNPACKED: s_waitcnt
+
----------------
The waitcnts aren't interesting to check. it would be more useful to extract and store the components to see what instructions are required.

https://reviews.llvm.org/D38906