[PATCH] D38906: AMDGPU/SI: Implement d16 support for buffer intrinsics
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 6 02:14:40 PST 2017
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPU.td:289
+def FeaturePackedD16VMem : SubtargetFeature<"packed-d16-vmem",
+ "HasPackedD16VMem",
----------------
I think we should invert this, to HasUnpackedD16Mem. It's only the one weird target, the packed layout is the expected one and for every other subtarget.
================
Comment at: lib/Target/AMDGPU/BUFInstructions.td:675
+
+let SubtargetPredicate = NotHasPackedD16VMem in { // NP means data not packed.
+defm BUFFER_LOAD_FORMAT_D16_X_NP : MUBUF_Pseudo_Loads <
----------------
Comment belongs somewhere else. We usually have the true opcode name as all caps, and modifiers like this as lowercase. In this case we should maybe call it _gfx81 instead.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:549
+
+static EVT getExtVT (EVT VT) {
+ if (VT == MVT::v2f16)
----------------
Since you really want to get the i16 equivalent vector, you can use changeTypeToInteger on the f16 type.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:589
+ Info.ptrVal = nullptr;
+ Info.align = 0;
+
----------------
I think you can assume the ABI type alignment for these (at least for the scalar type) not that it probably matters
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3268
+static SDValue ChangeResultType(SDValue Op, EVT EquivResultT, SelectionDAG &DAG) {
+ // Change from v4f16/v2f16 to EquivResultT.
----------------
s/ChangeResultType/lowerIntrinsicWChain/
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3274
+ unsigned IID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
+ if (IID == Intrinsic::amdgcn_tbuffer_load) {
+ SDValue Ops[] = {
----------------
switch over IID
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3345
+ if (SDValue Res = ChangeResultType(Op, ToVT, DAG)) {
+ if (!HasPacked) { // From v2f32/v4f32 back to v2f16/v4f16.
+ SDValue RFlag = DAG.getTargetConstant(0, SL, MVT::i32);
----------------
I'm pretty sure there should be no f32 or FP_ROUND here. This is returning the half format result just expanded into multiple 32-bit registers. They are still half, you need to truncate the int type and bitcast to f16.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4548-4549
+ } else {// We need to unpack the packed data to store.
+ EVT ExtendedT = getExtVT(MemVT);
+ VData = DAG.getNode(ISD::FP_EXTEND, DL, ExtendedT, VData);
+ }
----------------
Same thing as with stores
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.format.d16.ll:1
+;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=UNPACKED %s
+;RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefix=PACKED %s
----------------
These should have a common GCN check prefix to avoid the duplicated -LABEL lines
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.format.d16.ll:5
+
+;UNPACKED-LABEL: {{^}}buffer_load_format_d16_x:
+;UNPACKED: buffer_load_format_d16_x v0, off, s[0:3], 0
----------------
Spaces after ;
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.tbuffer.load.d16.ll:34
+;UNPACKED: tbuffer_load_format_d16_xyzw v[0:3], off, s[0:3], dfmt:6, nfmt:1, 0
+;UNPACKED: s_waitcnt
+
----------------
The waitcnts aren't interesting to check. it would be more useful to extract and store the components to see what instructions are required.
https://reviews.llvm.org/D38906
More information about the llvm-commits
mailing list