[PATCH] D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics

Fri Jul 27 07:14:10 PDT 2018

nhaehnle added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:7799
+  unsigned LWEIdx = DmaskIdx + 6;
+  unsigned UsesTFC = (Node->getConstantOperandVal(TFEIdx) ||
+                      Node->getConstantOperandVal(LWEIdx)) ? 1 : 0;
----------------
This should be a bool.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2780
+  // Verify MIMG
+  if (isMIMG(MI.getOpcode()) && !get(MI.getOpcode()).mayStore()) {
+    // Ensure that the return type used is large enough for all the options
----------------
Can also use `!MI.mayStore()`

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2793-2798
+      if ((LWE && LWE->getImm()) || (TFE && TFE->getImm()))
+        RegCount += IsD16 ? 2 : 1;
+
+      // Adjust for D16 variants
+      bool Packed = !ST.hasUnpackedD16VMem();
+      if (IsD16 && Packed) RegCount = (RegCount + 1) >> 1;
----------------
It's a minor thing, but I think it would be easier to follow to first divide the RegCount for D16 based on `(D16 && D16->getImm() && !ST.hasUnpackedD16VMem())`, and then increment that for LWE || TFE afterwards.

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll:15-19
+; GCN-LABEL: {{^}}load_1d_tfe:
+; GCN: v_mov_b32_e32 v0, 0
+; GCN: image_load v[0:7], v5, s[0:7] dmask:0xf unorm tfe{{$}}
+; NOPRT: v_mov_b32_e32 v4, 0
+; NOPRT: image_load v[0:7], v0, s[0:7] dmask:0xf unorm tfe{{$}}
----------------
This doesn't test everything we'd want to test here, which is that in the NOPRT case, you don't have initializations of v0-v3.

I'd suggest adding STRICTPRT and NONSTRICTPRT check prefixes, and adding GCN to the new run line as well, so you don't need to duplicate the image_load line, and you can use NONSTRICTPRT-NOT lines.

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll:21
+; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_1:
+; GCN: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 tfe{{$}}
+define amdgpu_ps <2 x float> @sample_1d_tfe_adjust_writemask_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
----------------
How about checks for initialization here?

Repository:
  rL LLVM

https://reviews.llvm.org/D48826