[PATCH] D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics

Sat Oct 27 07:58:57 PDT 2018

nhaehnle added a comment.

Thank you for making these changes. I have some detail comments inline and some high-level remarks:

- We need tests for {float, i32}, {<2 x float>, i32}, and {half, i32} return types
- We need tests with dmask = 0 from the very beginning in the intrinsic (there's a case in lowerImage which looks like it may be broken), and a test with dmask != 0 in the intrinsic, but none of the data returns are being used (only the i32 return is used).
- We need tests where the dmask is materially smaller than the return type, e.g. return type {<4 x float>, i32} and popcount(dmask) < 4, and return type {<2 x half>, i32} and popcount(dmask) <= 2.

I also have a high-level concern about the design of lowerImage, because the adjustRetValue feels a bit like spooky action at a distance, plus concern about the last point above about dmask. Have you considered the following possibility:

- Keep ReturnTypes[0] as-is until the code that determines NumVDataDwords.
- In the load case, adjust NumVDataDwords first based on popcount(dmask) and then based on whether LWE/TFE is enabled.
- Synthesize a new ReturnTypes[0] from scratch based on NumVDataDwords
- Then generically extract the data payloads vector from the NewNode, doing cast and adjust-for-unpackedD16 in a common path for both with and without TFE/LWE.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:7
+// License. See LICENSE.TXT for details.
+//
+/// Any MIMG instructions that use tfe or lwe require an initialization of the
----------------
The more common practice seems to be having a separator line here between the license info and the file description.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:20-21
+#include "SIInstrInfo.h"
+#include "Utils/AMDGPULaneDominator.h"
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
----------------
I believe you don't need these two includes.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:86-87
+        // variants.
+        if (!TFE && !LWE)
+          continue;
+
----------------
I'd prefer this to be an assertion. It's easy enough to change if we ever do get MIMG instructions without TFE/LWE. In the meantime, assertions allow us to be more explicit about the space of possibilities we actually need to think about.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:108
+          // Abandon attempt if no dmask operand is found.
+          if (!MO_Dmask)
+            continue;
----------------
Same argument here: I'd prefer this to be an assertion.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:117
+          // Subreg indices are counted from 1
+          // When D16 then we want next whole VGPR after write data.
+          bool Packed = !ST.hasUnpackedD16VMem();
----------------
It seems prudent to add a static assertion that `AMDGPU::sub0 == 1 && AMDGPU::sub4 == 5` here.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4722
+  uint64_t Value = TexFailCtrlConst->getZExtValue();
+  SDLoc DL(TexFailCtrlConst);
+  if (Value) {
----------------
This can be moved lower.

================
Comment at: lib/Transforms/InstCombine/InstCombineInternal.h:803
+                                               int DmaskIdx = -1,
+                                               int TFEIdx = -1);

----------------
Should be TFCIdx for consistency.

Repository:
  rL LLVM

https://reviews.llvm.org/D48826