[PATCH] D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 27 07:58:57 PDT 2018
nhaehnle added a comment.
Thank you for making these changes. I have some detail comments inline and some high-level remarks:
- We need tests for {float, i32}, {<2 x float>, i32}, and {half, i32} return types
- We need tests with dmask = 0 from the very beginning in the intrinsic (there's a case in lowerImage which looks like it may be broken), and a test with dmask != 0 in the intrinsic, but none of the data returns are being used (only the i32 return is used).
- We need tests where the dmask is materially smaller than the return type, e.g. return type {<4 x float>, i32} and popcount(dmask) < 4, and return type {<2 x half>, i32} and popcount(dmask) <= 2.
I also have a high-level concern about the design of lowerImage, because the adjustRetValue feels a bit like spooky action at a distance, plus concern about the last point above about dmask. Have you considered the following possibility:
- Keep ReturnTypes[0] as-is until the code that determines NumVDataDwords.
- In the load case, adjust NumVDataDwords first based on popcount(dmask) and then based on whether LWE/TFE is enabled.
- Synthesize a new ReturnTypes[0] from scratch based on NumVDataDwords
- Then generically extract the data payloads vector from the NewNode, doing cast and adjust-for-unpackedD16 in a common path for both with and without TFE/LWE.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:7
+// License. See LICENSE.TXT for details.
+//
+/// Any MIMG instructions that use tfe or lwe require an initialization of the
----------------
The more common practice seems to be having a separator line here between the license info and the file description.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:20-21
+#include "SIInstrInfo.h"
+#include "Utils/AMDGPULaneDominator.h"
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
----------------
I believe you don't need these two includes.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:86-87
+ // variants.
+ if (!TFE && !LWE)
+ continue;
+
----------------
I'd prefer this to be an assertion. It's easy enough to change if we ever do get MIMG instructions without TFE/LWE. In the meantime, assertions allow us to be more explicit about the space of possibilities we actually need to think about.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:108
+ // Abandon attempt if no dmask operand is found.
+ if (!MO_Dmask)
+ continue;
----------------
Same argument here: I'd prefer this to be an assertion.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:117
+ // Subreg indices are counted from 1
+ // When D16 then we want next whole VGPR after write data.
+ bool Packed = !ST.hasUnpackedD16VMem();
----------------
It seems prudent to add a static assertion that `AMDGPU::sub0 == 1 && AMDGPU::sub4 == 5` here.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4722
+ uint64_t Value = TexFailCtrlConst->getZExtValue();
+ SDLoc DL(TexFailCtrlConst);
+ if (Value) {
----------------
This can be moved lower.
================
Comment at: lib/Transforms/InstCombine/InstCombineInternal.h:803
+ int DmaskIdx = -1,
+ int TFEIdx = -1);
----------------
Should be TFCIdx for consistency.
Repository:
rL LLVM
https://reviews.llvm.org/D48826
More information about the llvm-commits
mailing list