[PATCH] D142782: [AMDGPU] Add basic support for extended i8 perm matching
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 31 15:58:42 PST 2023
arsenm added a comment.
The ratio of test to code changes has me worried the test coverage is incomplete
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9766
+/// value of the byte is either constant zero or comes from memory.
+struct ByteProvider {
+ // For constant zero providers Load is set to nullptr. For memory providers
----------------
arsenm wrote:
> Can you keep this as a generic utility?
By generic utility I mean in generic code and used by the load combine as well
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9811
+ // We may need to recursively traverse a series of SRLs
+ if (Depth >= 5)
+ return std::nullopt;
----------------
6 is the one true recursion depth limit
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10059
+ // be extracted. We should simply not combine.
+ SmallVector<unsigned, 4> BytewiseOps = {ISD::SINT_TO_FP, ISD::UINT_TO_FP};
+
----------------
Don't need a small vector, can just directly iterate an initializer list?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10062
+ bool IsCombineExtracted = false;
+ for (auto OrUse : N->uses()) {
+ // Only special case bitcast to vectors
----------------
Is the multiple use case covered in the tests?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10125
+ std::optional<ByteProvider> P =
+ calculateByteProvider(SDValue(N, 0), i, 0, /*StartingIndex*/ i);
+ // TODO support constantZero
----------------
StartingIndex=
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10127
+ // TODO support constantZero
+ if (!P.has_value() || P->isConstantZero())
+ return SDValue();
----------------
!P
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10165
+ AMDGPUISD::PERM, DL, MVT::i32,
+ SDValue(const_cast<SDNode *>(PermNodes[FirstSrc].Src), 0),
+ SecondSrc == -1
----------------
Don't understand why you need the const_cast. Also, why isn't this an SDValue to begin with? Assuming output 0 can be risky
================
Comment at: llvm/test/CodeGen/AMDGPU/permute_i8.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx1010 -opaque-pointers -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx908 -opaque-pointers -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9
----------------
Drop -opaque-pointers (also direction doesn't make sense with the test contents)
================
Comment at: llvm/test/CodeGen/AMDGPU/permute_i8.ll:6
+
+define hidden void @shuffle6766(<4 x i8>* %in0, <4 x i8>* %in1, <4 x i8>* %out0) {
+; GFX10-LABEL: shuffle6766:
----------------
Need to use typed pointers, also should prefer global loads to flat
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142782/new/
https://reviews.llvm.org/D142782
More information about the llvm-commits
mailing list