[PATCH] D142782: [AMDGPU] Add basic support for extended i8 perm matching

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 31 15:58:42 PST 2023


arsenm added a comment.

The ratio of test to code changes has me worried the test coverage is incomplete



================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9766
+/// value of the byte is either constant zero or comes from memory.
+struct ByteProvider {
+  // For constant zero providers Load is set to nullptr. For memory providers
----------------
arsenm wrote:
> Can you keep this as a generic utility?
By generic utility I mean in generic code and used by the load combine as well


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:9811
+  // We may need to recursively traverse a series of SRLs
+  if (Depth >= 5)
+    return std::nullopt;
----------------
6 is the one true recursion depth limit


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10059
+    // be extracted. We should simply not combine.
+    SmallVector<unsigned, 4> BytewiseOps = {ISD::SINT_TO_FP, ISD::UINT_TO_FP};
+
----------------
Don't need a small vector, can just directly iterate an initializer list?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10062
+    bool IsCombineExtracted = false;
+    for (auto OrUse : N->uses()) {
+      // Only special case bitcast to vectors
----------------
Is the multiple use case covered in the tests?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10125
+        std::optional<ByteProvider> P =
+            calculateByteProvider(SDValue(N, 0), i, 0, /*StartingIndex*/ i);
+        // TODO support constantZero
----------------
StartingIndex=


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10127
+        // TODO support constantZero
+        if (!P.has_value() || P->isConstantZero())
+          return SDValue();
----------------
!P


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10165
+          AMDGPUISD::PERM, DL, MVT::i32,
+          SDValue(const_cast<SDNode *>(PermNodes[FirstSrc].Src), 0),
+          SecondSrc == -1
----------------
Don't understand why you need the const_cast. Also, why isn't this an SDValue to begin with? Assuming output 0 can be risky


================
Comment at: llvm/test/CodeGen/AMDGPU/permute_i8.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx1010 -opaque-pointers -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx908  -opaque-pointers -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9
----------------
Drop -opaque-pointers (also direction doesn't make sense with the test contents)


================
Comment at: llvm/test/CodeGen/AMDGPU/permute_i8.ll:6
+
+define hidden void @shuffle6766(<4 x i8>* %in0, <4 x i8>* %in1, <4 x i8>* %out0) {
+; GFX10-LABEL: shuffle6766:
----------------
Need to use typed pointers, also should prefer global loads to flat


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142782/new/

https://reviews.llvm.org/D142782



More information about the llvm-commits mailing list