[PATCH] D133491: [AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4.

Thu Sep 15 01:42:52 PDT 2022

t.p.northover added a comment.

At first glance this seems like a hyper-specific optimization. I take it there's some reasonably common idiom that makes it worth bothering with?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10702-10703
+  SDValue Mask2 = Tbl2->getOperand(3);
+  // Make sure the tbl2 mask only selects values in the first 8 lanes (i.e. the
+  // last 8 lanes all have an index of -1).
+  auto IsLowerExtractMask = [](SDValue Mask) {
----------------
Why do we care about this? It looks like we've already checked that the lanes covered by this check are discarded by the shuffle.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10707
+      return false;
+    for (unsigned I = 8; I < 16; I++) {
+      auto *C = dyn_cast<ConstantSDNode>(Mask->getOperand(I));
----------------
Won't this overflow if it's a `tbl2` producing an `<8 x i8>`?
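
To illustrate the concern, here is a minimal standalone sketch of a size-guarded version of the check. It does not use the real SelectionDAG API: `MaskLanes` is a hypothetical stand-in for the mask's build-vector operands, with `std::nullopt` modelling a non-constant lane, and the guard shown is an assumption about one possible fix, not what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a build-vector mask: each lane is either a
// constant index or non-constant (std::nullopt).
using MaskLanes = std::vector<std::optional<int>>;

// Returns true only when the mask has exactly 16 lanes and lanes 8..15 are
// all the constant -1. Checking the lane count first means an 8-lane mask
// is rejected instead of being read past its end.
bool isLowerExtractMask(const MaskLanes &Mask) {
  if (Mask.size() != 16)
    return false;
  for (std::size_t I = 8; I < 16; ++I)
    if (!Mask[I] || *Mask[I] != -1)
      return false;
  return true;
}
```

With this shape, an 8-lane mask simply fails the check rather than indexing operands 8 through 15.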

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10716
+    return SDValue();
+  SmallVector<SDValue, 16> TBLMaskParts(16, Mask1->getOperand(0));
+  for (unsigned I = 0; I < 8; I++) {
----------------
Maybe default fill with `SDValue()`? We overwrite all of them immediately afterwards anyway, so that would signal early that the reader doesn't have to care about this line.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10719
+    TBLMaskParts[I] = Mask1->getOperand(I);
+    auto *C = cast<ConstantSDNode>(Mask2->getOperand(I));
+    TBLMaskParts[I + 8] = DAG.getConstant(C->getSExtValue() + 32, dl, MVT::i32);
----------------
Have we checked anywhere that the lower 8 operands are actually constant?
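
As a rough sketch of what I mean, the mask combination could bail out when a lane is not constant instead of relying on an unchecked `cast<>`. This is a standalone model, not the SelectionDAG API: `MaskLanes` and `combineLowerMasks` are hypothetical names, `std::nullopt` models a non-constant operand, and for simplicity the sketch requires the low lanes of both masks to be constant, even though the patch only casts the second one.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a build-vector mask: each lane is either a
// constant index or non-constant (std::nullopt).
using MaskLanes = std::vector<std::optional<int>>;

// Builds a 16-lane mask from the low 8 lanes of two masks. Lanes taken from
// the second mask are offset by 32, mirroring the "+ 32" in the patch.
// Returns std::nullopt if either mask is too short or any of the low 8
// lanes is non-constant.
std::optional<std::vector<int>> combineLowerMasks(const MaskLanes &Mask1,
                                                  const MaskLanes &Mask2) {
  if (Mask1.size() < 8 || Mask2.size() < 8)
    return std::nullopt;
  std::vector<int> Combined(16);
  for (std::size_t I = 0; I < 8; ++I) {
    if (!Mask1[I] || !Mask2[I])
      return std::nullopt; // Non-constant lane: give up on the fold.
    Combined[I] = *Mask1[I];
    Combined[I + 8] = *Mask2[I] + 32;
  }
  return Combined;
}
```

In the real code the equivalent would presumably be a `dyn_cast` with an early `return SDValue()`, but I have not checked whether an earlier condition already guarantees this.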

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D133491