[PATCH] D135229: [AArch64] Extending lowering of 'trunc <(8|16) x (i16|i64)> %x to <(8|16) x i8>' to use tbl instructions

Tue Nov 22 03:57:42 PST 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13405
+  int NumElements = cast<FixedVectorType>(TI->getType())->getNumElements();
+  auto *SrcTy = cast<FixedVectorType>(TI->getOperand(0)->getType());
+  auto *DstTy = cast<FixedVectorType>(TI->getType());
----------------
Is this guaranteed to be a fixed vector type? Could you add a variant of a test with truncates of scalable vectors (`<vscale x 16 x i8>` or something like that)?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13419
+
   SmallVector<Constant *, 16> MaskConst;
+  for (int Itr = 0; Itr < 16; Itr++) {
----------------
It would be great if you could add a brief comment here explaining what kind of masks/shuffles are prepared here.
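
To illustrate the kind of explanation I have in mind, here is a standalone sketch that does not use the patch's code. It assumes little-endian lane layout and that the mask selects the least-significant byte of each source element; the masks the patch actually builds may differ. `makeTruncMask` and `emulateTbl` are hypothetical names used only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the patch): byte indices that select the
// least-significant byte of each source element, assuming little-endian.
// Unused slots are set to 255, which is out of range for the table, so the
// corresponding result bytes read as 0.
std::array<uint8_t, 16> makeTruncMask(unsigned SrcBytes, unsigned NumElts) {
  assert(NumElts <= 16 && "one TBL produces at most 16 result bytes");
  std::array<uint8_t, 16> Mask;
  Mask.fill(255);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask[I] = static_cast<uint8_t>(I * SrcBytes);
  return Mask;
}

// Scalar model of a TBL lookup: result byte I is Table[Mask[I]], or 0 when
// the index lies outside the table.
std::array<uint8_t, 16> emulateTbl(const std::vector<uint8_t> &Table,
                                   const std::array<uint8_t, 16> &Mask) {
  std::array<uint8_t, 16> Out{};
  for (unsigned I = 0; I < 16; ++I)
    if (Mask[I] < Table.size())
      Out[I] = Table[Mask[I]];
  return Out;
}
```

For `trunc <8 x i64> to <8 x i8>`, the source spans 64 bytes (four 128-bit table registers), and `makeTruncMask(8, 8)` yields the indices 0, 8, 16, ..., 56 followed by eight out-of-range entries. A one-line comment along those lines above the loop would be enough.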

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13444
+  // over the source vector. If TBL's maximum 4 FP/SIMD registers are saturated,
+  // call TBL & store the result in a vector for combining later.
+  SmallVector<Value *> Results;
----------------
`store` seems ambiguous here, as we won't emit a store instruction, right?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13489
+    if (ElemsPerTbl < 16) {
+      std::vector<int> FinalMask(ElemsPerTbl);
+      std::iota(FinalMask.begin(), FinalMask.end(), 0);
----------------
Could this be a `SmallVector` instead of `std::vector`?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13494
+  } else {
+    std::vector<int> FinalMask(ElemsPerTbl * Results.size());
+    if (ElemsPerTbl < 16) {
----------------
Could this be a `SmallVector` instead of `std::vector`?

================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll:676
+; CHECK-NEXT:  	cmlt	v4.8h, v3.8h, #0
+; CHECK-NEXT:  	tbl	v3.16b, { v4.16b, v5.16b }, v2.16b
+; CHECK-NEXT:  	str	q3, [x0], #32
----------------
Similar to D136722, it is likely not profitable to do this when converting to/from the next power-of-2 element width (e.g. i16 to i8).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135229/new/

https://reviews.llvm.org/D135229