[PATCH] D135229: [AArch64] Extending lowering of 'trunc <(8|16) x (i16|i64)> %x to <(8|16) x i8>' to use tbl instructions
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 22 03:57:42 PST 2022
fhahn added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13405
+ int NumElements = cast<FixedVectorType>(TI->getType())->getNumElements();
+ auto *SrcTy = cast<FixedVectorType>(TI->getOperand(0)->getType());
+ auto *DstTy = cast<FixedVectorType>(TI->getType());
----------------
Is this guaranteed to be a fixed vector type? Could you add a variant of a test with truncates of scalable vectors (`<vscale x 16 x i8>` or something like that)?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13419
+
SmallVector<Constant *, 16> MaskConst;
+ for (int Itr = 0; Itr < 16; Itr++) {
----------------
It would be great if you could add a brief comment here explaining what kind of masks/shuffles are prepared here.
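
To illustrate the kind of comment I have in mind, here is a standalone sketch (plain C++, no LLVM types) of what such a mask could look like. It is not taken from the patch: `buildTruncMask`, `truncFactor` and `isLittleEndian` are hypothetical names, and it assumes the mask selects the least-significant byte of each source element and pads unused lanes with 255, an out-of-range index for which TBL writes zero.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper, not the code under review.
// Builds a 16-byte TBL index vector for truncating `numElements` wide
// integers down to i8. `truncFactor` is the source element size in bytes
// (assumption: 2 for i16, 4 for i32, 8 for i64).
std::array<std::uint8_t, 16> buildTruncMask(int numElements, int truncFactor,
                                            bool isLittleEndian) {
  std::array<std::uint8_t, 16> mask{};
  for (int i = 0; i < 16; ++i) {
    if (i < numElements) {
      // The low byte of element i sits at the start of the element on
      // little-endian and at its end on big-endian.
      int byteIndex = i * truncFactor + (isLittleEndian ? 0 : truncFactor - 1);
      mask[i] = static_cast<std::uint8_t>(byteIndex);
    } else {
      // Out-of-range index: TBL produces 0 for this lane.
      mask[i] = 255;
    }
  }
  return mask;
}
```

For example, under these assumptions `buildTruncMask(8, 4, true)` starts with the indices 0, 4, 8, 12, ..., 28 and fills the remaining eight lanes with 255. A one- or two-line comment stating this "pick every n-th byte" intent would be enough.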
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13444
+ // over the source vector. If TBL's maximum 4 FP/SIMD registers are saturated,
+ // call TBL & store the result in a vector for combining later.
+ SmallVector<Value *> Results;
----------------
"store" seems ambiguous here, as we won't emit a store instruction, right?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13489
+ if (ElemsPerTbl < 16) {
+ std::vector<int> FinalMask(ElemsPerTbl);
+ std::iota(FinalMask.begin(), FinalMask.end(), 0);
----------------
Could this be a `SmallVector` instead of a `std::vector`?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13494
+ } else {
+ std::vector<int> FinalMask(ElemsPerTbl * Results.size());
+ if (ElemsPerTbl < 16) {
----------------
Same here: could this be a `SmallVector` instead of a `std::vector`?
================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll:676
+; CHECK-NEXT: cmlt v4.8h, v3.8h, #0
+; CHECK-NEXT: tbl v3.16b, { v4.16b, v5.16b }, v2.16b
+; CHECK-NEXT: str q3, [x0], #32
----------------
Similar to D136722, it is likely not profitable to do this when the conversion is only to/from the next power-of-2 element width (e.g. i16 to i8).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135229/new/
https://reviews.llvm.org/D135229