[PATCH] D135229: [AArch64] Extending lowering of 'trunc <(8|16) x i64> %x to <(8|16) x i8>' to use tbl instructions

Wed Nov 23 02:45:51 PST 2022

nilanjana_basu marked 3 inline comments as done.
nilanjana_basu added a comment.

Removed (8|16)xi16 to (8|16)xi8 conversion because it wasn't showing benefits in instruction count, & additionally adding more instructions to the header. Updated comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13405
+  int NumElements = cast<FixedVectorType>(TI->getType())->getNumElements();
+  auto *SrcTy = cast<FixedVectorType>(TI->getOperand(0)->getType());
+  auto *DstTy = cast<FixedVectorType>(TI->getType());
----------------
fhahn wrote:
> fhahn wrote:
> > Is this guaranteed to be a fixed vector type? Could you add a variant of a test with truncates of scalable vectors (`<vscale x 16 x i8>` or something like that?
> I think it should be fine, I added a test in 4783345426da
Since the source & destination types were checked for FixedVector once in the calling function optimizeExtendOrTruncateConversion(), I didn't check it here again. 

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13405
+  int NumElements = cast<FixedVectorType>(TI->getType())->getNumElements();
+  auto *SrcTy = cast<FixedVectorType>(TI->getOperand(0)->getType());
+  auto *DstTy = cast<FixedVectorType>(TI->getType());
----------------
nilanjana_basu wrote:
> fhahn wrote:
> > fhahn wrote:
> > > Is this guaranteed to be a fixed vector type? Could you add a variant of a test with truncates of scalable vectors (`<vscale x 16 x i8>` or something like that?
> > I think it should be fine, I added a test in 4783345426da
> Since the source & destination types were checked for FixedVector once in the calling function optimizeExtendOrTruncateConversion(), I didn't check it here again. 
Since both the zext & trunc test for scalable vector goes through the optimizeExtendOrTruncateConversion() function, the zext test in [[ https://reviews.llvm.org/rG4783345426da9369a7c107acba8e743be70bda37 | 4783345426da ]] should suffice for the trunc too. Let me know if you think it needs to be replicated for trunc vector too.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13444
+  // over the source vector. If TBL's maximum 4 FP/SIMD registers are saturated,
+  // call TBL & store the result in a vector for combining later.
+  SmallVector<Value *> Results;
----------------
fhahn wrote:
> store here seems ambiguous here, as we won't emit a store instruction, right?
I replaced the "store" with "save" to indicate that it is being stored in the compiler's internal vector data structure. Added a comment at the place of combining these results. 

================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll:676
+; CHECK-NEXT:  	cmlt	v4.8h, v3.8h, #0
+; CHECK-NEXT:  	tbl	v3.16b, { v4.16b, v5.16b }, v2.16b
+; CHECK-NEXT:  	str	q3, [x0], #32
----------------
fhahn wrote:
> Similar to D136722, it is likely not profitable to do this when converting to/from the next power-of-2.
 Since we only support a handful of vector type truncates in this implementation, only  Yxi16->Yxi8 was a supported next power-of-2 conversion. Removed it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135229/new/

https://reviews.llvm.org/D135229