[PATCH] D136722: [AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl instructions

Mon Nov 21 13:53:31 PST 2022

nilanjana_basu added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13703
   auto *UIToFP = dyn_cast<UIToFPInst>(I);
-  if (UIToFP &&
-      (SrcTy->getNumElements() == 8 || SrcTy->getNumElements() == 16) &&
-      SrcTy->getElementType()->isIntegerTy(8) &&
+  if (UIToFP && SrcTy->getElementType()->isIntegerTy(8) &&
       DstTy->getElementType()->isFloatTy()) {
----------------
This conversion shows a regression in performance for some cases where there are multiple similar zext instructions present back to back. The generated code with the previous implementation could be folded into a more optimized set of instructions, which is not possible with 'tbl' instructions. One example is 16xi8->16xi64, where I find an increase in the number of instructions after being lowered to tbl on using a loop interleave count of 4, i.e. with 4 back to back zext instructions.

Is it better to rule out this case in this 'if' block or should we not allow tbl-lowering when there are multiple zext instructions of the same type present back to back?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136722/new/

https://reviews.llvm.org/D136722