[llvm] [AArch64] Prevent generating tbl instruction instead of smull (PR #106375)

Wed Sep 4 02:47:31 PDT 2024

================
@@ -16795,6 +16795,16 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
 
       DstTy = TruncDstType;
     }
+
+    // mul(zext(i8), sext) can be transformed into smull(zext, sext) when
+    // destination type is at least SrcWidth * 4, which is faster than using tbl
+    // instructions
----------------
fhahn wrote:

Right, I guess the main point is that the `tbl` lowering is profitable if the remaining extends require at least 2 more steps. So if DstWidth is 64, the remaining extends would be from i8 -> i32, for which `tbl` would still be profitable?

It's a moot point because we don't emit `tbl` for extends to i64 (I don't remember why there's that restriction), but it would be good to have consistent reasoning across the function.

This actually reminded me that support for skipping `tbl` for various widening instructions was added in e97b8a7e3fb9d4bd270bb25bac9777d86dcbdaf3 with the same reasoning as mentioned above. 

Is it possible that `AArch64TTI::isWideningInstruction` does not recognize `smull`, and hence the general logic doesn't trigger?

https://github.com/llvm/llvm-project/pull/106375