[PATCH] D136722: [AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl instructions
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 9 03:41:14 PST 2022
dmgreen added a comment.
I know it's not your problem, but the code in optimizeExtendOrTruncateConversion doesn't feel like it is in the best place, to be honest. CGP has always described itself as a hack, but we shouldn't be hacking things that much. There will be some obvious cases where the extend/trunc can be optimized but the tbl blocks it.
As far as I understand, the code is only in CGP because it is trying to limit the transforms to loops. I'm wondering if it would be better to add some sort of flag into ISel so that combines could tell that the current block is in a loop, and behave differently because of it.
================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll:444
+; CHECK-NEXT: add x8, x8, #16
+; CHECK-NEXT: tbl v4.16b, { v4.16b }, v2.16b
+; CHECK-NEXT: tbl v5.16b, { v5.16b }, v2.16b
----------------
I think this is worse, I'm afraid. We only want to use tbl if it would replace two instructions (it performs two truncate/zext steps). Otherwise we are just adding instructions to the loop header (and using more registers) for no gain.
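To make the condition above concrete, here is a rough sketch of the kind of check I have in mind. The helper names (numExtendSteps, shouldUseTblForZExt) are hypothetical and not existing LLVM API. It assumes each extend step doubles the element width, so i8 -> i16 is one step and i8 -> i32 is two:

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM: counts how many widening steps
// (each doubling the element width) a zext from SrcBits to DstBits needs.
static unsigned numExtendSteps(unsigned SrcBits, unsigned DstBits) {
  assert(SrcBits != 0 && "element width must be non-zero");
  unsigned Steps = 0;
  for (unsigned W = SrcBits; W < DstBits; W *= 2)
    ++Steps;
  return Steps;
}

// Hypothetical profitability check: only prefer tbl when it would replace
// at least two extend steps. With a single step, the tbl (plus its mask
// register) gives no gain over the plain extend.
static bool shouldUseTblForZExt(unsigned SrcBits, unsigned DstBits) {
  return numExtendSteps(SrcBits, DstBits) >= 2;
}
```

Under these assumptions a zext from i8 to i16 elements would keep the normal extend, while i8 to i32 would be a candidate for tbl. A real check would also need to account for the vector types and for whether the block is in a loop.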
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136722/new/
https://reviews.llvm.org/D136722