[PATCH] D136722: [AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl instructions

Wed Nov 9 03:41:14 PST 2022

dmgreen added a comment.

I know it's not your problem, but the code in optimizeExtendOrTruncateConversion doesn't feel like it is in the best place, to be honest. CGP has always described itself as a hack, but we shouldn't be hacking things that much. There will be some obvious cases where the extend/trunc can be optimized but the tbl blocks it.
As far as I understand, the code is only in CGP because it is trying limit the transforms to loops. I'm wondering if it would be better to add some sort of flag into ISel so that combines could tell that the current block is a loop, and behave differently because of it.

================
Comment at: llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll:444
+; CHECK-NEXT: 	add	x8, x8, #16
+; CHECK-NEXT: 	tbl	v4.16b, { v4.16b }, v2.16b
+; CHECK-NEXT: 	tbl	v5.16b, { v5.16b }, v2.16b
----------------
I think this is worse, I'm afraid. We only want to use tbl if it would replace two instructions (it performs two truncate/zext steps). Otherwise we are just adding instructions to the loop header (and using more registers) for no gain.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136722/new/

https://reviews.llvm.org/D136722