[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 13 02:12:10 PST 2025


================
@@ -26,6 +26,66 @@ define <4 x i32> @udot(<4 x i32> %acc, <16 x i8> %u, <16 x i8> %s) {
   ret <4 x i32> %partial.reduce
 }
 
+define <4 x i32> @udot_in_loop(ptr %p1, ptr %p2){
----------------
david-arm wrote:

I think that `AArch64TargetLowering::optimizeExtendOrTruncateConversion` bails out unless the instruction lives in a loop header block (it also bails out when the function is optimised for size), i.e.

```
  // Try to optimize conversions using tbl. This requires materializing constant
  // index vectors, which can increase code size and add loads. Skip the
  // transform unless the conversion is in a loop block guaranteed to execute
  // and we are not optimizing for size.
  Function *F = I->getParent()->getParent();
  if (!L || L->getHeader() != I->getParent() || F->hasMinSize() ||
      F->hasOptSize())
    return false;
```
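
To make the guard concrete, here is a minimal standalone sketch of the same condition. It is not LLVM code: `Block`, `MockLoop`, `MockFunc`, `skipsTblConversion` and `demo` are hypothetical stand-ins that carry just enough state to model the check quoted above, assuming the condition behaves exactly as written there.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM basic block, loop and function objects.
// These are NOT the real LLVM classes; they only model the guard's inputs.
struct Block {};
struct MockLoop {
  const Block *Header; // block that acts as the loop header
};
struct MockFunc {
  bool MinSize; // models F->hasMinSize()
  bool OptSize; // models F->hasOptSize()
};

// Mirrors the quoted condition: returns true when the tbl-based conversion
// would be skipped (i.e. the real function would return false).
bool skipsTblConversion(const MockLoop *L, const Block *InstBlock,
                        const MockFunc &F) {
  return !L || L->Header != InstBlock || F.MinSize || F.OptSize;
}

// A few cases exercising each clause of the guard.
void demo() {
  Block Header, Latch;
  MockLoop L{&Header};
  MockFunc Normal{false, false};
  MockFunc SizeOpt{false, true};

  assert(!skipsTblConversion(&L, &Header, Normal));     // in header: proceeds
  assert(skipsTblConversion(&L, &Latch, Normal));       // in loop, not header
  assert(skipsTblConversion(nullptr, &Header, Normal)); // not in any loop
  assert(skipsTblConversion(&L, &Header, SizeOpt));     // optimising for size
}
```

Under this model, only an instruction in the header block of a loop, in a function not optimised for size, gets past the check.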

https://github.com/llvm/llvm-project/pull/126529

