[llvm] [AArch64] Avoid vector Ext in case by-element operation variant apply for all elements (PR #140733)

Mon Jun 2 23:54:59 PDT 2025

davemgreen wrote:

> > From what I remember this was intended to promote more generation of smull/umull instructions. It is probably worth making sure that keeps happening, especially for v2i64 types where the mul would otherwise be scalarized.
> 
> Any test for that case? @davemgreen

Hi - I don't have anything more real-world to hand, but adding sext to one of your examples showed what I meant. We can generate a smull if both the operands have extends, or if there are enough known-bits. Folding away the trunc(dup) might be enough to get it working OK again.
```
define <4 x i32> @ext_shuffle_v4i16_v4i32(<4 x i16> %l, <4 x i16> %as, <4 x i32> %b) {
  %a = sext <4 x i16> %as to <4 x i32>
  %lanes = sext <4 x i16> %l to <4 x i32>
  %shf0 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> zeroinitializer
  %mul0 = mul <4 x i32> %shf0, %a
  %add0 = add <4 x i32> %mul0, %b
  %shf1 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %mul1 = mul <4 x i32> %shf1, %a
  %add1 = add <4 x i32> %mul1, %b
  %shf2 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
  %mul2 = mul <4 x i32> %shf2, %a
  %add2 = add <4 x i32> %mul2, %b
  %shf3 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %mul3 = mul <4 x i32> %shf3, %a
  %add3 = add <4 x i32> %mul3, %b
  %sub1 = sub <4 x i32> %add0, %add1
  %sub2 = sub <4 x i32> %add2, %add3
  %sub3 = sub <4 x i32> %sub1, %sub2
  ret <4 x i32> %sub3
}
```
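
For reference, the plain "both operands have extends" case mentioned above can be reduced to the sketch below. The function names are hypothetical and not taken from the PR's tests. With something like `llc -mtriple=aarch64`, each `mul` would normally be expected to select a single `smull` (`.4s` from `.4h`, and `.2d` from `.2s`). The v2i64 variant is the one where losing the pattern hurts most, since NEON has no vector multiply on 64-bit elements.
```
; Sketch only: both mul operands are sign-extended from the half-width type.
define <4 x i32> @smull_both_sext_v4i32(<4 x i16> %x, <4 x i16> %y) {
  %xe = sext <4 x i16> %x to <4 x i32>
  %ye = sext <4 x i16> %y to <4 x i32>
  %m = mul <4 x i32> %xe, %ye
  ret <4 x i32> %m
}

; Same shape for v2i64, where a plain mul would otherwise be scalarized.
define <2 x i64> @smull_both_sext_v2i64(<2 x i32> %x, <2 x i32> %y) {
  %xe = sext <2 x i32> %x to <2 x i64>
  %ye = sext <2 x i32> %y to <2 x i64>
  %m = mul <2 x i64> %xe, %ye
  ret <2 x i64> %m
}
```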

https://github.com/llvm/llvm-project/pull/140733