[llvm] [AArch64] Avoid vector Ext in case by-element operation variant apply for all elements (PR #140733)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 2 23:54:59 PDT 2025
davemgreen wrote:
> > From what I remember this was intended to promote more generation of smull/umull instructions. It is probably worth making sure that keeps happening, especially for v2i64 types where the mul would otherwise be scalarized.
>
> Any test for that case? @davemgreen
Hi - I don't have anything more real-world to hand, but adding sext to one of your examples showed what I meant. We can generate a smull if both the operands have extends, or if there are enough known-bits. Folding away the trunc(dup) might be enough to get it working OK again.
```
define <4 x i32> @ext_shuffle_v4i16_v4i32(<4 x i16> %l, <4 x i16> %as, <4 x i32> %b) {
  %a = sext <4 x i16> %as to <4 x i32>
  %lanes = sext <4 x i16> %l to <4 x i32>
  %shf0 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> zeroinitializer
  %mul0 = mul <4 x i32> %shf0, %a
  %add0 = add <4 x i32> %mul0, %b
  %shf1 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %mul1 = mul <4 x i32> %shf1, %a
  %add1 = add <4 x i32> %mul1, %b
  %shf2 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
  %mul2 = mul <4 x i32> %shf2, %a
  %add2 = add <4 x i32> %mul2, %b
  %shf3 = shufflevector <4 x i32> %lanes, <4 x i32> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %mul3 = mul <4 x i32> %shf3, %a
  %add3 = add <4 x i32> %mul3, %b
  %sub1 = sub <4 x i32> %add0, %add1
  %sub2 = sub <4 x i32> %add2, %add3
  %sub3 = sub <4 x i32> %sub1, %sub2
  ret <4 x i32> %sub3
}
}
```
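For comparison, here is a minimal sketch of the plain case described above, where both operands of the mul carry a sext and no shuffle is involved. The function names are hypothetical and these are not taken from the PR's test files; they only illustrate the shape that is expected to select smull, including the v2i64 case where the mul would otherwise be scalarized.

```
; Hypothetical reduced examples, not part of the PR's tests.
; Both mul operands are sign-extended from <4 x i16>, so the backend is
; expected to select a single widening multiply (smull) here.
define <4 x i32> @smull_v4i16_v4i32(<4 x i16> %x, <4 x i16> %y) {
  %xe = sext <4 x i16> %x to <4 x i32>
  %ye = sext <4 x i16> %y to <4 x i32>
  %m = mul <4 x i32> %xe, %ye
  ret <4 x i32> %m
}

; v2i64 variant: NEON has no vector multiply on 64-bit elements, so if the
; smull match is lost this mul would be scalarized.
define <2 x i64> @smull_v2i32_v2i64(<2 x i32> %x, <2 x i32> %y) {
  %xe = sext <2 x i32> %x to <2 x i64>
  %ye = sext <2 x i32> %y to <2 x i64>
  %m = mul <2 x i64> %xe, %ye
  ret <2 x i64> %m
}
```

Running either function through `llc -mtriple=aarch64` before and after the patch should show whether smull is still being generated.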
https://github.com/llvm/llvm-project/pull/140733