[llvm] [InstCombine] Make `(binop ({s|u}itofp),({s|u}itofp))` transform more flexible to mismatched signs (PR #84389)

Fri Mar 8 10:48:59 PST 2024

goldsteinn wrote:

> > > Could you please have a look at [dtcxzyw/llvm-opt-benchmark#336 (comment)](https://github.com/dtcxzyw/llvm-opt-benchmark/pull/336#discussion_r1517575694)?
> > 
> > 
> > So it seems this transform is enabling SLP vectorization in a case where it's not profitable:
> > ```
> > ;; Before
> > ; *** IR Dump After SimplifyCFGPass on regress ***
> > ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
> > define float @regress(i16 %inp) local_unnamed_addr #0 {
> >   %r0 = urem i16 %inp, 20
> >   %ui0 = uitofp i16 %r0 to float
> >   %fadd0 = fadd float %ui0, -1.000000e+01
> >   %fdiv0 = fdiv float %fadd0, 0.000000e+00
> >   %ui1 = uitofp i16 %inp to float
> >   %fdiv1 = fdiv float %ui1, 0.000000e+00
> >   %r = fmul float %fdiv1, %fdiv0
> >   ret float %r
> > }
> > ; *** IR Dump After SLPVectorizerPass on regress ***
> > ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
> > define float @regress(i16 %inp) local_unnamed_addr #0 {
> >   %r0 = urem i16 %inp, 20
> >   %ui0 = uitofp i16 %r0 to float
> >   %fadd0 = fadd float %ui0, -1.000000e+01
> >   %fdiv0 = fdiv float %fadd0, 0.000000e+00
> >   %ui1 = uitofp i16 %inp to float
> >   %fdiv1 = fdiv float %ui1, 0.000000e+00
> >   %r = fmul float %fdiv1, %fdiv0
> >   ret float %r
> > }
> > ;; After
> > ; *** IR Dump After SimplifyCFGPass on regress ***
> > ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
> > define float @regress(i16 %inp) local_unnamed_addr #0 {
> >   %r0 = urem i16 %inp, 20
> >   %1 = add nsw i16 %r0, -10
> >   %fadd0 = sitofp i16 %1 to float
> >   %fdiv0 = fdiv float %fadd0, 0.000000e+00
> >   %ui1 = uitofp i16 %inp to float
> >   %fdiv1 = fdiv float %ui1, 0.000000e+00
> >   %r = fmul float %fdiv1, %fdiv0
> >   ret float %r
> > }
> > ; *** IR Dump After SLPVectorizerPass on regress ***
> > ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
> > define float @regress(i16 %inp) local_unnamed_addr #0 {
> >   %r0 = urem i16 %inp, 20
> >   %1 = add nsw i16 %r0, -10
> >   %2 = insertelement <2 x i16> poison, i16 %inp, i32 0
> >   %3 = insertelement <2 x i16> %2, i16 %1, i32 1
> >   %4 = uitofp <2 x i16> %3 to <2 x float>
> >   %5 = sitofp <2 x i16> %3 to <2 x float>
> >   %6 = shufflevector <2 x float> %4, <2 x float> %5, <2 x i32> <i32 0, i32 3>
> >   %7 = fdiv <2 x float> %6, zeroinitializer
> >   %8 = extractelement <2 x float> %7, i32 0
> >   %9 = extractelement <2 x float> %7, i32 1
> >   %r = fmul float %8, %9
> >   ret float %r
> > }
> > ```
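
As a sanity check on the rewrite visible in the "After" dump (`fadd (uitofp %r0), -10.0` becoming `sitofp (add nsw %r0, -10)`), here is a minimal Python sketch that compares both forms for every `i16` input. The helper names `to_f32`, `before`, and `after` are made up for illustration and are not part of LLVM; float rounding is modelled by a round-trip through `struct`, which is an assumption about how closely Python can mimic single precision here.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def before(inp: int) -> float:
    # %r0 = urem i16 %inp, 20
    # %ui0 = uitofp i16 %r0 to float
    # %fadd0 = fadd float %ui0, -1.000000e+01
    r0 = inp % 20
    return to_f32(to_f32(float(r0)) + to_f32(-10.0))


def after(inp: int) -> float:
    # %1 = add nsw i16 %r0, -10
    # %fadd0 = sitofp i16 %1 to float
    r0 = inp % 20
    return to_f32(float(r0 - 10))


# Treat %inp as an unsigned 16-bit value, matching the urem in the IR.
mismatches = [inp for inp in range(1 << 16) if before(inp) != after(inp)]
print(len(mismatches))
```

Since `%r0` is always in `[0, 19]`, the integer add cannot wrap and every intermediate value is exactly representable in `float`, so the two forms should agree. The regression discussed here is about profitability downstream, not correctness of the fold.
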
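It seems that without the transform, the `fadd` prevents SLP vectorization: the `fadd`/`fdiv` pair has to be bundled with an alternate opcode and comes out at cost +1, whereas after the transform both lanes are `fdiv` and the bundle comes out at cost -4, i.e.: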

```
BEFORE:
Operand 0:
    %ui0 = uitofp i16 %r0 to float
    %ui1 = uitofp i16 %inp to float
Operand 1:
  float -1.000000e+01
  float 0.000000e+00
Scalars: 
    %fadd0 = fadd float %ui0, -1.000000e+01
    %fdiv1 = fdiv float %ui1, 0.000000e+00
State: Vectorize
MainOp:   %fadd0 = fadd float %ui0, -1.000000e+01
AltOp:   %fdiv1 = fdiv float %ui1, 0.000000e+00
VectorizedValue: NULL
ReuseShuffleIndices: Empty
ReorderIndices: 
UserTreeIndices: 
SLP: Costs:
SLP:     ReuseShuffleCost = 0
SLP:     VectorCost = 6
SLP:     ScalarCost = 5
SLP:     ReuseShuffleCost + VecCost - ScalarCost = 1
SLP: Adding cost 1 for bundle n=2 [  %fadd0 = fadd float %ui0, -1.000000e+01, ..].
SLP: Current total cost = 1
SLP: Calculated costs for Tree:

AFTER:
0.
Operand 0:
    %fadd0 = sitofp i16 %1 to float
    %ui1 = uitofp i16 %inp to float
Operand 1:
  float 0.000000e+00
  float 0.000000e+00
Scalars: 
    %fdiv0 = fdiv float %fadd0, 0.000000e+00
    %fdiv1 = fdiv float %ui1, 0.000000e+00
State: Vectorize
MainOp:   %fdiv0 = fdiv float %fadd0, 0.000000e+00
AltOp:   %fdiv0 = fdiv float %fadd0, 0.000000e+00
VectorizedValue: NULL
ReuseShuffleIndices: Empty
ReorderIndices: 
UserTreeIndices: 
SLP: Costs:
SLP:     ReuseShuffleCost = 0
SLP:     VectorCost = 4
SLP:     ScalarCost = 8
SLP:     ReuseShuffleCost + VecCost - ScalarCost = -4
SLP: Adding cost -4 for bundle n=2 [  %fdiv0 = fdiv float %fadd0, 0.000000e+00, ..].
SLP: Current total cost = -4
SLP: Calculated costs for Tree:
```
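
The per-bundle arithmetic in the dumps above can be reproduced with a small Python sketch. This only models the formula printed in the debug output (`ReuseShuffleCost + VecCost - ScalarCost`); the `BundleCost` class and `looks_profitable` helper are hypothetical names, and the rule "profitable when the cost is below the negated threshold, with a threshold of 0" is my assumption about the default behaviour rather than something shown in the dump. The real decision is made on the total tree cost, which includes entries not shown here.

```python
from dataclasses import dataclass


@dataclass
class BundleCost:
    """Costs for one SLP bundle, as printed in the debug dump."""
    reuse_shuffle_cost: int
    vector_cost: int
    scalar_cost: int

    def delta(self) -> int:
        # ReuseShuffleCost + VecCost - ScalarCost
        return self.reuse_shuffle_cost + self.vector_cost - self.scalar_cost


def looks_profitable(cost: int, threshold: int = 0) -> bool:
    # Assumption: vectorization is accepted when cost < -threshold.
    return cost < -threshold


# Numbers copied from the BEFORE and AFTER dumps.
before_bundle = BundleCost(reuse_shuffle_cost=0, vector_cost=6, scalar_cost=5)
after_bundle = BundleCost(reuse_shuffle_cost=0, vector_cost=4, scalar_cost=8)

for name, bundle in (("before", before_bundle), ("after", after_bundle)):
    print(name, bundle.delta(), looks_profitable(bundle.delta()))
```

Under these assumptions the mixed `fadd`/`fdiv` bundle is a net loss of 1, while the all-`fdiv` bundle is a net gain of 4, which matches the change in behaviour seen in the IR.
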

https://github.com/llvm/llvm-project/pull/84389