[llvm] [InstCombine] Combine interleaved PHI reduction chains. (PR #143878)

Mon Jun 16 05:56:36 PDT 2025

dtcxzyw wrote:

> > > as introduced by the loop vectorizer.
> > 
> > 
> > I guess it is intended to keep the pipelines full? Imagine a CPU with multiple ports/pipelines that execute the same kind of instruction (load/fadd/fmul).
> 
> Generally I agree, but in the cases this patch tries to capture, we don't have loads/stores or other such ops, and the binary ops we have can be collapsed into one. In these cases, I can't see how executing multiple interleaved ops can be beneficial. For example, consider:
> 
> ```llvm
>   %pn1 = phi double [1.0, %BB1], [%op1, %BB2]
>   %pn2 = phi double [1.0, %BB1], [%op2, %BB2]
>   %op1 = fmul double %pn1, 0.9
>   %op2 = fmul double %pn2, 0.9
>   %res = fmul double %op1, %op2
> ```
> 
> Which can be folded to:
> 
> ```llvm
>   %pn = phi double [1.0, %BB1], [%res, %BB2]
>   %res = fmul double %pn, 0.81
> ```
> 
> Assuming the constants can be materialised similarly, the second version requires strictly fewer instructions to effect the same computations.
> 
> Do you see what I mean?

I know. Obviously the combined version is faster. I just wonder if we can avoid introducing this pattern in the LoopVectorizer by making some adjustments to its cost modeling...
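
To sanity-check the arithmetic behind the quoted fold, here is a minimal C sketch (not code from the PR) that models both loop shapes on plain doubles. The names `interleaved`, `combined`, `nearly_equal`, and `check_equivalence` are hypothetical. In exact arithmetic both loops yield 0.81^n after n iterations; in floating point the reassociated form can differ in the last few bits, which is why a fold like this is generally only permitted when fast-math flags such as `reassoc` allow reassociation.

```c
#include <assert.h>

/* Sketch only, not the PR's implementation: hypothetical helpers that model
   the two loop bodies from the quoted example on plain doubles. */

/* Interleaved shape: two accumulators (%pn1, %pn2), each scaled by 0.9, plus
   one extra multiply (%res) combining them. Three multiplies per iteration. */
static double interleaved(int n) {
    double pn1 = 1.0, pn2 = 1.0, res = 1.0;
    for (int i = 0; i < n; i++) {
        double op1 = pn1 * 0.9; /* %op1 */
        double op2 = pn2 * 0.9; /* %op2 */
        res = op1 * op2;        /* %res */
        pn1 = op1;              /* back-edge value of %pn1 */
        pn2 = op2;              /* back-edge value of %pn2 */
    }
    return res;
}

/* Combined shape: a single accumulator (%pn) scaled by the folded constant
   0.81 = 0.9 * 0.9. One multiply per iteration. */
static double combined(int n) {
    double pn = 1.0, res = 1.0;
    for (int i = 0; i < n; i++) {
        res = pn * 0.81; /* %res */
        pn = res;        /* back-edge value of %pn */
    }
    return res;
}

/* Relative comparison: reassociating the multiplies changes rounding, so the
   two shapes agree only to within a few ulps, not bit for bit. */
static int nearly_equal(double a, double b) {
    double diff = a - b;
    double mag = b;
    if (diff < 0) diff = -diff;
    if (mag < 0) mag = -mag;
    return diff <= 1e-12 * mag;
}

/* Assert that both shapes produce (nearly) the same value for 1..max_n trips. */
static void check_equivalence(int max_n) {
    for (int n = 1; n <= max_n; n++) {
        assert(nearly_equal(interleaved(n), combined(n)));
    }
}
```

Per iteration, `interleaved` performs three multiplies while `combined` performs one, which is the instruction-count argument made in the quoted comment.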

https://github.com/llvm/llvm-project/pull/143878