[llvm] [InstCombine] Combine interleaved PHI reduction chains. (PR #143878)
Yingwei Zheng via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 16 05:56:36 PDT 2025
dtcxzyw wrote:
> > > as introduced by the loop vectorizer.
> >
> >
> > > I guess it is intended to keep the pipeline full? Imagine a CPU with multiple ports/pipelines that execute the same kind of instruction (load/fadd/fmul).
>
> Generally I agree, but in the cases this patch tries to capture, we don't have loads/stores or other such ops, and the binary ops we have can be collapsed to one. In these cases, I can't see how executing the multiple ops interleaved can be beneficial. For example, consider:
>
> ```llvm
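> ; Types and the reassoc flag are added for illustration (assumption).
> %pn1 = phi double [ 1.0, %BB1 ], [ %op1, %BB2 ]
> %pn2 = phi double [ 1.0, %BB1 ], [ %op2, %BB2 ]
> %op1 = fmul reassoc double %pn1, 0.9
> %op2 = fmul reassoc double %pn2, 0.9
> %res = fmul reassoc double %op1, %op2
> ```
>
> Which can be folded to:
>
> ```llvm
> %pn = phi double [ 1.0, %BB1 ], [ %res, %BB2 ]
> %res = fmul reassoc double %pn, 0.81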
> ```
>
> Assuming the constants can be materialised similarly, the second version requires strictly fewer instructions to perform the same computation.
>
> Do you see what I mean?
I know. Obviously the combined version is faster. I just wonder if we can avoid introducing this pattern in LoopVectorizer by making some adjustments to cost modeling...
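
To make the quoted example concrete, here is a minimal C++ sketch that models both loop shapes and lets the results be compared numerically. The function names `interleaved` and `combined` are hypothetical, `double` is an assumed element type, and the loops are assumed to run at least once. The two versions round differently, so they agree only to within floating-point tolerance; that is why a fold like this would need fast-math flags that permit reassociation.

```cpp
#include <cassert>
#include <cmath>

// Models the interleaved form: two independent PHI chains (pn1, pn2),
// each scaled by 0.9 per iteration, with their product taken as the result.
// Assumes iterations >= 1.
double interleaved(int iterations) {
  double pn1 = 1.0, pn2 = 1.0, res = 1.0;
  for (int i = 0; i < iterations; ++i) {
    double op1 = pn1 * 0.9;
    double op2 = pn2 * 0.9;
    res = op1 * op2;  // 3 multiplies per iteration
    pn1 = op1;        // back-edge values of the two PHIs
    pn2 = op2;
  }
  return res;
}

// Models the combined form: a single PHI chain scaled by 0.9 * 0.9 = 0.81.
// Assumes iterations >= 1.
double combined(int iterations) {
  double pn = 1.0, res = 1.0;
  for (int i = 0; i < iterations; ++i) {
    res = pn * 0.81;  // 1 multiply per iteration
    pn = res;         // back-edge value of the single PHI
  }
  return res;
}
```

Calling both with the same iteration count and comparing with a relative tolerance (rather than `==`) shows that they compute the same value up to rounding, while the combined loop does one multiply per iteration instead of three.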
https://github.com/llvm/llvm-project/pull/143878