[llvm] [LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (PR #76172)

Tue Mar 26 12:25:29 PDT 2024

alexey-bataev wrote:

> There are two potential cases where the canonical IV increment may wrap: if the original TC=BTC+1 wraps, and if rounding-up TC to a multiple of VF*UF as part of tail folding wraps. When using EVL the computation BTC+1 must not wrap because EVL uses TC explicitly, and EVL avoids any rounding-up of TC. So does OpVPEVL+EVLPhi not wrap, regardless of CanonicalIVIncrement?
> In any case, emitIterationCountCheck() seems to currently ensure that neither the canonical IV increment nor the EVL increment may wrap, when tail folding with EVL.
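
The two wrap cases can be made concrete with a small C++ model. It does the trip-count arithmetic in an 8-bit unsigned type so that overflow is easy to hit. The helper names (`tripCount`, `roundUpToMultiple`, `countEVLIterations`) are hypothetical and are not LoopVectorize APIs. The 8-bit IV width and VF*UF = 16 are illustrative assumptions. The sketch also models the EVL as `min(remaining, VF)`, which is one choice consistent with the documented constraints of `llvm.experimental.get.vector.length`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the IV type is an 8-bit unsigned integer, so arithmetic wraps mod 256.
using IVTy = uint8_t;

// TC = BTC + 1 in the IV type; wraps to 0 when BTC is the maximum value.
IVTy tripCount(IVTy BTC) { return static_cast<IVTy>(BTC + 1); }

// Round TC up to a multiple of Step (= VF * UF), as mask-based tail folding does:
// n.rnd.up = TC + (Step - 1); n.vec = n.rnd.up - (n.rnd.up urem Step).
// The first add can wrap even when TC itself is representable.
IVTy roundUpToMultiple(IVTy TC, IVTy Step) {
  assert(Step != 0);
  IVTy RndUp = static_cast<IVTy>(TC + (Step - 1)); // may wrap
  IVTy Rem = static_cast<IVTy>(RndUp % Step);
  return static_cast<IVTy>(RndUp - Rem);
}

// EVL-style stepping: each iteration handles EVL = min(TC - EVLPhi, VF) elements
// (assumed model of get.vector.length) and bumps the EVL IV by EVL. The EVL IV
// never exceeds TC, so it cannot wrap as long as TC itself did not wrap.
unsigned countEVLIterations(IVTy TC, IVTy VF) {
  assert(VF != 0);
  unsigned Iters = 0;
  IVTy EVLPhi = 0;
  while (EVLPhi != TC) {
    IVTy Remaining = static_cast<IVTy>(TC - EVLPhi);
    IVTy EVL = Remaining < VF ? Remaining : VF;
    EVLPhi = static_cast<IVTy>(EVLPhi + EVL);
    assert(EVLPhi <= TC && "EVL IV must not overshoot TC");
    ++Iters;
  }
  return Iters;
}
```

With an 8-bit IV, `roundUpToMultiple(250, 16)` wraps to 0 even though TC = 250 fits. In contrast, `countEVLIterations(250, 16)` finishes in 16 iterations, and its EVL IV stops at exactly 250. A TC that has already wrapped to 0 (from BTC = 255) silently produces zero iterations. That is why BTC+1 itself must not wrap.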

> Why depend on CanonicalIVIncrement? We can simply place the cast and the bump after VPEVL in the header.

The increment is placed after CanonicalIVIncrement, so I just want to keep the ext close to the next EVL value.
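
The following is a rough scalar model of the cast and bump under discussion. It illustrates the data flow only and is not the VPlan recipes themselves. `llvm.experimental.get.vector.length` yields an `i32`, so the EVL has to be zero-extended to the wider IV type before it is added to the EVL phi. The struct and the function name `evlHeaderStep` are hypothetical, and the EVL is again assumed to be `min(remaining, VF)`. Whichever position is chosen for the ext, the bump reads only the EVL phi and the EVL. The canonical IV does not feed into it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Result of one (hypothetical) header step of an EVL-tail-folded loop.
struct EVLStep {
  uint32_t EVL;        // models the i32 result of get.vector.length
  uint64_t NextEVLPhi; // EVLPhi + zext(EVL), in the i64 IV type
};

// Models: EVL = get.vector.length(TC - EVLPhi, VF); Next = EVLPhi + zext(EVL).
// Assumes EVL = min(remaining, VF), one value permitted by the intrinsic.
EVLStep evlHeaderStep(uint64_t TC, uint64_t EVLPhi, uint32_t VF) {
  assert(EVLPhi <= TC && "EVL IV must not pass the trip count");
  uint64_t Remaining = TC - EVLPhi;
  uint32_t EVL = static_cast<uint32_t>(std::min<uint64_t>(Remaining, VF));
  uint64_t EVLExt = static_cast<uint64_t>(EVL); // zext i32 -> i64
  return {EVL, EVLPhi + EVLExt};                // the bump uses only EVLPhi and EVL
}
```

For example, with TC = 10 and VF = 4, a step starting at EVLPhi = 8 produces EVL = 2 and a next value of 10, which ends the loop.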

> It would be good for EVL to be independent of CanonicalIVIncrement, e.g., allowing the latter to be introduced late in the VPlan-to-VPlan pipeline (see https://github.com/llvm/llvm-project/pull/82270).

Currently it may be considered independent, but in the future there will be some corner cases where we need to copy it from the IV. Kolya has already highlighted the issue with the later materialization of the loop IV.
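
The next sketch shows, in a simplified model, why the EVL IV can drive the loop on its own. One loop is controlled purely by the EVL phi and exits when that phi reaches TC. A canonical IV that steps by VF is compared against TC rounded up to a multiple of VF (assuming UF = 1 and a fixed VF). It is tracked only so that the iteration counts can be compared. The function name `compareIVs` is hypothetical, and the EVL is modeled as `min(remaining, VF)`. The arithmetic is 64-bit without overflow checks, so the sketch presumes the wrap-freedom discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {iterations of the EVL-driven loop, iterations the canonical IV would need}.
std::pair<uint64_t, uint64_t> compareIVs(uint64_t TC, uint64_t VF) {
  assert(VF != 0);
  // EVL-driven loop: the exit is decided by the EVL phi alone.
  uint64_t EVLPhi = 0, EVLIters = 0;
  while (EVLPhi != TC) {
    EVLPhi += std::min(TC - EVLPhi, VF); // bump by the modeled EVL
    ++EVLIters;
  }
  // Canonical IV: steps by VF up to TC rounded up to a multiple of VF.
  uint64_t VecTC = (TC + VF - 1) / VF * VF;
  uint64_t CanIV = 0, CanIters = 0;
  while (CanIV != VecTC) {
    CanIV += VF;
    ++CanIters;
  }
  return {EVLIters, CanIters};
}
```

In this model the two counts agree, for example 3 and 3 for TC = 10 and VF = 4. The EVL phi therefore suffices to terminate the loop without reference to the canonical IV. This does not address the corner cases mentioned above.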

https://github.com/llvm/llvm-project/pull/76172