[PATCH] D39346: [LV] [ScalarEvolution] Fix PR34965 - Cache pointer stride information before LV code gen

Tue Nov 28 03:27:45 PST 2017

Ayal added a comment.

Diego,

Nice catch! Continuing to use SCEV and expect consistent answers in the midst of restructuring the IR is indeed wrong; its cache is not meant to cover for cases that can no longer be analyzed as before.

We’d like to add few points one can learn from this issue, as you are now to support and develop VPlan going forward.

- LV already records consecutivity of pointers, or rather the decision to widen the corresponding loads and stores, using `CM_Widen`. The only thing that seems to be missing, which required calling SCEV again when generating vector code, is to figure out if the stride is +1 or -1. This can be fixed by keeping `CM_Widen` for stride +1 and using, say, `CM_Widen_Reverse` for stride -1.
- So, what change did VPlan’s patch do that triggered this issue? It refactored `collectUniformsAndScalars()` and `collectInstsToScalarize()` away from `selectVectorizationFactor()`, in order to facilitate building VPlans first and deciding on the best VF/VPlan later. It is still possible btw to move `selectVectorizationFactor()` back and embed these collect methods within `expectedCost()` as they worked before, to see their effect on the reproducer.
- How does this refactoring cause the observed issue? Collecting uniforms fills SCEV cache entries (specifically `setCostBasedWideningDecision()`). Computing the cost to eventually select the best VF may invalidate certain cache entries when called the first time for VF = 1 (specifically `getBackedgeTakenInfo()`). That’s why first collecting uniforms for all VF’s, and then computing costs for all VF’s may leave the cache invalid, compared to interleaving the two for each VF – when more than VF = 1 is involved.
- How come `expectedCost()` which eventually calls `isUniform()` after `collectUniformsAndScalars()` was invoked, causes any trouble (let alone invalidate SCEV cache entries) – haven’t we recorded its result already? Well, `getInstructionCost()` does consult the collected `isUniformAfterVectorization()` on each instruction, but later calls `Legal->isUniform(Op2)` on an operand – note that Op2 may reside outside the vectorized loop and thus not collected. Indeed the Op2 checked in the reproducer is outside the vectorized loop, and `Legal->isUniform(Op2)` calls `LAI->isUniform()` which in turn triggers SCEV’s `isLoopInvariant()`. Computing Op2’s AddRec SCEV triggers `getBackedgeTakenInfo()` of the enclosing loop, which invalidates the 1st-order-recurring AddRec inside the vectorized loop. `LAI->isUniform()` could simply return true for values outside `TheLoop`, and/or `getInstructionCost()` could check if Op2 is outside the vectorized loop or in the per-VF collected `isUniformAfterVectorization()` instead of calling `Legal->isUniform(Op2)`.
- Despite retaining the original scalar loop intact as remainder/default loop, information concerning it may be lost. Namely, values inside it which SCEV could originally analyze may now require flow-sensitive analysis (which, btw, //Polyhedral Value & Memory Analysis// may handle – see Johannes’s talk). This is inherent to LV, irrespective of VPlan, and may affect plans to further optimize this loop such as vectorizing it. One tentative solution to the reported case is to subtract //step// from the current value of an IV, instead of rewiring phi’s to access its value from the previous iteration (1st-order recurring AddRec). Worth leaving a TODO note.

Bearing in mind that all VPlan patches so far deliberately aimed to minimize functional changes, such intriguing and esoteric issues hopefully emphasize the importance of changing LV gradually and carefully, and the benefit of LLVM’s vast user-base and profound QA (e.g., PR35122 ;-).

Good Luck!
Gil and Ayal.

https://reviews.llvm.org/D39346