[llvm] Swap UnrollAndJam Pass to before the SLP Vectorizer Pass (PR #97029)

Fri Jul 26 05:52:46 PDT 2024

adprasad-nvidia wrote:

@nikic:
- Running UnrollAndJam was a mistake; I have fixed that.
- I have added a test in `llvm/test/Transforms/PhaseOrdering/outer-loop-vectorize.ll` that demonstrates how the new code can generate vectorized code from outer-loop IR. The same test fails with UnrollAndJam in its current position, because scalar code is generated.
- I have also followed your suggestion of moving UnrollAndJam earlier to enable greater analysis reuse, by moving it to directly after LoopVectorize (after the InferAlignment pass). UnrollAndJam under `!IsFullLTO` now runs in the same place as UnrollAndJam under `IsFullLTO`. (Doing this also required another SimplifyCFG pass to be run before UnrollAndJam for outer loop vectorization to trigger.) **This seems to fix the compile time regressions versus ToT with unroll-and-jam enabled.**

@tschuett: As @sjoerdmeijer said, the numbers show no regressions on existing benchmarks. The new test demonstrates that the code is being vectorized.

https://github.com/llvm/llvm-project/pull/97029