[PATCH] D80236: [VectorCombine] position pass after SLP in the optimization pipeline rather than before

Mon May 25 08:33:43 PDT 2020

bjope added a comment.

Just for info: downstream, this caused some benchmark regressions for our OOT target.

Here is my analysis:

1. We've got some additions in the LoopVectorizer that insert extra guards for choosing between the vector and scalar paths.
2. Those branch conditions might be loop invariant, and might look the same for several loops in a nest.
3. When running EarlyCSE after LoopVectorizer (and before CFGSimplification), those branch conditions are CSE'd, so the branches use the very same condition value rather than separate conditions that merely have identical subexpressions.
4. I got big diffs after CFGSimplification depending on whether I removed EarlyCSE or not. My guess is that CFGSimplification benefits from seeing that the branches use the same condition (it depends on the code being CSE'd rather than comparing subexpressions).
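
To illustrate points 3 and 4, here is a toy Python sketch. It is not LLVM code: `Instr`, `early_cse` and `count_foldable_branches` are hypothetical names, the "IR" is a flat list of instructions with no dominance or control flow, and the assumption that a later pass only recognizes two branch conditions as equal when they are the same value is my guess from above, not confirmed behaviour of CFGSimplification.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str    # SSA-like name of the value this instruction defines
    op: str      # opcode, e.g. "add" or "icmp_ult"
    args: tuple  # names of the operands


def early_cse(instrs, branch_conds):
    """Toy CSE: drop instructions that recompute an already available expression."""
    available = {}  # (op, args) -> name of the first instruction computing it
    replaced = {}   # name of a removed duplicate -> name of the value kept
    kept = []
    for ins in instrs:
        # Rewrite operands first, so duplicates built on duplicates also match.
        args = tuple(replaced.get(a, a) for a in ins.args)
        key = (ins.op, args)
        if key in available:
            replaced[ins.name] = available[key]
        else:
            available[key] = ins.name
            kept.append(Instr(ins.name, ins.op, args))
    # Branches now refer to the surviving value.
    return kept, [replaced.get(c, c) for c in branch_conds]


def count_foldable_branches(branch_conds):
    """Count branches whose condition is the same value as an earlier branch.

    Assumption: conditions are compared by name only, never structurally.
    """
    seen = set()
    foldable = 0
    for cond in branch_conds:
        if cond in seen:
            foldable += 1
        seen.add(cond)
    return foldable


# Two loops in a nest, each guarded by the same loop-invariant check.
instrs = [
    Instr("t1", "add", ("n", "1")),
    Instr("cmp1", "icmp_ult", ("t1", "vf")),
    Instr("t2", "add", ("n", "1")),
    Instr("cmp2", "icmp_ult", ("t2", "vf")),
]
branch_conds = ["cmp1", "cmp2"]

before = count_foldable_branches(branch_conds)
cse_instrs, cse_conds = early_cse(instrs, branch_conds)
after = count_foldable_branches(cse_conds)
```

In this model `cmp1` and `cmp2` compute the same thing, but only after the CSE step do both branches name the same value, which is the property I suspect the later simplification relies on.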

I can mention that we use ExtraVectorizerPasses in that test, which runs LICM/CFGSimplification etc. an extra time before SLP. I haven't checked whether we get the same problem with the regular CFGSimplification.

I'll probably just add an extra run of EarlyCSE among the ExtraVectorizerPasses to solve this downstream.

Given that our downstream additions in LoopVectorizer seem to be involved somehow, this could be a more common scenario for us than for the upstream code base. We could probably handcraft a lit test to show this, but I guess that wouldn't justify adding EarlyCSE back into the pipeline if the problem isn't seen in benchmarks run on the upstream code base.

Just wanted to let you know about a potential scenario where EarlyCSE makes a difference here.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D80236