[PATCH] D25632: [LV] Sink scalar operands of predicated instructions

Tue Oct 18 09:05:13 PDT 2016

mssimpso added a comment.

Michael/Gil,

Thanks very much for the feedback and discussion! I'll go ahead and work on updating the patch according to Michael's suggestions.

In https://reviews.llvm.org/D25632#571664, @mkuper wrote:

> That makes sense, especially the cost angle.
>
> On the other hand, my worry is that long-term, as this kind of thing accumulates, we may end up with an ad-hoc "organically grown" mini-optimizer inside the vectorizer. And I'd really like to try to avoid that. The fact this sinking patch requires https://reviews.llvm.org/D25631 to perform DCE inside the vectorizer only reinforces that concern.

Yes, I agree a mini-optimizer inside the vectorizer is not something we want. But to clarify things somewhat, we're not really optimizing existing code as much as we are teaching the vectorizer to not emit poor code, or to "knowingly generate junk", to paraphrase Gil. In https://reviews.llvm.org/D25631, we're not actually performing DCE. In fact, by not emitting would-be dead code, we're preventing a later DCE pass from having to work as much.

We certainly shouldn't try and reinvent the wheel when it isn't necessary. If a standard pass in the pipeline can clean up after the vectorizer (and the compile time penalty from holding on to and churning over the sub-optimal code we generate is not a concern), we should let it do so, in favor of simplicity in the vectorizer. We don't have a pass that does this though (in the limited sense of the predicated instructions), and there's value in letting the cost model guide the future scalarization/sinking choices we can make.

In https://reviews.llvm.org/D25632#572696, @gilr wrote:

> It's not just the question of whether a later pass will optimize but also of how to cost-model later hypothetical optimizations.

This is definitely true, but it can actually get a little worse! Later passes can use their own cost model computed over the "junk" the vectorizer generates. For example, in the LTO pipeline, loop unrolling is run after the vectorizer, but before InstCombine cleans up the code. We should probably fix this.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4279
+  // algorithm ends after one pass through the worklist doesn't sink a single
+  // instruction. We index the worklist to avoid iterator invalidations.
+  bool Changed;
----------------
mkuper wrote:
> I think the standard way to do this is to run "while (!WorkList.empty()))" and keep a side map of nodes you're done with so you will not add to the worklist. This avoids both the iterator invalidation problem, and recognizing the fixed-point problem.
> It may make things more complicated in this case, though.
Sure, I'll restructure the loop. And you're right, removing from the middle of SetVector is not constant-time, so we should avoid that. Thanks!

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4304
+      // remove the instruction from the worklist, and then add its operands.
+      I->moveBefore(PredBB->getFirstNonPHI());
+      Worklist.remove(I);
----------------
mkuper wrote:
> PredBB->getFirstInsertionPt() (which is essentially the same thing, but nicer. :-) )
Sounds good!

https://reviews.llvm.org/D25632