[PATCH] D53865: [LoopVectorizer] Improve computation of scalarization overhead.

Wed Nov 28 13:10:10 PST 2018

hsaito added a comment.

In D53865#1310934 <https://reviews.llvm.org/D53865#1310934>, @jonpa wrote:

> In D53865#1310026 <https://reviews.llvm.org/D53865#1310026>, @hsaito wrote:
>
> > Sorry, I must have missed this review.
> >
> > VPlan based cost modeling (plus VPlan based code motion) should naturally capture this kind of situation ----- but only to the extent that producer/consumer can reside in the same BB. It's taking a lot longer than I wanted to stabilize (compute exactly the same value as existing cost model in LV).
>
>
> Thanks for taking a look! IIUC, my patch is not useful since VPlan will soon improve this area without it.

Not so fast.
The underlying supporting mechanism is VPReplicateRecipe, in VPlan.h. The parent of a VP*Recipe is a VPBasicBlock. If both the use and the def belong to the same ReplicateRecipe, things are simple.
Your map-based query becomes a "do the instructions belong to the same Recipe" query. The question is, of course, whether we can always do that. If the answer is no, then this approach has a hole that needs to be filled by some other means.
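
As a rough illustration of what a same-Recipe query could look like, here is a minimal standalone sketch. It does not use the VPlan classes: `Recipe`, `RecipeMap`, and `inSameRecipe` are hypothetical names invented for the example, and instructions are identified by plain strings.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a replicate recipe; NOT the real VPReplicateRecipe.
struct Recipe {
  int Id;
};

// Hypothetical lookup table: instruction name -> recipe that owns it.
using RecipeMap = std::unordered_map<std::string, const Recipe *>;

// Returns true only when both the def and the use are owned by the same
// recipe. An instruction that has no owning recipe never matches.
bool inSameRecipe(const RecipeMap &Owner, const std::string &Def,
                  const std::string &Use) {
  auto D = Owner.find(Def);
  auto U = Owner.find(Use);
  if (D == Owner.end() || U == Owner.end())
    return false;
  return D->second == U->second;
}
```

When the query returns false, the def/use pair is exactly the case that would need some other mechanism.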

> I am curious as to how VPlan will accomplish this - will it also add some kind of check with TLI if the instruction will be expanded and propagate this information? Or is there some other way that this may be accomplished?

The Recipe makes instruction grouping (within a VPBasicBlock) easier to identify. If code motion across VPBasicBlocks is legal, we want to merge two or more ReplicateRecipes. In the ideal case, both use and def are in the same Recipe, so you don't need a map. You just ask for the cost of the ReplicateRecipe: scalar compute * VF for each instruction in the recipe + extract for live-ins + insert for live-outs. In the general case, however, use and def are in different ReplicateRecipes, so things aren't that simple. For each live-in, we should check whether it is computed in scalar form, and ditto for each live-out.
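
The cost formula above can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the VPlan cost model: `ScalarInst`, `LiveValue`, `ReplicateGroup`, and `replicateCost` are hypothetical names, the extract and insert costs are assumed to be charged once per lane (so they are multiplied by VF), and a live-in or live-out whose other side is also scalar is assumed to need no extract or insert.

```cpp
#include <vector>

// Hypothetical model of a replicated (scalarized) instruction.
struct ScalarInst {
  unsigned ScalarCost; // cost of one scalar copy of the instruction
};

// Hypothetical model of a value crossing the recipe boundary.
struct LiveValue {
  // True if the producer (for a live-in) or the consumers (for a live-out)
  // are themselves in scalar form, so no extract/insert is needed.
  bool ScalarOnOtherSide;
};

// Hypothetical stand-in for a replicate recipe; NOT VPReplicateRecipe.
struct ReplicateGroup {
  std::vector<ScalarInst> Insts;
  std::vector<LiveValue> LiveIns;
  std::vector<LiveValue> LiveOuts;
};

// scalar compute * VF for each instruction
//   + extract for vector live-ins + insert for vector live-outs.
// Assumption: extracts and inserts are paid per lane, hence the "* VF".
unsigned replicateCost(const ReplicateGroup &G, unsigned VF,
                       unsigned ExtractCost, unsigned InsertCost) {
  unsigned Cost = 0;
  for (const ScalarInst &I : G.Insts)
    Cost += I.ScalarCost * VF;
  for (const LiveValue &In : G.LiveIns)
    if (!In.ScalarOnOtherSide)
      Cost += ExtractCost * VF;
  for (const LiveValue &Out : G.LiveOuts)
    if (!Out.ScalarOnOtherSide)
      Cost += InsertCost * VF;
  return Cost;
}
```

The `ScalarOnOtherSide` flag is where the general case shows up: when use and def sit in different ReplicateRecipes, something still has to answer that question for each live-in and live-out.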

My question back to you is why Scalars is not good enough for your purpose. Do you get different "scalarization" answers from collectLoopScalars() and collectTargetScalarized()? If so, that's probably where you want to dig in.

  /// Collect the instructions that are scalar after vectorization. An           
  /// instruction is scalar if it is known to be uniform or will be scalarized   
  /// during vectorization. Non-uniform scalarized instructions will be          
  /// represented by VF values in the vectorized loop, each corresponding to an  
  /// iteration of the original scalar loop.                                     
  void collectLoopScalars(unsigned VF); 

https://reviews.llvm.org/D53865