[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Thu Jan 19 07:18:37 PST 2017

delena added inline comments.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7008-7009
   // the scalar version.
   if (Legal->isUniformAfterVectorization(I))
     VF = 1;

----------------
mssimpso wrote:
> mssimpso wrote:
> > delena wrote:
> > > mssimpso wrote:
> > > > Hi Elena,
> > > > 
> > > > I had been thinking about the use of isUniformAfterVectorization() here in getInstructionCost(). Wouldn't it now be possible for the set of uniforms to differ from the first collection (before VF selection) and the second collection (after VF selection)? So we would choose a VF based on costs assuming an instruction may or may not be uniform. Then we could later reverse our initial decision about the instruction's uniformity after VF selection, making the total cost on which we based our VF decision inaccurate. Or am I missing something? I haven't yet thought through the implications of this in enough detail to know whether this would matter much or not.
> > > About the list of Uniforms. We insert and then remove only GEPs and Induction variables. We do not calculate cost for them anyway. All other Uniform values stay in place. So, the cost is accurate at the end. There is no circular dependency here.
> > I don't think this is true in general. We mark an instruction uniform if all its users are uniform. So for example, if we have a uniform GEP whose index is some computation, that computation is also uniform if it's only used by the GEP. I think we have some examples in induction.ll, but something like this:
> > 
> > ```
> > %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
> > %sum = add i64 %i, %x
> > %idx = getelementptr inbounds float, float* %a, i64 %sum
> > load float, float* %idx, align 4   
> > ```        
> > 
> > The GEP is consecutive, so it will be marked uniform. %sum will also be marked uniform because it's only used by the GEP. If we later decide to scalarize the load, the GEP, the IV, and %sum will all no longer be uniform. So the cost for %sum will have been wrong.
> Just a thought - why not recompute and cache the uniforms (and possibly scalars) for each VF we compute costs for? That would avoid any potential logical inconsistencies. I think the compile-time overhead would probably be minimal (and you're already computing these sets twice anyway).
Just talked with Ayal about this. 

I can collect Uniforms after making the decision about Load/Store instructions, and that decision is based on cost. The decision affects other instructions inside the loop, as you pointed out before. Theoretically, if I have N variants of representing **all** memory instructions inside the loop, I should examine 2**N combinations per VF.

Ayal proposed the following sequence, which should be done at the cost model (CM) stage, after legality is finished:
Per VF:

  #  Go through all memory instructions and make a CM decision for each
  #  Build Uniforms and Scalars per VF (that's what you are suggesting now)
  #  Calculate the cost for the VF, based on Uniforms and Scalars

It is still not ideal, but it is probably better than what we have.
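
To make the per-VF sequence concrete, here is a rough, self-contained sketch. It is a toy model only, not the LoopVectorize code: `ToyInst`, `decideMemory`, `collectUniforms`, `costForVF` and all cost numbers are hypothetical names and values invented for illustration. It assumes only two memory choices (widen or scalarize), and it treats an instruction as uniform when all of its users are either widened memory instructions or already uniform.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only; none of these names exist in LoopVectorize.cpp.
enum class MemDecision { Widen, Scalarize };

struct ToyInst {
  std::string Name;
  bool IsMemory;                  // load/store vs. everything else
  std::vector<std::size_t> Users; // indices of in-loop users
  unsigned WidenCost;             // assumed cost of one wide memory op
  unsigned ScalarCost;            // assumed cost of one scalar memory op
};

using ToyLoop = std::vector<ToyInst>;

// Step 1: per VF, make a cost-based decision for every memory instruction.
std::vector<MemDecision> decideMemory(const ToyLoop &L, unsigned VF) {
  std::vector<MemDecision> D(L.size(), MemDecision::Widen);
  for (std::size_t I = 0; I < L.size(); ++I)
    if (L[I].IsMemory && L[I].WidenCost > VF * L[I].ScalarCost)
      D[I] = MemDecision::Scalarize;
  return D;
}

// Step 2: build the uniform set for this VF from those decisions.
// Iterate to a fixed point so uniformity propagates from the address
// (GEP) back through the computations that feed only that address.
std::set<std::size_t> collectUniforms(const ToyLoop &L,
                                      const std::vector<MemDecision> &D) {
  std::set<std::size_t> Uniforms;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < L.size(); ++I) {
      if (L[I].IsMemory || L[I].Users.empty() || Uniforms.count(I))
        continue;
      bool AllUsersOK = true;
      for (std::size_t U : L[I].Users) {
        bool WidenedMem = L[U].IsMemory && D[U] == MemDecision::Widen;
        if (!WidenedMem && !Uniforms.count(U))
          AllUsersOK = false;
      }
      if (AllUsersOK) {
        Uniforms.insert(I);
        Changed = true;
      }
    }
  }
  return Uniforms;
}

// Step 3: cost the VF using the sets computed for this same VF.
// Assumed toy costs: a uniform instruction is one scalar copy (1),
// any other non-memory instruction is replicated VF times.
unsigned costForVF(const ToyLoop &L, unsigned VF) {
  std::vector<MemDecision> D = decideMemory(L, VF);
  std::set<std::size_t> Uniforms = collectUniforms(L, D);
  unsigned Cost = 0;
  for (std::size_t I = 0; I < L.size(); ++I) {
    if (L[I].IsMemory)
      Cost += D[I] == MemDecision::Widen ? L[I].WidenCost
                                         : VF * L[I].ScalarCost;
    else
      Cost += Uniforms.count(I) ? 1 : VF;
  }
  return Cost;
}
```

Because the uniform set is rebuilt from the memory decisions made for the same VF, the cost of something like %sum in the earlier example follows the load's decision instead of a set collected before that decision was made.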

Repository:
  rL LLVM

https://reviews.llvm.org/D27919