[PATCH] D14829: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.

Thu Nov 19 15:28:04 PST 2015

> On Nov 19, 2015, at 12:07 PM, Matthew Simpson <mssimpso at codeaurora.org> wrote:
> 
> mssimpso added a comment.
> 
> Hi Nadav,
> 
> Thanks very much for the quick feedback! I'm happy to consider a different implementation. I have some questions for you first, though, if you don't mind.
> 
> I don't think the expression trees this patch affects are necessarily limited in size. The requirement is that they be seeded by GEPs (that are used by non-consecutive loads). As far as I know, the trees can be arbitrarily large. The trees in my example are small (gep, zext, sub, load), but this doesn't have to be the case in general. Are you suggesting that in your experience, seeding SLP with these GEPs typically only hits short, unprofitable trees in practice?

Oh, I see. Yes, the pointer argument can be the root of a large tree. I think that the common case is that the pointer is a simple gep.  

The compile-time cost is that we need to scan the whole function and search for load instructions. But the opportunity (in terms of performance wins) is probably low.

One way to show that this approach is profitable is to test the LLVM test suite and look for opportunities where this patch improves the performance of benchmarks.

> 
> I have measured performance on our workloads. The cost estimate for the test case I provided in the patch comes in well below zero and is profitable. Estimating the cost of gathers is difficult, but here we are only optimizing the index calculations. I will admit that the number of programs we care about is limited. It would be nice to have some additional data points. Is this something you could help with?

Index calculations are often folded into the addressing part of the scalar instruction. 
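
For illustration, a gather-like idiom of the kind discussed here might look like the C sketch below. The function name, array names, and element types are hypothetical and are not taken from the patch. The loads from a and b are consecutive, so the subtract and extend that form the index are the part that could be vectorized, while the final loads from g are non-consecutive and stay scalar. On targets with scaled-index addressing, multiplying the index by the element size is typically part of the scalar load itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example; names and types are assumptions for illustration.
 * g must have at least 256 elements, since the index is an 8-bit value. */
uint32_t gather_sum(const uint32_t *g, const uint8_t *a, const uint8_t *b,
                    size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Index calculation: consecutive loads of a[i] and b[i], then a
         * subtract, truncated back to 8 bits so it stays within g. */
        uint8_t idx = (uint8_t)(a[i] - b[i]);
        /* Non-consecutive (gather-like) load; idx is zero-extended and
         * used as the array index. */
        sum += g[idx];
    }
    return sum;
}
```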

> 
> Finally, I'm not familiar with the portion of SelectionDAG you mention, so please excuse the rest of my questions. First, there aren't any stores in my example, so would the ConsecutiveStores optimizations you mention even apply? And lastly, wouldn't we still be subject to the same cost estimates (and imprecision therein) if the implementation was moved elsewhere?

You are right. There are no stores. However, you can implement an optimization that is similar to the ConsecutiveStores optimization that identifies consecutive loads and performs vectorization of the addresses (if it is profitable on your target).  The cost model is more efficient in SelectionDAG because it has access to the TargetMachine. 

-Nadav

> 
> Thanks again!
> 
> 
> http://reviews.llvm.org/D14829
> 
> 
>