[PATCH] Fix PR19657 : SLP vectorization doesn't combine scalar load to vector loads

Mon May 19 11:20:29 PDT 2014

Hi Eric, 

On May 16, 2014, at 6:21 PM, Eric Christopher <echristo at gmail.com> wrote:

> Any reason we don't want to check opt level when running to see how
> aggressive we should be and how much we should spend compile time?

I like your idea of adding a flag to control the performance/compile-time tradeoff. One place that could benefit from this flag is the code that sorts the store instructions. To reduce compile time, we place the stores in small buckets and sort each bucket individually; increasing the bucket size can expose more vectorization opportunities. Another place is the consecutive-memory-address tests, where we could throw in a few more checks.
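
To make the bucket-size tradeoff concrete, here is a minimal C++ sketch. It assumes a simplified, hypothetical model in which each store address is a (base, byte offset) pair; the real vectorizer works on LLVM IR with its own helpers, and the names `Store`, `isConsecutive`, and `countConsecutivePairs` are illustrative only. Because the search is quadratic inside each bucket, the bucket size bounds the compile-time cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a scalar store: the address is
// expressed as (base object id, byte offset from that base).
struct Store {
  int base;
  int64_t offset;
};

// True if B writes the element immediately after the one A writes.
static bool isConsecutive(const Store &A, const Store &B, int64_t eltSize) {
  return A.base == B.base && B.offset - A.offset == eltSize;
}

// Counts ordered (A, B) pairs of consecutive stores, searching only inside
// fixed-size buckets. The search within each bucket is quadratic, so the
// total cost is roughly O(N * bucketSize). A larger bucket costs more
// compile time but can find pairs that straddle bucket boundaries.
std::size_t countConsecutivePairs(const std::vector<Store> &stores,
                                  std::size_t bucketSize, int64_t eltSize) {
  assert(bucketSize > 0 && "bucket size must be positive");
  std::size_t pairs = 0;
  for (std::size_t begin = 0; begin < stores.size(); begin += bucketSize) {
    std::size_t end = std::min(begin + bucketSize, stores.size());
    for (std::size_t i = begin; i < end; ++i)
      for (std::size_t j = begin; j < end; ++j)
        if (i != j && isConsecutive(stores[i], stores[j], eltSize))
          ++pairs;
  }
  return pairs;
}
```

With 4-byte stores to a[0], a[1], b[0], b[1], a[2], a[3] in that order, a bucket size of 2 finds 3 consecutive pairs and misses a[1] to a[2], while a bucket size of 6 finds all 4. A bucket size like this is the kind of knob an optimization-level flag could scale.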

My main concern with the patch in this thread is that it duplicates a ton of code. The most complicated part of the SLP-vectorizer is the recursion that scans the tree and duplicating that code just to change a few lines will make it unmaintainable. I also suspect that this is not the correct approach to solving this problem, but I must admit that I did not look at the problem carefully. 

Moving forward, I would like to see us do a better job of swizzling loads. I think adding support for reverse loads would be easy and would allow many more patterns to be vectorized.
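
A hedged sketch of what reverse-load detection might check: if the scalar loads feeding lanes 0..N-1 have addresses that decrease by one element per lane, they can in principle be replaced by a single vector load starting at the lowest address, followed by a lane-reversing shuffle. `LoadOrder`, `classifyLoads`, and `reverseMask` are hypothetical names, and the model assumes byte offsets from a common base rather than real IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class LoadOrder { Consecutive, Reversed, Other };

// Hypothetical helper: classifies the byte offsets (from a common base) of
// the scalar loads feeding lanes 0..N-1 of a would-be vector.
LoadOrder classifyLoads(const std::vector<int64_t> &offsets, int64_t eltSize) {
  assert(eltSize > 0 && "element size must be positive");
  if (offsets.size() < 2)
    return LoadOrder::Other;
  bool forward = true, backward = true;
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    int64_t delta = offsets[i] - offsets[i - 1];
    forward = forward && (delta == eltSize);
    backward = backward && (delta == -eltSize);
  }
  if (forward)
    return LoadOrder::Consecutive; // one plain vector load
  if (backward)
    return LoadOrder::Reversed;    // vector load from offsets.back(), then reverse
  return LoadOrder::Other;
}

// Shuffle mask that reverses an N-lane vector: lane i takes element N-1-i.
std::vector<int> reverseMask(std::size_t lanes) {
  std::vector<int> mask(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    mask[i] = static_cast<int>(lanes - 1 - i);
  return mask;
}
```

For loads of a[3], a[2], a[1], a[0] into lanes 0..3, the sketch reports `Reversed`, and the reversing mask is {3, 2, 1, 0}.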

Thanks,
Nadav

> Is
> there some other approach you'd like to get optimizations like this?
> 
> -eric
> 
> http://reviews.llvm.org/D3800
> 
>