[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Tue Dec 20 11:13:06 PST 2016

mkuper added inline comments.

================
Comment at: ../../ver4/lib/Transforms/Vectorize/LoopVectorize.cpp:7047-7049
+      // I do not compare "gather" cost vs "interleave pattern", I just assume
+      // that each target provides reasonable MaxInterleaveFactor that
+      // makes the "interleave pattern" profitable. When InterleaveFactor
----------------
delena wrote:
> mssimpso wrote:
> > Why don't you just compare the costs? You wouldn't need to make this assumption anymore.
> The cost that we provide for interleaved access is incorrect, especially for AVX-512. AVX-512 has 3-source shuffles, so the real cost is much lower. I can't compare it to Gather: today the Gather cost wins even for small strides, which is not what happens on real hardware. So there are 2 bugs:
>   (1) The loop is scalarized, and the Gather/Scatter option is not considered at all.
>   (2) The cost of interleaving is incorrect.
> 
> I can start by providing a correct cost for interleaving on AVX-512.
> Or I can fix (1) first. I'll retrieve the proper "MaxSupportedInterleaveFactor".
> 
> What do you think?
I think it would be better to fix the cost model first.

It's very pessimistic for x86 in general, not just for AVX-512. But you're right that it's even worse for AVX-512, because the real cost there is lower. I thought Farhana was already working on that, though. Am I confused?
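To make the two policies under discussion concrete, here is a minimal, self-contained sketch. It is not LLVM's `LoopVectorizationCostModel` API: `AccessStrategy`, `CostEstimates`, `chooseByFactor` and `chooseByCost` are hypothetical names, and the cost numbers are made up. `chooseByFactor` mirrors the patch's assumption (trust the target's MaxInterleaveFactor), while `chooseByCost` is the reviewer's "just compare the costs" alternative.

```cpp
// Hypothetical illustration only; none of these names exist in LLVM.
enum class AccessStrategy { Interleave, Gather, Scalarize };

// Made-up per-access-group cost estimates for one vectorization factor.
struct CostEstimates {
  unsigned Interleave; // wide loads/stores plus shuffles
  unsigned Gather;     // masked gather/scatter
  unsigned Scalar;     // fully scalarized accesses
};

// Policy in the patch: assume interleaving is profitable whenever the
// interleave factor fits the target's MaxInterleaveFactor; otherwise fall
// back to gather/scatter if legal, else scalarize.
AccessStrategy chooseByFactor(unsigned InterleaveFactor,
                              unsigned MaxInterleaveFactor, bool GatherLegal) {
  if (InterleaveFactor <= MaxInterleaveFactor)
    return AccessStrategy::Interleave;
  return GatherLegal ? AccessStrategy::Gather : AccessStrategy::Scalarize;
}

// Reviewer's suggestion: pick the cheapest legal option. This is only as
// good as the cost estimates; an overestimated interleave cost makes it
// pick Gather even when interleaving would really be faster.
AccessStrategy chooseByCost(const CostEstimates &C, bool GatherLegal) {
  AccessStrategy Best = AccessStrategy::Interleave;
  unsigned BestCost = C.Interleave;
  if (GatherLegal && C.Gather < BestCost) {
    Best = AccessStrategy::Gather;
    BestCost = C.Gather;
  }
  if (C.Scalar < BestCost)
    Best = AccessStrategy::Scalarize;
  return Best;
}
```

For example, with an inflated interleave estimate such as `{20, 12, 30}`, `chooseByCost` returns `Gather` while `chooseByFactor(3, 4, true)` still returns `Interleave`. That is the disagreement in this thread, and why fixing the cost model first matters.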

Repository:
  rL LLVM

https://reviews.llvm.org/D27919