[PATCH] Calculate vectorization factor using the narrowest type instead of widest type

Sanjay Patel spatel at rotateright.com
Mon Apr 13 21:29:18 PDT 2015


> If I understand this correctly, this will cause us to potentially
> generate wider vectors than we have underlying vector registers...

This reminded me of a question I asked on the dev list a few weeks ago
related to this bug:
https://llvm.org/bugs/show_bug.cgi?id=20225

One of the conclusions I came to was:
"The loop vectorizer shouldn't be so eager to generate a larger-than-legal
type."

I don't think there's been much effort trying to optimize super-wide
shuffles.

On Sat, Apr 11, 2015 at 10:51 PM, Nadav Rotem <nrotem at apple.com> wrote:

>
> > On Apr 11, 2015, at 6:41 PM, hfinkel at anl.gov wrote:
> >
> > [+Arnold, Nadav, Chandler]
>
> The loop vectorizer chooses a vectorization factor using a cost model and
> we stop the search at the widest type to limit cross-register shuffles. It
> is very difficult to model the performance impact of cross-register
> shuffles (both the quality of the shuffles that we generate and the
> performance impact of using multiple registers on register pressure).
>
> Using the narrowest element type would increase the in-register
> utilization but decrease the utilization of execution units due to
> unrolling, so I am not sure what we are gaining here (except for extra
> shuffles).
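
A minimal sketch of the trade-off described above, assuming a target with 128-bit vector registers and a loop that mixes i8 and i32 values. The helper names are hypothetical and are not LLVM APIs; this only illustrates the arithmetic, not the LoopVectorize cost model.

```cpp
// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Largest vectorization factor (VF) at which a vector of ElemBits-wide
// elements still fits in a single RegBits-wide register.
constexpr unsigned maxVFForElement(unsigned RegBits, unsigned ElemBits) {
  return RegBits / ElemBits;
}

// Total width in bits of one vector value at a given VF.
constexpr unsigned vectorBits(unsigned ElemBits, unsigned VF) {
  return ElemBits * VF;
}

// Assumed target: 128-bit registers. Assumed loop: mixes i8 and i32 values.
constexpr unsigned TargetRegBits = 128;
constexpr unsigned VFByWidest = maxVFForElement(TargetRegBits, 32);
constexpr unsigned VFByNarrowest = maxVFForElement(TargetRegBits, 8);

// Stopping at the widest type: the <4 x i8> values fill only 32 of 128 bits.
static_assert(vectorBits(8, VFByWidest) == 32, "i8 lanes underuse the register");

// Using the narrowest type: the <16 x i32> values are 512 bits wide, so each
// one has to be split across several 128-bit registers.
static_assert(vectorBits(32, VFByNarrowest) == 512, "i32 vector exceeds one register");
```

Under these assumptions, the choice is between partly empty registers for the narrow type and values wider than a register for the wide type.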
>
> >
> > If I understand this correctly, this will cause us to potentially
> > generate wider vectors than we have underlying vector registers, and I
> > think that, generically, this makes sense. Now that our X86 shuffle
> > handling is sane, the splitting of wide vectors, and shuffling that you get
> > from vector extends/truncates is hopefully not too bad. Other opinions?
> >
> > Did you see any performance changes on the test suite?
> >
> > We might need to update the register-pressure heuristic
> > (LoopVectorizationCostModel::calculateRegisterUsage()) to understand that
> > very-wide vectors use multiple vector registers.
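
A rough sketch of the adjustment suggested above: a register-pressure estimate would count a wider-than-legal vector as several registers rather than one. The function name and signature are hypothetical; this is not the code in calculateRegisterUsage().

```cpp
// Hypothetical helper, not an LLVM API: number of RegBits-wide registers
// needed to hold one vector of VF elements that are ElemBits wide each.
// Uses ceiling division so a partly filled register still counts as one.
constexpr unsigned registersForVector(unsigned VF, unsigned ElemBits,
                                      unsigned RegBits) {
  return (VF * ElemBits + RegBits - 1) / RegBits;
}

// With assumed 128-bit registers, a <16 x i8> value fits in one register,
// while a <16 x i32> value occupies four.
static_assert(registersForVector(16, 8, 128) == 1, "legal-width vector");
static_assert(registersForVector(16, 32, 128) == 4, "wider-than-legal vector");
```

A pressure heuristic that adds this count per live value, instead of adding one, would see the extra cost of very wide vectors.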
>
> I agree.  Additionally, I think that we should rewrite this code. I made
> the mistake of calculating register pressure by scanning the code forward
> (start to end) instead of backwards, which made the code unnecessarily
> complicated (liveness should be calculated by scanning the basic block
> backwards).
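
A minimal sketch of the backward scan mentioned above, assuming a single basic block whose instructions each define at most one value. The `Inst` type and `maxLiveValues` function are hypothetical and simplified; they are not taken from LLVM.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: the id of the value it defines
// (-1 if it defines none) and the ids of the values it reads.
struct Inst {
  int Def;
  std::vector<int> Uses;
};

// Walk the block from the last instruction to the first, maintaining the set
// of live values, and return the largest size that set reaches.
// Live starts out as the set of values that are live out of the block.
std::size_t maxLiveValues(const std::vector<Inst> &Block, std::set<int> Live) {
  std::size_t MaxLive = Live.size();
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    Live.erase(It->Def);      // A value is not live above its definition.
    for (int Use : It->Uses)  // Operands are live above the instruction.
      Live.insert(Use);
    MaxLive = std::max(MaxLive, Live.size());
  }
  return MaxLive;
}
```

Scanning backwards means each use extends a live range and each definition ends it, so no separate bookkeeping of "last use" positions is needed.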
>
> >
> >
> > http://reviews.llvm.org/D8943
> >
>