[PATCH] Calculate vectorization factor using the narrowest type instead of widest type

Tue Apr 14 08:26:00 PDT 2015

Hi Sanjay,

Actually, Chandler had the opposite opinion, and I at least have been
pushing the ARM backends into a shape where they can deal gracefully with
oversize vectors. The work I've done should affect all backends equally.

I'd be interested in what others' opinions are here, because I've heard
both arguments and both seem valid to me!

Cheers,

James

On Tue, 14 Apr 2015 at 05:33, Sanjay Patel <spatel at rotateright.com> wrote:

> > If I understand this correctly, this will cause us to potentially
> generate wider vectors than we have underlying vector registers...
>
> This reminded me of a question I asked on the dev list a few weeks ago
> related to this bug:
> https://llvm.org/bugs/show_bug.cgi?id=20225
>
> One of the conclusions I came to was:
> "The loop vectorizer shouldn't be so eager to generate a larger-than-legal
> type."
>
> I don't think there's been much effort trying to optimize super-wide
> shuffles.
>
> On Sat, Apr 11, 2015 at 10:51 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
>>
>> > On Apr 11, 2015, at 6:41 PM, hfinkel at anl.gov wrote:
>> >
>> > [+Arnold, Nadav,Chandler]
>>
>> The loop vectorizer chooses a vectorization factor using a cost model and
>> we stop the search at the widest type to limit cross-register shuffles. It
>> is very difficult to model the performance impact of cross-register
>> shuffles (both the quality of the shuffles that we generate and the
>> performance impact of using multiple registers on register pressure).
>>
>> Using the narrowest element type would increase the in-register
>> utilization but decrease the utilization of execution units due to
>> unrolling, so I am not sure what we are gaining here (except for extra
>> shuffles).
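
To make the trade-off above concrete, here is a minimal sketch of how the two policies differ. It is not code from the patch or from the loop vectorizer: the helper names (vfForElement, chooseVF, regsPerValue) are hypothetical, and it assumes the vectorization factor is simply the register width divided by the chosen element width.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): the vectorization factor implied by
// packing elements of ElemBits into one register of RegBits.
unsigned vfForElement(unsigned RegBits, unsigned ElemBits) {
  return std::max(1u, RegBits / ElemBits);
}

struct VFChoice {
  unsigned WidestTypeVF;    // policy that stops the search at the widest type
  unsigned NarrowestTypeVF; // policy proposed in this thread
};

// ElemBits holds the bit widths of the element types used in the loop.
VFChoice chooseVF(unsigned RegBits, const std::vector<unsigned> &ElemBits) {
  assert(!ElemBits.empty());
  unsigned Narrowest = *std::min_element(ElemBits.begin(), ElemBits.end());
  unsigned Widest = *std::max_element(ElemBits.begin(), ElemBits.end());
  return {vfForElement(RegBits, Widest), vfForElement(RegBits, Narrowest)};
}

// Number of physical vector registers one vectorized value occupies at a
// given VF, rounded up. Values above 1 are the "oversize" vectors that the
// backend has to split.
unsigned regsPerValue(unsigned RegBits, unsigned ElemBits, unsigned VF) {
  return (ElemBits * VF + RegBits - 1) / RegBits;
}
```

Under these assumptions, a loop mixing i8 and i32 on a 128-bit target gets VF 4 from the widest type and VF 16 from the narrowest type. At VF 16 each i8 value fills exactly one register, while each i32 value spans four registers, which is where the cross-register shuffles and the extra register pressure come from.
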
>>
>> >
>> > If I understand this correctly, this will cause us to potentially
>> generate wider vectors than we have underlying vector registers, and I
>> think that, generically, this makes sense. Now that our X86 shuffle
>> handling is sane, the splitting of wide vectors, and shuffling that you get
>> from vector extends/truncates is hopefully not too bad. Other opinions?
>> >
>> > Did you see any performance changes on the test suite?
>> >
>> > We might need to update the register-pressure heuristic
>> (LoopVectorizationCostModel::calculateRegisterUsage()) to understand that
>> very-wide vectors use multiple vector registers.
>>
>> I agree.  Additionally, I think that we should rewrite this code. I made
>> the mistake of calculating register pressure by scanning the code forward
>> (start to end) instead of backwards, which made the code unnecessarily
>> complicated (liveness should be calculated by scanning the basic block
>> backwards).
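
The backward scan described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the code in LoopVectorizationCostModel::calculateRegisterUsage(): the Inst struct and maxLiveValues are hypothetical, the block is straight-line, every value counts as one register, and a value that is defined but never used is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: it defines at most one value and
// reads some values, all identified by integer ids.
struct Inst {
  int Def;               // id of the value defined, or -1 if none
  std::vector<int> Uses; // ids of the values read
};

// Walk the block from the last instruction to the first, maintaining the set
// of live values. Live starts as the set of values live out of the block.
// Returns the largest number of simultaneously live values seen.
unsigned maxLiveValues(const std::vector<Inst> &Block, std::set<int> Live) {
  unsigned MaxLive = static_cast<unsigned>(Live.size());
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (It->Def >= 0)
      Live.erase(It->Def); // a value is not live above its definition
    Live.insert(It->Uses.begin(), It->Uses.end()); // operands are live above
    MaxLive = std::max(MaxLive, static_cast<unsigned>(Live.size()));
  }
  return MaxLive;
}
```

Scanning backwards means each step only needs the live set of the point below it, so there is no need to precompute where each value's last use is. To account for very wide vectors, each live value could be weighted by the number of registers it occupies instead of counting as one.
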
>>
>> >
>> >
>> > http://reviews.llvm.org/D8943
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>