[llvm-dev] enabling interleaved access loop vectorization

Wed Aug 10 16:32:08 PDT 2016

So, unfortunately, it turns out I don't have access to DENBench.

Do you happen to have a reduced example that gets pessimized by this?

On Tue, Aug 9, 2016 at 11:25 AM, Michael Kuperstein <mkuper at google.com>
wrote:

> Thanks Ayal!
>
> I'll take a look at DENBench.
>
> As another data point - I tried enabling this on our internal benchmarks.
> I'm seeing one regression, and it seems to be a regression of the "good"
> kind - without interleaving we don't vectorize the innermost loop, and with
> interleaving we do. The vectorized loop is actually significantly faster
> when benchmarked in isolation, but in this specific instance, the static
> loop count is unknown, and the dynamic loop count happens to almost always
> be 1 - and this lives inside a hot outer loop.
> That's something we ought to be handling through PGO (or, conceivably,
> outer loop vectorization :-) ).
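
The shape of the regression described above can be sketched as follows. This is a hypothetical reduction, not the internal benchmark itself: the `Pair` struct, the function name and the `starts`/`counts` arrays are all assumptions made for illustration. The inner loop has stride-2 interleaved accesses and a trip count known only at run time, and it sits inside a hot outer loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interleaved layout: members a and b alternate in memory.
struct Pair {
  int a;
  int b;
};

// Hypothetical hot outer loop over rows. counts[r] is unknown statically;
// in the scenario described above it is almost always 1 at run time.
long sumProducts(const std::vector<Pair> &data,
                 const std::vector<std::size_t> &starts,
                 const std::vector<std::size_t> &counts) {
  long total = 0;
  for (std::size_t r = 0; r < starts.size(); ++r) {
    const Pair *row = data.data() + starts[r];
    // Inner loop: stride-2 loads of a and b form an interleave group.
    // A vectorized version pays for its trip-count checks on every
    // outer iteration, even when only one scalar iteration runs.
    for (std::size_t i = 0; i < counts[r]; ++i)
      total += static_cast<long>(row[i].a) * row[i].b;
  }
  return total;
}
```

With a dynamic trip count of 1, any vector body generated for the inner loop is rarely entered, so the extra checks around it are likely pure overhead, which is why profile information would help here.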
>
> Michael
>
> On Mon, Aug 8, 2016 at 3:21 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
>
>> > We also need to understand what to do with edge elements in the vector
>> > if their loading is not required. We, probably, should issue a masked load
>> > in this case.
>>
>>
>>
>> The existing code handles the edge case where the last member of an
>> InterleaveGroup is absent by making sure the last iteration (and up to the
>> last VF iterations) is peeled and executed as scalar code; see
>> requiresScalarEpilogue.
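
A minimal sketch of the idea, hand-written in C++ rather than taken from the vectorizer: the function name, `VF = 4` and the RGB layout are assumptions for illustration only. The loop reads R and G but never B, so the interleave group has a gap at its end, and a wide load covering a whole chunk would also touch the last pixel's B. Leaving at least one iteration to a scalar epilogue keeps that wide load inside the memory the scalar loop would have read anyway.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t VF = 4;      // hypothetical vectorization factor
constexpr std::size_t Stride = 3;  // R, G, B interleaved

// Computes R + G per pixel. B is never used, so the group {R, G}
// has a gap in its last position.
std::vector<int> sumRG(const int *rgb, std::size_t n) {
  std::vector<int> out(n);
  std::size_t i = 0;
  // Stand-in for the vector body: one wide load of VF * Stride elements,
  // which includes the unused B of the chunk's last pixel. The strict
  // "<" always leaves at least one iteration for the scalar epilogue.
  for (; i + VF < n; i += VF) {
    int wide[VF * Stride];
    std::memcpy(wide, rgb + i * Stride, sizeof(wide));
    for (std::size_t lane = 0; lane < VF; ++lane)
      out[i + lane] = wide[lane * Stride] + wide[lane * Stride + 1];
  }
  // Scalar epilogue: touches only R and G, so it never reads the
  // possibly absent B of the final pixel.
  for (; i < n; ++i)
    out[i] = rgb[i * Stride] + rgb[i * Stride + 1];
  return out;
}
```

In this sketch the buffer may legally end right after the final pixel's G: the last wide load ends at element `(n - 1) * Stride - 1` at the latest.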
>>
>> > All regressions that we see are in 32-bit mode.
>>
>>
>>
>> One place to find them, using the default BaseT::getInterleavedMemoryOpCost(),
>> is DENBench’s RGB conversions.
>>
>>
>>
>> Ayal.
>>
>>
>>
>> *From:* Demikhovsky, Elena
>> *Sent:* Monday, August 08, 2016 00:09
>> *To:* Michael Kuperstein <mkuper at google.com>; Renato Golin <
>> renato.golin at linaro.org>
>> *Cc:* Matthew Simpson <mssimpso at codeaurora.org>; Nema, Ashutosh <
>> Ashutosh.Nema at amd.com>; Sanjay Patel <spatel at rotateright.com>; llvm-dev <
>> llvm-dev at lists.llvm.org>; Zaks, Ayal <ayal.zaks at intel.com>
>> *Subject:* RE: [llvm-dev] enabling interleaved access loop vectorization
>>
>>
>>
>> We checked the gathered data again. All regressions that we see are in
>> 32-bit mode. The 64-bit mode looks good overall.
>>
>>
>>
>> - Elena
>>
>>
>>
>> *From:* Michael Kuperstein [mailto:mkuper at google.com]
>>
>> *Sent:* Saturday, August 06, 2016 02:56
>> *To:* Renato Golin <renato.golin at linaro.org>
>> *Cc:* Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <
>> mssimpso at codeaurora.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Sanjay
>> Patel <spatel at rotateright.com>; llvm-dev <llvm-dev at lists.llvm.org>;
>> Zaks, Ayal <ayal.zaks at intel.com>
>> *Subject:* Re: [llvm-dev] enabling interleaved access loop vectorization
>>
>> On Fri, Aug 5, 2016 at 4:37 PM, Renato Golin <renato.golin at linaro.org>
>> wrote:
>>
>> On 6 August 2016 at 00:18, Michael Kuperstein <mkuper at google.com> wrote:
>> > I agree that we can get *more* improvement with better cost modeling,
>> > but I'd expect to be able to get *some* improvement the way things are
>> > right now.
>>
>> Elena said she saw "some" improvements. :)
>>
>>
>>
>> I didn't mean "some improvements, some regressions", I meant "some of the
>> improvement we'd expect from the full solution". :-)
>>
>> > That's why I'm curious about where we saw regressions - I'm wondering
>> > whether there's really a significant cost modeling issue I'm missing, or
>> > it's something that's easy to fix so that we can make forward progress,
>> > while Ashutosh is working on the longer-term solution.
>>
>> Sounds like a task to try a few patterns and fiddle with the cost model.
>>
>> Arnold did a lot of those during the first months of the vectorizer,
>> so it might be just a matter of finding the right heuristics, at least
>> for the low-hanging fruit.
>>
>> Of course, that'd also involve benchmarking everything else, to make
>> sure the new heuristics don't introduce regressions on
>> non-interleaved vectorisation.
>>
>>
>>
>> I don't disagree with you.
>>
>>
>>
>> All I'm saying is that before fiddling with the heuristics, it'd be good
>> to understand what exactly breaks if we simply flip the flag. If the answer
>> happens to be "nothing" - well, problem solved. Unfortunately, according to
>> Elena, that's not the answer.
>>
>> I'm going to play with it with our internal benchmarks, but it's my
>> understanding that Elena/Ayal already have some idea of what the problems
>> are.
>>
>>
>>