[llvm-dev] enabling interleaved access loop vectorization
Demikhovsky, Elena via llvm-dev
llvm-dev at lists.llvm.org
Thu May 26 12:35:15 PDT 2016
Interleaved access is not enabled on X86 yet.
We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles, and the number of shuffles in turn depends on the permutation (the shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in place: the vectorizer produces a long sequence of "extracts" and "inserts" that will hopefully be combined into shuffles by a later instcombine pass.
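To make "loads + shuffles" concrete, here is a rough scalar sketch in C++ of de-interleaving one wide load into its members. The names strideMask, shuffle and deinterleave are hypothetical helpers written for illustration; this is not the vectorizer's code and it says nothing about real X86 shuffle costs. It only shows that each member of an interleave group needs its own permutation mask, which is why the shuffle count grows with the interleave factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices that select member `start` from a group
// interleaved with the given factor (start=0, factor=2, vf=4 gives 0,2,4,6).
std::vector<int> strideMask(int start, int factor, int vf) {
    assert(factor > 0 && start >= 0 && start < factor && vf >= 0);
    std::vector<int> mask;
    mask.reserve(static_cast<std::size_t>(vf));
    for (int i = 0; i < vf; ++i)
        mask.push_back(start + i * factor);
    return mask;
}

// Scalar emulation of a single-source vector shuffle: out[i] = wide[mask[i]].
std::vector<float> shuffle(const std::vector<float>& wide,
                           const std::vector<int>& mask) {
    std::vector<float> out;
    out.reserve(mask.size());
    for (int idx : mask) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < wide.size());
        out.push_back(wide[static_cast<std::size_t>(idx)]);
    }
    return out;
}

// Split one wide "load" into `factor` members, one shuffle per member.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& wide,
                                             int factor) {
    assert(factor > 0 && wide.size() % static_cast<std::size_t>(factor) == 0);
    const int vf = static_cast<int>(wide.size()) / factor;
    std::vector<std::vector<float>> members;
    for (int m = 0; m < factor; ++m)
        members.push_back(shuffle(wide, strideMask(m, factor, vf)));
    return members;
}
```

For a stride-2 access such as a[2*i] and a[2*i+1], calling deinterleave on a wide load of 8 elements with factor 2 yields two 4-element members, at the price of two shuffles. How many machine instructions each of those masks turns into on a given X86 subtarget is exactly the part a cost model would have to estimate.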
>From: Renato Golin [mailto:renato.golin at linaro.org]
>Sent: Thursday, May 26, 2016 21:25
>To: Sanjay Patel <spatel at rotateright.com>; Demikhovsky, Elena
><elena.demikhovsky at intel.com>
>Cc: llvm-dev <llvm-dev at lists.llvm.org>
>Subject: Re: [llvm-dev] enabling interleaved access loop vectorization
>On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Is there a compile-time and/or potential runtime cost that makes
>> enableInterleavedAccessVectorization() default to 'false'?
>> I notice that this is set to true for ARM, AArch64, and PPC.
>> In particular, I'm wondering if there's a reason it's not enabled for
>> x86 in relation to PR27881:
>The feature was originally developed for ARM's VLDn/VSTn instructions
>and then extended to AArch64 and PPC, but not x86/64 yet.
>I believe Elena was working on that, but needed to get the scatter/gather
>intrinsics working first. I just copied her in case I'm wrong. :)