[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sun Nov 27 07:58:04 PST 2016

On Sun, Nov 27, 2016 at 11:40 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 27 November 2016 at 15:35, C Bergström <cbergstrom at pathscale.com> wrote:
>> While the VL can vary, in practice wouldn't the cost of vectorization
>> and width be tied more to the hardware implementation than anything
>> else? The cost of vectorizing thread 1 vs 2 isn't likely to change?
>> (Am I drunk and mistaken?)
>
> Mistaken. :)
>
> The scale of the vector can change between two processes on the same
> machine and it's up to the kernel (I guess) to make sure they're
> correct.
>
> In theory, it could even change in the same process, for instance, as
> a result of PGO or if some loops have fewer loop-carried
> dependencies than others.
>
> The three important premises are:
>
> 1. The vectorizer still has the duty to restrict the vector length to
> whatever makes it cope with the loop dependencies. SVE *has* to be
> able to cope with that by restricting the number of lanes "per
> access".
>
> 2. The cost analysis will have to assume the smallest possible vector
> size and "hope" that anything larger will only mean profit. This seems
> straightforward enough.
>
> 3. Hardware flags and target features must be able to override the
> minimum size, maximum size, etc. and it's up to the users to make sure
> that's meaningful in their hardware.
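
To make premise 1 concrete, here is a minimal sketch in plain Python. It
is not SVE code and not a description of what LLVM does. It models a
loop with a loop-carried dependence of distance d, run on hardware whose
lane count is only known at run time. The function names, the hw_lanes
parameter and the example loop are all assumptions made up for
illustration.

```python
def scalar_loop(a, d):
    """Reference semantics: a[i] = a[i - d] + 1 (dependence distance d)."""
    out = list(a)
    for i in range(d, len(out)):
        out[i] = out[i - d] + 1
    return out


def vector_loop(a, d, hw_lanes):
    """Same loop, processed in chunks the way a vector unit would.

    hw_lanes (hypothetical) stands in for the hardware vector length,
    which is unknown when the code is generated. The number of active
    lanes per access is clamped to the dependence distance d, so no lane
    reads an element that another lane in the same chunk still has to
    write.
    """
    out = list(a)
    i = d
    while i < len(out):
        # Active lanes: limited by hardware, by the dependence, and by
        # the number of elements left (the loop tail).
        active = min(hw_lanes, d, len(out) - i)
        # Vector semantics: every load happens before any store.
        loaded = [out[i - d + k] for k in range(active)]
        for k in range(active):
            out[i + k] = loaded[k] + 1
        i += active
    return out


if __name__ == "__main__":
    data = [0] * 20
    for lanes in (2, 4, 8, 16):
        assert vector_loop(data, 3, lanes) == scalar_loop(data, 3)
```

With the clamp in place the result is the same for every lane count;
wider hardware just stops helping once hw_lanes exceeds d.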

I'll bite my tongue on negative comments, but it seems that for
anything other than trivial loops this is going to put the burden
entirely on the user. Are you telling me the *kernel* is really going
to be able to make these decisions on the fly, correctly?

Won't this block loop transformations?