[llvm-dev] [RFC] Supporting ARM's SVE in LLVM

Sun Nov 27 07:40:46 PST 2016

On 27 November 2016 at 15:35, C Bergström <cbergstrom at pathscale.com> wrote:
> While the VL can vary.. in practice wouldn't the cost of vectorization
> and width be tied more to the hardware implementation than anything
> else? The cost of vectorizing thread 1 vs 2 isn't likely to change?
> (Am I drunk and mistaken?)

Mistaken. :)

The scale of the vector can differ between two processes on the same
machine, and it's up to the kernel (I guess) to make sure each process
sees the right one.

In theory, it could even change within the same process, for instance
as a result of PGO or if some loops have fewer loop-carried
dependencies than others.

The three important premises are:

1. The vectorizer still has the duty to restrict the vector length to
whatever the loop dependencies allow. SVE *has* to be able to cope
with that by restricting the number of lanes "per access".

2. The cost analysis will have to assume the smallest possible vector
size and "hope" that anything larger will only mean profit. This seems
straightforward enough.

3. Hardware flags and target features must be able to override the
minimum size, maximum size, etc. and it's up to the users to make sure
that's meaningful on their hardware.
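
To make the three premises a bit more concrete, here is a rough sketch
of how they could fit together. Every name in it (VLBounds,
applyOverrides, lanesForCost, activeLanes) is made up for illustration
and none of them are LLVM APIs. The 128-bit and 2048-bit defaults are
the SVE architectural limits, and using 0 to mean "no override" or "no
dependence limit" is just a convention of this sketch.

```cpp
#include <algorithm>

// Hypothetical bounds on the vector length, in bits.
struct VLBounds {
  unsigned MinBits = 128;  // SVE architectural minimum
  unsigned MaxBits = 2048; // SVE architectural maximum
};

// Premise 3: flags / target features may override the bounds.
// A value of 0 means "no override" (sketch convention only).
inline VLBounds applyOverrides(VLBounds B, unsigned MinOverride,
                               unsigned MaxOverride) {
  if (MinOverride != 0)
    B.MinBits = MinOverride;
  if (MaxOverride != 0)
    B.MaxBits = MaxOverride;
  return B;
}

// Premise 2: the cost analysis assumes the smallest possible vector,
// so it counts lanes against MinBits only.
inline unsigned lanesForCost(const VLBounds &B, unsigned ElemBits) {
  return B.MinBits / ElemBits;
}

// Premise 1: the number of lanes used "per access" is capped by the
// loop-carried dependence distance (in elements), whatever the
// hardware offers. A distance of 0 means "no dependence limit".
inline unsigned activeLanes(unsigned HardwareLanes,
                            unsigned MaxSafeDistance) {
  if (MaxSafeDistance == 0)
    return HardwareLanes;
  return std::min(HardwareLanes, MaxSafeDistance);
}
```

With the defaults, a loop over 32-bit elements would be costed at 4
lanes. Overriding the minimum to 256 bits would cost it at 8, and a
dependence distance of 3 would cap the lanes per access at 3 even on a
16-lane implementation.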

cheers,
--renato