[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Thu Nov 2 19:04:48 PDT 2017

On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hello all,
>
>
>
> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
> command line flags supported by latest GCC to clang. These flags will be
> used to limit the vector register size presented by TTI to the vectorizers.
> The backend will still be able to use wider registers for code written
> using the intrinsics in x86intrin.h. And the backend will still be able to
> use AVX512VL instructions and the additional XMM16-31 and YMM16-31
> registers.
>
>
>
> Motivation:
>
> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
> frequency that may offset the gains from using the wider register size. See
> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference
> Manual published October 2017.
>

I note the doc mentions that 256-bit AVX operations also have the same
issue with reducing the CPU frequency, which is nice to see documented!

There are also the issues discussed here
<http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related
to warm-up time for the 256-bit execution pipeline, which is another problem
with using wide-vector ops.

> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
> are only 256 bits wide. 512-bit instructions using these ALUs must use both
> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization
> Reference Manual published October 2017.
>

> Implementation Plan:
>
> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not
> mapped to any CPU.
>
> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe
> this will allow clang to pass these straight through to the -target-feature
> attribute in IR.
>
> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is
> enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
> return 256 if AVX is enabled and prefer-avx128 is not set.
>

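For reference, the width selection described in that last bullet can be
sketched roughly as below. This is only an illustration: `Features` and
`vectorRegisterBitWidth` are hypothetical stand-ins, not the actual
X86Subtarget or X86TTIImpl code, and the 128-bit fallback is an assumption
the proposal does not spell out.

```cpp
// Hypothetical stand-in for the relevant subtarget feature bits.
// These names are illustrative and are not LLVM's X86Subtarget API.
struct Features {
  bool HasAVX512 = false;
  bool HasAVX = false;
  bool PreferAVX256 = false;
  bool PreferAVX128 = false;
};

// Vector register width, in bits, that would be reported to the vectorizers.
unsigned vectorRegisterBitWidth(const Features &F) {
  // 512 only if AVX512 is enabled and neither "prefer" feature is set.
  if (F.HasAVX512 && !F.PreferAVX256 && !F.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (F.HasAVX && !F.PreferAVX128)
    return 256;
  // Assumption: everything else falls back to 128-bit (SSE) registers.
  return 128;
}
```

Note that in this sketch prefer-avx128 overrides prefer-avx256 when both are
set, which is one of the interactions a user would have to know about.
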
Instead of multiple flags whose interactions are difficult to understand,
one flag with a value would be better. E.g., what should
"-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the
answer, it's confusing (and similarly for other such combinations). A
single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier
to understand to me, keeping the same behavior as you mention: asking to
prefer a larger width than your architecture supports should be fine
but ignored.
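
To make the single-flag idea concrete, here is a rough sketch of how such an
option could resolve to a width. It is illustrative only and is not clang
driver code: `preferredVectorWidth` is a hypothetical helper, and both
"the last occurrence wins" and "unrecognized values are ignored" are
assumptions of the sketch rather than anything decided in this thread.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: resolves the suggested "-mprefer-avx=N" spelling to a
// vector width in bits, clamped to what the target supports.
unsigned preferredVectorWidth(const std::vector<std::string> &Args,
                              unsigned MaxSupportedWidth) {
  const std::string Prefix = "-mprefer-avx=";
  unsigned Preferred = MaxSupportedWidth; // No flag: use the widest available.
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) != 0)
      continue; // Not the option we are looking for.
    const std::string Value = Arg.substr(Prefix.size());
    // Assumption: a later occurrence overrides an earlier one.
    if (Value == "128")
      Preferred = 128;
    else if (Value == "256")
      Preferred = 256;
    else if (Value == "512")
      Preferred = 512;
    // Assumption: unrecognized values are silently ignored here.
  }
  // Asking for more than the architecture supports is accepted but ignored.
  return std::min(Preferred, MaxSupportedWidth);
}
```

With one valued option there is a single number to reason about, and
repeating the flag has an obvious reading.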

> There may be some other backend changes needed, but I plan to address those
> as we find them.
>
>
> At a later point, consider making -mprefer-avx256 the default for Skylake
> Server due to the above mentioned performance considerations.
>

> Does this sound reasonable?
>
>
>
> *Latest Intel Optimization manual available here:
> https://software.intel.com/en-us/articles/intel-sdm#optimization
>
>
> -Craig Topper