Hello all,

I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
command line flags supported by the latest GCC to clang. These flags will be
used to limit the vector register size presented by TTI to the vectorizers.
The backend will still be able to use wider registers for code written
using the intrinsics in x86intrin.h. The backend will also still be able to
use AVX512VL instructions and the additional XMM16-31 and YMM16-31
registers.

Motivation:

-Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
frequency that may offset the gains from using the wider register size. See
section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference
Manual published October 2017.

-The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
are only 256 bits wide. 512-bit instructions using these ALUs must use both
ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization
Reference Manual published October 2017.

Implementation Plan:

-Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not
mapped to any CPU.

-Add -mprefer-avx256 and -mprefer-avx128 and the corresponding
-mno-prefer-avx128/256 options to clang's driver Options.td file. I believe
this will allow clang to pass these straight through to the -target-feature
attribute in IR.

-Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is
enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
return 256 only if AVX is enabled and prefer-avx128 is not set.
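
To make the intended getRegisterBitWidth behavior concrete, here is a
minimal standalone sketch of the selection logic described in the last
item. It is not the actual X86TTIImpl code: the SubtargetSketch struct, its
field names, and the vectorRegisterBitWidth function are hypothetical
stand-ins, and the 128-bit fallback is an assumption, since the plan above
only spells out the 512 and 256 cases.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature queries; the real
// X86Subtarget interface is not reproduced here.
struct SubtargetSketch {
  bool HasAVX512 = false;    // AVX512 enabled
  bool HasAVX = false;       // AVX enabled
  bool PreferAVX256 = false; // proposed prefer-avx256 feature
  bool PreferAVX128 = false; // proposed prefer-avx128 feature
};

// Vector register width, in bits, that would be presented to the
// vectorizers under the proposal.
unsigned vectorRegisterBitWidth(const SubtargetSketch &ST) {
  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 only if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumption: everything else falls back to 128-bit XMM registers.
  return 128;
}
```

In this sketch, a target with AVX512 and prefer-avx256 set reports 256, and
setting prefer-avx128 reports 128 regardless of which AVX level is enabled.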

There may be some other backend changes needed, but I plan to address those
as we find them.

At a later point, we could consider making -mprefer-avx256 the default for
Skylake Server due to the above-mentioned performance considerations.

Does this sound reasonable?

*Latest Intel Optimization manual available here:

-Craig Topper
