[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Fri Nov 3 07:19:21 PDT 2017

I want to focus on just the optimizer/backend part.

Why make this an x86-specific "feature" of the target? We already have
options like this in LoopVectorize.cpp:

static cl::opt<unsigned> ForceTargetNumVectorRegs(
    "force-target-num-vector-regs", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's number of vector registers."));

Can we add an equivalent target-independent override for vector width? Any
target with more than one potential vector register width would benefit
from having this option for experimentation.
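To make the suggestion concrete, here is a minimal C++ sketch of how such an override could interact with the width a target reports. The option name "force-target-vector-width" and the functions effectiveRegisterBitWidth and maxVF are hypothetical. A plain global stands in for cl::opt so the snippet compiles without LLVM, and maxVF only approximates how the loop vectorizer bounds its vectorization factor by dividing the register width by the widest element type.

```cpp
#include <cassert>

// Hypothetical stand-in for a cl::opt<unsigned> such as
// "force-target-vector-width" (not an existing LLVM option).
// As with ForceTargetNumVectorRegs, 0 means "no override".
static unsigned ForceTargetVectorWidth = 0;

// Register width the vectorizer would see: the forced value if one was
// given, otherwise whatever the target's TTI reports.
unsigned effectiveRegisterBitWidth(unsigned TargetBitWidth) {
  return ForceTargetVectorWidth != 0 ? ForceTargetVectorWidth
                                     : TargetBitWidth;
}

// Rough upper bound on the vectorization factor: how many elements of the
// widest scalar type in the loop fit in one vector register.
unsigned maxVF(unsigned RegisterBitWidth, unsigned WidestTypeBits) {
  assert(WidestTypeBits != 0 && "element type must have a size");
  return RegisterBitWidth / WidestTypeBits;
}
```

For a loop over 32-bit floats on a target that reports 512-bit registers, maxVF(effectiveRegisterBitWidth(512), 32) is 16 with no override and 8 once ForceTargetVectorWidth is set to 256. Because the override lives outside any one target, the same knob would work for every target with more than one register width.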

On Thu, Nov 2, 2017 at 4:44 PM, Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Reviews of the initial plumbing have been posted:
>
> https://reviews.llvm.org/D39575
> https://reviews.llvm.org/D39576
>
> ~Craig
>
> On Thu, Nov 2, 2017 at 4:57 AM, Tobias Grosser <tobias.grosser at inf.ethz.ch> wrote:
>
>> Hi Craig,
>>
>> this sounds like a good idea.
>>
>> Best,
>> Tobias
>>
>> On Thu, Nov 2, 2017, at 00:35, Craig Topper via llvm-dev wrote:
>> > Hello all,
>> >
>> > I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
>> > command line flags supported by the latest GCC to clang. These flags will
>> > be used to limit the vector register size presented by TTI to the
>> > vectorizers. The backend will still be able to use wider registers for
>> > code written using the intrinsics in x86intrin.h, and it will still be
>> > able to use AVX512VL instructions and the additional XMM16-31 and
>> > YMM16-31 registers.
>> >
>> > Motivation:
>> >
>> > -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
>> > frequency that may offset the gains from using the wider register size.
>> > See section 15.26 of the Intel® 64 and IA-32 Architectures Optimization
>> > Reference Manual published October 2017.
>> >
>> > -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
>> > are only 256 bits wide, so 512-bit instructions using these ALUs must
>> > use both ports. See section 2.1 of the Intel® 64 and IA-32 Architectures
>> > Optimization Reference Manual published October 2017.
>> >
>> > Implementation Plan:
>> >
>> > -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td, not
>> > mapped to any CPU.
>> >
>> > -Add -mprefer-avx256 and -mprefer-avx128 and the corresponding
>> > -mno-prefer-avx128/256 options to clang's driver Options.td file. I
>> > believe this will allow clang to pass these straight through via
>> > -target-feature to the "target-features" function attribute in IR.
>> >
>> > -Modify X86TTIImpl::getRegisterBitWidth to return 512 only if AVX512 is
>> > enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
>> > return 256 only if AVX is enabled and prefer-avx128 is not set.
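A minimal C++ sketch of the getRegisterBitWidth rule described above. The X86Features struct and vectorRegisterBitWidth function are hypothetical stand-ins, not the real X86Subtarget or X86TTIImpl code, and the final 128-bit SSE fallback (0 when there are no vector registers) is an assumption about the existing behaviour rather than part of the proposal.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature bits the proposal
// would consult; these are not the real X86Subtarget accessors.
struct X86Features {
  bool HasSSE1 = false;
  bool HasAVX = false;
  bool HasAVX512 = false;
  bool PreferAVX256 = false; // would be set by -mprefer-avx256 (proposed)
  bool PreferAVX128 = false; // would be set by -mprefer-avx128 (proposed)
};

// Sketch of the proposed vector register width selection.
unsigned vectorRegisterBitWidth(const X86Features &F) {
  // The fall-through below relies on each ISA level implying the previous one.
  assert((!F.HasAVX512 || F.HasAVX) && "AVX-512 implies AVX");
  assert((!F.HasAVX || F.HasSSE1) && "AVX implies SSE");

  // 512 only if AVX-512 is available and no narrower width is preferred.
  if (F.HasAVX512 && !F.PreferAVX256 && !F.PreferAVX128)
    return 512;
  // 256 if AVX is available and 128-bit vectors are not preferred.
  if (F.HasAVX && !F.PreferAVX128)
    return 256;
  // Assumed fallback: 128 bits with SSE, otherwise no vector registers.
  if (F.HasSSE1)
    return 128;
  return 0;
}
```

Because AVX-512 implies AVX, a Skylake Server configuration with prefer-avx256 falls through to the 256-bit case, and one with prefer-avx128 falls through to 128 bits, without the code having to special-case each combination.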
>> >
>> > There may be some other backend changes needed, but I plan to address
>> > those as we find them.
>> >
>> > At a later point, we could consider making -mprefer-avx256 the default
>> > for Skylake Server due to the above-mentioned performance considerations.
>> >
>> > Does this sound reasonable?
>> >
>> > *The latest Intel Optimization Reference Manual is available here:
>> > https://software.intel.com/en-us/articles/intel-sdm#optimization
>> >
>> > -Craig Topper
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>