[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Sat Nov 11 08:25:29 PST 2017

Yes - I was thinking of FeatureFastScalarFSQRT / FeatureFastVectorFSQRT
which are used by isFsqrtCheap(). These were added to override the default
x86 sqrt estimate codegen with:
https://reviews.llvm.org/D21379

But I'm not sure we really need that kind of hack. Can we adjust the
attribute in clang based on the target cpu? I.e., if you have something like:
$ clang -O2 -march=skylake-avx512 foo.c

Then you can detect that in the clang driver and pass
-mprefer-vector-width=256 to clang codegen as an option? Clang codegen then
adds that function attribute to everything it outputs. Then, the
vectorizers and/or backend detect that attribute and adjust their behavior
based on it.
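
A rough sketch of that flow, written as standalone C++ rather than against
the real clang or LLVM APIs. The helper names and the FunctionAttrs map are
hypothetical stand-ins; only the CPU name skylake-avx512 and the option
spelling -mprefer-vector-width come from this thread, and the attribute key
"prefer-vector-width" is an assumption that simply mirrors the option name.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the string attributes attached to an IR function.
using FunctionAttrs = std::map<std::string, std::string>;

// Driver side: per-CPU default. nullopt means "no preference for this CPU".
// The 256 default for skylake-avx512 is the one proposed in this thread.
std::optional<unsigned> defaultPreferredWidth(const std::string &cpu) {
  if (cpu == "skylake-avx512")
    return 256;
  return std::nullopt;
}

// Driver side: an explicit user option overrides the CPU default.
std::optional<unsigned> resolvePreferredWidth(const std::string &cpu,
                                              std::optional<unsigned> userWidth) {
  if (userWidth)
    return userWidth;
  return defaultPreferredWidth(cpu);
}

// Codegen side: record the preference on a function being emitted.
void addPreferenceAttr(FunctionAttrs &attrs, std::optional<unsigned> width) {
  if (!width)
    return; // No preference: leave the function untouched.
  assert(*width > 0 && "a preferred width must be positive");
  attrs["prefer-vector-width"] = std::to_string(*width);
}

// Optimizer side: read the preference back; absent means no preference.
std::optional<unsigned> readPreferenceAttr(const FunctionAttrs &attrs) {
  auto it = attrs.find("prefer-vector-width");
  if (it == attrs.end())
    return std::nullopt;
  return static_cast<unsigned>(std::stoul(it->second));
}
```

The point of the split is that the CPU-specific default lives only in the
driver-side helper; everything downstream sees a plain attribute and does
not need to know which CPU produced it.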

So I don't think we should be messing with any kind of type legality
checking because that stuff should all be correct already. We're just
choosing a vector size based on a pref. I think we should even allow the
pref to go bigger than a legal type. This came up somewhere on llvm-dev or
in a bug recently in the context of vector reductions.
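
To make the split between preference and legality concrete, here is a
minimal sketch. Both helpers are hypothetical and are not existing LLVM
functions. Legality is an input that is never modified; the preference only
picks the width the vectorizer aims for. A preference wider than the widest
legal register is passed through unchanged, on the assumption that the
backend splits the operation into legal pieces later.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper. widestLegalBits is what the target reports as legal;
// preferredBits is the optional hint. Legality itself is never changed here.
unsigned chooseVectorWidth(unsigned widestLegalBits,
                           std::optional<unsigned> preferredBits) {
  assert(widestLegalBits > 0 && "target must report a legal vector width");
  // No hint: fall back to the widest legal width.
  if (!preferredBits)
    return widestLegalBits;
  // Hint present: honor it as given, whether narrower or wider than legal.
  return *preferredBits;
}

// Number of legal-width operations needed to cover one chosen-width vector,
// assuming the legalizer splits evenly (ceiling division).
unsigned legalOpsPerVector(unsigned chosenBits, unsigned widestLegalBits) {
  assert(widestLegalBits > 0 && "target must report a legal vector width");
  return (chosenBits + widestLegalBits - 1) / widestLegalBits;
}
```

For example, with a 256-bit legal width and a 512-bit preference, this
sketch chooses 512-bit vectors that each split into two legal operations.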

On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper <craig.topper at gmail.com>
wrote:

> Are you referring to the X86TargetLowering::isFsqrtCheap hook?
>
> ~Craig
>
> On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel at rotateright.com>
> wrote:
>
>> We can tie a user preference / override to a CPU model. We do something
>> like that for square root estimates already (although it does use a
>> SubtargetFeature currently for x86; ideally, we'd key that off of something
>> in the CPU scheduler model).
>>
>>
>> On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> I agree that a less x86 specific command line makes sense. I've been
>>> having internal discussions with gcc folks and they're evaluating
>>> switching to something like -mprefer-vector-width=128/256/512/none
>>>
>>> Based on the current performance data we're seeing, we think we need to
>>> ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go
>>> with a target independent option/implementation is there some way we could
>>> still affect the default behavior in a target specific way?
>>>
>>> ~Craig
>>>
>>> On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <spatel at rotateright.com>
>>> wrote:
>>>
>>>> It's clear from the Intel docs how this has evolved, but from a
>>>> compiler perspective, this isn't a Skylake "feature" :) ... nor an Intel
>>>> feature, nor an x86 feature.
>>>>
>>>> It's a generic programmer hint for any target with multiple potential
>>>> vector lengths.
>>>>
>>>> On x86, there's already a potential use case for this hint with a
>>>> different starting motivation: re-vectorization. That's where we take C
>>>> code that uses 128-bit vector intrinsics and selectively widen it to 256-
>>>> or 512-bit vector ops based on a newer CPU target than the code was
>>>> originally written for.
>>>>
>>>> I think it's just a matter of time before a customer requests the same
>>>> ability for another target (maybe they already have and I don't know about
>>>> it). So we should have a solution that recognizes that possibility.
>>>>
>>>> Note that having a target-independent implementation in the optimizer
>>>> doesn't preclude a flag alias in clang to maintain compatibility with gcc.
>>>>
>>>>
>>>>
>>>> On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
>>>>> > That's a very good point about the ordering of the command line
>>>>> options.
>>>>> > gcc's current implementation treats -mprefer-avx256 as "prefer 256
>>>>> over
>>>>> > 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird
>>>>> for
>>>>> > other reasons, but has less of an ordering ambiguity.
>>>>> >
>>>>> > -mprefer-avx128 has been in gcc for many years and predates the
>>>>> creation
>>>>> > of
>>>>> > avx512. -mprefer-avx256 was added a couple months ago.
>>>>> >
>>>>> > We've had an internal conversation with the implementor of
>>>>> > -mprefer-avx256
>>>>> > in gcc about making -mprefer-avx128 affect 512-bit vectors as well.
>>>>> I'll
>>>>> > bring up the ambiguity issue with them.
>>>>> >
>>>>> > Do we want to be compatible with gcc here?
>>>>>
>>>>> I certainly believe we would want to be compatible with gcc (if we use
>>>>> the same names).
>>>>>
>>>>> Best,
>>>>> Tobias
>>>>>
>>>>> >
>>>>> > ~Craig
>>>>> >
>>>>> > On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo at gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > >
>>>>> > >
>>>>> > > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <
>>>>> > > llvm-dev at lists.llvm.org> wrote:
>>>>> > >
>>>>> > >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
>>>>> > >> llvm-dev at lists.llvm.org> wrote:
>>>>> > >>
>>>>> > >>> Hello all,
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> I would like to propose adding the -mprefer-avx256 and
>>>>> -mprefer-avx128
>>>>> > >>> command line flags supported by latest GCC to clang. These flags
>>>>> will be
>>>>> > >>> used to limit the vector register size presented by TTI to the
>>>>> vectorizers.
>>>>> > >>> The backend will still be able to use wider registers for code
>>>>> written
>>>>> > >>> using the intrinsics in x86intrin.h. And the backend will still
>>>>> be able to
>>>>> > >>> use AVX512VL instructions and the additional XMM16-31 and
>>>>> YMM16-31
>>>>> > >>> registers.
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> Motivation:
>>>>> > >>>
>>>>> > >>> -Using 512-bit operations on some Intel CPUs may cause a
>>>>> decrease in CPU
>>>>> > >>> frequency that may offset the gains from using the wider
>>>>> register size. See
>>>>> > >>> section 15.26 of Intel® 64 and IA-32 Architectures Optimization
>>>>> Reference
>>>>> > >>> Manual published October 2017.
>>>>> > >>>
>>>>> > >>
>>>>> > >> I note the doc mentions that 256-bit AVX operations also have the
>>>>> same
>>>>> > >> issue with reducing the CPU frequency, which is nice to see
>>>>> documented!
>>>>> > >>
>>>>> > >> There are also the issues discussed here
>>>>> > >> <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related to warm-up
>>>>> time
>>>>> > >> for the 256-bit execution pipeline, which is another issue with
>>>>> using
>>>>> > >> wide-vector ops.
>>>>> > >>
>>>>> > >>
>>>>> > >> -The vector ALUs on ports 0 and 1 of the Skylake Server
>>>>> microarchitecture
>>>>> > >>> are only 256-bits wide. 512-bit instructions using these ALUs
>>>>> must use both
>>>>> > >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures
>>>>> Optimization
>>>>> > >>> Reference Manual published October 2017.
>>>>> > >>>
>>>>> > >>
>>>>> > >>
>>>>> > >>>  Implementation Plan:
>>>>> > >>>
>>>>> > >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in
>>>>> X86.td not
>>>>> > >>> mapped to any CPU.
>>>>> > >>>
>>>>> > >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
>>>>> > >>> -mno-prefer-avx128/256 options to clang's driver Options.td
>>>>> file. I believe
>>>>> > >>> this will allow clang to pass these straight through to the
>>>>> -target-feature
>>>>> > >>> attribute in IR.
>>>>> > >>>
>>>>> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if
>>>>> AVX512 is
>>>>> > >>> enabled and neither prefer-avx256 nor prefer-avx128 is set.
>>>>> Similarly return
>>>>> > >>> 256 if AVX is enabled and prefer-avx128 is not set.
>>>>> > >>>
>>>>> > >>
>>>>> > >> Instead of multiple flags that have difficult to understand
>>>>> intersecting
>>>>> > >> behavior, one flag with a value would be better. E.g., what should
>>>>> > >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No
>>>>> matter the
>>>>> > >> answer, it's confusing. (Similarly with other such combinations).
>>>>> Just a
>>>>> > >> single arg "-mprefer-avx={128/256/512}" (with no "no" version)
>>>>> seems easier
>>>>> > >> to understand to me (keeping the same behavior as you mention:
>>>>> asking to
>>>>> > >> prefer a larger width than is supported by your architecture
>>>>> should be fine
>>>>> > >> but ignored).
>>>>> > >>
>>>>> > >>
>>>>> > > I agree with this. It's a little more plumbing as far as subtarget
>>>>> > > features etc (represent via an optional value or just various "set
>>>>> the avx
>>>>> > > width" features - the latter being easier, but uglier), however,
>>>>> it's
>>>>> > > probably the right thing to do.
>>>>> > >
>>>>> > > I was looking at this myself just a couple weeks ago and think
>>>>> this is the
>>>>> > > right direction (when and how to turn things off) - and probably
>>>>> makes
>>>>> > > sense to be a default for these architectures? We might end up
>>>>> needing to
>>>>> > > check a couple of additional TTI places, but it sounds like you're
>>>>> on top
>>>>> > > of it. :)
>>>>> > >
>>>>> > > Thanks very much for doing this work.
>>>>> > >
>>>>> > > -eric
>>>>> > >
>>>>> > >
>>>>> > >>
>>>>> > >>
>>>>> > >> There may be some other backend changes needed, but I plan to
>>>>> address
>>>>> > >>> those as we find them.
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> At a later point, consider making -mprefer-avx256 the default for
>>>>> > >>> Skylake Server due to the above mentioned performance
>>>>> considerations.
>>>>> > >>>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>>
>>>>> > >> Does this sound reasonable?
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> *Latest Intel Optimization manual available here:
>>>>> > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> -Craig Topper
>>>>> > >>>
>>>>> > >>> _______________________________________________
>>>>> > >>> LLVM Developers mailing list
>>>>> > >>> llvm-dev at lists.llvm.org
>>>>> > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> > >>>
>>>>> > >>
>>>>> > >
>>>>>
>>>>
>>>>
>>>
>>
>