[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Mon Nov 13 15:54:36 PST 2017
On 11/13/2017 05:49 PM, Eric Christopher wrote:
>
>
> On Mon, Nov 13, 2017 at 2:15 PM Craig Topper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
> On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
>> If skylake is that bad at AVX2
>
> I don't think this says anything negative about AVX2, but AVX-512.
>
>
> Right. I think we're at AVX/AVX2 is "bad" on Haswell/Broadwell and
> AVX512 is "bad" on Skylake. At least in the "random autovectorization
> spread out" aspect.
>
>
>
>> it belongs in -mcpu / -march IMO.
>
> No. We'd still want to enable the architectural features for
> vector intrinsics and the like.
>
>
> I took this to mean that the feature should be enabled by default
> for -march=skylake-avx512.
>
>
>
> Agreed.
Yes. Also, GNOMETOYS clarified to me (off list) that this is what he meant.
-Hal
>
> -eric
>
>
>
>
>> Based on the current performance data we're seeing, we think
>> we need to ultimately default skylake-avx512 to
>> -mprefer-vector-width=256.
>
> Craig, is this for both integer and floating-point code?
>
>
> I believe so, but I'll try to get confirmation from the people
> with more data.
>
>
>
> -Hal
>
>> Most people will build for the standard x86_64-pc-linux or
>> whatever anyway, and completely ignore the change. This will
>> mainly affect those who build their own software and optimize
>> for their system, and lots there have probably caught on to
>> this already. I always thought that's what -march was made
>> for, really.
>>
>> GNOMETOYS
>>
>> On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Yes - I was thinking of FeatureFastScalarFSQRT /
>> FeatureFastVectorFSQRT which are used by isFsqrtCheap().
>> These were added to override the default x86 sqrt
>> estimate codegen with:
>> https://reviews.llvm.org/D21379
>>
>> But I'm not sure we really need that kind of hack. Can we
>> adjust the attribute in clang based on the target cpu?
>> I.e., if you have something like:
>> $ clang -O2 -march=skylake-avx512 foo.c
>>
>> Then you can detect that in the clang driver and pass
>> -mprefer-vector-width=256 to clang codegen as an option?
>> Clang codegen then adds that function attribute to
>> everything it outputs. Then, the vectorizers and/or
>> backend detect that attribute and adjust their behavior
>> based on it.
>>
>
> Do we have a precedent for setting a target independent flag from
> a target specific cpu string in the clang driver? Want to make
> sure I understand what the processing on such a thing would look
> like. Particularly to get the order right so the user can override it.
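
One way to picture the ordering concern raised here, as a minimal sketch rather than actual clang driver code: the CPU string supplies a default preference, and any explicit -mprefer-vector-width value from the user takes priority. The function names, the "0 means no preference" encoding, and the last-flag-wins rule are assumptions made for illustration. The skylake-avx512 default of 256 is the one proposed in this thread, not a settled decision.

```cpp
#include <string>
#include <vector>

// Hypothetical per-CPU default. Only the skylake-avx512 -> 256 mapping comes
// from the discussion; 0 is an assumed encoding for "no preference".
unsigned defaultPreferredWidth(const std::string &CPU) {
  return CPU == "skylake-avx512" ? 256u : 0u;
}

// UserWidths holds the value of each explicit -mprefer-vector-width=N flag in
// command-line order. Assumption: the last explicit flag wins, and the CPU
// default is consulted only when the user passed none.
unsigned resolvePreferredWidth(const std::string &CPU,
                               const std::vector<unsigned> &UserWidths) {
  if (!UserWidths.empty())
    return UserWidths.back();
  return defaultPreferredWidth(CPU);
}
```

Resolving the default only when no explicit value is present keeps the result independent of where -march appears relative to the width flag on the command line.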
>
>>
>> So I don't think we should be messing with any kind of
>> type legality checking because that stuff should all be
>> correct already. We're just choosing a vector size based
>> on a pref. I think we should even allow the pref to go
>> bigger than a legal type. This came up somewhere on
>> llvm-dev or in a bug recently in the context of vector
>> reductions.
>>
>>
>>
>> On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper
>> <craig.topper at gmail.com>
>> wrote:
>>
>> Are you referring to
>> the X86TargetLowering::isFsqrtCheap hook?
>>
>> ~Craig
>>
>> On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel
>> <spatel at rotateright.com> wrote:
>>
>> We can tie a user preference / override to a CPU
>> model. We do something like that for square root
>> estimates already (although it does use a
>> SubtargetFeature currently for x86; ideally, we'd
>> key that off of something in the CPU scheduler
>> model).
>>
>>
>> On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper
>> <craig.topper at gmail.com> wrote:
>>
>> I agree that a less x86-specific command line
>> option makes sense. I've been having internal
>> discussions with gcc folks and they're
>> evaluating switching to something like
>> -mprefer-vector-width=128/256/512/none
>>
>> Based on the current performance data we're
>> seeing, we think we need to ultimately
>> default skylake-avx512 to
>> -mprefer-vector-width=256. If we go with a
>> target independent option/implementation, is
>> there some way we could still affect the
>> default behavior in a target specific way?
>>
>> ~Craig
>>
>> On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel
>> <spatel at rotateright.com> wrote:
>>
>> It's clear from the Intel docs how this
>> has evolved, but from a compiler
>> perspective, this isn't a Skylake
>> "feature" :) ... nor an Intel feature,
>> nor an x86 feature.
>>
>> It's a generic programmer hint for any
>> target with multiple potential vector
>> lengths.
>>
>> On x86, there's already a potential use
>> case for this hint with a different
>> starting motivation: re-vectorization.
>> That's where we take C code that uses
>> 128-bit vector intrinsics and selectively
>> widen it to 256- or 512-bit vector ops
>> based on a newer CPU target than the code
>> was originally written for.
>>
>> I think it's just a matter of time before
>> a customer requests the same ability for
>> another target (maybe they already have
>> and I don't know about it). So we should
>> have a solution that recognizes that
>> possibility.
>>
>> Note that having a target-independent
>> implementation in the optimizer doesn't
>> preclude a flag alias in clang to
>> maintain compatibility with gcc.
>>
>>
>>
>> On Tue, Nov 7, 2017 at 2:02 AM, Tobias
>> Grosser via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Fri, Nov 3, 2017, at 05:47, Craig
>> Topper via llvm-dev wrote:
>> > That's a very good point about the
>> ordering of the command line options.
>> > gcc's current implementation treats
>> -mprefer-avx256 as "prefer 256 over
>> > 512" and -mprefer-avx128 as "prefer
>> 128 over 256". Which feels weird for
>> > other reasons, but has less of an
>> ordering ambiguity.
>> >
>> > -mprefer-avx128 has been in gcc for
>> many years and predates the creation
>> > of
>> > avx512. -mprefer-avx256 was added a
>> couple months ago.
>> >
>> > We've had an internal conversation
>> with the implementor of
>> > -mprefer-avx256
>> > in gcc about making -mprefer-avx128
>> affect 512-bit vectors as well. I'll
>> > bring up the ambiguity issue with them.
>> >
>> > Do we want to be compatible with
>> gcc here?
>>
>> I certainly believe we would want to
>> be compatible with gcc (if we use
>> the same names).
>>
>> Best,
>> Tobias
>>
>> >
>> > ~Craig
>> >
>> > On Thu, Nov 2, 2017 at 7:18 PM,
>> Eric Christopher <echristo at gmail.com>
>> > wrote:
>> >
>> > >
>> > >
>> > > On Thu, Nov 2, 2017 at 7:05 PM
>> James Y Knight via llvm-dev <
>> > > llvm-dev at lists.llvm.org> wrote:
>> > >
>> > >> On Wed, Nov 1, 2017 at 7:35 PM,
>> Craig Topper via llvm-dev <
>> > >> llvm-dev at lists.llvm.org> wrote:
>> > >>
>> > >>> Hello all,
>> > >>>
>> > >>>
>> > >>>
>> > >>> I would like to propose adding
>> the -mprefer-avx256 and -mprefer-avx128
>> > >>> command line flags supported by
>> latest GCC to clang. These flags will be
>> > >>> used to limit the vector
>> register size presented by TTI to the
>> vectorizers.
>> > >>> The backend will still be able
>> to use wider registers for code written
>> > >>> using the intrinsics in
>> x86intrin.h. And the backend will
>> still be able to
>> > >>> use AVX512VL instructions and
>> the additional XMM16-31 and YMM16-31
>> > >>> registers.
>> > >>>
>> > >>>
>> > >>>
>> > >>> Motivation:
>> > >>>
>> > >>> -Using 512-bit operations on
>> some Intel CPUs may cause a decrease
>> in CPU
>> > >>> frequency that may offset the
>> gains from using the wider register
>> size. See
>> > >>> section 15.26 of Intel® 64 and
>> IA-32 Architectures Optimization
>> Reference
>> > >>> Manual published October 2017.
>> > >>>
>> > >>
>> > >> I note the doc mentions that
>> 256-bit AVX operations also have the same
>> > >> issue with reducing the CPU
>> frequency, which is nice to see
>> documented!
>> > >>
>> > >> There are also the issues discussed here
>> > >> <http://www.agner.org/optimize/blog/read.php?i=165>
>> > >> (and elsewhere) related to warm-up time for the 256-bit execution
>> > >> pipeline, which is another issue with using wide-vector ops.
>> > >>
>> > >>
>> > >> -The vector ALUs on ports 0 and
>> 1 of the Skylake Server microarchitecture
>> > >>> are only 256-bits wide. 512-bit
>> instructions using these ALUs must
>> use both
>> > >>> ports. See section 2.1 of
>> Intel® 64 and IA-32 Architectures
>> Optimization
>> > >>> Reference Manual published
>> October 2017.
>> > >>>
>> > >>
>> > >>
>> > >>> Implementation Plan:
>> > >>>
>> > >>> -Add prefer-avx256 and
>> prefer-avx128 as SubtargetFeatures in
>> X86.td not
>> > >>> mapped to any CPU.
>> > >>>
>> > >>> -Add mprefer-avx256 and
>> mprefer-avx128 and the corresponding
>> > >>> -mno-prefer-avx128/256 options
>> to clang's driver Options.td file. I
>> believe
>> > >>> this will allow clang to pass
>> these straight through to the
>> -target-feature
>> > >>> attribute in IR.
>> > >>>
>> > >>> -Modify
>> X86TTIImpl::getRegisterBitWidth to
>> only return 512 if AVX512 is
>> > >>> enabled and neither prefer-avx256 nor
>> prefer-avx128 is set. Similarly
>> return
>> > >>> 256 if AVX is enabled and
>> prefer-avx128 is not set.
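
The selection rule described for X86TTIImpl::getRegisterBitWidth can be sketched as a standalone function. This is an illustration of the stated conditions only, not LLVM's actual code: the SubtargetSketch struct and its field names are hypothetical stand-ins for the real subtarget queries, and the 128-bit fallback assumes a baseline with SSE, which the proposal does not spell out.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget state. These fields are
// illustrative and are not LLVM's X86Subtarget interface.
struct SubtargetSketch {
  bool HasAVX512 = false;
  bool HasAVX = false;
  bool PreferAVX256 = false;
  bool PreferAVX128 = false;
};

// Vector register width, in bits, that would be reported to the vectorizers.
unsigned vectorRegisterBitWidth(const SubtargetSketch &ST) {
  // On x86, enabling AVX-512 also enables AVX.
  assert((!ST.HasAVX512 || ST.HasAVX) && "AVX-512 implies AVX");
  // 512 only if AVX-512 is enabled and neither narrower preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumed fallback: 128-bit SSE registers.
  return 128;
}
```

Only the width reported to the vectorizers changes in this sketch; nothing here restricts which instructions the backend may select for explicit intrinsics.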
>> > >>>
>> > >>
>> > >> Instead of multiple flags that
>> have difficult-to-understand intersecting
>> > >> behavior, one flag with a value
>> would be better. E.g., what should
>> > >> "-mprefer-avx256 -mprefer-avx128
>> -mno-prefer-avx256" do? No matter the
>> > >> answer, it's confusing.
>> (Similarly with other such
>> combinations). Just a
>> > >> single arg
>> "-mprefer-avx={128/256/512}" (with no
>> "no" version) seems easier
>> > >> to understand to me (keeping the
>> same behavior as you mention: asking to
>> > >> prefer a larger width than is
>> supported by your architecture should
>> be fine
>> > >> but ignored).
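
A minimal sketch of the single-valued flag semantics described above, assuming the preference arrives as a plain bit count: a preference wider than the architecture supports is accepted but has no effect. The function name and the use of 0 for "no preference given" are assumptions for illustration, not part of any existing clang or LLVM interface.

```cpp
#include <algorithm>

// PreferredBits: value of the single preference flag (128, 256 or 512), or 0
// when the flag was not given (assumed encoding).
// MaxLegalBits: widest vector width the selected architecture supports.
unsigned effectiveVectorWidth(unsigned PreferredBits, unsigned MaxLegalBits) {
  if (PreferredBits == 0)
    return MaxLegalBits;
  // Asking for more than the hardware offers is fine but ignored.
  return std::min(PreferredBits, MaxLegalBits);
}
```

With one value there is no combination of flags whose interaction needs explaining: the result depends only on the final preference and the legal maximum.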
>> > >>
>> > >>
>> > > I agree with this. It's a little
>> more plumbing as far as subtarget
>> > > features etc (represent via an
>> optional value or just various "set
>> the avx
>> > > width" features - the latter
>> being easier, but uglier), however, it's
>> > > probably the right thing to do.
>> > >
>> > > I was looking at this myself just
>> a couple weeks ago and think this is the
>> > > right direction (when and how to
>> turn things off) - and probably makes
>> > > sense to be a default for these
>> architectures? We might end up needing to
>> > > check a couple of additional TTI
>> places, but it sounds like you're on top
>> > > of it. :)
>> > >
>> > > Thanks very much for doing this work.
>> > >
>> > > -eric
>> > >
>> > >
>> > >>
>> > >>
>> > >> There may be some other backend
>> changes needed, but I plan to address
>> > >>> those as we find them.
>> > >>>
>> > >>>
>> > >>> At a later point, consider
>> making -mprefer-avx256 the default for
>> > >>> Skylake Server due to the above
>> mentioned performance considerations.
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>>
>> > >> Does this sound reasonable?
>> > >>>
>> > >>>
>> > >>>
>> > >>> *Latest Intel Optimization
>> manual available here:
>> > >>>
>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>> > >>>
>> > >>>
>> > >>> -Craig Topper
>> > >>>
>> > >>>
>> _______________________________________________
>> > >>> LLVM Developers mailing list
>> > >>> llvm-dev at lists.llvm.org
>> <mailto:llvm-dev at lists.llvm.org>
>> > >>>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory