[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Sat Nov 11 20:52:30 PST 2017

On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
> If skylake is that bad at AVX2

I don't think this says anything negative about AVX2, but rather about AVX-512.

> it belongs in -mcpu / -march IMO.

No. We'd still want to enable the architectural features for vector 
intrinsics and the like.

> Based on the current performance data we're seeing, we think we need 
> to ultimately default skylake-avx512 to -mprefer-vector-width=256.

Craig, is this for both integer and floating-point code?

  -Hal

> Most people will build for the standard x86_64-pc-linux or whatever
> anyway, and completely ignore the change. This will mainly affect
> those who build their own software and optimize for their system, and
> lots there have probably caught on to this already. I always thought
> that's what -march was made for, really.
>
> GNOMETOYS
>
> On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     Yes - I was thinking of FeatureFastScalarFSQRT /
>     FeatureFastVectorFSQRT, which are used by isFsqrtCheap(). These
>     were added to override the default x86 sqrt estimate codegen with:
>     https://reviews.llvm.org/D21379
>
>     But I'm not sure we really need that kind of hack. Can we adjust
>     the attribute in clang based on the target CPU? I.e., if you have
>     something like:
>     $ clang -O2 -march=skylake-avx512 foo.c
>
>     Then you can detect that in the clang driver and pass
>     -mprefer-vector-width=256 to clang codegen as an option? Clang
>     codegen then adds that function attribute to everything it
>     outputs. Then, the vectorizers and/or backend detect that
>     attribute and adjust their behavior based on it.
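A minimal sketch of the driver-side defaulting described above, assuming a hypothetical helper that maps the -march CPU name to a default width. The function names and the PREF_UNSET/PREF_NONE sentinels are illustrative only; this is not clang's actual driver code, and the plumbing of the attribute into the vectorizers is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinels: PREF_UNSET means no -mprefer-vector-width was
   given on the command line; PREF_NONE models "-mprefer-vector-width=none". */
#define PREF_UNSET (-1)
#define PREF_NONE 0

/* Hypothetical per-CPU default. Only skylake-avx512 is special-cased,
   mirroring the default proposed in this thread. */
static int default_prefer_width(const char *cpu) {
    if (cpu != NULL && strcmp(cpu, "skylake-avx512") == 0)
        return 256;
    return PREF_NONE;
}

/* An explicit user setting always wins over the CPU-derived default.
   The result is what the driver would pass on to codegen, which would
   then attach it as an attribute to every function it emits. */
static int resolve_prefer_width(int user_width, const char *cpu) {
    if (user_width != PREF_UNSET)
        return user_width;
    return default_prefer_width(cpu);
}
```

Keeping the CPU-specific default in the driver is what would let a skylake-avx512 default of 256 coexist with a target-independent preference in the optimizer, while an explicit -mprefer-vector-width on the command line still overrides it.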
>
>     So I don't think we should be messing with any kind of type
>     legality checking because that stuff should all be correct
>     already. We're just choosing a vector size based on a pref. I
>     think we should even allow the pref to go bigger than a legal
>     type. This came up somewhere on llvm-dev or in a bug recently in
>     the context of vector reductions.
>
>
>
>     On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper
>     <craig.topper at gmail.com <mailto:craig.topper at gmail.com>> wrote:
>
>         Are you referring to the X86TargetLowering::isFsqrtCheap hook?
>
>         ~Craig
>
>         On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel
>         <spatel at rotateright.com <mailto:spatel at rotateright.com>> wrote:
>
>             We can tie a user preference / override to a CPU model. We
>             do something like that for square root estimates already
>             (although it does use a SubtargetFeature currently for
>             x86; ideally, we'd key that off of something in the CPU
>             scheduler model).
>
>
>             On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper
>             <craig.topper at gmail.com <mailto:craig.topper at gmail.com>>
>             wrote:
>
>                 I agree that a less x86-specific command line makes
>                 sense. I've been having internal discussions with
>                 gcc folks, and they're evaluating switching to something
>                 like -mprefer-vector-width=128/256/512/none
>
>                 Based on the current performance data we're seeing, we
>                 think we need to ultimately default skylake-avx512 to
>                 -mprefer-vector-width=256. If we go with a target
>                 independent option/implementation is there someway we
>                 could still affect the default behavior in a target
>                 specific way?
>
>                 ~Craig
>
>                 On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel
>                 <spatel at rotateright.com
>                 <mailto:spatel at rotateright.com>> wrote:
>
>                     It's clear from the Intel docs how this has
>                     evolved, but from a compiler perspective, this
>                     isn't a Skylake "feature" :) ... nor an Intel
>                     feature, nor an x86 feature.
>
>                     It's a generic programmer hint for any target with
>                     multiple potential vector lengths.
>
>                     On x86, there's already a potential use case for
>                     this hint with a different starting motivation:
>                     re-vectorization. That's where we take C code that
>                     uses 128-bit vector intrinsics and selectively
>                     widen it to 256- or 512-bit vector ops based on a
>                     newer CPU target than the code was originally
>                     written for.
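For illustration, code of the kind re-vectorization starts from might look like the following: a loop hand-written with 128-bit SSE intrinsics (_mm_loadu_ps, _mm_add_ps and _mm_storeu_ps from <xmmintrin.h>). The add_arrays name is hypothetical, and the scalar fallback exists only so the sketch also builds on non-x86 hosts. A re-vectorizing compiler could, in principle, widen the four-float body to eight or sixteen floats for a newer target; a width preference would bound how far it goes.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* 128-bit SSE intrinsics */
#endif

/* Hypothetical example: dst[i] = a[i] + b[i], hand-vectorized for
   128-bit registers (four floats per operation). */
static void add_arrays(float *dst, const float *a, const float *b,
                       size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned 4-float loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop on targets without SSE. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```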
>
>                     I think it's just a matter of time before a
>                     customer requests the same ability for another
>                     target (maybe they already have and I don't know
>                     about it). So we should have a solution that
>                     recognizes that possibility.
>
>                     Note that having a target-independent
>                     implementation in the optimizer doesn't preclude a
>                     flag alias in clang to maintain compatibility with
>                     gcc.
>
>
>
>                     On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via
>                     llvm-dev <llvm-dev at lists.llvm.org
>                     <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>                         On Fri, Nov 3, 2017, at 05:47, Craig Topper
>                         via llvm-dev wrote:
>                         > That's a very good point about the ordering of
>                         > the command line options.
>                         > gcc's current implementation treats
>                         > -mprefer-avx256 as "prefer 256 over 512" and
>                         > -mprefer-avx128 as "prefer 128 over 256",
>                         > which feels weird for other reasons, but has
>                         > less of an ordering ambiguity.
>                         >
>                         > -mprefer-avx128 has been in gcc for many years
>                         > and predates the creation of avx512.
>                         > -mprefer-avx256 was added a couple months ago.
>                         >
>                         > We've had an internal conversation with the
>                         > implementor of -mprefer-avx256 in gcc about
>                         > making -mprefer-avx128 affect 512-bit vectors
>                         > as well. I'll bring up the ambiguity issue
>                         > with them.
>                         >
>                         > Do we want to be compatible with gcc here?
>
>                         I certainly believe we would want to be
>                         compatible with gcc (if we use
>                         the same names).
>
>                         Best,
>                         Tobias
>
>                         >
>                         > ~Craig
>                         >
>                         > On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher
>                         > <echristo at gmail.com <mailto:echristo at gmail.com>>
>                         > wrote:
>                         >
>                         > >
>                         > > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via
>                         > > llvm-dev <llvm-dev at lists.llvm.org
>                         > > <mailto:llvm-dev at lists.llvm.org>> wrote:
>                         > >
>                         > >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via
>                         > >> llvm-dev <llvm-dev at lists.llvm.org
>                         > >> <mailto:llvm-dev at lists.llvm.org>> wrote:
>                         > >>
>                         > >>> Hello all,
>                         > >>>
>                         > >>> I would like to propose adding the
>                         > >>> -mprefer-avx256 and -mprefer-avx128 command
>                         > >>> line flags supported by the latest GCC to
>                         > >>> clang. These flags will be used to limit the
>                         > >>> vector register size presented by TTI to the
>                         > >>> vectorizers. The backend will still be able
>                         > >>> to use wider registers for code written using
>                         > >>> the intrinsics in x86intrin.h, and it will
>                         > >>> still be able to use AVX512VL instructions
>                         > >>> and the additional XMM16-31 and YMM16-31
>                         > >>> registers.
>                         > >>>
>                         > >>> Motivation:
>                         > >>>
>                         > >>> -Using 512-bit operations on some Intel CPUs
>                         > >>> may cause a decrease in CPU frequency that
>                         > >>> may offset the gains from using the wider
>                         > >>> register size. See section 15.26 of Intel® 64
>                         > >>> and IA-32 Architectures Optimization
>                         > >>> Reference Manual published October 2017.
>                         > >>>
>                         > >>
>                         > >> I note the doc mentions that 256-bit AVX
>                         > >> operations also have the same issue with
>                         > >> reducing the CPU frequency, which is nice to
>                         > >> see documented!
>                         > >>
>                         > >> There are also the issues discussed here
>                         > >> <http://www.agner.org/optimize/blog/read.php?i=165>
>                         > >> (and elsewhere) related to warm-up time for
>                         > >> the 256-bit execution pipeline, which is
>                         > >> another issue with using wide-vector ops.
>                         > >>
>                         > >>
>                         > >>> -The vector ALUs on ports 0 and 1 of the
>                         > >>> Skylake Server microarchitecture are only
>                         > >>> 256 bits wide. 512-bit instructions using
>                         > >>> these ALUs must use both ports. See section
>                         > >>> 2.1 of Intel® 64 and IA-32 Architectures
>                         > >>> Optimization Reference Manual published
>                         > >>> October 2017.
>                         > >>>
>                         > >>
>                         > >>
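The two motivation points above can be combined in a deliberately simplified throughput model, assuming a loop bound by the vector ALUs on ports 0 and 1: two 256-bit operations can issue per cycle (one per port), while a 512-bit operation occupies both ports, so only one issues per cycle. Bits processed per cycle are then the same, and any frequency reduction while running 512-bit code is a net loss. The model ignores any other 512-bit-capable execution unit, memory bandwidth and the warm-up effects mentioned above; the function names are hypothetical and freq_ratio is an input, not a measured value.

```c
/* Vector bits processed per cycle by the modeled ports. */
static unsigned bits_per_cycle(unsigned width_bits, unsigned ops_per_cycle) {
    return width_bits * ops_per_cycle;
}

/* Throughput of 512-bit code relative to 256-bit code under this model.
   freq_ratio is (clock while running 512-bit code) divided by
   (clock while running 256-bit code); it is an assumed input. */
static double relative_throughput_512(double freq_ratio) {
    const double wide = bits_per_cycle(512, 1);   /* one op using ports 0+1 */
    const double narrow = bits_per_cycle(256, 2); /* one op each on ports 0, 1 */
    return wide * freq_ratio / narrow;
}
```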
>                         > >>> Implementation Plan:
>                         > >>>
>                         > >>> -Add prefer-avx256 and prefer-avx128 as
>                         > >>> SubtargetFeatures in X86.td not mapped to any
>                         > >>> CPU.
>                         > >>>
>                         > >>> -Add mprefer-avx256 and mprefer-avx128 and
>                         > >>> the corresponding -mno-prefer-avx128/256
>                         > >>> options to clang's driver Options.td file. I
>                         > >>> believe this will allow clang to pass these
>                         > >>> straight through to the -target-feature
>                         > >>> attribute in IR.
>                         > >>>
>                         > >>> -Modify X86TTIImpl::getRegisterBitWidth to
>                         > >>> only return 512 if AVX512 is enabled and
>                         > >>> neither prefer-avx256 nor prefer-avx128 is
>                         > >>> set. Similarly, return 256 if AVX is enabled
>                         > >>> and prefer-avx128 is not set.
>                         > >>>
>                         > >>
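A minimal sketch of the width selection proposed in the last bullet, written as a standalone C function over a hypothetical feature struct rather than LLVM's actual X86Subtarget or X86TTIImpl interfaces. It assumes an x86-64 target, where 128-bit SSE2 registers are always available. Only the width reported to the vectorizers changes; the AVX-512 features themselves remain enabled for intrinsics.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the subtarget feature bits. */
struct x86_features {
    bool has_avx;       /* AVX: 256-bit YMM registers */
    bool has_avx512;    /* AVX-512: 512-bit ZMM registers (implies AVX) */
    bool prefer_avx256; /* proposed prefer-avx256 feature */
    bool prefer_avx128; /* proposed prefer-avx128 feature */
};

/* Vector register width (in bits) presented to the vectorizers. */
static unsigned register_bit_width(const struct x86_features *f) {
    if (f->has_avx512 && !f->prefer_avx256 && !f->prefer_avx128)
        return 512;
    if (f->has_avx && !f->prefer_avx128)
        return 256;
    return 128; /* SSE2 baseline on x86-64 */
}
```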
>                         > >> Instead of multiple flags that have
>                         > >> difficult-to-understand intersecting
>                         > >> behavior, one flag with a value would be
>                         > >> better. E.g., what should
>                         > >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256"
>                         > >> do? No matter the answer, it's confusing.
>                         > >> (Similarly with other such combinations.)
>                         > >> Just a single arg "-mprefer-avx={128/256/512}"
>                         > >> (with no "no" version) seems easier to
>                         > >> understand to me (keeping the same behavior
>                         > >> as you mention: asking to prefer a larger
>                         > >> width than is supported by your architecture
>                         > >> should be fine but ignored).
>                         > >>
>                         > >>
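A sketch of the single-valued alternative, using the -mprefer-avx=N spelling suggested here (a proposed spelling, not an existing option). With a single flag, repeated occurrences have an obvious last-one-wins reading, and a preference wider than the hardware supports is accepted but ignored. The helper names are hypothetical, and validation of N (e.g. rejecting values other than 128/256/512) is omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the value of the last -mprefer-avx=N in argv, or 0 if none. */
static unsigned last_prefer_avx(int argc, const char *const *argv) {
    static const char prefix[] = "-mprefer-avx=";
    unsigned width = 0;
    for (int i = 0; i < argc; ++i)
        if (strncmp(argv[i], prefix, sizeof prefix - 1) == 0)
            width = (unsigned)strtoul(argv[i] + sizeof prefix - 1, NULL, 10);
    return width;
}

/* No preference, or one wider than the hardware maximum, yields hw_max. */
static unsigned effective_width(unsigned preferred, unsigned hw_max) {
    if (preferred == 0 || preferred > hw_max)
        return hw_max;
    return preferred;
}
```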
>                         > > I agree with this. It's a little more plumbing
>                         > > as far as subtarget features etc (represent
>                         > > via an optional value or just various "set the
>                         > > avx width" features - the latter being easier,
>                         > > but uglier), however, it's probably the right
>                         > > thing to do.
>                         > >
>                         > > I was looking at this myself just a couple
>                         > > weeks ago and think this is the right
>                         > > direction (when and how to turn things off) -
>                         > > and probably makes sense to be a default for
>                         > > these architectures? We might end up needing
>                         > > to check a couple of additional TTI places,
>                         > > but it sounds like you're on top of it. :)
>                         > >
>                         > > Thanks very much for doing this work.
>                         > >
>                         > > -eric
>                         > >
>                         > >
>                         > >>
>                         > >>
>                         > >>> There may be some other backend changes
>                         > >>> needed, but I plan to address those as we
>                         > >>> find them.
>                         > >>>
>                         > >>>
>                         > >>> At a later point, consider making
>                         > >>> -mprefer-avx256 the default for Skylake
>                         > >>> Server due to the above mentioned performance
>                         > >>> considerations.
>                         > >>>
>                         > >>> Does this sound reasonable?
>                         > >>>
>                         > >>> *Latest Intel Optimization manual available
>                         > >>> here:
>                         > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
>                         > >>>
>                         > >>> -Craig Topper
>                         > >>>
>                         _______________________________________________
>                         > >>> LLVM Developers mailing list
>                         > >>> llvm-dev at lists.llvm.org
>                         <mailto:llvm-dev at lists.llvm.org>
>                         > >>>
>                         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>                         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
