[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Sat Nov 11 20:52:30 PST 2017
On 11/11/2017 09:52 PM, UE US via llvm-dev wrote:
> If skylake is that bad at AVX2
I don't think this says anything negative about AVX2, but rather about AVX-512.
> it belongs in -mcpu / -march IMO.
No. We'd still want to enable the architectural features for vector
intrinsics and the like.
> Based on the current performance data we're seeing, we think we need
> to ultimately default skylake-avx512 to -mprefer-vector-width=256.
Craig, is this for both integer and floating-point code?
-Hal
> Most people will build for the standard x86_64-pc-linux or whatever
> anyway, and completely ignore the change. This will mainly affect
> those who build their own software and optimize for their system, and
> lots there have probably caught on to this already. I always thought
> that's what -march was made for, really.
>
> GNOMETOYS
>
> On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Yes - I was thinking of FeatureFastScalarFSQRT /
> FeatureFastVectorFSQRT which are used by isFsqrtCheap(). These
> were added to override the default x86 sqrt estimate codegen with:
> https://reviews.llvm.org/D21379
>
> But I'm not sure we really need that kind of hack. Can we adjust
> the attribute in clang based on the target CPU? I.e., if you have
> something like:
> $ clang -O2 -march=skylake-avx512 foo.c
>
> Then you can detect that in the clang driver and pass
> -mprefer-vector-width=256 to clang codegen as an option? Clang
> codegen then adds that function attribute to everything it
> outputs. Then, the vectorizers and/or backend detect that
> attribute and adjust their behavior based on it.
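>
> As a rough sketch of that flow (the attribute name and helper below are
> just illustrative, not a final design), the vectorizer/backend side
> could read the per-function preference along these lines:
>
>     #include "llvm/IR/Function.h"
>     using namespace llvm;
>
>     // Illustrative helper: read a per-function width preference that
>     // clang codegen would attach, falling back to the widest legal
>     // width when the attribute is absent or malformed.
>     static unsigned getPreferredVectorWidth(const Function &F,
>                                             unsigned LegalWidth) {
>       unsigned Width = LegalWidth;
>       Attribute A = F.getFnAttribute("prefer-vector-width");
>       if (A.isStringAttribute()) {
>         unsigned Requested = 0;
>         if (!A.getValueAsString().getAsInteger(/*Radix=*/10, Requested))
>           Width = Requested;
>       }
>       return Width;
>     }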
>
> So I don't think we should be messing with any kind of type
> legality checking because that stuff should all be correct
> already. We're just choosing a vector size based on a pref. I
> think we should even allow the pref to go bigger than a legal
> type. This came up somewhere on llvm-dev or in a bug recently in
> the context of vector reductions.
>
>
>
> On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper
> <craig.topper at gmail.com> wrote:
>
> Are you referring to the X86TargetLowering::isFsqrtCheap hook?
>
> ~Craig
>
> On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel
> <spatel at rotateright.com> wrote:
>
> We can tie a user preference / override to a CPU model. We
> do something like that for square root estimates already
> (although it does use a SubtargetFeature currently for
> x86; ideally, we'd key that off of something in the CPU
> scheduler model).
>
>
> On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper
> <craig.topper at gmail.com> wrote:
>
> I agree that a less x86-specific command line option makes
> sense. I've been having internal discussions with gcc folks and
> they're evaluating switching to something like
> -mprefer-vector-width=128/256/512/none
>
> Based on the current performance data we're seeing, we
> think we need to ultimately default skylake-avx512 to
> -mprefer-vector-width=256. If we go with a target-independent
> option/implementation, is there some way we could still affect
> the default behavior in a target-specific way?
>
> ~Craig
>
> On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel
> <spatel at rotateright.com> wrote:
>
> It's clear from the Intel docs how this has
> evolved, but from a compiler perspective, this
> isn't a Skylake "feature" :) ... nor an Intel
> feature, nor an x86 feature.
>
> It's a generic programmer hint for any target with
> multiple potential vector lengths.
>
> On x86, there's already a potential use case for
> this hint with a different starting motivation:
> re-vectorization. That's where we take C code that
> uses 128-bit vector intrinsics and selectively
> widen it to 256- or 512-bit vector ops based on a
> newer CPU target than the code was originally
> written for.
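>
> For a concrete (purely illustrative) picture, code written against
> 128-bit SSE intrinsics like the sketch below processes four floats
> per operation; re-vectorization would combine adjacent iterations
> into 256- or 512-bit operations when the target allows it:
>
>     #include <immintrin.h>
>
>     // 128-bit SSE version: x[i] = a * x[i] + y[i], four lanes at a
>     // time (assumes n is a multiple of 4 for brevity).
>     void saxpy128(float *x, const float *y, float a, int n) {
>       __m128 va = _mm_set1_ps(a);
>       for (int i = 0; i < n; i += 4) {
>         __m128 vx = _mm_loadu_ps(&x[i]);
>         __m128 vy = _mm_loadu_ps(&y[i]);
>         _mm_storeu_ps(&x[i], _mm_add_ps(_mm_mul_ps(va, vx), vy));
>       }
>     }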
>
> I think it's just a matter of time before a
> customer requests the same ability for another
> target (maybe they already have and I don't know
> about it). So we should have a solution that
> recognizes that possibility.
>
> Note that having a target-independent
> implementation in the optimizer doesn't preclude a
> flag alias in clang to maintain compatibility with
> gcc.
>
>
>
> On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via
> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Nov 3, 2017, at 05:47, Craig Topper
> via llvm-dev wrote:
> > That's a very good point about the ordering of the command line
> > options. gcc's current implementation treats -mprefer-avx256 as
> > "prefer 256 over 512" and -mprefer-avx128 as "prefer 128 over 256",
> > which feels weird for other reasons but has less of an ordering
> > ambiguity.
> >
> > -mprefer-avx128 has been in gcc for many years and predates the
> > creation of avx512. -mprefer-avx256 was added a couple months ago.
> >
> > We've had an internal conversation with the implementor of
> > -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit
> > vectors as well. I'll bring up the ambiguity issue with them.
> >
> > Do we want to be compatible with gcc here?
>
> I certainly believe we would want to be
> compatible with gcc (if we use
> the same names).
>
> Best,
> Tobias
>
> >
> > ~Craig
> >
> > On Thu, Nov 2, 2017 at 7:18 PM, Eric
> > Christopher <echristo at gmail.com> wrote:
> >
> > >
> > >
> > > On Thu, Nov 2, 2017 at 7:05 PM James Y
> > > Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > >
> > >> On Wed, Nov 1, 2017 at 7:35 PM, Craig
> > >> Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > >>
> > >>> Hello all,
> > >>>
> > >>>
> > >>>
> > >>> I would like to propose adding the -mprefer-avx256 and
> > >>> -mprefer-avx128 command line flags supported by the latest GCC to
> > >>> clang. These flags will be used to limit the vector register size
> > >>> presented by TTI to the vectorizers. The backend will still be able
> > >>> to use wider registers for code written using the intrinsics in
> > >>> x86intrin.h. And the backend will still be able to use AVX512VL
> > >>> instructions and the additional XMM16-31 and YMM16-31 registers.
> > >>>
> > >>>
> > >>>
> > >>> Motivation:
> > >>>
> > >>> -Using 512-bit operations on some Intel CPUs may cause a decrease
> > >>> in CPU frequency that may offset the gains from using the wider
> > >>> register size. See section 15.26 of Intel® 64 and IA-32
> > >>> Architectures Optimization Reference Manual published October 2017.
> > >>>
> > >>
> > >> I note the doc mentions that 256-bit AVX operations also have the
> > >> same issue with reducing the CPU frequency, which is nice to see
> > >> documented!
> > >>
> > >> There are also the issues discussed here
> > >> <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere)
> > >> related to warm-up time for the 256-bit execution pipeline, which is
> > >> another issue with using wide-vector ops.
> > >>
> > >>
> > >>> -The vector ALUs on ports 0 and 1 of the Skylake Server
> > >>> microarchitecture are only 256 bits wide. 512-bit instructions
> > >>> using these ALUs must use both ports. See section 2.1 of Intel® 64
> > >>> and IA-32 Architectures Optimization Reference Manual published
> > >>> October 2017.
> > >>>
> > >>
> > >>
> > >>> Implementation Plan:
> > >>>
> > >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in
> > >>> X86.td not mapped to any CPU.
> > >>>
> > >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
> > >>> -mno-prefer-avx128/256 options to clang's driver Options.td file.
> > >>> I believe this will allow clang to pass these straight through to
> > >>> the -target-feature attribute in IR.
> > >>>
> > >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if
> > >>> AVX512 is enabled and neither prefer-avx256 nor prefer-avx128 is
> > >>> set. Similarly, return 256 if AVX is enabled and prefer-avx128 is
> > >>> not set.
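> > >>>
> > >>> To make the intent concrete, a rough sketch (not a final patch;
> > >>> the preferAVX256()/preferAVX128() subtarget accessors are
> > >>> hypothetical names for the proposed features):
> > >>>
> > >>>     unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) const {
> > >>>       if (Vector) {
> > >>>         // Only report 512-bit registers when no narrower width
> > >>>         // is preferred.
> > >>>         if (ST->hasAVX512() && !ST->preferAVX256() &&
> > >>>             !ST->preferAVX128())
> > >>>           return 512;
> > >>>         if (ST->hasAVX() && !ST->preferAVX128())
> > >>>           return 256;
> > >>>         if (ST->hasSSE1())
> > >>>           return 128;
> > >>>         return 0;
> > >>>       }
> > >>>       return ST->is64Bit() ? 64 : 32;
> > >>>     }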
> > >>>
> > >>
> > >> Instead of multiple flags that have difficult-to-understand
> > >> intersecting behavior, one flag with a value would be better. E.g.,
> > >> what should "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do?
> > >> No matter the answer, it's confusing. (Similarly with other such
> > >> combinations.) Just a single arg "-mprefer-avx={128/256/512}" (with
> > >> no "no" version) seems easier to understand to me (keeping the same
> > >> behavior as you mention: asking to prefer a larger width than is
> > >> supported by your architecture should be fine but ignored).
> > >>
> > >>
> > > I agree with this. It's a little more plumbing as far as subtarget
> > > features etc. (represent via an optional value or just various "set
> > > the avx width" features - the latter being easier, but uglier);
> > > however, it's probably the right thing to do.
> > >
> > > I was looking at this myself just a couple of weeks ago and think
> > > this is the right direction (when and how to turn things off) - and
> > > it probably makes sense to be a default for these architectures? We
> > > might end up needing to check a couple of additional TTI places, but
> > > it sounds like you're on top of it. :)
> > >
> > > Thanks very much for doing this work.
> > >
> > > -eric
> > >
> > >
> > >>
> > >>
> > >>> There may be some other backend changes needed, but I plan to
> > >>> address those as we find them.
> > >>>
> > >>>
> > >>> At a later point, consider making -mprefer-avx256 the default for
> > >>> Skylake Server due to the above-mentioned performance
> > >>> considerations.
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>>
> > >>> Does this sound reasonable?
> > >>>
> > >>>
> > >>>
> > >>> *Latest Intel Optimization manual available here:
> > >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
> > >>>
> > >>>
> > >>> -Craig Topper
> > >>>
> > >>>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory