<div dir="ltr"><div>If skylake is that bad at AVX2 it belongs in -mcpu / -march IMO. Most
people will build for the standard x86_64-pc-linux or whatever anyway, and
completely ignore the change. This will mainly affect those who build their own software and optimize for their system, and lots there have probably caught on to this already. I always thought that's what -march was made for, really. <br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature">GNOMETOYS<br></div></div>
<br><div class="gmail_quote">On Sat, Nov 11, 2017 at 10:25 AM, Sanjay Patel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Yes - I was thinking of FeatureFastScalarFSQRT / FeatureFastVectorFSQRT which are used by isFsqrtCheap(). These were added to override the default x86 sqrt estimate codegen with:<br><a href="https://reviews.llvm.org/D21379" target="_blank">https://reviews.llvm.org/D2137<wbr>9</a><br><br></div>But I'm not sure we really need that kind of hack. Can we adjust the attribute in clang based on the target cpu? Ie, if you have something like:<br></div>$ clang -O2 -march=skylake-avx512 foo.c<br><br>Then you can detect that in the clang driver and pass -mprefer-vector-width=256 to clang codegen as an option? Clang codegen then adds that function attribute to everything it outputs. Then, the vectorizers and/or backend detect that attribute and adjust their behavior based on it. <br><br>So I don't think we should be messing with any kind of type legality checking because that stuff should all be correct already. We're just choosing a vector size based on a pref. I think we should even allow the pref to go bigger than a legal type. This came up somewhere on llvm-dev or in a bug recently in the context of vector reductions.<br><br><br></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Are you referring to the X86TargetLowering::isFsqrt<wbr>Cheap hook?</div><div class="gmail_extra"><br clear="all"><div><div class="m_6454106954572217318m_771050129279988374gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>
<br><div class="gmail_quote">On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">We can tie a user preference / override to a CPU model. We do something like that for square root estimates already (although it does use a SubtargetFeature currently for x86; ideally, we'd key that off of something in the CPU scheduler model).<div><div class="m_6454106954572217318m_771050129279988374h5"><br><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I agree that a less x86 specific command line makes sense. I've been having an internal discussions with gcc folks and their evaluating switching to something like -mprefer-vector-width=128/256/<wbr>512/none<div><br></div><div>Based on the current performance data we're seeing, we think we need to ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go with a target independent option/implementation is there someway we could still affect the default behavior in a target specific way?</div></div><div class="gmail_extra"><br clear="all"><div><div class="m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>
<br><div class="gmail_quote">On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>It's clear from the Intel docs how this has evolved, but from a compiler perspective, this isn't a Skylake "feature" :) ... nor an Intel feature, nor an x86 feature. <br><br>It's a generic programmer hint for any target with multiple potential vector lengths. <br></div><div><br></div><div>On x86, there's already a potential use case for this hint with a different starting motivation: re-vectorization. That's where we take C code that uses 128-bit vector intrinsics and selectively widen it to 256- or 512-bit vector ops based on a newer CPU target than the code was originally written for.<br><div><br></div><div>I think it's just a matter of time before a customer requests the
same ability for another target (maybe they already have and I don't
know about it). So we should have a solution that recognizes that
possibility. <br></div><div><br></div></div>Note that having a target-independent implementation in the optimizer doesn't preclude a flag alias in clang to maintain compatibility with gcc.<div><div class="m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991h5"><br><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:<br>
> That's a very good point about the ordering of the command line options.<br>
> gcc's current implementation treats -mprefer-avx256 has "prefer 256 over<br>
> 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for<br>
> other reasons, but has less of an ordering ambiguity.<br>
><br>
> -mprefer-avx128 has been in gcc for many years and predates the creation<br>
> of<br>
> avx512. -mprefer-avx256 was added a couple months ago.<br>
><br>
> We've had an internal conversation with the implementor of<br>
> -mprefer-avx256<br>
> in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll<br>
> bring up the ambiguity issue with them.<br>
><br>
> Do we want to be compatible with gcc here?<br>
<br>
I certainly believe we would want to be compatible with gcc (if we use<br>
the same names).<br>
<br>
Best,<br>
Tobias<br>
<br>
><br>
> ~Craig<br>
><br>
> On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>><br>
> wrote:<br>
><br>
> ><br>
> ><br>
> > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <<br>
> > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
> ><br>
> >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <<br>
> >> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
> >><br>
> >>> Hello all,<br>
> >>><br>
> >>><br>
> >>><br>
> >>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128<br>
> >>> command line flags supported by latest GCC to clang. These flags will be<br>
> >>> used to limit the vector register size presented by TTI to the vectorizers.<br>
> >>> The backend will still be able to use wider registers for code written<br>
> >>> using the instrinsics in x86intrin.h. And the backend will still be able to<br>
> >>> use AVX512VL instructions and the additional XMM16-31 and YMM16-31<br>
> >>> registers.<br>
> >>><br>
> >>><br>
> >>><br>
> >>> Motivation:<br>
> >>><br>
> >>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU<br>
> >>> frequency that may offset the gains from using the wider register size. See<br>
> >>> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference<br>
> >>> Manual published October 2017.<br>
> >>><br>
> >><br>
> >> I note the doc mentions that 256-bit AVX operations also have the same<br>
> >> issue with reducing the CPU frequency, which is nice to see documented!<br>
> >><br>
> >> There's also the issues discussed here <<a href="http://www.agner.org/" rel="noreferrer" target="_blank">http://www.agner.org/</a><br>
> >> optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time<br>
> >> for the 256-bit execution pipeline, which is another issue with using<br>
> >> wide-vector ops.<br>
> >><br>
> >><br>
> >> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture<br>
> >>> are only 256-bits wide. 512-bit instructions using these ALUs must use both<br>
> >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization<br>
> >>> Reference Manual published October 2017.<br>
> >>><br>
> >><br>
> >><br>
> >>> Implementation Plan:<br>
> >>><br>
> >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not<br>
> >>> mapped to any CPU.<br>
> >>><br>
> >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding<br>
> >>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe<br>
> >>> this will allow clang to pass these straight through to the -target-feature<br>
> >>> attribute in IR.<br>
> >>><br>
> >>> -Modify X86TTIImpl::getRegisterBitWidt<wbr>h to only return 512 if AVX512 is<br>
> >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return<br>
> >>> 256 if AVX is enabled and prefer-avx128 is not set.<br>
> >>><br>
> >><br>
> >> Instead of multiple flags that have difficult to understand intersecting<br>
> >> behavior, one flag with a value would be better. E.g., what should<br>
> >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the<br>
> >> answer, it's confusing. (Similarly with other such combinations). Just a<br>
> >> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier<br>
> >> to understand to me (keeping the same behavior as you mention: asking to<br>
> >> prefer a larger width than is supported by your architecture should be fine<br>
> >> but ignored).<br>
> >><br>
> >><br>
> > I agree with this. It's a little more plumbing as far as subtarget<br>
> > features etc (represent via an optional value or just various "set the avx<br>
> > width" features - the latter being easier, but uglier), however, it's<br>
> > probably the right thing to do.<br>
> ><br>
> > I was looking at this myself just a couple weeks ago and think this is the<br>
> > right direction (when and how to turn things off) - and probably makes<br>
> > sense to be a default for these architectures? We might end up needing to<br>
> > check a couple of additional TTI places, but it sounds like you're on top<br>
> > of it. :)<br>
> ><br>
> > Thanks very much for doing this work.<br>
> ><br>
> > -eric<br>
> ><br>
> ><br>
> >><br>
> >><br>
> >> There may be some other backend changes needed, but I plan to address<br>
> >>> those as we find them.<br>
> >>><br>
> >>><br>
> >>> At a later point, consider making -mprefer-avx256 the default for<br>
> >>> Skylake Server due to the above mentioned performance considerations.<br>
> >>><br>
> >><br>
> >><br>
> >><br>
> >><br>
> >><br>
> >>><br>
> >> Does this sound reasonable?<br>
> >>><br>
> >>><br>
> >>><br>
> >>> *Latest Intel Optimization manual available here:<br>
> >>> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#optimiza<wbr>tion</a><br>
> >>><br>
> >>><br>
> >>> -Craig Topper<br>
> >>><br>
> >>> ______________________________<wbr>_________________<br>
> >>> LLVM Developers mailing list<br>
> >>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> >>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
> >>><br>
> >>> ______________________________<wbr>_________________<br>
> >> LLVM Developers mailing list<br>
> >> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> >> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
> >><br>
> ><br>
> ______________________________<wbr>_________________<br>
> LLVM Developers mailing list<br>
> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
</blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>
</blockquote></div><br></div></div></div></div></div>
</blockquote></div><br></div>
</blockquote></div><br></div>
</div></div><br>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div>