<div dir="ltr"><div><div>Yes - I was thinking of FeatureFastScalarFSQRT / FeatureFastVectorFSQRT which are used by isFsqrtCheap(). These were added to override the default x86 sqrt estimate codegen with:<br><a href="https://reviews.llvm.org/D21379" target="_blank">https://reviews.llvm.org/<wbr>D21379</a><br><br></div>But I'm not sure we really need that kind of hack. Can we adjust the attribute in clang based on the target cpu? Ie, if you have something like:<br></div>$ clang -O2 -march=skylake-avx512 foo.c<br><br>Then you can detect that in the clang driver and pass -mprefer-vector-width=256 to clang codegen as an option? Clang codegen then adds that function attribute to everything it outputs. Then, the vectorizers and/or backend detect that attribute and adjust their behavior based on it. <br><br>So I don't think we should be messing with any kind of type legality checking because that stuff should all be correct already. We're just choosing a vector size based on a pref. I think we should even allow the pref to go bigger than a legal type. This came up somewhere on llvm-dev or in a bug recently in the context of vector reductions.<br><br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 10, 2017 at 6:04 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Are you referring to the X86TargetLowering::<wbr>isFsqrtCheap hook?</div><div class="gmail_extra"><br clear="all"><div><div class="m_771050129279988374gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>

<br><div class="gmail_quote">On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">We can tie a user preference / override to a CPU model. We do something like that for square root estimates already (although it does use a SubtargetFeature currently for x86; ideally, we'd key that off of something in the CPU scheduler model).<div><div class="m_771050129279988374h5"><br><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I agree that a less x86 specific command line makes sense. I've been having an internal discussions with gcc folks and their evaluating switching to something like -mprefer-vector-width=128/256/<wbr>512/none<div><br></div><div>Based on the current performance data we're seeing, we think we need to ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go with a target independent option/implementation is there someway we could still affect the default behavior in a target specific way?</div></div><div class="gmail_extra"><br clear="all"><div><div class="m_771050129279988374m_4887027107317541871m_-9050519988835790991gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>

<br><div class="gmail_quote">On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>It's clear from the Intel docs how this has evolved, but from a compiler perspective, this isn't a Skylake "feature" :) ... nor an Intel feature, nor an x86 feature. <br><br>It's a generic programmer hint for any target with multiple potential vector lengths. <br></div><div><br></div><div>On x86, there's already a potential use case for this hint with a different starting motivation: re-vectorization. That's where we take C code that uses 128-bit vector intrinsics and selectively widen it to 256- or 512-bit vector ops based on a newer CPU target than the code was originally written for.<br><div><br></div><div>I think it's just a matter of time before a customer requests the 

same ability for another target (maybe they already have and I don't 

know about it). So we should have a solution that recognizes that 

possibility. <br></div><div><br></div></div>Note that having a target-independent implementation in the optimizer doesn't preclude a flag alias in clang to maintain compatibility with gcc.<div><div class="m_771050129279988374m_4887027107317541871m_-9050519988835790991h5"><br><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:<br>

> That's a very good point about the ordering of the command line options.<br>

> gcc's current implementation treats -mprefer-avx256 has "prefer 256 over<br>

> 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for<br>

> other reasons, but has less of an ordering ambiguity.<br>

><br>

> -mprefer-avx128 has been in gcc for many years and predates the creation<br>

> of<br>

> avx512. -mprefer-avx256 was added a couple months ago.<br>

><br>

> We've had an internal conversation with the implementor of<br>

> -mprefer-avx256<br>

> in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll<br>

> bring up the ambiguity issue with them.<br>

><br>

> Do we want to be compatible with gcc here?<br>

<br>

I certainly believe we would want to be compatible with gcc (if we use<br>

the same names).<br>

<br>

Best,<br>

Tobias<br>

<br>

><br>

> ~Craig<br>

><br>

> On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>><br>

> wrote:<br>

><br>

> ><br>

> ><br>

> > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <<br>

> > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

> ><br>

> >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <<br>

> >> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

> >><br>

> >>> Hello all,<br>

> >>><br>

> >>><br>

> >>><br>

> >>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128<br>

> >>> command line flags supported by latest GCC to clang. These flags will be<br>

> >>> used to limit the vector register size presented by TTI to the vectorizers.<br>

> >>> The backend will still be able to use wider registers for code written<br>

> >>> using the instrinsics in x86intrin.h. And the backend will still be able to<br>

> >>> use AVX512VL instructions and the additional XMM16-31 and YMM16-31<br>

> >>> registers.<br>

> >>><br>

> >>><br>

> >>><br>

> >>> Motivation:<br>

> >>><br>

> >>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU<br>

> >>> frequency that may offset the gains from using the wider register size. See<br>

> >>> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference<br>

> >>> Manual published October 2017.<br>

> >>><br>

> >><br>

> >> I note the doc mentions that 256-bit AVX operations also have the same<br>

> >> issue with reducing the CPU frequency, which is nice to see documented!<br>

> >><br>

> >> There's also the issues discussed here <<a href="http://www.agner.org/" rel="noreferrer" target="_blank">http://www.agner.org/</a><br>

> >> optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time<br>

> >> for the 256-bit execution pipeline, which is another issue with using<br>

> >> wide-vector ops.<br>

> >><br>

> >><br>

> >> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture<br>

> >>> are only 256-bits wide. 512-bit instructions using these ALUs must use both<br>

> >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization<br>

> >>> Reference Manual published October 2017.<br>

> >>><br>

> >><br>

> >><br>

> >>>  Implementation Plan:<br>

> >>><br>

> >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not<br>

> >>> mapped to any CPU.<br>

> >>><br>

> >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding<br>

> >>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe<br>

> >>> this will allow clang to pass these straight through to the -target-feature<br>

> >>> attribute in IR.<br>

> >>><br>

> >>> -Modify X86TTIImpl::getRegisterBitWidt<wbr>h to only return 512 if AVX512 is<br>

> >>> enabled and prefer-avx256 and prefer-avx128 is not set. Similarly return<br>

> >>> 256 if AVX is enabled and prefer-avx128 is not set.<br>

> >>><br>

> >><br>

> >> Instead of multiple flags that have difficult to understand intersecting<br>

> >> behavior, one flag with a value would be better. E.g., what should<br>

> >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the<br>

> >> answer, it's confusing. (Similarly with other such combinations). Just a<br>

> >> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier<br>

> >> to understand to me (keeping the same behavior as you mention: asking to<br>

> >> prefer a larger width than is supported by your architecture should be fine<br>

> >> but ignored).<br>

> >><br>

> >><br>

> > I agree with this. It's a little more plumbing as far as subtarget<br>

> > features etc (represent via an optional value or just various "set the avx<br>

> > width" features - the latter being easier, but uglier), however, it's<br>

> > probably the right thing to do.<br>

> ><br>

> > I was looking at this myself just a couple weeks ago and think this is the<br>

> > right direction (when and how to turn things off) - and probably makes<br>

> > sense to be a default for these architectures? We might end up needing to<br>

> > check a couple of additional TTI places, but it sounds like you're on top<br>

> > of it. :)<br>

> ><br>

> > Thanks very much for doing this work.<br>

> ><br>

> > -eric<br>

> ><br>

> ><br>

> >><br>

> >><br>

> >> There may be some other backend changes needed, but I plan to address<br>

> >>> those as we find them.<br>

> >>><br>

> >>><br>

> >>> At a later point, consider making -mprefer-avx256 the default for<br>

> >>> Skylake Server due to the above mentioned performance considerations.<br>

> >>><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >><br>

> >>><br>

> >> Does this sound reasonable?<br>

> >>><br>

> >>><br>

> >>><br>

> >>> *Latest Intel Optimization manual available here:<br>

> >>> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#optimiza<wbr>tion</a><br>

> >>><br>

> >>><br>

> >>> -Craig Topper<br>

> >>><br>

> >>> ______________________________<wbr>_________________<br>

> >>> LLVM Developers mailing list<br>

> >>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> >>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

> >>><br>

> >>> ______________________________<wbr>_________________<br>

> >> LLVM Developers mailing list<br>

> >> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> >> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

> >><br>

> ><br>

> ______________________________<wbr>_________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

______________________________<wbr>_________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

</blockquote></div><br></div></div></div></div>

</blockquote></div><br></div>

</blockquote></div><br></div></div></div></div></div>

</blockquote></div><br></div>

</blockquote></div><br></div>