<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Hello

all,<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line flags, supported by the latest GCC, to clang. These flags would limit the vector register size that TTI presents to the vectorizers. The backend will still be able to use wider registers for code written with the intrinsics in x86intrin.h, and it will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers.<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Motivation:<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Using 512-bit operations on some Intel CPUs may reduce the CPU frequency, which can offset the gains from the wider register size. See section 15.26 of the Intel® 64 and IA-32 Architectures Optimization Reference Manual published

October 2017.</span></p></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I note the doc mentions that 256-bit AVX operations also have the same issue with reducing the CPU frequency, which is nice to see documented!<br></div><div><br></div><div>There are also the issues discussed here <<a href="http://www.agner.org/optimize/blog/read.php?i=165" target="_blank">http://www.agner.org/optimize/blog/read.php?i=165</a>> (and elsewhere) related to the warm-up time of the 256-bit execution pipeline, which is another drawback of using wide-vector ops.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only 256 bits wide, so 512-bit instructions that use these ALUs must use both ports. See section 2.1 of the Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017.</span></p></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span><span style="font-family:Arial,sans-serif;font-size:12pt">Implementation

Plan:</span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td that are not mapped to any CPU.<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add -mprefer-avx256 and -mprefer-avx128 and the corresponding -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe this will allow clang to pass these straight through to the "target-features" attribute in IR.<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Modify X86TTIImpl::getRegisterBitWidth to return 512 only if AVX512 is enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly, return 256 only if AVX is enabled and prefer-avx128 is not set.</span></p></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Instead of multiple flags with hard-to-understand interacting behavior, one flag with a value would be better. E.g., what should "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? Whatever the answer, it's confusing (and similarly for other such combinations). A single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier to understand to me, keeping the same behavior you mention: asking to prefer a larger width than your architecture supports should be fine but ignored.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div></div></div></div></blockquote><div><br></div><div>I agree with this. It's a little more plumbing as far as subtarget features etc. go (represent it via an optional value, or just via various "set the avx width" features; the latter is easier, but uglier), but it's probably the right thing to do.</div><div><br></div><div>I was looking at this myself just a couple of weeks ago and think this is the right direction (when and how to turn things off), and it probably makes sense as a default for these architectures? We might end up needing to check a couple of additional TTI places, but it sounds like you're on top of it. 
:)</div><div><br></div><div>Thanks very much for doing this work.</div><div><br></div><div>-eric</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:12pt">There may

be some other backend changes needed, but I plan to address those as we find

them.</span><br></p><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><br></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:16px">At a later point, consider making -mprefer-avx256 the default for Skylake Server because of the performance considerations mentioned above.</span></p></div></blockquote><div><br></div><div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"> </p></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
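<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">The getRegisterBitWidth rule in the plan can be sketched as below. This is a minimal C++ sketch, not the actual X86TTIImpl code: the SubtargetInfo struct and its fields are hypothetical stand-ins for the real subtarget queries, and the 128/0 fallback for SSE or no vector registers is an assumption about the non-AVX cases.</span></p>

```cpp
// Hypothetical stand-in for the subtarget queries X86TTIImpl makes; the field
// names are illustrative and are not LLVM's actual X86Subtarget interface.
struct SubtargetInfo {
  bool HasSSE1;
  bool HasAVX;
  bool HasAVX512;
  bool PreferAVX256; // would be set by -mprefer-avx256
  bool PreferAVX128; // would be set by -mprefer-avx128
};

// Sketch of the proposed rule for the vector register width reported to the
// vectorizers. Returns 0 when no vector registers are available (assumption).
unsigned proposedRegisterBitWidth(SubtargetInfo ST) {
  // 512 only if AVX-512 is enabled and neither preference flag is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 only if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Otherwise fall back to 128-bit XMM registers, if SSE is available.
  if (ST.HasSSE1)
    return 128;
  return 0;
}
```

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">On an AVX-512 target, setting only PreferAVX256 yields 256, while setting PreferAVX128 yields 128 whether or not PreferAVX256 is also set. Only the width reported to the vectorizers changes; intrinsics code can still use wider registers.</span></p>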


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Does this

sound reasonable?<span></span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>


<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">*Latest

Intel Optimization manual available here: <span></span></span><font face="Arial, sans-serif"><span style="font-size:16px"><a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" target="_blank">https://software.intel.com/en-us/articles/intel-sdm#optimization</a></span></font></p><span class="m_-6388844386687843410m_6255661963237500281gmail-m_4991119759517735975gmail-HOEnZb"><font color="#888888"><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px"><br></span></font></p><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px">-Craig Topper</span></font></p></font></span></div>

</div>

<br>_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br></blockquote></div></div></div>


</blockquote></div></div>
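<p>The single-valued "-mprefer-avx={128/256/512}" flag suggested in the thread, where a preference wider than the hardware supports is ignored, reduces to taking the smaller of the preferred and supported widths. Below is a minimal C++ sketch under stated assumptions: both function names are hypothetical, and a Preferred value of 0 is assumed to mean the flag was not given.</p>

```cpp
// Widest vector register width the target supports, ignoring any preference
// (0 when there are no vector registers).
unsigned supportedVectorWidth(bool HasSSE1, bool HasAVX, bool HasAVX512) {
  if (HasAVX512)
    return 512;
  if (HasAVX)
    return 256;
  if (HasSSE1)
    return 128;
  return 0;
}

// Hypothetical handling of a single -mprefer-avx=N flag, where Preferred is
// 128, 256 or 512, or 0 if the flag was not given. A preference wider than the
// hardware supports is ignored, so the result is the smaller of the two.
unsigned effectiveVectorWidth(unsigned Supported, unsigned Preferred) {
  if (Preferred == 0)
    return Supported; // no preference: use the full supported width
  return Preferred > Supported ? Supported : Preferred;
}
```

<p>With a single value there are no positive/negative flag combinations to reconcile. A repeated flag would presumably just override the earlier value, though that driver behavior is an assumption here.</p>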