<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Hello
all,<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">I would
like to propose adding to clang the -mprefer-avx256 and -mprefer-avx128 command-line
flags supported by the latest GCC. These flags will limit the vector
register size that TTI presents to the vectorizers. The backend will still be
able to use wider registers for code written using the intrinsics in
x86intrin.h. The backend will also still be able to use AVX512VL instructions and the
additional XMM16-31 and YMM16-31 registers.<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Motivation:<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Using
512-bit operations on some Intel CPUs may cause a decrease in CPU frequency
that may offset the gains from using the wider register size. See section 15.26
of Intel® 64 and IA-32 Architectures Optimization Reference Manual published
October 2017.</span></p></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I note the doc mentions that 256-bit AVX operations have the same issue of reducing the CPU frequency, which is nice to see documented!<br></div><div><br></div><div>There are also the issues discussed here <<a href="http://www.agner.org/optimize/blog/read.php?i=165" target="_blank">http://www.agner.org/optimize/blog/read.php?i=165</a>> (and elsewhere) related to warm-up time for the 256-bit execution pipeline, which is another problem with using wide-vector ops.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-The
vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only
256 bits wide. 512-bit instructions using these ALUs must use both ports. See
section 2.1 of Intel® 64 and IA-32 Architectures Optimization Reference Manual
published October 2017.</span></p></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span><span style="font-family:Arial,sans-serif;font-size:12pt">Implementation
Plan:</span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add
prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not mapped to
any CPU.<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add
-mprefer-avx256 and -mprefer-avx128 and the corresponding -mno-prefer-avx128/256
options to clang's driver Options.td file. I believe this will allow clang to
pass these straight through to the -target-feature attribute in IR.<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Modify X86TTIImpl::getRegisterBitWidth
to return 512 only if AVX512 is enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
return 256 if AVX is enabled and prefer-avx128 is not set.</span></p></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Instead of multiple flags whose interactions are difficult to understand, one flag with a value would be better. E.g., what should "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the answer, it's confusing (similarly with other such combinations). Just a single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier to understand to me (keeping the same behavior as you mention: asking to prefer a larger width than is supported by your architecture should be fine but ignored).</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div></div></div></div></blockquote><div><br></div><div>I agree with this. It's a little more plumbing as far as subtarget features go (represent it via an optional value or just various "set the avx width" features, the latter being easier but uglier); however, it's probably the right thing to do.</div><div><br></div><div>I was looking at this myself just a couple of weeks ago and think this is the right direction (when and how to turn things off), and it probably makes sense for this to be a default for these architectures? We might end up needing to check a couple of additional TTI places, but it sounds like you're on top of it. 
:)</div><div><br></div><div>Thanks very much for doing this work.</div><div><br></div><div>-eric</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span></span></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:12pt">There may
be some other backend changes needed, but I plan to address those as we find
them.</span><br></p><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><br></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:16px">At a later point, consider making -mprefer-avx256 the default for Skylake Server due to the above mentioned performance considerations.</span></p></div></blockquote><div><br></div><div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin-left:1in"> </p></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Does this
sound reasonable?<span></span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">*Latest
Intel Optimization manual available here: <span></span></span><font face="Arial, sans-serif"><span style="font-size:16px"><a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" target="_blank">https://software.intel.com/en-us/articles/intel-sdm#optimization</a></span></font></p><span class="m_-6388844386687843410m_6255661963237500281gmail-m_4991119759517735975gmail-HOEnZb"><font color="#888888"><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px"><br></span></font></p><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px">-Craig Topper</span></font></p></font></span></div>
</div>
<br>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div></div></div>
</blockquote></div></div>
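<div><br></div>
<div>For reference, the width selection discussed in this thread can be modeled roughly as follows. This is only a sketch in Python, not LLVM code: the function and parameter names are hypothetical, the 128-bit fallback is an assumption (the thread does not say what is returned when AVX is unavailable), and the second function models the single valued option suggested above under the assumption that "fine but ignored" means clamping to the hardware width.</div>
<pre>
```python
def register_bit_width(has_avx512, has_avx, prefer_avx256=False, prefer_avx128=False):
    """Vector register width in bits that would be reported to the vectorizers.

    Hypothetical model of the two-flag proposal. Callers are assumed to pass
    has_avx=True whenever has_avx512 is True.
    """
    # 512 only if AVX512 is enabled and neither preference is set.
    if has_avx512 and not (prefer_avx256 or prefer_avx128):
        return 512
    # 256 if AVX is enabled and prefer-avx128 is not set.
    if has_avx and not prefer_avx128:
        return 256
    # Assumption: otherwise fall back to 128-bit registers.
    return 128


def register_bit_width_single_flag(hardware_width, preferred_width=None):
    """Model of a single valued option such as -mprefer-avx=256 (hypothetical)."""
    if preferred_width is None:
        return hardware_width
    # A preference wider than the hardware supports is accepted but ignored.
    return min(hardware_width, preferred_width)


if __name__ == "__main__":
    print(register_bit_width(True, True, prefer_avx256=True))
    print(register_bit_width_single_flag(256, 512))
```
</pre>
<div>With a single value there is only one setting to resolve, so the question of how "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" should combine does not arise in the second function.</div>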