<div dir="ltr">Reviews of the initial plumbing have been posted:<div><br></div><div><a href="https://reviews.llvm.org/D39575">https://reviews.llvm.org/D39575</a><br></div><div><a href="https://reviews.llvm.org/D39576">https://reviews.llvm.org/D39576</a><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>

<br><div class="gmail_quote">On Thu, Nov 2, 2017 at 4:57 AM, Tobias Grosser <span dir="ltr">&lt;<a href="mailto:tobias.grosser@inf.ethz.ch" target="_blank">tobias.grosser@inf.ethz.ch</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Craig,<br>

<br>

this sounds like a good idea.<br>

<br>

Best,<br>

Tobias<br>

<div class="HOEnZb"><div class="h5"><br>

On Thu, Nov 2, 2017, at 00:35, Craig Topper via llvm-dev wrote:<br>

> Hello all,<br>

><br>

><br>

><br>

> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128<br>

> command line flags supported by the latest GCC to clang. These flags will be<br>

> used to limit the vector register size presented by TTI to the<br>

> vectorizers.<br>

> The backend will still be able to use wider registers for code written<br>

> using the intrinsics in x86intrin.h. And the backend will still be able<br>

> to<br>

> use AVX512VL instructions and the additional XMM16-31 and YMM16-31<br>

> registers.<br>

><br>

><br>

><br>

> Motivation:<br>

><br>

> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU<br>

> frequency that may offset the gains from using the wider register size.<br>

> See<br>

> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference<br>

> Manual published October 2017.<br>

><br>

> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture<br>

> are only 256 bits wide. 512-bit instructions using these ALUs must use<br>

> both<br>

> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization<br>

> Reference Manual published October 2017.<br>

><br>

><br>

><br>

> Implementation Plan:<br>

><br>

> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not<br>

> mapped to any CPU.<br>

><br>

> -Add mprefer-avx256 and mprefer-avx128 and the corresponding<br>

> -mno-prefer-avx128/256 options to clang's driver Options.td file. I<br>

> believe<br>

> this will allow clang to pass these straight through to the<br>

> -target-feature<br>

> attribute in IR.<br>

><br>

> -Modify X86TTIImpl::<wbr>getRegisterBitWidth to only return 512 if AVX512 is<br>

> enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly return<br>

> 256 if AVX is enabled and prefer-avx128 is not set.<br>
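<div>The getRegisterBitWidth rule in the last item can be sketched as a standalone C++ function. This is a minimal sketch, not the code under review: the VectorFeatures struct and the preferredVectorBits name are hypothetical stand-ins for the X86 subtarget queries, and the 128-bit fallback assumes an SSE-capable target.</div>
<pre>
```cpp
// Hypothetical stand-in for the subtarget feature bits (not LLVM's API).
struct VectorFeatures {
  bool HasAVX = false;
  bool HasAVX512 = false;
  bool PreferAVX256 = false;
  bool PreferAVX128 = false;
};

// Width, in bits, that would be presented to the vectorizers.
unsigned preferredVectorBits(VectorFeatures F) {
  // Either preference flag rules out 512-bit vectors.
  const bool No512 = F.PreferAVX256 || F.PreferAVX128;
  // Only prefer-avx128 rules out 256-bit vectors.
  const bool No256 = F.PreferAVX128;

  if (F.HasAVX512) {
    if (!No512)
      return 512;
  }
  // AVX512 implies AVX, so either bit allows 256-bit vectors.
  if (F.HasAVX || F.HasAVX512) {
    if (!No256)
      return 256;
  }
  // Assumption: the target has at least SSE, so 128 is the fallback.
  return 128;
}
```
</pre>
<div>With AVX512 enabled and only prefer-avx256 set, this sketch reports 256; also setting prefer-avx128 lowers it to 128.</div>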

><br>

><br>

><br>

> There may be some other backend changes needed, but I plan to address<br>

> those<br>

> as we find them.<br>

><br>

><br>

> At a later point, consider making -mprefer-avx256 the default for Skylake<br>

> Server due to the above-mentioned performance considerations.<br>

><br>

><br>

><br>

> Does this sound reasonable?<br>

><br>

><br>

><br>

> *Latest Intel Optimization manual available here:<br>

> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#<wbr>optimization</a><br>

><br>

><br>

> -Craig Topper<br>

</div></div><div class="HOEnZb"><div class="h5">> ______________________________<wbr>_________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

</div></div></blockquote></div><br></div>