<div dir="ltr"><div><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Hello
all,<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">I would
like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line
flags supported by the latest GCC to clang. These flags will limit the vector
register size that TTI presents to the vectorizers. The backend will still be
able to use wider registers for code written using the intrinsics in
x86intrin.h. The backend will also still be able to use AVX512VL instructions and the
additional XMM16-31 and YMM16-31 registers.<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Motivation:<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Using
512-bit operations on some Intel CPUs may cause a decrease in CPU frequency
that may offset the gains from using the wider register size. See section 15.26
of Intel® 64 and IA-32 Architectures Optimization Reference Manual published
October 2017.<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-The
vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only
256 bits wide. A 512-bit instruction using these ALUs must occupy both ports. See
section 2.1 of Intel® 64 and IA-32 Architectures Optimization Reference Manual
published October 2017.<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Implementation
Plan:<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add
prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not mapped to
any CPU.<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Add
-mprefer-avx256 and -mprefer-avx128 and the corresponding -mno-prefer-avx256 and -mno-prefer-avx128
options to clang's driver Options.td file. I believe this will allow clang to
pass these straight through to the -target-feature attribute in IR.<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">-Modify X86TTIImpl::<wbr>getRegisterBitWidth
to return 512 only if AVX512 is enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly,
return 256 if AVX is enabled and prefer-avx128 is not set.<span></span></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:12pt"> </span><br></p>
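<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">As a rough illustration of the getRegisterBitWidth change, the selection logic could look something like the standalone sketch below. This is not LLVM code: SubtargetModel and vectorRegisterBitWidth are hypothetical names made up for this example, and the 128-bit fallback is my assumption, since the plan above only spells out the 512 and 256 cases.</span></p>
<pre>
```cpp
// Hypothetical stand-in for the relevant subtarget feature bits.
// This is NOT the real X86Subtarget class, only a model for illustration.
struct SubtargetModel {
  bool HasAVX512 = false;    // AVX512 is enabled
  bool HasAVX = false;       // AVX is enabled (expected to be true whenever HasAVX512 is)
  bool PreferAVX256 = false; // proposed prefer-avx256 feature
  bool PreferAVX128 = false; // proposed prefer-avx128 feature
};

// Sketch of the width the vectorizers would be told about.
unsigned vectorRegisterBitWidth(SubtargetModel ST) {
  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumption: everything else is limited to 128-bit XMM registers.
  return 128;
}
```
</pre>
<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">In this model, an AVX512 target with prefer-avx256 set falls through to the second check and reports 256, and setting prefer-avx128 brings it down to 128 regardless of prefer-avx256.</span></p>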

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">There may
be some other backend changes needed, but I plan to address those as we find
them.<span></span></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><br></span></p><p class="MsoNormal" style="margin-left:1in"><span style="font-family:Arial,sans-serif;font-size:16px">At a later point, we could consider making -mprefer-avx256 the default for Skylake Server because of the performance considerations described above.</span><span style="font-size:12pt;font-family:Arial,sans-serif"><br></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">Does this
sound reasonable?<span></span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif"><span> </span></span></p>

<p class="MsoNormal" style="margin-left:1in"><span style="font-size:12pt;font-family:Arial,sans-serif">*Latest
Intel Optimization manual available here: <span></span></span><font face="Arial, sans-serif"><span style="font-size:16px"><a href="https://software.intel.com/en-us/articles/intel-sdm#optimization">https://software.intel.com/en-us/articles/intel-sdm#optimization</a></span></font></p><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px"><br></span></font></p><p class="MsoNormal" style="margin-left:1in"><font face="Arial, sans-serif"><span style="font-size:16px">-Craig Topper</span></font></p></div>
</div>