<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Nov 11, 2017 at 8:52 PM, Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<p><br>
</p>
<div class="gmail-m_264012946301939527moz-cite-prefix">On 11/11/2017 09:52 PM, UE US via
llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>If skylake is that bad at AVX2</div>
</div>
</blockquote>
<br></span>
I don't think this says anything negative about AVX2, only about AVX-512.<span class="gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div> it belongs in -mcpu / -march IMO. </div>
</div>
</blockquote>
<br></span>
No. We'd still want to enable the architectural features for vector
intrinsics and the like.</div></blockquote><div><br></div><div>I took this to mean that the feature should be enabled by default for -march=skylake-avx512.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">Based on the current performance data we're
seeing, we think we need to ultimately default skylake-avx512 to
-mprefer-vector-width=256.</blockquote>
<br></span>
Craig, is this for both integer and floating-point code?</div></blockquote><div><br></div><div>I believe so, but I'll try to get confirmation from the people with more data.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-HOEnZb"><font color="#888888"><br>
<br>
-Hal <br></font></span><div><div class="gmail-h5">
<br>
<blockquote type="cite">
<div dir="ltr">
<div> Most people will build for the standard x86_64-pc-linux
or whatever anyway, and completely ignore the change. This
will mainly affect those who build their own software and
optimize for their system, and lots there have probably caught
on to this already. I always thought that's what -march was
made for, really. <br>
</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="gmail-m_264012946301939527gmail_signature">GNOMETOYS<br>
</div>
</div>
<br>
<div class="gmail_quote">On Sat, Nov 11, 2017 at 10:25 AM,
Sanjay Patel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>
<div>Yes - I was thinking of FeatureFastScalarFSQRT /
FeatureFastVectorFSQRT which are used by
isFsqrtCheap(). These were added to override the
default x86 sqrt estimate codegen with:<br>
<a href="https://reviews.llvm.org/D21379" target="_blank">https://reviews.llvm.org/D2137<wbr>9</a><br>
<br>
</div>
But I'm not sure we really need that kind of hack. Can
we adjust the attribute in clang based on the target
cpu? I.e., if you have something like:<br>
</div>
$ clang -O2 -march=skylake-avx512 foo.c<br>
<br>
Then you can detect that in the clang driver and pass
-mprefer-vector-width=256 to clang codegen as an option?
Clang codegen then adds that function attribute to
everything it outputs. Then, the vectorizers and/or
backend detect that attribute and adjust their behavior
based on it. <br></div></blockquote></div></div></blockquote></div></div></div></blockquote><div><br></div><div>Do we have a precedent for setting a target independent flag from a target specific cpu string in the clang driver? Want to make sure I understand what the processing on such a thing would look like. Particularly to get the order right so the user can override it.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><div><div class="gmail-h5"><blockquote type="cite"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">
<br>
So I don't think we should be messing with any kind of
type legality checking because that stuff should all be
correct already. We're just choosing a vector size based
on a pref. I think we should even allow the pref to go
bigger than a legal type. This came up somewhere on
llvm-dev or in a bug recently in the context of vector
reductions.<br>
<br>
<br>
</div>
<div class="gmail-m_264012946301939527HOEnZb">
<div class="gmail-m_264012946301939527h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Nov 10, 2017 at 6:04
PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Are you referring to
the X86TargetLowering::isFsqrt<wbr>Cheap hook?</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="gmail-m_264012946301939527m_6454106954572217318m_771050129279988374gmail_signature">~Craig</div>
</div>
<br>
<div class="gmail_quote">On Fri, Nov 10, 2017 at
7:39 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">We can tie a user preference
/ override to a CPU model. We do something
like that for square root estimates
already (although it does use a
SubtargetFeature currently for x86;
ideally, we'd key that off of something in
the CPU scheduler model).
<div>
<div class="gmail-m_264012946301939527m_6454106954572217318m_771050129279988374h5"><br>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu,
Nov 9, 2017 at 4:21 PM, Craig
Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">I agree that a
less x86 specific command
line makes sense. I've been
having internal
discussions with gcc folks
and they're evaluating
switching to something like
-mprefer-vector-width=128/256/<wbr>512/none
<div><br>
</div>
<div>Based on the current
performance data we're
seeing, we think we need
to ultimately default
skylake-avx512 to
-mprefer-vector-width=256.
If we go with a target
independent
option/implementation is
there some way we could
still affect the default
behavior in a target
specific way?</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991gmail_signature">~Craig</div>
</div>
<br>
<div class="gmail_quote">On
Tue, Nov 7, 2017 at 9:06
AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>It's clear from
the Intel docs how
this has evolved,
but from a compiler
perspective, this
isn't a Skylake
"feature" :) ... nor
an Intel feature,
nor an x86 feature.
<br>
<br>
It's a generic
programmer hint for
any target with
multiple potential
vector lengths. <br>
</div>
<div><br>
</div>
<div>On x86, there's
already a potential
use case for this
hint with a
different starting
motivation:
re-vectorization.
That's where we take
C code that uses
128-bit vector
intrinsics and
selectively widen it
to 256- or 512-bit
vector ops based on
a newer CPU target
than the code was
originally written
for.<br>
<div><br>
</div>
<div>I think it's
just a matter of
time before a
customer requests
the same ability
for another target
(maybe they
already have and I
don't know about
it). So we should
have a solution
that recognizes
that possibility.
<br>
</div>
<div><br>
</div>
</div>
Note that having a
target-independent
implementation in the
optimizer doesn't
preclude a flag alias
in clang to maintain
compatibility with
gcc.
<div>
<div class="gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991h5"><br>
<div><br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Tue, Nov 7,
2017 at 2:02
AM, Tobias
Grosser via
llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
Fri, Nov 3,
2017, at
05:47, Craig
Topper via
llvm-dev
wrote:<br>
> That's a
very good
point about
the ordering
of the command
line options.<br>
> gcc's
current
implementation
treats
-mprefer-avx256
as "prefer
256 over<br>
> 512" and
-mprefer-avx128 as "prefer 128 over 256". Which feels weird for<br>
> other
reasons, but
has less of an
ordering
ambiguity.<br>
><br>
>
-mprefer-avx128
has been in
gcc for many
years and
predates the
creation<br>
> of<br>
> avx512.
-mprefer-avx256
was added a
couple months
ago.<br>
><br>
> We've had
an internal
conversation
with the
implementor of<br>
>
-mprefer-avx256<br>
> in gcc
about making
-mprefer-avx128
affect 512-bit
vectors as
well. I'll<br>
> bring up
the ambiguity
issue with
them.<br>
><br>
> Do we
want to be
compatible
with gcc here?<br>
<br>
I certainly
believe we
would want to
be compatible
with gcc (if
we use<br>
the same
names).<br>
<br>
Best,<br>
Tobias<br>
<br>
><br>
> ~Craig<br>
><br>
> On Thu,
Nov 2, 2017 at
7:18 PM, Eric
Christopher
<<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>><br>
> wrote:<br>
><br>
> ><br>
> ><br>
> > On
Thu, Nov 2,
2017 at 7:05
PM James Y
Knight via
llvm-dev <<br>
> > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:<br>
> ><br>
> >>
On Wed, Nov 1,
2017 at 7:35
PM, Craig
Topper via
llvm-dev <<br>
> >>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:<br>
> >><br>
>
>>>
Hello all,<br>
>
>>><br>
>
>>><br>
>
>>><br>
>
>>> I
would like to
propose adding
the
-mprefer-avx256
and
-mprefer-avx128<br>
>
>>>
command line
flags
supported by
latest GCC to
clang. These
flags will be<br>
>
>>>
used to limit
the vector
register size
presented by
TTI to the
vectorizers.<br>
>
>>>
The backend
will still be
able to use
wider
registers for
code written<br>
>
>>>
using the
intrinsics in
x86intrin.h.
And the
backend will
still be able
to<br>
>
>>>
use AVX512VL
instructions
and the
additional
XMM16-31 and
YMM16-31<br>
>
>>>
registers.<br>
>
>>><br>
>
>>><br>
>
>>><br>
>
>>>
Motivation:<br>
>
>>><br>
>
>>>
-Using 512-bit
operations on
some Intel
CPUs may cause
a decrease in
CPU<br>
>
>>>
frequency that
may offset the
gains from
using the
wider register
size. See<br>
>
>>>
section 15.26
of Intel® 64
and IA-32
Architectures
Optimization
Reference<br>
>
>>>
Manual
published
October 2017.<br>
>
>>><br>
> >><br>
> >>
I note the doc
mentions that
256-bit AVX
operations
also have the
same<br>
> >>
issue with
reducing the
CPU frequency,
which is nice
to see
documented!<br>
> >><br>
> >>
There are also
the issues
discussed here
<<a href="http://www.agner.org/" rel="noreferrer" target="_blank">http://www.agner.org/</a><br>
> >>
optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time<br>
> >>
for the
256-bit
execution
pipeline,
which is
another issue
with using<br>
> >>
wide-vector
ops.<br>
> >><br>
> >><br>
> >>
-The vector
ALUs on ports
0 and 1 of the
Skylake Server
microarchitecture<br>
>
>>>
are only
256 bits wide.
512-bit
instructions
using these
ALUs must use
both<br>
>
>>>
ports. See
section 2.1 of
Intel® 64 and
IA-32
Architectures
Optimization<br>
>
>>>
Reference
Manual
published
October 2017.<br>
>
>>><br>
> >><br>
> >><br>
>
>>>
Implementation
Plan:<br>
>
>>><br>
>
>>>
-Add
prefer-avx256
and
prefer-avx128
as
SubtargetFeatures
in X86.td not<br>
>
>>>
mapped to any
CPU.<br>
>
>>><br>
>
>>>
-Add
mprefer-avx256
and
mprefer-avx128
and the
corresponding<br>
>
>>>
-mno-prefer-avx128/256
options to
clang's driver
Options.td
file. I
believe<br>
>
>>>
this will
allow clang to
pass these
straight
through to the
-target-feature<br>
>
>>>
attribute in
IR.<br>
>
>>><br>
>
>>>
-Modify
X86TTIImpl::getRegisterBitWidt<wbr>h
to only return
512 if AVX512
is<br>
>
>>>
enabled and
prefer-avx256
and
prefer-avx128
are not set.
Similarly
return<br>
>
>>>
256 if AVX is
enabled and
prefer-avx128
is not set.<br>
>
>>><br>
> >><br>
> >>
Instead of
multiple flags
that have
difficult to
understand
intersecting<br>
> >>
behavior, one
flag with a
value would be
better. E.g.,
what should<br>
> >>
"-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the<br>
> >>
answer, it's
confusing.
(Similarly
with other
such
combinations).
Just a<br>
> >>
single arg
"-mprefer-avx={128/256/512}"
(with no "no"
version) seems
easier<br>
> >>
to understand
to me (keeping
the same
behavior as
you mention:
asking to<br>
> >>
prefer a
larger width
than is
supported by
your
architecture
should be fine<br>
> >>
but ignored).<br>
> >><br>
> >><br>
> > I
agree with
this. It's a
little more
plumbing as
far as
subtarget<br>
> >
features etc
(represent via
an optional
value or just
various "set
the avx<br>
> >
width"
features - the
latter being
easier, but
uglier),
however, it's<br>
> >
probably the
right thing to
do.<br>
> ><br>
> > I
was looking at
this myself
just a couple
weeks ago and
think this is
the<br>
> >
right
direction
(when and how
to turn things
off) - and
probably makes<br>
> >
sense to be a
default for
these
architectures?
We might end
up needing to<br>
> >
check a couple
of additional
TTI places,
but it sounds
like you're on
top<br>
> > of
it. :)<br>
> ><br>
> >
Thanks very
much for doing
this work.<br>
> ><br>
> >
-eric<br>
> ><br>
> ><br>
> >><br>
> >><br>
> >>
There may be
some other
backend
changes
needed, but I
plan to
address<br>
>
>>>
those as we
find them.<br>
>
>>><br>
>
>>><br>
>
>>>
At a later
point,
consider
making
-mprefer-avx256
the default
for<br>
>
>>>
Skylake Server
due to the
above
mentioned
performance
considerations.<br>
>
>>><br>
> >><br>
> >><br>
> >><br>
> >><br>
> >><br>
>
>>><br>
> >>
Does this
sound
reasonable?<br>
>
>>><br>
>
>>><br>
>
>>><br>
>
>>>
*Latest Intel
Optimization
manual
available
here:<br>
>
>>> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#optimiza<wbr>tion</a><br>
>
>>><br>
>
>>><br>
>
>>>
-Craig Topper<br>
>
>>><br>
>
>>>
______________________________<wbr>_________________<br>
>
>>>
LLVM
Developers
mailing list<br>
>
>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>
>>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
>
>>><br>
> >><br>
> ><br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
<br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div></div><span class="gmail-"><pre class="gmail-m_264012946301939527moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</span></div>
<br></blockquote></div><br></div></div>
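<div><br></div>
<div>To make the ordering question concrete (a CPU-derived default that the user can still override), here is a minimal sketch in Python. It only models the logic and is not clang driver code: the table name, the function name, and the "last flag wins" rule are assumptions for illustration. The option spelling -mprefer-vector-width=128/256/512/none is the one discussed in this thread, and the 256 default for skylake-avx512 is only the proposal under discussion.</div>
<div><br></div>

```python
# Hypothetical model of the driver logic; none of these names are clang APIs.

# Assumed per-CPU defaults. The thread only proposes 256 for skylake-avx512.
CPU_DEFAULT_PREFER_WIDTH = {"skylake-avx512": 256}

# Option spelling taken from the thread.
PREFIX = "-mprefer-vector-width="


def resolve_prefer_vector_width(march, args):
    """Return the preferred vector width in bits, or None for no preference."""
    # Seed with the CPU-derived default so any explicit flag can override it.
    width = CPU_DEFAULT_PREFER_WIDTH.get(march)
    # Scan left to right; the last occurrence of the flag wins (an assumption).
    for arg in args:
        if arg.startswith(PREFIX):
            value = arg[len(PREFIX):]
            # "none" clears the preference; other values must be integers.
            width = None if value == "none" else int(value)
    return width
```

<div><br></div>
<div>Seeding the result with the CPU default before scanning the arguments is what lets an explicit -mprefer-vector-width=512 or =none on the command line take precedence over whatever -march implies.</div>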