<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 15, 2017, at 9:46 AM, Hal Finkel <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 03/14/2017 07:50 PM, Adam Nemet
wrote:<br class="">
</div>
<blockquote cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com" type="cite" class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;" class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 11:33 AM, Hal Finkel <<a moz-do-not-send="true" href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class=""><br class="">
</p>
<br class="">
<div class="moz-cite-prefix">On 03/14/2017 12:11 PM,
Adam Nemet wrote:<br class="">
</div>
<blockquote cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com" type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 9:49 AM, Hal
Finkel <<a moz-do-not-send="true" href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 03/14/2017
11:21 AM, Adam Nemet wrote:<br class="">
</div>
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 6:00
AM, Nema, Ashutosh <<a moz-do-not-send="true" href="mailto:Ashutosh.Nema@amd.com" class="">Ashutosh.Nema@amd.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family:
Helvetica; font-size: 10px;
font-style: normal;
font-variant-caps: normal;
font-weight: normal; letter-spacing:
normal; orphans: auto; text-align:
start; text-indent: 0px;
text-transform: none; white-space:
normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width:
0px;">
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Summarizing the
discussion on the implementation
approaches.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Discussed about
two approaches, first running
‘InnerLoopVectorizer’ again on the
epilog loop immediately after
vectorizing the original loop
within the same vectorization
pass, the second approach where
re-running vectorization pass and
limiting vectorization factor of
epilog loop by metadata.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><Approach-2><o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Challenges with
re-running the vectorizer pass:<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span class="">1)<span style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Reusing alias check
result:<span class="Apple-converted-space"> </span><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">When vectorizer pass runs
again it finds the epilog loop as
a new loop and it may generates
alias check, this new alias check
may overkill the gains of epilog
vectorization.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">We should use the already
computed alias check result
instead of re computing again.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span class="">2)<span style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Rerun the vectorizer
and hoist the new alias check:<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">It’s not possible to
hoist alias checks as its not
fully redundant (not dominated by
other checks), it’s not getting
execute in all paths.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><span id="cid:part2.8A601F10.7A15555C@anl.gov" class=""><Mail
Attachment.jpeg></span></div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">I don’t understand. Looks
like you have the same alias checks
for the epilog loop too. How is this
CFG different from the
re-vectorization of the scalar loop? </div>
</div>
</div>
</blockquote>
<br class="">
You're looking at the wrong thing. This *is*
the image from re-vectorization. The other
image (linked below in step (3)) shows the
other option.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Ah ok, the numbering confused me here.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class="">
<div class="">
<div class="">
<div class=""> Would be good to have
both CFGs here and highlighting the
difference.</div>
<div class=""><br class="">
</div>
<div class="">I thought that the whole
point was that *if* you reached the
epilog vector loop via the first
vector loop, you want to bypass the
alias checks before the epilog vector.</div>
</div>
</div>
</blockquote>
<br class="">
Yes, but, that's not quite true now. You can
also reach the epilogue loop if you fail the
min-trip-count check, and so you don't know
anything about the aliasing checks.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">OK, so we want this loops to be
handled specially. We effectively say that we
only vectorize this loop if it does not require
any alias checks or if the alias checks can be
predicate-forwarded to this loop from existing
checks.</div>
<div class=""><br class="">
</div>
<div class="">This still seems like an orthogonal
issue that may be interesting to solve
independently. In other words this could be a
nice feature in the vectorizer anyway: the loop is
estimated to be low-trip count so feel free to
predicate the new vector loop so that the alias
check result could be reused from some other
block. We obviously don’t have this capability
today but it’s something that could be nice aside
from the vectorizer.</div>
</div>
</blockquote>
<br class="">
That sounds great. I'm not sure, however, exactly what
this means. This "predicate forwarding" sounds like a
non-local restructuring of the surrounding code (because
the predicates aren't known to be true at that point, we
need to search different predecessors to find one in
which the conditions might be true and insert the vector
loop across that predecessor edge instead). Maybe we
could do this, for example, by calling
SE->isKnownPredicate enhanced with some additional
context sensitivity because we currently check
dominating conditional branches for things that are
AddRecs in some loop? Moreover, we then have the problem
of restructuring the order of the trip-count checks
(because we need to fast fail to the scalar loop for the
smallest trip counts). Maybe we can do this the same
way? This means finding a dominating check on the trip
count that implies the check we're about to insert,
change that check to the check we want, keeping the
original only on the true side of the trip-count check
(i.e. if the trip count is larger than the small
threshold, then check the large threshold).<br class="">
<br class="">
If we're going to do all of that, then I'd lean toward
saying that this does not belong in the vectorizer at
all. Rather, this seems like something we'd want in some
general transformation (this seems somewhat akin to what
JumpThreading does). The general transformation seems
something like this; the basic vectorized loop looks
like this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
and after we vectorize the epilogue loop we end up with
this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">if (n >= vf2) {<br class="">
</tt></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">This is (n - start >= vf2) which I think makes this a
bit more complicated since now “check” is not always
redundant if n >= vf2 so hoisting becomes a cost
decision.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt class=""> if (check) {<br class="">
</tt></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">But we can turn this into a predicate we already have: if
(n >= vf && check). BTW, this is I think how
Ashutosh’ approach 1 works too; if the first vector loop
does not iterate once the alias result will be false in the
epilog vector loop.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt class=""> for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br class="">
...<br class="">
}</tt><br class="">
<br class="">
and we need to end up with this:<br class="">
<br class="">
<tt class="">int start = 0;<br class="">
if (n >= vf2) {<br class="">
if (check) {<br class="">
if (n >= vf) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
<br class="">
for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
where we've recognized here that 'check' is the same in
both cases, and that because vf2 < vf, the one
trip-count check implies the other. This latter part
seems like the part that our existing passes might not
know what to do with currently. Thoughts?<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">And then we get:</div>
<div class=""><tt style="background-color: rgb(255, 255, 255);" class=""><br class="">
</tt></div>
<div class=""><tt style="background-color: rgb(255, 255, 255);" class="">int start = 0;</tt></div>
<div class=""><font class="" face="monospace"><b class="">bool
check_result = false;<br style="background-color:
rgb(255, 255, 255);" class="">
</b></font><tt style="background-color: rgb(255, 255,
255);" class="">if (n >= vf) {<br class="">
if (check) {</tt></div>
<div class=""><tt style="background-color: rgb(255, 255, 255);" class=""> <b class="">check_result = true;</b><br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);" class="">if
(n - start >= vf2) {<br class="">
if (<b class="">check_result</b>) {<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);" class="">for
(i = start; i < n; ++i) {<br class="">
…<br class="">
}</tt></div>
<div class=""><tt style="background-color: rgb(255, 255, 255);" class=""><br class="">
</tt></div>
<div class="">What do you think?</div>
</div>
</div>
</blockquote>
<br class="">
I think this seems about right. We shouldn't re-compare the trip
counts in the epilogue loop if the alias check is false anyway:<br class="">
<br class="">
<div class=""><tt style="background-color: rgb(255, 255, 255);" class="">int
start = 0;</tt><font class="" face="monospace"><b class=""><br style="background-color: rgb(255, 255, 255);" class="">
</b></font><tt style="background-color: rgb(255, 255, 255);" class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
} else { goto scalar; }<br class="">
} else { goto scalar; }<br class="">
</tt><br style="background-color: rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);" class="">if (n -
start >= vf2) {<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
</tt><br class="">
<tt class="">scalar:</tt><br style="background-color: rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);" class="">for (i
= start; i < n; ++i) {<br class="">
…<br class="">
}</tt></div></div></div></blockquote><div><br class=""></div><div>So to summarize, as long as we guard the epilogue loop with 'n >= vf && check’, we should be able to jump-thread/predicate-foward to the vectorized epilog loop and remove the checks.</div><div><br class=""></div><div>Is this the direction then (rerun vectorizer + enhance/add the above general-purpose optimizations) or we’re still circling back to handling this with a single run of the vectorizer?</div><div><br class=""></div><div>I think that the general difficulty of trying to get the optimal version of code that has been loop-versioned multiple times is that we are interested in transformation that don’t necessarily preserve semantics. I.e. it’s OK to for example to fall back to the original loop in more cases since we happen to know that the original loop is equivalent to the versioned loop.</div><div><br class=""></div><div>Adam </div><br class=""><blockquote type="cite" class=""><div class=""><div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
-Hal<br class="">
<br class="">
<blockquote cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com" type="cite" class="">
<div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">Adam</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""> <br class="">
-Hal<br class="">
<br class="">
<blockquote cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com" type="cite" class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">Adam</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
-Hal<br class="">
<br class="">
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class="">
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">I still don’t understand
why that’s not possible with some
sophisticated predicate propagation
independent from the vectorizer. I am
not saying it’s already possible but
it should be.</div>
<div class=""><br class="">
</div>
<div class="">Adam</div>
<div class=""><br class="">
</div>
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family:
Helvetica; font-size: 10px;
font-style: normal;
font-variant-caps: normal;
font-weight: normal; letter-spacing:
normal; orphans: auto; text-align:
start; text-indent: 0px;
text-transform: none; white-space:
normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width:
0px;">
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">NOTE: We cannot
prepone alias check as its
expensive compared to other
checks.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><Approach-1><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span class="">1)<span style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Current patch
depends on the existing
functionality of LoopVectorizer,
it uses ‘InnerLoopVectorizer’
again to vectorize the epilog
loop, as it happens in the same
vectorization pass we have
flexibility to reuse already
computed alias result check &
limit vectorization factor for the
epilog loop.<span class="Apple-converted-space"> </span><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span class="">2)<span style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>It does not generate
the blocks for new block layout
explicitly, rather it depends on
‘InnerLoopVectorizer::createEmptyLoop’
to generate new block layout. The
new block layout get automatically
generated by calling the
‘InnerLoopVectorizer:: vectorize’
again.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span class="">3)<span style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Block layout
description with epilog loop
vectorization is available at<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;" class=""><a moz-do-not-send="true" href="https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png" style="color: purple;
text-decoration: underline;" class="">https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png</a><o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Approach-1 looks
feasible, please comment if any
objections.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Regards,<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Ashutosh<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><span style="font-size: 11pt;
font-family: Calibri,
sans-serif; color: rgb(31, 73,
125);" class=""><o:p class=""> </o:p></span></div>
...<br class="">
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br class="">
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br class="">
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div></blockquote></div><br class=""></body></html>