<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 03/14/2017 07:50 PM, Adam Nemet
wrote:<br>
</div>
<blockquote
cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;" class=""><br
class="">
<div>
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 11:33 AM, Hal Finkel <<a
moz-do-not-send="true" href="mailto:hfinkel@anl.gov"
class="">hfinkel@anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<p class=""><br class="">
</p>
<br class="">
<div class="moz-cite-prefix">On 03/14/2017 12:11 PM,
Adam Nemet wrote:<br class="">
</div>
<blockquote
cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 9:49 AM, Hal
Finkel <<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 03/14/2017
11:21 AM, Adam Nemet wrote:<br class="">
</div>
<blockquote
cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 6:00
AM, Nema, Ashutosh <<a
moz-do-not-send="true"
href="mailto:Ashutosh.Nema@amd.com"
class="">Ashutosh.Nema@amd.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family:
Helvetica; font-size: 10px;
font-style: normal;
font-variant-caps: normal;
font-weight: normal; letter-spacing:
normal; orphans: auto; text-align:
start; text-indent: 0px;
text-transform: none; white-space:
normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width:
0px;">
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Summarizing the
discussion on the implementation
approaches.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Discussed about
two approaches, first running
‘InnerLoopVectorizer’ again on the
epilog loop immediately after
vectorizing the original loop
within the same vectorization
pass, the second approach where
re-running vectorization pass and
limiting vectorization factor of
epilog loop by metadata.<o:p
class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><Approach-2><o:p
class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Challenges with
re-running the vectorizer pass:<o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span
class="">1)<span
style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span
class="Apple-converted-space"> </span></span></span>Reusing alias check
result:<span
class="Apple-converted-space"> </span><o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;"
class="">When vectorizer pass runs
again it finds the epilog loop as
a new loop and it may generates
alias check, this new alias check
may overkill the gains of epilog
vectorization.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;"
class="">We should use the already
computed alias check result
instead of re computing again.<o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span
class="">2)<span
style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span
class="Apple-converted-space"> </span></span></span>Rerun the vectorizer
and hoist the new alias check:<o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;"
class="">It’s not possible to
hoist alias checks as its not
fully redundant (not dominated by
other checks), it’s not getting
execute in all paths.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><span
id="cid:part2.8A601F10.7A15555C@anl.gov"
class=""><Mail
Attachment.jpeg></span></div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">I don’t understand. Looks
like you have the same alias checks
for the epilog loop too. How is this
CFG different from the
re-vectorization of the scalar loop? </div>
</div>
</div>
</blockquote>
<br class="">
You're looking at the wrong thing. This *is*
the image from re-vectorization. The other
image (linked below in step (3)) shows the
other option.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Ah ok, the numbering confused me here.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
<blockquote
cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
type="cite" class="">
<div class="">
<div class="">
<div class=""> Would be good to have
both CFGs here and highlighting the
difference.</div>
<div class=""><br class="">
</div>
<div class="">I thought that the whole
point was that *if* you reached the
epilog vector loop via the first
vector loop, you want to bypass the
alias checks before the epilog vector.</div>
</div>
</div>
</blockquote>
<br class="">
Yes, but, that's not quite true now. You can
also reach the epilogue loop if you fail the
min-trip-count check, and so you don't know
anything about the aliasing checks.<br
class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">OK, so we want this loops to be
handled specially. We effectively say that we
only vectorize this loop if it does not require
any alias checks or if the alias checks can be
predicate-forwarded to this loop from existing
checks.</div>
<div class=""><br class="">
</div>
<div class="">This still seems like an orthogonal
issue that may be interesting to solve
independently. In other words this could be a
nice feature in the vectorizer anyway: the loop is
estimated to be low-trip count so feel free to
predicate the new vector loop so that the alias
check result could be reused from some other
block. We obviously don’t have this capability
today but it’s something that could be nice aside
from the vectorizer.</div>
</div>
</blockquote>
<br class="">
That sounds great. I'm not sure, however, exactly what
this means. This "predicate forwarding" sounds like a
non-local restructuring of the surrounding code (because
the predicates aren't known to be true at that point, we
need to search different predecessors to find one in
which the conditions might be true and insert the vector
loop across that predecessor edge instead). Maybe we
could do this, for example, by calling
SE->isKnownPredicate enhanced with some additional
context sensitivity because we currently check
dominating conditional branches for things that are
AddRecs in some loop? Moreover, we then have the problem
of restructuring the order of the trip-count checks
(because we need to fast fail to the scalar loop for the
smallest trip counts). Maybe we can do this the same
way? This means finding a dominating check on the trip
count that implies the check we're about to insert,
change that check to the check we want, keeping the
original only on the true side of the trip-count check
(i.e. if the trip count is larger than the small
threshold, then check the large threshold).<br class="">
<br class="">
If we're going to do all of that, then I'd lean toward
saying that this does not belong in the vectorizer at
all. Rather, this seems like something we'd want in some
general transformation (this seems somewhat akin to what
JumpThreading does). The general transformation seems
something like this; the basic vectorized loop looks
like this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br
class="">
...<br class="">
}<br class="">
</tt><br class="">
and after we vectorize the epilogue loop we end up with
this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">if (n >= vf2) {<br class="">
</tt></div>
</div>
</blockquote>
<div><br class="">
</div>
<div>This is (n - start >= vf2) which I think makes this a
bit more complicated since now “check” is not always
redundant if n >= vf2 so hoisting becomes a cost
decision.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt
class=""> if (check) {<br class="">
</tt></div>
</div>
</blockquote>
<div><br class="">
</div>
<div>But we can turn this into a predicate we already have: if
(n >= vf && check). BTW, this is I think how
Ashutosh’ approach 1 works too; if the first vector loop
does not iterate once the alias result will be false in the
epilog vector loop.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt
class=""> for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br
class="">
...<br class="">
}</tt><br class="">
<br class="">
and we need to end up with this:<br class="">
<br class="">
<tt class="">int start = 0;<br class="">
if (n >= vf2) {<br class="">
if (check) {<br class="">
if (n >= vf) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
<br class="">
for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
where we've recognized here that 'check' is the same in
both cases, and that because vf2 < vf, the one
trip-count check implies the other. This latter part
seems like the part that our existing passes might not
know what to do with currently. Thoughts?<br class="">
</div>
</div>
</blockquote>
<div><br class="">
</div>
<div><br class="">
</div>
<div>And then we get:</div>
<div><tt style="background-color: rgb(255, 255, 255);"
class=""><br class="">
</tt></div>
<div><tt style="background-color: rgb(255, 255, 255);"
class="">int start = 0;</tt></div>
<div><font class="" face="monospace"><b class="">bool
check_result = false;<br style="background-color:
rgb(255, 255, 255);" class="">
</b></font><tt style="background-color: rgb(255, 255,
255);" class="">if (n >= vf) {<br class="">
if (check) {</tt></div>
<div><tt style="background-color: rgb(255, 255, 255);"
class=""> <b class="">check_result = true;</b><br
class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255, 255);"
class="">
<tt style="background-color: rgb(255, 255, 255);" class="">if
(n - start >= vf2) {<br class="">
if (<b class="">check_result</b>) {<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255, 255);"
class="">
<tt style="background-color: rgb(255, 255, 255);" class="">for
(i = start; i < n; ++i) {<br class="">
…<br class="">
}</tt></div>
<div><tt style="background-color: rgb(255, 255, 255);"
class=""><br class="">
</tt></div>
<div>What do you think?</div>
</div>
</div>
</blockquote>
<br>
I think this seems about right. We shouldn't re-compare the trip
counts in the epilogue loop if the alias check is false anyway:<br>
<br>
<div><tt style="background-color: rgb(255, 255, 255);" class="">int
start = 0;</tt><font class="" face="monospace"><b class=""><br
style="background-color: rgb(255, 255, 255);" class="">
</b></font><tt style="background-color: rgb(255, 255, 255);"
class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
} else { goto scalar; }<br class="">
} else { goto scalar; }<br class="">
</tt><br style="background-color: rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);" class="">if (n -
start >= vf2) {<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
</tt><br>
<tt>scalar:</tt><br style="background-color: rgb(255, 255, 255);"
class="">
<tt style="background-color: rgb(255, 255, 255);" class="">for (i
= start; i < n; ++i) {<br class="">
…<br class="">
}</tt></div>
<br>
-Hal<br>
<br>
<blockquote
cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com"
type="cite">
<div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;" class="">
<div>
<div><br class="">
</div>
<div>Adam</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""> <br
class="">
-Hal<br class="">
<br class="">
<blockquote
cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
type="cite" class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">Adam</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
-Hal<br class="">
<br class="">
<blockquote
cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
type="cite" class="">
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">I still don’t understand
why that’s not possible with some
sophisticated predicate propagation
independent from the vectorizer. I am
not saying it’s already possible but
it should be.</div>
<div class=""><br class="">
</div>
<div class="">Adam</div>
<div class=""><br class="">
</div>
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family:
Helvetica; font-size: 10px;
font-style: normal;
font-variant-caps: normal;
font-weight: normal; letter-spacing:
normal; orphans: auto; text-align:
start; text-indent: 0px;
text-transform: none; white-space:
normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width:
0px;">
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">NOTE: We cannot
prepone alias check as its
expensive compared to other
checks.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><Approach-1><o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span
class="">1)<span
style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span
class="Apple-converted-space"> </span></span></span>Current patch
depends on the existing
functionality of LoopVectorizer,
it uses ‘InnerLoopVectorizer’
again to vectorize the epilog
loop, as it happens in the same
vectorization pass we have
flexibility to reuse already
computed alias result check &
limit vectorization factor for the
epilog loop.<span
class="Apple-converted-space"> </span><o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span
class="">2)<span
style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span
class="Apple-converted-space"> </span></span></span>It does not generate
the blocks for new block layout
explicitly, rather it depends on
‘InnerLoopVectorizer::createEmptyLoop’
to generate new block layout. The
new block layout get automatically
generated by calling the
‘InnerLoopVectorizer:: vectorize’
again.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;
text-indent: -0.25in;" class=""><span
class="">3)<span
style="font-style: normal;
font-variant-caps: normal;
font-weight: normal;
font-size: 7pt; line-height:
normal; font-family: 'Times
New Roman';" class=""> <span
class="Apple-converted-space"> </span></span></span>Block layout
description with epilog loop
vectorization is available at<o:p
class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt
0.5in; font-size: 11pt;
font-family: Calibri, sans-serif;"
class=""><a moz-do-not-send="true"
href="https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png"
style="color: purple;
text-decoration: underline;"
class="">https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png</a><o:p
class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Approach-1 looks
feasible, please comment if any
objections.<o:p class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Regards,<o:p
class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class="">Ashutosh<o:p
class=""></o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in
0.0001pt; font-size: 12pt;
font-family: 'Times New Roman',
serif;" class=""><span
style="font-size: 11pt;
font-family: Calibri,
sans-serif; color: rgb(31, 73,
125);" class=""><o:p class=""> </o:p></span></div>
...<br class="">
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br class="">
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>