<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 14, 2017, at 11:33 AM, Hal Finkel <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class=""><br class="">
</p>
<br class="">
<div class="moz-cite-prefix">On 03/14/2017 12:11 PM, Adam Nemet
wrote:<br class="">
</div>
<blockquote cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com" type="cite" class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 9:49 AM, Hal Finkel <<a moz-do-not-send="true" href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 03/14/2017 11:21 AM, Adam
Nemet wrote:<br class="">
</div>
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 6:00 AM, Nema,
Ashutosh <<a moz-do-not-send="true" href="mailto:Ashutosh.Nema@amd.com" class="">Ashutosh.Nema@amd.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family: Helvetica; font-size:
10px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing:
normal; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;">
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Summarizing the discussion on the
implementation approaches.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Discussed about two approaches, first
running ‘InnerLoopVectorizer’ again on the
epilog loop immediately after vectorizing the
original loop within the same vectorization
pass, the second approach where re-running
vectorization pass and limiting vectorization
factor of epilog loop by metadata.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><Approach-2><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Challenges with re-running the
vectorizer pass:<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif; text-indent: -0.25in;" class=""><span class="">1)<span style="font-style: normal;
font-variant-caps: normal; font-weight:
normal; font-size: 7pt; line-height:
normal; font-family: 'Times New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Reusing
alias check result:<span class="Apple-converted-space"> </span><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif;" class="">When vectorizer pass
runs again it finds the epilog loop as a new
loop and it may generates alias check, this
new alias check may overkill the gains of
epilog vectorization.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif;" class="">We should use the
already computed alias check result instead of
re computing again.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif; text-indent: -0.25in;" class=""><span class="">2)<span style="font-style: normal;
font-variant-caps: normal; font-weight:
normal; font-size: 7pt; line-height:
normal; font-family: 'Times New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Rerun
the vectorizer and hoist the new alias check:<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif;" class="">It’s not possible to
hoist alias checks as its not fully redundant
(not dominated by other checks), it’s not
getting execute in all paths.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><span id="cid:part2.8A601F10.7A15555C@anl.gov" class=""><Mail
Attachment.jpeg></span></div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">I don’t understand. Looks like you
have the same alias checks for the epilog loop
too. How is this CFG different from the
re-vectorization of the scalar loop? </div>
</div>
</div>
</blockquote>
<br class="">
You're looking at the wrong thing. This *is* the image
from re-vectorization. The other image (linked below in
step (3)) shows the other option.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Ah ok, the numbering confused me here.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""> <br class="">
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class="">
<div class="">
<div class="">
<div class=""> Would be good to have both CFGs here
and highlighting the difference.</div>
<div class=""><br class="">
</div>
<div class="">I thought that the whole point was
that *if* you reached the epilog vector loop via
the first vector loop, you want to bypass the
alias checks before the epilog vector.</div>
</div>
</div>
</blockquote>
<br class="">
Yes, but, that's not quite true now. You can also reach
the epilogue loop if you fail the min-trip-count check,
and so you don't know anything about the aliasing checks.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">OK, so we want this loops to be handled specially. We
effectively say that we only vectorize this loop if it does
not require any alias checks or if the alias checks can be
predicate-forwarded to this loop from existing checks.</div>
<div class=""><br class="">
</div>
<div class="">This still seems like an orthogonal issue that may be
interesting to solve independently. In other words this could
be a nice feature in the vectorizer anyway: the loop is
estimated to be low-trip count so feel free to predicate the
new vector loop so that the alias check result could be reused
from some other block. We obviously don’t have this
capability today but it’s something that could be nice aside
from the vectorizer.</div>
</div>
</blockquote>
<br class="">
That sounds great. I'm not sure, however, exactly what this means.
This "predicate forwarding" sounds like a non-local restructuring of
the surrounding code (because the predicates aren't known to be true
at that point, we need to search different predecessors to find one
in which the conditions might be true and insert the vector loop
across that predecessor edge instead). Maybe we could do this, for
example, by calling SE->isKnownPredicate enhanced with some
additional context sensitivity because we currently check dominating
conditional branches for things that are AddRecs in some loop?
Moreover, we then have the problem of restructuring the order of the
trip-count checks (because we need to fast fail to the scalar loop
for the smallest trip counts). Maybe we can do this the same way?
This means finding a dominating check on the trip count that implies
the check we're about to insert, change that check to the check we
want, keeping the original only on the true side of the trip-count
check (i.e. if the trip count is larger than the small threshold,
then check the large threshold).<br class="">
<br class="">
If we're going to do all of that, then I'd lean toward saying that
this does not belong in the vectorizer at all. Rather, this seems
like something we'd want in some general transformation (this seems
somewhat akin to what JumpThreading does). The general
transformation seems something like this; the basic vectorized loop
looks like this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
and after we vectorize the epilogue loop we end up with this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">if (n >= vf2) {<br class=""></tt></div></div></blockquote><div><br class=""></div><div>This is (n - start >= vf2) which I think makes this a bit more complicated since now “check” is not always redundant if n >= vf2 so hoisting becomes a cost decision.</div><br class=""><blockquote type="cite" class=""><div class=""><div bgcolor="#FFFFFF" text="#000000" class=""><tt class="">
if (check) {<br class=""></tt></div></div></blockquote><div><br class=""></div><div>But we can turn this into a predicate we already have: if (n >= vf && check). BTW, this is I think how Ashutosh’ approach 1 works too; if the first vector loop does not iterate once the alias result will be false in the epilog vector loop.</div><br class=""><blockquote type="cite" class=""><div class=""><div bgcolor="#FFFFFF" text="#000000" class=""><tt class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br class="">
...<br class="">
}</tt><br class="">
<br class="">
and we need to end up with this:<br class="">
<br class="">
<tt class="">int start = 0;<br class="">
if (n >= vf2) {<br class="">
if (check) {<br class="">
if (n >= vf) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
<br class="">
for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
where we've recognized here that 'check' is the same in both cases,
and that because vf2 < vf, the one trip-count check implies the
other. This latter part seems like the part that our existing passes
might not know what to do with currently. Thoughts?<br class=""></div></div></blockquote><div><br class=""></div><div><br class=""></div><div>And then we get:</div><div><tt style="background-color: rgb(255, 255, 255);" class=""><br class=""></tt></div><div><tt style="background-color: rgb(255, 255, 255);" class="">int start = 0;</tt></div><div><font face="monospace" class=""><b class="">bool check_result = false;<br style="background-color: rgb(255, 255, 255);" class=""></b></font><tt style="background-color: rgb(255, 255, 255);" class="">if (n >= vf) {<br class=""> if (check) {</tt></div><div><tt style="background-color: rgb(255, 255, 255);" class=""> <b class="">check_result = true;</b><br class=""> for (...; start += vf)<br class=""> ...<br class=""> }<br class="">}<br class=""></tt><br style="background-color: rgb(255, 255, 255);" class=""><tt style="background-color: rgb(255, 255, 255);" class="">if (n - start >= vf2) {<br class=""> if (<b class="">check_result</b>) {<br class=""> for (...; start += vf2)<br class=""> ...<br class=""> }<br class="">}<br class=""></tt><br style="background-color: rgb(255, 255, 255);" class=""><tt style="background-color: rgb(255, 255, 255);" class="">for (i = start; i < n; ++i) {<br class=""> …<br class="">}</tt></div><div><tt style="background-color: rgb(255, 255, 255);" class=""><br class=""></tt></div><div>What do you think?</div><div><br class=""></div><div>Adam</div><br class=""><blockquote type="cite" class=""><div class=""><div bgcolor="#FFFFFF" text="#000000" class="">
<br class="">
-Hal<br class="">
<br class="">
<blockquote cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com" type="cite" class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">Adam</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""> <br class="">
-Hal<br class="">
<br class="">
<blockquote cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com" type="cite" class="">
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">I still don’t understand why that’s
not possible with some sophisticated predicate
propagation independent from the vectorizer. I am
not saying it’s already possible but it should be.</div>
<div class=""><br class="">
</div>
<div class="">Adam</div>
<div class=""><br class="">
</div>
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="WordSection1" style="page:
WordSection1; font-family: Helvetica; font-size:
10px; font-style: normal; font-variant-caps:
normal; font-weight: normal; letter-spacing:
normal; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;">
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">NOTE: We cannot prepone alias check
as its expensive compared to other checks.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><Approach-1><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif; text-indent: -0.25in;" class=""><span class="">1)<span style="font-style: normal;
font-variant-caps: normal; font-weight:
normal; font-size: 7pt; line-height:
normal; font-family: 'Times New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Current
patch depends on the existing functionality of
LoopVectorizer, it uses ‘InnerLoopVectorizer’
again to vectorize the epilog loop, as it
happens in the same vectorization pass we have
flexibility to reuse already computed alias
result check & limit vectorization factor
for the epilog loop.<span class="Apple-converted-space"> </span><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif; text-indent: -0.25in;" class=""><span class="">2)<span style="font-style: normal;
font-variant-caps: normal; font-weight:
normal; font-size: 7pt; line-height:
normal; font-family: 'Times New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>It
does not generate the blocks for new block
layout explicitly, rather it depends on
‘InnerLoopVectorizer::createEmptyLoop’ to
generate new block layout. The new block
layout get automatically generated by calling
the ‘InnerLoopVectorizer:: vectorize’ again.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif; text-indent: -0.25in;" class=""><span class="">3)<span style="font-style: normal;
font-variant-caps: normal; font-weight:
normal; font-size: 7pt; line-height:
normal; font-family: 'Times New Roman';" class=""> <span class="Apple-converted-space"> </span></span></span>Block
layout description with epilog loop
vectorization is available at<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt 0.5in;
font-size: 11pt; font-family: Calibri,
sans-serif;" class=""><a moz-do-not-send="true" href="https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png" style="color: purple; text-decoration:
underline;" class="">https://reviews.llvm.org/file/data/fxg5vx3capyj257rrn5j/PHID-FILE-x6thnbf6ub55ep5yhalu/LayoutDescription.png</a><o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Approach-1 looks feasible, please
comment if any objections.<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Regards,<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class="">Ashutosh<o:p class=""></o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><o:p class=""> </o:p></div>
<div style="margin: 0in 0in 0.0001pt; font-size:
12pt; font-family: 'Times New Roman', serif;" class=""><span style="font-size: 11pt;
font-family: Calibri, sans-serif; color:
rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div>
...<br class="">
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br class="">
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div></blockquote></div><br class=""></div></body></html>