<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 03/15/2017 02:03 PM, Adam Nemet
wrote:<br>
</div>
<blockquote
cite="mid:FF0C48A9-BAD5-4430-B6B5-E4DA14E6B731@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Mar 15, 2017, at 9:46 AM, Hal Finkel <<a
moz-do-not-send="true" href="mailto:hfinkel@anl.gov"
class="">hfinkel@anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 03/14/2017 07:50 PM, Adam
Nemet wrote:<br class="">
</div>
<blockquote
cite="mid:F261C572-ECFA-4EA4-9A46-241DD475BB06@apple.com"
type="cite" class="">
<div dir="auto" style="word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;" class=""><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 11:33 AM, Hal
Finkel <<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<p class=""><br class="">
</p>
<br class="">
<div class="moz-cite-prefix">On 03/14/2017
12:11 PM, Adam Nemet wrote:<br class="">
</div>
<blockquote
cite="mid:E9894D92-6F15-4C01-8085-D8F220B64CD4@apple.com"
type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017, at 9:49
AM, Hal Finkel <<a
moz-do-not-send="true"
href="mailto:hfinkel@anl.gov"
class="">hfinkel@anl.gov</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000"
class="">
<p class=""><br class="">
</p>
<div class="moz-cite-prefix">On
03/14/2017 11:21 AM, Adam Nemet
wrote:<br class="">
</div>
<blockquote
cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
type="cite" class=""> <br
class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 14, 2017,
at 6:00 AM, Nema, Ashutosh
<<a
moz-do-not-send="true"
href="mailto:Ashutosh.Nema@amd.com"
class="">Ashutosh.Nema@amd.com</a>>
wrote:</div>
<br
class="Apple-interchange-newline">
<div class="">
<div class="WordSection1"
style="page: WordSection1;
font-family: Helvetica;
font-size: 10px;
font-style: normal;
font-variant-caps: normal;
font-weight: normal;
letter-spacing: normal;
orphans: auto; text-align:
start; text-indent: 0px;
text-transform: none;
white-space: normal;
widows: auto;
word-spacing: 0px;
-webkit-text-stroke-width:
0px;">
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class="">Summarizing
the discussion on the
implementation
approaches.<o:p class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class=""><o:p
class=""> </o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class="">Discussed
about two approaches,
first running
‘InnerLoopVectorizer’
again on the epilog loop
immediately after
vectorizing the original
loop within the same
vectorization pass, the
second approach where
re-running vectorization
pass and limiting
vectorization factor of
epilog loop by metadata.<o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class=""><o:p
class=""> </o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class=""><Approach-2><o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class="">Challenges
with re-running the
vectorizer pass:<o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt 0.5in;
font-size: 11pt;
font-family: Calibri,
sans-serif; text-indent:
-0.25in;" class=""><span
class="">1)<span
style="font-style:
normal;
font-variant-caps:
normal; font-weight:
normal; font-size:
7pt; line-height:
normal; font-family:
'Times New Roman';"
class=""> <span
class="Apple-converted-space"> </span></span></span>Reusing alias check
result:<span
class="Apple-converted-space"> </span><o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt 0.5in;
font-size: 11pt;
font-family: Calibri,
sans-serif;" class="">When
vectorizer pass runs
again it finds the
epilog loop as a new
loop and it may
generates alias check,
this new alias check may
overkill the gains of
epilog vectorization.<o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt 0.5in;
font-size: 11pt;
font-family: Calibri,
sans-serif;" class="">We
should use the already
computed alias check
result instead of re
computing again.<o:p
class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt 0.5in;
font-size: 11pt;
font-family: Calibri,
sans-serif; text-indent:
-0.25in;" class=""><span
class="">2)<span
style="font-style:
normal;
font-variant-caps:
normal; font-weight:
normal; font-size:
7pt; line-height:
normal; font-family:
'Times New Roman';"
class=""> <span
class="Apple-converted-space"> </span></span></span>Rerun the vectorizer
and hoist the new alias
check:<o:p class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt 0.5in;
font-size: 11pt;
font-family: Calibri,
sans-serif;" class="">It’s
not possible to hoist
alias checks as its not
fully redundant (not
dominated by other
checks), it’s not
getting execute in all
paths.<o:p class=""></o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class=""><o:p
class=""> </o:p></div>
<div style="margin: 0in
0in 0.0001pt; font-size:
12pt; font-family:
'Times New Roman',
serif;" class=""><span
id="cid:part2.8A601F10.7A15555C@anl.gov"
class=""><Mail
Attachment.jpeg></span></div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">I don’t
understand. Looks like you
have the same alias checks
for the epilog loop too.
How is this CFG different
from the re-vectorization of
the scalar loop? </div>
</div>
</div>
</blockquote>
<br class="">
You're looking at the wrong thing.
This *is* the image from
re-vectorization. The other image
(linked below in step (3)) shows the
other option.<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Ah ok, the numbering confused me here.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000"
class=""> <br class="">
<blockquote
cite="mid:B6F61517-2F3B-46B6-9A10-6E1D534AFD83@apple.com"
type="cite" class="">
<div class="">
<div class="">
<div class=""> Would be good
to have both CFGs here and
highlighting the difference.</div>
<div class=""><br class="">
</div>
<div class="">I thought that
the whole point was that
*if* you reached the epilog
vector loop via the first
vector loop, you want to
bypass the alias checks
before the epilog vector.</div>
</div>
</div>
</blockquote>
<br class="">
Yes, but, that's not quite true now.
You can also reach the epilogue loop
if you fail the min-trip-count
check, and so you don't know
anything about the aliasing checks.<br
class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">OK, so we want this loops to
be handled specially. We effectively
say that we only vectorize this loop if
it does not require any alias checks or
if the alias checks can be
predicate-forwarded to this loop from
existing checks.</div>
<div class=""><br class="">
</div>
<div class="">This still seems like an
orthogonal issue that may be interesting
to solve independently. In other words
this could be a nice feature in the
vectorizer anyway: the loop is estimated
to be low-trip count so feel free to
predicate the new vector loop so that
the alias check result could be reused
from some other block. We obviously
don’t have this capability today but
it’s something that could be nice aside
from the vectorizer.</div>
</div>
</blockquote>
<br class="">
That sounds great. I'm not sure, however,
exactly what this means. This "predicate
forwarding" sounds like a non-local
restructuring of the surrounding code (because
the predicates aren't known to be true at that
point, we need to search different
predecessors to find one in which the
conditions might be true and insert the vector
loop across that predecessor edge instead).
Maybe we could do this, for example, by
calling SE->isKnownPredicate enhanced with
some additional context sensitivity because we
currently check dominating conditional
branches for things that are AddRecs in some
loop? Moreover, we then have the problem of
restructuring the order of the trip-count
checks (because we need to fast fail to the
scalar loop for the smallest trip counts).
Maybe we can do this the same way? This means
finding a dominating check on the trip count
that implies the check we're about to insert,
change that check to the check we want,
keeping the original only on the true side of
the trip-count check (i.e. if the trip count
is larger than the small threshold, then check
the large threshold).<br class="">
<br class="">
If we're going to do all of that, then I'd
lean toward saying that this does not belong
in the vectorizer at all. Rather, this seems
like something we'd want in some general
transformation (this seems somewhat akin to
what JumpThreading does). The general
transformation seems something like this; the
basic vectorized loop looks like this:<br
class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br
class="">
...<br class="">
}<br class="">
</tt><br class="">
and after we vectorize the epilogue loop we
end up with this:<br class="">
<br class="">
<tt class="">int start = 0;</tt><br class="">
<tt class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">if (n >= vf2) {<br class="">
</tt></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">This is (n - start >= vf2) which I
think makes this a bit more complicated since now
“check” is not always redundant if n >= vf2 so
hoisting becomes a cost decision.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt
class=""> if (check) {<br class="">
</tt></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">But we can turn this into a predicate
we already have: if (n >= vf && check).
BTW, this is I think how Ashutosh’ approach 1
works too; if the first vector loop does not
iterate once the alias result will be false in the
epilog vector loop.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><tt
class=""> for (...; start += vf2)<br
class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br class="">
<tt class="">for (i = start; i < n; ++i) {<br
class="">
...<br class="">
}</tt><br class="">
<br class="">
and we need to end up with this:<br class="">
<br class="">
<tt class="">int start = 0;<br class="">
if (n >= vf2) {<br class="">
if (check) {<br class="">
if (n >= vf) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
<br class="">
for (i = start; i < n; ++i) {<br class="">
...<br class="">
}<br class="">
</tt><br class="">
where we've recognized here that 'check' is
the same in both cases, and that because vf2
< vf, the one trip-count check implies the
other. This latter part seems like the part
that our existing passes might not know what
to do with currently. Thoughts?<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">And then we get:</div>
<div class=""><tt style="background-color: rgb(255,
255, 255);" class=""><br class="">
</tt></div>
<div class=""><tt style="background-color: rgb(255,
255, 255);" class="">int start = 0;</tt></div>
<div class=""><font class="" face="monospace"><b
class="">bool check_result = false;<br
style="background-color: rgb(255, 255,
255);" class="">
</b></font><tt style="background-color: rgb(255,
255, 255);" class="">if (n >= vf) {<br
class="">
if (check) {</tt></div>
<div class=""><tt style="background-color: rgb(255,
255, 255);" class=""> <b class="">check_result
= true;</b><br class="">
for (...; start += vf)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255,
255);" class="">
<tt style="background-color: rgb(255, 255, 255);"
class="">if (n - start >= vf2) {<br class="">
if (<b class="">check_result</b>) {<br
class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
}<br class="">
</tt><br style="background-color: rgb(255, 255,
255);" class="">
<tt style="background-color: rgb(255, 255, 255);"
class="">for (i = start; i < n; ++i) {<br
class="">
…<br class="">
}</tt></div>
<div class=""><tt style="background-color: rgb(255,
255, 255);" class=""><br class="">
</tt></div>
<div class="">What do you think?</div>
</div>
</div>
</blockquote>
<br class="">
I think this seems about right. We shouldn't re-compare
the trip counts in the epilogue loop if the alias check is
false anyway:<br class="">
<br class="">
<div class=""><tt style="background-color: rgb(255, 255,
255);" class="">int start = 0;</tt><font class=""
face="monospace"><b class=""><br
style="background-color: rgb(255, 255, 255);"
class="">
</b></font><tt style="background-color: rgb(255, 255,
255);" class="">if (n >= vf) {<br class="">
if (check) {<br class="">
for (...; start += vf)<br class="">
...<br class="">
} else { goto scalar; }<br class="">
} else { goto scalar; }<br class="">
</tt><br style="background-color: rgb(255, 255, 255);"
class="">
<tt style="background-color: rgb(255, 255, 255);"
class="">if (n - start >= vf2) {<br class="">
for (...; start += vf2)<br class="">
...<br class="">
}<br class="">
</tt><br class="">
<tt class="">scalar:</tt><br style="background-color:
rgb(255, 255, 255);" class="">
<tt style="background-color: rgb(255, 255, 255);"
class="">for (i = start; i < n; ++i) {<br class="">
…<br class="">
}</tt></div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
<div>So to summarize, as long as we guard the epilogue loop with
'n >= vf && check’, we should be able to
jump-thread/predicate-foward to the vectorized epilog loop and
remove the checks.</div>
<div><br class="">
</div>
<div>Is this the direction then (rerun vectorizer + enhance/add
the above general-purpose optimizations) or we’re still
circling back to handling this with a single run of the
vectorizer?</div>
<div><br class="">
</div>
<div>I think that the general difficulty of trying to get the
optimal version of code that has been loop-versioned multiple
times is that we are interested in transformation that don’t
necessarily preserve semantics. I.e. it’s OK to for example
to fall back to the original loop in more cases since we
happen to know that the original loop is equivalent to the
versioned loop.</div>
</div>
</blockquote>
<br>
I think we've circled back to handling this in a single run of the
vectorizer for exactly this reason. Doing this after the fact would
not be semantics preserving.<br>
<br>
-Hal<br>
<br>
<blockquote
cite="mid:FF0C48A9-BAD5-4430-B6B5-E4DA14E6B731@apple.com"
type="cite">
<div>
<div><br class="">
</div>
<div>Adam </div>
<br class="">
</div>
</blockquote>
...<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>