<div class="moz-cite-prefix">On 03/25/2017 10:57 AM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTXNtWd+UY92GaF-4W08paAmLQ_YZSvHu8FoCxC95e3SuA@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sat, Mar 25, 2017 at 5:28 AM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<p><br>
</p>
<div
class="gmail-m_1179969366190122482moz-cite-prefix">On
03/24/2017 06:00 PM, Daniel Berlin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Mar 24, 2017 at
3:44 PM, Daniel Berlin <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:dberlin@dberlin.org"
target="_blank">dberlin@dberlin.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">In which i repeat my claim
that trying to keep caches up to date is
significantly harder to get right and
verify than incremental recomputation :)</div>
</blockquote>
>>>
>>> And to be super clear here, the practical difference between the two
>>> is that incremental recomputation is really "build an analysis that
>>> can be run at any point, but that is a full analysis computing all
>>> answers, not answers on demand. The speed at which the analysis runs
>>> depends on how much data has changed since last time."
>>
>> How exactly do you propose this works in this case (given that the
>> recomputation cost is proportional to the amount the IR has changed)?
>
> As I mentioned, first I would actually measure whether we really need
> to be recomputing that often *at all*.
> There is no point in either incrementally recomputing *or* computing on
> demand if it is not actually buying us significant generated-code
> performance.
> The simplest thing to verify correct is the one that does not need to
> be updated at all :)
>
> Second, yes, the recomputation cost is proportional in either case.
>
>> It sounds like caching with the default behavior on RAUW flipped.
>
> In this case, it would be similar.
> Your caching, right now, has the following properties that are maybe
> easily changed:
> - It is not a full analysis; it must be handled explicitly.
> - It has no verifier (you would think you could write one by comparing
>   every value against a freshly computed, non-cached result, but I
>   think you will discover that does not work either; see below).
> - It has no printer.
> The above are all things you suggested might be fixed anyway, so I'm
> going to leave them out, except verification.
>
> Past that, the other explicit difference here is the approach to
> computation.
>
> On-demand cached computation here starts at "thing that I am missing
> data for" and goes walking backwards through the IR to try to find the
> answer.
> It does not pay attention to computation order, etc.
> You start at the leaves, you end at the root.
> That means it often does significant duplicated work (even if that work
> is wasted cache lookups and visits), or provably does not reach a
> maximal fixpoint when making use of the cache.
>
> In this case, it is both ;)
>
> For example, for an operator, it computes the known bits for the LHS
> and RHS recursively.
> Without caching, it computes them repeatedly, visiting from use->def.
>
> Incremental recomputation would compute known bits as a forwards
> propagation problem over the function.
> You start at the root, you end at the leaves.
> When you visit an operator, the LHS and RHS will already have been
> visited (if they are not part of a cycle), and thus you compute its
> answer *once*.
> This requires ordering the computation and propagation properly, mind
> you.
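>
> To make the direction concrete, here is a minimal sketch of that
> forwards style (hypothetical types, not LLVM's actual API): visit defs
> before uses, so every operator's inputs are ready and each node is
> computed exactly once per pass.
>
>     #include <cstdint>
>     #include <unordered_map>
>     #include <vector>
>
>     struct Node {                  // hypothetical instruction node
>       enum Kind { Leaf, And } K;
>       uint32_t LeafZeroMask = 0;   // bits known zero, for leaves
>       std::vector<Node *> Ops;     // operands (defs)
>     };
>
>     // One forward sweep over a def-before-use (e.g. reverse post-order)
>     // list: every operand is finished before its users, so each node
>     // is computed exactly once.
>     std::unordered_map<Node *, uint32_t>
>     computeKnownZero(const std::vector<Node *> &DefsBeforeUses) {
>       std::unordered_map<Node *, uint32_t> KnownZero;
>       for (Node *N : DefsBeforeUses) {
>         if (N->K == Node::Leaf)
>           KnownZero[N] = N->LeafZeroMask;
>         else // and: a bit is known zero if known zero in either operand
>           KnownZero[N] = KnownZero[N->Ops[0]] | KnownZero[N->Ops[1]];
>       }
>       return KnownZero;
>     }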
> Second, the caching version does not reach a maximal fixpoint:
>
>     // Recurse, but cap the recursion to one level, because we don't
>     // want to waste time spinning around in loops.
>
> This is because it is working backwards, on demand, and so the phi
> computation sometimes cannot reach a fixpoint in a sane amount of time,
> so it gets cut off.
> Solving the forwards problem properly, in the correct order, will
> *always* give the same answer from the same starting point, given the
> same set of invalidations.
>
> Note that this in turn affects verification. Because the on-demand
> version has cutoffs that effectively depend on visitation depth, the
> cached version will get different answers depending on how much is
> cached, because it needs to go less deep to get a specific answer.
> This depends on visitation order.
> This, in turn, makes it really hard to write a verifier :) You can't
> just compare cached vs. non-cached results. The cached version may get
> better or worse answers depending on visitation order (incremental
> recomputation never does in this case, assuming you invalidate properly
> in both cases).
>
> I'm pretty sure *you* get this, but for those following along:
> Imagine you have a phi:
>
>     phi(a, b, c, d)
>
> Currently, while walking this phi, it will only recurse to a certain
> depth to find an answer. Let's pretend it's depth 1.
> Let's say that, given the initial order in which you ask for known bits
> (you ask for the phi before a, b, c, and d), they are all too deep for
> it to find an answer; it can't visit deep enough to compute a, b, c,
> and d.
>
> Thus, if you start from *nothing*, the answer you get for the phi is
> "unknown".
> Now, computing the phi is separate from computing the answers for a, b,
> c, and d.
> So next, you visit a, b, c, and d, and compute their answers.
> It can successfully do so (because the depth limit applies to the phi,
> and even if it didn't, the depth limit for these is +1 what it was from
> the phi).
>
> Now, if you invalidate the phi in the cache (assume it's just an
> innocent victim and the value did not really change, it just "may
> have"), and ask it to compute the answer:
>
> lo and behold, now it has the answers for a, b, c, and d, and can give
> you an answer for the phi.
>
> So the cached version gave us a better answer.
>
> The real reason is that it got closer to the maximal fixpoint than the
> original version.
>
> But still, it is much harder to verify, because the answer really is
> allowed to vary depending on the order in which you ask it for answers.
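>
> A tiny sketch of why the order matters (hypothetical names, not the
> real computeKnownBits): a cache hit costs no recursion depth, so
> whether a query succeeds depends on what was asked, and therefore
> cached, earlier.
>
>     #include <cstdint>
>     #include <map>
>     #include <optional>
>     #include <vector>
>
>     struct Val { std::vector<Val *> Ops; uint32_t LeafZeroMask = 0; };
>     constexpr unsigned MaxDepth = 1; // mimic the recursion cap above
>
>     std::optional<uint32_t> knownZero(Val *V, unsigned Depth,
>                                       std::map<Val *, uint32_t> &Cache) {
>       if (auto It = Cache.find(V); It != Cache.end())
>         return It->second;           // cached: answer without recursing
>       if (Depth > MaxDepth)
>         return std::nullopt;         // cutoff: report "unknown"
>       uint32_t Z = ~0u;              // phi: intersect operand answers
>       for (Val *Op : V->Ops) {
>         auto K = knownZero(Op, Depth + 1, Cache);
>         if (!K)
>           return std::nullopt;       // any unknown operand -> unknown
>         Z &= *K;
>       }
>       if (V->Ops.empty())
>         Z = V->LeafZeroMask;         // leaf: known a priori
>       Cache[V] = Z;
>       return Z;
>     }
>
> Querying the phi first hits the cutoff and caches nothing; querying a,
> b, c, and d first seeds the cache, after which the identical phi query
> succeeds through cache hits.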
>
> The forwards incremental computation has exactly zero of these
> problems.
> It always computes the same answer.
> It only varies in the time taken to do so.
>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">Instead of assuming by default that
the replacement instructions has the same known bits as
the original, we assume that it does not (and then
recursively clear out all users).<br>
</div>
</blockquote>
>
> If they were computed in the same direction, with the same propagation
> strategy, yes, I would agree.
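>
> A minimal sketch of that flipped default (hypothetical helper, not an
> existing LLVM API): on RAUW, assume the replacement's known bits
> differ, and drop the cached answer for every transitive user.
>
>     #include <cstdint>
>     #include <map>
>     #include <set>
>     #include <vector>
>
>     struct Def { std::vector<Def *> Users; };
>
>     // Erase the cached answer for V and, transitively, for everything
>     // that used it; the Visited set keeps cycles through phis from
>     // recursing forever.
>     void invalidate(Def *V, std::map<Def *, uint32_t> &Cache,
>                     std::set<Def *> &Visited) {
>       if (!Visited.insert(V).second)
>         return;                  // already handled on this walk
>       Cache.erase(V);            // its answer "may have" changed
>       for (Def *U : V->Users)
>         invalidate(U, Cache, Visited);
>     }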

In general, I agree with all of this. To be clear, I posted this patch so
that we could make some measurements of InstCombine with the known-bits
computation cost stabilized. Given that we compute known bits on
essentially every relevant instruction, doing this ahead of time makes
perfect sense (i.e., aside from all of the disadvantages, we likely gain
very little from the laziness).

I do want to explicitly clarify the following point, however. In the
analysis as you imagine it, what exactly do we do when we replace an IR
value (or update an instruction's flags)?

Thanks again,
Hal

>
>> -Hal
>>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>The compile time hit depends on how often
you want answers. Usually people try to say
"i want them always up to date" (often
without any quantification of cost or other
trade-offs), which leads to trying to do it
on-demand</div>
<div><br>
</div>
<div>Caching is really "build something that
is always computed on demand. Try to make
it faster by storing the answers it gives
somewhere, and hoping that the rest of the
algorithm retains the ability to give
correct answers on demand when you remove
some of the stored answers".</div>
<div><br>
</div>
<div>One can obviously argue that incremental
recomputation is a form of caching, so i'm
trying to be concrete about my definitions.</div>
<div><br>
</div>
<div>usually, caching additionally requires
clients be able to correctly predict and
remove the set of answers necessary for
correctness (IE know enough about how the
algorithm works) but this isn't strictly the
case if designed well.</div>
<div><br>
</div>
<div>Caching also often ends up with degraded
answers over time, while most full analysis
can be made to not do so. See, for example,
memdep, which has unresolvable cache
invalidation issues. This means, for
example, opt -gvn -gvn gives different
answers than opt -gvn | opt -gvn.</div>
<div><br>
</div>
<div>Both approaches obviously require
information about what has changed in the
IR.</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div> I additionally claim that we should
actually quantify the performance
benefits of doing either over
"recomputing once or twice during
pipeline" before going down this route.
<div><br>
</div>
<div>We are *already* trading off
compile time performance vs generated
code performance. </div>
<div><br>
</div>
<div>All other things being equal, we
should trade it off in a way that
makes maintenance of code and
verification of correctness as easy as
possible.</div>
<div><br>
</div>
<div>Asserting that a good tradeoff to
reduce compile time is "cache" instead
of "stop doing it on demand" (or
incrementally recompute), without any
data other than compile time
performance seems ... not particularly
scientific.</div>
<div><br>
</div>
<div>It's certainly true that caching is
often posited as "easier", but i think
a trip through bugzilla would put this
idea to rest.</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
>>>> On Fri, Mar 24, 2017 at 3:33 PM, Craig Topper via Phabricator
>>>> <reviews@reviews.llvm.org> wrote:
>>>>> craig.topper added a comment.
>>>>>
>>>>> I'm seeing more problems than just nsw/nuw flags here.
>>>>>
>>>>> The sext.ll test is failing because SimplifyDemandedInstructionBits
>>>>> determined that this
>>>>>
>>>>>     %and = and i32 %x, 16
>>>>>     shl i32 %and, 27
>>>>>
>>>>> simplified to just the shl, because we were only demanding the MSB
>>>>> of the shift. This occurred after we had cached the known bits for
>>>>> the shl as having its 31 lsbs as 0. But without the "and" in there,
>>>>> we can no longer guarantee that the lower bits of the shift result
>>>>> are 0.
>>>>>
>>>>> I also got a failure on shift.ll not reported here. This was caused
>>>>> by getShiftedValue rewriting operands and changing the constants of
>>>>> other shifts. This effectively shifts the known bits, and thus the
>>>>> cache needs to be invalidated.
>>>>>
>>>>> https://reviews.llvm.org/D31239
>>
<pre class="gmail-m_1179969366190122482moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory