[PATCH] D67500: [DebugInfo] LiveDebugValues: don't create transfer records for potentially invalid locations

Jeremy Morse via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 7 04:46:29 PST 2020


jmorse added a comment.

Hi,

I think I can narrow the conceptual difference between our interpretations down to this:

> we should get an initial set of out-locs for each basic block that is derived
>  just from the info we have for sure *within* each basic block and gradually
>  more info gets propagated until we may or may not have a join function that
>  produces new in-locs for a basic block

Plus, you're saying "up" the lattice and I'm saying "down".

Reading through "Compilers: Principles, Techniques, and Tools" (Aho, Lam, Sethi, Ullman) [0], there are actually two flavours of dataflow analyses (section 9.2):

- Those where all lattice values are initialised to the bottom element of the lattice, then are incrementally moved up the lattice as more truths about the program become available.
- Those where all lattice values are initialised to the _top_ element of the lattice, and are incrementally moved _down_ the lattice as more information about false-ness in the program becomes available.

Effectively, the former assumes nothing and tries to build facts up from the program blocks, whereas the latter assumes everything is true and then removes those facts that can be proved false. I've been exclusively thinking in terms of the second one, and I think your terminology indicates you see LiveDebugValues as the first. (It doesn't help that I've continued to use the term "join", when I think (90% sure) in the second model it's actually "meet". I can never remember which way round they are.)
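
To make that concrete, here's a throwaway C++ sketch of the difference (invented names, nothing LLVM-specific), modelling a chain lattice as an integer rank: each flavour picks an initial element and then only ever moves facts in one direction.

  // Invented illustration only: model a chain lattice as an integer rank,
  // where a higher rank means a higher lattice element.
  #include <algorithm>
  #include <cassert>

  using Rank = int;

  // Flavour 1: start at the bottom, move up as truths become available.
  Rank raiseTo(Rank Current, Rank ProvedTrue) {
    return std::max(Current, ProvedTrue);
  }

  // Flavour 2: start at the top, move down as falsehoods become available.
  Rank lowerTo(Rank Current, Rank ProvedFalse) {
    return std::min(Current, ProvedFalse);
  }

  int main() {
    const Rank Bottom = 0, Top = 2;
    Rank Pessimistic = Bottom;             // flavour 1 initialisation
    Rank Optimistic = Top;                 // flavour 2 initialisation
    Pessimistic = raiseTo(Pessimistic, 1); // a truth is discovered: move up
    Optimistic = lowerTo(Optimistic, 1);   // a falsehood is discovered: move down
    assert(Pessimistic == 1 && Optimistic == 1);
  }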

This would explain the confusion over this patch and D66599 <https://reviews.llvm.org/D66599>: in the first kind of dataflow analysis, during every iteration of the analysis, live-in and live-out sets only take on a lattice element if it's definitely true. Whereas in the second kind, lattice elements aren't necessarily correct until the fixed point is reached, because they might yet reduce down the lattice. That's the essence of this patch (D67500 <https://reviews.llvm.org/D67500>): not trying to determine transfers until after the fixed point is reached.

Furthermore, it explains why I see a need for multiple transitions in lattice elements. In the original lattice (putting \bot at the bottom):

  True     False
    \       / 
     \     /
      \   /
       \ /
       \bot

You would only ever move up the lattice to True or False once that fact was definitely established -- and because it was definitely established, you would only ever transition once.  Whereas I think the lattice I'm effectively working with is:

  \top
    |
  True
    |
  False

Where all values are initialised to \top, the "unknown" element (when a block is unvisited). Values decay to True when the block is first visited and the location _could_ be true, and then potentially decay to False if that assumption is later found to be invalid. (Coincidentally, False is also \bot here.)
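
Concretely, assuming a made-up enum rather than the pass's real data structures (LiveDebugValues actually tracks sets of variable locations), the meet on this three-point chain just picks the lower of its two operands, so values can only ever move downwards:

  // Hypothetical three-point chain lattice: Unknown is \top, False is \bot.
  enum class Loc { Unknown, True, False };

  // Meet: take the lower of the two elements. Unknown (an unvisited
  // predecessor) is the identity and never drags a value down; False wins
  // over everything.
  Loc meet(Loc A, Loc B) {
    if (A == Loc::False || B == Loc::False)
      return Loc::False;
    if (A == Loc::True || B == Loc::True)
      return Loc::True;
    return Loc::Unknown;
  }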

Which dataflow model is better, then? The Aho/Lam/Sethi/Ullman book says the first computes a minimal solution, the second a maximal, and for variable locations I suspect we want a maximal solution. In addition, the Cooper/Torczon "Engineering a Compiler" book (Section 9.3.6) states that the second kind is an optimistic algorithm, well suited to dealing with loops. If you have a loop like this:

  entry:
    dbg.value(%0, ...)
    br label %loop
  loop:
    call void @nop()
    br i1 %cond, label %loop, label %exit
  exit:
    ret void

Then the live-in lattice element of the loop block has a circular dependency with itself due to the loop backedge: it will be True if it is already True. With the second kind of dataflow analysis, we assume the live-in is True and that the backedge will confirm this later; if instead the location is False across the backedge, the live-in will later reduce to False. I don't think the first kind of analysis can cope with this case, as you have to start with some kind of assumption or speculation that the live-in is True in order to later find out whether the backedge live-out is True. Hence the first kind of analysis produces a minimal solution, and the second a maximal one. This is also the problem that kicked off the 2016 discussion about missing locations [1]: locations not being propagated into loops.
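
As a worked example (hypothetical code, reusing the same made-up three-point lattice as above, not the pass itself), here's how the optimistic iteration settles the live-in of the loop block in that CFG:

  #include <cstdio>

  // Same hypothetical chain lattice as before: Unknown is \top, False is \bot.
  enum class Loc { Unknown, True, False };

  static Loc meet(Loc A, Loc B) {
    if (A == Loc::False || B == Loc::False) return Loc::False;
    if (A == Loc::True  || B == Loc::True)  return Loc::True;
    return Loc::Unknown;
  }

  int main() {
    // Live-outs, initialised to \top (Unknown, i.e. "not visited yet").
    Loc EntryOut = Loc::Unknown, LoopOut = Loc::Unknown;
    bool LoopBodyClobbers = false; // flip to true to watch the decay to False

    // entry: the dbg.value makes the location live-out of 'entry'.
    EntryOut = Loc::True;

    // Iterate 'loop' to a fixed point. Its live-in is the meet of its
    // predecessors' live-outs: 'entry', plus 'loop' itself via the backedge.
    for (int I = 0; I < 2; ++I) {
      Loc LoopIn = meet(EntryOut, LoopOut); // first pass: meet(True, Unknown) == True
      LoopOut = LoopBodyClobbers ? Loc::False : LoopIn;
      std::printf("iteration %d: loop live-in = %d\n", I, static_cast<int>(LoopIn));
    }
    // With no clobber the live-in settles at True, so the location survives
    // the backedge and is propagated into the loop. Had LoopOut instead been
    // started at the bottom element, the very first meet would already have
    // been \bot and the location could never enter the loop.
  }

Flipping LoopBodyClobbers shows the True -> False decay described above: the first pass optimistically gives the loop a True live-in, and the second pass reduces it to False once the backedge live-out is known.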

In summary:

- I think we've been talking about two different flavours of dataflow analysis; happily, I think I've found the words to describe the difference.
- I believe we want to use an optimistic dataflow analysis (the second kind above), because it's necessary for propagating locations through loops.
- This second kind of dataflow analysis requires a lattice transition from True to False, because it works by starting from the assumption that everything is True [2] and then removing anything that can be proved False.

(I speculated in another comment that I might have written some code that performs False -> True transitions; happily, that turns out not to be correct.)

[0] Which I'm led to believe is the dragon book, although the international edition doesn't have a dragon on the cover :/
[1] https://groups.google.com/d/msg/llvm-dev/nlgxIfDQnhs/c55kcncIAAAJ
[2] Technically I think the Unknown lattice element in my model is unnecessary; it's a hack to let us avoid initialising every variable location ahead of time. I didn't want to get bogged down with it in this analysis, though.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67500/new/

https://reviews.llvm.org/D67500




