[PATCH] D88406: [LiveDebugValues][InstrRef][2/2] Emit entry value variable locations

Mon Sep 28 12:56:06 PDT 2020

jmorse added a comment.

Djordje wrote:

> Can you please explain what part of the implementation of the instr-based-ldv gives us ability to "see through" to the value assignments?

Good question, and it's probably time for a worked example of the whole "system", using kill-entry-value-after-diamond-bbs.mir. It's ideal because it doesn't contain loops, and loops are an entirely different kettle of fish. The overall summary is: we effectively compute a map for every instruction in the function recording which value is in each register, so we can "see" that the DBG_VALUEs assign the same value that was already assigned to the variable. I'll be overly verbose below and use some tables to illustrate what's going on.

Let's also pretend that I've applied D85760 <https://reviews.llvm.org/D85760> to the tree -- it doesn't change the overall algorithm, but makes it easier to understand. Without it, steps one and three (below) are merged. Haven't gotten around to committing D85760 <https://reviews.llvm.org/D85760> yet :|

The kill-entry-value-after-diamond-bbs.mir test has the diamond shape:

     a
    / \
   /   \
  b     c
   \   /
    \ /
     d

Each value in the function is represented by a unique ValueIDNum, which I'm representing below as "Value{a, b, c}", where 'a' is the block number, 'b' is the defining instruction's number in the block or 'live-in', and 'c' identifies the register where the value is defined.
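To make the notation concrete, here is a minimal Python sketch of the Value{a, b, c} triple. It only models the notation used in this comment; the `Value` class and `LIVE_IN` marker are hypothetical names for illustration, not the actual ValueIDNum implementation.

```python
from typing import NamedTuple, Union

LIVE_IN = "live-in"  # hypothetical marker standing in for "no defining instruction"

class Value(NamedTuple):
    block: int             # 'a': number of the block where the value is defined
    inst: Union[int, str]  # 'b': defining instruction's number in the block, or LIVE_IN
    reg: str               # 'c': register where the value is defined

# A value defined by instruction 18 of block 0, in $edi.
defined = Value(0, 18, "$edi")
# Whatever value was in $esi on entry to the function.
entry_esi = Value(0, LIVE_IN, "$esi")

# Values compare by content, so two references to the same definition are equal,
# while a live-in of a different block is a different value.
same = defined == Value(0, 18, "$edi")
different = entry_esi != Value(3, LIVE_IN, "$esi")
```

The important property is just that equal triples denote the same value, which is what lets later steps recognise a value after it has been copied between registers.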

There are roughly five steps in the "ExtendRanges" method in InstrRefBasedLDV that deal with everything:

1. Produce machine value transfer function
2. Work out the live-in machine values for each block
3. Produce variable value transfer function
4. Work out the live-in variable values for each block
5. Produce variable locations for each variable in each block.

Step one: we produce a transfer function for each block. This is the same "transfer function" as defined in the dataflow literature: it defines what values each block places in each location. Here's an example for the entry block in the example test:

  $rsp = Value{0, 8, $rsp}
  $edi = Value{0, 18, $edi}
  $ebp = Value{0, live-in, $edx}
  $edx = Value{0, 16, $edx}

It's a summary of how values move around in the block. All of $rsp, $edx and $edi are assigned new values in the block -- but $ebp is assigned a copy of $edx in the entry block. Whatever value is live-in to the block in $edx is copied into $ebp during the block. All registers that aren't defined in a block are live-through: whatever value comes in goes out in the same location. The transfer functions for the two diamond branches, block 1:

  $ebx = Value{1, 4, $ebx}
  $esi = Value{1, 3, $esi}

Block 2:

  $ebp = Value{2, 4, $ebp}
  $esi = Value{2, 3, $esi}

Which represents the registers defined by the MOV32ri instructions.

Step two: we use the "normal" dataflow process to work out what values are live-in to each block. The lattice used is a bit crazy, but ignoring loops it works like this: the initial live-ins to the entry block are these values (let's ignore $rsp):

  $edi = Value{0, live-in, $edi}
  $esi = Value{0, live-in, $esi}
  $ebx = Value{0, live-in, $ebx}
  $ebp = Value{0, live-in, $ebp}
  $edx = Value{0, live-in, $edx}

We can apply the transfer function above, and find the live-outs from the entry block to be:

  $edi = Value{0, 18, $edi}
  $esi = Value{0, live-in, $esi}
  $ebx = Value{0, live-in, $ebx}
  $ebp = Value{0, live-in, $edx}
  $edx = Value{0, 16, $edx}
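As a rough model of that step, here is a Python sketch that applies the entry block's transfer function to its live-ins. Values are plain (block, instruction, register) tuples; `apply_transfer` is a hypothetical helper, and treating a "live-in" entry in the transfer function as a copy of the block's live-in register is my simplification of what's described above ($rsp is left out, as in the tables).

```python
LIVE_IN = "live-in"

def apply_transfer(block, live_ins, transfer):
    """Compute a block's live-outs from its live-ins and transfer function."""
    live_outs = dict(live_ins)  # anything not mentioned is live-through
    for reg, (val_block, inst, src) in transfer.items():
        if val_block == block and inst == LIVE_IN:
            # A copy: reg receives whatever was live-in to this block in src.
            live_outs[reg] = live_ins[src]
        else:
            # A fresh definition inside the block.
            live_outs[reg] = (val_block, inst, src)
    return live_outs

# Initial live-ins to the entry block: every register holds its own live-in value.
entry_live_ins = {r: (0, LIVE_IN, r) for r in ("$edi", "$esi", "$ebx", "$ebp", "$edx")}

# Transfer function for the entry block, with $rsp ignored.
entry_transfer = {
    "$edi": (0, 18, "$edi"),
    "$ebp": (0, LIVE_IN, "$edx"),
    "$edx": (0, 16, "$edx"),
}

entry_live_outs = apply_transfer(0, entry_live_ins, entry_transfer)
```

Here `entry_live_outs` maps $ebp to the value that was live-in in $edx, while $esi and $ebx keep their live-in values because the transfer function doesn't mention them.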

Note that $edi and $edx have been assigned values; $ebp has been assigned whatever was in $edx on entry, and both $esi and $ebx are "live-through", so unchanged. Because blocks 1 and 2 only have the entry block as their predecessor, we can just copy the live-outs from the entry block to be the live-ins for blocks 1 and 2. We apply the transfer function from each block again to work out what their live-outs are: in block 1 $ebx and $esi are assigned values, while in block 2 $ebp and $esi are assigned values. Given the live-out values for blocks 1 and 2, we can compute the live-ins to block 3, but we have to resolve the different values being merged from the two predecessors: {$ebx, $esi, $ebp} have different values in each block. These are recorded as being PHI values, like this:

  $edi = Value{0, 18, $edi}
  $esi = Value{3, live-in, $esi}
  $ebx = Value{3, live-in, $ebx}
  $ebp = Value{3, live-in, $ebp}
  $edx = Value{0, 16, $edx}

$edi and $edx get the same Value because the predecessors agree on the value, whereas the other registers get new Values invented indicating that a PHI occurred on entry to block 3.
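The merge at block 3 can be sketched in Python like so. This is a loop-free simplification that ignores the real lattice entirely: `join_live_ins` is a hypothetical helper that keeps a value when all predecessors agree and otherwise invents a live-in (PHI) value for the block.

```python
LIVE_IN = "live-in"

def join_live_ins(block, pred_live_outs):
    """Merge predecessor live-outs into live-ins for `block` (no loops considered)."""
    live_ins = {}
    for reg in pred_live_outs[0]:
        incoming = {outs[reg] for outs in pred_live_outs}
        if len(incoming) == 1:
            live_ins[reg] = incoming.pop()        # predecessors agree
        else:
            live_ins[reg] = (block, LIVE_IN, reg)  # disagreement: a PHI on entry
    return live_ins

# Live-outs of the entry block, which are also the live-ins of blocks 1 and 2.
entry_outs = {
    "$edi": (0, 18, "$edi"),
    "$esi": (0, LIVE_IN, "$esi"),
    "$ebx": (0, LIVE_IN, "$ebx"),
    "$ebp": (0, LIVE_IN, "$edx"),
    "$edx": (0, 16, "$edx"),
}
# Block 1 defines $ebx and $esi; block 2 defines $ebp and $esi.
block1_outs = {**entry_outs, "$ebx": (1, 4, "$ebx"), "$esi": (1, 3, "$esi")}
block2_outs = {**entry_outs, "$ebp": (2, 4, "$ebp"), "$esi": (2, 3, "$esi")}

block3_live_ins = join_live_ins(3, [block1_outs, block2_outs])
```

Only $edi and $edx survive the merge unchanged; $esi, $ebx and $ebp each become a value that is live-in to block 3.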

Key observation about this: once we have a set of live-in values for each block, we can work out a mapping that identifies **what value each register has** for every instruction. That brings us on to,

Step three (assume D85760 <https://reviews.llvm.org/D85760> is applied, see the code immediately after mlocDataflow is called). For each block, we:

- Take its live-in values as calculated above (they get loaded into the MLocTracker class),
- Step through each instruction in the block again,
- For every DBG_VALUE instruction **read the value** in the specified register, and record it as a variable assignment.

How we "see through" those DBG_VALUE assignments should now be clear(er): We know the live-in value in $esi for blocks 1 and 2 is Value{0, live-in, $esi}, and we can track the copy to $ebx or $ebp respectively. When the DBG_VALUE instructions read either $ebx or $ebp, they will read Value{0, live-in, $esi}.
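Here's a small Python sketch of that walk over block 1. The instruction list is made up for illustration (a copy, a DBG_VALUE, then two definitions); it is not the real contents of the test, and `step_block` is a hypothetical stand-in for what the tracker does, not its actual interface.

```python
LIVE_IN = "live-in"

def step_block(block, live_ins, instructions):
    """Walk a block, tracking register values and recording DBG_VALUE reads."""
    regs = dict(live_ins)  # start from the block's live-in machine values
    var_values = {}
    for num, inst in enumerate(instructions, start=1):
        kind = inst[0]
        if kind == "copy":
            _, dst, src = inst
            regs[dst] = regs[src]          # the value moves, its identity doesn't change
        elif kind == "def":
            _, dst = inst
            regs[dst] = (block, num, dst)  # a brand new value
        elif kind == "dbg_value":
            _, var, reg = inst
            var_values[var] = regs[reg]    # read the value, not the register name
    return regs, var_values

block1_live_ins = {
    "$esi": (0, LIVE_IN, "$esi"),
    "$ebx": (0, LIVE_IN, "$ebx"),
}
# Hypothetical instruction sequence for block 1.
block1_insts = [
    ("copy", "$ebx", "$esi"),
    ("dbg_value", "!17", "$ebx"),
    ("def", "$esi"),
    ("def", "$ebx"),
]

block1_regs, block1_vars = step_block(1, block1_live_ins, block1_insts)
```

Even though the DBG_VALUE names $ebx, the value recorded for !17 is the one that was live-in to the function in $esi, because the copy carried it there before the read.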

We then get the following variable value transfer functions for each block (looking only at !17):

  entry: !17 = Value{0, live-in, $esi}
  block1: !17 = Value{0, live-in, $esi}
  block2: !17 = Value{0, live-in, $esi}

And can then perform:

Step four: another dataflow process with the variable transfer functions above, which calculates the live-in **values** for each variable. It's very simple in this example: the variable is assigned the same value in every block. It becomes much more painful with loops, but let's skip over that, and move to:

Step five: We now have a set of live-in values for each register for each block; and a set of live-in values for each variable in each block. For each block, we:

- Create a map of each Value to a location (register)
- For each live-in variable value, look up whether that value is in a register somewhere, and if so emit a DBG_VALUE,
- Also for each live-in variable, record what location we placed it in,
- Step through each instruction, and
  - If it's a "normal" instruction, record any definitions / copies of values that happen,
  - If it's a DBG_VALUE, update the value and location that the variable has,
  - If a "normal" instruction clobbers any variable locations, try to find a backup location and emit a DBG_VALUE.

Entry values can be emitted when the **variable value** is an entry value, i.e. it was a value live-in to the entry block, Value{0, live-in, $somereg}, and we can't find a location for it when entering a block or when a register is clobbered.
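The location lookup and the entry-value fallback can be sketched in Python as below. It only models the condition described here; `pick_location` and its return tuples are hypothetical, and the patch itself may apply further checks before emitting an entry value.

```python
LIVE_IN = "live-in"

def pick_location(var_value, reg_values):
    """Choose a location for a variable value, given the current register values."""
    # First preference: any register that currently holds the variable's value.
    for reg, val in reg_values.items():
        if val == var_value:
            return ("reg", reg)
    # Fallback: the value was live-in to the entry block, so describe it
    # as the entry value of the register it arrived in.
    block, inst, reg = var_value
    if block == 0 and inst == LIVE_IN:
        return ("entry_value", reg)
    return None  # no location available

var_17 = (0, LIVE_IN, "$esi")

# On entry to block 1, $esi still holds the variable's value.
block1_regs = {"$esi": (0, LIVE_IN, "$esi"), "$ebx": (0, LIVE_IN, "$ebx")}
loc_block1 = pick_location(var_17, block1_regs)

# On entry to block 3, every candidate register holds a PHI value instead.
block3_regs = {
    "$esi": (3, LIVE_IN, "$esi"),
    "$ebx": (3, LIVE_IN, "$ebx"),
    "$ebp": (3, LIVE_IN, "$ebp"),
}
loc_block3 = pick_location(var_17, block3_regs)

# A value defined inside the function has no entry-value fallback.
loc_other = pick_location((1, 3, "$esi"), block3_regs)
```

In block 1 the variable gets an ordinary register location, while in block 3 no register holds the value any more, so the only remaining option is the entry value of $esi.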

~

So to repeat the summary from above: we have a map for every instruction recording what value is in each register. The difference that those DBG_VALUEs in blocks 1 and 2 make is that the variable value transfer function is:

  entry: !17 = Value{0, live-in, $esi}
  block1: !17 = Value{0, live-in, $esi}
  block2: !17 = Value{0, live-in, $esi}

Whereas without them it would be:

  entry: !17 = Value{0, live-in, $esi}

Where the value in entry would propagate into the other blocks anyway.

~

I'm still struggling to find a good way to describe how these things (are supposed to) work; hopefully spamming the information above will help a bit.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88406/new/

https://reviews.llvm.org/D88406