[PATCH] D136320: [Assignment Tracking Analysis][1/*] Add analysis pass core

Thu Oct 20 01:04:36 PDT 2022

Orlando created this revision.
Orlando added a project: debug-info.
Herald added subscribers: nlopes, mgrang, hiraditya.
Herald added a project: All.
Orlando requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

This patch stack implements the assignment tracking analysis. This patch contains the main body, but there are unfortunately a few more large code blobs to follow.

The problem and goal
--------------------

Using the Assignment Tracking "model" it's not possible to determine a variable location just by looking at a debug intrinsic in isolation. Instructions without any metadata can change the location of a variable. The meaning of dbg.assign intrinsics changes depending on whether there are linked instructions, and where they are relative to those instructions. So we need to analyse the IR and convert the embedded information into a form that SelectionDAG can consume to produce debug variable locations in MIR.

The core of the solution is a dataflow analysis which, aiming to maximise the memory location coverage for variables, outputs a mapping of instruction positions to variable location definitions.

High level overview and API
---------------------------

`AssignmentTrackingAnalysis` is a pass that analyses IR to produce a mapping of instruction positions to variable location definitions. The results are encapsulated by the `FunctionVarLocs` class.

The pass is integrated with LLVM in this patch, but its results are not used yet. A separate future patch updates SelectionDAG to consume them.

The results of the analysis are exposed via `getResults`, which returns a `const FunctionVarLocs *` whose const methods are:

  const VarLocInfo *single_locs_begin() const;
  const VarLocInfo *single_locs_end() const;
  const VarLocInfo *locs_begin(const Instruction *Before) const;
  const VarLocInfo *locs_end(const Instruction *Before) const;
  void print(raw_ostream &OS, const Function &Fn) const;

Debug intrinsics can be ignored after running the analysis. Instead, variable location definitions that occur between an instruction `Inst` and its predecessor (or block start) can be found by looping over the range:

  locs_begin(Inst), locs_end(Inst)

Similarly, variables with a memory location that is valid for their lifetime can be iterated over using the range:

  single_locs_begin(), single_locs_end()
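The sketch below shows one way the two ranges can be consumed. It is a minimal, self-contained mock written for this description, not the patch's `FunctionVarLocs`. `MockFunctionVarLocs`, `InstID` and `varsDefinedBefore` are hypothetical names. An `int` stands in for `const Instruction *`, and the `VarLocInfo` fields are simplified to strings. The mock assumes all records sit in one flat array, with the single locations first and each instruction owning a contiguous slice. That is one way to support the pointer-pair API listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for const Instruction *.
using InstID = int;

// Simplified stand-in for VarLocInfo.
struct VarLocInfo {
  std::string Var; // variable name (stand-in for a DebugVariable)
  std::string Loc; // e.g. "mem:%x.addr" or "val:%0"
};

// Mock of the query side: one flat vector of records. The first SingleEnd
// records are whole-lifetime memory locations; each instruction owns a
// contiguous [Begin, End) slice of the remaining records.
class MockFunctionVarLocs {
  std::vector<VarLocInfo> Records;
  std::size_t SingleEnd = 0;
  std::map<InstID, std::pair<std::size_t, std::size_t>> BeforeInst;

  std::pair<std::size_t, std::size_t> slice(InstID Before) const {
    auto It = BeforeInst.find(Before);
    if (It == BeforeInst.end())
      return {0, 0}; // no definitions before this instruction: empty range
    return It->second;
  }

public:
  MockFunctionVarLocs(std::vector<VarLocInfo> Single,
                      const std::map<InstID, std::vector<VarLocInfo>> &PerInst)
      : Records(std::move(Single)), SingleEnd(Records.size()) {
    for (const auto &[Inst, Locs] : PerInst) {
      std::size_t Begin = Records.size();
      Records.insert(Records.end(), Locs.begin(), Locs.end());
      BeforeInst[Inst] = std::make_pair(Begin, Records.size());
    }
  }

  const VarLocInfo *single_locs_begin() const { return Records.data(); }
  const VarLocInfo *single_locs_end() const {
    return Records.data() + SingleEnd;
  }
  const VarLocInfo *locs_begin(InstID Before) const {
    return Records.data() + slice(Before).first;
  }
  const VarLocInfo *locs_end(InstID Before) const {
    return Records.data() + slice(Before).second;
  }
};

// Hypothetical client: collect the variables whose location is defined
// between Inst and its predecessor.
std::vector<std::string> varsDefinedBefore(const MockFunctionVarLocs &FnVarLocs,
                                           InstID Inst) {
  assert(FnVarLocs.locs_begin(Inst) <= FnVarLocs.locs_end(Inst));
  std::vector<std::string> Vars;
  for (const VarLocInfo *It = FnVarLocs.locs_begin(Inst),
                        *End = FnVarLocs.locs_end(Inst);
       It != End; ++It)
    Vars.push_back(It->Var);
  return Vars;
}
```

Storing every record contiguously lets both kinds of query return plain pointer pairs with no per-query allocation. An instruction with no definitions simply yields an empty range.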

Dataflow high level details
---------------------------

The analysis itself is a standard fixed-point dataflow algorithm that traverses the CFG using a worklist initialised with every block in reverse post order. It computes a result for each visited block, which is used to compute the results of its successors. Each time the result for a block changes, its successors are added to the worklist if they are not already present. The analysis terminates when the result of every block is stable. Care has been taken to ensure that merging information from predecessor blocks yields a result that changes monotonically.
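The following is a minimal sketch of the worklist scheme, under two simplifying assumptions. First, blocks are numbered in reverse post order, so a `std::set<unsigned>` both pops blocks in RPO and ignores duplicate insertions. Second, the lattice is a toy "may have been assigned" set of variable names joined by union, not the patch's `BlockInfo`. `Block` and `solve` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG node. Blocks are numbered in RPO (0 = entry).
struct Block {
  std::vector<unsigned> Preds, Succs;
  std::set<char> Defs; // toy facts generated by the block
};

// Toy lattice: set of variables that may have been assigned. The join is
// set union, which only grows, so iteration is monotonic and terminates.
std::vector<std::set<char>> solve(const std::vector<Block> &CFG) {
  assert((CFG.empty() || CFG[0].Preds.empty()) &&
         "block 0 must be the entry block");
  std::vector<std::set<char>> LiveOut(CFG.size());
  std::vector<bool> Visited(CFG.size(), false);
  std::set<unsigned> Worklist; // ordered by RPO number, no duplicates
  for (unsigned B = 0; B < CFG.size(); ++B)
    Worklist.insert(B); // initialised with every block
  while (!Worklist.empty()) {
    unsigned B = *Worklist.begin();
    Worklist.erase(Worklist.begin());
    // join: merge the live-outs of the predecessors visited so far.
    std::set<char> LiveSet;
    for (unsigned P : CFG[B].Preds)
      if (Visited[P])
        LiveSet.insert(LiveOut[P].begin(), LiveOut[P].end());
    // process: apply the block's own effect to the working set.
    LiveSet.insert(CFG[B].Defs.begin(), CFG[B].Defs.end());
    Visited[B] = true;
    if (LiveSet != LiveOut[B]) {
      LiveOut[B] = std::move(LiveSet);
      for (unsigned S : CFG[B].Succs)
        Worklist.insert(S); // no-op if S is already queued
    }
  }
  return LiveOut;
}
```

Each block's result is recomputed from its predecessors' live-outs, and union never removes elements, so every live-out grows monotonically. The loop therefore stops once no live-out changes.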

For each block we track "live-in" (`LiveIn`) and "live-out" (`LiveOut`) results. The former represents the currently known input to a block, which is the merged (`join`) result of the live-outs of visited predecessors (empty for the entry block). The live-in set is copied to create a working set for the block (`LiveSet`). The working set is modified as each instruction in the block is processed (`process`). After processing the last instruction in the block, the working set is the live-out result for the block. The "results" are `BlockInfo` objects. These encode assignments to memory and to variables, and track whether each variable's memory location is a good debug location for the variable or not. The actual variable location information (concrete implicit location value, or memory address) is stored off to the side in `InsertBeforeMap`, which is used after the dataflow is complete to build the instruction -> location definition mapping.
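The per-block step can be sketched as a toy. `LocKind` mirrors the `Mem`/`Val`/`None` kinds described in this patch. `Inst`, `processBlock` and the `Defs` map (standing in for `InsertBeforeMap`) are hypothetical. As a simplification, a definition is keyed by the instruction that caused it rather than by an insertion point. Entries for an instruction are also erased whenever its block is reprocessed, so stale definitions from an earlier visit do not survive.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class LocKind { Mem, Val, None };
using LocMap = std::map<std::string, LocKind>;

// Hypothetical toy instruction: may move one variable to a new kind.
struct Inst {
  int ID;          // position in the block
  std::string Var; // affected variable ("" = none)
  LocKind NewKind;
};

// Stand-in for InsertBeforeMap: location definitions kept off to the side.
// They are not part of the dataflow state, so they play no part in the
// live-out comparison that drives the worklist.
using InsertBeforeMap =
    std::map<int, std::vector<std::pair<std::string, LocKind>>>;

// Copy LiveIn into a working set, update it per instruction, and return it
// as the block's live-out.
LocMap processBlock(const LocMap &LiveIn, const std::vector<Inst> &Insts,
                    InsertBeforeMap &Defs) {
  LocMap LiveSet = LiveIn; // LiveIn itself is left untouched
  for (const Inst &I : Insts) {
    Defs.erase(I.ID); // drop defs recorded by an earlier visit
    if (I.Var.empty())
      continue;
    auto It = LiveSet.find(I.Var);
    if (It != LiveSet.end() && It->second == I.NewKind)
      continue; // kind unchanged: nothing to record
    LiveSet[I.Var] = I.NewKind;
    Defs[I.ID].push_back({I.Var, I.NewKind});
  }
  return LiveSet;
}
```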

Patch tour
----------

Here's a high-level call-graph that hopefully helps patch navigation.

  +-runOnFunction
    +-analyzeFunction
      +-run
        +-process
        | +-processNonDbgInstruction
        | | +-processTaggedInstruction
        | | +-processUntaggedInstruction
        | |  
        | +-processDbgInstruction
        |   +-processDbgAssign
        |   +-processDbgValue
        |   
        +-join
          +-joinBlockInfo
            +-joinLocMap
            | +-joinKind
            |
            +-joinAssignmentMap
              +-joinAssignment

`AssignmentTrackingLowering::run` (just `run` above) is where the dataflow starts. Most of this function is dedicated to initializing helper structures and setting up the worklist traversal scaffolding. The important functions called from here are `join` and `process`.

It's probably easier to start with `join`, as understanding it introduces the types involved and gives `process` more meaning. `join` is responsible for merging the live-outs of predecessors; see the doc comment on its forward declaration in the class definition. `join` calls other `joinXYZ` methods, and those call another set, together merging every element of `BlockInfo`.

`BlockInfo` is made up of 3 maps.

  LocMap LiveLoc;
  AssignmentMap StackHomeValue;
  AssignmentMap DebugValue;

`LiveLoc` maps variables to `LocKind`, which describes the current kind of location for each variable.
`StackHomeValue` maps variables to the last `Assignment` to their stack slot (N.B. looking at this now, maybe it should be keyed by address rather than by variable. This can come later as a refactor if necessary, as it will likely need changing for one of the TODO list items in D132220 <https://reviews.llvm.org/D132220>).
`DebugValue` maps variables to the last `Assignment` to the variable.
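The sketch below shows how the three maps might be joined. It assumes a flat lattice: agreement between predecessors is kept, and any disagreement goes to the top element (`LocKind::None`, or a hypothetical `NoneOrPhi` assignment status). The function names follow the call graph above. The bodies, the `Assignment` layout and the handling of a variable present in only one predecessor are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class LocKind { Mem, Val, None };

// Hypothetical Assignment: the ID of the defining store/dbg.assign, or
// NoneOrPhi when predecessors disagree (assumed encoding).
struct Assignment {
  enum Status { Known, NoneOrPhi } S = NoneOrPhi;
  unsigned ID = 0;
  bool operator==(const Assignment &O) const {
    return S == O.S && (S == NoneOrPhi || ID == O.ID);
  }
};

using LocMap = std::map<std::string, LocKind>;
using AssignmentMap = std::map<std::string, Assignment>;

struct BlockInfo {
  LocMap LiveLoc;
  AssignmentMap StackHomeValue;
  AssignmentMap DebugValue;
};

// Flat-lattice joins: keep agreement, otherwise go to top.
LocKind joinKind(LocKind A, LocKind B) { return A == B ? A : LocKind::None; }
Assignment joinAssignment(const Assignment &A, const Assignment &B) {
  return A == B ? A : Assignment{};
}

// Key-by-key join. Assumption: a variable missing from one side is joined
// against Top, so it becomes Top.
template <typename Map, typename JoinFn, typename T>
Map joinMap(const Map &A, const Map &B, JoinFn Join, T Top) {
  Map Result;
  for (const auto &[Var, V] : A) {
    auto It = B.find(Var);
    Result[Var] = Join(V, It == B.end() ? Top : It->second);
  }
  for (const auto &[Var, V] : B)
    if (!A.count(Var))
      Result[Var] = Join(Top, V);
  return Result;
}

BlockInfo joinBlockInfo(const BlockInfo &A, const BlockInfo &B) {
  return {joinMap(A.LiveLoc, B.LiveLoc, joinKind, LocKind::None),
          joinMap(A.StackHomeValue, B.StackHomeValue, joinAssignment,
                  Assignment{}),
          joinMap(A.DebugValue, B.DebugValue, joinAssignment, Assignment{})};
}
```

Because every join only moves values upward toward the top element, repeated joins during the dataflow can change each entry only a bounded number of times.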

`process` is where instructions in a block are analysed. The important functions here are `addMemDef`, `addDbgDef`, `setLocKind`, and `emitDbgValue`. All the leaf `process` functions call these, so I didn't add them to the call graph above. A call to `addMemDef` states that a store with a given `ID` to a variable fragment's memory location has occurred. Similarly, `addDbgDef` states that an assignment with an `ID` to a fragment of a variable has occurred. When the variable's memory location assignment and the debug assignment "match", a variable location definition describing the memory location is emitted. Otherwise, an appropriate implicit location value is chosen. `setLocKind` sets whether the current location for the variable is `Mem`, `Val` or `None`, and `emitDbgValue` saves the location to `InsertBeforeMap`.
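The matching rule can be sketched with a toy `Lowering` struct. The member names `addMemDef`, `addDbgDef`, `setLocKind` and `emitDbgValue` follow the description above, but their signatures and bodies here are hypothetical. `update` is an invented helper that applies the "match" test. The `None` kind (no location available) is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class LocKind { Mem, Val, None };

// Hypothetical per-variable state.
struct VarState {
  unsigned StackHomeID = 0; // last assignment to the memory location
  unsigned DebugID = 0;     // last assignment to the variable
  LocKind Kind = LocKind::None;
};

struct Lowering {
  std::map<std::string, VarState> Live;
  std::vector<std::string> Emitted; // stand-in for InsertBeforeMap entries

  void addMemDef(const std::string &Var, unsigned ID) {
    Live[Var].StackHomeID = ID;
  }
  void addDbgDef(const std::string &Var, unsigned ID) {
    Live[Var].DebugID = ID;
  }
  void setLocKind(const std::string &Var, LocKind K) { Live[Var].Kind = K; }
  void emitDbgValue(const std::string &Var, LocKind K) {
    Emitted.push_back(Var + (K == LocKind::Mem ? ":mem" : ":val"));
  }

  // Memory is a valid location only if the stack home holds the value the
  // variable was last assigned; otherwise fall back to an implicit value.
  void update(const std::string &Var) {
    const VarState &S = Live[Var];
    LocKind K = S.StackHomeID == S.DebugID ? LocKind::Mem : LocKind::Val;
    setLocKind(Var, K);
    emitDbgValue(Var, K);
  }
};
```

For example, `addMemDef("x", 1)` and `addDbgDef("x", 1)` followed by `update("x")` select `Mem`. A later `addDbgDef("x", 2)` with no matching store (say, the store was removed) followed by `update("x")` selects `Val`.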

The analysis tracks locations for each fragment of each variable that has a definition (i.e., is used in a debug intrinsic). `addMemDef`, `addDbgDef`, and `setLocKind` apply their changes to the fragment passed in and to all fragments contained fully within it. So, an assignment to bits [0, 64) of a variable is noted for bits [0, 32) too.
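The containment rule can be sketched by treating fragments as half-open bit ranges. `Fragment`, `containedIn` and `addDefToContained` are hypothetical names. The sketch tracks a single variable's fragments in one map and scans every tracked fragment. A real implementation might instead precompute which fragments contain which.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical fragment: bits [Offset, Offset + Size).
struct Fragment {
  unsigned Offset, Size;
  bool operator<(const Fragment &O) const {
    return std::make_pair(Offset, Size) < std::make_pair(O.Offset, O.Size);
  }
};

// True if Inner lies fully within Outer.
bool containedIn(const Fragment &Inner, const Fragment &Outer) {
  return Inner.Offset >= Outer.Offset &&
         Inner.Offset + Inner.Size <= Outer.Offset + Outer.Size;
}

// Record assignment ID for Frag and every tracked fragment it fully contains.
void addDefToContained(std::map<Fragment, unsigned> &LastDef,
                       const Fragment &Frag, unsigned ID) {
  LastDef[Frag] = ID;
  for (auto &[F, DefID] : LastDef)
    if (containedIn(F, Frag))
      DefID = ID;
}
```

Suppose fragments [0, 64), [0, 32) and [32, 64) are tracked. A definition of [0, 64) updates all three, while a definition of [0, 32) leaves [0, 64) and [32, 64) unchanged.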

I'm aware that this patch is large and this tour is not. Hopefully it still gives reviewers a good starting point. Please don't hesitate to ask questions!

---

Tests are coming in a separate patch.

https://reviews.llvm.org/D136320

Files:
  llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h
  llvm/include/llvm/InitializePasses.h
  llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
  llvm/lib/CodeGen/CMakeLists.txt
  llvm/lib/CodeGen/CodeGen.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D136320.468901.patch
Type: text/x-patch
Size: 64325 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221020/c2dbe901/attachment-0001.bin>