[PATCH] D136320: [Assignment Tracking Analysis][1/*] Add analysis pass core

Jeremy Morse via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 20 12:50:47 PST 2022


jmorse added a comment.

I'm in as far as line 900, hitting submit to clear the queue of comments just in case I don't get back to this. Random remarks:

NB: the locs_begin and single_locs_begin ranges could also have a locs() and single_locs() methods that return `iterator_range` objects made with `make_range`, that allows you to use range-based for loops. On the other hand, there's only two places in this patch that they're used, so it might not be a big deal.

Using the term "location" everywhere initially made me twitch as in the IR we have Values... but then again the whole point of assignment tracking is that the location can sometimes be the stack, where the Value isn't know.

Why does NotAlwaysStackHomed ignore fragments -- can't SROA cut up variables into parts, some of which could be partially promoted and mutated, others of which aren't? Perhaps it's an approximation where any partially promoted variable gets explicitly tracked for all fields (this seems fine).



================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:1-2
+#ifndef LLVM_LIB_CODEGEN_ASSIGNMENTTRACKINGANALYSIS_H
+#define LLVM_LIB_CODEGEN_ASSIGNMENTTRACKINGANALYSIS_H
+
----------------
meganit, s/LIB/INCLUDE/, or something


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:8-11
+namespace llvm { class Function; }
+namespace llvm { class Instruction; }
+namespace llvm { class Value; }
+namespace llvm { class raw_ostream; }
----------------
Is this convention, to close and open the namespace each time?


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:33
+  /// change occurs before (see VarLocsBeforeInst). The elements from
+  /// zero to SingleVarLocEnd represent variable with a single location.
+  SmallVector<VarLocInfo> VarLocRecords;
----------------



================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:35
+  SmallVector<VarLocInfo> VarLocRecords;
+  // End of range of VarLocRecords that represent variables with a single
+  // location that is valid for the entire scope. Range starts at 0.
----------------



================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:38-41
+  // Maps a range of VarLocRecords to the position before a particular
+  // instruction.
+  DenseMap<const Instruction *, std::pair<unsigned, unsigned>>
+      VarLocsBeforeInst;
----------------
Shouldn't the comment wording be inverted -- it maps a position to a range, no?


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:55-57
+  DebugVariable getVariable(VariableID ID) const {
+    return Variables[static_cast<unsigned>(ID)];
+  }
----------------
I'd suggest returning a (const?) reference unless there's a real need to return a temporary.


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:65
+  const VarLocInfo *single_locs_end() const {
+    return VarLocRecords.begin() + SingleVarLocEnd;
+  }
----------------
I recommend std::advance to do exactly the same thing, but making it very clear to the reader that you're messing with iterators, rather than having to consider that there's pointer arithmetic happening. Here and elsewhere.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:1
+#include "llvm/CodeGen/AssignmentTrackingAnalysis.h"
+#include "llvm/ADT/DenseMapInfo.h"
----------------
IIRC clang-format has an opinion on the order of includes


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:77-79
+  DebugVariable getVariable(VariableID ID) const {
+    return Variables[static_cast<unsigned>(ID)];
+  }
----------------
Return a const reference instead?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:83
+  /// Before.
+  const SmallVector<VarLocInfo> *getWedge(const Instruction *Before) const {
+    auto R = VarLocsBeforeInst.find(Before);
----------------
Recommend returning a SmallVectorImpl pointer, which I think is a superclass of SmallVector. There's no practical difference but it avoids implicitly encoding the size of the vector in the method information.

(I get the feeling this wants to be std::optional too, however you can't have optional references iirc, so it's probably a bad idea).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:90-94
+  /// Replace the defs that come just before /p Before with /p Wedge.
+  void setWedge(const Instruction *Before,
+                const SmallVector<VarLocInfo> &Wedge) {
+    VarLocsBeforeInst[Before] = Wedge;
+  }
----------------
Given the usage of this method, consider an rvalue / std::moveable method too, that might save some un-necessary memory allocations.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:104
+    VarLoc.V = V;
+    SingleLocVars.push_back(VarLoc);
+  }
----------------
I want to say "use emplace_back", but I can't imagine it makes much difference. Up to you.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:164
+
+void FunctionVarLocs::init(FunctionVarLocsBuilder *Builder) {
+  // Add the single-location variables first.
----------------
Reference argument instead of pointer? It's unconditionally dereferenced.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:175
+    unsigned BlockStart = VarLocRecords.size();
+    for (auto VarLoc : P.second)
+      VarLocRecords.emplace_back(VarLoc);
----------------
`auto &` or you'll generate a temporary, I think.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:179
+    // Record the start and end indices.
+    if (BlockEnd - BlockStart != 0)
+      VarLocsBeforeInst[P.first] = {BlockStart, BlockEnd};
----------------



================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:204
+                                                Value *Start,
+                                                DIExpression **Expression) {
+  APInt OffsetInBytes(DL.getTypeSizeInBits(Start->getType()), false);
----------------
Passing in a reference to a pointer might be slightly neater -- depends whether the explicit dereferencing of Expression is designed to signal to the reader that there's a side-effect. An alternative would be returning a {Value,Expression} pair. This is slightly stylistic, up to you.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:211-212
+    Ops = {dwarf::DW_OP_plus_uconst, OffsetInBytes.getZExtValue()};
+  *Expression = DIExpression::prependOpcodes(
+      *Expression, Ops, /*StackValue=*/false, /*EntryValue=*/false);
+  *Expression = DIExpression::append(*Expression, {dwarf::DW_OP_deref});
----------------
This prepending could be put inside the conditional, yes?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:237-239
+  /// join(Mem, Val)  = None
+  /// join(None, Mem) = Mem
+  /// join(None, Val) = Val
----------------
This doesn't seem to be how `joinKind` implements it; additionally if we can switch from {Mem,Val} to None, and then {None,Mem} to Mem, doesn't that mean there can be a "Val" in a predecessor block but this isn't reflected in the lattice value.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:244
+  /// Therefore currently there's no way to ensure that Mem values and Val
+  /// values are the same. This could be a future extension, though I'm not
+  /// sure that many additional locations would be recovered that way in
----------------
I'm not sure -> it's not clear, to avoid the reader questioning "who?"


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:267-272
+    /// The dbg.assign that marks this dbg-def. Mem-defs don't use this field.
+    /// This is used to lookup the value + expression in the debug program if
+    /// the stack slot gets assigned a value earlier than expected. Because
+    /// we're only tracking the one dbg.assign, we can't capture debug PHIs.
+    /// It's unlikely that we're losing out on much coverage by avoiding that
+    /// extra work.
----------------
IMO: too much detail about the algorithm for just a field, better to have that detail in a function, and just the first line documenting the field.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:311
+      // If the Status is Known then we expect there to be an assignment ID.
+      assert(Status != Known || ID != nullptr);
+    }
----------------
IMHO, simpler to read if it's `Status == NoneOrPHI || ID`, YMMV.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:324-325
+  // Machinery to defer inserting dbg.values.
+  using InsertMap = MapVector<Instruction *, MapVector<VariableID, VarLocInfo>>;
+  InsertMap InsertBeforeMap;
+  void emitDbgValue(LocKind Kind, const DbgVariableIntrinsic *Source,
----------------
A map of maps sounds expensive -- do both keys of this really need to be randomly accessed, or could they instead be inserted and sorted at a later date? (I haven't read far down enough to get a grip of how the container is used.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:360
+    /// that we can't deduce the LocKind of a variable by just looking at the
+    /// two maps.
+    LocMap LiveLoc;
----------------
This is another one of those places where I feel the word "dominance frontier" can be inserted to a) make ourselves feel clever, and b) actually disambiguate what's going on. i.e., "NoneOrPhi indicates whether there is a single dominating definition which can be found in StackHomeValue or DebugValue". Or something like that? (Might not be right).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:383-384
+  FunctionVarLocsBuilder *FnVarLocs;
+  DenseMap<const BasicBlock *, BlockInfo> LiveIn;
+  DenseMap<const BasicBlock *, BlockInfo> LiveOut;
+
----------------
These translate to maps of maps again, which risks an expensive reallocation. There isn't necessarily anything to do about this if that's not triggered by the workloads that actaully exist.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:392
+
+  VariableID getVariableID(DebugVariable Var) {
+    return static_cast<VariableID>(FnVarLocs->insertVariable(Var));
----------------
`const &` argument?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:413-414
+  ///
+  /// For the map-type functions, all unmapped keys are assumed to be
+  /// associated with a top value (⊤).
+  ///@{
----------------
This wants explaining why -- it's because the top value represents "don't know", right?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:425-426
+  /// Process the instructions in \p BB updating \p LiveSet along the way. \p
+  /// LiveSet must be initialized with the current live-in locations before
+  /// calling this.
+  void process(BasicBlock &BB, BlockInfo *LiveSet);
----------------
If it's initialized, pass by reference instead?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:434
+  void processDbgInstruction(Instruction &I, BlockInfo *LiveSet);
+  /// Update \p LiveSet after encountering an instruciton with a DIAssignID
+  /// attachment, \p I.
----------------
and below


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:443
+  /// Add an assignment to memory for the variable /p Var.
+  void addMemDef(BlockInfo *LiveSet, VariableID Var, Assignment AV);
+  /// Add an assignment to the variable /p Var.
----------------
As ever, default to passing the BlockInfo by reference, and AV as const reference, just for simplicity and to avoid un-necessary locals? And below.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:474
+
+  // Update the LocKind for all fragments contained within Var.
+  for (VariableID Frag : VarContains[Var])
----------------
Var itself is a fragment, right? Small risk that the reader thinks it's a DILocalVariable? (Our terminology doesn't help).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:492-493
+    LiveSet->DebugValue.insert({Var, Assignment::makeNoneOrPhi()});
+    // Add (Var -> ⊤) to LiveLocs if Var isn't in LiveLocs yet.
+    LiveSet->LiveLoc.insert({Var, LocKind::None});
+  };
----------------
This slightly threw me; the "None" value being set isn't important because everything that calls addMemDef calls setLocKind too, right? If so, best to document this in a comment please.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:564-567
+    // It's possible that getVariableLocationOp(0) is null. Occurs in
+    // llvm/test/DebugInfo/Generic/2010-05-03-OriginDIE.ll Treat it as undef.
+    if (!Val)
+      Val = UndefValue::get(Type::getInt1Ty(Source->getContext()));
----------------
If it's a scenario that llvm will never generate nowadays, and the test is testing something unrelated, might be easier to fix the test rather than make the actual code suffer for the past.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:570
+    // Find a suitable insert point.
+    Instruction *InsertBefore = After->getNextNode();
+    assert(InsertBefore && "Shouldn't be inserting after a terminator");
----------------
Hhrrrrmmmm. Using `getNextNode` makes me twitch on account of it being the source of various debug-affects-codegen problems in the past, or hard-to-unpick debug behaviours. In this context it's certainly the right thing to use though (fry-eyes.jpg)


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:634
+
+void AssignmentTrackingLowering::processUntaggedInstruction(
+    Instruction &I, AssignmentTrackingLowering::BlockInfo *LiveSet) {
----------------
This is all fine; IMO there needs to be a moderately detailed comment about what this is doing at the overall-algorithm level, i.e. "interpret stack stores that are not tagged as an assignment in memory because [blah]". I remember this from past discussions, IMO it should be stated at the outset what this method is aiming to do.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:647
+    // Alloca or non-store-like inst.
+    return (Optional<at::AssignmentInfo>)None;
+  }();
----------------
Is this cast necessary? Might be better to put a trailing type on the lambda to make it un-necessary? Any particular need for a lambda instead of something else?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:678
+    // meaning the memory location should be used. We don't have an assignment
+    // ID though so use Assignment::makeNoneOrPhi() to crate an imaginary one.
+    addMemDef(LiveSet, Var, Assignment::makeNoneOrPhi());
----------------



================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:715
+
+void AssignmentTrackingLowering::processTaggedInstruction(
+    Instruction &I, AssignmentTrackingLowering::BlockInfo *LiveSet) {
----------------
This looks good; I suspect I'd find tests highly valuable in this situation to understand exactly the behaviour that's being created, I understand they're in a later patch though.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:750
+      setLocKind(LiveSet, Var, LocKind::Mem);
+      emitDbgValue(LocKind::Mem, DAI, &I);
+    } else {
----------------
early-continue here will save a level of indentation for the rest of the loop.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:824
+          dbgs() << "Val, Stack matches Debug program but address is undef\n";);
+      setLocKind(LiveSet, Var, LocKind::Val);
+    } else {
----------------
It looks slightly out of place that the LocKind is recorded as a Val, but then emitDbgValue is used with LocKind::Mem -- I suppose the problem is that I don't know what's meant by elision of the original store. What are the effects of this scenario {LocKind==Val,emitDbgValue(Mem...)}?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:837
+  }
+}
+void AssignmentTrackingLowering::processDbgValue(DbgValueInst &DVI,
----------------
\n


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:846-848
+  // We have no ID to create an Assignment with so we must mark this
+  // assignment as NoneOrPhi. Note that the dbg.value still exists, we just
+  // cannot determine the assignment responsible for setting this value.
----------------
Would I be right in thinking that these dbg.values turn up because of promotion, and thus the dbg.values are "normally" attached to PHIs? If so, is it possible to assert that fact. If not maybe the comment could be reworded to give the impression that this isn't an omission in the design, it's the natural effect of the design. (I think the term "we must..." makes me feel like it's a defect).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136320/new/

https://reviews.llvm.org/D136320



More information about the llvm-commits mailing list