[PATCH] D136320: [Assignment Tracking Analysis][1/*] Add analysis pass core

Orlando Cazalet-Hyams via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 22 09:25:33 PST 2022


Orlando added a comment.

> Ah, yes, sorry, my spam bot is very indiscriminate..

Great, thanks! & no worries.

---

Thanks @jmorse for the detailed review - that should be all the inline comments addressed (or responded-to) but please do ping if I missed any.

> NB: the locs_begin and single_locs_begin ranges could also have a locs() and single_locs() methods that return iterator_range objects made with make_range, that allows you to use range-based for loops. On the other hand, there's only two places in this patch that they're used, so it might not be a big deal.

Initially I opted to avoid adding more code, happy to change that if you prefer.

> Using the term "location" everywhere initially made me twitch as in the IR we have Values... but then again the whole point of assignment tracking is that the location can sometimes be the stack, where the Value isn't know.



> Why does NotAlwaysStackHomed ignore fragments -- can't SROA cut up variables into parts, some of which could be partially promoted and mutated, others of which aren't? Perhaps it's an approximation where any partially promoted variable gets explicitly tracked for all fields (this seems fine).

Yeah it is an approximation - we only consider variables that are wholly located in one alloca as ones to put in the MachineFunction stacked-homed-variable side table. SROAd variables that you described above will fall back to using location lists (which may include stack locations).



================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:1-2
+#ifndef LLVM_LIB_CODEGEN_ASSIGNMENTTRACKINGANALYSIS_H
+#define LLVM_LIB_CODEGEN_ASSIGNMENTTRACKINGANALYSIS_H
+
----------------
jmorse wrote:
> meganit, s/LIB/INCLUDE/, or something
Changed to copy other headers in include/CodeGen.


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:8-11
+namespace llvm { class Function; }
+namespace llvm { class Instruction; }
+namespace llvm { class Value; }
+namespace llvm { class raw_ostream; }
----------------
jmorse wrote:
> Is this convention, to close and open the namespace each time?
Ah, nope, that was include-what-you-use. Fixed.


================
Comment at: llvm/include/llvm/CodeGen/AssignmentTrackingAnalysis.h:38-41
+  // Maps a range of VarLocRecords to the position before a particular
+  // instruction.
+  DenseMap<const Instruction *, std::pair<unsigned, unsigned>>
+      VarLocsBeforeInst;
----------------
jmorse wrote:
> Shouldn't the comment wording be inverted -- it maps a position to a range, no?
It made sense to me but happy to change it.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:1
+#include "llvm/CodeGen/AssignmentTrackingAnalysis.h"
+#include "llvm/ADT/DenseMapInfo.h"
----------------
jmorse wrote:
> IIRC clang-format has an opinion on the order of includes
clang-format was happy with (/reinstated) this order (perhaps xxx.h is allowed at the top of xxx.cpp).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:90-94
+  /// Replace the defs that come just before /p Before with /p Wedge.
+  void setWedge(const Instruction *Before,
+                const SmallVector<VarLocInfo> &Wedge) {
+    VarLocsBeforeInst[Before] = Wedge;
+  }
----------------
jmorse wrote:
> Given the usage of this method, consider an rvalue / std::moveable method too, that might save some un-necessary memory allocations.
I think the next patches in the stack have similar usage so I've just removed the copying version.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:204
+                                                Value *Start,
+                                                DIExpression **Expression) {
+  APInt OffsetInBytes(DL.getTypeSizeInBits(Start->getType()), false);
----------------
jmorse wrote:
> Passing in a reference to a pointer might be slightly neater -- depends whether the explicit dereferencing of Expression is designed to signal to the reader that there's a side-effect. An alternative would be returning a {Value,Expression} pair. This is slightly stylistic, up to you.
> signal to the reader that there's a side-effect

That was the intention, but I prefer yout pair idea. Done.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:237-239
+  /// join(Mem, Val)  = None
+  /// join(None, Mem) = Mem
+  /// join(None, Val) = Val
----------------
jmorse wrote:
> This doesn't seem to be how `joinKind` implements it; additionally if we can switch from {Mem,Val} to None, and then {None,Mem} to Mem, doesn't that mean there can be a "Val" in a predecessor block but this isn't reflected in the lattice value.
This comment is wrong / out of date, sorry. Fixed.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:267-272
+    /// The dbg.assign that marks this dbg-def. Mem-defs don't use this field.
+    /// This is used to lookup the value + expression in the debug program if
+    /// the stack slot gets assigned a value earlier than expected. Because
+    /// we're only tracking the one dbg.assign, we can't capture debug PHIs.
+    /// It's unlikely that we're losing out on much coverage by avoiding that
+    /// extra work.
----------------
jmorse wrote:
> IMO: too much detail about the algorithm for just a field, better to have that detail in a function, and just the first line documenting the field.
Moved the comment body into `joinAssignment`.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:324-325
+  // Machinery to defer inserting dbg.values.
+  using InsertMap = MapVector<Instruction *, MapVector<VariableID, VarLocInfo>>;
+  InsertMap InsertBeforeMap;
+  void emitDbgValue(LocKind Kind, const DbgVariableIntrinsic *Source,
----------------
jmorse wrote:
> A map of maps sounds expensive -- do both keys of this really need to be randomly accessed, or could they instead be inserted and sorted at a later date? (I haven't read far down enough to get a grip of how the container is used.
The maps are `{ instruction that comes after : { variable frag instance : location definition }`. The location definitions are added to the map each time they're calculated. Huh - I think this needs to clear the sub-map each before each instruction is visited. And after doing that, I can see that we don't need the sub-map at all. Replaced and added `resetInsertionPoint`. Good spot, thanks.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:360
+    /// that we can't deduce the LocKind of a variable by just looking at the
+    /// two maps.
+    LocMap LiveLoc;
----------------
jmorse wrote:
> This is another one of those places where I feel the word "dominance frontier" can be inserted to a) make ourselves feel clever, and b) actually disambiguate what's going on. i.e., "NoneOrPhi indicates whether there is a single dominating definition which can be found in StackHomeValue or DebugValue". Or something like that? (Might not be right).
That code comment is indeed confusing at best. I've had a go at rewording it, let me know what you think. Happy to make further changes.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:383-384
+  FunctionVarLocsBuilder *FnVarLocs;
+  DenseMap<const BasicBlock *, BlockInfo> LiveIn;
+  DenseMap<const BasicBlock *, BlockInfo> LiveOut;
+
----------------
jmorse wrote:
> These translate to maps of maps again, which risks an expensive reallocation. There isn't necessarily anything to do about this if that's not triggered by the workloads that actaully exist.
I'm not sure that these ones can be avoided. These are at least initialized to a size equal to the number of basic blocks, down in `AssignmentTrackingLowering::run`.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:413-414
+  ///
+  /// For the map-type functions, all unmapped keys are assumed to be
+  /// associated with a top value (⊤).
+  ///@{
----------------
jmorse wrote:
> This wants explaining why -- it's because the top value represents "don't know", right?
I've elaborated a bit. Is this ok or does it need more?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:425-426
+  /// Process the instructions in \p BB updating \p LiveSet along the way. \p
+  /// LiveSet must be initialized with the current live-in locations before
+  /// calling this.
+  void process(BasicBlock &BB, BlockInfo *LiveSet);
----------------
jmorse wrote:
> If it's initialized, pass by reference instead?
I chose to use a pointer to hint at call sites that it is modified. If that is not compelling (or the hint is not doing its job) I can change it - just thought I'd offer up my rationale first in case it made a difference.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:443
+  /// Add an assignment to memory for the variable /p Var.
+  void addMemDef(BlockInfo *LiveSet, VariableID Var, Assignment AV);
+  /// Add an assignment to the variable /p Var.
----------------
jmorse wrote:
> As ever, default to passing the BlockInfo by reference, and AV as const reference, just for simplicity and to avoid un-necessary locals? And below.
(`process` reply applies - made the `const&` change though)


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:474
+
+  // Update the LocKind for all fragments contained within Var.
+  for (VariableID Frag : VarContains[Var])
----------------
jmorse wrote:
> Var itself is a fragment, right? Small risk that the reader thinks it's a DILocalVariable? (Our terminology doesn't help).
Yes (if by "is a fragment" you mean is an ID that identifies a `{DILocalVariable, FragmentInfo, InlinedAt}` tuple aka a `DebugVariable`, which indeed includes fragment information). I don't think I've deviated massively from common naming practices but I agree that our terminology isn't necessarily helpful here. `DebugVariable` should probably be called something like `VariableInstanceFragment` (but that is outside the scope of this patch of course). Do you have any suggestions for `Var` / `VariableID`?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:492-493
+    LiveSet->DebugValue.insert({Var, Assignment::makeNoneOrPhi()});
+    // Add (Var -> ⊤) to LiveLocs if Var isn't in LiveLocs yet.
+    LiveSet->LiveLoc.insert({Var, LocKind::None});
+  };
----------------
jmorse wrote:
> This slightly threw me; the "None" value being set isn't important because everything that calls addMemDef calls setLocKind too, right? If so, best to document this in a comment please.
Yeah that's right, in this instance we're just adding `Var` to `LiveSet` if it isn't there already (using `insert`). I'll update the comment


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:564-567
+    // It's possible that getVariableLocationOp(0) is null. Occurs in
+    // llvm/test/DebugInfo/Generic/2010-05-03-OriginDIE.ll Treat it as undef.
+    if (!Val)
+      Val = UndefValue::get(Type::getInt1Ty(Source->getContext()));
----------------
jmorse wrote:
> If it's a scenario that llvm will never generate nowadays, and the test is testing something unrelated, might be easier to fix the test rather than make the actual code suffer for the past.
It still happens - it's what is returned in by `getVariableLocationOp` when the wrapped `Value` is replaced with empty metadata.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:634
+
+void AssignmentTrackingLowering::processUntaggedInstruction(
+    Instruction &I, AssignmentTrackingLowering::BlockInfo *LiveSet) {
----------------
jmorse wrote:
> This is all fine; IMO there needs to be a moderately detailed comment about what this is doing at the overall-algorithm level, i.e. "interpret stack stores that are not tagged as an assignment in memory because [blah]". I remember this from past discussions, IMO it should be stated at the outset what this method is aiming to do.
I've put that here but is there somewhere better for it should live / be copied to, do you think? (& what do you think of the comment?).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:647
+    // Alloca or non-store-like inst.
+    return (Optional<at::AssignmentInfo>)None;
+  }();
----------------
jmorse wrote:
> Is this cast necessary? Might be better to put a trailing type on the lambda to make it un-necessary? Any particular need for a lambda instead of something else?
Ah, `AssignmentInfo` has no default ctor so we can't do this:
```
AssignmentInfo Info;
if ...
    Info = ....
else if ...
     Info = ...
```

Let me know if you'd prefer that though as I can just add one.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:824
+          dbgs() << "Val, Stack matches Debug program but address is undef\n";);
+      setLocKind(LiveSet, Var, LocKind::Val);
+    } else {
----------------
jmorse wrote:
> It looks slightly out of place that the LocKind is recorded as a Val, but then emitDbgValue is used with LocKind::Mem -- I suppose the problem is that I don't know what's meant by elision of the original store. What are the effects of this scenario {LocKind==Val,emitDbgValue(Mem...)}?
> It looks slightly out of place that the LocKind is recorded as a Val, but then emitDbgValue is used with LocKind::Mem

It does look slightly out of place. This could be a bug - I'll get back to you on this one.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:846-848
+  // We have no ID to create an Assignment with so we must mark this
+  // assignment as NoneOrPhi. Note that the dbg.value still exists, we just
+  // cannot determine the assignment responsible for setting this value.
----------------
jmorse wrote:
> Would I be right in thinking that these dbg.values turn up because of promotion, and thus the dbg.values are "normally" attached to PHIs? If so, is it possible to assert that fact. If not maybe the comment could be reworded to give the impression that this isn't an omission in the design, it's the natural effect of the design. (I think the term "we must..." makes me feel like it's a defect).
They should mostly be attached to PHIs, but can sneak in in other ways too. I'll reword it.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:915
+
+AssignmentTrackingLowering::LocMap
+AssignmentTrackingLowering::joinLocMap(const LocMap &A, const LocMap &B) {
----------------
jmorse wrote:
> Note to self, returning a DenseMap by value is probably fine if it gets NRVO'd.
That's what I'm counting on, but I'm happy to change it if it's unconventional / looks suspicious.


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:932
+  // remaining elements to into SymmetricDifference.
+  for (auto Pair : A) {
+    VariableID Var = Pair.first;
----------------
jmorse wrote:
> `auto &` or you might get storage.
Good point - and changed to use structured binding for the pair while I'm here (and in`joinAssignmentMap`).


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:975
+    return Assignment::makeNoneOrPhi();
+  if (A.Status == Assignment::NoneOrPhi)
+    return Assignment::makeNoneOrPhi();
----------------
jmorse wrote:
> Coming immediately after the leading comment, shouldn't this also check if `A.Status == NoneOrPhi`?
That is checked right here above your review comment. I used separate `if`s for readability, but happy to combine them if this was counterproductive?


================
Comment at: llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:1087
+    else
+      BBLiveIn = joinBlockInfo(BBLiveIn, PredLiveOut->second);
+    FirstJoin = false;
----------------
jmorse wrote:
> /me squints -- I want to say that BBLiveIn being the argument and the return value will lead to some kind of aliasing weirdness or performance slowdown (in the form of un-necessary DenseMap copies). I can't actually put my finger on why that would be forced though. Do you have any feeling for it?
Hmmm, I don't remember this sticking out in the profiles but it was a while ago that I looked at the performance. I have wrapped the argument in a `std::move`, which could help.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136320/new/

https://reviews.llvm.org/D136320



More information about the llvm-commits mailing list