[Lldb-commits] [PATCH] D119963: [LLDB] Dump valid ranges of variables

Thu Feb 24 05:33:48 PST 2022

labath added inline comments.

================
Comment at: lldb/source/Core/Address.cpp:739
-              s->PutCString(", location = ");
-              var_sp->DumpLocationForAddress(s, *this);
-              s->PutCString(", decl = ");
----------------
This place was the only caller of Variable::DumpLocationForAddress. What if we, instead of duplicating its logic here, just change the function to do what we want?
The interface could be as simple as `Variable::DumpLocations(Stream&, Address)` where, if one provides an invalid Address, then all locations get dumped, and one can request a specific location by providing a concrete address.
We might even do something similar for the DWARFExpression class, and avoid the filter callbacks completely.
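
To make the suggestion concrete, here is a rough sketch of the "invalid address means dump everything" behaviour. It is only an illustration: `LocationEntry` and the free function `DumpLocations` are hypothetical stand-ins (not the real `Variable`, `Stream` or `Address` classes), and an empty `std::optional` plays the role of an invalid `Address`. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one location list entry; not an lldb type.
struct LocationEntry {
  uint64_t low_pc;  // inclusive start of the range
  uint64_t high_pc; // exclusive end of the range
  std::string expr; // textual form of the expression, e.g. "DW_OP_reg5 RDI"
};

// Dumps every entry when `addr` is empty (the "invalid Address" case),
// otherwise only the entries whose [low_pc, high_pc) range contains *addr.
// Returns the number of entries written.
std::size_t DumpLocations(std::ostream &s,
                          const std::vector<LocationEntry> &entries,
                          std::optional<uint64_t> addr) {
  std::size_t dumped = 0;
  for (const LocationEntry &e : entries) {
    assert(e.low_pc <= e.high_pc && "malformed range");
    if (addr && (*addr < e.low_pc || *addr >= e.high_pc))
      continue; // a specific address was requested and this entry misses it
    if (dumped++)
      s << ", ";
    s << "[0x" << std::hex << e.low_pc << ", 0x" << e.high_pc << std::dec
      << "): " << e.expr;
  }
  return dumped;
}
```

With this shape, `image dump symfile` would pass an empty address to get the full list, while `image lookup -va` would pass the looked-up address and get only the matching entry.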

================
Comment at: lldb/source/Expression/DWARFExpression.cpp:2751-2753
       addr_t slide = load_function_start - m_loclist_addresses->func_file_addr;
-      loc->Range->LowPC += slide;
-      loc->Range->HighPC += slide;
+      loc.Range->LowPC += slide;
+      loc.Range->HighPC += slide;
----------------
Could the sliding happen inside `GetLocationExpressions`, so that we don't have to repeat it in each callback?
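
Roughly what I have in mind, as a sketch only: `RangeEntry` and `ForEachSlidEntry` are made-up names standing in for the location list entry and for `GetLocationExpressions`, and the real function signature may need to look quite different. The point is just that the slide is computed and applied in one place, so callbacks only ever see load addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the address range of a location list entry.
struct RangeEntry {
  uint64_t low_pc;
  uint64_t high_pc;
};

// Hypothetical analogue of GetLocationExpressions: the slide is applied once
// here, to a copy of each entry, before the entry is handed to the callback.
void ForEachSlidEntry(const std::vector<RangeEntry> &file_entries,
                      uint64_t func_file_addr, uint64_t load_function_start,
                      const std::function<void(const RangeEntry &)> &callback) {
  // Unsigned wraparound makes this work for "negative" slides as well.
  const uint64_t slide = load_function_start - func_file_addr;
  for (RangeEntry e : file_entries) { // by value: the input stays untouched
    assert(e.low_pc <= e.high_pc && "malformed range");
    e.low_pc += slide;
    e.high_pc += slide;
    callback(e);
  }
}
```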

================
Comment at: lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s:28-29
 # CHECK:     Variable{{.*}}, name = "x0", {{.*}}, scope = parameter, location =
 # CHECK-NEXT:  [0x0000000000000000, 0x0000000000000001): DW_OP_reg5 RDI
 # CHECK-NEXT:  [0x0000000000000001, 0x0000000000000006): DW_OP_reg0 RAX
 # CHECK:     Variable{{.*}}, name = "x1", {{.*}}, scope = parameter
----------------
zequanwu wrote:
> labath wrote:
> > zequanwu wrote:
> > > labath wrote:
> > > > zequanwu wrote:
> > > > > labath wrote:
> > > > > > zequanwu wrote:
> > > > > > > labath wrote:
> > > > > > > > zequanwu wrote:
> > > > > > > > > `image dump symfile` already prints valid ranges for variables along with where the value is at each range.
> > > > > > > > Are you sure it does?
> > > > > > > > 
> > > > > > > > I was under the impression that there are two distinct range concepts being combined here. One is the range list member of the Variable object (as given by `GetScopeRange` -- that's the one you're printing now), and the other is the list of ranges hidden in the DWARFExpression object, which come from the debug_loc(lists) section (that's the one we've been printing so far). And that the root cause of the confusion is the very existence of these two concepts.
> > > > > > > > 
> > > > > > > > If I got it wrong, then do let me know, cause it would make things a lot simpler if there is only one validity concept to think about.
> > > > > > > Dwarf plugin is supposed to construct the `m_scope_range` member of a Variable, but it doesn't. `scope_ranges` is empty at https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp#L3468.
> > > > > > > `image dump symfile` dumps the dwarf location list in `m_location` in `Variable`. 
> > > > > > > The dwarf location list has more information than `m_scope_range` as it contains info about where the value is during each range. (e.g. which register the variable lives in). 
> > > > > > > 
> > > > > > > So, I think we need to use similar logic to construct `m_scope_range` when creating `Variable` in dwarf plugin like this https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp#L145.
> > > > > > Ok, I see where you're coming from. You're essentially saying that the fact that the dwarf plugin does not fill this out is a bug.
> > > > > > 
> > > > > > I don't think that's the case. My interpretation was (and [[ https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/Variable.cpp#L313 | this comment]] confirms it) that an empty range here means the entire enclosing block. (Also, DWARF was for a long time the only symbol file plugin, so what it does is kinda "correct by definition").
> > > > > > 
> > > > > > I don't think we want to change that interpretation, as forcing a copy of the range in the location list would be wasteful (it would be different if this was an interface that one could query, and that the dwarf plugin could implement by consulting the location list). However, since the dwarf class does not actually make use of this functionality (it was [[ https://reviews.llvm.org/D17449 | added ]] to support DW_AT_start_scope, then broken at some point, and eventually [[ https://reviews.llvm.org/D62302 | removed ]]), we do have some freedom in defining the interactions of the two fields (if you still want to pursue this, that is).
> > > > > > 
> > > > > > So how about this: if the user passes the extra flag, then we print both the range field (if it exists), and the *full* location list (in that order, ideally). That way the output will be either `range = [a, b), [c, d), location = DW_OP_reg47` or `location = [a,b) -> DW_OP_reg4, [c,d) -> DW_OP_reg7`. If the dwarf plugin starts using the range field again then the output will contain both fields, which will be slightly confusing, but at least not misleading (and we can also change the format then).
> > > > > Oh, I think I misunderstood `m_scope_range`. It's the range list where the variable is valid regardless of whether its value is accessible or not (valid range). As for `m_location` in `Variable`, it describes the ranges where the value is (value range). They are not the same.
> > > > > 
> > > > > So, the way NativePDB currently creates a local Variable's range is not correct. It only works for unoptimized builds, where the valid range is the same as the value range. It still needs to create dwarf location lists to correctly represent the value range, but as mentioned [[ https://reviews.llvm.org/D119508#3319113 | here ]], we need to choose a generic "variable location provider" interface for that.
> > > > 
> > > > Yes, that was my initial assumption as well, and I think that is the only interpretation in which it makes sense to have two sources of range information for a variable. However, I've done some research since then, and I haven't found any compiler or debugger which would model the program sufficiently precisely to be able to make that distinction.
> > > > 
> > > > There are definite limits as to how far you can go with pdb using these abstractions, but given they (the m_scope_range) exist, I think you could make use of them (as you've done now), if they are sufficient for your current use case. That said, I would definitely encourage you to create a better abstraction for providing the location information for a variable.
> > > I think you meant to replace `DWARFExpression` with a more generic interface which has the same functionality as `DWARFExpression`. That seems like a lot of work, especially for `DWARFExpression::Evaluate`.
> > Well.. not exactly "replace". You know how they say there's no software engineering problem that can't be solved by adding a layer of indirection. So, I thought we could create a new `VariableLocationProvider` (for lack of a better name) interface, and one of the implementations of that interface would be backed by a DWARFExpression class. Theoretically you may not need to touch the DWARFExpression class at all -- just wrap it so that it conforms to the new interface.
> > 
> > This is still pretty hand-wavy, so I don't know how much work it would be, but it does not seem like it should be _that_ hard..
> Thanks, I see what you mean. I thought each symbol file plugin would need to convert its debug format to a universal one, and that the actual evaluation would happen in that one.
It's true that most of our debug info parsing works by converting the debug info into a generic format, but that is something I would want to change. It is really easy to construct the generic format from dwarf debug info (because it was designed that way), but the other symbol file plugins often struggle to fit into this model. They often have the data that lldb needs easily accessible, just not in a format that lldb can accept, and so they have to create copies just to put it into the required format. To some extent, this is also what happened with dwarf, as we started using some of the llvm libraries to parse the debug info.

So overall, I think it would be better to just prescribe an interface through which one can query e.g. the line tables, and leave it up to the plugins to choose a suitable data structure to answer those queries. And I can't think of a better example of why this is needed than dwarf expression evaluation.
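
As a very rough illustration of the query-style interface, applied to variable locations: everything below is hypothetical. `VariableLocationProvider` is the placeholder name from earlier in this thread, and `DescribeLocationAt`, `FakeLocationList`, `LocationListProvider` and `FixedLocationProvider` are invented for the sketch. A real version would have to return something evaluable rather than a string. It assumes C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: consumers ask a question instead of reading a
// prescribed data structure.
class VariableLocationProvider {
public:
  virtual ~VariableLocationProvider() = default;
  // Describes where the value lives at `pc`, or nullopt if it is unavailable.
  virtual std::optional<std::string> DescribeLocationAt(uint64_t pc) const = 0;
};

// Stand-in for a DWARF-style location list; not the real DWARFExpression.
struct FakeLocationList {
  struct Entry {
    uint64_t low_pc, high_pc; // [low_pc, high_pc)
    std::string expr;
  };
  std::vector<Entry> entries;
};

// Wraps the existing data without converting it into another format.
class LocationListProvider final : public VariableLocationProvider {
public:
  explicit LocationListProvider(FakeLocationList list)
      : m_list(std::move(list)) {
    for (const auto &e : m_list.entries)
      assert(e.low_pc <= e.high_pc && "malformed range");
  }
  std::optional<std::string> DescribeLocationAt(uint64_t pc) const override {
    for (const auto &e : m_list.entries)
      if (pc >= e.low_pc && pc < e.high_pc)
        return e.expr;
    return std::nullopt;
  }

private:
  FakeLocationList m_list;
};

// A plugin whose native data is a single location for the whole scope can
// answer the same query without building a location list at all.
class FixedLocationProvider final : public VariableLocationProvider {
public:
  explicit FixedLocationProvider(std::string expr) : m_expr(std::move(expr)) {}
  std::optional<std::string> DescribeLocationAt(uint64_t) const override {
    return m_expr;
  }

private:
  std::string m_expr;
};
```

The caller only ever holds a `VariableLocationProvider`, so each plugin is free to keep its data in whatever shape it already has.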

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119963/new/

https://reviews.llvm.org/D119963