[Lldb-commits] [PATCH] D84402: [lldb/DWARF] Add more line table validation

Fri Jul 24 22:12:55 PDT 2020

MaskRay added a comment.

For ELF, there are non-pic cases (i.e. -no-pie) and pic cases (-pie or -shared). I think it is sufficient just testing -pie (image base is zero). If filtering for -pie works, filter for -no-pie or -shared should work as well.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:1044
   for (const llvm::DWARFDebugLine::Sequence &seq : line_table->Sequences) {
+    if (!list.ContainsFileAddressRange(seq.LowPC, seq.HighPC - seq.LowPC))
+      continue;
----------------
clayborg wrote:
> labath wrote:
> > clayborg wrote:
> > > dblaikie wrote:
> > > > Could you specifically look for/propagate tombstones here, to reduce the risk of large functions overlapping with the valid address range, even when they're tombstoned to zero? (because zero+large function size could still end up overlapping with valid code)
> > > > 
> > > > To do that well, I guess it'd have to be implemented at a lower-layer, inside the line table state machine - essentially dropping all lines derived from a "set address" operation that specifies a tombstone.
> > > Just checking if the section lists contains an address doesn't help weed out addresses that were tombstoned to zero since our PT_LOAD[0] will almost always contain zero for shared libraries. It might be nice to make a list of addresses that come from sections with read + execute permissions and store that in SymbolFileDWARF one time at startup. Then these searches will be faster as we are looking in less ranges, and most likely will not contain address zero. This code will catch the -1 and -2 tombstones, but most linkers I have run into use zero and the tombstone. 
> > > 
> > > If our algorithm only checks sections with no subsections and then makes a list of file addresses for and section ranges for those, we should have a great list. The entire PT_LOAD[0] will usually be mapped read + execute, so we can't just check top level sections for ELF. Mach-o also has this issue __TEXT in mac is also mapped read + execute and usually contains zero for shared libraries, but since the sections must come after the mach-o header, the sections within the __TEXT segment have correct permissions and would work, just like they would for ELF.
> > You're right -- this would not handle shared libraries with base zero.
> > 
> > I am slightly uneasy about requiring executable permissions for all line tables. While it does not seem terribly useful to have line tables for non-executable code, if someone does have a line table for it for whatever reason (maybe he wants to make it executable at runtime?) it would be a shame not to display it. Also the choice of using section rather than segment permissions feels slightly arbitrary (although I could make a case for it), as it's the segment permissions which will actually define the runtime memory permissions.
> > 
> > Since this is really about filtering out (near) zero addresses, how about we make that explicit? Find the lowest executable (section) address and reject anything below that? Additionally, I'd also reject all addresses which are completely outside of the module range, as those not going to get used for anything, and they are generating bogus line-table dumps.
> > 
> > What do you think?
> > 
> > David: The -1 tombstones are already sort of handled in llvm (and in lldb since D83957). They are "handled" in the sense that the all sequences with and end PC lower than the start PC are rejected (and line sequences starting with (unsigned)-1 will definitely wrap). This is trying to do something about the zero tombstones.
> > Since this is really about filtering out (near) zero addresses, how about we make that explicit? Find the lowest executable (section) address and reject anything below that? Additionally, I'd also reject all addresses which are completely outside of the module range, as those not going to get used for anything, and they are generating bogus line-table dumps.
> > 
> > What do you think?
> 
> That will work for me. My main goal is to get anything that should have been dead stripped out from appearing in results for line lookups or function lookups. The quicker we can short circuit these cases the better for performance. We can also use this when we try to lookup functions and don't return any matches for functions whose start address falls below this value.
> 
> 
There is a concern about ContainsFileAddressRange's performance.

FindSectionContainingFileAddress iterates all sections and can be slow when the number of sections is large.

>  if someone does have a line table for it for whatever reason (maybe he wants to make it executable at runtime?) it would be a shame not to display it.

+1

Instead of a [lowpc,highpc) range check, I wonder whether we should just filter out lowpc. We don't seem to benefit much from a range check. A data point is that gdb filters with the CU address range and only checks lowpc (https://sourceware.org/git/?p=binutils-gdb.git;a=blobdiff;f=gdb/dwarf2/read.c;h=405b5fb3348c94aad10e3bb40f393137ddb0759c;hp=4622d14a05c6819b482c9c97c14a14755876aa72;hb=a8caed5d7faa639a1e6769eba551d15d8ddd9510;hpb=33d1369f183f1c276e3f0f52b5573fb2f5843b1c )
If the CU's lowpc is 0, it will allow a line sequence at address 0, otherwise disallow it. There is some bare metal usage with zero address.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84402/new/

https://reviews.llvm.org/D84402