[PATCH] D87656: [llvm-dwarfdump] --show-sources option to show all sources

Mon Oct 19 02:25:55 PDT 2020

jhenderson added inline comments.

================
Comment at: llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:455
+  for (const auto &CU : DICtx.compile_units()) {
+    const auto *LT = DICtx.getLineTableForUnit(CU.get());
+    for (uint32_t I = 1; I <= LT->Prologue.FileNames.size(); ++I) {
----------------
Higuoxing wrote:
> phosek wrote:
> > Higuoxing wrote:
> > > jhenderson wrote:
> > > > phosek wrote:
> > > > > probinson wrote:
> > > > > > jhenderson wrote:
> > > > > > > Higuoxing wrote:
> > > > > > > > jhenderson wrote:
> > > > > > > > > I think we need testing for multiple CUs. The current test only checks a single one. This might go against the yaml2obj usage suggested above though (@Higuoxing, is there support for multiple tables in .debug_line yet?).
> > > > > > > > > is there support for multiple tables in .debug_line yet?
> > > > > > > > 
> > > > > > > > Yes, `yaml2obj` supports emitting multiple line tables. I'm able to help craft these test cases.
> > > > > > > > 
> > > > > > > > ----------------------------
> > > > > > > > 
> > > > > > > > It looks that `LT` isn't checked. If a compilation unit doesn't have an associated line table, `llvm-dwarfdump --show-sources` will crash.
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > const auto *LT = DICtx.getLineTableForUnit(CU.get()); // Can be a null pointer.
> > > > > > > > for (uint32_t I = 1; I <= LT->Prologue.FileNames.size(); ++I) {
> > > > > > > >                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > >   ...
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > We can reproduce it using the following test case.
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > $ yaml2obj %s | llvm-dwarfdump --show-sources -
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > --- !ELF
> > > > > > > > FileHeader:
> > > > > > > >   Class:   ELFCLASS64
> > > > > > > >   Data:    ELFDATA2LSB
> > > > > > > >   Type:    ET_EXEC
> > > > > > > >   Machine: EM_X86_64
> > > > > > > > DWARF:
> > > > > > > >   debug_info:
> > > > > > > >     - Version: 4
> > > > > > > > ```
> > > > > > > Nice catch! In fact, do we really need to use the CUs at all for this? Could we not just iterate over all line tables? That would allow this to work when there is no .debug_info data too (which the DWARF spec implies is permitted).
> > > > > > I don't know how carefully the spec says it is permitted, but certainly I've heard committee members talk about stripping everything but .debug_line (and with v5, .debug_line_str) from an object file.
> > > > > > 
> > > > > > In DWARF v4, technically the primary source file & compilation dir could be omitted from the line table, although in practice I think that never happens. In v5 the primary source file & dir are supposed to be explicit in the line table, so I think ignoring .debug_info ought to be okay in general.
> > > > > @jhenderson is there an API to iterate over all line tables? I searched through LLVM but haven't found anything.
> > > > I thought there was, but having taking a look, I don't know of an interface that allows you to simply iterate over all line tables without parsing all of them.
> > > > 
> > > > Certainly you can iterate over all the line tables by parsing them in order by using the `SectionParser` class of DWARFDebugLine.h. I'm not sure if that's exactly the right way forward here though, since I suspect by this point the DWARFContext may have already done (some of) the parsing (I haven't dug into the code to confirm either way).
> > > > 
> > > > There's also `getOrParseLineTable`, which takes an offset, `Context` and `DWARFDataExtractor` and gives you back the line table at that offset, which may or may not have already been parsed (it will return the cached version if it has been). You'd need to then use the length field within the line table to identify the next offset to use. Maybe a new function could sit on top of that to give you the ability to iterate over them, and only parse the ones that haven't been already? Alternatively, you could modify the `SectionParser` class to cache the parsed line tables so that it doesn't matter if you try to reparse them later.
> > > I think you are able to iterate over line tables via the following code snippets.
> > > 
> > > ```
> > >   DWARFDataExtractor LineData(DICtx.getDWARFObj(),
> > >                               DICtx.getDWARFObj().getLineSection(),
> > >                               DICtx.isLittleEndian(), 0);
> > >   DWARFDebugLine::SectionParser Parser(LineData, DICtx,
> > >                                        DICtx.normal_units());
> > >   while (!Parser.done()) {
> > >     DWARFDebugLine::LineTable LT = Parser.parseNext(
> > >       RecoverableErrorHandler,
> > >       UnrecoverableErrorHandler);
> > >     // Dump file names with paths.
> > >     ...
> > >   }
> > > ```
> > Thanks, I tried that which made me realize that without CU, we don't have the `comp_dir`, is that something we care about?
> I have no idea. Perhaps @jhenderson and @dblaikie can help us?
Ah, that's a good point. I think having the compilation directory is useful, but perhaps not a deal breaker. In other words, if it would be clean to do, I'd think the behaviour could be:

1) If .debug_line only is present, print just the names assuming some reasonable assumption about the compilation dir (e.g. the working directory/empty string/"." etc).
2) If both are present, use the one specified in the CU.

I'm very much open to other thoughts though. I feel like this option could be useful without .debug_info being present, but I don't know how much of a common case that actually is.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87656/new/

https://reviews.llvm.org/D87656