[llvm] [llvm-dwp] turn duplicate dwo id into warning, continue to gen dwp (PR #121193)

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 2 11:17:22 PST 2025


dwblaikie wrote:

> > the same file compiled/linked in twice, and the linker discards one but the dwp actions don't know how to do that (if you're using dwp -e you probably wouldn't hit these cases, since the redundant objects would be discarded by the linker)
> 
> Yes, the normal duplicate .cpp should be discarded when linking, as mentioned. What's interesting here is the real case we hit: multiple .a files contain the same .o files (which only contain declarations and are linked using whole-archive; I know this is a bit inappropriate), causing the linker to link all the duplicate .o files into the main program. Using llvm-dwp -e results in an error in this case because of the duplicate skeletons. So it seems such skeleton duplication is possible...

Ah, that's probably even easier to handle, then. The harder cases are the duplicate DWO IDs that happen when building DWP files without `-e` (instead using a list of the objects passed to the linker) or, worse, when linking DWP files incrementally: the input is another DWP file which might have CUs and TUs, and the CU is a duplicate but the TUs may still be needed. (I guess that shouldn't happen, though - if the DWO ID collision is benign, the TUs should all already be included - so perhaps that's a case we could slice out of the warning/error.)

In the `-e` case, if the CU's DWO ID is a duplicate, ignoring the whole object file seems fine to me. It could be a warning upgradeable to an error, or an error downgradeable to a warning. I don't have strong feelings either way - all the cases I've seen were benign (genuinely identical DWO files, not a hash collision between two different DWO files that happen to hash the same).

Perhaps "same DWO ID with different contents" could be an error by default (downgradeable to a warning) - but that means more expensive checking whenever duplicate DWO IDs are encountered, even if they turn out to be benign.
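
As a rough sketch of that policy (not llvm-dwp's actual code; `DwoIdTable`, `DwoAction`, and the use of a plain string as a stand-in for the unit's section bytes are all hypothetical), the content comparison would only run once an ID has already been seen:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical names throughout; this only illustrates the proposed policy.
enum class DwoAction { Add, SkipBenignDuplicate, ErrorContentMismatch };

struct SeenUnit {
  std::string InputName; // object or DWO file that first defined the ID
  std::string Contents;  // stand-in for the unit's section bytes
};

class DwoIdTable {
  std::unordered_map<uint64_t, SeenUnit> Seen;

public:
  // Classify a unit by its DWO ID. The byte comparison happens only on a
  // duplicate ID, so the common non-duplicate path stays cheap.
  DwoAction classify(uint64_t DwoId, const std::string &InputName,
                     const std::string &Contents) {
    auto [It, Inserted] =
        Seen.try_emplace(DwoId, SeenUnit{InputName, Contents});
    if (Inserted)
      return DwoAction::Add;
    return It->second.Contents == Contents
               ? DwoAction::SkipBenignDuplicate   // warn and skip
               : DwoAction::ErrorContentMismatch; // error by default
  }

  // Name of the input that first defined DwoId, for use in a diagnostic.
  const std::string &firstInput(uint64_t DwoId) const {
    auto It = Seen.find(DwoId);
    assert(It != Seen.end() && "DWO ID was never classified");
    return It->second.InputName;
  }
};
```

A caller would add the unit on `Add`, warn and drop it on `SkipBenignDuplicate`, and fail (unless downgraded) on `ErrorContentMismatch`.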

> > I'm guessing current DWARF consumers don't handle this, and just assume it's a bug/mistake/duplicate skeleton unit if two have the same DWO ID.
> 
> Do we have any thoughts on this? Should we consider skipping the duplicate DWOs and then generating the DWP (better than nothing?), or packaging the duplicate DWOs in the DWP (what binutils dwp does), or continue with the current approach of reporting an error and then exiting? In the current llvm-dwp implementation, skipping the duplicate DWOs and generating a DWP seems to be more convenient.

We should skip them - but we have to do so carefully.

If we're doing incremental DWP building, then one input DWP might have a duplicate CU DWO ID, but we need to verify that all the split type units that reference the sections shared with that CU DWO ID (e.g. .debug_str_offsets.dwo) aren't needed either.

Maybe we've done this correctly already, since this would be needed for type units that share sections - but I might've been lazy/cut corners and relied on the CU already pulling in the relevant sections. One way to test would be to see if llvm-dwp can handle a file with /only/ type units, and correctly pull in the section contents and emit them into the dwp. If that doesn't work, then there's probably work to be done to allow ignoring a CU while not ignoring a TU that references the same contents. Checking for that case and failing would be fine too, or at least marginally better.
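
To make the incremental case concrete, here is a minimal sketch of the check, assuming a hypothetical `InputGroup` that bundles one CU with the TUs sharing its sections. None of these names exist in llvm-dwp; they only illustrate the decision:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical model: one CU plus the type units that share its section
// contributions (e.g. .debug_str_offsets.dwo) in an input DWP.
struct InputGroup {
  uint64_t CuDwoId;
  std::vector<uint64_t> TuSignatures;
};

enum class GroupDecision {
  KeepAll,            // CU is new: emit everything
  DropAll,            // CU is a duplicate and every TU is already present
  KeepSectionsForTUs  // CU is a duplicate, but a new TU still needs the sections
};

GroupDecision decide(const InputGroup &G,
                     const std::unordered_set<uint64_t> &SeenCUs,
                     const std::unordered_set<uint64_t> &SeenTUs) {
  if (SeenCUs.count(G.CuDwoId) == 0)
    return GroupDecision::KeepAll;
  for (uint64_t Sig : G.TuSignatures)
    if (SeenTUs.count(Sig) == 0)
      return GroupDecision::KeepSectionsForTUs; // or diagnose and fail here
  return GroupDecision::DropAll;
}
```

If the collision is benign, `KeepSectionsForTUs` should never come up, so treating it as a hard failure would be a reasonable first implementation.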

https://github.com/llvm/llvm-project/pull/121193

