[llvm-dev] yaml2obj support for COFF debug directories

Thu Mar 5 00:10:58 PST 2020

On Wed, 4 Mar 2020, Reid Kleckner via llvm-dev wrote:

> I think it seems like an oversight, and improvements in this area would be
> welcome.
> I think most of the effort in COFF <-> YAML translation has been for
> representing object files, and debug directories are a feature of fully
> linked PE images. With that in mind, it's not too surprising that the
> feature is missing.

In general, it should be possible to roundtrip linked PE images via yaml 
just fine - their contents would just be part of the opaque section 
contents blob. Hard to inspect and tweak by hand, but so are lots of other 
things that are referenced via data directories (like base relocation 
tables) and stored in the plain section contents.

But debug directories have got one property which would break this - they 
have a PointerToRawData field, that should contain the raw byte offset 
within the linked PE image, to their content data. Since a roundtrip via 
yaml rewrites the file structure (and the output layout of yaml2obj isn't 
supposed to be fixed), the value of this field would have to be updated 
to match the new layout. As far as I know, yaml2obj doesn't do this at 
the moment.
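
To make the required fix-up concrete, here is a rough Python sketch. It is 
not how yaml2obj or llvm-objcopy do it; the function names and the 
(virtual_address, size_of_raw_data, pointer_to_raw_data) section tuples 
are made up for illustration. It assumes the whole output image is 
available as a bytearray, and it leaves entries with a zero 
AddressOfRawData untouched, since there is no RVA to map from.

```python
import struct

# IMAGE_DEBUG_DIRECTORY: Characteristics, TimeDateStamp, MajorVersion,
# MinorVersion, Type, SizeOfData, AddressOfRawData, PointerToRawData.
DEBUG_DIR = struct.Struct("<IIHHIIII")  # 28 bytes per entry


def rva_to_file_offset(rva, sections):
    """Map an RVA to a file offset in the new layout.

    sections is a hypothetical list of
    (virtual_address, size_of_raw_data, pointer_to_raw_data) tuples.
    """
    for virtual_address, raw_size, raw_pointer in sections:
        if virtual_address <= rva < virtual_address + raw_size:
            return raw_pointer + (rva - virtual_address)
    raise ValueError(f"RVA {rva:#x} is not backed by any section data")


def patch_debug_directory(buf, dir_offset, dir_size, sections):
    """Rewrite PointerToRawData in place for every entry in the directory."""
    count = dir_size // DEBUG_DIR.size
    for i in range(count):
        offset = dir_offset + i * DEBUG_DIR.size
        fields = list(DEBUG_DIR.unpack_from(buf, offset))
        address_of_raw_data = fields[6]
        if address_of_raw_data == 0:
            continue  # assumption: unmapped payload, nothing to derive from
        fields[7] = rva_to_file_offset(address_of_raw_data, sections)
        DEBUG_DIR.pack_into(buf, offset, *fields)
```

The point is only that PointerToRawData is derived data: once the 
sections have their final file offsets, it can be recomputed from 
AddressOfRawData.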

llvm-objcopy's COFF backend does try to do it 
(COFFWriter::patchDebugDirectory in 
llvm/tools/llvm-objcopy/COFF/Writer.cpp), but now that I reread that 
code, I'm pretty sure I made some mistakes in it. (I incorrectly assumed 
that the raw data is interleaved after each debug directory header.) With 
your lld patch for the CET compat flag, it should be easy to generate a 
testcase for that, with more than one debug directory.
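
As a small illustration of why a single-entry test doesn't catch this, 
here is a Python sketch comparing the two layouts. The function names are 
hypothetical, and the "interleaved" variant is only my reading of the 
mistaken assumption described above, not a copy of the actual code.

```python
DEBUG_DIR_ENTRY_SIZE = 28  # sizeof(IMAGE_DEBUG_DIRECTORY)


def contiguous_entry_offsets(dir_offset, dir_size):
    """Header offsets when the directory is a plain array of entries."""
    count = dir_size // DEBUG_DIR_ENTRY_SIZE
    return [dir_offset + i * DEBUG_DIR_ENTRY_SIZE for i in range(count)]


def interleaved_entry_offsets(dir_offset, payload_sizes):
    """Header offsets under the mistaken layout, where each header is
    assumed to be followed directly by its own payload."""
    offsets = []
    offset = dir_offset
    for size in payload_sizes:
        offsets.append(offset)
        offset += DEBUG_DIR_ENTRY_SIZE + size
    return offsets
```

With one entry both functions return the same single offset; with two or 
more, the second header lands at a different place, which is why a 
testcase needs more than one debug directory.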

One general design question regarding this in obj2yaml is: when the 
debug directories are synthesized, should they be appended onto one of the 
existing sections (with normal hex dumped contents) or created as an 
entirely new section? Synthesizing them separately works fine for cases 
where a file is generated entirely from scratch with yaml, but is tricky 
for obj2yaml, where the original debug directories pretty much need to be 
left in place. In that case, each time a PE image is roundtripped via 
yaml, it would generate yet another set of debug directories, orphaning 
the old ones.

Finally, when reading the spec, it also seems like the payload of a debug 
directory doesn't even need to be in the mappable parts of sections, but 
could be in unmapped areas of the PE image file (by having 
AddressOfRawData set to zero, so it can only be found via 
PointerToRawData). This doesn't seem like something that e.g. 
llvm-readobj's --coff-debug-directory currently supports though (and 
llvm-objcopy expects the payload to be moved along as part of sections' 
contents).
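
For what it's worth, telling the two cases apart is simple once the 
section table is known. A rough Python sketch follows; the function name 
and the (virtual_address, size_of_raw_data, pointer_to_raw_data) section 
tuples are made up for illustration, and this is not what llvm-readobj or 
llvm-objcopy currently do.

```python
def payload_section_index(pointer_to_raw_data, size_of_data, sections):
    """Return the index of the section whose raw data fully contains the
    payload, or None if the payload lives outside all section data.

    sections is a hypothetical list of
    (virtual_address, size_of_raw_data, pointer_to_raw_data) tuples.
    """
    payload_end = pointer_to_raw_data + size_of_data
    for index, (_virtual_address, raw_size, raw_pointer) in enumerate(sections):
        if raw_pointer <= pointer_to_raw_data and payload_end <= raw_pointer + raw_size:
            return index
    return None
```

A tool that rewrites the file layout would have to copy any payload for 
which this returns None separately, since it won't be carried along with 
any section's contents.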

I'll make a note to try to fix llvm-objcopy's assumptions about the 
location of the payload sometime in the future.

// Martin