[llvm-dev] [llvm-pdbutil] : merge not working properly

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 14 13:48:10 PST 2019


Yes, I am the person who wrote this feature (along with most other
PDB-related features).

I thought about it some, and I think it's hard (if not impossible) to
merge PDBs this way.  Here's a short list of problems I came up with:

1) We need to merge the lists of modules.  This first requires detecting
whether two modules are actually the same.  For example, if I run
llvm-pdbutil on a random PDB on my disk, I get this (output trimmed for
brevity):

$ llvm-pdbutil.exe dump -modules bin\not.pdb

                          Modules
============================================================
Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
           Obj: `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
           debug stream: 14, # files: 80, has ec info: false
           pdb file ni: 0 ``, src file ni: 0 ``
Mod 0001 | `lib\Support\CMakeFiles\LLVMSupport.dir\Program.cpp.obj`:
           Obj: `D:\src\llvmbuild\cl\Debug\x64\lib\LLVMSupport.lib`:
           debug stream: 46, # files: 102, has ec info: false
           pdb file ni: 0 ``, src file ni: 0 ``


The easiest thing to do is to consider them the same only if both the
module name and the object name are identical, but depending on your use
case this might not be sufficient.  For example, if the 2 PDBs were built in
different output directories, you might have `D:\foo\not.cpp.obj` in one
PDB and `D:\bar\not.cpp.obj` in the other.  So we would need to find a
solution that makes sense here.
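To make the two options concrete, here is a minimal Python sketch of a strict and a relaxed module identity key.  ModuleInfo, strict_key, relative_key and the idea of passing a per-PDB build root are all hypothetical names for illustration, not llvm-pdbutil APIs; the relaxed rule is just one possible answer to the different-output-directory problem.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class ModuleInfo:
    # Hypothetical stand-in for one entry of `llvm-pdbutil dump -modules`.
    module_name: str
    obj_name: str


def strict_key(mod: ModuleInfo) -> tuple:
    # Same module only if both names match (case-insensitive, since these
    # are Windows paths).
    return (mod.module_name.lower(), mod.obj_name.lower())


def relative_key(mod: ModuleInfo, build_root: str) -> tuple:
    # Assumed relaxed rule: strip this PDB's build root first, so that
    # D:\foo\not.cpp.obj (root D:\foo) and D:\bar\not.cpp.obj (root D:\bar)
    # compare equal.
    root = PureWindowsPath(build_root.lower())

    def rel(name: str) -> str:
        path = PureWindowsPath(name.lower())
        try:
            return str(path.relative_to(root))
        except ValueError:
            # Not under the build root (e.g. already relative): keep as-is.
            return str(path)

    return (rel(mod.module_name), rel(mod.obj_name))
```

With relative_key, `D:\foo\not.cpp.obj` (root `D:\foo`) and `D:\bar\not.cpp.obj` (root `D:\bar`) map to the same key, while strict_key keeps them apart.  The catch is that someone has to supply the right build root for each input PDB.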

2) When two modules are the same, we need to merge their file lists and
debug streams.  In the above example:

$ llvm-pdbutil.exe dump -files -modi=0 bin\not.pdb
                           Files
============================================================
Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
- (MD5: 2FE06AF7EACFB232C6FF033DBFC4412E) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\stdexcept
- (MD5: 0B299654FBC61F03E9533F9296BBD2B3) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\xstring
etc...

$ llvm-pdbutil.exe dump -symbols -modi=0 bin\not.pdb
                          Symbols
============================================================
  Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
       4 | S_OBJNAME [size = 80] sig=0, `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`
      84 | S_COMPILE3 [size = 60]
           machine = intel x86-x64, Ver = Microsoft (R) Optimizing Compiler, language = c++
           frontend = 19.16.27024.1, backend = 19.16.27024.1
           flags = security checks | hot patchable
     144 | S_UNAMESPACE [size = 20] `__vc_attributes`

The file list is easy, but some of the symbol records might be the same in
2 different object files and some might be different, so we need to
de-duplicate them when writing the final PDB.  LLD already does this, so a
lot of the code for this portion is probably already written in LLD; see
PDBLinker::mergeSymbolRecords in lld/COFF/PDB.cpp.  The algorithm is
slightly different when merging 2 PDBs, but that's the general idea.
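A rough Python sketch of the two halves of step 2, assuming each module's file list is a list of (path, MD5) pairs and its symbol stream has already been split into raw records.  merge_file_lists and dedup_symbol_records are hypothetical helpers, and exact byte equality is a deliberate simplification: real records carry type indices that would need remapping after the TPI/IPI merge, and scope records such as S_GPROC32 contain offsets into the module's own stream, so a real implementation would have to rewrite records before it could compare them.

```python
from typing import Iterable


def merge_file_lists(a: list[tuple[str, str]],
                     b: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Ordered union of (path, md5) entries, first occurrence wins. Paths are
    # compared case-insensitively; a path with a different checksum is kept
    # as a separate entry.
    seen: set[tuple[str, str]] = set()
    merged: list[tuple[str, str]] = []
    for path, md5 in [*a, *b]:
        key = (path.lower(), md5.upper())
        if key not in seen:
            seen.add(key)
            merged.append((path, md5))
    return merged


def dedup_symbol_records(records: Iterable[bytes]) -> list[bytes]:
    # Keep the first copy of each byte-identical record, preserving order.
    # (Simplification: assumes records are already comparable as raw bytes.)
    seen: set[bytes] = set()
    unique: list[bytes] = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique
```

A file that appears with the same path but a different checksum is kept twice here; whether that is right depends on the use case.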


3) We need to merge the publics and globals streams, similar to the above.


For #2 and #3 above, this is going to be tricky.  How do you know whether 2
symbols are actually the same symbol?  Even if they have the same name, they
might not be.  For example, suppose both are symbols for a certain function
F, the first PDB is for executable A, and the second PDB is for executable
B.  What if the generated code for F in executable A differs from the
generated code for F in executable B?  Does that end up as two symbols in
the merged PDB, or one?  I'm not sure there's a good way to handle this.
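One way to at least surface the problem is to classify every name that appears in both inputs.  This Python sketch (classify_by_name and Resolution are hypothetical) assumes each publics/globals stream has been reduced to a map from symbol name to raw record bytes, with one record per name; it only finds the conflicts and deliberately leaves the policy (keep both, keep one, or fail) open.

```python
from enum import Enum


class Resolution(Enum):
    UNIQUE = "unique"        # name appears in only one input
    IDENTICAL = "identical"  # same name, same record bytes: emit once
    CONFLICT = "conflict"    # same name, different record: needs a policy


def classify_by_name(a: dict[str, bytes],
                     b: dict[str, bytes]) -> dict[str, Resolution]:
    # Hypothetical: each input maps symbol name -> raw record bytes.
    result: dict[str, Resolution] = {}
    for name in a.keys() | b.keys():
        if name in a and name in b:
            same = a[name] == b[name]
            result[name] = Resolution.IDENTICAL if same else Resolution.CONFLICT
        else:
            result[name] = Resolution.UNIQUE
    return result
```

Note that public records (S_PUB32) include a segment:offset address, so for two different executables even the "same" function will usually compare as a conflict, which is exactly the open question above.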

I guess it might help to know more about your intended use case.  Then we
might be able to make some simplifications to the problem that would allow
us to decide on a reasonable solution.

On Mon, Jan 14, 2019 at 5:39 AM Vivien Millet <vivien.millet at gmail.com> wrote:

> Were you the person in charge of this feature? If not, do you know who
> was (so I can find out the best approach and what is missing to complete
> this feature)?
>
> On Mon, Dec 24, 2018 at 02:01, Zachary Turner <zturner at google.com>
> wrote:
>
>> The merge feature was implemented primarily for testing but was never
>> really productionized, so your guess about the underlying problem
>> sounds correct to me. We could probably hide the subcommand so users don’t
>> accidentally use it, or, if someone wants to properly implement the missing
>> features, that would be even better.
>> On Sat, Dec 22, 2018 at 10:48 AM Vivien Millet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> When I try to merge 2 PDBs that each have their own DBI stream, I
>>> end up with a PDB that has an inconsistent number of streams and no DBI
>>> stream (or at least none at the fixed index 3), which produces a
>>> corruption error when dumping with -l.
>>> Looking at the code, it seems that we don't merge any streams other than
>>> the TPI and IPI streams. Am I right?
>>> Is the "merge" feature completely implemented?
>>> Thanks
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
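For reference on the DBI problem in the original report: a PDB reserves fixed stream indices (0 = old MSF directory, 1 = PDB info, 2 = TPI, 3 = DBI, 4 = IPI), so a merged file has to place its DBI stream at index 3 and renumber everything else.  The following Python sketch (layout_streams is a hypothetical helper) only shows that renumbering, assuming the fixed streams have already been merged; the remap tables would then have to be written back wherever old indices are recorded, such as each module's "debug stream" number in DBI.

```python
# Fixed stream indices every PDB reserves.
FIXED_STREAMS = {0: "old MSF directory", 1: "PDB info", 2: "TPI", 3: "DBI", 4: "IPI"}


def _append(streams: list, extra: dict) -> dict:
    # Append extra streams in old-index order; return {old_index: new_index}.
    remap = {}
    for old_index in sorted(extra):
        remap[old_index] = len(streams)
        streams.append(extra[old_index])
    return remap


def layout_streams(fixed: dict, extra_a: dict, extra_b: dict):
    # Hypothetical: build the output stream list for a merged PDB.
    #   fixed:   {index: bytes} for indices 0-4, already merged
    #   extra_x: {old_index: bytes} for the remaining streams of input x
    missing = sorted(set(FIXED_STREAMS) - set(fixed))
    if missing:
        raise ValueError(f"missing fixed streams: {missing}")
    streams = [fixed[i] for i in sorted(FIXED_STREAMS)]
    remap_a = _append(streams, extra_a)
    remap_b = _append(streams, extra_b)
    return streams, remap_a, remap_b
```

For the `not.pdb` example above, module 0's debug stream 14 from the first input would land at index 5, and the second input's stream 14 right after it.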

