[llvm-dev] [llvm-pdbutil] : merge not working properly

Tue Jan 15 02:50:35 PST 2019

Hello Zachary!
Thanks for your time!
So you are one of the lucky ones who suffered from the lack of PDB format
information :)
To be honest, I'm really a beginner with PDB; I just read some LLVM
documentation to understand what went wrong when merging my PDBs.
In my case, here is what my team and I are trying to achieve:
- Run our application under the Visual Studio debugger
- Generate JIT code (using LLVM MCJIT)
- Then, either:
   - export a COFF obj file with DWARF information and convert it with
cv2pdb to obtain a PDB of my JIT symbols (what I do now)
   - export my JIT debug info directly to PDB (what I would like to do, if
you have an idea how)
- Detach the Visual Studio debugger
- Merge my JIT PDB into a copy of the executable PDB (where things start to
go bad)
- Replace the original executable with the copy (keeping a backup of the
original)
- Reattach the Visual Studio debugger to my executable (loading the new
PDB version)
- Debug the JIT code with Visual Studio
- On each JIT rebuild, restart these steps from the original native
executable PDB to avoid merge conflicts between the multiple JIT iterations
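
To make the last step concrete, here is a minimal sketch of "always merge
from the original native PDB". The names refresh_debug_pdb and fake_merge
are hypothetical, and fake_merge only concatenates bytes as a stand-in for
a real merge tool; it does not produce a valid PDB.

```python
from pathlib import Path
from typing import Callable

# A merge callable takes (native_pdb, jit_pdb, output_pdb).
MergeFn = Callable[[Path, Path, Path], None]


def refresh_debug_pdb(pristine_pdb: Path, jit_pdb: Path, live_pdb: Path,
                      merge: MergeFn) -> None:
    """Rebuild live_pdb from the untouched native PDB plus the latest JIT PDB.

    The previous live_pdb is never used as an input, so symbols from earlier
    JIT iterations cannot pile up or conflict with the new ones.
    """
    staging = live_pdb.with_name(live_pdb.name + ".tmp")
    merge(pristine_pdb, jit_pdb, staging)
    # Swap the freshly merged file into place in a single step.
    staging.replace(live_pdb)


def fake_merge(native: Path, jit: Path, out: Path) -> None:
    """Hypothetical stand-in for a real PDB merge: concatenates raw bytes."""
    out.write_bytes(native.read_bytes() + jit.read_bytes())
```

The point of the design is only that the backup of the native PDB stays
read-only and every JIT rebuild starts from it.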

So, concerning the three stages you describe:
- 1): I would be even more naive: I would treat every module as a new
module without trying to merge them by name (but I might be too naive).
- 2) and 3): Same here. In my case the two PDBs can never contain
conflicting symbols or modules, so I would again choose the simplest,
most naive approach: by default every symbol is a new symbol (addition).
Options for the 'merge' feature could be the classic set-like
operations: addition (default), union, intersection. This option could
apply at different levels of the merge (modules -> symbols -> etc.).
I'm not a PDB warrior like you, and I don't know what consequences each
choice has for the structure and readability of the final PDB, so I trust
you to make the right choice.
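
To illustrate the three options, here is a toy sketch on plain lists of
symbols. The Symbol tuple and merge_symbols are hypothetical names, not
anything from llvm-pdbutil, and "same symbol" is assumed to mean exact
equality of all fields, which sidesteps the hard question of symbol
identity raised in your mail.

```python
from collections import namedtuple

# Toy model of a symbol record; real PDB records carry much more.
Symbol = namedtuple("Symbol", ["module", "name", "address"])


def merge_symbols(a, b, policy="addition"):
    """Merge two symbol lists under one of three policies, keeping order."""
    if policy == "addition":
        # Every input symbol is treated as new: nothing is de-duplicated.
        return list(a) + list(b)
    if policy == "union":
        # Keep the first occurrence of each distinct symbol.
        seen, out = set(), []
        for sym in list(a) + list(b):
            if sym not in seen:
                seen.add(sym)
                out.append(sym)
        return out
    if policy == "intersection":
        # Keep only symbols present in both inputs, once each.
        in_b, seen, out = set(b), set(), []
        for sym in a:
            if sym in in_b and sym not in seen:
                seen.add(sym)
                out.append(sym)
        return out
    raise ValueError(f"unknown merge policy: {policy!r}")


native = [Symbol("app.obj", "main", 0x1000),
          Symbol("app.obj", "helper", 0x1040)]
jit = [Symbol("jit.obj", "jit_fn", 0x9000),
       Symbol("app.obj", "helper", 0x1040)]
```

With these inputs, "addition" keeps the duplicate helper entry, "union"
drops it, and "intersection" keeps only helper. In my JIT case the inputs
never overlap, so all I need is "addition".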

Do you think what I'm trying to achieve is doable? Would you help me do it?

Thank you

PS: By the way, if you or anyone else knows another (better/easier) way to
debug MCJIT code with Visual Studio, I'm really open to hearing about it!

On Mon, Jan 14, 2019 at 22:49, Zachary Turner <zturner at google.com> wrote:

> Yes I am the person who wrote this feature (along with most other
> PDB-related features).
>
> I thought about it some and I think it's a bit hard (if not impossible) to
> merge PDBs in this way.  Here's a short list of things I came up with:
>
> 1) We need to merge the list of modules.  This requires first detecting if
> two modules are actually the same.  For example, if I run llvm-pdbutil on a
> random PDB on my disk, I get this (output is trimmed for brevity)
>
> $ llvm-pdbutil.exe dump -modules bin\not.pdb
>
>                           Modules
> ============================================================
> Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
>   Obj: `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
>   debug stream: 14, # files: 80, has ec info: false
>   pdb file ni: 0 ``, src file ni: 0 ``
> Mod 0001 | `lib\Support\CMakeFiles\LLVMSupport.dir\Program.cpp.obj`:
>   Obj: `D:\src\llvmbuild\cl\Debug\x64\lib\LLVMSupport.lib`:
>   debug stream: 46, # files: 102, has ec info: false
>   pdb file ni: 0 ``, src file ni: 0 ``
>
>
> The easiest thing to do is consider them to be the same only if both the
> module name and object name are identical, but depending on your use case
> this might not be sufficient.  For example, what if the 2 PDBs were built in
> different output directories?  Then you might have `D:\foo\not.cpp.obj` in
> one PDB and `D:\bar\not.cpp.obj` in another one.  So we would need to find
> a solution that makes sense here.
>
> 2) When two modules are the same, we need to merge their file list and
> debug stream.  In the above example:
>
> $ llvm-pdbutil.exe dump -files -modi=0 bin\not.pdb
>                            Files
> ============================================================
> Mod 0000 |
> `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
> - (MD5: 2FE06AF7EACFB232C6FF033DBFC4412E) c:\program files (x86)\microsoft
> visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\stdexcept
> - (MD5: 0B299654FBC61F03E9533F9296BBD2B3) c:\program files (x86)\microsoft
> visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\xstring
> etc...
>
> $ llvm-pdbutil.exe dump -symbols -modi=0 bin\not.pdb
>                           Symbols
> ============================================================
>   Mod 0000 |
> `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:
>        4 | S_OBJNAME [size = 80] sig=0,
> `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`
>       84 | S_COMPILE3 [size = 60]
>            machine = intel x86-x64, Ver = Microsoft (R) Optimizing
> Compiler, language = c++
>            frontend = 19.16.27024.1, backend = 19.16.27024.1
>            flags = security checks | hot patchable
>      144 | S_UNAMESPACE [size = 20] `__vc_attributes`
>
> The file list is easy, but for the symbol records, some of these records
> might be the same in 2 different object files, and some might be
> different.  So we need to de-duplicate them into the final PDB.  LLD
> actually already does this, so a lot of the code for this portion is
> probably already written in LLD.   See PDBLinker::mergeSymbolRecords in
> lld/COFF/PDB.cpp.  The algorithm is slightly different when merging 2 PDBs,
> but that's the general idea.
>
>
> 3) We need to merge the publics and globals stream, similar to the above.
>
>
> For #2 and #3 above, this is going to be tricky.  How do you know if 2
> symbols are actually the same symbol?  Even if they have the same name it
> might, for example, be a symbol for a certain function F.  Suppose the
> first PDB is for executable A, and the second PDB is for executable B.
> What if the generated code for function F in executable A differs from the
> generated code for F in executable B?  Does that end up as two symbols in
> the merged PDB or 1?  I'm not sure if there's a good way to handle this.
>
> I guess it might help to know more about your intended use case.  Then we
> might be able to make some simplifications to the problem that would allow
> us to decide on a reasonable solution.
>
> On Mon, Jan 14, 2019 at 5:39 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> Were you the person in charge of this feature? If not, do you know who
>> was (so we can work out the best approach and what is missing to complete
>> this feature)?
>>
>> On Mon, Dec 24, 2018 at 02:01, Zachary Turner <zturner at google.com> wrote:
>>
>>> The merge feature was implemented primarily for testing but was never
>>> really productionized, so your guess about what the underlying problem is
>>> sounds correct to me. We could probably hide the subcommand so users don’t
>>> accidentally use it, or if someone wants to properly implement the missing
>>> features, that would be even better.
>>> On Sat, Dec 22, 2018 at 10:48 AM Vivien Millet via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> When trying to merge 2 PDBs which each have their own DBI stream, I
>>>> end up with a PDB with an inconsistent number of streams and no DBI stream
>>>> (or at least not at the fixed index 3, producing a corruption error when
>>>> dumping with -l).
>>>> Looking at the code, it seems that we don't merge any streams other than
>>>> the TPI and IPI streams, am I right?
>>>> Is the "merge" feature completely implemented?
>>>> Thanks