[llvm-dev] [llvm-pdbutil] : merge not working properly

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 17 10:08:55 PST 2019


It's possible in theory to support incremental updates to a PDB (the file
format is designed specifically with that in mind).  But this functionality
was never added to the PDB library: since lld doesn't support incremental
linking, we never really needed it.

The "dumb" way would be to just create a new PDB file, build it using the
old contents and the new contents (making sure to de-duplicate records as
necessary).
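
A minimal sketch of the de-duplication step in that "dumb" approach, assuming
each record can be compared by its raw bytes.  The names Record, MergedRecords
and mergeRecords are hypothetical and are not part of LLVM's PDB library.
Real type records embed the indices of other records, which would have to be
rewritten before comparing; this sketch ignores that.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a serialized record: just its raw bytes.
using Record = std::string;

struct MergedRecords {
    std::vector<Record> records;             // de-duplicated output, old records first
    std::vector<std::uint32_t> newToMerged;  // position of each new record in `records`
};

// Build the merged record list from the old and new contents, keeping only
// the first copy of every distinct record.
MergedRecords mergeRecords(const std::vector<Record>& oldRecs,
                           const std::vector<Record>& newRecs) {
    MergedRecords out;
    std::unordered_map<Record, std::uint32_t> seen;

    // Returns the position of `r` in the output, appending it if unseen.
    auto add = [&](const Record& r) -> std::uint32_t {
        auto next = static_cast<std::uint32_t>(out.records.size());
        auto res = seen.emplace(r, next);
        if (res.second)
            out.records.push_back(r);
        return res.first->second;
    };

    for (const Record& r : oldRecs)
        add(r);
    for (const Record& r : newRecs)
        out.newToMerged.push_back(add(r));
    return out;
}
```

The newToMerged table is what later lets anything that referred to a record
of the new PDB by position be redirected to its position in the merged file.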

Supporting incremental updates should be possible, but most of LLVM's file
I/O abstractions are based around mmapping a file and writing to it, which
doesn't work when you don't know the file size in advance.  So there would
be some interesting problems to solve here.

On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
wrote:

> Hi Zachary!
> Is there a way to easily create a new PDBFileBuilder from an existing
> PDBFile, or can/should I do the translation myself?
> I would like to start from a builder filled with the EXE PDB data and then
> complete its DBI stream with the JIT module/symbols.
>
> Thanks !
>
>
> On Wed, Jan 16, 2019 at 11:41 PM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> Thank you Zachary!
>> I think I will have some soon.
>> I first need to explore the llvm-pdbutil code more because I don't even
>> know where to start with the PDB API.
>>
>> On Wed, Jan 16, 2019 at 10:51 PM Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> Sure. Along the way I’m happy to answer any specific questions you might
>>> have too even if it’s for your downstream project
>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> I would be happy to improve pdbutil, but I doubt I have enough knowledge or
>>>> time to provide the complete merge feature; it would still be a very
>>>> specific kind of merge as you describe it. Anyway, I could start by trying to
>>>> do it in my JIT compiler and then, once I get something working (if that
>>>> happens :)), I can come back to you with the code and see whether it is
>>>> worth integrating into pdbutil, and how.
>>>>
>>>> On Wed, Jan 16, 2019 at 10:12 PM Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is another
>>>>> possibility. Doing it directly in your jit compiler will probably save you
>>>>> time though, since you won’t have to worry about writing tests and going
>>>>> through code review
>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <vivien.millet at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the tips!
>>>>>> When you talk about doing all of this, I suppose you mean using
>>>>>> llvm/DebugInfo/PDB, picking code here and there to generate the PDB in memory,
>>>>>> reading the executable's PDB, and performing the merge directly in my JIT
>>>>>> compiler, right? Not using pdbutil?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 15, 2019 at 10:49 PM Zachary Turner <zturner at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Zachary !
>>>>>>>> Thanks for your time !
>>>>>>>> So you are one of the happy guys who suffered from the lack of PDB
>>>>>>>> format information :)
>>>>>>>>
>>>>>>> Yes, that would be me :)
>>>>>>>
>>>>>>>
>>>>>>>> To be honest I'm really a beginner with PDBs; I just read
>>>>>>>> some LLVM documentation to understand what went wrong when merging my PDBs.
>>>>>>>> In my case, what my team and I are trying to achieve is this:
>>>>>>>> - Run our application under the Visual Studio debugger
>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>> - Then, either:
>>>>>>>>    - export a COFF obj file with DWARF information and then
>>>>>>>> convert it with cv2pdb to obtain a PDB of my JIT symbols (what I do now)
>>>>>>>>    - export my JIT debug info directly to PDB (what I would like to
>>>>>>>> do, if you have an idea how..)
>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>> - Merge my JIT PDB into a copy of the executable PDB (where things
>>>>>>>> start to go bad..)
>>>>>>>> - Replace the original executable with the copy (creating a backup of
>>>>>>>> the original)
>>>>>>>> - Reattach the Visual Studio debugger to my executable (loading
>>>>>>>> the new PDB version)
>>>>>>>> - Debug JIT code with Visual Studio.
>>>>>>>> - On each JIT rebuild, restart these steps from the original native
>>>>>>>> executable PDB to avoid merge conflicts between the multiple JIT iterations
>>>>>>>>
>>>>>>> Yea, it's an interesting use case.  It makes me think it would be
>>>>>>> nice if the PDB format supported some way of having a symbol which simply
>>>>>>> refers to another PDB file, that way you could re-write that PDB file at
>>>>>>> runtime once all your code is jitted, and when the debugger tries to look
>>>>>>> up that symbol, it finds a record that tells it to go check the other PDB
>>>>>>> file.
>>>>>>>
>>>>>>> So, here are the things I think you would need to do:
>>>>>>>
>>>>>>> 1) Create a JIT module in the module list with a unique name.  All
>>>>>>> symbols will go here.  llvm-pdbutil dump -modules shows you the list.  Be
>>>>>>> careful about putting it at the end though, because there's already one at
>>>>>>> the end called * Linker * that is kind of special.  On the other hand, you
>>>>>>> don't want to put it first because it means you will have to do lots of
>>>>>>> fixups on the EXE PDB.  It's probably best to add it right before the
>>>>>>> linker module, since this has the least chance of breaking anything.
>>>>>>>
>>>>>>> 2) In the debug stream for this module, add all symbols.  You will
>>>>>>> need to fix up their type indices.  As you noticed, llvm-pdbutil already
>>>>>>> merges type information from the JIT PDB, so after merging the type indices
>>>>>>> in the EXE PDB will be different than they were in the JIT PDB, but the
>>>>>>> symbol records will refer to the JIT PDB type indices.  So these need to be
>>>>>>> fixed up.  LLD already has code to do this; you can probably borrow a
>>>>>>> similar algorithm with some slight modifications (lld/COFF/PDB.cpp, search
>>>>>>> for mergeSymbolRecords)
>>>>>>>
>>>>>>> 3) Merge in the new section contributions and section map.  See LLD
>>>>>>> again for how to modify these.  Hopefully the object file you exported
>>>>>>> contains relocated symbol addresses so you don't have to do any fixups here.
>>>>>>>
>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too hard, I
>>>>>>> think you can just iterate over them in the JIT PDB and add them to the new
>>>>>>> EXE PDB.
>>>>>>>
>>>>>>> You're kind of in uncharted territory here, so this is just a rough
>>>>>>> idea of what needs to be done.  There may be other issues that you don't
>>>>>>> encounter until you actually try it out.
>>>>>>>
>>>>>>> Unfortunately I don't personally have the time to work on this, but
>>>>>>> it sounds neat, and I'm happy to help if you run into questions or problems
>>>>>>> along the way.
>>>>>>>
>>>>>>>
>>>>>>>
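
Steps 1 and 2 of the plan quoted above can be sketched roughly as follows.
This is only an illustration under stated assumptions: Symbol,
insertBeforeLinker and fixupTypeIndices are hypothetical names, not LLVM or
LLD APIs, the module list is modeled as a plain vector of names, and the
remap table is assumed to come from whatever merged the type records.  The
one format fact relied on is that CodeView type indices below 0x1000 denote
built-in (simple) types, which mean the same thing in every PDB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// CodeView type indices below 0x1000 denote built-in (simple) types.
constexpr std::uint32_t kFirstNonSimpleIndex = 0x1000;

// Hypothetical symbol record: a name plus the type index it refers to.
struct Symbol {
    std::string name;
    std::uint32_t typeIndex;
};

// Step 1: insert the JIT module immediately before the linker module (or at
// the end if no linker module is found).  Returns the JIT module's index.
std::size_t insertBeforeLinker(std::vector<std::string>& modules,
                               const std::string& jitModule,
                               const std::string& linkerModule) {
    auto pos = std::find(modules.begin(), modules.end(), linkerModule);
    std::size_t index = static_cast<std::size_t>(pos - modules.begin());
    modules.insert(pos, jitModule);
    return index;
}

// Step 2: rewrite each symbol's type index from the JIT PDB's numbering to
// the merged PDB's numbering.  jitToMerged[i] is the merged index for JIT
// type index 0x1000 + i.
void fixupTypeIndices(std::vector<Symbol>& symbols,
                      const std::vector<std::uint32_t>& jitToMerged) {
    for (Symbol& s : symbols) {
        if (s.typeIndex < kFirstNonSimpleIndex)
            continue;  // simple types need no remapping
        std::size_t slot = s.typeIndex - kFirstNonSimpleIndex;
        assert(slot < jitToMerged.size() && "type index missing from remap table");
        s.typeIndex = jitToMerged[slot];
    }
}
```

Inserting before the linker module shifts only that one module's position,
which is why it disturbs the existing EXE PDB the least.  Real symbol records
can carry more than one type index each, so an actual implementation would
have to visit every type-index field in a record, not a single member.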