[llvm-dev] [llvm-pdbutil] : merge not working properly

Vivien Millet via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 17 10:52:17 PST 2019

OK, I understand better what you meant. In fact I don’t care about the PDB
size, at least as a first step, so having duplicated symbols won’t be a
problem for me. Concerning TypeIndices, my plan, if possible, is not to
generate a PDB for my JIT and merge it, but instead to extract debug info
directly from a DWARFContext just after the llvm::object::ObjectFile is
emitted by the JIT engine, and use it to complete the EXE PDB I rebuilt with
PDBFileBuilder. Does that sound like a good bet to you? If I succeed, I think
it could be a good extension to the debugging possibilities of MCJIT, if not
an extension to pdbutil.
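
To make the TypeIndex concern concrete, here is a minimal sketch of the
"append everything, no de-duplication" option described in the quoted reply
below. It is a plain-Python toy model, not the LLVM PDB API: TypeRecord,
SymbolRecord, shift_index and append_types are hypothetical names invented
for illustration. The only format detail it relies on is that CodeView type
indices below 0x1000 denote built-in types, so they must not be shifted.

```python
from dataclasses import dataclass, field
from typing import List

# CodeView reserves indices below 0x1000 for built-in ("simple") types;
# records stored in the type stream are numbered from 0x1000 upward.
FIRST_NONSIMPLE = 0x1000


@dataclass
class TypeRecord:            # hypothetical stand-in for a CodeView type record
    kind: str
    refs: List[int] = field(default_factory=list)  # TypeIndices it refers to


@dataclass
class SymbolRecord:          # hypothetical stand-in for a symbol record
    name: str
    type_index: int


def shift_index(index: int, shift: int) -> int:
    """Shift a type-stream index; built-in type indices never move."""
    return index + shift if index >= FIRST_NONSIMPLE else index


def append_types(exe_types, jit_types, jit_symbols):
    """Append JIT types after the EXE types and rewrite every TypeIndex."""
    shift = len(exe_types)   # JIT record i moves from 0x1000+i to 0x1000+shift+i
    merged = list(exe_types)
    for rec in jit_types:
        # Types can refer to other types, so their references move too.
        merged.append(TypeRecord(rec.kind, [shift_index(r, shift) for r in rec.refs]))
    # Symbols refer to types as well and need the same rewrite.
    symbols = [SymbolRecord(s.name, shift_index(s.type_index, shift))
               for s in jit_symbols]
    return merged, symbols


exe_types = [TypeRecord("struct std::string"), TypeRecord("pointer", [0x1000])]
jit_types = [TypeRecord("struct std::string"), TypeRecord("pointer", [0x1000])]
jit_symbols = [SymbolRecord("local_ptr", 0x1001),
               SymbolRecord("counter", 0x0074)]   # 0x0074 is a built-in index

merged, symbols = append_types(exe_types, jit_types, jit_symbols)
print([hex(r) for r in merged[3].refs])       # ['0x1002']
print([hex(s.type_index) for s in symbols])   # ['0x1003', '0x74']
```

The duplicated std::string record stays in the merged list, which is the
size cost accepted above; only the indices are made consistent.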

On Thu, Jan 17, 2019 at 7:37 PM Zachary Turner <zturner at google.com> wrote:

> Well, for example the TPI stream is just one big collection of types.
> Presumably your JIT code will reuse some of the same types (perhaps,
> std::string for example) as your non-jitted code.  Your jitted symbol
> records in the object file (for example, a local variable of type
> std::string in your jitted code) will refer to the type for std::string by
> a TypeIndex, and your original PDB will also refer to std::string by a
> different TypeIndex.
> In LLD, when we merge in types and symbols from each object file, we keep
> a hash table of which types have already been seen, so that if we see the
> same type again, we can just use the TypeIndex that we wrote for a previous
> object file.  Then, when we add symbol records, we have to update their
> fields that used the old TypeIndex to use the new TypeIndex instead.
> De-duplicating, I suppose, is not strictly necessary; it will just
> keep your PDB size down.  But you *will* need to at least rewrite the
> TypeIndices from the jitted code.  For example, you may decide that instead
> of de-duplicating, you just append them all to the end of the TPI stream
> (where all the types go in a PDB) to keep things simple.  Since they were in
> a different position before, they now have different TypeIndices.  So you
> will need to rewrite all TypeIndices so that they are correct after the
> merge.  Both types and symbols can refer to types, so you will need to do
> this both for the types of the jitted code and for the symbols of the
> jitted code.
> Let me know if that makes sense.
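
A minimal sketch of the hash-table de-duplication described above, as a
plain-Python toy model rather than LLD's actual code. merge_types and the
(description, refs) tuples are hypothetical, and the sketch assumes that a
type only refers to records that come earlier in its own stream, so a single
forward pass is enough.

```python
FIRST_NONSIMPLE = 0x1000  # first TypeIndex that names a record in the type stream


def merge_types(dest_types, seen, src_types):
    """Merge src_types into dest_types; return {old TypeIndex: new TypeIndex}.

    A type is modelled as (description, refs). `seen` is the hash table
    mapping an already-written type to the TypeIndex it was given.
    """
    index_map = {}
    for position, (description, refs) in enumerate(src_types):
        # Rewrite references first so that identical types compare equal.
        # Built-in indices are never in index_map and pass through unchanged.
        key = (description, tuple(index_map.get(r, r) for r in refs))
        if key not in seen:
            seen[key] = FIRST_NONSIMPLE + len(dest_types)
            dest_types.append(key)
        index_map[FIRST_NONSIMPLE + position] = seen[key]
    return index_map


merged, seen = [], {}
exe_types = [("struct std::string", ()), ("pointer", (0x1000,))]
jit_types = [("struct JitState", ()),
             ("struct std::string", ()),
             ("pointer", (0x1001,))]

merge_types(merged, seen, exe_types)
jit_map = merge_types(merged, seen, jit_types)

# Symbols from the jitted code are then rewritten through the same map.
jit_symbols = {"state": 0x1000, "name_ptr": 0x1002}
fixed = {name: jit_map.get(ti, ti) for name, ti in jit_symbols.items()}
print({name: hex(ti) for name, ti in fixed.items()})
# {'state': '0x1002', 'name_ptr': '0x1001'}
```

Only JitState is new, so the merged stream grows by one record, while the
jitted std::string and its pointer type reuse the indices already written
for the EXE.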
> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>> OK, I see.
>> What do you mean by “making sure to de-duplicate records as necessary”?
>> On Thu, Jan 17, 2019 at 7:09 PM Zachary Turner <zturner at google.com>
>> wrote:
>>> It's possible in theory to support incremental updates to a PDB (the
>>> file format is designed specifically with that in mind).  But this
>>> functionality was never added to the PDB library: since lld doesn't
>>> support incremental linking, we never really needed it.
>>> The "dumb" way would be to just create a new PDB file, build it using
>>> the old contents and the new contents (making sure to de-duplicate records
>>> as necessary).
>>> Supporting incremental updates should be possible, but most of LLVM's
>>> File I/O abstractions are based around mmapping a file and writing to it,
>>> which doesn't work when you don't know the file size in advance.  So there
>>> would be some interesting problems to solve here.
>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>> Hi Zachary!
>>>> Is there a way to easily create a new PDBFileBuilder from an existing
>>>> PDBFile, or can/should I do the translation myself?
>>>> I would like to start from a builder filled with the EXE PDB data and
>>>> then complete its DBI stream with the JIT module/symbols.
>>>> Thanks!
>>>> On Wed, Jan 16, 2019 at 11:41 PM Vivien Millet <vivien.millet at gmail.com>
>>>> wrote:
>>>>> Thank you Zachary!
>>>>> I will have some questions soon, I think.
>>>>> I first need to explore the llvm-pdbutil code more because I don't
>>>>> even know where to start with the PDB API.
>>>>> On Wed, Jan 16, 2019 at 10:51 PM Zachary Turner <zturner at google.com>
>>>>> wrote:
>>>>>> Sure. Along the way I’m happy to answer any specific questions you
>>>>>> might have too even if it’s for your downstream project
>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>> I would be up for improving pdbutil, but I doubt I have enough knowledge
>>>>>>> or time to provide the complete merge feature; it would still be a very
>>>>>>> specific kind of merge as you describe it. Anyway, I could start by trying
>>>>>>> to do it in my JIT compiler and then, once I get something working (if
>>>>>>> that happens :)), I can come back to you with the piece of code and see
>>>>>>> whether it is worth integrating into pdbutil, and how.
>>>>>>> On Wed, Jan 16, 2019 at 10:12 PM Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is
>>>>>>>> another possibility. Doing it directly in your jit compiler will probably
>>>>>>>> save you time though, since you won’t have to worry about writing tests and
>>>>>>>> going through code review
>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>> Thanks for the tips!
>>>>>>>>> When you talk about doing all of this, I suppose you mean
>>>>>>>>> using llvm/DebugInfo/PDB, picking code here and there to generate the
>>>>>>>>> PDB in memory, reading the executable's PDB, and performing the merge
>>>>>>>>> directly in my JIT compiler, right? Not using pdbutil?
>>>>>>>>> On Tue, Jan 15, 2019 at 10:49 PM Zachary Turner <zturner at google.com>
>>>>>>>>> wrote:
>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>> Hello Zachary!
>>>>>>>>>>> Thanks for your time!
>>>>>>>>>>> So you are one of the happy guys who suffered from the lack of
>>>>>>>>>>> PDB format information :)
>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>> To be honest, I'm really a beginner with PDB; I just read
>>>>>>>>>>> some LLVM documentation to understand what went wrong when merging my PDBs.
>>>>>>>>>>> In my case, this is what my team and I are trying to achieve:
>>>>>>>>>>> - Run our application under the Visual Studio debugger
>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>> - Then, either:
>>>>>>>>>>>    - export a COFF obj file with DWARF information and then
>>>>>>>>>>>      convert it with cv2pdb to obtain a PDB of my JIT symbols
>>>>>>>>>>>      (what I do now)
>>>>>>>>>>>    - export my JIT debug info directly to PDB (what I would like
>>>>>>>>>>>      to do, if you have an idea how)
>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable PDB (where
>>>>>>>>>>>   things start to go bad)
>>>>>>>>>>> - Replace the original executable with the copy (creating a backup
>>>>>>>>>>>   of the original)
>>>>>>>>>>> - Reattach the Visual Studio debugger to my executable (loading
>>>>>>>>>>>   the new PDB version)
>>>>>>>>>>> - Debug JIT code with Visual Studio
>>>>>>>>>>> - On each JIT rebuild, restart these steps from the original
>>>>>>>>>>>   native executable PDB to avoid merge conflicts between the multiple
>>>>>>>>>>>   JIT iterations
>>>>>>>>>> Yea, it's an interesting use case.  It makes me think it would be
>>>>>>>>>> nice if the PDB format supported some way of having a symbol which simply
>>>>>>>>>> refers to another PDB file, that way you could re-write that PDB file at
>>>>>>>>>> runtime once all your code is jitted, and when the debugger tries to look
>>>>>>>>>> up that symbol, it finds a record that tells it to go check the other PDB
>>>>>>>>>> file.
>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>> 1) Create a JIT module in the module list with a unique name.
>>>>>>>>>> All symbols will go here.  llvm-pdbutil dump -modules shows you the list.
>>>>>>>>>> Be careful about putting it at the end though, because there's already one
>>>>>>>>>> at the end called * LINKER * that is kind of special.  On the other hand,
>>>>>>>>>> you don't want to put it first because it means you will have to do lots of
>>>>>>>>>> fixups on the EXE PDB.  It's probably best to add it right before the
>>>>>>>>>> linker module, this has the least chance of breaking anything.
>>>>>>>>>> 2) In the debug stream for this module, add all symbols.  You
>>>>>>>>>> will need to fix up their type indices.  As you noticed, llvm-pdbutil
>>>>>>>>>> already merges type information from the JIT PDB, so after merging, the
>>>>>>>>>> type indices in the EXE PDB will be different from what they were in the
>>>>>>>>>> JIT PDB, but the symbol records will still refer to the JIT PDB type
>>>>>>>>>> indices.  So these need to be fixed up.  LLD already has code to do this;
>>>>>>>>>> you can probably borrow a similar algorithm with some slight
>>>>>>>>>> modifications (lld/COFF/PDB.cpp, search for mergeSymbolRecords).
>>>>>>>>>> 3) Merge in the new section contributions and section map.  See
>>>>>>>>>> LLD again for how to modify these.  Hopefully the object file you exported
>>>>>>>>>> contains relocated symbol addresses so you don't have to do any fixups here.
>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too hard,
>>>>>>>>>> I think you can just iterate over them in the JIT PDB and add them to the
>>>>>>>>>> new EXE PDB.
>>>>>>>>>> You're kind of in uncharted territory here, so this is just a
>>>>>>>>>> rough idea of what needs to be done.  There may be other issues that you
>>>>>>>>>> don't encounter until you actually try it out.
>>>>>>>>>> Unfortunately I don't personally have the time to work on this,
>>>>>>>>>> but it sounds neat, and I'm happy to help if you run into questions or
>>>>>>>>>> problems along the way.
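
Step 1 above (placing the JIT module right before the linker module) can be
sketched as follows. This is a plain-Python toy model with a hypothetical
insert_before_linker helper, not the DBI stream builder API. The linker
module name is spelled as in the message above; the real name should be
checked with llvm-pdbutil dump -modules. The returned map shows why this
position needs the fewest fixups: only the linker module's own index changes,
which matters for any record that stores a module index (section
contributions, for example).

```python
def insert_before_linker(modules, new_module, linker_name="* LINKER *"):
    """Return (new_module_list, old_index_to_new_index).

    The new module goes right before the linker module, or at the end if
    no linker module is present.
    """
    if new_module in modules:
        raise ValueError(f"module name {new_module!r} is not unique")
    pos = modules.index(linker_name) if linker_name in modules else len(modules)
    new_list = modules[:pos] + [new_module] + modules[pos:]
    # Modules before the insertion point keep their index; later ones shift by 1.
    old_to_new = {old: old if old < pos else old + 1 for old in range(len(modules))}
    return new_list, old_to_new


# Hypothetical module names, for illustration only.
exe_modules = ["main.obj", "util.obj", "* LINKER *"]
new_modules, index_map = insert_before_linker(exe_modules, "jit_module_0")
print(new_modules)  # ['main.obj', 'util.obj', 'jit_module_0', '* LINKER *']
print(index_map)    # {0: 0, 1: 1, 2: 3}
```

Inserting at position 0 instead would shift every existing module index,
which is the "lots of fixups" case the message warns about.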