[llvm-dev] [llvm-pdbutil] : merge not working properly

Thu Jan 17 11:25:54 PST 2019

That's a good question. By default I choose COFF when emitting the object
file, but it ends up embedding DWARF rather than CodeView. There is probably
a way to do it, or if not, it would have to be implemented.
Let's imagine I manage to do that. When you say there is nothing to do, I
still need a PDBFileBuilder to copy the CodeView data into the EXE PDB,
right? Is there no easier way to insert it into the EXE PDB?

On Thu, Jan 17, 2019 at 20:01, Zachary Turner <zturner at google.com> wrote:

> Well, is it possible to just hook up the CodeView debug info generator to
> MCJIT?  If you're not jitting, and you just compile something, we translate
> all of the LLVM metadata into CodeView in the file CodeViewDebug.cpp.
> Then, the object file just already has CodeView in it.  If it's not hard to
> do, this would probably be a better solution, because you don't have to
> worry about *how* to translate DWARF into CodeView, which is not always
> trivial.
>
> If you can configure this in MCJIT, you won't even need to do anything,
> you can just open the ObjectFile, look for the .debug$T and .debug$S
> sections, iterate over each one and re-write their TypeIndices while
> copying them to the output PDB file.
>
> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> OK, I understand better what you meant. In fact I don't care about the PDB
>> size, at least as a first step, so duplicated symbols won't be a problem
>> for me. Concerning TypeIndices, my plan, if possible, is not to generate a
>> PDB for my JIT and merge it, but instead to extract debug info directly
>> from a DWARFContext right after the llvm::object::ObjectFile is emitted by
>> the JIT engine, and use it to complete the EXE PDB I rebuilt with
>> PDBFileBuilder. Does that sound like a good bet to you? If I succeed, I
>> think it could be a good extension to the debugging possibilities of MCJIT,
>> if not an extension to pdbutil.
>>
>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <zturner at google.com> wrote:
>>
>>> Well, for example the TPI stream is just one big collection of types.
>>> Presumably your JIT code will reuse some of the same types (perhaps,
>>> std::string for example) as your non-jitted code.  Your jitted symbol
>>> records in the object file (for example, a local variable of type
>>> std::string in your jitted code) will refer to the type for std::string by
>>> a TypeIndex, and your original PDB will also refer to std::string by a
>>> different TypeIndex.
>>>
>>> In LLD, when we merge in types and symbols from each object file, we
>>> keep a hash table of which types have already been seen, so that if we see
>>> the same type again, we can just reuse the TypeIndex that we assigned for a
>>> previous object file.  Then, when we add symbol records, we have to update
>>> any fields that used the old TypeIndex to use the new TypeIndex instead.
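The hash-table approach described above can be sketched in a few lines of self-contained C++. This is only an illustration of the idea, not LLD's code: `TypeRecord`, `MergedTypes` and `mergeTypes` are hypothetical names, real CodeView records are binary blobs rather than a string plus a list of references, and the sketch assumes a record only refers to types that appear earlier in the same stream. The one CodeView detail it relies on is that indices below 0x1000 denote built-in simple types and are left untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Indices below this value are built-in "simple" types and are never remapped.
constexpr uint32_t kFirstTypeIndex = 0x1000;

// Hypothetical, simplified type record: a leaf description plus the
// TypeIndices of the other types it refers to.
struct TypeRecord {
  std::string leaf;
  std::vector<uint32_t> refs;
  bool operator<(const TypeRecord &o) const {
    return std::tie(leaf, refs) < std::tie(o.leaf, o.refs);
  }
};

// Destination type stream plus the table of records already seen.
struct MergedTypes {
  std::vector<TypeRecord> records;
  std::map<TypeRecord, uint32_t> seen; // record -> TypeIndex in `records`
};

// Merges one object's types into `dst`. Returns, for each source record in
// order, the TypeIndex it ended up with in the destination stream.
std::vector<uint32_t> mergeTypes(MergedTypes &dst,
                                 const std::vector<TypeRecord> &src) {
  std::vector<uint32_t> remap;
  remap.reserve(src.size());
  for (const TypeRecord &rec : src) {
    // Rewrite the record's own references first, so that two records that
    // mean the same thing compare equal after remapping.
    TypeRecord fixed = rec;
    for (uint32_t &ref : fixed.refs) {
      if (ref >= kFirstTypeIndex) {
        assert(ref - kFirstTypeIndex < remap.size() && "forward reference");
        ref = remap[ref - kFirstTypeIndex];
      }
    }
    uint32_t newIndex;
    auto it = dst.seen.find(fixed);
    if (it != dst.seen.end()) {
      newIndex = it->second; // already present: reuse the earlier index
    } else {
      newIndex = kFirstTypeIndex + static_cast<uint32_t>(dst.records.size());
      dst.records.push_back(fixed);
      dst.seen.emplace(fixed, newIndex);
    }
    remap.push_back(newIndex);
  }
  return remap;
}
```

The returned vector is what a later pass over the symbol records would use to replace each old TypeIndex with its new value.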
>>>
>>> De-duplicating though, I suppose, is not strictly necessary; it will
>>> just keep your PDB size down.  But you *will* need to at least re-write the
>>> TypeIndexes from the jitted code.  For example, you may decide that instead
>>> of de-duplicating, you just append them all to the end of the TPI stream
>>> (where all the types go in PDB) to keep things simple.  Since they were in
>>> a different position before, they now have different TypeIndices.  So you
>>> will need to re-write all TypeIndices so that they are correct after the
>>> merge.   Both types and symbols can refer to types, so you will need to do
>>> this both for the types of the jitted code as well as the symbols of the
>>> jitted code.
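The simpler "append everything, then rewrite" variant can be sketched as follows. This is a toy model, not the LLVM PDB API: `JitType`, `JitSymbol`, `shiftIndex` and `appendAndRewrite` are made-up names, and records are reduced to a label plus the TypeIndices they contain. It assumes the JIT types are appended as one contiguous block, so every non-simple index (0x1000 and above) moves by the number of records already in the destination stream.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Indices below this value are built-in simple types and do not move.
constexpr uint32_t kFirstNonSimple = 0x1000;

// Hypothetical, simplified records.
struct JitType {
  std::string leaf;
  std::vector<uint32_t> refs; // TypeIndices this type refers to
};
struct JitSymbol {
  std::string name;
  uint32_t type; // TypeIndex of the symbol's type
};

// Shifts a non-simple TypeIndex by `delta`; simple types pass through.
inline uint32_t shiftIndex(uint32_t ti, uint32_t delta) {
  if (ti < kFirstNonSimple)
    return ti;
  assert(ti <= UINT32_MAX - delta && "TypeIndex overflow");
  return ti + delta;
}

// Appends `jitTypes` to the end of `tpi` and rewrites every TypeIndex in
// both the appended types and the JIT symbols so they stay valid.
void appendAndRewrite(std::vector<JitType> &tpi, std::vector<JitType> jitTypes,
                      std::vector<JitSymbol> &jitSymbols) {
  // The first JIT type used to be 0x1000; it now lands after the existing
  // records, so every non-simple index moves by the old stream size.
  const uint32_t delta = static_cast<uint32_t>(tpi.size());
  for (JitType &t : jitTypes) {
    for (uint32_t &ref : t.refs)
      ref = shiftIndex(ref, delta);
    tpi.push_back(std::move(t));
  }
  for (JitSymbol &s : jitSymbols)
    s.type = shiftIndex(s.type, delta);
}
```

Both loops are needed, which matches the point above that types and symbols can each refer to types.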
>>>
>>> Let me know if that makes sense.
>>>
>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> OK, I see.
>>>> What do you mean by "making sure to de-duplicate records as necessary"?
>>>>
>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <zturner at google.com> wrote:
>>>>
>>>>> It's possible in theory to support incremental updates to a PDB (the
>>>>> file format is designed specifically with that in mind).  But this
>>>>> functionality was never added to the PDB library: since lld doesn't
>>>>> support incremental linking, we never really needed it.
>>>>>
>>>>> The "dumb" way would be to just create a new PDB file, build it using
>>>>> the old contents and the new contents (making sure to de-duplicate records
>>>>> as necessary).
>>>>>
>>>>> Supporting incremental updates should be possible, but most of LLVM's
>>>>> File I/O abstractions are based around mmapping a file and writing to it,
>>>>> which doesn't work when you don't know the file size in advance.  So there
>>>>> would be some interesting problems to solve here.
>>>>>
>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>> vivien.millet at gmail.com> wrote:
>>>>>
>>>>>> Hi Zachary!
>>>>>> Is there a way to easily create a new PDBFileBuilder from an existing
>>>>>> PDBFile, or can/should I do the translation myself?
>>>>>> I would like to start from a builder filled with the EXE PDB data and
>>>>>> then complete its DBI stream with the JIT module/symbols.
>>>>>>
>>>>>> Thanks !
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <vivien.millet at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Zachary!
>>>>>>> I will have some soon, I think.
>>>>>>> I first need to explore the llvm-pdbutil code more, because I don't
>>>>>>> even know where to start with the PDB API.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sure. Along the way I'm happy to answer any specific questions you
>>>>>>>> might have too, even if it's for your downstream project.
>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I would be up for improving pdbutil, but I doubt I have enough
>>>>>>>>> knowledge or time to provide the complete merge feature; it would still be
>>>>>>>>> a very specific kind of merge as you describe it. Anyway, I could start
>>>>>>>>> by trying to do it in my JIT compiler and then, once I get something working
>>>>>>>>> (if that happens :)), I can come back to you with the piece of code and see
>>>>>>>>> whether it is worth integrating into pdbutil, and how?
>>>>>>>>>
>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <zturner at google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is
>>>>>>>>>> another possibility. Doing it directly in your jit compiler will probably
>>>>>>>>>> save you time though, since you won’t have to worry about writing tests and
>>>>>>>>>> going through code review
>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>> When you talk about doing all of this, I suppose you mean using
>>>>>>>>>>> llvm/DebugInfo/PDB, picking code here and there to generate the PDB in
>>>>>>>>>>> memory, reading the executable's PDB and performing the merge directly in
>>>>>>>>>>> my JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>> So you are one of the happy guys who suffered from the lack of
>>>>>>>>>>>>> PDB format information :)
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB stuff; I just
>>>>>>>>>>>>> read some LLVM documentation to understand what went wrong when merging my
>>>>>>>>>>>>> PDBs.
>>>>>>>>>>>>> In my case, what my team and I are trying to achieve is this:
>>>>>>>>>>>>> - Run our application under a Visual Studio debugger
>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>    - export as a COFF obj file with DWARF information and then
>>>>>>>>>>>>> convert it with cv2pdb to obtain a PDB of my JIT symbols (what I do now)
>>>>>>>>>>>>>    - export my JIT debug info directly to PDB (what I would
>>>>>>>>>>>>> like to do, if you have an idea how)
>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable PDB (where
>>>>>>>>>>>>> things start to go bad)
>>>>>>>>>>>>> - Replace the original executable with the copy (creating a backup
>>>>>>>>>>>>> of the original)
>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my executable
>>>>>>>>>>>>> (loading the new PDB version)
>>>>>>>>>>>>> - Debug JIT code with Visual Studio.
>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from the original
>>>>>>>>>>>>> native executable PDB to avoid merge conflicts between the multiple JIT
>>>>>>>>>>>>> iterations
>>>>>>>>>>>>>
>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me think it would
>>>>>>>>>>>> be nice if the PDB format supported some way of having a symbol which
>>>>>>>>>>>> simply refers to another PDB file, that way you could re-write that PDB
>>>>>>>>>>>> file at runtime once all your code is jitted, and when the debugger tries
>>>>>>>>>>>> to look up that symbol, it finds a record that tells it to go check the
>>>>>>>>>>>> other PDB file.
>>>>>>>>>>>>
>>>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Create a JIT module in the module list with a unique name.
>>>>>>>>>>>> All symbols will go here.  llvm-pdbutil dump -modules shows you the list.
>>>>>>>>>>>> Be careful about putting it at the end though, because there's already one
>>>>>>>>>>>> at the end called * LINKER * that is kind of special.  On the other hand,
>>>>>>>>>>>> you don't want to put it first because it means you will have to do lots of
>>>>>>>>>>>> fixups on the EXE PDB.  It's probably best to add it right before the
>>>>>>>>>>>> linker module, this has the least chance of breaking anything.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) In the debug stream for this module, add all symbols.  You
>>>>>>>>>>>> will need to fix up their type indices.  As you noticed, llvm-pdbutil
>>>>>>>>>>>> already merges type information from the JIT PDB, so after merging the type
>>>>>>>>>>>> indices in the EXE PDB will be different than they were in the JIT PDB, but
>>>>>>>>>>>> the symbol records will refer to the JIT PDB type indices.  So these need
>>>>>>>>>>>> to be fixed up.  LLD already has code to do this, you can probably borrow a
>>>>>>>>>>>> similar algorithm with some slight modifications (lld/COFF/PDB.cpp, search
>>>>>>>>>>>> for mergeSymbolRecords).
>>>>>>>>>>>>
>>>>>>>>>>>> 3) Merge in the new section contributions and section map.  See
>>>>>>>>>>>> LLD again for how to modify these.  Hopefully the object file you exported
>>>>>>>>>>>> contains relocated symbol addresses so you don't have to do any fixups here.
>>>>>>>>>>>>
>>>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too
>>>>>>>>>>>> hard, I think you can just iterate over them in the JIT PDB and add them to
>>>>>>>>>>>> the new EXE PDB.
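The placement rule from step 1 (put the JIT module right before the linker module) can be illustrated with a toy model in which the module list is just a vector of names. `insertBeforeLinkerModule` is a hypothetical helper, not part of the LLVM PDB library, and the default linker module name is taken from the text above; the name found in a given PDB should be checked with `llvm-pdbutil dump -modules`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inserts `jitName` immediately before the linker module and returns the
// position the new module occupies. If no linker module is found, the JIT
// module is appended at the end.
std::size_t insertBeforeLinkerModule(std::vector<std::string> &modules,
                                     const std::string &jitName,
                                     const std::string &linkerName = "* LINKER *") {
  // The JIT module needs a name that no existing module already uses.
  assert(std::find(modules.begin(), modules.end(), jitName) == modules.end() &&
         "module name must be unique");
  auto pos = std::find(modules.begin(), modules.end(), linkerName);
  auto inserted = modules.insert(pos, jitName);
  return static_cast<std::size_t>(inserted - modules.begin());
}
```

With this placement only the linker module changes position, so anything that refers to modules by position needs adjusting for at most that one entry. Inserting at the front would shift every existing module.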
>>>>>>>>>>>>
>>>>>>>>>>>> You're kind of in uncharted territory here, so this is just a
>>>>>>>>>>>> rough idea of what needs to be done.  There may be other issues that you
>>>>>>>>>>>> don't encounter until you actually try it out.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately I don't personally have the time to work on this,
>>>>>>>>>>>> but it sounds neat, and I'm happy to help if you run into questions or
>>>>>>>>>>>> problems along the way.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>