[llvm-dev] [llvm-pdbutil] : merge not working properly

Vivien Millet via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 03:31:54 PST 2019

OK! I just wanted to make sure I understood correctly.
Sorry for not replying right away; I wanted to try emitting CodeView first
before continuing the discussion, and it was time for me to go to bed here.
I just tried it, and it is very easy to switch to CodeView. For those
interested: you just have to set your TargetTriple on the llvm::Module
used for the JIT and then call module->addModuleFlag(llvm::Module::Warning,
"CodeView", 1) to tell the AsmPrinter that this module prefers CodeView over
DWARF.
I've checked the contents of my .obj file, and there are valid .debug$T and
.debug$S sections, so everything is going well so far.
Now, as a parallel task, I will try to read the EXE PDB and re-export it "as
is" to see whether anything breaks in Visual Studio.
If I succeed, that might be added as a feature to PDBFile or
PDBFileBuilder to simplify the process for other users.
I'll keep you posted.
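A quick way to double-check that a JIT-emitted object really carries CodeView rather than DWARF: CodeView .debug$S and .debug$T sections start with a 4-byte little-endian signature whose value is 4 (LLVM names this constant COFF::DEBUG_SECTION_MAGIC). The helper below is a hypothetical, minimal sketch that assumes you already have the raw section bytes (for instance from llvm::object::SectionRef::getContents()); it checks only the signature, not the records that follow it.

```cpp
#include <cstdint>
#include <vector>

// CodeView .debug$S and .debug$T sections begin with a 4-byte little-endian
// signature; LLVM calls this value COFF::DEBUG_SECTION_MAGIC (4).
constexpr std::uint32_t kCodeViewSignature = 4;

// Hypothetical helper: returns true if `section` (the raw bytes of a
// .debug$S or .debug$T section) starts with the CodeView signature.
bool hasCodeViewSignature(const std::vector<std::uint8_t>& section) {
  if (section.size() < 4) return false;  // too short to hold a signature
  std::uint32_t sig = static_cast<std::uint32_t>(section[0]) |
                      (static_cast<std::uint32_t>(section[1]) << 8) |
                      (static_cast<std::uint32_t>(section[2]) << 16) |
                      (static_cast<std::uint32_t>(section[3]) << 24);
  return sig == kCodeViewSignature;
}
```

If the sections are present but the check fails, the object most likely still contains DWARF (for example, the "CodeView" module flag was not set before code generation).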

On Thu, Jan 17, 2019 at 20:50, Zachary Turner <zturner at google.com> wrote:

> When I say "nothing to do" I just mean that you won't have to do anything
> to convert the record from one format (DWARF) to another format
> (CodeView).  You will have a COFF object file either on disk (probably
> named foo.obj or something) or in memory.  And this object file will have a
> .debug$S section with CodeView symbols and a .debug$T section with CodeView
> types.  Then you will still need to use the PDBFileBuilder to add these
> records to the final PDB, but they will already be in the correct format
> that PDBFileBuilder expects, you won't need to convert them from DWARF
> (which is not trivial).
> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>> That's a good question. By default, when emitting the object file, I choose
>> COFF, but it ends up embedding DWARF rather than CodeView. There is probably a
>> way to change that, or, if not, it still needs to be implemented.
>> Let's imagine I manage to do that. When you say there is nothing to do, I
>> still need a PDBFileBuilder to copy the CodeView data into the EXE
>> PDB, right? I can't easily insert it into the EXE PDB some other way?
>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <zturner at google.com>
>> wrote:
>>> Well, is it possible to just hook up the CodeView debug info generator
>>> to MCJIT?  If you're not jitting, and you just compile something, we
>>> translate all of the LLVM metadata into CodeView in the file
>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>> If it's not hard to do, this would probably be a better solution, because
>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>> is not always trivial.
>>> If you can configure this in MCJIT, you won't even need to do anything,
>>> you can just open the ObjectFile, look for the .debug$T and .debug$S
>>> sections, iterate over each one and re-write their TypeIndices while
>>> copying them to the output PDB file.
>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>> OK, I understand better what you meant. In fact I don't care about the PDB
>>>> size, at least as a first step, so it won't be a problem for me to have
>>>> duplicated symbols. Concerning TypeIndices, my plan, if possible, is not to
>>>> generate a PDB for my JIT and merge it, but instead to extract debug info
>>>> directly from a DWARFContext just after the llvm::object::ObjectFile is emitted by
>>>> the JIT engine, and complete the EXE PDB I had rebuilt with PDBFileBuilder.
>>>> Does that sound like a good bet to you? If I succeed, I think it
>>>> could be a good extension to the debugging capabilities of MCJIT, if not
>>>> to pdbutil itself.
>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>> Well, for example the TPI stream is just one big collection of types.
>>>>> Presumably your JIT code will reuse some of the same types (perhaps,
>>>>> std::string for example) as your non-jitted code.  Your jitted symbol
>>>>> records in the object file (for example, a local variable of type
>>>>> std::string in your jitted code) will refer to the type for std::string by
>>>>> a TypeIndex, and your original PDB will also refer to std::string by a
>>>>> different TypeIndex.
>>>>> In LLD, when we merge in types and symbols from each object file, we
>>>>> keep a hash table of which types have already been seen, so that if we see
>>>>> the same type again, we can just use the TypeIndex that we wrote on a
>>>>> previous object file.  Then, when we add symbol records, we have to update
>>>>> its fields that used the old TypeIndex to use the new TypeIndex instead.
>>>>> De-duplicating though, I suppose, is not strictly necessary, it will
>>>>> just keep your PDB size down.  But you *will* need to at least re-write the
>>>>> TypeIndexes from the jitted code.  For example, you may decide that instead
>>>>> of de-duplicating, you just append them all to the end of the TPI stream
>>>>> (where all the types go in PDB) to keep things simple.  Since they were in
>>>>> a different position before, they now have different TypeIndices.  So you
>>>>> will need to re-write all TypeIndices so that they are correct after the
>>>>> merge.   Both types and symbols can refer to types, so you will need to do
>>>>> this both for the types of the jitted code as well as the symbols of the
>>>>> jitted code.
>>>>> Let me know if that makes sense.
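The non-de-duplicating variant described above (append every JIT type to the end of the TPI stream) reduces to shifting every non-simple TypeIndex by the number of types already in the EXE PDB. The sketch below is a simplified model under that assumption: the TypeIndex fields of a record are represented as a plain vector, whereas a real implementation would locate them per record kind (LLVM's codeview::discoverTypeIndices and discoverTypeIndicesInSymbol can find them). The function names are hypothetical, and the sketch ignores the separate IPI stream that PDBs use for ID records.

```cpp
#include <cstdint>
#include <vector>

// Indices below this value are built-in simple types and never move.
constexpr std::uint32_t kFirstNonSimpleIndex = 0x1000;

// Maps a TypeIndex from the JIT object's type stream to its index after the
// JIT types are appended, unchanged, behind `exeTypeCount` EXE types.
std::uint32_t remapAppended(std::uint32_t jitIndex, std::uint32_t exeTypeCount) {
  if (jitIndex < kFirstNonSimpleIndex) return jitIndex;  // simple type
  return jitIndex + exeTypeCount;  // JIT type 0x1000 lands at 0x1000 + count
}

// Rewrites, in place, the TypeIndex fields of one record (type or symbol).
void remapRecordRefs(std::vector<std::uint32_t>& refs, std::uint32_t exeTypeCount) {
  for (auto& ti : refs) ti = remapAppended(ti, exeTypeCount);
}
```

The same function has to be applied to the TypeIndex fields of both the JIT type records and the JIT symbol records, since both can refer to types.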
>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>> vivien.millet at gmail.com> wrote:
>>>>>> OK, I see.
>>>>>> What do you mean by "making sure to de-duplicate records as
>>>>>> necessary"?
>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <zturner at google.com>
>>>>>> wrote:
>>>>>>> It's possible in theory to support incremental updates to a PDB (the
>>>>>>> file format is designed specifically with that in mind).  But this
>>>>>>> functionality was never added to the PDB library; since lld doesn't support
>>>>>>> incremental linking, we never really needed it.
>>>>>>> The "dumb" way would be to just create a new PDB file, build it
>>>>>>> using the old contents and the new contents (making sure to de-duplicate
>>>>>>> records as necessary).
>>>>>>> Supporting incremental updates should be possible, but most of
>>>>>>> LLVM's File I/O abstractions are based around mmapping a file and writing
>>>>>>> to it, which doesn't work when you don't know the file size in advance.  So
>>>>>>> there would be some interesting problems to solve here.
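De-duplication along the lines Zachary describes for LLD can be modeled with a table keyed on each record's contents after its TypeIndex fields have been remapped: two records from different sources are only identical if they refer to the same destination types. This is a simplified sketch with a hypothetical TypeRec stand-in for real CodeView records, and it uses an ordered std::map where LLD uses a hash table; LLVM's real implementation is the TypeStreamMerger (codeview::mergeTypeAndIdRecords). It assumes each record only references records that appear earlier in the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

constexpr std::uint32_t kFirstNonSimpleIndex = 0x1000;

// Hypothetical stand-in for a CodeView type record: kind, opaque payload,
// and the TypeIndex fields it refers to.
struct TypeRec {
  std::uint16_t kind;
  std::string payload;
  std::vector<std::uint32_t> refs;

  // Ordering used as the de-duplication key (LLD hashes record bytes instead).
  bool operator<(const TypeRec& o) const {
    return std::tie(kind, payload, refs) < std::tie(o.kind, o.payload, o.refs);
  }
};

class TypeMerger {
 public:
  // Seeds the merged stream with the EXE PDB's types, keeping their indices.
  explicit TypeMerger(std::vector<TypeRec> exeTypes) : dest_(std::move(exeTypes)) {
    for (std::size_t i = 0; i < dest_.size(); ++i)
      seen_.emplace(dest_[i], kFirstNonSimpleIndex + static_cast<std::uint32_t>(i));
  }

  // Merges one object's types (in stream order) and returns, for each source
  // record, its TypeIndex in the merged stream.
  std::vector<std::uint32_t> merge(const std::vector<TypeRec>& src) {
    std::vector<std::uint32_t> remap;
    for (TypeRec rec : src) {  // copy, because refs are rewritten below
      // Remap refs before lookup; at() throws on a forward reference.
      for (auto& ti : rec.refs)
        if (ti >= kFirstNonSimpleIndex) ti = remap.at(ti - kFirstNonSimpleIndex);
      auto it = seen_.find(rec);
      if (it == seen_.end()) {  // unseen type: append it to the merged stream
        auto idx = kFirstNonSimpleIndex + static_cast<std::uint32_t>(dest_.size());
        it = seen_.emplace(rec, idx).first;
        dest_.push_back(std::move(rec));
      }
      remap.push_back(it->second);
    }
    return remap;
  }

  const std::vector<TypeRec>& types() const { return dest_; }

 private:
  std::vector<TypeRec> dest_;              // merged, TPI-like type stream
  std::map<TypeRec, std::uint32_t> seen_;  // record -> merged TypeIndex
};
```

The returned vector is the source-to-destination map that the symbol records then need, so a JIT type that already exists in the EXE PDB (std::string, say) maps back to its existing index instead of growing the stream.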
>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>> Hi Zachary!
>>>>>>>> Is there a way to easily create a new PDBFileBuilder from an
>>>>>>>> existing PDBFile, or can/should I do the translation myself?
>>>>>>>> I would like to start from a builder filled with the EXE PDB data
>>>>>>>> and then complete its DBI stream with the JIT module/symbols.
>>>>>>>> Thanks!
>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>> Thank you, Zachary!
>>>>>>>>> I think I will have some questions soon.
>>>>>>>>> I first need to explore the llvm-pdbutil code more, because I don't
>>>>>>>>> even know where to start with the PDB API.
>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <zturner at google.com>
>>>>>>>>> wrote:
>>>>>>>>>> Sure. Along the way I'm happy to answer any specific questions
>>>>>>>>>> you might have, even if it's for your downstream project.
>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have enough
>>>>>>>>>>> knowledge or time to provide the complete merge feature; it would still be
>>>>>>>>>>> a very specific kind of merge, as you describe it. Anyway, I could start
>>>>>>>>>>> by trying to do it in my JIT compiler and then, once I get something working
>>>>>>>>>>> (if that happens :)), I can come back to you with the code and we can see
>>>>>>>>>>> whether it is worth integrating into pdbutil, and how?
>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <
>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>> Well, that’s certainly possible, but improving llvm-pdbutil is
>>>>>>>>>>>> another possibility. Doing it directly in your jit compiler will probably
>>>>>>>>>>>> save you time though, since you won’t have to worry about writing tests and
>>>>>>>>>>>> going through code review.
>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>> When you talk about doing all of this, I suppose you mean
>>>>>>>>>>>>> using llvm/DebugInfo/PDB, picking code here and there to generate the
>>>>>>>>>>>>> PDB in memory, reading the executable's PDB, and performing the merge directly in my
>>>>>>>>>>>>> JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>> Hello Zachary!
>>>>>>>>>>>>>>> Thanks for your time!
>>>>>>>>>>>>>>> So you are one of the happy guys who suffered from the lack
>>>>>>>>>>>>>>> of PDB format information :)
>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>> To be honest, I'm really a beginner with PDBs; I just
>>>>>>>>>>>>>>> read some LLVM documentation to understand what went wrong when merging my
>>>>>>>>>>>>>>> PDBs.
>>>>>>>>>>>>>>> In my case, what my team and I are trying to achieve is
>>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>> - Run our application under the Visual Studio debugger
>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>    - export a COFF .obj file with DWARF information and then
>>>>>>>>>>>>>>> convert it with cv2pdb to obtain a PDB of my JIT symbols (what I do now)
>>>>>>>>>>>>>>>    - export my JIT debug info directly to PDB (what I would
>>>>>>>>>>>>>>> like to do, if you have an idea how...)
>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable's PDB (where
>>>>>>>>>>>>>>> things start to go bad...)
>>>>>>>>>>>>>>> - Replace the original executable PDB with the merged copy (keeping a backup
>>>>>>>>>>>>>>> of the original)
>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my executable
>>>>>>>>>>>>>>> (loading the new PDB version)
>>>>>>>>>>>>>>> - Debug the JIT code with Visual Studio
>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from the original
>>>>>>>>>>>>>>> native executable PDB to avoid merge conflicts between the multiple JIT
>>>>>>>>>>>>>>> iterations
>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me think it
>>>>>>>>>>>>>> would be nice if the PDB format supported some way of having a symbol which
>>>>>>>>>>>>>> simply refers to another PDB file, that way you could re-write that PDB
>>>>>>>>>>>>>> file at runtime once all your code is jitted, and when the debugger tries
>>>>>>>>>>>>>> to look up that symbol, it finds a record that tells it to go check the
>>>>>>>>>>>>>> other PDB file.
>>>>>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a unique
>>>>>>>>>>>>>> name.  All symbols will go here.  llvm-pdbutil dump -modules shows you the
>>>>>>>>>>>>>> list.  Be careful about putting it at the end though, because there's
>>>>>>>>>>>>>> already one at the end called * LINKER * that is kind of special.  On the
>>>>>>>>>>>>>> other hand, you don't want to put it first because it means you will have
>>>>>>>>>>>>>> to do lots of fixups on the EXE PDB.  It's probably best to add it right
>>>>>>>>>>>>>> before the linker module, this has the least chance of breaking anything.
>>>>>>>>>>>>>> 2) In the debug stream for this module, add all symbols.  You
>>>>>>>>>>>>>> will need to fix up their type indices.  As you noticed, llvm-pdbutil
>>>>>>>>>>>>>> already merges type information from the JIT PDB, so after merging the type
>>>>>>>>>>>>>> indices in the EXE PDB will be different than they were in the JIT PDB, but
>>>>>>>>>>>>>> the symbol records will refer to the JIT PDB type indices.  So these need
>>>>>>>>>>>>>> to be fixed up.  LLD already has code to do this, you can probably borrow a
>>>>>>>>>>>>>> similar algorithm with some slight modifications (lld/COFF/PDB.cpp, search
>>>>>>>>>>>>>> for mergeSymbolRecords)
>>>>>>>>>>>>>> 3) Merge in the new section contributions and section map.
>>>>>>>>>>>>>> See LLD again for how to modify these.  Hopefully the object file you
>>>>>>>>>>>>>> exported contains relocated symbol addresses so you don't have to do any
>>>>>>>>>>>>>> fixups here.
>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't be too
>>>>>>>>>>>>>> hard, I think you can just iterate over them in the JIT PDB and add them to
>>>>>>>>>>>>>> the new EXE PDB.
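Step 1 is mostly an ordering decision. The sketch below models the module list as plain strings and places a hypothetical JIT module immediately before the linker module, falling back to appending when no linker module is found. It compares names case-insensitively because the exact capitalization shown by llvm-pdbutil dump -modules may differ from "* LINKER *" as written above. In the LLVM PDB library, modules are actually added through pdb::DbiStreamBuilder::addModuleInfo; this sketch does not touch that API.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive equality, so "* LINKER *" also matches "* Linker *".
static bool equalsIgnoreCase(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Inserts `jitModule` just before the special linker module if present,
// otherwise appends it. Returns the index the JIT module ends up at.
std::size_t insertJitModule(std::vector<std::string>& modules,
                            const std::string& jitModule) {
  auto it = std::find_if(modules.begin(), modules.end(), [](const std::string& m) {
    return equalsIgnoreCase(m, "* LINKER *");
  });
  std::size_t pos = static_cast<std::size_t>(it - modules.begin());
  modules.insert(it, jitModule);
  return pos;
}
```

Keeping the existing modules' relative order unchanged is what minimizes fixups: only the linker module's index shifts by one.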
>>>>>>>>>>>>>> You're kind of in uncharted territory here, so this is just a
>>>>>>>>>>>>>> rough idea of what needs to be done.  There may be other issues that you
>>>>>>>>>>>>>> don't encounter until you actually try it out.
>>>>>>>>>>>>>> Unfortunately I don't personally have the time to work on
>>>>>>>>>>>>>> this, but it sounds neat, and I'm happy to help if you run into questions
>>>>>>>>>>>>>> or problems along the way.