[llvm-dev] [llvm-pdbutil] : merge not working properly

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 23 11:56:29 PST 2019

(BTW, I'm adding llvm-dev back to the list, I didn't notice it got taken
off.  In general I try to keep the list on all emails, even if it's
extremely technical and specific, because someday someone else will try to
do this, and it'll be nice if they can read the whole thread).

I can think of a couple of things that might be wrong:

1) If the string table is in a different order, then anything that refers
to the string table needs to be changed to refer to the new offset.  If the
string "foo" is at offset 12 in the old PDB but at offset 15 in the new PDB,
then somewhere there is a record that is going to look at offset 12 and
expect to find "foo" there, and that will break.  The main place this
matters is the File Checksums table: each entry says which file it is a
checksum for, and it does so by referring to the string table.  However,
it's possible for certain symbol records to refer to the string table too.
See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will find some
information about this.

2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all of
them show a reasonable description?  Are there any streams that say (???)?
If so, that's a problem.
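
To illustrate point 1, here is a minimal sketch of remapping string-table
offsets when strings end up at different positions in the new table.  It is
a simplified model, not the LLVM API: StringTable, ChecksumEntry and
remapFileNames are hypothetical names, and a real PDB string table has more
structure (a header and a hash table) than the flat buffer assumed here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a PDB string table: a buffer of NUL-terminated
// strings, where each string is identified by its byte offset in the buffer.
struct StringTable {
  std::string Buffer;

  // Appends Str (plus a terminating NUL) and returns the offset it was
  // stored at.
  uint32_t insert(const std::string &Str) {
    uint32_t Offset = static_cast<uint32_t>(Buffer.size());
    Buffer += Str;
    Buffer.push_back('\0');
    return Offset;
  }

  // Reads the NUL-terminated string that starts at Offset.
  std::string get(uint32_t Offset) const {
    assert(Offset < Buffer.size() && "offset outside string table");
    return std::string(Buffer.c_str() + Offset);
  }
};

// Hypothetical record that names a file by string-table offset, in the
// spirit of a File Checksums entry.
struct ChecksumEntry {
  uint32_t FileNameOffset;
};

// Re-inserts every referenced string into NewTable and rewrites the offsets
// stored in Entries so that they point into NewTable instead of OldTable.
void remapFileNames(const StringTable &OldTable, StringTable &NewTable,
                    std::vector<ChecksumEntry> &Entries) {
  std::map<uint32_t, uint32_t> OldToNew; // old offset -> new offset
  for (ChecksumEntry &E : Entries) {
    auto It = OldToNew.find(E.FileNameOffset);
    if (It == OldToNew.end()) {
      uint32_t NewOffset = NewTable.insert(OldTable.get(E.FileNameOffset));
      It = OldToNew.emplace(E.FileNameOffset, NewOffset).first;
    }
    E.FileNameOffset = It->second;
  }
}
```

The old-to-new map is the important part: any record that stores a
string-table offset has to be passed through it, otherwise it keeps pointing
at whatever happens to live at the old offset in the new table.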

> will Visual Studio consider a symbol file broken if the address goes
> beyond the official module address range (the compiled one)? My JIT
> code is allocated after the end of the module with VirtualAlloc.

That is a good question, and part of why my job is so difficult, because I
can't look at their code.  But I think the answer is "probably".  The
debugger has to have some way to convert an address in your running process
into a symbol and offset, because that's how all debug info is represented
in the PDB.  So if there is no module, then there is no RVA (because the R
in RVA means relative, and what would it be relative to?).
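
The reasoning about RVAs can be sketched as follows.  This is only a model
of the argument, not a description of what Visual Studio actually does
(which is unknown here): LoadedModule and toModuleAndRVA are hypothetical
names, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a loaded image: where it starts and how many
// bytes of address space it covers.
struct LoadedModule {
  std::string Name;
  uint64_t Base;
  uint64_t Size;
};

// Finds the module whose address range contains Address and reports the
// address relative to that module's base.  Returns false when no module
// covers the address, in which case there is nothing to be relative to.
bool toModuleAndRVA(const std::vector<LoadedModule> &Modules, uint64_t Address,
                    std::string &ModuleName, uint32_t &RVA) {
  for (const LoadedModule &M : Modules) {
    assert(M.Size <= 0xFFFFFFFFull && "RVAs are 32-bit offsets");
    if (Address >= M.Base && Address - M.Base < M.Size) {
      ModuleName = M.Name;
      RVA = static_cast<uint32_t>(Address - M.Base);
      return true;
    }
  }
  return false;
}
```

With a module at base 0x140000000 and size 0x20000, the address 0x140001234
maps to RVA 0x1234, while an address just past the end of the module (for
example, memory obtained from VirtualAlloc) maps to nothing under this model.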

One idea to test this would be to create a DLL called jitted_code.dll, give
it a huuuuuge .text section (probably just a .asm file and use some
assembly directives to allocate a very large series of null bytes), and
then write your jit code into that area.  This way you would not need to
modify the existing PDB; you would only need to make a new PDB called
jitted_code.pdb with 1 module, and those symbols could have meaningful
RVAs.  And you might not even need to detach the debugger if you do things
this way, because you could just right click the jitted_code.dll module in
the modules window and choose Load Symbols.

On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
wrote:

> Yes, that's it: I'm just making a copy of a PDB generated by link.exe (the
> Microsoft one).
> Comparing with llvm-pdbutil is what I do, except that I use "-all".
> Almost everything comes out the same: same number of streams, the section
> map looks good, the string table looks good (even if not in the same
> order), and the same number of modules, with practically the same symbols
> and subsections. This is why I'm stuck: I'm missing something, but I can't
> see what because I don't know where to look. Visual Studio works with it
> and I can debug my original exe, but probably without the globals...
> The other problem is that a difference between the dumps is not
> necessarily a bug, because the builder may generate new hash values,
> reorder streams, modules, etc.
> For now I have given up on the publics and globals streams and attacked
> the real goal: inserting my JIT CodeView into the PDB. Again I have done
> "something", but as I don't understand how the format works, I don't have
> it working in Visual Studio.. except once: a single time it worked and the
> breakpoint turned on in the UI (even if the RVA was broken for the
> instructions), but it happened only that one time.. then I got depressed
> the next times..... cvdump displays it all as "correct", with no corrupt
> stuff apparently. But what I do is probably wrong somewhere. What I do is
> take .debug$S and .debug$T as is, without relocations, just to see. What I
> really don't know is: will Visual Studio consider a symbol file broken if
> the address goes beyond the official module address range (the compiled
> one)? My JIT code is allocated after the end of the module with
> VirtualAlloc.
> Another thing I don't get is the section contribution: what is it exactly?
> I inserted section contribs for all sections except the debug$ ones, but I
> don't know what I'm really doing, and that is my usual problem in
> implementing this JIT feature...
> I also don't know what relocations are inside the CodeView format. What is
> the difference between an RVA and a relocation, and is there anything to do
> with them for the CodeView part I need to insert in the PDB? I don't see
> why Visual Studio needs more than just an RVA<->line mapping..
> Trying to guess what Visual Studio does while being so ignorant is really
> driving me crazy...
> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com>
> wrote:
>> So if I understand correctly, you're basically just trying to implement
>> something like a pdb *copy*, just as a test to see if you can get it to
>> work.  So you generate a PDB with cl/link or clang-cl/lld-link, then try to
>> copy it using your tool, then see if it still works.
>> If this is correct, and it's not working, then there is probably just
>> something you didn't copy.  Neither Publics nor globals actually contain
>> their own data, instead they just refer to records from the corresponding
>> module stream.  So an S_PROCREF for the function "main" might have fields
>> that say "the name of the function is main, and it's at offset 20 of module
>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not actually
>> an S_GPROC32 for the function main, then it will be broken.
>> Did you also go through each module in the source PDB, add a new module
>> in the target PDB, then copy all of the symbols for each one?
>> The best way to find differences is by using llvm-pdbutil on the source
>> and target PDBs and looking for things that look different.  For example,
>> I'd start with llvm-pdbutil dump -streams and then seeing if they even have
>> all the same streams.  If one of them is missing streams, that's a good
>> place to start.  If they have the same streams, then look for ones where
>> the size is different.  Then drill into those to see why the size is
>> different.
>> LMK if that helps.
>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
>> wrote:
>>> For now I'm not merging my JIT CodeView section; I'm only trying to build a
>>> pure copy of an existing PDB using the XxxBuilder classes (PDBFileBuilder &
>>> Co., reading from a PDBFile) and checking whether Visual Studio will eat it..
>>> For Publics and Globals, what I do is naive: I use the GSIStreamBuilder
>>> and pray :)
>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>       builder.addGlobalSymbol(Sym);
>>>     }
>>>   }
>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>     std::vector<PublicSym32> Publics;
>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>       PublicSym32 Pub = cantFail(
>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>               SymbolRecords.readRecord(PubSymOff)));
>>>       Publics.push_back(Pub);
>>>     }
>>>     if (!Publics.empty()) {
>>>       // Sort the public symbols and add them to the stream.
>>>       std::sort(Publics.begin(), Publics.end(),
>>>            [](const PublicSym32 &L, const PublicSym32 &R) {
>>>              return L.Name < R.Name;
>>>            });
>>>       for (const PublicSym32 &Pub : Publics)
>>>         builder.addPublicSymbol(Pub);
>>>     }
>>>   }
>>> Is that what you meant?
>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com>
>>> wrote:
>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t just
>>>> copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in the
>>>> module stream, are S_PROCREF in the globals stream. Similarly, *everything*
>>>> in the publics stream is S_PUB32. So you need to convert each symbol to the
>>>> proper type for the stream it’s going to go in
>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>> Publics are basically a list of everything that has a mangled name. To
>>>>> be honest, I don’t know what the debugger uses this for.
>>>>> Globals is essentially every symbol in the pdb in one large table. The
>>>>> reason this is important is because if you type “foo” in the watch window,
>>>>> the debugger doesn’t necessarily know what compiland foo comes from. So it
>>>>> has to have a way to find everything in the entire program no matter what
>>>>> compiland it came from. That’s what the globals are.
>>>>> Both publics and globals are hash tables, so one possible reason there
>>>>> might be a problem is that you need to rehash the entire table. When you
>>>>> build your modified pdb, I would suggest starting with an empty publics /
>>>>> globals stream, adding all items from the first pdb by iterating over those
>>>>> records and using a GSIStreamBuilder, then adding all your jitted items
>>>>> separately, then writing it out. That should make sure it gets hashed
>>>>> correctly.
>>>>> Are you doing that?
>>>>> Btw, not all symbols belong in the globals / publics stream. Check the
>>>>> code in lld and search for symbolGoesInGlobalsStream and
>>>>> symbolGoesInPublicsStream to see the logic it uses
>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <vivien.millet at gmail.com>
>>>>> wrote:
>>>>>> Hi Zachary, sorry for disturbing you again..
>>>>>> I've fixed some problems (StringTable, SectionMap and a few things here
>>>>>> and there..) and my converted PDB now seems to work inside Visual Studio..
>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>> managed to translate Publics and Globals correctly. CVDump says the PDB is
>>>>>> corrupted, whereas llvm-pdbutil dump displays them correctly.
>>>>>> I don't really understand what the Publics and Globals streams are:
>>>>>> whether the symbols really live in the corresponding streams or whether
>>>>>> they are just references to somewhere else.
>>>>>> The LLVM documentation is incomplete for these two streams, so I'm a
>>>>>> bit lost on how to handle them or how to find what is "corrupted"
>>>>>> according to CVDump.
>>>>>> I used LLD and yaml2pdb as examples to help me with some tough
>>>>>> conversions, but I noticed that in yaml2pdb no GSI stream is exported
>>>>>> (no GsiBuilder use and no reference to Publics or Globals anywhere). Is
>>>>>> that intended/correct?
>>>>>> Thanks, and sorry if I'm spamming a bit; this is 99% of my time
>>>>>> right now, and being stuck without any clue is difficult :) But I guess you
>>>>>> experienced even more suffering when documentation didn't exist at all!
>>>>>> Have a good day!
>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet <vivien.millet at gmail.com>
>>>>>> wrote:
>>>>>>> ERRATUM, my bad: the PDB I tested is also corrupted according to
>>>>>>> cvdump.exe, I don't know why. I regenerated it and now I have a working
>>>>>>> dump. You don't need to fix anything.
>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet <
>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>> Hi Zachary,
>>>>>>>> I've taken a first step at rewriting an existing PDBFile with
>>>>>>>> PDBFileBuilder. I got most of the work done, but I don't get as much
>>>>>>>> output as input (some streams are not mirrored for unknown reasons and some
>>>>>>>> data must be missing here and there...).
>>>>>>>> When I replace the original with the rebuilt one for
>>>>>>>> debugging, the PDB loads fine but breakpoints fail to activate, with an
>>>>>>>> "unexpected symbol reader error while processing foobar.exe". You probably
>>>>>>>> know what it means or have already encountered this error, I guess.
>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>> between the original and the new PDB, but I get an error when dumping the
>>>>>>>> original PDB exported by Visual Studio with -all (PublicsStream.cpp|98). I
>>>>>>>> think it is a bug.
>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>> want to check what the error is exactly (vs2017; x86 and x64 have the same
>>>>>>>> issue).
>>>>>>>> I've also attached my code (git diff). I added an "identity"
>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the input PDB.
>>>>>>>> You can use it to see what I have so far..
>>>>>>>> I've seen you recently added a fix related to FPO, but you say it's
>>>>>>>> only for x86, so I don't think it would change anything, but who knows..
>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>> feedback, it's welcome, because I'm out of ideas about what is going wrong..
>>>>>>>> Thanks, I'm going back to digging into the PDB mysteries!
>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>> OK! It was just to be sure I had understood correctly.
>>>>>>>>> Sorry for not replying right away; I wanted to try emitting
>>>>>>>>> CodeView first before continuing the discussion, and it was time for me to
>>>>>>>>> go to bed here..
>>>>>>>>> I just tried it now and it is very easy to switch to CodeView. For
>>>>>>>>> those interested: you just have to set the target triple on the
>>>>>>>>> llvm::Module used for JIT and then call
>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to tell the
>>>>>>>>> AsmPrinter that this module prefers CodeView over DWARF.
>>>>>>>>> I've checked the content of my .obj file, and it has valid
>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so far.
>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>> re-export it "as is" to see if I break something in Visual Studio.
>>>>>>>>> If I manage to do that, it might be added as a feature to
>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other users.
>>>>>>>>> I'll keep you posted.
>>>>>>>>> Thanks
>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner <zturner at google.com>
>>>>>>>>> wrote:
>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to do
>>>>>>>>>> anything to convert the record from one format (DWARF) to another format
>>>>>>>>>> (CodeView).  You will have a COFF object file either on disk (probably
>>>>>>>>>> named foo.obj or something) or in memory.  And this object file will have a
>>>>>>>>>> .debug$S section with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>> records to the final PDB, but they will already be in the correct format
>>>>>>>>>> that PDBFileBuilder expects, you won't need to convert them from DWARF
>>>>>>>>>> (which is not trivial).
>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>> That's a good question. By default, when emitting the object file
>>>>>>>>>>> I choose COFF, but it embeds DWARF and not CodeView in the end.. there
>>>>>>>>>>> probably is a way to do it, or at least it must be implemented if it isn't yet..
>>>>>>>>>>> Let's imagine I manage to do that.. when you say there is nothing
>>>>>>>>>>> to do, I still need a PDBFileBuilder to copy the CodeView data into
>>>>>>>>>>> the EXE PDB, right? I cannot easily insert it in the EXE PDB some other
>>>>>>>>>>> way?
>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <
>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just compile something,
>>>>>>>>>>>> we translate all of the LLVM metadata into CodeView in the file
>>>>>>>>>>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>>>>>>>>>>> If it's not hard to do, this would probably be a better solution, because
>>>>>>>>>>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>>>>>>>>>>> is not always trivial.
>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to do
>>>>>>>>>>>> anything, you can just open the ObjectFile, look for the .debug$T and
>>>>>>>>>>>> .debug$S sections, iterate over each one and re-write their TypeIndices
>>>>>>>>>>>> while copying them to the output PDB file.
>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don't care
>>>>>>>>>>>>> about the PDB size, at least as a first step, so having duplicated symbols
>>>>>>>>>>>>> won't be a problem for me. Concerning TypeIndices, my plan, if possible,
>>>>>>>>>>>>> is not to generate a PDB for my JIT and merge it, but instead to extract
>>>>>>>>>>>>> debug info directly from a DWARFContext just after the llvm::object::ObjectFile
>>>>>>>>>>>>> is emitted by the JIT engine and complete the EXE PDB I rebuilt with
>>>>>>>>>>>>> PDBFileBuilder. Does that sound like a good bet to you? If I succeed in doing
>>>>>>>>>>>>> that, I think it could be a good extension to the debugging possibilities of
>>>>>>>>>>>>> MCJIT, if not an extension to pdbutil.
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>> Well, for example the TPI stream is just one big collection
>>>>>>>>>>>>>> of types.  Presumably your JIT code will reuse some of the same types
>>>>>>>>>>>>>> (perhaps, std::string for example) as your non-jitted code.  Your jitted
>>>>>>>>>>>>>> symbol records in the object file (for example, a local variable of type
>>>>>>>>>>>>>> std::string in your jitted code) will refer to the type for std::string by
>>>>>>>>>>>>>> a TypeIndex, and your original PDB will also refer to std::string by a
>>>>>>>>>>>>>> different TypeIndex.
>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each object
>>>>>>>>>>>>>> file, we keep a hash table of which types have already been seen, so that
>>>>>>>>>>>>>> if we see the same type again, we can just use the TypeIndex that we wrote
>>>>>>>>>>>>>> on a previous object file.  Then, when we add symbol records, we have to
>>>>>>>>>>>>>> update its fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly necessary,
>>>>>>>>>>>>>> it will just keep your PDB size down.  But you *will* need to at least
>>>>>>>>>>>>>> re-write the TypeIndexes from the jitted code.  For example, you may decide
>>>>>>>>>>>>>> that instead of de-duplicating, you just append them all to the end of the
>>>>>>>>>>>>>> TPI stream (where all the types go in PDB) to keep things simple.  Since
>>>>>>>>>>>>>> they were in a different position before, they now have different
>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they are
>>>>>>>>>>>>>> correct after the merge.   Both types and symbols can refer to types, so
>>>>>>>>>>>>>> you will need to do this both for the types of the jitted code as well as
>>>>>>>>>>>>>> the symbols of the jitted code.
>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>> OK, I see..
>>>>>>>>>>>>>>> What do you mean by "making sure to de-duplicate records as
>>>>>>>>>>>>>>> necessary"?
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates to a
>>>>>>>>>>>>>>>> PDB (the file format is designed specifically with that in mind).  But this
>>>>>>>>>>>>>>>> functionality was never added to the PDB library since lld doesn't support
>>>>>>>>>>>>>>>> incremental linking, we never really needed it.
>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>> build it using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but most
>>>>>>>>>>>>>>>> of LLVM's File I/O abstractions are based around mmapping a file and
>>>>>>>>>>>>>>>> writing to it, which doesn't work when you don't know the file size in
>>>>>>>>>>>>>>>> advance.  So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Zachary!
>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder from
>>>>>>>>>>>>>>>>> an existing PDBFile, or can/should I do the translation myself?
>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the EXE
>>>>>>>>>>>>>>>>> PDB data and then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <
>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Thank you Zachary!
>>>>>>>>>>>>>>>>>> I will have some soon, I think..
>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more,
>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB API..
>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <
>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>> questions you might have too even if it’s for your downstream project
>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have
>>>>>>>>>>>>>>>>>>>> enough knowledge or time to provide the complete merge feature; it would
>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge as you describe it. Anyway, I could
>>>>>>>>>>>>>>>>>>>> start by trying to do it in my JIT compiler and then, once I get something
>>>>>>>>>>>>>>>>>>>> working (if that happens :)), I can come back to you with the piece of code
>>>>>>>>>>>>>>>>>>>> and see whether it is worth integrating into pdbutil, and how?
>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <
>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to worry about
>>>>>>>>>>>>>>>>>>>>> writing tests and going through code review
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose you
>>>>>>>>>>>>>>>>>>>>>> mean using llvm/DebugInfo/PDB, picking code here and there to generate
>>>>>>>>>>>>>>>>>>>>>> the PDB in memory, reading the executable's one and performing the merge
>>>>>>>>>>>>>>>>>>>>>> directly in my JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered from
>>>>>>>>>>>>>>>>>>>>>>>> the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB
>>>>>>>>>>>>>>>>>>>>>>>> stuff, I just read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>> In my case, what I do with my team and try to
>>>>>>>>>>>>>>>>>>>>>>>> achieve is this :
>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under a visual studio debugger
>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code ( using llvm MCJIT  )
>>>>>>>>>>>>>>>>>>>>>>>> - Then, either :
>>>>>>>>>>>>>>>>>>>>>>>>    - export as COFF obj file with dwarf information
>>>>>>>>>>>>>>>>>>>>>>>> and then convert it with cv2pdb to obtain a pdb of my JIT symbols (what I
>>>>>>>>>>>>>>>>>>>>>>>> do now)
>>>>>>>>>>>>>>>>>>>>>>>>    - export directly to PDB my JIT debug info (what
>>>>>>>>>>>>>>>>>>>>>>>> i would like to do, if you have an idea how..)
>>>>>>>>>>>>>>>>>>>>>>>> - Detach the visual studio debugger
>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT pdb into a copy of the executable
>>>>>>>>>>>>>>>>>>>>>>>> pdb (where things start to go bad..)
>>>>>>>>>>>>>>>>>>>>>>>> - Replace original executable by the copy (creating
>>>>>>>>>>>>>>>>>>>>>>>> a backup of original)
>>>>>>>>>>>>>>>>>>>>>>>> - Reattach  the visual studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>> executable (loading the new pdb version)
>>>>>>>>>>>>>>>>>>>>>>>> - Debug JIT code with visual studio.
>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from the
>>>>>>>>>>>>>>>>>>>>>>>> original native executable PDB to avoid merge conflict between the multiple
>>>>>>>>>>>>>>>>>>>>>>>> JIT iterations
>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>> think it would be nice if the PDB format supported some way of having a
>>>>>>>>>>>>>>>>>>>>>>> symbol which simply refers to another PDB file, that way you could re-write
>>>>>>>>>>>>>>>>>>>>>>> that PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>> debugger tries to look up that symbol, it finds a record that tells it to
>>>>>>>>>>>>>>>>>>>>>>> go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need to do:
>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>> unique name.  All symbols will go here.  llvm-pdbutil dump -modules shows
>>>>>>>>>>>>>>>>>>>>>>> you the list.  Be careful about putting it at the end though, because
>>>>>>>>>>>>>>>>>>>>>>> there's already one at the end called * LINKER * that is kind of special.
>>>>>>>>>>>>>>>>>>>>>>> On the other hand, you don't want to put it first because it means you will
>>>>>>>>>>>>>>>>>>>>>>> have to do lots of fixups on the EXE PDB.  It's probably best to add it
>>>>>>>>>>>>>>>>>>>>>>> right before the linker module, this has the least chance of breaking
>>>>>>>>>>>>>>>>>>>>>>> anything.
>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>> symbols.  You will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil already merges type information from the JIT PDB, so after
>>>>>>>>>>>>>>>>>>>>>>> merging the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>> the JIT PDB, but the symbol records will refer to the JIT PDB type
>>>>>>>>>>>>>>>>>>>>>>> indices.  So these need to be fixed up.  LLD already has code to do this,
>>>>>>>>>>>>>>>>>>>>>>> you can probably borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>> (lld/COFF/PDB.cpp, search for mergeSymbolRecords)
>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>> section map.  See LLD again for how to modify these.  Hopefully the object
>>>>>>>>>>>>>>>>>>>>>>> file you exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>> do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This shouldn't
>>>>>>>>>>>>>>>>>>>>>>> be too hard, I think you can just iterate over them in the JIT PDB and add
>>>>>>>>>>>>>>>>>>>>>>> them to the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so this
>>>>>>>>>>>>>>>>>>>>>>> is just a rough idea of what needs to be done.  There may be other issues
>>>>>>>>>>>>>>>>>>>>>>> that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time to
>>>>>>>>>>>>>>>>>>>>>>> work on this, but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>> questions or problems along the way.