[llvm-dev] [llvm-pdbutil] : merge not working properly

Vivien Millet via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 23 12:42:21 PST 2019


(Yes, you are right, this is my fault.)
Regarding the string table, it only seems to contain file-relative
information in every PDB I am using, and it looks correct, but I will check
it.
I looked at the PDB.cpp code for checksums and tables. I copied some of it
and got things wrong according to cvdump; then I simplified the process of
copying the table and it worked (cvdump finds the file matching each line,
etc.), so I suspect this part is also correct.

All the streams look good, but I will dig deeper!

What you say about RVAs and modules seems right; this is what I was afraid
of: doing all of this for nothing, or almost nothing...

Your idea of putting the .text section in a separate DLL looks good, but
will it be executable memory? Isn't .text where static strings go? When you
say putting my JIT code in there, do you mean writing it once
jitted_code.dll is loaded in memory, or writing it into the .dll file
directly before loading it? In the first scenario I wonder whether the
section will be executable; in the second scenario I can't do it, because
it would require perfect linking with the other code my JIT code points to...

On Wed, Jan 23, 2019 at 20:57, Zachary Turner <zturner at google.com> wrote:

> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got taken
> off.  In general I try to keep the list on all emails, even if it's
> extremely technical and specific, because someday someone else will try to
> do this, and it'll be nice if they can read the whole thread).
>
> I can think of a couple of things that might be wrong:
>
> 1) If the string table is in a different order, then anything that refers
> to the string table needs to be changed to refer to the new offset.  If the
> string "foo" is at offset 12 in the old PDB, but offset 15 in the new PDB,
> then somewhere there is a record which is going to look at offset 12 and
> expect to find something, and that will mess up.  The main place this is
> important is in the File Checksums table, there is an entry that says which
> file it is a checksum for, and that refers to the string table.  However,
> it's possible for certain symbol records to refer to the string table too.
> See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will find some
> information about this.
>
> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all of
> them show a reasonable description?  Are there any streams that say (???)?
> If so, that's a problem.
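To illustrate point 1, here is a minimal sketch with a toy string table (`ToyStringTable` and `ChecksumEntry` are hypothetical names, not LLVM's API). Strings are stored NUL-terminated and identified by byte offset, so copying them in a different order changes their offsets, and every record that stores an offset, such as a File Checksums entry, has to be re-pointed through the string it used to name. The real /names stream also has a header and a hash table that are omitted here; in LLVM, `PDBStringTableBuilder::insert` similarly returns the offset of the inserted string.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy string table: a buffer of NUL-terminated strings, each identified by
// its byte offset. Offset 0 holds the empty string.
class ToyStringTable {
public:
  ToyStringTable() { Buffer.push_back('\0'); }

  // Returns the offset of S, appending it if it is not present yet.
  uint32_t insert(const std::string &S) {
    if (S.empty())
      return 0;
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second;
    uint32_t Off = static_cast<uint32_t>(Buffer.size());
    Buffer.insert(Buffer.end(), S.begin(), S.end());
    Buffer.push_back('\0');
    Offsets[S] = Off;
    return Off;
  }

  // Reads the NUL-terminated string starting at Off.
  std::string get(uint32_t Off) const {
    assert(Off < Buffer.size() && "offset out of range");
    return std::string(&Buffer[Off]);
  }

private:
  std::vector<char> Buffer;
  std::map<std::string, uint32_t> Offsets;
};

// Hypothetical stand-in for a File Checksums entry, which names its file
// by an offset into the string table.
struct ChecksumEntry {
  uint32_t FileNameOffset;
};

// Re-points each entry from its offset in Old to the same string in New.
void remapChecksums(std::vector<ChecksumEntry> &Entries,
                    const ToyStringTable &Old, ToyStringTable &New) {
  for (ChecksumEntry &E : Entries)
    E.FileNameOffset = New.insert(Old.get(E.FileNameOffset));
}
```

For example, if "foo.cpp" is at offset 1 in the old table but "bar.h" is inserted first into the new one, "foo.cpp" ends up at offset 7, and an unpatched checksum entry would silently name the wrong file.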
>
> > will Visual Studio consider a symbol file broken if the address
> > goes beyond the official module address range (the compiled one), because
> > my JIT code is allocated after the end of the module with VirtualAlloc?
>
> That is a good question, and part of why my job is so difficult, because I
> can't look at their code.  But I think the answer is "probably".  The
> debugger has to have some way to convert an address in your running process
> into a symbol and offset, because that's how all debug info is represented
> in the PDB.  So if there is no module, then there is no RVA (because the R
> in RVA means relative, and what would it be relative to?).
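A sketch of why an address outside every module has no RVA, assuming (since Visual Studio's actual logic is not public) that a debugger first finds the loaded image containing an address and then subtracts that image's base. `LoadedModule`, `ModuleRelativeAddress` and `addressToRva` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical view of one loaded image (EXE or DLL).
struct LoadedModule {
  uint64_t ImageBase;   // address the loader mapped the image at
  uint32_t SizeOfImage; // from the PE optional header
};

struct ModuleRelativeAddress {
  std::size_t ModuleIndex;
  uint32_t Rva; // relative to that module's ImageBase
};

// Finds the image containing Address and converts it to an RVA. Returns
// std::nullopt for memory that belongs to no image, such as a JIT buffer
// obtained from VirtualAlloc outside every module.
std::optional<ModuleRelativeAddress>
addressToRva(const std::vector<LoadedModule> &Modules, uint64_t Address) {
  for (std::size_t I = 0; I < Modules.size(); ++I) {
    const LoadedModule &M = Modules[I];
    if (Address >= M.ImageBase && Address - M.ImageBase < M.SizeOfImage)
      return ModuleRelativeAddress{
          I, static_cast<uint32_t>(Address - M.ImageBase)};
  }
  return std::nullopt;
}
```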
>
> One idea to test this would be to create a DLL called jitted_code.dll,
> give it a huuuuuge .text section (probably just a .asm file and use some
> assembly directives to allocate a very large series of null bytes), and
> then write your jit code into that area.  This way you would not need to
> modify the existing PDB; you would only need to make a new PDB called
> jitted_code.pdb with 1 module, and those symbols could have meaningful
> RVAs.  And you might not even need to detach the debugger if you do things
> this way, because you could just right click the jitted_code.dll module in
> the modules window and choose Load Symbols.
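To make the jitted_code.dll idea more concrete, here is a sketch of a bump allocator that hands out space inside the DLL's reserved .text region and reports each allocation's RVA, which is what the records in jitted_code.pdb would need. It assumes the region's base RVA and size have already been located at runtime (how to find them is left out), `ReservedTextArena` is a hypothetical name, and it ignores page protection: .text is normally mapped read/execute, so writing code into it at runtime would still require making those pages writable first.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical bump allocator over a region reserved inside a loaded DLL's
// .text section. Offsets are tracked relative to the start of that region.
class ReservedTextArena {
public:
  ReservedTextArena(uint64_t ImageBase, uint32_t TextRva, uint32_t TextSize)
      : ImageBase(ImageBase), TextRva(TextRva), TextSize(TextSize) {}

  // Reserves Size bytes aligned to Align (a power of two) and returns the
  // RVA of the block, or std::nullopt if the region is exhausted.
  std::optional<uint32_t> allocate(uint32_t Size, uint32_t Align = 16) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 &&
           "Align must be a power of two");
    uint64_t Start = (static_cast<uint64_t>(Next) + Align - 1) &
                     ~static_cast<uint64_t>(Align - 1);
    if (Start + Size > TextSize)
      return std::nullopt;
    Next = static_cast<uint32_t>(Start + Size);
    return TextRva + static_cast<uint32_t>(Start);
  }

  // Virtual address at which the code for a given RVA would be written.
  uint64_t addressOf(uint32_t Rva) const { return ImageBase + Rva; }

private:
  uint64_t ImageBase;
  uint32_t TextRva;
  uint32_t TextSize;
  uint32_t Next = 0; // first unused byte, relative to TextRva
};
```

Because every allocation stays inside the image, each jitted function gets a stable RVA relative to jitted_code.dll, which is the property the suggestion relies on.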
>
>
>
> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> Yes, that's it: I just make a copy of a PDB generated by link.exe (the
>> Microsoft one).
>> I do use llvm-pdbutil to compare, except that I do it with "-all",
>> and almost everything is the same: same number of streams, the section
>> map looks good, the string table looks good (even if not in the same
>> order), same number of modules, with the symbols and subsections
>> practically the same. This is why I'm stuck: I'm missing something, but I
>> can't see what because I don't know where to look. Visual Studio works
>> with it and I can debug my original exe, but probably without the
>> globals...
>> The other problem is that a difference between the dumps is not
>> necessarily a bug, because the builder may generate new hash values,
>> reorder streams, modules, etc.
>>
>> For now I've given up on the publics and globals streams and attacked the
>> real goal: inserting my JIT CodeView into the PDB. Again I have done
>> "something", but since I don't understand how the format works, I don't
>> have it working in Visual Studio... except once: a single time it worked
>> and the breakpoint turned on in the UI (even though the RVA was broken for
>> the instructions), but it only happened once, and I got depressed the
>> following times..... cvdump displays it all as "correct", with apparently
>> nothing corrupt. But what I do is probably wrong somewhere. I take
>> .debug$S and .debug$T as is, without relocations, just to see. What I
>> really don't know is: will Visual Studio consider a symbol file broken if
>> the address goes beyond the official module address range (the compiled
>> one)? My JIT code is allocated after the end of the module with
>> VirtualAlloc.
>> Another thing I don't get is the section contribution: what is it
>> exactly? I inserted section contributions for all sections except the
>> debug$ ones, but I don't know what I'm really doing, and that is my
>> general problem implementing this JIT feature...
>> I also don't know what relocations are inside the CodeView format, or what
>> the difference between an RVA and a relocation is. Is there anything to do
>> with them in the CodeView part I need to insert in the PDB? I don't see
>> why Visual Studio needs more than just an RVA<->line mapping...
>> It's really driving me crazy to be so ignorant and to have to guess what
>> Visual Studio does...
>>
>> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> So if I understand correctly, you're basically just trying to implement
>>> something like a pdb *copy*, just as a test to see if you can get it to
>>> work.  So you generate a PDB with cl/link or clang-cl/lld-link, then try to
>>> copy it using your tool, then see if it still works.
>>>
>>> If this is correct, and it's not working, then there is probably just
>>> something you didn't copy.  Neither Publics nor globals actually contain
>>> their own data, instead they just refer to records from the corresponding
>>> module stream.  So an S_PROCREF for the function "main" might have fields
>>> that say "the name of the function is main, and it's at offset 20 of module
>>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not actually
>>> an S_GPROC32 for the function main, then it will be broken.
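The point that globals only *refer* to module records can be checked mechanically. Below is a sketch with hypothetical, simplified types: `ToyProcRef` mirrors the idea of LLVM's `codeview::ProcRefSym`, whose module field is 1-based, but the real records are binary and carry more fields. A reference is valid only if the named module exists and a procedure record with the same name starts at exactly that offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class ToyKind { GProc32, LProc32, Other };

// Hypothetical, simplified record in a module's symbol stream.
struct ToyModuleSymbol {
  uint32_t Offset; // byte offset of the record within the module stream
  ToyKind Kind;
  std::string Name;
};

// Simplified S_PROCREF: it holds no procedure data, only where to find it.
struct ToyProcRef {
  uint16_t Module;    // 1-based module number
  uint32_t SymOffset; // offset of the procedure record in that module
  std::string Name;
};

using ToyModuleStream = std::vector<ToyModuleSymbol>;

// True if Ref points at a procedure record with the same name.
bool procRefResolves(const std::vector<ToyModuleStream> &Modules,
                     const ToyProcRef &Ref) {
  if (Ref.Module == 0 ||
      static_cast<std::size_t>(Ref.Module) > Modules.size())
    return false; // "there is no module N"
  for (const ToyModuleSymbol &S : Modules[Ref.Module - 1]) {
    if (S.Offset != Ref.SymOffset)
      continue;
    bool IsProc = S.Kind == ToyKind::GProc32 || S.Kind == ToyKind::LProc32;
    return IsProc && S.Name == Ref.Name;
  }
  return false; // no record starts at SymOffset
}
```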
>>>
>>> Did you also go through each module in the source PDB, add a new module
>>> in the target PDB, then copy all of the symbols for each one?
>>>
>>> The best way to find differences is by using llvm-pdbutil on the source
>>> and target PDBs and looking for things that look different.  For example,
>>> I'd start with llvm-pdbutil dump -streams and then see if they even have
>>> all the same streams.  If one of them is missing streams, that's a good
>>> place to start.  If they have the same streams, then look for ones where
>>> the size is different.  Then drill into those to see why the size is
>>> different.
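The comparison procedure above can be partly automated. This sketch assumes the two `llvm-pdbutil dump -streams` listings have already been parsed into (description, size) pairs; the parsing, the `diffStreams` name and the stream descriptions in the usage are illustrative, not llvm-pdbutil's exact output. Keying by description rather than stream index is a design choice, since a rebuilt PDB may number its streams differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stream description -> size in bytes, as read off a `dump -streams`
// listing. Keyed by description because stream numbers may differ.
using ToyStreamList = std::map<std::string, uint32_t>;

// Reports streams missing from Dst, streams whose size differs, and
// streams that only exist in Dst.
std::vector<std::string> diffStreams(const ToyStreamList &Src,
                                     const ToyStreamList &Dst) {
  std::vector<std::string> Report;
  for (const auto &Entry : Src) {
    auto It = Dst.find(Entry.first);
    if (It == Dst.end())
      Report.push_back("missing: " + Entry.first);
    else if (It->second != Entry.second)
      Report.push_back("size differs: " + Entry.first + " (" +
                       std::to_string(Entry.second) + " vs " +
                       std::to_string(It->second) + ")");
  }
  for (const auto &Entry : Dst)
    if (Src.count(Entry.first) == 0)
      Report.push_back("extra: " + Entry.first);
  return Report;
}
```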
>>>
>>> LMK if that helps.
>>>
>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> For now I'm not merging my JIT CodeView section; I'm only trying to build a
>>>> pure copy of an existing PDB using the XxxBuilder classes (PDBFileBuilder &
>>>> Co., reading a PDBFile) and checking whether Visual Studio will accept it.
>>>> For Publics and Globals, what I do is naive: I use the GSIStreamBuilder
>>>> and pray :)
>>>>
>>>>
>>>>
>>>>   // Globals: the records are already in their globals form (S_PROCREF,
>>>>   // S_GDATA32, ...), so they can be re-added as-is and the builder
>>>>   // rebuilds the hash table.
>>>>   if (File.hasPDBGlobalsStream()) {
>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>     // cantFail() checks each Expected<> before it is dereferenced.
>>>>     GlobalsStream &stream = cantFail(File.getPDBGlobalsStream());
>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>
>>>>     // Iterating the hash table yields offsets into the symbol records.
>>>>     for (uint32_t SymOff : stream.getGlobalsTable()) {
>>>>       CVSymbol Sym = SymbolRecords.readRecord(SymOff);
>>>>       builder.addGlobalSymbol(Sym);
>>>>     }
>>>>   }
>>>>   if (File.hasPDBPublicsStream()) {
>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>     PublicsStream &stream = cantFail(File.getPDBPublicsStream());
>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>
>>>>     // Every record referenced by the publics table is an S_PUB32.
>>>>     std::vector<PublicSym32> Publics;
>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>       PublicSym32 Pub = cantFail(
>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>       Publics.push_back(Pub);
>>>>     }
>>>>
>>>>     if (!Publics.empty()) {
>>>>       // Sort the public symbols by name and add them to the stream.
>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>                 [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>                   return L.Name < R.Name;
>>>>                 });
>>>>       for (const PublicSym32 &Pub : Publics)
>>>>         builder.addPublicSymbol(Pub);
>>>>     }
>>>>   }
>>>>
>>>> Is this what you meant?
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t just
>>>>> copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in the
>>>>> module stream, are S_PROCREF in the globals stream. Similarly, *everything*
>>>>> in the publics stream is S_PUB32. So you need to convert each symbol to the
>>>>> proper type for the stream it’s going to go in.
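A sketch of that conversion for the function case, using hypothetical simplified types rather than LLVM's record classes. It mirrors, as an assumption, what lld/COFF/PDB.cpp does: S_GPROC32 becomes S_PROCREF, S_LPROC32 becomes S_LPROCREF, and the module number stored in the reference is the 0-based module index plus one. The full list of symbol kinds that go into the globals stream is decided by lld's symbolGoesInGlobalsStream, and publics (S_PUB32) are not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

enum class ToyKind { GProc32, LProc32, ProcRef, LProcRef };

// Hypothetical simplified S_GPROC32 / S_LPROC32 from a module stream.
struct ToyProc {
  ToyKind Kind;
  std::string Name;
};

// Hypothetical simplified S_PROCREF / S_LPROCREF for the globals stream.
struct ToyProcRef {
  ToyKind Kind;
  uint16_t Module;    // module index + 1
  uint32_t SymOffset; // where the full record lives in that module
  std::string Name;
};

// Builds the globals-stream reference for a procedure found at SymOffset in
// module ModIndex (0-based). Non-procedure records are rejected here.
std::optional<ToyProcRef> makeGlobalRef(const ToyProc &Proc,
                                        uint16_t ModIndex,
                                        uint32_t SymOffset) {
  if (Proc.Kind != ToyKind::GProc32 && Proc.Kind != ToyKind::LProc32)
    return std::nullopt;
  assert(ModIndex < UINT16_MAX && "module number must fit in 16 bits");
  ToyProcRef Ref;
  Ref.Kind =
      Proc.Kind == ToyKind::GProc32 ? ToyKind::ProcRef : ToyKind::LProcRef;
  Ref.Module = static_cast<uint16_t>(ModIndex + 1);
  Ref.SymOffset = SymOffset;
  Ref.Name = Proc.Name;
  return Ref;
}
```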
>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>> wrote:
>>>>>
>>>>>> Publics are basically a list of everything that has a mangled name.
>>>>>> To be honest, I don’t know what the debugger uses this for.
>>>>>>
>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>> The reason this is important is because if you type “foo” in the watch
>>>>>> window, the debugger doesn’t necessarily know what compiland foo comes
>>>>>> from. So it has to have a way to find everything in the entire program no
>>>>>> matter what compiland it came from. That’s what the globals are.
>>>>>>
>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>> there might be a problem is that you need to rehash the entire table. When
>>>>>> you build your modified pdb, I would suggest starting with an empty publics
>>>>>> / globals stream, adding all items from the first pdb by iterating over
>>>>>> those records and using a GSIStreamBuilder, then adding all your jitted
>>>>>> items separately, then writing it out. That should make sure it gets hashed
>>>>>> correctly.
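Why "start empty, add everything, then write" matters can be shown with a toy hash table: each record's bucket depends on a hash of its name, so the bucket layout is only correct if it is computed over the complete set of records. This sketch uses FNV-1a as a stand-in (it is not the hash function PDB uses), and `ToyGsiBuilder` and `tableContains` are hypothetical names; in LLVM, GSIStreamBuilder does the real bucketing when the PDB is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a: a stand-in hash for illustration only.
uint32_t toyHash(const std::string &S) {
  uint32_t H = 2166136261u;
  for (unsigned char C : S) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

using ToyTable = std::vector<std::vector<std::string>>;

// Collects records first and assigns buckets only in finalize(), so every
// record (original or jitted) is placed by the same hash over the same
// bucket count.
class ToyGsiBuilder {
public:
  explicit ToyGsiBuilder(uint32_t NumBuckets) : NumBuckets(NumBuckets) {
    assert(NumBuckets > 0);
  }
  void add(const std::string &Name) { Pending.push_back(Name); }
  ToyTable finalize() const {
    ToyTable Table(NumBuckets);
    for (const std::string &Name : Pending)
      Table[toyHash(Name) % NumBuckets].push_back(Name);
    return Table;
  }

private:
  uint32_t NumBuckets;
  std::vector<std::string> Pending;
};

// Looks a name up the way a reader would: hash it, then scan one bucket.
bool tableContains(const ToyTable &Table, const std::string &Name) {
  const auto &Bucket = Table[toyHash(Name) % Table.size()];
  return std::find(Bucket.begin(), Bucket.end(), Name) != Bucket.end();
}
```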
>>>>>>
>>>>>> Are you doing that?
>>>>>>
>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>> symbolGoesInPublicsStream to see the logic it uses
>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <
>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Zachary, sorry to bother you again...
>>>>>>>
>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>> here and there...) and my converted PDB now seems to work in Visual
>>>>>>> Studio.
>>>>>>> But I'm not sure whether I have full debug features, because I can't
>>>>>>> manage to translate Publics and Globals correctly. cvdump says the PDB
>>>>>>> is corrupted, whereas llvm-pdbutil dump displays them correctly.
>>>>>>> I don't really understand what the Publics and Globals streams are:
>>>>>>> whether the symbols are really in the corresponding streams or whether
>>>>>>> they are just references to somewhere else.
>>>>>>> The LLVM documentation on the Publics and Globals streams is
>>>>>>> incomplete, so I'm a bit lost on how to handle them or how to find what
>>>>>>> is "corrupted" according to cvdump.
>>>>>>> I used LLD and yaml2pdb as examples for some tough conversions, but I
>>>>>>> noticed that yaml2pdb doesn't export any GSI stream (no use of
>>>>>>> GSIStreamBuilder and no reference to Publics or Globals anywhere). Is
>>>>>>> that intended/correct?
>>>>>>> Thanks, and sorry if I'm spamming a bit; this task takes 99% of my
>>>>>>> time right now, and being stuck without any clue is difficult :) But I
>>>>>>> guess you suffered even more when no documentation existed at all!
>>>>>>> Have a good day!
>>>>>>>
>>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet <
>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>
>>>>>>>> ERRATUM, my bad: the PDB I tested is also corrupted according to
>>>>>>>> cvdump.exe. I don't know why; I regenerated it and now I have a working
>>>>>>>> dump. You don't need to fix anything.
>>>>>>>>
>>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Zachary,
>>>>>>>>> As a first step, I've rewritten an existing PDBFile with
>>>>>>>>> PDBFileBuilder. Most of the work is done, but I don't get as much
>>>>>>>>> output as input (some streams are not mirrored for unknown reasons, and
>>>>>>>>> some data must be missing here and there...).
>>>>>>>>> When I replace the original with the rebuilt one for debugging, the
>>>>>>>>> PDB loads fine, but breakpoints fail to activate with an
>>>>>>>>> "unexpected symbol reader error while processing foobar.exe". You
>>>>>>>>> probably know what it means or have already encountered this error.
>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>> between the original and new PDB, but I get an error dumping the
>>>>>>>>> original PDB exported by Visual Studio with -all
>>>>>>>>> (PublicsStream.cpp|98). I think it is a bug.
>>>>>>>>> I've attached the related main.cpp and PDB to this email if you want
>>>>>>>>> to check exactly what the error is (VS2017; x86 and x64 have the same
>>>>>>>>> issues).
>>>>>>>>> I've also attached my code (git diff). I added an "identity" feature
>>>>>>>>> to pdbutil which uses the code I wrote to regenerate the input PDB.
>>>>>>>>> You can use it to see what I get so far.
>>>>>>>>> I've seen you recently added a fix related to FPO, but you say it's
>>>>>>>>> only for x86, so I don't think it would change anything, but who
>>>>>>>>> knows...
>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>> feedback, it's welcome, because I'm running out of ideas about what
>>>>>>>>> goes wrong.
>>>>>>>>> Thanks, I'm going back to digging into the PDB mysteries!
>>>>>>>>>
>>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet <
>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> OK! I just wanted to be sure I understood correctly.
>>>>>>>>>> Sorry for not replying right away; I wanted to try emitting
>>>>>>>>>> CodeView first before continuing the discussion, and it was time for
>>>>>>>>>> me to go to bed here.
>>>>>>>>>> I just tried it and it is very easy to switch to CodeView.
>>>>>>>>>> For those interested: you just have to set your target triple on the
>>>>>>>>>> llvm::Module used for JIT and then call
>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to tell the
>>>>>>>>>> AsmPrinter that this module prefers CodeView over DWARF.
>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so far.
>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>> re-export it "as is" to see whether I break something in Visual Studio.
>>>>>>>>>> If I succeed, that might be added as a feature to PDBFile or
>>>>>>>>>> PDBFileBuilder to simplify the process for other users.
>>>>>>>>>> I'll keep you posted.
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner <zturner at google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to do
>>>>>>>>>>> anything to convert the record from one format (DWARF) to another format
>>>>>>>>>>> (CodeView).  You will have a COFF object file either on disk (probably
>>>>>>>>>>> named foo.obj or something) or in memory.  And this object file will have a
>>>>>>>>>>> .debug$S section with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>>> records to the final PDB, but they will already be in the correct format
>>>>>>>>>>> that PDBFileBuilder expects, you won't need to convert them from DWARF
>>>>>>>>>>> (which is not trivial).
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's a good question. By default, when emitting the object
>>>>>>>>>>>> file I choose COFF, but it ends up embedding DWARF, not CodeView.
>>>>>>>>>>>> There is probably a way to do it, or at least it should be
>>>>>>>>>>>> implemented if it isn't yet.
>>>>>>>>>>>> Let's imagine I manage to do that. When you say there is nothing to
>>>>>>>>>>>> do, I still need a PDBFileBuilder to copy the CodeView data into the
>>>>>>>>>>>> EXE PDB, right? There is no easier way to insert it into the EXE PDB?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <
>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just compile something,
>>>>>>>>>>>>> we translate all of the LLVM metadata into CodeView in the file
>>>>>>>>>>>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>>>>>>>>>>>> If it's not hard to do, this would probably be a better solution, because
>>>>>>>>>>>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>>>>>>>>>>>> is not always trivial.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to do
>>>>>>>>>>>>> anything, you can just open the ObjectFile, look for the .debug$T and
>>>>>>>>>>>>> .debug$S sections, iterate over each one and re-write their TypeIndices
>>>>>>>>>>>>> while copying them to the output PDB file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don't care
>>>>>>>>>>>>>> about the PDB size, at least as a first step, so having duplicated
>>>>>>>>>>>>>> symbols won't be a problem for me. Concerning TypeIndices, my plan,
>>>>>>>>>>>>>> if possible, is not to generate a PDB for my JIT code and merge it,
>>>>>>>>>>>>>> but instead to extract debug info directly from a DWARFContext just
>>>>>>>>>>>>>> after the llvm::object::ObjectFile is emitted by the JIT engine, and
>>>>>>>>>>>>>> complete the EXE PDB that I rebuilt with PDBFileBuilder. Does that
>>>>>>>>>>>>>> sound like a good bet to you? If I succeed, I think it could be a
>>>>>>>>>>>>>> good extension to the debugging possibilities of MCJIT, if not an
>>>>>>>>>>>>>> extension to pdbutil.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <
>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big collection
>>>>>>>>>>>>>>> of types.  Presumably your JIT code will reuse some of the same types
>>>>>>>>>>>>>>> (perhaps, std::string for example) as your non-jitted code.  Your jitted
>>>>>>>>>>>>>>> symbol records in the object file (for example, a local variable of type
>>>>>>>>>>>>>>> std::string in your jitted code) will refer to the type for std;:string by
>>>>>>>>>>>>>>> a TypeIndex, and your original PDB will also refer to std::string by a
>>>>>>>>>>>>>>> different TypeIndex.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each object
>>>>>>>>>>>>>>> file, we keep a hash table of which types have already been seen, so that
>>>>>>>>>>>>>>> if we see the same type again, we can just use the TypeIndex that we wrote
>>>>>>>>>>>>>>> on a previous object file.  Then, when we add symbol records, we have to
>>>>>>>>>>>>>>> update its fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>>> instead.
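A sketch of the de-duplicating merge described above, with hypothetical simplified records: each type is reduced to a leaf description plus the type indices it references. References are remapped first, then the remapped record is looked up in a table of records already written, and only unseen records get a new index. Indices below 0x1000 are CodeView's simple (built-in) types and never move. The sketch assumes every record only refers to records that come before it; lld's real implementation works on serialized records and is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical simplified type record: a leaf description plus the type
// indices it refers to.
struct ToyType {
  std::string Leaf;
  std::vector<uint32_t> Refs;
};

class ToyTypeMerger {
public:
  // Merges one input's types (numbered from 0x1000 in that input) and
  // returns, for each input record, its index in the merged stream.
  std::vector<uint32_t> merge(const std::vector<ToyType> &Source) {
    std::vector<uint32_t> Map;
    for (const ToyType &T : Source) {
      ToyType Remapped = T;
      for (uint32_t &Ref : Remapped.Refs)
        Ref = remap(Ref, Map);
      std::string Key = serialize(Remapped);
      auto It = Seen.find(Key);
      if (It == Seen.end()) {
        // First time this exact record is seen: append it.
        uint32_t NewIndex =
            FirstNonSimpleIndex + static_cast<uint32_t>(Merged.size());
        Merged.push_back(Remapped);
        It = Seen.emplace(Key, NewIndex).first;
      }
      Map.push_back(It->second);
    }
    return Map;
  }

  std::size_t size() const { return Merged.size(); }

private:
  static uint32_t remap(uint32_t Old, const std::vector<uint32_t> &Map) {
    if (Old < FirstNonSimpleIndex)
      return Old; // simple types such as T_INT4 (0x74) are never renumbered
    assert(Old - FirstNonSimpleIndex < Map.size() &&
           "records may only refer to earlier records");
    return Map[Old - FirstNonSimpleIndex];
  }

  static std::string serialize(const ToyType &T) {
    std::string Key = T.Leaf;
    for (uint32_t Ref : T.Refs)
      Key += '|' + std::to_string(Ref);
    return Key;
  }

  std::vector<ToyType> Merged;
  std::map<std::string, uint32_t> Seen;
};
```

In the usage, the jitted std::string and the pointer to it collapse onto the EXE's existing indices 0x1000 and 0x1001, and only the new procedure type is appended.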
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly necessary,
>>>>>>>>>>>>>>> it will just keep your PDB size down.  But you *will* need to at least
>>>>>>>>>>>>>>> re-write the TypeIndexes from the jitted code.  For example, you may decide
>>>>>>>>>>>>>>> that instead of de-duplicating, you just append them all to the end of the
>>>>>>>>>>>>>>> TPI stream (where all the types go in PDB) to keep things simple.  Since
>>>>>>>>>>>>>>> they were in a different position before, they now have different
>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they are
>>>>>>>>>>>>>>> correct after the merge.   Both types and symbols can refer to types, so
>>>>>>>>>>>>>>> you will need to do this both for the types of the jitted code as well as
>>>>>>>>>>>>>>> the symbols of the jitted code.
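For the simpler "append everything" approach, the arithmetic is a constant shift: if the EXE PDB already holds N non-simple types, a jitted type at old index I (I >= 0x1000) lands at I + N, while simple indices below 0x1000 stay unchanged. The sketch applies that shift to every index field of hypothetical type and symbol records, since both kinds can refer to types. It ignores the split between the TPI and IPI (id) streams, which a real merge would have to handle separately.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr uint32_t FirstNonSimpleIndex = 0x1000;

// Index of a jitted type after all jitted types are appended behind the
// ExistingCount non-simple types already in the EXE PDB:
//   new = 0x1000 + ExistingCount + (old - 0x1000) = old + ExistingCount
uint32_t appendedIndex(uint32_t OldIndex, uint32_t ExistingCount) {
  if (OldIndex < FirstNonSimpleIndex)
    return OldIndex; // simple types are not stored in the stream
  assert(ExistingCount <= UINT32_MAX - OldIndex && "type index overflow");
  return OldIndex + ExistingCount;
}

// Hypothetical records: a type referring to other types, and a symbol
// (e.g. a local variable) referring to its type.
struct ToyTypeRecord {
  std::vector<uint32_t> RefIndices;
};
struct ToyLocalSymbol {
  std::string Name;
  uint32_t TypeIndex;
};

// Both types and symbols can refer to types, so both get rewritten.
void shiftAll(std::vector<ToyTypeRecord> &Types,
              std::vector<ToyLocalSymbol> &Symbols, uint32_t ExistingCount) {
  for (ToyTypeRecord &T : Types)
    for (uint32_t &Ref : T.RefIndices)
      Ref = appendedIndex(Ref, ExistingCount);
  for (ToyLocalSymbol &S : Symbols)
    S.TypeIndex = appendedIndex(S.TypeIndex, ExistingCount);
}
```

With 500 existing types, a jitted index of 0x1001 becomes 0x11F5, while a reference to T_INT4 (0x74) is left alone.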
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok I see..
>>>>>>>>>>>>>>>> what do you mean by “making sure to de-duplicate records as
>>>>>>>>>>>>>>>> necessary” ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <
>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates to
>>>>>>>>>>>>>>>>> a PDB (the file format is designed specifically with that in mind).  But
>>>>>>>>>>>>>>>>> this functionality was never added to the PDB library since lld doesn't
>>>>>>>>>>>>>>>>> support incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>>> build it using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but
>>>>>>>>>>>>>>>>> most of LLVM's File I/O abstractions are based around mmapping a file and
>>>>>>>>>>>>>>>>> writing to it, which doesn't work when you don't know the file size in
>>>>>>>>>>>>>>>>> advance.  So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Zachary !
>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder from
>>>>>>>>>>>>>>>>>> an existing PDBFile or can/should I do the translation myself ?
>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the EXE
>>>>>>>>>>>>>>>>>> PDB data and then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <
>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you, Zachary!
>>>>>>>>>>>>>>>>>>> I think I will have some soon...
>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more
>>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB API...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <
>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>>> questions you might have too even if it’s for your downstream project
>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have
>>>>>>>>>>>>>>>>>>>>> enough knowledge or time to provide the complete merge feature; it would
>>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge, as you describe it. Anyway, I could
>>>>>>>>>>>>>>>>>>>>> start by trying to do it in my JIT compiler and then, once I get something
>>>>>>>>>>>>>>>>>>>>> working (if that happens :)), I can come back to you with the piece of code
>>>>>>>>>>>>>>>>>>>>> and see whether it is worth integrating into pdbutil, and how.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to worry about
>>>>>>>>>>>>>>>>>>>>>> writing tests and going through code review
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose you
>>>>>>>>>>>>>>>>>>>>>>> mean using llvm/DebugInfo/PDB, picking code here and there to generate
>>>>>>>>>>>>>>>>>>>>>>> the PDB in memory, reading the executable's PDB and performing the merge
>>>>>>>>>>>>>>>>>>>>>>> directly in my JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary!
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time!
>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered from
>>>>>>>>>>>>>>>>>>>>>>>>> the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB
>>>>>>>>>>>>>>>>>>>>>>>>> stuff, I just read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to
>>>>>>>>>>>>>>>>>>>>>>>>> achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>> debugger
>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>    - export a COFF obj file with DWARF
>>>>>>>>>>>>>>>>>>>>>>>>> information and convert it with cv2pdb to obtain a PDB of my JIT
>>>>>>>>>>>>>>>>>>>>>>>>> symbols (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>    - export my JIT debug info directly to PDB
>>>>>>>>>>>>>>>>>>>>>>>>> (what I would like to do, if you have an idea how...)
>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable
>>>>>>>>>>>>>>>>>>>>>>>>> PDB (where things start to go bad...)
>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original PDB with the copy
>>>>>>>>>>>>>>>>>>>>>>>>> (creating a backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>>> executable (loading the new PDB version)
>>>>>>>>>>>>>>>>>>>>>>>>> - Debug the JIT code with Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from
>>>>>>>>>>>>>>>>>>>>>>>>> the original native executable PDB to avoid merge conflicts between the
>>>>>>>>>>>>>>>>>>>>>>>>> multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>>> think it would be nice if the PDB format supported some way of having a
>>>>>>>>>>>>>>>>>>>>>>>> symbol which simply refers to another PDB file, that way you could re-write
>>>>>>>>>>>>>>>>>>>>>>>> that PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>>> debugger tries to look up that symbol, it finds a record that tells it to
>>>>>>>>>>>>>>>>>>>>>>>> go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need to
>>>>>>>>>>>>>>>>>>>>>>>> do:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>>> unique name.  All symbols will go here.  llvm-pdbutil dump -modules shows
>>>>>>>>>>>>>>>>>>>>>>>> you the list.  Be careful about putting it at the end though, because
>>>>>>>>>>>>>>>>>>>>>>>> there's already one at the end called * LINKER * that is kind of special.
>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, you don't want to put it first because it means you will
>>>>>>>>>>>>>>>>>>>>>>>> have to do lots of fixups on the EXE PDB.  It's probably best to add it
>>>>>>>>>>>>>>>>>>>>>>>> right before the linker module, this has the least chance of breaking
>>>>>>>>>>>>>>>>>>>>>>>> anything.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>>> symbols.  You will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil already merges type information from the JIT PDB, so after
>>>>>>>>>>>>>>>>>>>>>>>> merging the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>>> the JIT PDB, but the symbol records will refer to the JIT PDB type
>>>>>>>>>>>>>>>>>>>>>>>> indices.  So these need to be fixed up.  LLD already has code to do this,
>>>>>>>>>>>>>>>>>>>>>>>> you can probably borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>>> (lld/COFF/PDB.cpp, search for mergeSymbolRecords)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>>> section map.  See LLD again for how to modify these.  Hopefully the object
>>>>>>>>>>>>>>>>>>>>>>>> file you exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>>> do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This
>>>>>>>>>>>>>>>>>>>>>>>> shouldn't be too hard, I think you can just iterate over them in the JIT
>>>>>>>>>>>>>>>>>>>>>>>> PDB and add them to the new EXE PDB.
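Points 1 and 3 interact, and this may also help with the question of what a section contribution is: the DBI stream keeps a list of records saying "bytes [Offset, Offset + Size) of section S came from module M", which is how tools can attribute an address to a compiland. Inserting the JIT module before the linker module shifts the linker module's index, so any contribution that pointed at it must be bumped. The sketch below uses hypothetical simplified types (LLVM's `pdb::SectionContrib` also carries characteristics and CRC fields), and the linker module's exact name is passed in as a parameter; it should be checked against `llvm-pdbutil dump -modules`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified DBI section contribution.
struct ToySectionContrib {
  uint16_t Section;     // 1-based section number in the image
  uint32_t Offset;      // start of the contribution within that section
  uint32_t Size;        // length in bytes
  uint16_t ModuleIndex; // 0-based index into the module list
};

// Inserts NewModule just before the linker module (or at the end if the
// last module is something else), renumbers the contributions that pointed
// at modules from that position on, and returns the new module's index.
uint16_t insertJitModule(std::vector<std::string> &Modules,
                         std::vector<ToySectionContrib> &Contribs,
                         const std::string &NewModule,
                         const std::string &LinkerModuleName) {
  std::size_t Pos = Modules.size();
  if (!Modules.empty() && Modules.back() == LinkerModuleName)
    Pos = Modules.size() - 1;
  assert(Pos < UINT16_MAX && "too many modules");
  Modules.insert(Modules.begin() + static_cast<std::ptrdiff_t>(Pos),
                 NewModule);
  for (ToySectionContrib &C : Contribs)
    if (C.ModuleIndex >= Pos)
      ++C.ModuleIndex;
  return static_cast<uint16_t>(Pos);
}

// Returns the module whose contribution covers Section:Offset, or -1.
int findOwningModule(const std::vector<ToySectionContrib> &Contribs,
                     uint16_t Section, uint32_t Offset) {
  for (const ToySectionContrib &C : Contribs)
    if (C.Section == Section && Offset >= C.Offset &&
        Offset - C.Offset < C.Size)
      return C.ModuleIndex;
  return -1;
}
```

After the JIT module is inserted, it still needs its own contribution covering the jitted code, which is the "merge in the new section contributions" step of point 3.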
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so this
>>>>>>>>>>>>>>>>>>>>>>>> is just a rough idea of what needs to be done.  There may be other issues
>>>>>>>>>>>>>>>>>>>>>>>> that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time to
>>>>>>>>>>>>>>>>>>>>>>>> work on this, but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>>> questions or problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>