[llvm-dev] [llvm-pdbutil] : merge not working properly

Vivien Millet via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 28 11:48:18 PST 2019

To be more precise about the crash context: it only happens if I type into
the "Watch" window a variable name that is unknown in the local context. For
example, I type "foobar", which does not exist in my program, and then Visual
Studio freezes (busy cursor), crashes/closes, and relaunches as usual. I
suspect the Globals/Publics stream data is wrong in this case, or at least
that there is a problem in my symbol records, but my method parameter
displays fine in the "Watch" window. If you have an idea...
Could it be a mangling problem in my symbol records? I don't use C++
mangling, so maybe parsing my symbol names triggers bugs.
Is there a C++ mangler in LLVM that I can use to produce correct names?

On Mon, Jan 28, 2019 at 8:22 PM Vivien Millet <vivien.millet at gmail.com>
wrote:

> Hello Zachary,
> Sorry for replying so late, but I have spent a week thinking and working
> hard on your "dll memory buffer" idea to see if it works and to give you
> feedback!
> And it works pretty well so far.
> Here is what I did, for the list:
> - Create a .ASM file full of "int 3" instructions (to ensure that if we
> execute past the boundaries we break instantly).
> - Compile this to a .DLL.
> - Use a hex editor to change the ".text" section Characteristics from
> Read/Execute to Read/Write/Execute.
> - Run my program, which does JIT compilation.
> - Get the start RVA of the .text section (which is always 0x1000 in my
> case).
> - Load the .DLL and use ModuleAddress+RVA as a memory buffer in a
> custom DllMemMgr that I give to MCJIT.
> - On NotifyObjectEmitted, replace the DLL's PDB with a custom one I build
> myself with your PDBFileBuilder.
> - On finalizing memory, first reload the DLL to trigger Visual Studio's PDB
> reloading (not working, I don't know why yet), ensure it is loaded at the
> same virtual address, and protect the memory using VirtualProtect.
> - Place a breakpoint in my JIT file (it displays "loaded"), execute the JIT
> code, and it breaks...
> ...and...
> * drums *
> Visual Studio CRASHES when I open the Watch window or Locals/Auto/etc.,
> and this happens every time. I don't know why.
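A minimal sketch of the "use the DLL section as the JIT buffer" step from the list above, assuming the section is already mapped writable and executable and that its base is at least as aligned as any requested alignment (sections are page-aligned). `FixedRegionAllocator` and its methods are hypothetical names, not part of MCJIT; a real memory manager handed to MCJIT would wrap something like this behind its allocation callbacks. It only illustrates the bump-allocation idea: carve aligned chunks out of one fixed region and report each chunk's offset, so that offset plus the section RVA (0x1000 in the case above) can be recorded in the PDB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hands out aligned chunks from one pre-reserved region
// (for example the DLL's section) instead of calling VirtualAlloc.
class FixedRegionAllocator {
public:
  FixedRegionAllocator(std::uint8_t *Base, std::size_t Size)
      : Base(Base), Size(Size), Used(0) {}

  // Returns nullptr when the region cannot satisfy the request.
  // Alignment must be a power of two.
  std::uint8_t *allocate(std::size_t Bytes, std::size_t Alignment) {
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    std::size_t Start = (Used + Alignment - 1) & ~(Alignment - 1);
    if (Start > Size || Bytes > Size - Start)
      return nullptr;
    Used = Start + Bytes;
    return Base + Start;
  }

  // Offset of a previously returned pointer from the start of the region.
  std::size_t offsetOf(const std::uint8_t *Ptr) const {
    assert(Ptr >= Base && Ptr <= Base + Size);
    return static_cast<std::size_t>(Ptr - Base);
  }

private:
  std::uint8_t *Base;
  std::size_t Size;
  std::size_t Used;
};
```

Because the region is fixed, running out of space is a normal outcome that the caller has to handle, for example by building the DLL with a larger section.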
> I noticed, when compiling the C++ equivalent of my JIT program, that a
> simple "int param" is written with size=20 in the C++ PDB and size=16 in my
> JIT PDB. Do you know what this "size" attribute represents in the S_LOCAL
> symbol section? I suspect the symbol section is the cause of the Watch
> issue, but I am not sure. If you have an idea...
> I also got an "illegal instruction" exception when stepping with F10 after
> a break, but when I don't break, the code runs well.
> A lot of mysteries there again...
> Visual Studio displays the disassembly correctly, with the debug lines in
> the right places, etc., so I don't get why Visual Studio crashes.
> Another issue is that I always have to remove and re-add my breakpoint so
> that Visual Studio really breaks, even though it says "I'm a good
> breakpoint at that good address". Is this related to file checksums? Mine
> seems to have a "none" checksum, so I suspect this is the problem, but I
> don't know how to fix it: I added the checksum with addChecksum and the
> right file name, and I still get "none" in the dump...
> So right now I'm quite hopeful because I get something reacting in Visual
> Studio, but I have no idea why it crashes.
> Have you already encountered this issue when testing your generated PDBs?
> Do you know the role of Section Contributions in the PDB/debugging session?
> Any tip for checking symbol record validity in the dump? It looks good to
> me: no ??? or Error anywhere.
> Thank you!
> On Wed, Jan 23, 2019 at 10:29 PM Zachary Turner <zturner at google.com>
> wrote:
>> .text is where code goes, I don't know why it's called .text, it's just
>> been that way for many decades and the name stuck around.  But actually you
>> can call the section whatever you want.  Maybe it's even better to call it
>> something other than .text, because .text is where your DllMain and other
>> stuff will be.  You could call it .jit if you wanted to.  You should be
>> able to create the section with whatever flags you want to.  You'll need to
>> produce a jit_code.obj probably compiled from assembly that makes a section
>> named .jit and sets the flags to be executable (you can just copy the flags
>> from a normal .text section of some other program).  Then link this file
>> together along with a jitted_code_main.obj which you compiled from a simple
>> source file with a DllMain function that does nothing.  This would make
>> jitted_code.dll, then have your program link against jitted_code.lib.
>> Right now you jit the code into some buffer that you created with
>> VirtualAlloc.  If you do the above, it will load jitted_code.dll into
>> memory and the OS loader will allocate some memory for each section.  So
>> this would be like your VirtualAlloc, you can just find the address of the
>> .jit section and use that buffer instead of the VirtualAlloc buffer as the
>> target address of your jit operations.
>> Again, this is just an idea, no promises it will work, but unfortunately
>> that's kind of the best you can do when dealing with closed source things,
>> just make guesses and hope for the best.
>> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
>> wrote:
>>> (Yes, you are right, this is my fault.)
>>> Regarding the string table, it seems to contain only file-related
>>> information in every PDB I am using, and it looks correct, but I will
>>> check it.
>>> I looked at the PDB.cpp code about checksums and tables. I copied some
>>> stuff and got things wrong according to cvdump, then I simplified the
>>> process of copying the table and it worked (cvdump finds the file
>>> matching each line, etc.), so I suspect this is also correct.
>>> All the streams look good, but I will check deeper!
>>> What you say about RVAs and modules seems right, and this is what I'm
>>> afraid of: doing all of this for nothing, or almost nothing.
>>> Your idea of a .text section in a separate DLL looks good, but will it
>>> be executable memory? .text is where static strings go, right?
>>> When you say putting my JIT code in there, do you mean writing it once
>>> jitted_code.dll is loaded in memory, or into the .dll file directly before
>>> loading it? In the first scenario I wonder whether the section will be
>>> executable; in the second scenario I can't do it because it would require
>>> perfect linking with the other code my JIT code points to.
>>> On Wed, Jan 23, 2019 at 8:57 PM Zachary Turner <zturner at google.com>
>>> wrote:
>>>> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got
>>>> taken off.  In general I try to keep the list on all emails, even if it's
>>>> extremely technical and specific, because someday someone else will try to
>>>> do this, and it'll be nice if they can read the whole thread).
>>>> I can think of a couple of things that might be wrong:
>>>> 1) If the string table is in a different order, then anything that
>>>> refers to the string table needs to be changed to refer to the new offset.
>>>> If the string "foo" is at offset 12 in the old PDB, but offset 15 in the
>>>> new PDB, then somewhere there is a record which is going to look at offset
>>>> 12 and expect to find something, and that will mess up.  The main place
>>>> this is important is in the File Checksums table, there is an entry that
>>>> says which file it is a checksum for, and that refers to the string table.
>>>> However, it's possible for certain symbol records to refer to the string
>>>> table too.  See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will
>>>> find some information about this.
>>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>>> of them show a reasonable description?  Are there any streams that say
>>>> (???)?  If so, that's a problem.
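Point 1 above can be illustrated with a small sketch, assuming a deliberately simplified string table: NUL-terminated strings packed into one buffer and referenced by byte offset. `StringBuffer` and `remapOffsets` are hypothetical names, not the LLVM PDB API, and the real PDB string table carries additional structure (such as a hash table) that is not modelled here. The sketch shows only the remapping step: every stored offset is resolved against the old buffer and re-inserted into the new one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a string table: strings referenced by byte offset.
struct StringBuffer {
  std::string Bytes;
  std::unordered_map<std::string, std::uint32_t> OffsetOf;

  // Inserts S if it is not present yet and returns its byte offset.
  std::uint32_t insert(const std::string &S) {
    auto It = OffsetOf.find(S);
    if (It != OffsetOf.end())
      return It->second;
    std::uint32_t Off = static_cast<std::uint32_t>(Bytes.size());
    Bytes += S;
    Bytes.push_back('\0');
    OffsetOf.emplace(S, Off);
    return Off;
  }

  // Reads the NUL-terminated string that starts at Off.
  std::string at(std::uint32_t Off) const {
    assert(Off < Bytes.size());
    return std::string(Bytes.c_str() + Off);
  }
};

// Rewrites stored offsets (for example the file-name field of each checksum
// entry) from the old buffer's numbering to the new buffer's numbering.
void remapOffsets(const StringBuffer &Old, StringBuffer &New,
                  std::vector<std::uint32_t> &Offsets) {
  for (std::uint32_t &Off : Offsets)
    Off = New.insert(Old.at(Off));
}
```

Any record that is copied without passing its string offsets through a step like this keeps pointing at the old position, which is the failure mode described above.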
>>>> > does visual studio will consider a symbol file broken if the address
>>>> > goes beyond the official module address range (the compiled one), because
>>>> > my JIT code is allocated after the end of the module with VirtualAlloc
>>>> That is a good question, and part of why my job is so difficult,
>>>> because I can't look at their code.  But I think the answer is "probably".
>>>> The debugger has to have some way to convert an address in your running
>>>> process into a symbol and offset, because that's how all debug info is
>>>> represented in the PDB.  So if there is no module, then there is no RVA
>>>> (because the R in RVA means relative, and what would it be relative to?).
>>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>>> give it a huuuuuge .text section (probably just a .asm file and use some
>>>> assembly directives to allocate a very large series of null bytes), and
>>>> then write your jit code into that area.  This way you would not need to
>>>> modify the existing PDB; you would only need to make a new PDB called
>>>> jitted_code.pdb with 1 module, and those symbols could have meaningful
>>>> RVAs.  And you might not even need to detach the debugger if you do things
>>>> this way, because you could just right click the jitted_code.dll module in
>>>> the modules window and choose Load Symbols.
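The "address to module plus RVA" argument above can be sketched as follows. This is only an illustration of the reasoning, not how Visual Studio is known to work (the message itself says that part is a guess); `LoadedModule`, `ModuleRva`, and `toRva` are hypothetical names. An address inside a loaded image maps to an RVA, while an address in a plain VirtualAlloc region belongs to no module and therefore has nothing to be relative to.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct LoadedModule {
  std::string Name;
  std::uint64_t Base; // load address of the image
  std::uint64_t Size; // size of the mapped image in bytes
};

struct ModuleRva {
  std::string Module;
  std::uint64_t Rva;
};

// Finds the module whose image contains Address and computes the address
// relative to that module's base. Returns false if no module covers it.
bool toRva(const std::vector<LoadedModule> &Modules, std::uint64_t Address,
           ModuleRva &Out) {
  for (const LoadedModule &M : Modules) {
    if (Address >= M.Base && Address - M.Base < M.Size) {
      Out = ModuleRva{M.Name, Address - M.Base};
      return true;
    }
  }
  return false;
}
```

With jitted code written into a section of jitted_code.dll, the lookup succeeds and yields an RVA that a jitted_code.pdb can describe.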
>>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
>>>> wrote:
>>>>> Yes, this is it: I just make a copy of a PDB generated by link.exe
>>>>> (the Microsoft one).
>>>>> Using llvm-pdbutil to compare is what I do, except I do it with "-all".
>>>>> And I get almost everything the same: same number of streams, the section
>>>>> map looks good, the string table looks good (even if not in the same
>>>>> order), same number of modules with practically the same symbols and
>>>>> subsections. This is why I am stuck: I am missing something, but I can't
>>>>> see what because I don't know where to look. Visual Studio works with it
>>>>> and I can debug my original exe, but probably without the globals...
>>>>> The other problem is that a difference between the dumps is not
>>>>> necessarily a bug, because the builder may generate new hash values,
>>>>> reorder streams, modules, etc.
>>>>> For now I have given up on having publics and globals streams and
>>>>> attacked the real goal: inserting my JIT CodeView into the PDB. Again I
>>>>> have done "something", but as I don't understand how the format works, I
>>>>> don't have it working in Visual Studio... except once: a single time it
>>>>> worked and the breakpoint turned on in the UI (even if the RVA was broken
>>>>> for the instructions), but it happened only once, and I got depressed the
>>>>> following times... cvdump displays it all as "correct", with no corrupt
>>>>> stuff apparently. But what I do is probably wrong somewhere. What I do is
>>>>> take .debug$S and .debug$T as is, without relocations, just to see. What
>>>>> I really don't know is: will Visual Studio consider a symbol file broken
>>>>> if the address goes beyond the official module address range (the
>>>>> compiled one)? My JIT code is allocated after the end of the module with
>>>>> VirtualAlloc.
>>>>> Another thing I don't get is the section contribution: what is it
>>>>> exactly? I inserted section contribs for all sections except the debug$
>>>>> ones, but I don't know what I'm really doing, and that is my usual problem
>>>>> in implementing this JIT feature...
>>>>> I also don't know what relocations are inside the CodeView format.
>>>>> What is the difference between an RVA and a relocation? Is there anything
>>>>> to do about this for the CodeView part I need to insert in the PDB? I
>>>>> don't see why Visual Studio needs more than just an RVA<->Line mapping.
>>>>> It is really driving me crazy to be so ignorant and to try to guess
>>>>> what Visual Studio does...
>>>>> On Mon, Jan 21, 2019 at 7:50 PM Zachary Turner <zturner at google.com>
>>>>> wrote:
>>>>>> So if I understand correctly, you're basically just trying to
>>>>>> implement something like a pdb *copy*, just as a test to see if you can get
>>>>>> it to work.  So you generate a PDB with cl/link or clang-cl/lld-link, then
>>>>>> try to copy it using your tool, then see if it still works.
>>>>>> If this is correct, and it's not working, then there is probably just
>>>>>> something you didn't copy.  Neither Publics nor globals actually contain
>>>>>> their own data, instead they just refer to records from the corresponding
>>>>>> module stream.  So an S_PROCREF for the function "main" might have fields
>>>>>> that say "the name of the function is main, and it's at offset 20 of module
>>>>>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not
>>>>>> actually an S_GPROC32 for the function main, then it will be broken.
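The consistency condition described above can be written down as a small check. This is a simplified model rather than the LLVM API: `ModuleStream`, `ProcRef`, and `resolves` are hypothetical, records are reduced to a kind and a name, and modules are counted from 1 only to follow the "module 1" wording of the message. Accepting S_LPROC32 as a valid target is also an assumption of the sketch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Kind { S_GPROC32, S_LPROC32, S_PROCREF, Other };

struct ModuleRecord {
  Kind K;
  std::string Name;
};

// One module stream, modelled as "record offset -> record".
using ModuleStream = std::map<std::uint32_t, ModuleRecord>;

// What a procedure reference in the globals stream says about its target.
struct ProcRef {
  std::string Name;
  std::uint32_t Module; // 1-based in this sketch
  std::uint32_t Offset; // offset of the target record in the module stream
};

// True if Ref points at an existing procedure record with the same name.
bool resolves(const ProcRef &Ref, const std::vector<ModuleStream> &Modules) {
  if (Ref.Module == 0 || Ref.Module > Modules.size())
    return false; // there is no such module
  const ModuleStream &M = Modules[Ref.Module - 1];
  auto It = M.find(Ref.Offset);
  if (It == M.end())
    return false; // no record starts at that offset
  const ModuleRecord &R = It->second;
  bool IsProc = R.K == Kind::S_GPROC32 || R.K == Kind::S_LPROC32;
  return IsProc && R.Name == Ref.Name;
}
```

Running a check of this shape over every reference in a copied PDB is one way to find references that went stale because module streams were reordered or rewritten.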
>>>>>> Did you also go through each module in the source PDB, add a new
>>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>>> source and target PDBs and looking for things that look different.  For
>>>>>> example, I'd start with llvm-pdbutil dump -streams and then seeing if they
>>>>>> even have all the same streams.  If one of them is missing streams, that's
>>>>>> a good place to start.  If they have the same streams, then look for ones
>>>>>> where the size is different.  Then drill into those to see why the size is
>>>>>> different.
>>>>>> LMK if that helps.
>>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <
>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>> For now I'm not merging my JIT CodeView section; I am only trying to
>>>>>>> build a pure copy of an existing PDB using the XxxBuilder classes
>>>>>>> (PDBFileBuilder & Co, reading a PDBFile) and checking whether Visual
>>>>>>> Studio wants to eat it.
>>>>>>> For Publics and Globals, what I do is naive: I use the
>>>>>>> GSIStreamBuilder and pray :)
>>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>>     }
>>>>>>>   }
>>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>>       Publics.push_back(Pub);
>>>>>>>     }
>>>>>>>     if (!Publics.empty()) {
>>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>>            [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>>              return L.Name < R.Name;
>>>>>>>            });
>>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>>     }
>>>>>>>   }
>>>>>>> Is this what you meant?
>>>>>>> On Mon, Jan 21, 2019 at 6:50 PM Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t
>>>>>>>> just copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in
>>>>>>>> the module stream, are S_PROCREF in the globals stream. Similarly,
>>>>>>>> *everything* in the publics stream is S_PUB32. So you need to convert each
>>>>>>>> symbol to the proper type for the stream it’s going to go in
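A sketch of the conversion described above, in a simplified model rather than the LLVM API: `SymKind`, `ModuleSymbol`, `GlobalEntry`, and `toGlobalEntry` are hypothetical. It follows the wording of the message, where both S_GPROC32 and S_LPROC32 become S_PROCREF; a real linker may use a distinct reference kind for local procedures, so treat that mapping as an assumption. Other kinds are passed through unchanged here.

```cpp
#include <cstdint>
#include <string>

enum class SymKind { S_GPROC32, S_LPROC32, S_PROCREF, S_GDATA32 };

// A record as it appears in a module stream (reduced to kind and name).
struct ModuleSymbol {
  SymKind Kind;
  std::string Name;
};

// The entry written to the globals stream for that record.
struct GlobalEntry {
  SymKind Kind;
  std::string Name;
  std::uint32_t Module; // module that holds the real record
  std::uint32_t Offset; // offset of the real record in that module stream
};

// Procedures are not copied into the globals stream; they are replaced by a
// reference that names the module and offset of the full record.
GlobalEntry toGlobalEntry(const ModuleSymbol &Sym, std::uint32_t Module,
                          std::uint32_t Offset) {
  SymKind Kind = Sym.Kind;
  if (Kind == SymKind::S_GPROC32 || Kind == SymKind::S_LPROC32)
    Kind = SymKind::S_PROCREF;
  return GlobalEntry{Kind, Sym.Name, Module, Offset};
}
```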
>>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>>> wrote:
>>>>>>>>> Publics are basically a list of everything that has a mangled
>>>>>>>>> name. To be honest, I don’t know what the debugger uses this for.
>>>>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>>>>> The reason this is important is because if you type “foo” in the watch
>>>>>>>>> window, the debugger doesn’t necessarily know what compiland foo comes
>>>>>>>>> from. So it has to have a way to find everything in the entire program no
>>>>>>>>> matter what compiland it came from. That’s what the globals are.
>>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>>> there might be a problem is that you need to rehash the entire table. When
>>>>>>>>> you build your modified pdb, I would suggest starting with an empty publics
>>>>>>>>> / globals stream, adding all items from the first pdb by iterating over
>>>>>>>>> those records and using a GSIStreamBuilder, then adding all your jitted
>>>>>>>>> items separately, then writing it out. That should make sure it gets hashed
>>>>>>>>> correctly.
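The "start empty, add everything, let the builder hash it" advice can be sketched with a standard container standing in for the on-disk hash table. `GlobalRef`, `GlobalsIndex`, and `rebuild` are hypothetical names, and `std::unordered_multimap` is not the PDB hash format; the point is only that the table is rebuilt from the records instead of copying old buckets, and that a name can then be found without knowing its compiland.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct GlobalRef {
  std::string Name;
  std::uint32_t Module;
  std::uint32_t Offset;
};

// Name-keyed index over all global records, whatever compiland they are from.
class GlobalsIndex {
public:
  void add(const GlobalRef &R) { ByName.emplace(R.Name, R); }

  // Returns every record with this name; empty if the name is unknown.
  std::vector<GlobalRef> lookup(const std::string &Name) const {
    std::vector<GlobalRef> Out;
    auto Range = ByName.equal_range(Name);
    for (auto It = Range.first; It != Range.second; ++It)
      Out.push_back(It->second);
    return Out;
  }

private:
  std::unordered_multimap<std::string, GlobalRef> ByName;
};

// Builds a fresh index from the original records plus the jitted ones, so
// every entry is hashed by the same code path.
GlobalsIndex rebuild(const std::vector<GlobalRef> &FromExe,
                     const std::vector<GlobalRef> &FromJit) {
  GlobalsIndex Index;
  for (const GlobalRef &R : FromExe)
    Index.add(R);
  for (const GlobalRef &R : FromJit)
    Index.add(R);
  return Index;
}
```

A lookup for a name that does not exist simply returns nothing in this model, which is the behaviour one would hope for when typing an unknown name in a watch window.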
>>>>>>>>> Are you doing that?
>>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>>> symbolGoesInPublicsStream to see the logic it uses
>>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <
>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>> Hi Zachary, sorry for disturbing you again.
>>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>>> here and there) and my converted PDB now seems to work inside Visual
>>>>>>>>>> Studio.
>>>>>>>>>> But I'm not sure I have full debug features, because I haven't
>>>>>>>>>> managed to translate Publics and Globals correctly. CVDump says the PDB
>>>>>>>>>> is corrupted, whereas PDBUTIL -dump displays them correctly.
>>>>>>>>>> I don't really understand what the Publics and Globals streams really
>>>>>>>>>> are: whether the symbols are really in the corresponding streams or
>>>>>>>>>> whether they are just references to somewhere else.
>>>>>>>>>> The LLVM documentation is not complete about these two streams
>>>>>>>>>> (Publics and Globals), so I'm a bit lost on how to handle them or find
>>>>>>>>>> what is "corrupted" according to CVDump.
>>>>>>>>>> I used LLD and yaml2pdb as examples to help me with some tough
>>>>>>>>>> conversions, but I noticed that yaml2pdb exports no GSI stream (no
>>>>>>>>>> GSI builder use and no reference to Publics or Globals anywhere).
>>>>>>>>>> Is that intended/correct?
>>>>>>>>>> Thanks, and sorry if I'm spamming a bit; this is 99% of my time
>>>>>>>>>> right now, and being stuck without any clue is difficult :) But I guess
>>>>>>>>>> you experienced even more suffering when documentation didn't exist at
>>>>>>>>>> all!
>>>>>>>>>> Have a good day!
>>>>>>>>>> On Sun, Jan 20, 2019 at 10:27 PM Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>> ERRATUM: my bad, the PDB I tested is also corrupted according to
>>>>>>>>>>> cvdump.exe. I don't know why; I regenerated it and now I have a working
>>>>>>>>>>> dump. You don't need to fix anything.
>>>>>>>>>>> On Sun, Jan 20, 2019 at 8:26 PM Vivien Millet <
>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>>> I've taken a first step toward rewriting an existing PDBFile with
>>>>>>>>>>>> PDBFileBuilder. I have most of the work done, but I don't get as much
>>>>>>>>>>>> output as input (some streams are not mirrored for unknown reasons, and
>>>>>>>>>>>> some data must be missing here and there...).
>>>>>>>>>>>> When I try to replace the original with the rebuilt one for
>>>>>>>>>>>> debugging, the PDB loads well but breakpoints fail to activate, with an
>>>>>>>>>>>> "unexpected symbol reader error while processing foobar.exe". I guess
>>>>>>>>>>>> you know what it means or have already encountered this error.
>>>>>>>>>>>> I also tried to create a minimal program to simplify
>>>>>>>>>>>> comparisons between the original and new PDBs, but I get an error when
>>>>>>>>>>>> dumping the original PDB exported by Visual Studio with -all
>>>>>>>>>>>> (PublicsStream.cpp|98). I think it is a bug.
>>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>>> want to check exactly what the error is (vs2017; x86 and x64 have the
>>>>>>>>>>>> same issues).
>>>>>>>>>>>> I've also attached my code (git diff). I added an "identity"
>>>>>>>>>>>> feature to pdbutil, which uses the code I wrote to regenerate the input
>>>>>>>>>>>> PDB. You can use it to see what I have so far.
>>>>>>>>>>>> I've seen that you recently added a fix related to FPO, but you say
>>>>>>>>>>>> it's only for x86, so I don't think it would change anything, but who
>>>>>>>>>>>> knows...
>>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give
>>>>>>>>>>>> me feedback, it's welcome, because I am out of ideas about what goes
>>>>>>>>>>>> wrong.
>>>>>>>>>>>> Thanks, I'm going back to digging into the PDB mysteries!
>>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31 PM Vivien Millet <
>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>> OK! It was just to be sure I understood correctly.
>>>>>>>>>>>>> Sorry for not replying right away; I wanted to try emitting
>>>>>>>>>>>>> CodeView first before continuing the discussion, and it was time for
>>>>>>>>>>>>> me to go to bed here.
>>>>>>>>>>>>> I just tried it now and it is very easy to switch to CodeView.
>>>>>>>>>>>>> For those interested: you just have to give your TargetTriple to the
>>>>>>>>>>>>> llvm::Module used for JIT and then call
>>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to tell the
>>>>>>>>>>>>> AsmPrinter that this module prefers CodeView over DWARF.
>>>>>>>>>>>>> I've checked the content of my .obj file, and there are valid
>>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so far.
>>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>>> re-export it "as is" to see if I break something in Visual Studio.
>>>>>>>>>>>>> If I manage to do that, it might be added as a feature to
>>>>>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other users.
>>>>>>>>>>>>> I'll keep you posted.
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 8:50 PM Zachary Turner <
>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to another format
>>>>>>>>>>>>>> (CodeView).  You will have a COFF object file either on disk (probably
>>>>>>>>>>>>>> named foo.obj or something) or in memory.  And this object file will have a
>>>>>>>>>>>>>> .debug$S section with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>>>>>> records to the final PDB, but they will already be in the correct format
>>>>>>>>>>>>>> that PDBFileBuilder expects, you won't need to convert them from DWARF
>>>>>>>>>>>>>> (which is not trivial).
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>> That's a good question. By default, when emitting the object
>>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in the end.
>>>>>>>>>>>>>>> There is probably a way to do it, or at least it must be implemented
>>>>>>>>>>>>>>> if it isn't yet.
>>>>>>>>>>>>>>> Let's imagine I manage to do that. When you say there is
>>>>>>>>>>>>>>> nothing to do, I still need a PDBFileBuilder to copy the CodeView data
>>>>>>>>>>>>>>> into the EXE PDB, right? I cannot easily insert it into the EXE PDB
>>>>>>>>>>>>>>> some other way?
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 8:01 PM Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug
>>>>>>>>>>>>>>>> info generator to MCJIT?  If you're not jitting, and you just compile
>>>>>>>>>>>>>>>> something, we translate all of the LLVM metadata into CodeView in the file
>>>>>>>>>>>>>>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>>>>>>>>>>>>>>> If it's not hard to do, this would probably be a better solution, because
>>>>>>>>>>>>>>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>>>>>>>>>>>>>>> is not always trivial.
>>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to
>>>>>>>>>>>>>>>> do anything, you can just open the ObjectFile, look for the .debug$T and
>>>>>>>>>>>>>>>> .debug$S sections, iterate over each one and re-write their TypeIndices
>>>>>>>>>>>>>>>> while copying them to the output PDB file.
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don't care
>>>>>>>>>>>>>>>>> about the PDB size, at least as a first step, so having duplicated
>>>>>>>>>>>>>>>>> symbols won't be a problem for me. Concerning TypeIndices, my plan, if
>>>>>>>>>>>>>>>>> possible, is not to generate a PDB for my JIT code and merge it, but
>>>>>>>>>>>>>>>>> instead to extract debug info directly from a DwarfContext just after
>>>>>>>>>>>>>>>>> the llvm::object::ObjectFile is emitted by the JIT engine, and complete
>>>>>>>>>>>>>>>>> the EXE PDB I rebuilt with PDBFileBuilder. Does that sound like a good
>>>>>>>>>>>>>>>>> bet to you? If I succeed in doing that, I think it could be a good
>>>>>>>>>>>>>>>>> extension to the debugging possibilities of MCJIT, if not an extension
>>>>>>>>>>>>>>>>> to pdbutil.
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 7:37 PM Zachary Turner <
>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big
>>>>>>>>>>>>>>>>>> collection of types.  Presumably your JIT code will reuse some of the same
>>>>>>>>>>>>>>>>>> types (perhaps, std::string for example) as your non-jitted code.  Your
>>>>>>>>>>>>>>>>>> jitted symbol records in the object file (for example, a local variable of
>>>>>>>>>>>>>>>>>> type std::string in your jitted code) will refer to the type for
>>>>>>>>>>>>>>>>>> std::string by a TypeIndex, and your original PDB will also refer to
>>>>>>>>>>>>>>>>>> std::string by a different TypeIndex.
>>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each
>>>>>>>>>>>>>>>>>> object file, we keep a hash table of which types have already been seen, so
>>>>>>>>>>>>>>>>>> that if we see the same type again, we can just use the TypeIndex that we
>>>>>>>>>>>>>>>>>> wrote on a previous object file.  Then, when we add symbol records, we have
>>>>>>>>>>>>>>>>>> to update its fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>>>>>> instead.
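The "hash table of already seen types" idea above, as a minimal sketch. Type records are modelled as opaque byte strings and `MergedTypeTable` is a hypothetical name; this is not LLD's implementation. The only format fact relied on is that CodeView type indices below 0x1000 are reserved for simple built-in types, so the first record in the table gets index 0x1000.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Merged type table: identical records share one TypeIndex.
class MergedTypeTable {
public:
  // Returns the index of Record in the merged table, adding it if unseen.
  // Records are expected to have had their own embedded TypeIndices
  // rewritten already, so that equal types compare equal byte for byte.
  std::uint32_t insert(const std::string &Record) {
    auto It = Seen.find(Record);
    if (It != Seen.end())
      return It->second; // seen before: reuse the earlier index
    std::uint32_t Index =
        FirstIndex + static_cast<std::uint32_t>(Records.size());
    Records.push_back(Record);
    Seen.emplace(Record, Index);
    return Index;
  }

  std::size_t size() const { return Records.size(); }

private:
  static constexpr std::uint32_t FirstIndex = 0x1000;
  std::vector<std::string> Records;
  std::unordered_map<std::string, std::uint32_t> Seen;
};
```

The value returned by `insert` is what a symbol record would then store in place of the index it carried in its object file.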
>>>>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly
>>>>>>>>>>>>>>>>>> necessary, it will just keep your PDB size down.  But you *will* need to at
>>>>>>>>>>>>>>>>>> least re-write the TypeIndexes from the jitted code.  For example, you may
>>>>>>>>>>>>>>>>>> decide that instead of de-duplicating, you just append them all to the end
>>>>>>>>>>>>>>>>>> of the TPI stream (where all the types go in PDB) to keep things simple.
>>>>>>>>>>>>>>>>>> Since they were in a different position before, they now have different
>>>>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they are
>>>>>>>>>>>>>>>>>> correct after the merge.   Both types and symbols can refer to types, so
>>>>>>>>>>>>>>>>>> you will need to do this both for the types of the jitted code as well as
>>>>>>>>>>>>>>>>>> the symbols of the jitted code.
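For the simpler "append everything to the end of the TPI stream" option, the rewrite reduces to adding a constant to every non-simple index. A minimal sketch under that assumption; `remapTypeIndex`, `LocalSymbol`, and `remapSymbols` are hypothetical names, and a real record can hold several TypeIndex fields that would all need the same treatment.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Indices below this value denote simple built-in types in CodeView.
constexpr std::uint32_t FirstNonSimpleIndex = 0x1000;

// Maps a TypeIndex from the jitted object's numbering to its value after the
// object's types were appended behind ExistingTypeCount types in the PDB.
std::uint32_t remapTypeIndex(std::uint32_t Index,
                             std::uint32_t ExistingTypeCount) {
  if (Index < FirstNonSimpleIndex)
    return Index; // simple types keep their fixed indices
  return Index + ExistingTypeCount;
}

// A symbol reduced to the one field that matters here.
struct LocalSymbol {
  std::uint32_t Type;
  std::string Name;
};

// Applies the same shift to every symbol of the jitted code.
void remapSymbols(std::vector<LocalSymbol> &Symbols,
                  std::uint32_t ExistingTypeCount) {
  for (LocalSymbol &S : Symbols)
    S.Type = remapTypeIndex(S.Type, ExistingTypeCount);
}
```

The same function would have to be applied to the TypeIndex fields inside the appended type records themselves, since types refer to other types.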
>>>>>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> OK, I see.
>>>>>>>>>>>>>>>>>>> What do you mean by “making sure to de-duplicate records
>>>>>>>>>>>>>>>>>>> as necessary”?
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 7:09 PM Zachary Turner <
>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates
>>>>>>>>>>>>>>>>>>>> to a PDB (the file format is designed specifically with that in mind).  But
>>>>>>>>>>>>>>>>>>>> this functionality was never added to the PDB library since lld doesn't
>>>>>>>>>>>>>>>>>>>> support incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>>>>>> build it using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but
>>>>>>>>>>>>>>>>>>>> most of LLVM's File I/O abstractions are based around mmapping a file and
>>>>>>>>>>>>>>>>>>>> writing to it, which doesn't work when you don't know the file size in
>>>>>>>>>>>>>>>>>>>> advance.  So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Zachary!
>>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder
>>>>>>>>>>>>>>>>>>>>> from an existing PDBFile, or can/should I do the translation myself?
>>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the
>>>>>>>>>>>>>>>>>>>>> EXE PDB data and then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 11:41 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Thank you Zachary!
>>>>>>>>>>>>>>>>>>>>>> I will have some soon, I think.
>>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more
>>>>>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB API.
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 10:51 PM Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>>>>>> questions you might have too even if it’s for your downstream project
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> I would be up for improving pdbutil, but I doubt I have
>>>>>>>>>>>>>>>>>>>>>>>> enough knowledge or time to provide the complete merge feature; it would
>>>>>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge as you describe it. Anyway, I could
>>>>>>>>>>>>>>>>>>>>>>>> start by trying to do it in my JIT compiler and then, once I get something
>>>>>>>>>>>>>>>>>>>>>>>> working (if that happens :)), I can come back to you with the piece of code
>>>>>>>>>>>>>>>>>>>>>>>> and see whether and how it is worth integrating into pdbutil?
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 10:12 PM Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to worry about
>>>>>>>>>>>>>>>>>>>>>>>>> writing tests and going through code review
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips !
>>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this I suppose
>>>>>>>>>>>>>>>>>>>>>>>>>> you think about using llvm/debuginfo/pdb, pick code here and there to
>>>>>>>>>>>>>>>>>>>>>>>>>> generate the pdb in memory, read the executable one and perform the merge
>>>>>>>>>>>>>>>>>>>>>>>>>> directly in my jit compiler, right ? Not using pdbutil ?
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered
>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>> stuff, I just read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>>>>> debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - export a COFF object file with DWARF
>>>>>>>>>>>>>>>>>>>>>>>>>>>> information and convert it with cv2pdb to obtain a PDB of my JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>>> symbols (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - export my JIT debug info directly to PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (what I would like to do, if you have an idea how)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> executable's PDB (where things start to go bad)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original executable with the copy
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (creating a backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>>>>>> executable (loading the new PDB version)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Debug the JIT code with Visual Studio.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the original native executable PDB to avoid merge conflicts between the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>>>>>> think it would be nice if the PDB format supported some way of having a
>>>>>>>>>>>>>>>>>>>>>>>>>>> symbol which simply refers to another PDB file, that way you could re-write
>>>>>>>>>>>>>>>>>>>>>>>>>>> that PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>>>>>> debugger tries to look up that symbol, it finds a record that tells it to
>>>>>>>>>>>>>>>>>>>>>>>>>>> go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need
>>>>>>>>>>>>>>>>>>>>>>>>>>> to do:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>>>>>> unique name.  All symbols will go here.  llvm-pdbutil dump -modules shows
>>>>>>>>>>>>>>>>>>>>>>>>>>> you the list.  Be careful about putting it at the end though, because
>>>>>>>>>>>>>>>>>>>>>>>>>>> there's already one at the end called * LINKER * that is kind of special.
>>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, you don't want to put it first because it means you will
>>>>>>>>>>>>>>>>>>>>>>>>>>> have to do lots of fixups on the EXE PDB.  It's probably best to add it
>>>>>>>>>>>>>>>>>>>>>>>>>>> right before the linker module, this has the least chance of breaking
>>>>>>>>>>>>>>>>>>>>>>>>>>> anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>>>>>> symbols.  You will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil already merges type information from the JIT PDB, so after
>>>>>>>>>>>>>>>>>>>>>>>>>>> merging the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>>>>>> the JIT PDB, but the symbol records will refer to the JIT PDB type
>>>>>>>>>>>>>>>>>>>>>>>>>>> indices.  So these need to be fixed up.  LLD already has code to do this,
>>>>>>>>>>>>>>>>>>>>>>>>>>> you can probably borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>>>>>> (lld/COFF/PDB.cpp, search for mergeSymbolRecords)
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>>>>>> section map.  See LLD again for how to modify these.  Hopefully the object
>>>>>>>>>>>>>>>>>>>>>>>>>>> file you exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>>>>>> do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This
>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldn't be too hard, I think you can just iterate over them in the JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB and add them to the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so
>>>>>>>>>>>>>>>>>>>>>>>>>>> this is just a rough idea of what needs to be done.  There may be other
>>>>>>>>>>>>>>>>>>>>>>>>>>> issues that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time
>>>>>>>>>>>>>>>>>>>>>>>>>>> to work on this, but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>>>>>> questions or problems along the way.
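
The type index fixup described in step 2 above can be sketched as follows.
This is a minimal, standalone C++ illustration, not LLD's mergeSymbolRecords.
SymbolRecord, TypeIndexMap and remapTypeIndices are hypothetical names
introduced only for this example, and it assumes you already have a map from
JIT PDB type indices to the indices the same types received in the merged EXE
PDB. A real CodeView symbol record is a byte blob whose type index offsets
depend on the record kind, so a real implementation would have to locate those
fields first.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a CodeView symbol record. Here the
// type indices are already extracted into a plain vector.
struct SymbolRecord {
  std::vector<uint32_t> typeIndices; // numbered as in the JIT PDB
};

// CodeView type indices below 0x1000 denote built-in (simple) types. They
// mean the same thing in every PDB, so they are never remapped.
constexpr uint32_t kFirstNonSimpleIndex = 0x1000;

// Hypothetical map: JIT PDB type index -> merged EXE PDB type index.
using TypeIndexMap = std::unordered_map<uint32_t, uint32_t>;

// Rewrites every non-simple type index in `records` to its EXE PDB value.
// Returns false, leaving `records` untouched, if any index has no mapping.
bool remapTypeIndices(std::vector<SymbolRecord> &records,
                      const TypeIndexMap &jitToExe) {
  std::vector<SymbolRecord> remapped = records; // work on a copy
  for (SymbolRecord &rec : remapped) {
    for (uint32_t &ti : rec.typeIndices) {
      if (ti < kFirstNonSimpleIndex)
        continue; // simple type, identical in both PDBs
      auto it = jitToExe.find(ti);
      if (it == jitToExe.end())
        return false; // type was not merged: refuse to emit a bad record
      ti = it->second;
    }
  }
  records = std::move(remapped);
  return true;
}
```

Working on a copy keeps the fixup all-or-nothing, so a missing mapping cannot
leave the module's symbol stream half rewritten.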