[llvm-dev] [llvm-pdbutil] : merge not working properly

Mon Jan 28 11:22:56 PST 2019

Hello Zachary,
Sorry for the late reply. I have spent the past week working hard on your
"DLL memory buffer" idea to see whether it works, so that I could give you
feedback.
So far it works pretty well.
Here is what I did, for the list:
- Create a .ASM file full of "int 3" instructions (so that if execution runs
past the boundaries we break instantly).
- Compile it to a .DLL.
- Use a hex editor to change the ".text" section Characteristics from
Read/Execute to Read/Write/Execute.
- Run my program, which does JIT compilation.
- Get the start RVA of the .text section (always 0x1000 in my case).
- Load the .DLL and use ModuleAddress+RVA as the memory buffer in a
custom DllMemMgr that I give to MCJIT.
- On NotifyObjectEmitted, replace the DLL's PDB with a custom one that I build
myself with your PDBFileBuilder.
- On finalizing memory, first reload the DLL to trigger Visual Studio's PDB
reloading (not working, I don't know why yet), make sure it lands in the same
virtual address range, and protect the memory using VirtualProtect.
- Place a breakpoint in my JIT file (it displays as "loaded"), execute the JIT
code, and it breaks
...and ....
* drums *
Visual Studio CRASHES when I open the Watch window or Locals/Autos/etc.,
every single time, and I don't know why.
When compiling the C++ equivalent of my JIT program, I noticed that a simple
"int param" is written with size=20 in the C++ PDB and size=16 in my JIT PDB.
Do you know what this "size" attribute represents in the S_LOCAL symbol
record? I suspect the symbol section is responsible for the Watch issue, but I
am not sure. If you have an idea...
I also got an "illegal instruction" exception when stepping with F10 after
the break, but when I don't break, the code runs fine.

A lot of mysteries there again...
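
For readers following the steps above, the "use ModuleAddress+RVA as a memory
buffer" part can be sketched roughly as follows. This is only a minimal
illustration, not the actual DllMemMgr: FixedRegionAllocator and its methods
are hypothetical names, and any byte array can stand in for the loaded
section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hands out aligned chunks from a fixed, pre-existing
// buffer (standing in for ModuleAddress + RVA of the DLL's code section).
class FixedRegionAllocator {
public:
  FixedRegionAllocator(std::uint8_t *Base, std::size_t Size)
      : Base(Base), Size(Size), Used(0) {}

  // Returns nullptr when the request does not fit, so nothing is ever
  // written past the end of the section. Align must be a power of two.
  std::uint8_t *allocate(std::size_t Bytes, std::size_t Align) {
    std::uintptr_t BaseAddr = reinterpret_cast<std::uintptr_t>(Base);
    std::uintptr_t Start = BaseAddr + Used;
    std::uintptr_t Aligned =
        (Start + Align - 1) & ~(static_cast<std::uintptr_t>(Align) - 1);
    std::size_t Offset = static_cast<std::size_t>(Aligned - BaseAddr);
    if (Offset > Size || Bytes > Size - Offset)
      return nullptr;
    Used = Offset + Bytes;
    return Base + Offset;
  }

  // Offset of P from the region start; adding the section's start RVA
  // gives the RVA that the debug records would have to use.
  std::size_t offsetOf(const std::uint8_t *P) const {
    return static_cast<std::size_t>(P - Base);
  }

private:
  std::uint8_t *Base;
  std::size_t Size;
  std::size_t Used;
};
```

Returning nullptr instead of growing reflects that the region's size was fixed
when the DLL was built.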

Visual Studio displays the disassembly correctly, with the debug lines in the
right place, etc., so I don't understand why it crashes.
Another issue is that I always have to remove and re-add my breakpoint before
Visual Studio really breaks, even though it reports the breakpoint as valid at
the right address. Is this related to file checksums? Mine seems to have a
"none" checksum, so I suspect that is the problem, but I don't know how to fix
it: I added the checksum with addChecksum using the correct file name and I
still get "none" in the dump...
So right now I'm quite hopeful, because I get Visual Studio to react, but I
have no idea why it crashes.

Have you already encountered this issue when testing your generated PDBs?
Do you know what role Section Contributions play in the PDB/debugging
session?
Any tips for checking symbol record validity in the dump? It looks good to me:
no "???" or "Error" anywhere.

Thank you !

On Wed, Jan 23, 2019 at 22:29, Zachary Turner <zturner at google.com> wrote:

> .text is where code goes, I don't know why it's called .text, it's just
> been that way for many decades and the name stuck around.  But actually you
> can call the section whatever you want.  Maybe it's even better to call it
> something other than .text, because .text is where your DllMain and other
> stuff will be.  You could call it .jit if you wanted to.  You should be
> able to create the section with whatever flags you want to.  You'll need to
> produce a jit_code.obj probably compiled from assembly that makes a section
> named .jit and sets the flags to be executable (you can just copy the flags
> from a normal .text section of some other program).  Then link this file
> together along with a jitted_code_main.obj which you compiled from a simple
> source file with a DllMain function that does nothing.  This would make
> jitted_code.dll, then have your program link against jitted_code.lib.
>
> Right now you jit the code into some buffer that you created with
> VirtualAlloc.  If you do the above, it will load jitted_code.dll into
> memory and the OS loader will allocate some memory for each section.  So
> this would be like your VirtualAlloc, you can just find the address of the
> .jit section and use that buffer instead of the VirtualAlloc buffer as the
> target address of your jit operations.
>
> Again, this is just an idea, no promises it will work, but unfortunately
> that's kind of the best you can do when dealing with closed source things,
> just make guesses and hope for the best.
>
>
>
> On Wed, Jan 23, 2019 at 12:42 PM Vivien Millet <vivien.millet at gmail.com>
> wrote:
>
>> (Yes, you are right, this is my fault.)
>> Regarding the string table, it only seems to contain file-related
>> information in every PDB I am using, and it looks correct, but I will check
>> it.
>> I looked at the PDB.cpp code for checksums and tables. I copied some
>> code and got things wrong according to cvdump, then I simplified the
>> process of copying the table and it worked (cvdump finds the file
>> matching each line, etc.), so I suspect this is also correct.
>>
>> All the streams look good, but I will check more deeply!
>>
>> What you say about RVAs and modules seems right. This is what I am
>> afraid of: doing all of this for nothing, or almost nothing.
>>
>> Your idea of a .text section in a separate DLL looks good, but
>> will it be executable memory? .text is where static strings go, right?
>> When you say to put my JIT code in there, do you mean writing it once
>> jitted_code.dll is loaded in memory, or into the .dll file directly before
>> loading it? In the first scenario I wonder whether the section will be
>> executable; in the second scenario I can't do it, because it would require
>> perfect linking with the other code my JIT code points to.
>>
>> On Wed, Jan 23, 2019 at 20:57, Zachary Turner <zturner at google.com> wrote:
>>
>>> (BTW, I'm adding llvm-dev back to the list, I didn't notice it got taken
>>> off.  In general I try to keep the list on all emails, even if it's
>>> extremely technical and specific, because someday someone else will try to
>>> do this, and it'll be nice if they can read the whole thread).
>>>
>>> I can think of a couple of things that might be wrong:
>>>
>>> 1) If the string table is in a different order, then anything that
>>> refers to the string table need to be changed to refer to the new offset.
>>> If the string "foo" is at offset 12 in the old PDB, but offset 15 in the
>>> new PDB, then somewhere there is a record which is going to look at offset
>>> 12 and expect to find something, and that will mess up.  The main place
>>> this is important is in the File Checksums table, there is an entry that
>>> says which file it is a checksum for, and that refers to the string table.
>>> However, it's possible for certain symbol records to refer to the string
>>> table too.  See lld/COFF/PDB.cpp and Ctrl+F for "PDBStrTab" and you will
>>> find some information about this.
>>>
>>> 2) When you run `llvm-pdbutil dump -streams` on the copied PDB, do all
>>> of them show a reasonable description?  Are there any streams that say
>>> (???)?  If so, that's a problem.
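
Point 1 above can be illustrated with a toy model. This is a sketch under
simplifying assumptions, not the real PDB string table (which has more
structure than a packed list of strings); ToyStringTable and remapStrings are
hypothetical names. It only shows why every record that stores a string offset
has to be patched when the strings move.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy string table: NUL-terminated strings packed back to back. A string is
// identified by the byte offset at which it starts.
struct ToyStringTable {
  std::string Data;

  // Appends S and returns the offset at which it was stored.
  std::uint32_t insert(const std::string &S) {
    std::uint32_t Off = static_cast<std::uint32_t>(Data.size());
    Data.append(S);
    Data.push_back('\0');
    return Off;
  }

  // Reads the NUL-terminated string starting at Off.
  std::string at(std::uint32_t Off) const {
    return std::string(Data.c_str() + Off);
  }
};

// Copies every string referenced by OldOffsets into NewTable and returns the
// old-offset -> new-offset mapping needed to patch the referring records.
std::map<std::uint32_t, std::uint32_t>
remapStrings(const ToyStringTable &OldTable,
             const std::vector<std::uint32_t> &OldOffsets,
             ToyStringTable &NewTable) {
  std::map<std::uint32_t, std::uint32_t> Remap;
  for (std::uint32_t Old : OldOffsets)
    if (!Remap.count(Old))
      Remap[Old] = NewTable.insert(OldTable.at(Old));
  return Remap;
}
```

A record that still holds the old offset after the copy would read whatever
string happens to sit at that position in the new table.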
>>>
>>> > will Visual Studio consider a symbol file broken if the address
>>> > goes beyond the official module address range (the compiled one), because
>>> > my JIT code is allocated after the end of the module with VirtualAlloc
>>>
>>> That is a good question, and part of why my job is so difficult, because
>>> I can't look at their code.  But I think the answer is "probably".  The
>>> debugger has to have some way to convert an address in your running process
>>> into a symbol and offset, because that's how all debug info is represented
>>> in the PDB.  So if there is no module, then there is no RVA (because the R
>>> in RVA means relative, and what would it be relative to?).
>>>
>>> One idea to test this would be to create a DLL called jitted_code.dll,
>>> give it a huuuuuge .text section (probably just a .asm file and use some
>>> assembly directives to allocate a very large series of null bytes), and
>>> then write your jit code into that area.  This way you would not need to
>>> modify the existing PDB you would only need to make a new PDB called
>>> jitted_code.pdb with 1 module, and those symbols could have meaningful
>>> RVAs.  And you might not even need to detach the debugger if you do things
>>> this way, because you could just right click the jitted_code.dll module in
>>> the modules window and choose Load Symbols.
>>>
>>>
>>>
>>> On Wed, Jan 23, 2019 at 11:13 AM Vivien Millet <vivien.millet at gmail.com>
>>> wrote:
>>>
>>>> Yes, that is it: I just make a copy of a PDB generated by link.exe
>>>> (the Microsoft one).
>>>> Comparing with llvm-pdbutil is what I do, except that I use "-all".
>>>> I get almost everything the same: the same number of streams, the section
>>>> map looks good, the string table looks good (even if not in the same order),
>>>> and the same number of modules with practically the same symbols and
>>>> subsections. This is why I am stuck: I am missing something but I can't see
>>>> what, because I don't know where to look. Visual Studio works with it and I
>>>> can debug my original exe, but probably without the globals...
>>>> The other problem is that a difference between the dumps is not
>>>> necessarily a bug, because the builder may generate new hash values,
>>>> reorder streams, modules, etc.
>>>>
>>>> For now I have given up on the publics and globals streams and attacked
>>>> the real goal: inserting my JIT CodeView into the PDB. Again I have done
>>>> "something", but as I don't understand how the format works I don't have it
>>>> working in Visual Studio... except once: a single time it worked and the
>>>> breakpoint turned on in the UI (even though the RVA was wrong for the
>>>> instructions), but it only happened once, and I got depressed the
>>>> following times. cvdump displays it all as "correct", with no apparent
>>>> corruption. But what I do is probably wrong somewhere. I take
>>>> .debug$S and .debug$T as is, without relocations, just to see. What I
>>>> really don't know is: will Visual Studio consider a symbol file
>>>> broken if the address goes beyond the official module address range (the
>>>> compiled one)? My JIT code is allocated after the end of the module
>>>> with VirtualAlloc.
>>>> Another thing I don't get is the section contribution: what is it
>>>> exactly? I inserted section contributions for all sections except the
>>>> debug$ ones, but I don't really know what I'm doing, which is my usual
>>>> problem implementing this JIT feature...
>>>> I also don't know what relocations are in the CodeView format. What
>>>> is the difference between an RVA and a relocation? Is there anything to do
>>>> about this for the CodeView part I need to insert in the PDB? I don't see
>>>> why Visual Studio needs more than just an RVA<->line mapping.
>>>> It is really driving me crazy to be so ignorant and to have to guess
>>>> what Visual Studio does...
>>>>
>>>> On Mon, Jan 21, 2019 at 19:50, Zachary Turner <zturner at google.com> wrote:
>>>>
>>>>> So if I understand correctly, you're basically just trying to
>>>>> implement something like a pdb *copy*, just as a test to see if you can get
>>>>> it to work.  So you generate a PDB with cl/link or clang-cl/lld-link, then
>>>>> try to copy it using your tool, then see if it still works.
>>>>>
>>>>> If this is correct, and it's not working, then there is probably just
>>>>> something you didn't copy.  Neither Publics nor globals actually contain
>>>>> their own data, instead they just refer to records from the corresponding
>>>>> module stream.  So an S_PROCREF for the function "main" might have fields
>>>>> that say "the name of the function is main, and it's at offset 20 of module
>>>>> 1".  So, if there is no module 1, or if offset 20 of module 1 is not actually
>>>>> an S_GPROC32 for the function main, then it will be broken.
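
The dependency described above (a reference in the globals stream is only
valid if the module record it points at really exists) can be sketched as a
small check. This is a toy model under stated assumptions, not LLVM's API:
Kind, ToyModuleSymbol, ToyModule, ToyProcRef and procRefResolves are all
hypothetical names, and module numbers are assumed to be 1-based as in the
"module 1" example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for the record kinds mentioned in the thread.
enum class Kind { GProc32, LProc32, Other };

struct ToyModuleSymbol {
  Kind K;
  std::string Name;
};

// One map per module: byte offset in the module's symbol stream -> record.
using ToyModule = std::map<std::uint32_t, ToyModuleSymbol>;

// What a procedure reference in the globals stream carries in this model.
struct ToyProcRef {
  std::string Name;
  std::uint16_t Module; // 1-based module number (assumption of this sketch)
  std::uint32_t Offset; // offset of the target record inside that module
};

// True only if the module exists, a record starts exactly at Offset, and that
// record is a procedure with the expected name.
bool procRefResolves(const ToyProcRef &Ref,
                     const std::vector<ToyModule> &Modules) {
  if (Ref.Module == 0 || static_cast<std::size_t>(Ref.Module) > Modules.size())
    return false;
  const ToyModule &M = Modules[Ref.Module - 1];
  auto It = M.find(Ref.Offset);
  if (It == M.end())
    return false;
  const ToyModuleSymbol &S = It->second;
  return (S.K == Kind::GProc32 || S.K == Kind::LProc32) && S.Name == Ref.Name;
}
```

Running a check of this shape over a copied PDB would flag references whose
targets were dropped or shifted during the copy.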
>>>>>
>>>>> Did you also go through each module in the source PDB, add a new
>>>>> module in the target PDB, then copy all of the symbols for each one?
>>>>>
>>>>> The best way to find differences is by using llvm-pdbutil on the
>>>>> source and target PDBs and looking for things that look different.  For
>>>>> example, I'd start with llvm-pdbutil dump -streams and then see if they
>>>>> even have all the same streams.  If one of them is missing streams, that's
>>>>> a good place to start.  If they have the same streams, then look for ones
>>>>> where the size is different.  Then drill into those to see why the size is
>>>>> different.
>>>>>
>>>>> LMK if that helps.
>>>>>
>>>>> On Mon, Jan 21, 2019 at 10:03 AM Vivien Millet <
>>>>> vivien.millet at gmail.com> wrote:
>>>>>
>>>>>> For now I'm not merging my JIT CodeView section. I am only trying to build
>>>>>> a pure copy of an existing PDB using the XxxBuilder classes (PDBFileBuilder
>>>>>> & Co, reading a PDBFile) and checking whether Visual Studio accepts it.
>>>>>> For Publics and Globals, what I do is naive: I use the
>>>>>> GSIStreamBuilder and pray :)
>>>>>>
>>>>>>
>>>>>>
>>>>>>   if (File.hasPDBGlobalsStream() && File.getPDBGlobalsStream()) {
>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>     GlobalsStream &stream = *File.getPDBGlobalsStream();
>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>
>>>>>>     for (uint32_t PubSymOff : stream.getGlobalsTable()) {
>>>>>>       CVSymbol Sym = SymbolRecords.readRecord(PubSymOff);
>>>>>>       builder.addGlobalSymbol(Sym);
>>>>>>     }
>>>>>>   }
>>>>>>   if (File.hasPDBPublicsStream() && File.getPDBPublicsStream()) {
>>>>>>     GSIStreamBuilder &builder = this->getGsiBuilder();
>>>>>>     PublicsStream &stream = *File.getPDBPublicsStream();
>>>>>>     SymbolStream &SymbolRecords = cantFail(File.getPDBSymbolStream());
>>>>>>
>>>>>>     std::vector<PublicSym32> Publics;
>>>>>>
>>>>>>     for (uint32_t PubSymOff : stream.getPublicsTable()) {
>>>>>>       PublicSym32 Pub = cantFail(
>>>>>>           llvm::codeview::SymbolDeserializer::deserializeAs<PublicSym32>(
>>>>>>               SymbolRecords.readRecord(PubSymOff)));
>>>>>>       Publics.push_back(Pub);
>>>>>>     }
>>>>>>
>>>>>>     if (!Publics.empty()) {
>>>>>>       // Sort the public symbols and add them to the stream.
>>>>>>       std::sort(Publics.begin(), Publics.end(),
>>>>>>            [](const PublicSym32 &L, const PublicSym32 &R) {
>>>>>>              return L.Name < R.Name;
>>>>>>            });
>>>>>>       for (const PublicSym32 &Pub : Publics)
>>>>>>         builder.addPublicSymbol(Pub);
>>>>>>     }
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> Is this what you meant?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 18:50, Zachary Turner <zturner at google.com> wrote:
>>>>>>
>>>>>>> Also, even if symbolGoesInGlobalsStream returns true, you can’t just
>>>>>>> copy it. Functions, for example, which are S_GPROC32 or S_LPROC32 in the
>>>>>>> module stream, are S_PROCREF in the globals stream. Similarly, *everything*
>>>>>>> in the publics stream is S_PUB32. So you need to convert each symbol to the
>>>>>>> proper type for the stream it’s going to go in
>>>>>>> On Mon, Jan 21, 2019 at 9:46 AM Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Publics are basically a list of everything that has a mangled name.
>>>>>>>> To be honest, I don’t know what the debugger uses this for.
>>>>>>>>
>>>>>>>> Globals is essentially every symbol in the pdb in one large table.
>>>>>>>> The reason this is important is because if you type “foo” in the watch
>>>>>>>> window, the debugger doesn’t necessarily know what compiland foo comes
>>>>>>>> from. So it has to have a way to find everything in the entire program no
>>>>>>>> matter what compiland it came from. That’s what the globals are.
>>>>>>>>
>>>>>>>> Both publics and globals are hash tables, so one possible reason
>>>>>>>> there might be a problem is that you need to rehash the entire table. When
>>>>>>>> you build your modified pdb, I would suggest starting with an empty publics
>>>>>>>> / globals stream, adding all items from the first pdb by iterating over
>>>>>>>> those records and using a GSIStreamBuilder, then adding all your jitted
>>>>>>>> items separately, then writing it out. That should make sure it gets hashed
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> Are you doing that?
>>>>>>>>
>>>>>>>> Btw, not all symbols belong in the globals / publics stream. Check
>>>>>>>> the code in lld and search for symbolGoesInGlobalsStream and
>>>>>>>> symbolGoesInPublicsStream to see the logic it uses
>>>>>>>> On Mon, Jan 21, 2019 at 8:36 AM Vivien Millet <
>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Zachary, sorry for disturbing you again.
>>>>>>>>>
>>>>>>>>> I've fixed some problems (StringTable, SectionMap and a few things
>>>>>>>>> here and there) and my converted PDB now seems to work inside Visual
>>>>>>>>> Studio.
>>>>>>>>> But I'm not sure I have full debug features, because I have not
>>>>>>>>> managed to translate Publics and Globals correctly. CVDump says the PDB is
>>>>>>>>> corrupted, whereas llvm-pdbutil dump displays them correctly.
>>>>>>>>> I don't really understand what the Publics and Globals streams
>>>>>>>>> are: whether the symbols really live in those streams or whether they are
>>>>>>>>> just references to somewhere else.
>>>>>>>>> The LLVM documentation is incomplete for these two streams,
>>>>>>>>> so I'm a bit lost on how to handle them or how to find what is
>>>>>>>>> "corrupted" according to CVDump.
>>>>>>>>> I used LLD and yaml2pdb as examples for some of the tougher
>>>>>>>>> conversions, but I noticed that yaml2pdb exports no GSI stream
>>>>>>>>> (no GSIStreamBuilder use and no reference to Publics or Globals anywhere). Is
>>>>>>>>> that intended/correct?
>>>>>>>>> Thanks, and sorry if I'm spamming a bit. This task takes 99% of my time right
>>>>>>>>> now, and being stuck without any clue is difficult :) But I guess you
>>>>>>>>> suffered even more when no documentation existed at all!
>>>>>>>>> Have a good day!
>>>>>>>>>
>>>>>>>>> On Sun, Jan 20, 2019 at 22:27, Vivien Millet <
>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> ERRATUM, my bad: the PDB I tested is also corrupted according to
>>>>>>>>>> cvdump.exe. I don't know why. I regenerated it and now I have a working
>>>>>>>>>> dump. You don't need to fix anything.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 20, 2019 at 20:26, Vivien Millet <
>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Zachary,
>>>>>>>>>>> I've taken a first step towards rewriting an existing PDBFile with
>>>>>>>>>>> PDBFileBuilder. Most of the work is done, but I don't get as much
>>>>>>>>>>> output as input (some streams are not mirrored for unknown reasons and some
>>>>>>>>>>> data must be missing here and there...).
>>>>>>>>>>> When I replace the original with the rebuilt one for
>>>>>>>>>>> debugging, the PDB loads fine but breakpoints fail to activate, with an
>>>>>>>>>>> "unexpected symbol reader error while processing foobar.exe". You probably
>>>>>>>>>>> know what it means or have already encountered this error, I guess.
>>>>>>>>>>> I also tried to create a minimal program to simplify comparisons
>>>>>>>>>>> between the original and new PDB, but I get an error when dumping the
>>>>>>>>>>> original PDB produced by Visual Studio with -all (PublicsStream.cpp|98).
>>>>>>>>>>> I think it is a bug.
>>>>>>>>>>> I've attached the related main.cpp and PDB to this email if you
>>>>>>>>>>> want to check what the error is exactly (VS2017; x86 and x64 have the same
>>>>>>>>>>> issue).
>>>>>>>>>>> I've also attached my code (git diff). I added an "identity"
>>>>>>>>>>> feature to pdbutil which uses the code I wrote to regenerate the input PDB.
>>>>>>>>>>> You can use it to see what I have so far.
>>>>>>>>>>> I saw that you recently added a fix related to FPO, but you say
>>>>>>>>>>> it's only for x86, so I don't think it would change anything, but who knows.
>>>>>>>>>>> Anyway, if you have a moment to check my work so far and give me
>>>>>>>>>>> feedback, it would be welcome, because I am running out of ideas about
>>>>>>>>>>> what goes wrong.
>>>>>>>>>>> Thanks, I'm going back to digging into the PDB mysteries!
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 18, 2019 at 12:31, Vivien Millet <
>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> OK! I just wanted to be sure I had understood correctly.
>>>>>>>>>>>> Sorry for not replying right away. I wanted to try emitting
>>>>>>>>>>>> CodeView first before continuing the discussion, and it was time for me
>>>>>>>>>>>> to go to bed here.
>>>>>>>>>>>> I just tried it now and it is very easy to switch to CodeView.
>>>>>>>>>>>> For those interested: you just have to set your target triple on the
>>>>>>>>>>>> llvm::Module used for JIT and then call
>>>>>>>>>>>> module->addModuleFlag(llvm::Module::Warning, "CodeView", 1) to tell the
>>>>>>>>>>>> AsmPrinter that this module prefers CodeView over DWARF.
>>>>>>>>>>>> I've checked the content of my .obj file, and it has valid
>>>>>>>>>>>> .debug$T and .debug$S sections, so everything is going well so far.
>>>>>>>>>>>> Now, as a parallel task, I will try to read the EXE PDB and
>>>>>>>>>>>> re-export it "as is" to see whether I break something in Visual Studio.
>>>>>>>>>>>> If I succeed, that might be added as a feature to
>>>>>>>>>>>> PDBFile or PDBFileBuilder to simplify the process for other users.
>>>>>>>>>>>> I will keep you posted.
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:50, Zachary Turner <
>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When I say "nothing to do" I just mean that you won't have to
>>>>>>>>>>>>> do anything to convert the record from one format (DWARF) to another format
>>>>>>>>>>>>> (CodeView).  You will have a COFF object file either on disk (probably
>>>>>>>>>>>>> named foo.obj or something) or in memory.  And this object file will have a
>>>>>>>>>>>>> .debug$S section with CodeView symbols and a .debug$T section with CodeView
>>>>>>>>>>>>> types.  Then you will still need to use the PDBFileBuilder to add these
>>>>>>>>>>>>> records to the final PDB, but they will already be in the correct format
>>>>>>>>>>>>> that PDBFileBuilder expects, you won't need to convert them from DWARF
>>>>>>>>>>>>> (which is not trivial).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 11:26 AM Vivien Millet <
>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a good question. By default, when emitting the object
>>>>>>>>>>>>>> file I choose COFF, but it embeds DWARF and not CodeView in the end.
>>>>>>>>>>>>>> There is probably a way to do it, or at least it should be implemented
>>>>>>>>>>>>>> if it isn't yet.
>>>>>>>>>>>>>> Let's imagine I manage to do that. When you say there is
>>>>>>>>>>>>>> nothing to do, I still need a PDBFileBuilder to copy the CodeView data
>>>>>>>>>>>>>> into the EXE PDB, right? I cannot easily insert it into the EXE PDB
>>>>>>>>>>>>>> some other way?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 20:01, Zachary Turner <
>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, is it possible to just hook up the CodeView debug info
>>>>>>>>>>>>>>> generator to MCJIT?  If you're not jitting, and you just compile something,
>>>>>>>>>>>>>>> we translate all of the LLVM metadata into CodeView in the file
>>>>>>>>>>>>>>> CodeViewDebug.cpp.  Then, the object file just already has CodeView in it.
>>>>>>>>>>>>>>> If it's not hard to do, this would probably be a better solution, because
>>>>>>>>>>>>>>> you don't have to worry about *how* to translate DWARF into CodeView, which
>>>>>>>>>>>>>>> is not always trivial.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you can configure this in MCJIT, you won't even need to
>>>>>>>>>>>>>>> do anything, you can just open the ObjectFile, look for the .debug$T and
>>>>>>>>>>>>>>> .debug$S sections, iterate over each one and re-write their TypeIndices
>>>>>>>>>>>>>>> while copying them to the output PDB file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:52 AM Vivien Millet <
>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OK, I understand better what you meant. In fact I don't care
>>>>>>>>>>>>>>>> about the PDB size, at least as a first step, so having duplicated
>>>>>>>>>>>>>>>> symbols won't be a problem for me. Concerning TypeIndices, my plan, if
>>>>>>>>>>>>>>>> possible, is not to generate a PDB for my JIT code and merge it, but
>>>>>>>>>>>>>>>> instead to extract debug info directly from a DWARFContext just after
>>>>>>>>>>>>>>>> the llvm::object::ObjectFile is emitted by the JIT engine and complete
>>>>>>>>>>>>>>>> the EXE PDB I rebuilt with PDBFileBuilder. Does that sound like a good
>>>>>>>>>>>>>>>> bet to you? If I succeed, I think it could be a good extension to the
>>>>>>>>>>>>>>>> debugging possibilities of MCJIT, if not an extension to pdbutil.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:37, Zachary Turner <
>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, for example the TPI stream is just one big
>>>>>>>>>>>>>>>>> collection of types.  Presumably your JIT code will reuse some of the same
>>>>>>>>>>>>>>>>> types (perhaps, std::string for example) as your non-jitted code.  Your
>>>>>>>>>>>>>>>>> jitted symbol records in the object file (for example, a local variable of
>>>>>>>>>>>>>>>>> type std::string in your jitted code) will refer to the type for
>>>>>>>>>>>>>>>>> std::string by a TypeIndex, and your original PDB will also refer to
>>>>>>>>>>>>>>>>> std::string by a different TypeIndex.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In LLD, when we merge in types and symbols from each
>>>>>>>>>>>>>>>>> object file, we keep a hash table of which types have already been seen, so
>>>>>>>>>>>>>>>>> that if we see the same type again, we can just use the TypeIndex that we
>>>>>>>>>>>>>>>>> wrote on a previous object file.  Then, when we add symbol records, we have
>>>>>>>>>>>>>>>>> to update its fields that used the old TypeIndex to use the new TypeIndex
>>>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> De-duplicating though, I suppose, is not strictly
>>>>>>>>>>>>>>>>> necessary, it will just keep your PDB size down.  But you *will* need to at
>>>>>>>>>>>>>>>>> least re-write the TypeIndexes from the jitted code.  For example, you may
>>>>>>>>>>>>>>>>> decide that instead of de-duplicating, you just append them all to the end
>>>>>>>>>>>>>>>>> of the TPI stream (where all the types go in PDB) to keep things simple.
>>>>>>>>>>>>>>>>> Since they were in a different position before, they now have different
>>>>>>>>>>>>>>>>> TypeIndices.  So you will need to re-write all TypeIndices so that they are
>>>>>>>>>>>>>>>>> correct after the merge.   Both types and symbols can refer to types, so
>>>>>>>>>>>>>>>>> you will need to do this both for the types of the jitted code as well as
>>>>>>>>>>>>>>>>> the symbols of the jitted code.
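
The "append without de-duplicating, then rewrite every TypeIndex" approach
described above can be sketched as follows. This is a toy model under stated
assumptions: ToyRecord, remapTypeIndex and remapRecords are hypothetical
names, and real CodeView records store their type indices at kind-specific
positions, which this sketch sidesteps by listing them directly. The one
format detail relied on is that indices below 0x1000 denote built-in (simple)
types and stream records are numbered from 0x1000 upward.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Indices below this value name built-in types and are never renumbered.
constexpr std::uint32_t FirstNonSimpleIndex = 0x1000;

// Hypothetical, simplified record: only the type indices it refers to.
struct ToyRecord {
  std::vector<std::uint32_t> TypeRefs;
};

// New index of a JIT type once all JIT types have been appended after the
// ExistingTypeCount records already present in the destination type stream.
std::uint32_t remapTypeIndex(std::uint32_t Old,
                             std::uint32_t ExistingTypeCount) {
  if (Old < FirstNonSimpleIndex)
    return Old; // simple types mean the same thing in every stream
  return Old + ExistingTypeCount;
}

// Applies the shift to every reference. It has to run over both the JIT
// type records and the JIT symbol records, since both can refer to types.
void remapRecords(std::vector<ToyRecord> &Records,
                  std::uint32_t ExistingTypeCount) {
  for (ToyRecord &R : Records)
    for (std::uint32_t &TI : R.TypeRefs)
      TI = remapTypeIndex(TI, ExistingTypeCount);
}
```

With de-duplication, the constant shift would be replaced by a per-type lookup
table from old index to new index, but the rewriting pass stays the same.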
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know if that makes sense.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:24 AM Vivien Millet <
>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ok I see..
>>>>>>>>>>>>>>>>>> what do you mean by “making sure to de-duplicate records
>>>>>>>>>>>>>>>>>> as necessary” ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 19:09, Zachary Turner <
>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It's possible in theory to support incremental updates
>>>>>>>>>>>>>>>>>>> to a PDB (the file format is designed specifically with that in mind).  But
>>>>>>>>>>>>>>>>>>> this functionality was never added to the PDB library since lld doesn't
>>>>>>>>>>>>>>>>>>> support incremental linking, we never really needed it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The "dumb" way would be to just create a new PDB file,
>>>>>>>>>>>>>>>>>>> build it using the old contents and the new contents (making sure to
>>>>>>>>>>>>>>>>>>> de-duplicate records as necessary).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Supporting incremental updates should be possible, but
>>>>>>>>>>>>>>>>>>> most of LLVM's File I/O abstractions are based around mmapping a file and
>>>>>>>>>>>>>>>>>>> writing to it, which doesn't work when you don't know the file size in
>>>>>>>>>>>>>>>>>>> advance.  So there would be some interesting problems to solve here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 17, 2019 at 10:03 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Zachary !
>>>>>>>>>>>>>>>>>>>> Is there a way to easily create a new PDBFileBuilder
>>>>>>>>>>>>>>>>>>>> from an existing PDBFile or can/should I do the translation myself ?
>>>>>>>>>>>>>>>>>>>> I would like to start from a builder filled with the
>>>>>>>>>>>>>>>>>>>> EXE PDB data and then complete its DBI stream with the JIT module/symbols.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 23:41, Vivien Millet <
>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you Zachary!
>>>>>>>>>>>>>>>>>>>>> I will have some soon, I think.
>>>>>>>>>>>>>>>>>>>>> I first need to explore the llvm-pdbutil code more,
>>>>>>>>>>>>>>>>>>>>> because I don't even know where to start with the PDB API.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:51, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sure. Along the way I’m happy to answer any specific
>>>>>>>>>>>>>>>>>>>>>> questions you might have too even if it’s for your downstream project
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:38 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I would be happy to improve pdbutil, but I doubt I have
>>>>>>>>>>>>>>>>>>>>>>> enough knowledge or time to provide the complete merge feature; it would
>>>>>>>>>>>>>>>>>>>>>>> still be a very specific kind of merge, as you describe it. Anyway, I could
>>>>>>>>>>>>>>>>>>>>>>> start by trying to do it in my JIT compiler and then, once I get something
>>>>>>>>>>>>>>>>>>>>>>> working (if that happens :)), come back to you with the piece of code
>>>>>>>>>>>>>>>>>>>>>>> to see whether it is worth integrating into pdbutil, and how?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 22:12, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Well, that’s certainly possible, but improving
>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is another possibility. Doing it directly in your jit compiler
>>>>>>>>>>>>>>>>>>>>>>>> will probably save you time though, since you won’t have to worry about
>>>>>>>>>>>>>>>>>>>>>>>> writing tests and going through code review
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 16, 2019 at 1:01 PM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the tips!
>>>>>>>>>>>>>>>>>>>>>>>>> When you talk about doing all of this, I suppose
>>>>>>>>>>>>>>>>>>>>>>>>> you mean using llvm/DebugInfo/PDB, picking code here and there to
>>>>>>>>>>>>>>>>>>>>>>>>> generate the PDB in memory, reading the executable's PDB and performing
>>>>>>>>>>>>>>>>>>>>>>>>> the merge directly in my JIT compiler, right? Not using pdbutil?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 22:49, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 15, 2019 at 2:50 AM Vivien Millet <
>>>>>>>>>>>>>>>>>>>>>>>>>> vivien.millet at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Zachary !
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your time !
>>>>>>>>>>>>>>>>>>>>>>>>>>> So you are one of the happy guys who suffered
>>>>>>>>>>>>>>>>>>>>>>>>>>> from the lack of PDB format information :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, that would be me :)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> To be honest I'm really a beginner in the PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>> stuff, I just read some llvm documentation to understand what went wrong
>>>>>>>>>>>>>>>>>>>>>>>>>>> when merging my PDBs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> In my case, what my team and I are trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve is this:
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Run our application under the Visual Studio
>>>>>>>>>>>>>>>>>>>>>>>>>>> debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Generate JIT code (using LLVM MCJIT)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Then, either:
>>>>>>>>>>>>>>>>>>>>>>>>>>>    - export a COFF obj file with DWARF
>>>>>>>>>>>>>>>>>>>>>>>>>>> information and then convert it with cv2pdb to obtain a PDB of my JIT
>>>>>>>>>>>>>>>>>>>>>>>>>>> symbols (what I do now)
>>>>>>>>>>>>>>>>>>>>>>>>>>>    - export my JIT debug info directly to PDB
>>>>>>>>>>>>>>>>>>>>>>>>>>> (what I would like to do, if you have an idea how...)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Detach the Visual Studio debugger
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Merge my JIT PDB into a copy of the executable
>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB (where things start to go bad...)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Replace the original executable with the copy
>>>>>>>>>>>>>>>>>>>>>>>>>>> (creating a backup of the original)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Reattach the Visual Studio debugger to my
>>>>>>>>>>>>>>>>>>>>>>>>>>> executable (loading the new PDB version)
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Debug JIT code with Visual Studio.
>>>>>>>>>>>>>>>>>>>>>>>>>>> - On each JIT rebuild, restart these steps from
>>>>>>>>>>>>>>>>>>>>>>>>>>> the original native executable PDB to avoid merge conflicts between the
>>>>>>>>>>>>>>>>>>>>>>>>>>> multiple JIT iterations
>>>>>>>>>>>>>>>>>>>>>>>>>>>
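The last bullet above (always restart from the original executable PDB) can be sketched as follows. This is only an illustration, not code from the thread: `rebuild_merged_pdb` and `placeholder_merge` are hypothetical names, and the placeholder merely concatenates bytes where a real PDB merge would run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

MergeFn = Callable[[Path, Path, Path], None]


def placeholder_merge(base_pdb: Path, jit_pdb: Path, out_pdb: Path) -> None:
    """Stand-in for the real merge step (assumption: it takes base, JIT, output)."""
    out_pdb.write_bytes(base_pdb.read_bytes() + jit_pdb.read_bytes())


def rebuild_merged_pdb(original_pdb: Path, jit_pdb: Path,
                       merged_pdb: Path, merge: MergeFn) -> None:
    """Rebuild merged_pdb from the untouched original on every JIT iteration."""
    with tempfile.TemporaryDirectory() as tmp:
        # Merge into a fresh copy so symbols from an earlier JIT build
        # never accumulate and the original PDB is never modified.
        base = Path(tmp) / original_pdb.name
        shutil.copyfile(original_pdb, base)
        merge(base, jit_pdb, merged_pdb)
```

Passing the merge step in as a callable keeps the sketch independent of whichever tool or library actually performs the merge.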
>>>>>>>>>>>>>>>>>>>>>>>>>> Yea, it's an interesting use case.  It makes me
>>>>>>>>>>>>>>>>>>>>>>>>>> think it would be nice if the PDB format supported some way of having a
>>>>>>>>>>>>>>>>>>>>>>>>>> symbol which simply refers to another PDB file, that way you could re-write
>>>>>>>>>>>>>>>>>>>>>>>>>> that PDB file at runtime once all your code is jitted, and when the
>>>>>>>>>>>>>>>>>>>>>>>>>> debugger tries to look up that symbol, it finds a record that tells it to
>>>>>>>>>>>>>>>>>>>>>>>>>> go check the other PDB file.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> So, here are the things I think you would need to
>>>>>>>>>>>>>>>>>>>>>>>>>> do:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Create a JIT module in the module list with a
>>>>>>>>>>>>>>>>>>>>>>>>>> unique name.  All symbols will go here.  llvm-pdbutil dump -modules shows
>>>>>>>>>>>>>>>>>>>>>>>>>> you the list.  Be careful about putting it at the end though, because
>>>>>>>>>>>>>>>>>>>>>>>>>> there's already one at the end called * LINKER * that is kind of special.
>>>>>>>>>>>>>>>>>>>>>>>>>> On the other hand, you don't want to put it first because it means you will
>>>>>>>>>>>>>>>>>>>>>>>>>> have to do lots of fixups on the EXE PDB.  It's probably best to add it
>>>>>>>>>>>>>>>>>>>>>>>>>> right before the linker module, this has the least chance of breaking
>>>>>>>>>>>>>>>>>>>>>>>>>> anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 2) In the debug stream for this module, add all
>>>>>>>>>>>>>>>>>>>>>>>>>> symbols.  You will need to fix up their type indices.  As you noticed,
>>>>>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil already merges type information from the JIT PDB, so after
>>>>>>>>>>>>>>>>>>>>>>>>>> merging, the type indices in the EXE PDB will be different than they were in
>>>>>>>>>>>>>>>>>>>>>>>>>> the JIT PDB, but the symbol records will still refer to the JIT PDB type
>>>>>>>>>>>>>>>>>>>>>>>>>> indices.  So these need to be fixed up.  LLD already has code to do this,
>>>>>>>>>>>>>>>>>>>>>>>>>> you can probably borrow a similar algorithm with some slight modifications
>>>>>>>>>>>>>>>>>>>>>>>>>> (lld/COFF/PDB.cpp, search for mergeSymbolRecords).
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Merge in the new section contributions and
>>>>>>>>>>>>>>>>>>>>>>>>>> section map.  See LLD again for how to modify these.  Hopefully the object
>>>>>>>>>>>>>>>>>>>>>>>>>> file you exported contains relocated symbol addresses so you don't have to
>>>>>>>>>>>>>>>>>>>>>>>>>> do any fixups here.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 4) Merge in the publics and globals.  This
>>>>>>>>>>>>>>>>>>>>>>>>>> shouldn't be too hard, I think you can just iterate over them in the JIT
>>>>>>>>>>>>>>>>>>>>>>>>>> PDB and add them to the new EXE PDB.
>>>>>>>>>>>>>>>>>>>>>>>>>>
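Steps 1 and 2 above can be illustrated with a small sketch over plain Python data, not over real PDB streams. The function names and the `jit_to_exe` mapping are hypothetical. The sketch assumes the linker module can be recognised by the name "* Linker *" (compared case-insensitively), and it relies on the CodeView convention that type indices below 0x1000 denote built-in simple types, which are the same in every PDB and need no fixup.

```python
from typing import Dict, List

# CodeView convention: indices below 0x1000 are built-in (simple) types.
FIRST_NON_SIMPLE_TYPE_INDEX = 0x1000


def insert_jit_module(modules: List[str], jit_name: str) -> List[str]:
    """Return a new module list with jit_name placed right before the linker module."""
    if jit_name in modules:
        raise ValueError(f"module name {jit_name!r} is not unique")
    result = list(modules)
    for position, name in enumerate(result):
        # Assumption: the special linker module is identified by its name.
        if name.strip().lower() == "* linker *":
            result.insert(position, jit_name)
            return result
    # No linker module found: appending is the least disruptive fallback.
    result.append(jit_name)
    return result


def remap_type_index(index: int, jit_to_exe: Dict[int, int]) -> int:
    """Translate a JIT-PDB type index into the merged EXE-PDB numbering."""
    if index < FIRST_NON_SIMPLE_TYPE_INDEX:
        return index  # simple types are identical across PDBs
    # A KeyError here means the type was never merged into the EXE PDB.
    return jit_to_exe[index]
```

In a real merge the `jit_to_exe` table would come out of the type-merging pass, and every type-index field of every symbol record would be passed through the remapping.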
>>>>>>>>>>>>>>>>>>>>>>>>>> You're kind of in uncharted territory here, so
>>>>>>>>>>>>>>>>>>>>>>>>>> this is just a rough idea of what needs to be done.  There may be other
>>>>>>>>>>>>>>>>>>>>>>>>>> issues that you don't encounter until you actually try it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately I don't personally have the time to
>>>>>>>>>>>>>>>>>>>>>>>>>> work on this, but it sounds neat, and I'm happy to help if you run into
>>>>>>>>>>>>>>>>>>>>>>>>>> questions or problems along the way.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>