<div dir="ltr">Hello Zachary!<br>Thanks for your time!<br>So you are one of the lucky people who suffered from the lack of PDB format information :)<br>To be honest, I'm really a beginner with PDB; I just read some LLVM documentation to understand what went wrong when merging my PDBs.<br>In my case, what my team and I are trying to achieve is this:<br>- Run our application under the Visual Studio debugger<br>- Generate JIT code (using LLVM MCJIT)<br>- Then, either:<br>   - export a COFF obj file with DWARF information and convert it with cv2pdb to obtain a PDB of my JIT symbols (what I do now)<br>   - export my JIT debug info directly to PDB (what I would like to do, if you have an idea how)<br>- Detach the Visual Studio debugger<br>- Merge my JIT PDB into a copy of the executable PDB (where things start to go bad)<br>- Replace the original executable with the copy (creating a backup of the original)<br>- Reattach the Visual Studio debugger to my executable (loading the new PDB version)<br>- Debug the JIT code with Visual Studio<br>- On each JIT rebuild, restart these steps from the original native executable PDB to avoid merge conflicts between the multiple JIT iterations<br><br>So, concerning the three stages you describe:<br>- 1): I would be even more naive: I would consider every module as a new module without trying to merge them by name (but I might be too naive)<br>- 2) and 3): Same here. In my case conflicting symbols/modules are impossible, so I would again choose the simplest, most naive case: by default every symbol is always a new symbol (addition). 


Options for the 'merge' feature could be the classical set-style operations: addition (default)/union/intersection. This option could apply at different levels of the merge (modules -> symbols -> etc).<br>I'm not a PDB warrior like you, and I don't know what consequences choosing one way or another has on the final PDB structure and readability.<br>So I trust you to make the right choice.<div><br>Do you think what I'm trying to achieve is doable? Would you help me do it?<br><br>Thank you<br><br>PS: BTW, if you or someone else knows another (better/easier) way to debug MCJIT code with Visual Studio, I'm really open to hearing about it!<br> </div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Jan 14, 2019 at 10:49 PM, Zachary Turner <<a href="mailto:zturner@google.com" target="_blank">zturner@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Yes I am the person who wrote this feature (along with most other PDB-related features).<div><br></div><div>I thought about it some and I think it's a bit hard (if not impossible) to merge PDBs in this way.  Here's a short list of things I came up with:</div><div><br></div><div>1) We need to merge the list of modules.  This requires first detecting if two modules are actually the same.  
For example, if I run llvm-pdbutil on a random PDB on my disk, I get this (output is trimmed for brevity):</div><div><br></div><div>$ llvm-pdbutil.exe dump -modules bin\not.pdb<br></div><div><br></div><div><div>                          Modules</div><div>============================================================</div><div>Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:</div><div>  Obj: `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:</div><div>  debug stream: 14, # files: 80, has ec info: false</div><div>  pdb file ni: 0 ``, src file ni: 0 ``</div><div>Mod 0001 | `lib\Support\CMakeFiles\LLVMSupport.dir\Program.cpp.obj`:</div><div>  Obj: `D:\src\llvmbuild\cl\Debug\x64\lib\LLVMSupport.lib`:</div><div>  debug stream: 46, # files: 102, has ec info: false</div><div>  pdb file ni: 0 ``, src file ni: 0 ``</div></div><div><br></div><div><br></div><div>The easiest thing to do is consider them to be the same only if both the module name and object name are identical, but depending on your use case this might not be sufficient.  For example, if the 2 PDBs were built in different output directories, you might have `D:\foo\not.cpp.obj` in one PDB and `D:\bar\not.cpp.obj` in the other.  So we would need to find a solution that makes sense here.</div><div><br></div><div>2) When two modules are the same, we need to merge their file list and debug stream.  
In the above example:</div><div><br></div><div><div>$ llvm-pdbutil.exe dump -files -modi=0 bin\not.pdb</div><div>                           Files<br></div><div>============================================================</div><div>Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:</div><div>- (MD5: 2FE06AF7EACFB232C6FF033DBFC4412E) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\stdexcept</div><div>- (MD5: 0B299654FBC61F03E9533F9296BBD2B3) c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\xstring</div></div><div>etc...</div><div><br></div><div><div>$ llvm-pdbutil.exe dump -symbols -modi=0 bin\not.pdb</div><div>                          Symbols<br></div><div>============================================================</div><div>  Mod 0000 | `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`:</div><div>       4 | S_OBJNAME [size = 80] sig=0, `D:\src\llvmbuild\cl\Debug\x64\utils\not\CMakeFiles\not.dir\not.cpp.obj`</div><div>      84 | S_COMPILE3 [size = 60]</div><div>           machine = intel x86-x64, Ver = Microsoft (R) Optimizing Compiler, language = c++</div><div>           frontend = 19.16.27024.1, backend = 19.16.27024.1</div><div>           flags = security checks | hot patchable</div><div>     144 | S_UNAMESPACE [size = 20] `__vc_attributes`</div></div><div><br></div><div>The file list is easy, but for the symbol records, some of these records might be the same in 2 different object files, and some might be different.  So we need to de-duplicate them into the final PDB.  LLD actually already does this, so a lot of the code for this portion is probably already written in LLD.   See PDBLinker::mergeSymbolRecords in lld/COFF/PDB.cpp.  
The algorithm is slightly different when merging 2 PDBs, but that's the general idea.</div><div><br></div><div><br></div><div>3) We need to merge the publics and globals streams, similar to the above.</div><div><br></div><div><br></div><div>For #2 and #3 above, this is going to be tricky.  How do you know if 2 symbols are actually the same symbol?  Even if they have the same name, it might, for example, be a symbol for a certain function F.  Suppose the first PDB is for executable A, and the second PDB is for executable B.  What if the generated code for function F in executable A differs from the generated code for F in executable B?  Does that end up as two symbols in the merged PDB or one?  I'm not sure if there's a good way to handle this.</div><div><br></div><div>I guess it might help to know more about your intended use case.  Then we might be able to make some simplifications to the problem that would allow us to decide on a reasonable solution.</div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Jan 14, 2019 at 5:39 AM Vivien Millet <<a href="mailto:vivien.millet@gmail.com" target="_blank">vivien.millet@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Were you the person in charge of this feature? If not, do you know who was in charge (to see what could be the best way / what is missing to complete this feature)?</div><br><div class="gmail_quote"><div dir="ltr">On Mon, Dec 24, 2018 at 2:01 AM, Zachary Turner <<a href="mailto:zturner@google.com" target="_blank">zturner@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The merge feature was implemented primarily for testing but was never really productionized, so your guess about what the underlying problem is sounds correct to me.  
We could probably hide the subcommand so users don’t accidentally use it, or if someone wants to properly implement the missing features, that would be even better.<br><div class="gmail_quote"><div dir="ltr">On Sat, Dec 22, 2018 at 10:48 AM Vivien Millet via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">When trying to merge 2 PDBs that each have their own DBI stream, I end up with a PDB with an inconsistent number of streams and no DBI stream (or at least not at the fixed index 3, which produces a corruption error when dumping with -l).<br>Looking at the code, it seems that we don't merge any streams other than the TPI and IPI streams, am I right?<br>Is the "merge" feature completely implemented?<br>Thanks</div>


</blockquote></div>

</blockquote></div>

</blockquote></div>

</blockquote></div>
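<div><br></div><div>The module-merge policies discussed in this thread (addition by default, or union/intersection keyed on module name plus object name) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: <code>Module</code>, <code>module_key</code>, <code>merge_modules</code> and the policy strings are hypothetical names, not part of llvm-pdbutil or any LLVM API, and a real merge would also have to rewrite the per-module streams.</div>
<pre>
```python
from collections import namedtuple

# Hypothetical stand-in for one module entry of a PDB's DBI stream.
# This is NOT an LLVM type; it only carries the two names used for identity.
Module = namedtuple("Module", ["name", "obj"])


def module_key(mod):
    # Identity rule from the thread: two modules are "the same" only if
    # both the module name and the object name are identical.
    return (mod.name, mod.obj)


def merge_modules(a, b, policy="addition"):
    """Merge two module lists under one of three hypothetical policies."""
    if policy == "addition":
        # Naive case: every module is treated as new, nothing is de-duplicated.
        return list(a) + list(b)
    if policy == "union":
        # Keep the first occurrence of each (name, obj) pair, in order.
        seen = set()
        merged = []
        for mod in list(a) + list(b):
            key = module_key(mod)
            if key not in seen:
                seen.add(key)
                merged.append(mod)
        return merged
    if policy == "intersection":
        # Keep only the modules of a that also appear in b.
        keys_b = {module_key(mod) for mod in b}
        return [mod for mod in a if module_key(mod) in keys_b]
    raise ValueError("unknown policy: " + policy)


# Made-up example data: a native executable PDB and a JIT PDB.
exe_modules = [
    Module("not.cpp.obj", "not.cpp.obj"),
    Module("Program.cpp.obj", "LLVMSupport.lib"),
]
jit_modules = [
    Module("jit_module_0", "jit_module_0.obj"),
    Module("not.cpp.obj", "not.cpp.obj"),
]

for policy in ("addition", "union", "intersection"):
    merged = merge_modules(exe_modules, jit_modules, policy)
    print(policy, [mod.name for mod in merged])
```
</pre>
<div>With JIT output that never shares a module with the native executable, "addition" and "union" give the same result, which is why the naive default may be enough for that use case; the exact-match key would still miss the case of identical objects built in different output directories.</div>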