<br><br><div class="gmail_quote">On Fri Dec 12 2014 at 2:26:38 PM Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com">kyrtzidis@apple.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div>On Dec 12, 2014, at 12:58 PM, Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>> wrote:</div><br><div><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">On Fri Dec 12 2014 at 12:35:51 PM Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com" target="_blank">kyrtzidis@apple.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div>On Dec 12, 2014, at 10:42 AM, Adrian Prantl <<a href="mailto:aprantl@apple.com" target="_blank">aprantl@apple.com</a>> wrote:</div><br><div><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br>On Dec 11, 2014, at 5:07 
PM, Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com" target="_blank">kyrtzidis@apple.com</a>> wrote:<br><br><br><blockquote type="cite">On Dec 11, 2014, at 4:30 PM, Adrian Prantl <<a href="mailto:aprantl@apple.com" target="_blank">aprantl@apple.com</a>> wrote:<br><br><blockquote type="cite"><br>On Dec 11, 2014, at 3:37 PM, Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com" target="_blank">kyrtzidis@apple.com</a>> wrote:<br><br><br><blockquote type="cite">On Dec 11, 2014, at 3:08 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:<br><br>On Thu, Dec 11, 2014 at 3:00 PM, Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com" target="_blank">kyrtzidis@apple.com</a>> wrote:<br><br><blockquote type="cite">On Dec 11, 2014, at 2:59 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:<br><br>On Thu, Dec 11, 2014 at 2:40 PM, Argyrios Kyrtzidis <<a href="mailto:kyrtzidis@apple.com" target="_blank">kyrtzidis@apple.com</a>> wrote:<br>The .pcm file is currently independent of debug info, meaning the compiler invocation will be able to use the same .pcm file regardless of whether the invocation had enabled debug info or not;<br><br>We can't use the same .pcm file for -DNDEBUG vs -UNDEBUG builds. Do we ever get to reuse a .pcm file like this in practice?<br></blockquote><br>You can choose to add, or not to add, debug info to a release build.<br><br>Sure, I don't dispute that this .pcm reuse can happen in theory. But what I'm wondering is: Does this actually happen in practice? How often? Is this case worth optimizing for?<br><br>There are other things I'd like to bundle with a .pcm file (.o and .ir code for inline functions, for instance) that would also benefit from using an ELF wrapper format, and would also vary based on clang's CodeGen options. 
One possible approach would be to have (at least) two files -- one CodeGen-independent AST file, and one CodeGen-dependent file containing all the other bits -- but that seems to introduce complexity that is unnecessary in almost all cases. (Also note that even flags like -O or -fsanitize=address cause us to build different .pcm files today, because they affect preprocessor macros.)<br></blockquote><br>I don’t see the reason to make the module file itself the container, particularly when whatever the container may contain doesn’t affect in any way the semantic info that the module file is supposed to provide; we would just proliferate module files and/or rebuild module files unnecessarily.<br>It’s true that the situation is not ideal currently: -O1 through -O3 reuse the .pcm but -Os does not. In the future we could try to address this rather than make the situation fundamentally worse and inescapable. I’d like modules not to turn into a “glorified PCH system” where there is practically zero re-use for them.<br><br>Back to the debug info, why not have the container like this:<br><br>Foundation.pcm.o<br> \<br>Foundation.pcm<br><br>where the container references the .pcm file, and you can put the debug info in it (or IR later on).<br><br>Debug info can reference Foundation.pcm.o and get extended to handle the serialized AST from .pcm.<br></blockquote><br>At least for the debug info I was hoping that it would be cheap enough to always emit it together with the pcm, especially given that modules are being rebuilt comparatively infrequently. 
The module debug info is essentially just an alternative encoding of the types provided by the module, and exactly the same conditions that trigger a re-generation of the .pcm today would also necessitate re-generating the debug info.<br><br>If we do want to hold on to the ability of having modules without debug info, this approach does appear a little less practical.<br><br>Having two separate files complicates the module rebuild stage a bit, and we will also need to store a hash of the module to make sure the two files are in sync, but it would certainly be doable. It would probably be easier to provide non-debuggable modules this way.<br><br>Are modules without debug info desirable? Translating types to DWARF is relatively cheap, and it is my understanding that modules are not rebuilt very often, since the module cache is shared across all projects.<br></blockquote><br>I actually love the idea of .o for a module, but I see it more as a ‘codegen’ container where we can put more in it than just debug info. 
Codegen in lots of areas is necessarily more specific than the .pcm needs to be, so I’d suggest that we put the building blocks in place now to have a .o container reference the .pcm file and we can build on top of it.<br></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Although there are only very few opt-level-dependent codegen differences in clang, they do exist, and if we want to prepare for a future where modules come with (bit)code, having the AST and the products of codegen in different files sounds reasonable. If it were only for the debug type information, I don’t think the separation makes sense; it’s just a subset of the AST encoded in DWARF. But if we want to plan ahead for a future where we have several different .pcm.o files per .pcm file, at first glance, having the debug info split across the .pcm (for types) and the .pcm.o (line tables for the code) appears a bit complicated. 
Then again, if we do put the debug _type_ information into the various .pcm.o files, we would end up with duplicate type information in all the different .pcm.O0.o, .pcm.O3.o, etc., which is also a little wasteful, but not terrible.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>If debug info is efficiently generated then the duplication won’t matter much, right? The complication of trying to separate debug line tables from debug type-info seems to me to indicate that putting debug info in the .pcm is less future-proof reasonable if we can’t share it. In any case, even if we want to share the type-info among different line tables, separating the type-info into a .o outside the .pcm file doesn’t exclude this option.</div></div></div></blockquote><div><br></div><div>If the module is efficiently generated then adding the debug info into it won't matter, right?</div><div><br></div><div>I.e. I'm not sure where you're going with this line of reasoning. Basically if you want to have two modules, one for debug and one for non-debug, that's also ok, but I still don't see a reason to put the debug info in a separate file.</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>No, I don’t want to have two module files, one with debug info and one without. My arguments for separating are:</div><div>- It’s worthwhile to keep the concepts, module semantic info and debug info, separate because debug info is inherently a codegen concept. It is possible that there may be a codegen flag that affects debug info (for argument’s sake something related to ABI) but will not affect the modules’ semantic info. 
This will allow us to deal with module sharing without worrying about codegen.</div></div></div></blockquote><div><br></div><div>Debug info for types isn't inherently a code generation concept. If you think about it, debug info for types is a stable (if lossy) serialization method for a module file. The line number etc for when there's code generated is a separate issue.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div>- It’s likely that a codegen container for modules will be extended in the future with more content tied to codegen flags, so we might as well create the codegen container separate now.</div></div></div></blockquote><div><br></div><div><span style="font-size:13.1999998092651px">Possibly.</span></div><div><span style="font-size:13.1999998092651px"> </span><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div>- We don’t need to generate debug info for builds that were not having debug info generated before, this is additional work by the compiler that has no use.</div></div></div></blockquote><div><br></div><div>This I agree with. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><br></div><div>The argument I’ve seen to tie debug info with the .pcm is that they are going to be associated together. 
This is not a complicated problem: .pcm files are also tied to each other (using a hash), and we know how to handle this.</div></div></div><div style="word-wrap:break-word"><div><br><blockquote type="cite"><div><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><br></div><blockquote type="cite"><div><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">My personal vote would still be to unconditionally put the debug type information in the .pcm file, but I don’t feel very strongly about it.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>I’m concerned about this because a release build that never enables debug info should not have debug info generated. 
In the past couple of years we have gradually regressed compile-time performance of clang and this is a fact; I believe one part is a consequence of not being prudent enough and sticking to “pay only what you use”.</div></div></div></blockquote><div><br></div><div>I don't see module creation as an issue here and I think this is a red herring.</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>I don’t follow what you mean, generating debug info unconditionally, even when it was not requested, is work that was not occurring before and has no use. Is “red herring” meaning you think the additional work will be insignificant ? I’d like to see the data against Cocoa.</div></div></div><div style="word-wrap:break-word"><div><br></div></div></blockquote><div><br></div><div>No one has been providing numbers in this thread :) I don't personally feel creating debug info is particularly time consuming, but I don't have hard numbers handy - especially the pcm->debug info translation as that would be talking about a tool that doesn't exist yet (or a mode in the compiler that doesn't exist yet).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><span style="font-size:13.1999998092651px"> </span></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br><blockquote type="cite"><div><br 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">How does this sound: Initially, we put the debug type information into a separate .pcm.o file, and then, after we have some data about how expensive it is to generate it, we can move it into the .pcm file as an optimization, so the debug type info can be shared across all build configurations?</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>SGTM.</div></div></div><div style="word-wrap:break-word"><div><br></div></div></blockquote><div><br></div><div>How do you see this working with the general edit-compile-debug scheme?</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word"><div><div>Could you be more specific? I’m not sure what the issue will be regarding edit-compile-debug.</div></div></div></blockquote><div><br></div><div>You haven't provided a rough outline of how the compilation pipeline will work with your changes.</div><div><br></div><div>That said, I'll do so now. This is roughly the way I'd originally envisioned it, but we went to the single file to minimize dependencies and inter-references between lots of files.</div><div><br></div><div>That said, here's a plot for how this could work:</div><div><br></div><div>Compilation happens from top to 
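bottom.</div><div><br></div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.pcm</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/\</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/&nbsp;&nbsp;&nbsp;&nbsp;\</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;use .o&nbsp;&nbsp;-&nbsp;&nbsp;&nbsp;debug .o</div><div><br></div><div>use.o: translation unit that includes a pcm</div><div>debug.o: debug info for pcm file (possibly more, see below)</div><div><br></div><div>So when we create the pcm file (via an implicit or explicit module build; keep in mind we care about more than the implicit strategy here), we store it off to the side. When we create a use with debug information, this will need to trigger in the build system (somehow) that we need to create a debug file for the pcm file. This debug file could/should also contain things like inline functions that are required to be available externally, though in the degenerate case (because inline functions are everywhere) we have just debug info for the types (created by our not-yet-existing tool). Then, from there, the use will necessarily need to reference the debug file (which will likely be a .o file) by name and location so that the use .o knows where to look for the debug information (and that skeleton is linked into the final binary).</div><div><br></div><div>This is going to involve some secret/implicit communication from the build system to the use .o (the translation unit) so that it knows the name and location of the debug .o so that it can be referenced. The debug .o will need to know the name and location of the pcm file so that uses like those imagined for lldb will work with pcm files and can be found from the executable. 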
(We may or may not want this full chain, we might want to put both into use.o and debug .o but this can be argued either way).</div><div><br></div><div>This doesn't include a possible additional file of IR (let's call it .ir) that can be constructed from the pcm file and have available externally functions used when compiling use .o, but we can put that as an extra height to the tree. Avoiding a lot of this dependency analysis and extra actions was why we were originally going down the "debug info is in the pcm file" path, but it's not necessary.</div><div><br></div><div>So, that's roughly what the new plan here was looking like if we split things out to multiple files. It avoids consideration of things like <span style="font-size:13.1999998092651px">pcm files being used when compiling for multiple architectures and a few other things that'll be tricky in the long run, but can probably be thought about later. Given I'm not doing most of the heavy lifting I'm just arguing the cost/benefits of various strategies.</span></div><div><span style="font-size:13.1999998092651px"><br></span></div><div><span style="font-size:13.1999998092651px">-eric</span></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><br></div><blockquote type="cite"><div><div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><br></div><div>-eric</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div><br 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">-- adrian</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br><blockquote type="cite"><br>-- adrian<br><br><blockquote type="cite"><br><br><blockquote type="cite"><blockquote type="cite">with this change if an invocation had built a module file with debug info disabled, it would be inapplicable to the same invocation that had debug info enabled and would have to rebuild it; essentially we are tying module building with debug info. The module file as the “collection of semantic info” is conceptually independent from debug info.<br><br>Did you consider having the debug info container being another file (e.g. besides the .pcm) that will reference the .pcm file ? 
This way, instead of having to update all users of module files, regardless of whether they care about debug info, you’d just make debug info another user of .pcm files, no more special than the others.<br><br><blockquote type="cite">On Dec 10, 2014, at 2:27 PM, Adrian Prantl <<a href="mailto:aprantl@apple.com" target="_blank">aprantl@apple.com</a>> wrote:<br><br>Hi everyone,<br><br>As the first step in preparation for module debugging (see<span> </span><a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html</a>) this patch wraps the *.pcm files that are used to store clang modules and precompiled headers in a platform-dependent Mach-O/ELF/COFF container, so that eventually we will be able to store debug information alongside the module in the same file.<br><br>This is implemented by using the standard LLVM code generation machinery. Instead of directly writing to the output file, the serialized AST blob is attached to an empty llvm::Module as a ModuleFlag. The module is passed to the backend, which emits the AST blob into a special “__clang_pch” section in TargetLoweringObjectFile*.<br>On the ASTReader side, any object file is transparently unwrapped and the BitstreamReader is pointed directly to the AST section.<br><br>Other than the .pcm files having an extra header inside, this patch is not meant to have any user-visible effects.<br><br>Known bugs: I still need to figure out how to make c-index-test link against and register the available targets (check-all passes, but the modules created by c-index-test currently are plain old .pcm files).<br>Open questions: I made up the name of the new __clang_pch section and the various flags on the different platforms on the spot. 
I’m open to better suggestions.<br><br>Let me know what you think!<br><br>-- adrian<br><clang.diff><llvm.diff></blockquote></blockquote></blockquote></blockquote></blockquote></blockquote></div></blockquote></div></div></blockquote></div></div></blockquote></div></div></blockquote></div>