<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 3, 2017 at 11:28 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Fri, Feb 3, 2017 at 10:18 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-">On Fri, Feb 3, 2017 at 8:31 PM, Sean Silva via Phabricator <span dir="ltr"><<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">silvas added a comment.<br>
<br>
This change does two things (as you mention in the description):<br>
<br>
1. Adding -static-func-strip-dirname-pre<wbr>fix which provides a way to have more control when `-static-func-full-module-pref<wbr>ix=true` is specified.<br></blockquote><div><br></div></span><div>This is actually a more general form of -static-func-full-mdoule-prefi<wbr>x.</div><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
2. Changing the default of -static-func-full-module-prefi<wbr>x to true.<br>
<br>
IIRC, -static-func-full-module-prefi<wbr>x defaults to false because it caused issues when set to true (in fact, it was introduced to avoid these issues). The default value of -static-func-strip-dirname-pre<wbr>fix introduced in this patch (i.e. 0) is effectively a no-op; so ignore 1. for now. This means that the net effect of this patch is that compilation will, by default, have a regression on the issue fixed by r275193 / <a href="http://reviews.llvm.org/D22028" rel="noreferrer" target="_blank">http://reviews.llvm.org/D22028</a><wbr>, which is not a good idea. I think that the default behavior (which is user-visible) should not be changed in this patch.<br></blockquote><div><br></div></span><div>I disagree. The original default behavior was to preserve the full path which was also user visible :)</div></div></div></div></blockquote><div><br></div></span><div>And yet we found a compelling-enough use case to change it. </div></div></div></div></blockquote><div><br></div><div>It was probably better to introduce the option but not flipping the default the first time. The stripping-path-fully mode had not been widely tested at that time.</div><div><br></div><div>I am a little curious about the use case for D22028. The pgo name of static function is only affected by source module path. Why would that be different for pgo-gen/use builds? In most common setup I saw, the source paths should remain the same.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>We may need to revisit that decision, but clearly the current default is intentional and part of changing away from that is explaining why we no longer care about that use case (or care about it less than some other thing).</div><span><div></div></span></div></div></div></blockquote><div><br></div><div>We care about all use cases, which is why the more general form of option is introduced -- it makes sure the use case in D22028 can also be handled but more safely (stripping all prefix will bound to cause problems).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>The whole rationale for changing the current default is that it is generally not safe -- mainly problem #1 because of counter variables for static functions can not guaranteed to be unique when full path is stripped.</div></div></div></div></blockquote><div><br></div></span><div>Can the counter variables be static to match the static nature of the functions they describe? (there would still be collisions when indexing the profile data though; the function CFG hash could be included in the "name" to avoid this)</div></div></div></div></blockquote><div><br></div><div>For simple functions, cfg hash collision is also very likely, so the first line of defense is always the name key. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> The ThinLTO issue is secondary (probably irrelevant here because of other bugs).</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><br></div><div>The issue addressed in D22028 is actually not common -- the source module paths should generally match in profile-gen and profile-use phases, so using internal option for that use case seems more reasonable to me.</div></div></div></div></blockquote><div><br></div></span><div>This is subjective, but I think it is quite reasonable to assume that each build will use a different output directory. Hence any build that generates .cpp files into the output directory (which seems reasonable too) is susceptible.</div></div></div></div></blockquote><div><br></div><div>so the case is for generated source files? Should they be accessed with relative paths?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>Overall, requiring a user to use a compiler-internal option for something that seems to happen in practice (e.g. back when I was a PlayStation we actually ran into it and spent time fixing it) is a pretty poor experience. I think we should aim to do better (though we might settle for less if that proves challenging).</div></div></div></div></blockquote><div><br></div><div>I agree in general. However I think it is reasonable for a user to use an internal option for corner use cases. Another choice is to introduce an external option for this which user can rely on.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Overall, it sounds like this approach of relying on users to tweak internal compiler options (-mllvm) to get correct behavior in their environment is not the kind of user experience we want to deliver (or the kind of implementation that we want to maintain). IIRC, when we added -static-func-full-module-prefi<wbr>x, it was with the understanding that it was a simple hack for working around the larger issue of relying on the module name which we knew was not very robust. The further addition of the "InLTO" complicates things even further. It seems like a code smell that we do not have a Single Point Of Truth.<br>
<br>
I proposed a solution at one point <a href="https://groups.google.com/d/msg/llvm-dev/s_VZbFTWbVs/d0b4Zh80CgAJ" rel="noreferrer" target="_blank">https://groups.google.com/d/ms<wbr>g/llvm-dev/s_VZbFTWbVs/d0b4Zh8<wbr>0CgAJ</a> though it may no longer be applicable. It seems like ThinLTO already has to solve a problem of finding unique identifiers for all functions (even static), so we may want to piggy-back on that mechanism (this is just a high-level thought; haven't looked into the details).<br>
<br></blockquote><div><br></div></span><div>For LTO/ThinLTO, we solved the issue by using meta data which uses getPGOFuncName as singe source of truth.</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
So:<br>
<br>
- I specifically object to changing user-visible defaults in this patch. Those changes should be isolated, and I don't think we have justification to change those defaults anyway.<br></blockquote><div><br></div></span><div>See my reply about the safety issue of keeping the current default. </div><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
- I'm slightly opposed to adding the -static-func-strip-dirname-pre<wbr>fix flag, since it seems like a workaround (among others that have already piled up) for a more fundamental issue. This is a frog-in-boiling-water situation; if solving the fundamental issue would be a huge amount of work, then adding the new flag is probably fine for now, but we need to keep in mind the larger situation. IIUC, defaulting `-static-func-strip-dirname-pr<wbr>efix=-1` would emulate the current default behavior, so -static-func-full-module-prefi<wbr>x could just be removed in the same patch.<br></blockquote><div><br></div></span><div>The current -static-func-full-module-path=<wbr>false is simply a special case of the new option. For users who rely on this option may hit the correctness issue, they won't have any fallback without the new option.</div><span class="m_-3463647051349438932m_6445137844656645529m_201298186416225277gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
- I would encourage brainstorming/discussion of alternative solutions that solve the fundamental problem (which seems to be more about having a stable globally unique identifier than being specifically about preserving/mangling the "name" per se).<br></blockquote><div><br></div></span><div>The problem itself is simple: handle name conflicts between</div><div><br></div><div>/a/b/c/foo.c:static_func</div><div>/e/f/g/foo.c:static_func</div><div><br></div><div>Path info is a natural choice. Note that FE instrumentation also uses module path to uniquely identify static_func as well.</div></div></div></div></blockquote><div><br></div></span><div>Yes, the problem was inherited from FE instrumentation. I remember that when I explained to Justin the issue, he said that it was clearly buggy and not intentional (an oversight when implementing FEPGO).</div></div></div></div></blockquote><div><br></div><div>It works very well in practice -- though it is not guaranteed to be 100% free of conflict. I won't label it as buggy.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>It seems that the fundamental issue is coming up with a unique identifier for the current TU that is stable across compiler invocations. How do other compilers handle this?</div><div><br></div></div></div></div></blockquote><div>GCC does not suffer from the problem because it does not dump profile into one file but one profile file per module. The profile data file tree structure mirrors the build output file structure so there will be problem if profile-gen and use do not share the same structure. Runtime options are provided to strip prefixes from output directories. Compiler time option is also provided to relocate profile data (e.g. pointing to different root).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>For example, path names are not enough. E.g. a user may build /a/b/c/foo.c with two different sets of compiler options, yet static functions of the same name must still be treated as separate. A file like:</div><div><br></div><div>foo.c:</div><div><br></div><div>static void PreprocessHelper(/* something */) {</div><div>#ifdef USE_AVX</div><div> // something</div><div>#else</div><div> // something else</div><div>#endif</div><div>}</div><div><br></div><div>#ifdef USE_AVX</div><div>void foo_avx(/* something */) {</div><div>#else</div><div>void foo_noavx(/* something */) {<br></div><div>#endif</div><div> PreprocessHelper(/* something */);</div><div> // something<br>}</div><div><br></div><div><br></div><div>IIRC, one option (suggested by pcc if I remember correctly) is to use a hash of the TU's exported symbols (or something like that) to uniquely identify the TU.</div></div></div></div></blockquote><div><br></div><div>Should be limited to user defined public symbols. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> That seems more robust than a path name.<br></div></div></div></div></blockquote><div><br></div><div>I believe so. </div><div><br></div><div>Content based ID has its advantage but has disadvantages too. For instance more expensive to compute, less readable names. Using path based naming, we can immediately identify where the static function is defined. Perhaps we can use base name plus content hash. </div><div><br></div><div>Teresa, Rong, do you see a situation when module ID needs to be identified but it is difficult or too expensive to access the module's content?</div><div><br></div><div>David</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>thanks,</div><div><br></div><div>David</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
<br>
<br>
<a href="https://reviews.llvm.org/D29512" rel="noreferrer" target="_blank">https://reviews.llvm.org/D2951<wbr>2</a><br>
<br>
<br>
<br>
</blockquote></div><br></div></div>
</blockquote></div><br></div></div>
</blockquote></div><br></div></div>