[PATCH] D29512: [PGO] Directory name stripping in global identifier for static functions

Rong Xu via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 6 10:27:37 PST 2017


On Sun, Feb 5, 2017 at 12:11 AM, Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Fri, Feb 3, 2017 at 11:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Fri, Feb 3, 2017 at 10:18 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Feb 3, 2017 at 8:31 PM, Sean Silva via Phabricator <
>>> reviews at reviews.llvm.org> wrote:
>>>
>>>> silvas added a comment.
>>>>
>>>> This change does two things (as you mention in the description):
>>>>
>>>> 1. Adding -static-func-strip-dirname-prefix which provides a way to
>>>> have more control when `-static-func-full-module-prefix=true` is
>>>> specified.
>>>>
>>>
>>> This is actually a more general form of -static-func-full-mdoule-prefix.
>>>
>>>
>>>> 2. Changing the default of -static-func-full-module-prefix to true.
>>>>
>>>> IIRC, -static-func-full-module-prefix defaults to false because it
>>>> caused issues when set to true (in fact, it was introduced to avoid these
>>>> issues). The default value of -static-func-strip-dirname-prefix
>>>> introduced in this patch (i.e. 0) is effectively a no-op; so ignore 1. for
>>>> now. This means that the net effect of this patch is that compilation will,
>>>> by default, have a regression on the issue fixed by r275193 /
>>>> http://reviews.llvm.org/D22028, which is not a good idea. I think that
>>>> the default behavior (which is user-visible) should not be changed in this
>>>> patch.
>>>>
>>>
>>> I disagree. The original default behavior was to preserve the full path
>>> which was also user visible :)
>>>
>>
>> And yet we found a compelling-enough use case to change it.
>>
>
> It was probably better to introduce the option but not flipping the
> default the first time. The stripping-path-fully mode had not been widely
> tested at that time.
>
> I am a little curious about the use case for D22028. The pgo name of
> static function is only affected by source module path. Why would that be
> different for pgo-gen/use builds? In most common setup I saw, the source
> paths should remain the same.
>
>
>> We may need to revisit that decision, but clearly the current default is
>> intentional and part of changing away from that is explaining why we no
>> longer care about that use case (or care about it less than some other
>> thing).
>>
>
> We care about all use cases, which is why the more general form of option
> is introduced -- it makes sure the use case in D22028 can also be handled
> but more safely (stripping all prefix will bound to cause problems).
>
>
>>
>>
>>>
>>> The whole rationale for changing the current default is that it is
>>> generally not safe -- mainly problem #1 because of counter variables for
>>> static functions can not guaranteed to be unique when full path is stripped.
>>>
>>
>> Can the counter variables be static to match the static nature of the
>> functions they describe? (there would still be collisions when indexing the
>> profile data though; the function CFG hash could be included in the "name"
>> to avoid this)
>>
>
> For simple functions, cfg hash collision is also very likely, so the first
> line of defense is always the name key.
>
>
>>
>>
>>> The ThinLTO issue is secondary (probably irrelevant here because of
>>> other bugs).
>>>
>>
>>>
>>> The issue addressed in D22028 is actually not common -- the source
>>> module paths should generally match in profile-gen and profile-use phases,
>>> so using internal option for that use case seems more reasonable to me.
>>>
>>
>> This is subjective, but I think it is quite reasonable to assume that
>> each build will use a different output directory. Hence any build that
>> generates .cpp files into the output directory (which seems reasonable too)
>> is susceptible.
>>
>
> so the case is for generated source files? Should they be accessed with
> relative paths?
>
>
>>
>> Overall, requiring a user to use a compiler-internal option for something
>> that seems to happen in practice (e.g. back when I was a PlayStation we
>> actually ran into it and spent time fixing it) is a pretty poor experience.
>> I think we should aim to do better (though we might settle for less if that
>> proves challenging).
>>
>
> I agree in general. However I think it is reasonable for a user to use an
> internal option for corner use cases.  Another choice is to introduce an
> external option for this which user can rely on.
>
>>
>>
>>>
>>>
>>>
>>>>
>>>> Overall, it sounds like this approach of relying on users to tweak
>>>> internal compiler options (-mllvm) to get correct behavior in their
>>>> environment is not the kind of user experience we want to deliver (or the
>>>> kind of implementation that we want to maintain). IIRC, when we added
>>>> -static-func-full-module-prefix, it was with the understanding that it
>>>> was a simple hack for working around the larger issue of relying on the
>>>> module name which we knew was not very robust. The further addition of the
>>>> "InLTO" complicates things even further. It seems like a code smell that we
>>>> do not have a Single Point Of Truth.
>>>>
>>>> I proposed a solution at one point https://groups.google.com/d/ms
>>>> g/llvm-dev/s_VZbFTWbVs/d0b4Zh80CgAJ though it may no longer be
>>>> applicable. It seems like ThinLTO already has to solve a problem of finding
>>>> unique identifiers for all functions (even static), so we may want to
>>>> piggy-back on that mechanism (this is just a high-level thought; haven't
>>>> looked into the details).
>>>>
>>>>
>>> For LTO/ThinLTO,  we solved the issue by using meta data which uses
>>> getPGOFuncName as singe source of truth.
>>>
>>
>>>
>>>
>>>> So:
>>>>
>>>> - I specifically object to changing user-visible defaults in this
>>>> patch. Those changes should be isolated, and I don't think we have
>>>> justification to change those defaults anyway.
>>>>
>>>
>>> See my reply about the safety issue of keeping the current default.
>>>
>>>
>>>
>>>> - I'm slightly opposed to adding the -static-func-strip-dirname-prefix
>>>> flag, since it seems like a workaround (among others that have already
>>>> piled up) for a more fundamental issue. This is a frog-in-boiling-water
>>>> situation; if solving the fundamental issue would be a huge amount of work,
>>>> then adding the new flag is probably fine for now, but we need to keep in
>>>> mind the larger situation. IIUC, defaulting `-static-func-strip-dirname-prefix=-1`
>>>> would emulate the current default behavior, so
>>>> -static-func-full-module-prefix could just be removed in the same
>>>> patch.
>>>>
>>>
>>> The current -static-func-full-module-path=false is simply a special
>>> case of the new option. For users who rely on this option may hit the
>>> correctness issue, they won't have any fallback without the new option.
>>>
>>>
>>>> - I would encourage brainstorming/discussion of alternative solutions
>>>> that solve the fundamental problem (which seems to be more about having a
>>>> stable globally unique identifier than being specifically about
>>>> preserving/mangling the "name" per se).
>>>>
>>>
>>> The problem itself is simple: handle name conflicts between
>>>
>>> /a/b/c/foo.c:static_func
>>> /e/f/g/foo.c:static_func
>>>
>>> Path info is a natural choice. Note that FE instrumentation also uses
>>> module path to uniquely identify static_func as well.
>>>
>>
>> Yes, the problem was inherited from FE instrumentation. I remember that
>> when I explained to Justin the issue, he said that it was clearly buggy and
>> not intentional (an oversight when implementing FEPGO).
>>
>
> It works very well in practice -- though it is not guaranteed to be 100%
> free of conflict. I won't label it as buggy.
>
>
>>
>> It seems that the fundamental issue is coming up with a unique identifier
>> for the current TU that is stable across compiler invocations. How do other
>> compilers handle this?
>>
>> GCC does not suffer from the problem because it does not dump profile
> into one file but one profile file per module. The profile data file tree
> structure mirrors the build output file structure so there will be problem
> if profile-gen and use do not share the same structure. Runtime options are
> provided to strip prefixes from output directories. Compiler time option is
> also provided to relocate profile data (e.g. pointing to different root).
>
>
>
>> For example, path names are not enough. E.g. a user may build
>> /a/b/c/foo.c with two different sets of compiler options, yet static
>> functions of the same name must still be treated as separate. A file like:
>>
>> foo.c:
>>
>> static void PreprocessHelper(/* something */) {
>> #ifdef USE_AVX
>>   // something
>> #else
>>   // something else
>> #endif
>> }
>>
>> #ifdef USE_AVX
>> void foo_avx(/* something */) {
>> #else
>> void foo_noavx(/* something */) {
>> #endif
>>   PreprocessHelper(/* something */);
>>   // something
>> }
>>
>>
>> IIRC, one option (suggested by pcc if I remember correctly) is to use a
>> hash of the TU's exported symbols (or something like that) to uniquely
>> identify the TU.
>>
>
> Should be limited to user defined public symbols.
>
>
>> That seems more robust than a path name.
>>
>
> I believe so.
>
> Content based ID has its advantage but has disadvantages too. For instance
> more expensive to compute, less readable names. Using path based naming, we
> can immediately identify where the static function is defined. Perhaps we
> can use base name plus content hash.
>
> Teresa, Rong, do you see a situation when  module ID needs to be
> identified but it is difficult or  too expensive to access the module's
> content?
>

Another disadvantage is this is less tolerable to source changes: if the
use adds another exported symbol in the use-compilation (which has nothing
to do with another of the existing functions), the suddenly voids all the
static function's profile.  This is not happening in current path name
based scheme.



>
> David
>
>
>> -- Sean Silva
>>
>>
>>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>> https://reviews.llvm.org/D29512
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170206/83a24b1f/attachment.html>


More information about the llvm-commits mailing list