[llvm] r350671 - [PGO] Use SourceFileName rather module name in PGOFuncName

Rong Xu via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 22 23:20:00 PST 2019


On Tue, Jan 22, 2019 at 5:03 PM David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Jan 22, 2019 at 4:10 PM Rong Xu <xur at google.com> wrote:
>
>> On Tue, Jan 22, 2019 at 2:44 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Jan 22, 2019 at 2:39 PM Rong Xu <xur at google.com> wrote:
>>>
>>>> On Tue, Jan 22, 2019 at 1:24 PM Teresa Johnson <tejohnson at google.com>
>>>> wrote:
>>>>
>>>>> On Tue, Jan 22, 2019 at 1:06 PM David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hey Rong, Teresa,
>>>>> >
>>>>> > This seems like it might be problematic to me - Couldn't multiple
>>>>> modules have the same source file name (built with different preprocessor
>>>>> defines, etc) - at least I think that's the case for some projects at
>>>>> Google.
>>>>> >
>>>>> > Does this identifier need to be unique? What are the ramifications
>>>>> if multiple modules had the same source file name & this situation?
>>>>>
>>>>> Rong will know for sure, but I think the profiles will be merged and
>>>>> it is possible there will be a profile mismatch.
>>>>>
>>>>> However,  note that this is what already happens for PGO - this change
>>>>> retains the status quo since PGO is matched during the compile from source
>>>>> where the module name (F.getParent()->getName()) is the same as
>>>>> F.getParent()->getSourceFileName(). So any uniqueness issue already exists
>>>>> with PGO.
>>>>>
>>>>> This change is to enable context sensitive PGO which must be done in
>>>>> the *LTO backends, where we have recorded the original module name in the
>>>>> SourceFileName in the bitcode (in the *LTO backends the module name is the
>>>>> .o bitcode file name).
>>>>>
>>>>>
>>>> Thanks for Teresa explanation and the background. I just want to add
>>>> one more thing: we use both filename and function's CFG checksum in the
>>>> static function's PGOFuncName. If multiple functions have the same source
>>>> name, but with different CFG checksums, they can coexist in the profile.
>>>> Profile-use compilation will pick the exact match one. In theory, there
>>>> still exist some collisions, but the chance are very small.
>>>>
>>>
>>> So if this was the original source file name, it wouldn't be too hard to
>>> collide, even with the CFG checksum mixed in, I think. In Google, at least,
>>> it's not too uncommon (I've seen it a few times, at least) for a source
>>> file to be built into multiple libraries, either by mistake or
>>> intentionally (if it's intentional, then the builds probably use different
>>> parameters - different #defines and the like) - and a file-local static
>>> function in such a file might not depend on those #defines or other
>>> parameters, so it would be the same across files - while still being
>>> distinct functions.
>>>
>>> For Google applications, we use distributed thinlto build. The patch
>> won't affect them as the IR files (with .o suffix) is passed the post LTO
>> compilation. This patch will treat the .o as the source files.
>>
>>
>>> But if the filename being used is the name of the intermediate .o file
>>> rather than the original source file, well that filename has to be unique
>>> for the linker to be able to read both .o files (even if they came from the
>>> same source file) & so it should provide a good uniqueness.
>>>
>>> This only affects the compilation through linker plugin. Without this
>> patch, if we call that function after LTO linking, all the static function
>> will have a mismatch. As the profile-use and profile-gen have different
>> module name (with a unique temporary string in the file name). This is much
>> worse than the collision.
>>
>
> Ah, OK - so what's the source name if the input to the linker is an IR
> file (with .o suffix) as would usually be the case (but with linker plugin
> rather than distributed thinlto) - is it still the .o file name passed to
> the linekr? or is it now the user source code file name? If it's still the
> .o file name, that'd still avoid a collision, I think?
>

As Teresa pointed out, I was wrong here. Distributed thinlto and ld-plugin
thinlto should have the same behavior. With this patch, we will always use
the source name in the commandline for PGOFuncName().

Let me summarize it with an example here.
For a static function foo in a/b/c.cc, if we have
clang++ -O2 -flto=thin a/b/c.c ...
Without this patch, all the current callers to getPGOFuncName() in thinlto
mode will return the a/b/c.c:foo.
But if we call getPGOFuncName() after LTO link, we will return
a/b/c.o:foo (for distributed thinlto, the module name is a/b/c.o), or
a/b/c.o-<temp_suffix>:foo (for ld-plugin thinlto,, note that <temp_suffix>
is different for each compilation)
This is particular bad for ld-plugin mode as -fprofile-use and
-fprofile-gen will have different <temp_suffix> -- all the static functions
in -fprofile-use will not get any profile.

With this patch, we will always return
a/b/c.c:foo for thinlto no matter where getPGOFuncName() is called.

As Teresa pointed out in an earlier reply, we don't expect any behavior
change for current thinlto usage.

If I understand correctly, the concern is usage like the following:
cd a1 && clang++ c.c -o /tmp/c1.o
cd ../a2 && clang++ c.c -o /tmp/c2.o
The source name in command line c.c so current implementation will return
c.c:foo for both cases. We don't have uniqueness either.
This patch won't introduce regressions in this sense.



>
> - Dave
>
>
>>
>>
>>> - Dave
>>>
>>>
>>>>
>>>>
>>>>> Teresa
>>>>>
>>>>> >
>>>>> > - Dave
>>>>>
>>>>> On Tue, Jan 8, 2019 at 2:43 PM Rong Xu via llvm-commits <
>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>
>>>>>> Author: xur
>>>>>> Date: Tue Jan  8 14:39:47 2019
>>>>>> New Revision: 350671
>>>>>>
>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=350671&view=rev
>>>>>> Log:
>>>>>> [PGO] Use SourceFileName rather module name in PGOFuncName
>>>>>>
>>>>>> In LTO or Thin-lto mode (though linker plugin), the module
>>>>>> names are of temp file names which are different for
>>>>>> different compilations. Using SourceFileName avoids the issue.
>>>>>> This should not change any functionality for current PGO as
>>>>>> all the current callers of getPGOFuncName() is before LTO.
>>>>>>
>>>>>> Differential Revision: https://reviews.llvm.org/D56327
>>>>>>
>>>>>> Modified:
>>>>>>     llvm/trunk/lib/ProfileData/InstrProf.cpp
>>>>>>
>>>>>> Modified: llvm/trunk/lib/ProfileData/InstrProf.cpp
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/ProfileData/InstrProf.cpp?rev=350671&r1=350670&r2=350671&view=diff
>>>>>>
>>>>>> ==============================================================================
>>>>>> --- llvm/trunk/lib/ProfileData/InstrProf.cpp (original)
>>>>>> +++ llvm/trunk/lib/ProfileData/InstrProf.cpp Tue Jan  8 14:39:47 2019
>>>>>> @@ -252,11 +252,12 @@ static StringRef stripDirPrefix(StringRe
>>>>>>  // data, its original linkage must be non-internal.
>>>>>>  std::string getPGOFuncName(const Function &F, bool InLTO, uint64_t
>>>>>> Version) {
>>>>>>    if (!InLTO) {
>>>>>> -    StringRef FileName = (StaticFuncFullModulePrefix
>>>>>> -                              ? F.getParent()->getName()
>>>>>> -                              :
>>>>>> sys::path::filename(F.getParent()->getName()));
>>>>>> -    if (StaticFuncFullModulePrefix && StaticFuncStripDirNamePrefix
>>>>>> != 0)
>>>>>> -      FileName = stripDirPrefix(FileName,
>>>>>> StaticFuncStripDirNamePrefix);
>>>>>> +    StringRef FileName(F.getParent()->getSourceFileName());
>>>>>> +    uint32_t StripLevel = StaticFuncFullModulePrefix ? 0 :
>>>>>> (uint32_t)-1;
>>>>>> +    if (StripLevel < StaticFuncStripDirNamePrefix)
>>>>>> +      StripLevel = StaticFuncStripDirNamePrefix;
>>>>>> +    if (StripLevel)
>>>>>> +      FileName = stripDirPrefix(FileName, StripLevel);
>>>>>>      return getPGOFuncName(F.getName(), F.getLinkage(), FileName,
>>>>>> Version);
>>>>>>    }
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190122/2b54695a/attachment.html>


More information about the llvm-commits mailing list