[llvm] r350671 - [PGO] Use SourceFileName rather module name in PGOFuncName

Rong Xu via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 23 11:05:14 PST 2019


On Wed, Jan 23, 2019 at 7:25 AM Teresa Johnson <tejohnson at google.com> wrote:

>
>
> On Tue, Jan 22, 2019 at 10:24 PM Rong Xu <xur at google.com> wrote:
>
>>
>>
>> On Tue, Jan 22, 2019 at 10:03 PM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jan 22, 2019 at 9:17 PM Rong Xu <xur at google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jan 22, 2019 at 5:51 PM Teresa Johnson <tejohnson at google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 5:03 PM David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 4:10 PM Rong Xu <xur at google.com> wrote:
>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 2:44 PM David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 2:39 PM Rong Xu <xur at google.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Jan 22, 2019 at 1:24 PM Teresa Johnson <
>>>>>>>>> tejohnson at google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Jan 22, 2019 at 1:06 PM David Blaikie <dblaikie at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hey Rong, Teresa,
>>>>>>>>>> >
>>>>>>>>>> > This seems like it might be problematic to me - Couldn't
>>>>>>>>>> multiple modules have the same source file name (built with different
>>>>>>>>>> preprocessor defines, etc) - at least I think that's the case for some
>>>>>>>>>> projects at Google.
>>>>>>>>>> >
>>>>>>>>>> > Does this identifier need to be unique? What are the
>>>>>>>>>> ramifications if multiple modules had the same source file name & this
>>>>>>>>>> situation?
>>>>>>>>>>
>>>>>>>>>> Rong will know for sure, but I think the profiles will be merged
>>>>>>>>>> and it is possible there will be a profile mismatch.
>>>>>>>>>>
>>>>>>>>>> However,  note that this is what already happens for PGO - this
>>>>>>>>>> change retains the status quo since PGO is matched during the compile from
>>>>>>>>>> source where the module name (F.getParent()->getName()) is the same as
>>>>>>>>>> F.getParent()->getSourceFileName(). So any uniqueness issue already exists
>>>>>>>>>> with PGO.
>>>>>>>>>>
>>>>>>>>>> This change is to enable context sensitive PGO which must be done
>>>>>>>>>> in the *LTO backends, where we have recorded the original module name in
>>>>>>>>>> the SourceFileName in the bitcode (in the *LTO backends the module name is
>>>>>>>>>> the .o bitcode file name).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Thanks for Teresa explanation and the background. I just want to
>>>>>>>>> add one more thing: we use both filename and function's CFG checksum in the
>>>>>>>>> static function's PGOFuncName. If multiple functions have the same source
>>>>>>>>> name, but with different CFG checksums, they can coexist in the profile.
>>>>>>>>> Profile-use compilation will pick the exact match one. In theory, there
>>>>>>>>> still exist some collisions, but the chance are very small.
>>>>>>>>>
>>>>>>>>
>>>>>>>> So if this was the original source file name, it wouldn't be too
>>>>>>>> hard to collide, even with the CFG checksum mixed in, I think. In Google,
>>>>>>>> at least, it's not too uncommon (I've seen it a few times, at least) for a
>>>>>>>> source file to be built into multiple libraries, either by mistake or
>>>>>>>> intentionally (if it's intentional, then the builds probably use different
>>>>>>>> parameters - different #defines and the like) - and a file-local static
>>>>>>>> function in such a file might not depend on those #defines or other
>>>>>>>> parameters, so it would be the same across files - while still being
>>>>>>>> distinct functions.
>>>>>>>>
>>>>>>>> For Google applications, we use distributed thinlto build. The
>>>>>>> patch won't affect them as the IR files (with .o suffix) is passed the post
>>>>>>> LTO compilation. This patch will treat the .o as the source files.
>>>>>>>
>>>>>>
>>>>> This is not correct. The source file name (the module id when we
>>>>> compile from cc to bitcode .o) is encoded in the bitcode file:
>>>>> https://llvm.org/docs/LangRef.html#source-filename
>>>>>
>>>>> When the .o bitcode file is processed through the LTO backend,
>>>>> regardless of whether it is in process or distributed, that will be saved
>>>>> in the sourceFileName field on the module, which is what is being accessed
>>>>> here. So it will always be the actual source .cc file name - the same as
>>>>> the module id in the .cc->.o compile step, which is why this change is a
>>>>> no-op for existing PGO which is done during the compile step.
>>>>>
>>>>> We are talking different things: Teresa is talking the callers to this
>>>> function for indirectly promotion or importing. In these callers, the
>>>> parameter of InLTO is set. So we will use the metadata.
>>>> What I was referring to is in the context sensitive patch, where the
>>>> caller to this function does not set InLTO flag.
>>>> The reason for this is many functions are internaliazed after
>>>> PGOInstrumentaiton pass (where the metadata is set).
>>>> When we call from there, we will use the IR module name (.o) as the
>>>> source name. If you check the profile for context sensitive profile, there
>>>> are many entries like bar.o:foo()
>>>>
>>>
>>> I'm confused, since with !InLTO it should go into the code block
>>> modified with this patch, which means that it should use
>>> F.getParent()->getSourceFileName() instead of F.getParent()->getName(). The
>>> SourceFileName should be the original source (.cc) name not the .o file
>>> name. The SourceFileName should be set on the Module when reading the
>>> bitcode in the backend, from the source file name recorded when we compiled
>>> the source to bitcode. For that reason, before this patch I would have
>>> expected to see names like "bar.o:foo" but with this patch I would expect
>>> that to instead be "bar.cc:foo". How are we getting "bar.o" in the
>>> SourceFileName field on the Module? Sorry if I am misunderstanding...
>>>
>>> I run the test again. Yes. You are right. In thinlto, regardless
>> distributed mode or not, we should see bar.cc:foo.
>> I was under the impression F.getParent()->getSourceFileName() would
>> return the IR file name (.o). But that was wrong.
>> Obviously, thinlto importer will set the source file name from the IR
>> file.
>> It is good thing that distributed mode and ld-plugin have the same
>> bahaviors.
>>
>
> Ok great. Right, they should be consistent.
>
> I'd suggest changing the name of the IsLTO flag as it is now somewhat
> confusing (since we will do CS PGO during the LTO step too).  It's really
> specific to ICP done during the LTO step for PGO data matched earlier.
> Maybe change it to "InLTOICP"?
>
Yes. It makes sense to change the flag name. I'll draft a CL for that.


>
>> For fulllto, F.getParent()->getSourceFileName() return ld-temp.o. Not
>> sure if this is a bug.
>>
>
> That's because for full LTO there isn't a single source module - this is
> essentially a new module created from all of the regular LTO modules
> combined. Presumably this name should be fine as it will be consistent
> between gen/use compiles.
>
> This ld-temp.o name is fine with me, as far as they are consistent across
the compilations.

Thanks!


> Teresa
>
>
>>
>>
>>> Teresa
>>>
>>>
>>>>
>>>>
>>>>> Teresa
>>>>>
>>>>>
>>>>>>>
>>>>>>>> But if the filename being used is the name of the intermediate .o
>>>>>>>> file rather than the original source file, well that filename has to be
>>>>>>>> unique for the linker to be able to read both .o files (even if they came
>>>>>>>> from the same source file) & so it should provide a good uniqueness.
>>>>>>>>
>>>>>>>> This only affects the compilation through linker plugin. Without
>>>>>>> this patch, if we call that function after LTO linking, all the static
>>>>>>> function will have a mismatch. As the profile-use and profile-gen have
>>>>>>> different module name (with a unique temporary string in the file name).
>>>>>>> This is much worse than the collision.
>>>>>>>
>>>>>>
>>>>>> Ah, OK - so what's the source name if the input to the linker is an
>>>>>> IR file (with .o suffix) as would usually be the case (but with linker
>>>>>> plugin rather than distributed thinlto) - is it still the .o file name
>>>>>> passed to the linekr? or is it now the user source code file name? If it's
>>>>>> still the .o file name, that'd still avoid a collision, I think?
>>>>>>
>>>>>> - Dave
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> - Dave
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Teresa
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > - Dave
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 8, 2019 at 2:43 PM Rong Xu via llvm-commits <
>>>>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Author: xur
>>>>>>>>>>> Date: Tue Jan  8 14:39:47 2019
>>>>>>>>>>> New Revision: 350671
>>>>>>>>>>>
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=350671&view=rev
>>>>>>>>>>> Log:
>>>>>>>>>>> [PGO] Use SourceFileName rather module name in PGOFuncName
>>>>>>>>>>>
>>>>>>>>>>> In LTO or Thin-lto mode (though linker plugin), the module
>>>>>>>>>>> names are of temp file names which are different for
>>>>>>>>>>> different compilations. Using SourceFileName avoids the issue.
>>>>>>>>>>> This should not change any functionality for current PGO as
>>>>>>>>>>> all the current callers of getPGOFuncName() is before LTO.
>>>>>>>>>>>
>>>>>>>>>>> Differential Revision: https://reviews.llvm.org/D56327
>>>>>>>>>>>
>>>>>>>>>>> Modified:
>>>>>>>>>>>     llvm/trunk/lib/ProfileData/InstrProf.cpp
>>>>>>>>>>>
>>>>>>>>>>> Modified: llvm/trunk/lib/ProfileData/InstrProf.cpp
>>>>>>>>>>> URL:
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/ProfileData/InstrProf.cpp?rev=350671&r1=350670&r2=350671&view=diff
>>>>>>>>>>>
>>>>>>>>>>> ==============================================================================
>>>>>>>>>>> --- llvm/trunk/lib/ProfileData/InstrProf.cpp (original)
>>>>>>>>>>> +++ llvm/trunk/lib/ProfileData/InstrProf.cpp Tue Jan  8 14:39:47
>>>>>>>>>>> 2019
>>>>>>>>>>> @@ -252,11 +252,12 @@ static StringRef stripDirPrefix(StringRe
>>>>>>>>>>>  // data, its original linkage must be non-internal.
>>>>>>>>>>>  std::string getPGOFuncName(const Function &F, bool InLTO,
>>>>>>>>>>> uint64_t Version) {
>>>>>>>>>>>    if (!InLTO) {
>>>>>>>>>>> -    StringRef FileName = (StaticFuncFullModulePrefix
>>>>>>>>>>> -                              ? F.getParent()->getName()
>>>>>>>>>>> -                              :
>>>>>>>>>>> sys::path::filename(F.getParent()->getName()));
>>>>>>>>>>> -    if (StaticFuncFullModulePrefix &&
>>>>>>>>>>> StaticFuncStripDirNamePrefix != 0)
>>>>>>>>>>> -      FileName = stripDirPrefix(FileName,
>>>>>>>>>>> StaticFuncStripDirNamePrefix);
>>>>>>>>>>> +    StringRef FileName(F.getParent()->getSourceFileName());
>>>>>>>>>>> +    uint32_t StripLevel = StaticFuncFullModulePrefix ? 0 :
>>>>>>>>>>> (uint32_t)-1;
>>>>>>>>>>> +    if (StripLevel < StaticFuncStripDirNamePrefix)
>>>>>>>>>>> +      StripLevel = StaticFuncStripDirNamePrefix;
>>>>>>>>>>> +    if (StripLevel)
>>>>>>>>>>> +      FileName = stripDirPrefix(FileName, StripLevel);
>>>>>>>>>>>      return getPGOFuncName(F.getName(), F.getLinkage(),
>>>>>>>>>>> FileName, Version);
>>>>>>>>>>>    }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>>
>>>>
>>>
>>> --
>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>
>>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190123/148af866/attachment.html>


More information about the llvm-commits mailing list