[PATCH] D11908: Clang support for -fthinlto.

Wed Sep 30 11:11:13 PDT 2015

> On 2015-Sep-30, at 10:40, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Wed, Sep 30, 2015 at 10:19 AM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>>> On 2015-Sep-23, at 10:28, Teresa Johnson <tejohnson at google.com> wrote:
>>> 
>>> tejohnson updated this revision to Diff 35527.
>>> tejohnson added a comment.
>>> 
>>> Updated the patch for the new LLVM/gold patch (http://reviews.llvm.org/D13107).
>>> 
>>> Two changes:
>>> 
>>> - I put back the original change to pass a new plugin option to gold. Since the function index design/name has been generalized to not be ThinLTO-specific, it makes sense to have the ThinLTO-related gold behavior for modules with those sections to be dependent on an option.
>>> - Added 2 tests.
>>> 
>>> 
>>> http://reviews.llvm.org/D11908
>>> 
>>> Files:
>>> include/clang/Driver/Options.td
>>> include/clang/Frontend/CodeGenOptions.def
>>> lib/CodeGen/BackendUtil.cpp
>>> lib/Driver/Driver.cpp
>>> lib/Driver/Tools.cpp
>>> lib/Frontend/CompilerInvocation.cpp
>>> test/Driver/thinlto.c
>>> test/Misc/thinlto.c
>>> 
>>> <D11908.35527.patch>
>> 
>>> Index: test/Misc/thinlto.c
>>> ===================================================================
>>> --- /dev/null
>>> +++ test/Misc/thinlto.c
>>> @@ -0,0 +1,22 @@
>>> +// RUN: %clang_cc1 -fthinlto -emit-llvm-bc < %s | llvm-bcanalyzer -dump | FileCheck %s
>>> +// CHECK: <FUNCTION_SUMMARY_BLOCK
>>> +// CHECK-NEXT: <PERMODULE_ENTRY
>>> +// CHECK-NEXT: <PERMODULE_ENTRY
>>> +// CHECK-NEXT: </FUNCTION_SUMMARY_BLOCK
>>> +
>>> +// RUN: %clang -fthinlto %s -o %t -fuse-ld=gold
>> 
>> Why isn't this using -cc1?
> 
> I can try it again, but I believe -fuse-ld is not a -cc1 option.

Oh, right, of course.  -cc1 doesn't invoke the linker, the driver does.

In that case, it seems to me like you should:
1. have a driver test that uses -### to check that you're getting the
expected linker command-line (in test/Driver), and
2. somewhere (probably in test/tools/gold in LLVM?) confirm that gold
does the right thing with the thinlto plugin.

>>> +// RUN: llvm-bcanalyzer -dump %t.thinlto.bc | FileCheck %s --check-prefix=COMBINED
>>> +// COMBINED: <MODULE_STRTAB_BLOCK
>>> +// COMBINED-NEXT: <ENTRY
>>> +// COMBINED-NEXT: </MODULE_STRTAB_BLOCK
>>> +// COMBINED-NEXT: <FUNCTION_SUMMARY_BLOCK
>>> +// COMBINED-NEXT: <COMBINED_ENTRY
>>> +// COMBINED-NEXT: <COMBINED_ENTRY
>>> +// COMBINED-NEXT: </FUNCTION_SUMMARY_BLOCK
>>> +
>>> +__attribute__((noinline)) void foo() {
>>> +}
>>> +
>>> +int main() {
>>> +  foo();
>>> +}
>>> Index: test/Driver/thinlto.c
>>> ===================================================================
>>> --- /dev/null
>>> +++ test/Driver/thinlto.c
>>> @@ -0,0 +1,11 @@
>>> +// -fthinlto causes a switch to llvm-bc object files.
>>> +// RUN: %clang -ccc-print-phases -c %s -fthinlto 2> %t.log
>>> +// RUN: grep '2: compiler, {1}, ir' %t.log
>>> +// RUN: grep '3: backend, {2}, lto-bc' %t.log
>>> +
>>> +// RUN: %clang -ccc-print-phases %s -fthinlto 2> %t.log
>>> +// RUN: grep '0: input, ".*thinlto.c", c' %t.log
>>> +// RUN: grep '1: preprocessor, {0}, cpp-output' %t.log
>>> +// RUN: grep '2: compiler, {1}, ir' %t.log
>>> +// RUN: grep '3: backend, {2}, lto-bc' %t.log
>>> +// RUN: grep '4: linker, {3}, image' %t.log
>> 
>> Usually we prefer FileCheck tests over grep (typically the grep tests
>> are just old and haven't been converted yet).  Is there a particular
>> reason you're using grep, or were you just copying an example?
> 
> This was cloned from test/Driver/lto.c. Let me see if I can rework
> this with FileCheck.
> 
>> 
>>> Index: lib/Frontend/CompilerInvocation.cpp
>>> ===================================================================
>>> --- lib/Frontend/CompilerInvocation.cpp
>>> +++ lib/Frontend/CompilerInvocation.cpp
>>> @@ -518,6 +518,7 @@
>>>   Opts.MergeFunctions = Args.hasArg(OPT_fmerge_functions);
>>> 
>>>   Opts.PrepareForLTO = Args.hasArg(OPT_flto);
>>> +  Opts.EmitThinLTOIndex = Args.hasArg(OPT_fthinlto);
>>> 
>>>   Opts.MSVolatile = Args.hasArg(OPT_fms_volatile);
>>> 
>>> Index: lib/Driver/Tools.cpp
>>> ===================================================================
>>> --- lib/Driver/Tools.cpp
>>> +++ lib/Driver/Tools.cpp
>>> @@ -1664,6 +1664,9 @@
>>>   std::string CPU = getCPUName(Args, ToolChain.getTriple());
>>>   if (!CPU.empty())
>>>     CmdArgs.push_back(Args.MakeArgString(Twine("-plugin-opt=mcpu=") + CPU));
>>> +
>>> +  if (Args.hasArg(options::OPT_fthinlto))
>>> +    CmdArgs.push_back("-plugin-opt=thinlto");
>>> }
>>> 
>>> /// This is a helper function for validating the optional refinement step
>>> @@ -2818,6 +2821,8 @@
>>>   // preprocessing, precompiling or assembling.
>>>   Args.ClaimAllArgs(options::OPT_flto);
>>>   Args.ClaimAllArgs(options::OPT_fno_lto);
>>> +  Args.ClaimAllArgs(options::OPT_fthinlto);
>>> +  Args.ClaimAllArgs(options::OPT_fno_thinlto);
>>> }
>>> 
>>> static void appendUserToPath(SmallVectorImpl<char> &Result) {
>>> @@ -3236,6 +3241,8 @@
>>>            "Invalid action for clang tool.");
>>> 
>>>     if (JA.getType() == types::TY_LTO_IR || JA.getType() == types::TY_LTO_BC) {
>>> +      // Enables PrepareForLTO, which we want for -fthinlto as well.
>>> +      // ThinLTO also uses the TY_LTO_* types.
>>>       CmdArgs.push_back("-flto");
>>>     }
>>>     if (JA.getType() == types::TY_Nothing) {
>>> @@ -3268,6 +3275,10 @@
>>>     // the use-list order, since serialization to bitcode is part of the flow.
>>>     if (JA.getType() == types::TY_LLVM_BC)
>>>       CmdArgs.push_back("-emit-llvm-uselists");
>>> +
>>> +    if (Args.hasArg(options::OPT_fthinlto)) {
>>> +      CmdArgs.push_back("-fthinlto");
>>> +    }
>>>   }
>>> 
>>>   // We normally speed up the clang process a bit by skipping destructors at
>>> Index: lib/Driver/Driver.cpp
>>> ===================================================================
>>> --- lib/Driver/Driver.cpp
>>> +++ lib/Driver/Driver.cpp
>>> @@ -1576,7 +1576,8 @@
>>> }
>>> 
>>> bool Driver::IsUsingLTO(const ArgList &Args) const {
>>> -  return Args.hasFlag(options::OPT_flto, options::OPT_fno_lto, false);
>>> +  return Args.hasFlag(options::OPT_flto, options::OPT_fno_lto, false) ||
>>> +      Args.hasFlag(options::OPT_fthinlto, options::OPT_fno_thinlto, false);
>>> }
>>> 
>>> void Driver::BuildJobs(Compilation &C) const {
>>> Index: lib/CodeGen/BackendUtil.cpp
>>> ===================================================================
>>> --- lib/CodeGen/BackendUtil.cpp
>>> +++ lib/CodeGen/BackendUtil.cpp
>>> @@ -268,6 +268,7 @@
>>>     return;
>>> 
>>>   unsigned OptLevel = CodeGenOpts.OptimizationLevel;
>>> +
>> 
>> This whitespace change seems unrelated.
> 
> Will remove.
> 
>> 
>>>   CodeGenOptions::InliningMethod Inlining = CodeGenOpts.getInlining();
>>> 
>>>   // Handle disabling of LLVM optimization, where we want to preserve the
>>> @@ -610,7 +611,8 @@
>>> 
>>>   case Backend_EmitBC:
>>>     getPerModulePasses()->add(
>>> -        createBitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists));
>>> +        createBitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists,
>>> +                                CodeGenOpts.EmitThinLTOIndex));
>>>     break;
>>> 
>>>   case Backend_EmitLL:
>>> Index: include/clang/Frontend/CodeGenOptions.def
>>> ===================================================================
>>> --- include/clang/Frontend/CodeGenOptions.def
>>> +++ include/clang/Frontend/CodeGenOptions.def
>>> @@ -72,7 +72,9 @@
>>> CODEGENOPT(InstrumentForProfiling , 1, 0) ///< Set when -pg is enabled.
>>> CODEGENOPT(LessPreciseFPMAD  , 1, 0) ///< Enable less precise MAD instructions to
>>>                                      ///< be generated.
>>> -CODEGENOPT(PrepareForLTO     , 1, 0) ///< Set when -flto is enabled on the
>>> +CODEGENOPT(PrepareForLTO     , 1, 0) ///< Set when -flto or -fthinlto is enabled
>>> +                                     ///< on the compile step.
>>> +CODEGENOPT(EmitThinLTOIndex  , 1, 0) ///< Set when -fthinlto is enabled on the
>>>                                      ///< compile step.
>>> CODEGENOPT(MergeAllConstants , 1, 1) ///< Merge identical constants.
>>> CODEGENOPT(MergeFunctions    , 1, 0) ///< Set when -fmerge-functions is enabled.
>>> Index: include/clang/Driver/Options.td
>>> ===================================================================
>>> --- include/clang/Driver/Options.td
>>> +++ include/clang/Driver/Options.td
>>> @@ -686,6 +686,9 @@
>>> def flto_EQ : Joined<["-"], "flto=">, Group<clang_ignored_gcc_optimization_f_Group>;
>>> def flto : Flag<["-"], "flto">, Flags<[CC1Option]>, Group<f_Group>;
>>> def fno_lto : Flag<["-"], "fno-lto">, Group<f_Group>;
>>> +def fthinlto : Flag<["-"], "fthinlto">, Flags<[CC1Option]>, Group<f_Group>;
>> 
>> I wonder whether this is the right way to add new variants of LTO.
>> I.e., maybe this should be "-flto=thin"?  Do you happen to know how this
>> would conflict with GCC options?  (I see we ignore "-flto="...)
> 
> I looked at GCC docs and sources. It does have a -flto= variant. From
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html:
> -----------------
> If you specify the optional n, the optimization and code generation
> done at link time is executed in parallel using n parallel jobs by
> utilizing an installed make program. The environment variable MAKE may
> be used to override the program used. The default value for n is 1.
> 
> You can also specify -flto=jobserver to use GNU make's job server mode
> to determine the number of parallel jobs. This is useful when the
> Makefile calling GCC is already executing in parallel. You must
> prepend a ‘+’ to the command recipe in the parent Makefile for this to
> work. This option likely only works if MAKE is GNU make.
> ----------------
> 
> I would anticipate that we may want to support something like this for
> ThinLTO eventually to specify the number of parallel jobs for the
> ThinLTO backend processes. So it might be confusing to overload
> -flto=.

Hmm.  You're right that the GCC usage seems pretty different.

I guess you're envisioning -fthinlto=jobserver?  I wonder if we could
do this as -flto=thin,jobserver or something similar?

It's pretty hard to remove driver options, so I think it's important to
get the interface right.  I anticipate that we'll add more LTO variants
in the future, so we should take care that this scales reasonably.

(This may be a good discussion for cfe-dev, not sure.)

>> E.g., as a user I'd expect -fno-lto to turn off whatever LTO was turned
>> on.  And I might expect -flto to override -fthinlto, or vice versa.
> 
> Good point, I think the last one should win. Looks like right now
> -fthinlto is effectively always winning. I will fix that.
> 
> Not sure about -fno-lto behavior if we leave -fthinlto as a separate
> option. Let me know what you think based on the GCC usage of -flto=.

Right, if they're totally separate, then either behaviour for -fno-lto
is confusing.

>>> +def fthinlto_be : Flag<["-"], "fthinlto-be">, Flags<[CC1Option]>, Group<f_Group>;
>> 
>> What's -fthinlto-be for?  It seems to be unused/untested.
> 
> That snuck in from some follow-on patches I am working on (invoking
> the LTO pipeline + TBD importing pass in the BE processes). Will
> remove from here.
> 
>> 
>>> +def fno_thinlto : Flag<["-"], "fno-thinlto">, Group<f_Group>;
>>> def fmacro_backtrace_limit_EQ : Joined<["-"], "fmacro-backtrace-limit=">,
>>>                                 Group<f_Group>, Flags<[DriverOption, CoreOption]>;
>>> def fmerge_all_constants : Flag<["-"], "fmerge-all-constants">, Group<f_Group>;
>>> 
>> 
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413