[PATCH] gold, libLTO: Add new flags to support bit set lowering.

Wed Mar 18 14:26:54 PDT 2015

> On 2015-Mar-18, at 13:17, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> On Wed, Mar 18, 2015 at 12:46:48PM -0700, Duncan P. N. Exon Smith wrote:
>> 
>>> On 2015-Mar-18, at 12:08, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>> 
>>> On Wed, Mar 18, 2015 at 09:16:54AM -0400, Rafael Espíndola wrote:
>>>>> I don't think this should be a problem; there is no user-visible behaviour
>>>>> change when using the supported way of enabling control flow integrity.
>>>>> 
>>>>> The only supported way to enable CFI is for the user to supply -fsanitize=cfi*
>>>>> at link time as well as at compile time. (Admittedly this should be
>>>>> documented better.) The LowerBitSets pass only has an effect on the module
>>>>> if -fsanitize=cfi* was passed at compile time. And the related change
>>>>> http://reviews.llvm.org/D8402 modifies the Clang driver to supply the
>>>>> lowerbitsets flag if -fsanitize=cfi* is supplied at link time.
>>>> 
>>>> If the pass does nothing if a given flag is not present, why not
>>>> always run the pass and avoid the option?
>>> 
>>> I would like a way to run only a couple of passes at link time if CFI is
>>> enabled and optimizations are enabled but LTO is not specifically enabled. In
>>> particular, I've found that running only simplifycfg and globaldce makes a
>>> significant difference in binary size (with the full LTO pipeline, a Chrome
>>> binary is 179424536 bytes, and with simplifycfg+globaldce it is 160340400
>>> bytes). So we need some way to communicate an optimization level to the
>>> linker so that it knows what level to use.
>>> 
>>> Duncan and I discussed this on IRC last night, and we agreed that module
>>> flags inserted at compile time would be the best way to communicate this
>>> information. I thought further about how this could work and I came up with
>>> something relatively simple.
>>> 
>>> The solution I have in mind is that a module flag named "LTO Opt Level"
>>> will control the opt level. An opt level of 0 runs no optimization passes, a
>>> level of 1 runs only simplifycfg and globaldce, and a level of 2 runs the
>>> entire LTO pass pipeline. -flto causes us to set the flag to 2; otherwise
>>> -O >= 1 causes us to set it to 1; otherwise it is 0. When modules have
>>> conflicting opt levels, we pick the maximum.
>>> 
>>> I have patches that implement this and I'll upload them later today.
>> 
>> I'm still not sure about the high-level approach.  I didn't put together
>> that you'd need to send an "LTO Opt Level" through module flags (I just
>> understood the -lowerbitsets part).
>> 
>> I think there may be a minefield of semantic issues with using the LTO
>> pipeline when the user hasn't specifically enabled LTO.  Can you point
>> me at the discussion on the list about this so I can catch up?  (I'm
>> sorry I missed it.)
> 
> What discussion there has been was in the proposal thread:
> https://groups.google.com/d/msg/llvm-dev/YzJSG2VGydI/76zJXpv4OuQJ

Thanks for this.  Sorry for chasing you in circles.

> 
>> For example, what semantics do you expect in the following case?
>> 
>>    $ clang -O3 -flto a.c
>>    $ clang -O1 -fsanitize=cfi b.c
>>    $ clang a.o b.o
>> 
>> Here, a.c is supposed to be compiled with LTO but b.c isn't.  I'm not
>> sure how you would merge the LTO Opt Level in this case (among other
>> problems).
> 
> In this case we compile both with LTO and use the maximum opt level, which
> is 2 (2 from a.o and 1 from b.o). This is an approximation of what ought to
> happen, but I reckon it isn't too bad to handle these types of cases poorly,
> since in most cases one would be controlling the build flags for an entire
> project compiled with CFI, so they would normally be consistent.
> 
>> IMO, the following flow makes more sense (but maybe this was already
>> discussed?):
>> 
>>  - If -fno-lto, run -lowerbitsets near the end of the -cc1 opt
>>    pipeline.  *Do not* enable -flto.
> 
> This won't work (unless perhaps the whole program was in that translation
> unit). The lowerbitsets pass needs whole-program visibility.

Yes, I understand now, after reading the original thread.

IMO, we should continue to always run the -lowerbitsets pass.  It's free
if there's no work to do.  There's no reason to encode this in the
module or pass any info from `clang` related to this pass.

But I guess your real goal here is to stop running the rest of the
optimizations, and that's the contentious part.  Currently clang doesn't
control the LTO pass pipeline, and how/whether to do that isn't clear.

Starting to do that, whether it's via -disable-opt, "LTO Opt Level", or
some other mechanism, isn't really related to lowering bitsets, and
probably needs its own discussion.
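
To make the proposal under discussion concrete, here is a minimal sketch of
the "LTO Opt Level" scheme as Peter describes it in the quoted text above:
how the flag would be chosen at compile time, and how conflicting values
would be merged at link time. It is a plain Python model, not LLVM or Clang
code; the function names and the PASSES table are hypothetical and exist
only for illustration.

```python
# Hypothetical model of the proposed "LTO Opt Level" module flag.
# None of these names are LLVM or Clang APIs.

def compile_time_level(flto, opt_level):
    """Value the flag would get for one translation unit."""
    if flto:
        return 2          # -flto: run the entire LTO pass pipeline
    if opt_level >= 1:
        return 1          # -O >= 1 without -flto: reduced pipeline
    return 0              # otherwise: no optimization passes


def merged_level(levels):
    """Conflicting opt levels across modules resolve to the maximum."""
    return max(levels, default=0)


# Passes each level would run at link time, per the proposal.
PASSES = {
    0: [],
    1: ["simplifycfg", "globaldce"],
    2: ["<entire LTO pass pipeline>"],
}

# The a.c / b.c example from the thread:
a_o = compile_time_level(flto=True, opt_level=3)    # clang -O3 -flto a.c
b_o = compile_time_level(flto=False, opt_level=1)   # clang -O1 -fsanitize=cfi b.c
link_level = merged_level([a_o, b_o])
```

Under this model a.o carries level 2 and b.o carries level 1, so the link
uses level 2 and b.o goes through the full LTO pipeline even though it was
not compiled with -flto.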