[PATCH] Global merge on constants

Duncan Sands baldrick at free.fr
Tue Mar 12 01:07:45 PDT 2013


Hi Quentin,

> Sorry for the late reply, I was checking the information Anton’s pointed out and
> now, I need your opinions on what I saw.
>
> So, according to the documentation, TypeInfos and more specifically the
> arguments of the clause part in landing pad instructions are supposed to be
> GlobalVariable.

I'm pretty sure it needs to be a GlobalVariable or a bitcast of one.  In short
something which turns into a simple label in the assembler.

>
> Now, if you look at the code in the backend, the Verifier pass checks that the
> argument of the catch clause is Constant (which is a superclass of
> GlobalVariable and GEP) and that the type is a pointer type
> (Verifier::visitLandingPadInst).

At the IR level it doesn't matter much what a typeinfo is.  The restrictions are
coming from codegen.

>
> Note: The type doesn’t change when the variable is merged (otherwise there is
> something wrong!).
>
> Similarly, LLParser (LLParser::ParseLandingPad) and BitcodeReader
> (BitcodeReader::ParseFunctionBody) do not rely on GlobalVariable, they just
> check that the type is not an ArrayType.
>
> Then, for the type info properly speaking, MachineModuleInfo uses GlobalVariable
> to map the type to an id (MachineModuleInfo::getTypeIDFor related) and
> AsmPrinter uses their name for printing.
>
> As far as I can tell, it feels like it is no more a problem to merge the global
> constants (i.e., potentially changing a GlobalVariable into a GEP), even if they
> are related to EH. In particular, the type info is currently built using
> llvm::ExtractTypeInfo, which already takes a Value and do a stripPointerCast
> that gets a global variable even from the GEP and GEP is already a subclass of
> Constant like GlobalVariable.
>
> Therefore, do you think it is still a problem to merge global constant?

Yes, as I mentioned above I think the typeinfo needs to be a GlobalVariable
(or a bitcast of one) not an offset into one (GEP).  See below for some thoughts
on how to relax this.

>
> One last thing, I don’t know if .ll file can hit that, since I don’t know if
> clang is doing much work on then, but clang also expects GlobalVariable for
> landing pad instruction in one very specific case for ObjC++ exception
> (PersonalityHasOnlyCXXUses). From my tests, ll files don’t get into that path
> but maybe I’m missing something.
> What do you think?
>
> Thanks for your help,
>
> -Quentin
> PS: dragon egg seems fine with Constant for clauses in landing pad instruction too!

Front-ends know what they produce, they aren't really relevant here.  Dragonegg
produces (bitcasts of) GlobalVariables.  Using Constant is just a convenient way
of handling both GlobalVariables and bitcasts of global variables in a uniform
way.

Note that it isn't only exception handling typeinfos that is an issue here:
people using a GlobalVariable G in inline asm may well refer to it by its
name "G" in the inline asm.  Merging globals could break this.

Here is a possible solution to all these issues:

(1) Enhance aliases so that they can be offsets into a global, not just an alias
to a global (there is a PR requesting this somewhere);
(2) When merging a global G into a mega-global, turn "G" into an alias pointing
to the position in the mega global.

Eg:

Before:

   @F = global...
   @G = global...
   @H = global...

After:

   @Merged = global ... contains F, G then H
   @F = alias bitcast @Merged to ...
   @G = alias getelementptr @Merged, ...
   @H = alias getelementptr @Merged, ...

This should then result in F, G and H turning up as labels in the assembler,
usable as typeinfos.  It would likewise solve the inline asm issue.

The problems are: I'm not sure aliases are supported on all platforms, and
I'm even less sure which platforms support this kind of alias.

Ciao, Duncan.


>
> On Mar 1, 2013, at 10:53 AM, Anton Korobeynikov <anton at korobeynikov.info
> <mailto:anton at korobeynikov.info>> wrote:
>
>> Hi Quentin,
>>
>> Indeed, GlobalMerge did not merge global constants because EH
>> processing code expected typeinfo and other stuff to be globals, not
>> gep's.
>>
>> The situation changed with addition of landingpad instruction. You
>> might want to check how it handle GEPs for typeinfos (though, I'd say
>> that LangRef states that typeinfo is a global).
>>
>>
>> On Fri, Mar 1, 2013 at 10:05 PM, Quentin Colombet <qcolombet at apple.com
>> <mailto:qcolombet at apple.com>> wrote:
>>> Hi Duncan,
>>>
>>> I've checked in LLVM documentation and the unnamed_addr marker is not
>>> specific to global variables that are constant. Hence, based on your
>>> comment, I was thinking that the global merge pass may perform illegal
>>> transformations.
>>> Actually, I thought about it and from my understanding, merging two globals
>>> like global merge pass does is not incompatible with having two distinct
>>> addresses.
>>> Indeed, global merge keeps all globals but wraps them into a structure.
>>> E.g.,
>>> int a;
>>> int b;
>>> =>
>>> struct {
>>> int a;
>>> int b;
>>> } MergedGV;
>>>
>>> Thus, &MergedGV.a != &MergedGV.b
>>>
>>> This brings me back to my initial question, does gathering constants into
>>> structures still cause a problem with EH since it does not seem incompatible
>>> with unnamed_addr?
>>>
>>> Note that I've run the test case that was triggering the failure, and it
>>> works with my patch enabled. But since I don't know how EH are handled I
>>> prefer to have an external opinion.
>>>
>>> Reference of the test case triggering the failure:
>>> http://llvm.org/bugs/show_bug.cgi?id=7716
>>>
>>> Thanks,
>>>
>>> -Quentin
>>>
>>> On Feb 28, 2013, at 10:08 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>
>>> Ping?
>>>
>>> -Quentin
>>>
>>> On Feb 27, 2013, at 9:55 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>
>>> Hi Duncan,
>>>
>>> Does it mean that the current behavior of global merge is wrong or is this
>>> unnamed_addr marker apply only to global constants?
>>>
>>> Anyway, thanks for the information.
>>>
>>> -Quentin
>>>
>>> On Feb 27, 2013, at 12:50 AM, Duncan Sands <baldrick at free.fr> wrote:
>>>
>>> Hi Quentin, you should only be merging them if they are marked with
>>> unnamed_addr (this indicates that no-one is expecting the globals
>>> to have distinct addressses).  As EH type infos shouldn't get
>>> unnamed_addr, that should mean that the EH issue isn't a problem any
>>> more.
>>>
>>> Ciao, Duncan.
>>>
>>> You will find enclosed a patch that adds the support of constant variables
>>> for
>>> the generic global merge pass.
>>> I've also attached a motivating example, test.c.
>>>
>>> In that file there are several global variables accessed within the same
>>> function.
>>> If all the variables are non-const, global merge generates one unique
>>> variable,
>>> which translates into one unique load in the assembly file.
>>>
>>> clang -arch arm -O3 test.c -o - -S
>>>
>>> If some are const (the example is very stupid), global merge does not merge
>>> anything and one load for each variable is issued, i.e., 3
>>> clang -arch arm -O3 test.c -o - -S -DWITHCONST
>>>
>>> If global merge had merged the constant globals, only 2 loads would have
>>> been
>>> issued. The patch allows to do that.
>>>
>>> Although the patch is fairly simple, it was not done before because it seems
>>> it
>>> was breaking the EH processing (see comment in the original code).
>>> Is it still true?
>>>
>>> If yes, how could I reproduce that?
>>>
>>> Note: that the patch disables by default the handling of global constants
>>> because the current heuristic may generate worse code. A tuning may be
>>> necessary, but it would be done as a second step
>>>
>>> -Quentin
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>>
>>
>> --
>> With best regards, Anton Korobeynikov
>> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>




More information about the llvm-commits mailing list