[LLVMdev] How to tell whether a GlobalValue is user-defined

Nick Kledzik kledzik at apple.com
Mon Aug 25 09:54:04 PDT 2014


On Aug 25, 2014, at 8:26 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> On 21 August 2014 19:32, Akira Hatanaka <ahatanak at gmail.com> wrote:
>> Is there a way to distinguish between GlobalValues that are user-defined and
>> those that are compiler-defined? I am looking for a function that I can use
>> to tell if a GlobalValue is user-defined, something like
>> "GlobalValue::isUserDefined", which returns true for user-defined
>> GlobalValue.
>> 
>> I'm trying to make changes to prevent llvm from placing user-defined
>> constant arrays in the mergeable constant sections. Currently, clang places
>> 16-byte constant arrays that are marked "unnamed_addr" into __literal16 for
>> Mach-O (see the following example).
>> 
>> $ cat test1.c
>> static const int s_dashArraysSize1[4] = {2, 2, 4, 6};
>> 
>> int foo1(int a) {
>>   return s_dashArraysSize1[a];
>> }
>> 
>> $ clang test1.c -S -O3 -o - | tail -n 10
>> .section __TEXT,__literal16,16byte_literals
>> .align 4                       ## @s_dashArraysSize1
>> _s_dashArraysSize1:
>> .long 2                       ## 0x2
>> .long 2                       ## 0x2
>> .long 4                       ## 0x4
>> .long 6                       ## 0x6
>> 
>> This is not desirable because the Mach-O linker wasn't originally designed to
>> handle user-defined symbols in those sections and having to handle them
>> complicates the linker. Also, there is no benefit in doing so, since the
>> linker currently doesn't try to merge user-defined variables anyway.
> 
> What does "user-defined" mean here? Since the linker is
> involved, I assume it has something to do with the final symbol name.
> 
> At the linker level (symbol names, sections, atoms, relocations, etc),
> what exactly is not supported?


The literalN sections were developed long ago to support coalescing of unnamed constants like 9.897 in source code for architectures that could not embed large constants in instructions.  The linker knew how to break up the section (e.g. __literal8 is always 8-byte chunks) and coalesce copies by content.

~6 years ago we discovered that gcc would sometimes put user-named constants into the literal sections (e.g. const double foo = 9.897).  This was an issue because C language rules say &a != &b, but if ‘a’ and ‘b’ contain the same literal value in different translation units, the linker could merge them to the same address.  For whatever reason, we could not fix gcc, so we changed the linker to never coalesce an item in a literal section if it has a symbol on it (other than one starting with ‘L’ or ‘l’).
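
To make the rule above concrete, here is a rough sketch in C of content-based coalescing over a __literal8-style section. It is only a model, not the linker's actual code: the Literal8 struct, its fields, and both function names are made up for illustration. It also assumes that a named entry is never used as a merge target, which the description above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one fixed-size entry in a __literal8 section. */
typedef struct {
    const char *symbol;      /* label on the entry, or NULL if there is none */
    unsigned char bytes[8];  /* the literal's content */
    size_t merged_into;      /* index of the entry this one resolves to */
} Literal8;

/* A label counts as "named" unless it is absent or starts with 'L' or 'l'. */
static bool has_named_symbol(const char *symbol) {
    return symbol != NULL && symbol[0] != 'L' && symbol[0] != 'l';
}

/*
 * Coalesce entries by content, leaving every named entry alone so that
 * distinct named constants keep distinct addresses.
 * Returns the number of entries that survive.
 */
static size_t coalesce_literal8(Literal8 *entries, size_t count) {
    assert(entries != NULL || count == 0);
    size_t distinct = 0;
    for (size_t i = 0; i < count; ++i) {
        entries[i].merged_into = i;
        if (!has_named_symbol(entries[i].symbol)) {
            /* Look for an earlier surviving, unnamed entry with equal bytes. */
            for (size_t j = 0; j < i; ++j) {
                if (entries[j].merged_into == j &&
                    !has_named_symbol(entries[j].symbol) &&
                    memcmp(entries[j].bytes, entries[i].bytes, 8) == 0) {
                    entries[i].merged_into = j;
                    break;
                }
            }
        }
        if (entries[i].merged_into == i) {
            ++distinct;
        }
    }
    return distinct;
}
```

With four entries holding identical bytes, one unlabeled, one labeled "L_tmp", and two labeled "_a" and "_b", only the "L_tmp" entry folds into the unlabeled one; "_a" and "_b" stay separate, so &a != &b still holds.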

The current state of LLVM is that it is going out of its way to move “named” constants from the __const section to the __literalN sections. But the only possible advantage of doing that is the hope that the linker might coalesce them.  But the linker won’t coalesce them because they are named.  So, is there a way to keep the named values in the __const section?
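
As a sketch of the behavior being asked for (not LLVM's actual section-selection code), the decision could look roughly like this in C. ConstantInfo, its fields, and pick_macho_section are hypothetical names invented for this example; the "mergeable" flag stands in for whatever the compiler uses to mark a constant whose address is not significant, such as unnamed_addr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a constant; not an LLVM data structure. */
typedef struct {
    const char *symbol;  /* assembler label, e.g. "_s_dashArraysSize1" */
    size_t size;         /* size in bytes */
    bool mergeable;      /* address is not significant (e.g. unnamed_addr) */
} ConstantInfo;

/* A label is visible to the linker unless it is absent or starts with 'L' or 'l'. */
static bool is_linker_visible_name(const char *symbol) {
    return symbol != NULL && symbol[0] != 'L' && symbol[0] != 'l';
}

/*
 * Pick a Mach-O section for a read-only constant. Named constants stay in
 * __const because the linker will not coalesce them anyway; only unnamed,
 * mergeable constants of an exact literal size go to __literalN.
 */
static const char *pick_macho_section(const ConstantInfo *c) {
    assert(c != NULL);
    if (!c->mergeable || is_linker_visible_name(c->symbol)) {
        return "__TEXT,__const";
    }
    switch (c->size) {
    case 4:  return "__TEXT,__literal4";
    case 8:  return "__TEXT,__literal8";
    case 16: return "__TEXT,__literal16";
    default: return "__TEXT,__const";
    }
}
```

Under this sketch, the 16-byte array _s_dashArraysSize1 from the example earlier in the thread would land in __TEXT,__const, while a 16-byte constant with a compiler-generated 'L' label would still go to __TEXT,__literal16.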

-Nick




