[llvm-dev] [EXTERNAL] Re: RFC - a proposal to support additional symbol metadata in ELF object files in the ARM compiler

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Mon May 6 23:42:55 PDT 2019


I have the same question as James has. It seems to me that you can name any
address using an absolute symbol, and that should suffice to handle
memory-mapped peripherals and such. If you really need to define data
(whether in .data or .bss) or a function at a fixed memory address,
absolute symbols cannot do that (linker scripts can), but is that what
you really want?

*From: *James Y Knight via llvm-dev <llvm-dev at lists.llvm.org>
*Date: *Tue, May 7, 2019 at 2:39 AM
*To: *Snider, Todd
*Cc: *llvm-dev

> I don't think it's a "trick" at all -- it's just a definition of the symbol
> at an absolute address. That's what absolute symbols are for. (You can also
> use ".size sym, 4" ".type sym, object", if you want to let the linker know
> that the symbol refers to 4 bytes of data. I'm not sure if that's part of
> your concern about it not being real?)
>
> For the use-case of accessing memory-mapped peripheral registers, this
> functionality seems already sufficient. Do you disagree?
>
> However, if you have a requirement to place initialized data at a fixed
> address, then this pre-existing functionality does not address that
> requirement. But, I'm not sure what use-cases you're thinking of where this
> is a requirement. Can you talk about what you have in mind?
>
>
> On Mon, May 6, 2019 at 9:10 AM Snider, Todd <t-snider at ti.com> wrote:
>
>>
>>
>> James,
>>
>>
>>
>> What you are doing below is tricking the compiler into believing that it
>> is dealing with a real int object that has actual space allocated to it in
>> x2.o, but sym is not defined as a real data object in x1.o.
>>
>>
>>
>> Thanks, but that doesn’t really address my use case. I still contend that
>> associating a placement address with an actual data object (whether it be
>> initialized or not) or function, where the symbol is defined in a section
>> containing the definition of the data object or function, is a useful
>> feature for customers.
>>
>>
>>
>> ~ Todd
>>
>>
>>
>> *From:* James Y Knight [mailto:jyknight at google.com]
>> *Sent:* Friday, May 3, 2019 4:35 PM
>> *To:* Snider, Todd; Peter Smith; Finkel, Hal J.; llvm-dev
>> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
>> additional symbol metadata in ELF object files in the ARM compiler
>>
>>
>>
>> It should result in an object file with a global absolute symbol. E.g.
>> (here I'm building on x86-64 linux):
>>
>>
>>
>> $ echo '.globl sym; sym = 0x600' | as -o /tmp/x1.o
>>
>> $ nm /tmp/x1.o
>>
>> 0000000000000600 A sym
>>
>> Compiling a binary that uses this, for demonstration:
>>
>> $ printf $'extern int sym; int main() { sym = 5; }' | clang -c -xc - -o /tmp/x2.o
>>
>> $ clang -o /tmp/x /tmp/x1.o /tmp/x2.o
>>
>>
>>
>> And, hey, let's run it and see it crash...
>>
>>
>>
>> $ gdb /tmp/x
>>
>> ...
>>
>> (gdb) run
>>
>> Starting program: /tmp/x
>>
>>
>>
>> Program received signal SIGSEGV, Segmentation fault.
>>
>> 0x0000000000400486 in main ()
>>
>> (gdb) p $_siginfo._sifields._sigfault.si_addr
>>
>> $1 = (void *) 0x600
>>
>> (gdb) x/i $pc
>> => 0x400486 <main+6>:   movl   $0x5,0x600
>>
>> Yep, crashed writing to 0x600, the invalid address we expected.
>>
>>
>>
>> On Fri, May 3, 2019 at 5:06 PM Snider, Todd <t-snider at ti.com> wrote:
>>
>> Hi James,
>>
>>
>>
>> Can you explain further the existing mechanisms in clang for expressing
>> placement instructions for an extern symbol? I tried the “.globl a; a =
>> 0x1000" asm source suggestion and did not see any information in the
>> resulting object file that the linker could interpret as a placement
>> instruction.
>>
>>
>>
>> With regard to your argument about not needing or being able to
>> pre-initialize data: even if a global object is not explicitly initialized,
>> it may be generated into a section that is zero initialized at load or run
>> time. In such cases, the location attribute is often combined with a
>> “noinit” attribute that some compilers support which tells the linker to
>> not initialize a specific object.
>>
>>
>>
>> ~ Todd
>>
>>
>>
>> *From:* James Y Knight [mailto:jyknight at google.com]
>> *Sent:* Friday, May 3, 2019 3:26 PM
>> *To:* Snider, Todd
>> *Cc:* Peter Smith; Finkel, Hal J.; llvm-dev
>> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
>> additional symbol metadata in ELF object files in the ARM compiler
>>
>>
>>
>> The need to place an extern symbol at a particular fixed address can
>> already be done just by emitting an absolute symbol. This works today, no
>> object-file modifications needed. The source-level attribute isn't really
>> necessary either, although having it does make things marginally nicer.
>> (Without it, you can just emit ".globl a; a = 0x1000" assembly, either in
>> module-level inline-asm, or a separate assembly file).
>>
>>
>>
>> But the new functionality provided by this proposed extension is the
>> allowance for placing *initialized* data at a fixed address. That seems
>> like a rather strange requirement to me. You don't need (and, generally
>> can't even reasonably HAVE) pre-initialized data for something like a
>> memory-mapped peripheral register. Perhaps you could say why this would be
>> a widely useful feature for the embedded processors you're concerned about?
>>
>>
>>
>> The one case I'm aware of where fixed-placement initialized data is
>> useful is when setting the "fuses" on an embedded CPU. The fuses are
>> probably not actually in accessible memory at all. But, from the
>> point-of-view of the flash programming system if you write flash data to a
>> particular address, it will write to the config fuses instead. Expressing
>> the fuse configuration as initialized data in the code, rather than
>> separate metadata, can be convenient. But, for that, an ELF extension isn't
>> needed -- you only have one of those, and it's specified by the platform,
>> which can simply provide the required linker config.
>>
>>
>>
>>
>>
>> On Fri, May 3, 2019 at 10:42 AM Snider, Todd via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Our motivation for the "location" or "at" attribute is really as simple
>> as allowing the user to avoid having to mess with a linker command file.
>>
>> Working on an application for an embedded processor, they may have
>> special hardware features on their board (I/O ports, peripheral registers,
>> as Peter mentioned, etc.) that they know must reside at a specific memory
>> address.
>>
>> The location attribute makes it easy for the user to express to the
>> linker a constraint on the placement of an object without having to manage
>> the placement themselves in the linker command file.
>>
>> ~ Todd
>>
>> -----Original Message-----
>> From: Peter Smith [mailto:peter.smith at linaro.org]
>> Sent: Wednesday, May 1, 2019 10:27 AM
>> To: Finkel, Hal J.
>> Cc: Christof Douma; Snider, Todd; llvm-dev
>> Subject: [EXTERNAL] Re: [llvm-dev] RFC - a proposal to support additional
>> symbol metadata in ELF object files in the ARM compiler
>>
>> On Wed, 1 May 2019 at 15:03, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>> >
>> > On 5/1/19 7:22 AM, Christof Douma via llvm-dev wrote:
>> > > Hi Snider.
>> > >
>> > > As you and Peter mentioned there are indeed toolchains that allow
>> location placement from within the C/C++ source code, using attributes or
similar. I always wonder if such an extension is worth the effort. There are
>> downsides like the non-standard ways of communicating this information to
>> the linker, different places that control location of things (linker and
>> compiler sources). I would love to understand more of what is problematic
>> in the more common approach for placement that is already available.
>> > >
>> > > The conceptual model I follow is that the C/C++ source describes the
>> semantics of the program, and the linker sources (LD scripts or similar,
>> depending on the toolchain in use) describe the placement of the program on
>> the system/device. This gives rise to two common ways for placement that
>> are used a lot that work without any non-standard extensions:
>> > >
>> > > * Define a variable in C/C++ in a dedicated section that a linker can
>> move individually ('section' attribute in the compiler, and regular section
>> placement in the linker).
>> > > * Define a symbol in the linker at a certain place and use an extern
>> declaration in C/C++. At this point you can either take the address of it
>> (commonly used) or use it as a regular object (less common).
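The two placement approaches listed above can be sketched roughly as follows. This is a minimal illustration assuming an ELF target and a GCC-compatible compiler; the names `dev_cfg` and `dev_reg`, the section name `.dev_cfg`, and the addresses are all made up for the example, and the linker script fragments are shown only as comments in GNU ld syntax.

```c
/* Approach 1: put the variable in a dedicated section so that the linker
   script can position that section. ".dev_cfg" is a hypothetical name. */
int dev_cfg __attribute__((section(".dev_cfg"))) = 4;

/* A matching GNU ld script fragment (0x1000 is an arbitrary address):
 *
 *   SECTIONS
 *   {
 *     .dev_cfg 0x1000 : { *(.dev_cfg) }
 *   }
 */

/* Approach 2: define the symbol in the linker script, for example
 *
 *   dev_reg = 0x2000A000;
 *
 * and only declare it in C. It is declared but not referenced here, so
 * this sketch still links without such a script. */
extern volatile unsigned long dev_reg;
```

Without a script that mentions `.dev_cfg`, the linker treats it as an ordinary orphan section and chooses the address itself, so the placement lives entirely on the linker side.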
>> > >
>> > > I am very interested to hear what the weaknesses of these methods are,
>> to understand the need of a 'location' attribute.
>> >
>> >
>> > I like the idea of these fixed-location variables being defined as
>> > actual global variables. The optimizer can actually reason about them
>> > that way. The common alternative that I've seen is that programmers
>> > don't generate variables at all, but rather, do something like this:
>> >
>> >   #define DEV_DATA (*((volatile unsigned long *)(0x2000A000)))
>> >
>> > and the optimizer needs to make very pessimistic assumptions about the
>> > aliasing, etc. in this case. However, in the end, do we actually want
>> > symbols that the linker resolves? Or do we want the immediate address?
>> > Would the latter be more efficient?
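The contrast between the macro style and a linker-resolved symbol can be sketched as below. This is only an illustration: `REG_AT`, `poke` and `dev_data` are hypothetical names, and the macro is generalized over the address so the sketch can be exercised on a host machine instead of real hardware.

```c
#include <stdint.h>

/* Macro style, as in the DEV_DATA example, but with the address as a
   parameter. The compiler only ever sees an integer cast to a pointer. */
#define REG_AT(addr) (*(volatile unsigned long *)(uintptr_t)(addr))

/* Symbol style: the address is supplied at link time, e.g. by an absolute
   symbol or a linker-script assignment. Declared only; it is not
   referenced here so the sketch links without such a definition. */
extern volatile unsigned long dev_data;

/* Write a value through the macro-style access and read it back. */
unsigned long poke(uintptr_t addr, unsigned long v)
{
    REG_AT(addr) = v;
    return REG_AT(addr);
}
```

On a host, `poke((uintptr_t)&some_variable, 5)` stores 5 into `some_variable`; on a device the argument would be the register address.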
>> >
>> > Having to define sections for each of these variables and then maintain
>> > the location mappings in a linker script can be annoying -- on the other
>> > hand, if you target multiple systems for which the addresses might be
>> > different then having the locations in a separate file might be best
>> anyway.
>> >
>> > What I don't understand about this proposal is how general it is. How
>> > much of what is specified in a linker script can be specified this way?
>> > Do we really just want a way to embed linker-script fragments into an
>> > object file?
>> >
>>
>> I suspect that clang/llvm will be agnostic with respect to what can be
>> done in the linker. In effect the linker is given the instruction to
>> place a section at a particular address and it is up to the linker to
>> work out how to do that or error if it can't.
>>
>> The majority of the cases I've seen this used for are memory mapped
>> peripheral registers that typically live way outside the normal memory
>> map covered by the linker script. These cases are not too difficult to
>> handle as the linker can generate its own fragment of linker script
>> (or equivalent) from the Input Section. The more difficult case is
>> where the location is in the middle of an existing OutputSection and
>> this can involve changes to the linker's layout to flow non-location
>> sections around it, this is a fertile source of corner case bugs. How
>> much or little of this to support might be best left to the linker.
>>
>> Embedding linker script fragments is an interesting idea, and could
>> mean that any linker that supports GNU linker scripts could use the
>> feature. I think that there would be a number of challenges:
>> - Precedence of section selectors, i.e. how to stop an earlier linker
>> script pattern from matching the location, I guess a tempname style
>> section name might help, although wildcards might pick it up.
>> - The linker script fragment would need to not clash with an existing
>> OutputSection. I think that this could work for memory mapped
>> peripherals but it wouldn't for some of the other use cases that a
>> linker might want to support.
>> - Embedded ELF linkers may not support GNU Linker Script syntax.
>> Although custom targets could change the linker script format as they
>> see fit.
>>
>> Will be interesting to hear what use cases Todd had in mind.
>>
>> Peter
>> >  -Hal
>> >
>> > >
>> > > Thanks,
>> > > Christof
>> > >
>> > > On 30/04/2019, 16:51, "llvm-dev on behalf of Peter Smith via
>> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
>> llvm-dev at lists.llvm.org> wrote:
>> > >
>> > >     On Tue, 30 Apr 2019 at 16:17, Snider, Todd via llvm-dev
>> > >     <llvm-dev at lists.llvm.org> wrote:
>> > >     >
>> > >     >
>> > >     >
>> > >     > Hello All,
>> > >     >
>> > >     >
>> > >     >
>> > >     > In ARM embedded applications, there are some compilers that
>> support useful function and variable attributes that help the compiler
>> communicate information about symbols to downstream object consumers (i.e.
>> linkers).
>> > >     >
>> > >     >
>> > >     >
>> > >     > One such attribute is the “location” attribute. This attribute
>> can be applied to a global or local static data object or a function to
>> indicate to the linker that the definition of the data object or function
>> should be placed at a specific address in memory.
>> > >     >
>> > >     >
>> > >     >
>> > >     > For example, in the following code:
>> > >     >
>> > >     >
>> > >     >
>> > >     > #include <stdio.h>
>> > >     >
>> > >     >
>> > >     >
>> > >     > extern int a;
>> > >     >
>> > >     > int a __attribute__((location(0x1000))) = 4;
>> > >     >
>> > >     >
>> > >     >
>> > >     > struct bstruct
>> > >     >
>> > >     > {
>> > >     >
>> > >     >     int f1;
>> > >     >
>> > >     >     int f2;
>> > >     >
>> > >     > };
>> > >     >
>> > >     >
>> > >     >
>> > >     > struct bstruct b __attribute__((location(0x1004))) = {10, 12};
>> > >     >
>> > >     > double c __attribute__((location(0x1010))) = 1.0;
>> > >     >
>> > >     > char d[] __attribute__((location(0x2000)))  = {1, 2, 3, 4};
>> > >     >
>> > >     > void foo(double x) __attribute((location(0x4000)));
>> > >     >
>> > >     >
>> > >     >
>> > >     > void foo(double x) { printf("%f\n", x); }
>> > >     >
>> > >     >
>> > >     >
>> > >     > A location attribute has been applied to several data objects
>> and the function “foo.”  The compiler would then encode information into
>> the compiled object file that tells the downstream linker about these
>> memory placement constraints on the data objects and function.
>> > >     >
>> > >     >
>> > >     >
>> > >     > Without extending the ELF object format, how would this work?
>> > >     >
>> > >     >
>> > >     >
>> > >     > I propose to encode metadata information about a symbol in
>> special absolute symbols, “__sym_attr_metadata.<int>”, that the linker can
>> recognize when scanning the symbol table for an incoming object file. In an
>> ELF symbol table entry:
>> > >     >
>> > >     >
>> > >     >
>> > >     > typedef struct {
>> > >     >
>> > >     >        Elf32_Word     st_name;
>> > >     >
>> > >     >        Elf32_Addr     st_value;
>> > >     >
>> > >     >        Elf32_Word     st_size;
>> > >     >
>> > >     >        unsigned char  st_info;
>> > >     >
>> > >     >        unsigned char  st_other;
>> > >     >
>> > >     >        Elf32_Half     st_shndx;
>> > >     >
>> > >     > } Elf32_Sym;
>> > >     >
>> > >     >
>> > >     >
>> > >     > typedef struct {
>> > >     >
>> > >     >        Elf64_Word     st_name;
>> > >     >
>> > >     >        unsigned char  st_info;
>> > >     >
>> > >     >        unsigned char  st_other;
>> > >     >
>> > >     >        Elf64_Half     st_shndx;
>> > >     >
>> > >     >        Elf64_Addr     st_value;
>> > >     >
>> > >     >        Elf64_Xword    st_size;
>> > >     >
>> > >     > } Elf64_Sym;
>> > >     >
>> > >     >
>> > >     >
>> > >     > The st_size and st_value fields could be used to represent
>> attribute information about a given symbol:
>> > >     >
>> > >     >
>> > >     >
>> > >     > The st_size field can be split into an attribute ID and a
>> symbol index for the symbol that the attribute applies to
>> > >     >
>> > >     > attribute ID: bits 0..7
>> > >     > symbol index: bits 8..31
>> > >     >
>> > >     > The st_value field can contain the value associated with the
>> attribute (i.e. the address argument of a location attribute)
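The proposed split of st_size can be sketched in C as follows. This only illustrates the bit layout described above; the helper names `sym_attr_pack`, `sym_attr_id` and `sym_attr_index` and the enum are hypothetical and not part of any existing ELF or LLVM interface.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute IDs from the proposed base set; the enum names are made up. */
enum sym_attr_kind { SYM_ATTR_USED = 1, SYM_ATTR_LOCATION = 2 };

/* Pack an attribute ID (bits 0..7) and a symbol index (bits 8..31)
   into a 32-bit st_size value. */
static uint32_t sym_attr_pack(uint32_t attr_id, uint32_t sym_index)
{
    assert(attr_id >= 1 && attr_id <= 255);  /* attribute kind range from the RFC */
    assert(sym_index <= 0xFFFFFFu);          /* only 24 bits are available */
    return (sym_index << 8) | attr_id;
}

/* Recover the attribute ID from bits 0..7. */
static uint32_t sym_attr_id(uint32_t st_size)
{
    return st_size & 0xFFu;
}

/* Recover the symbol index from bits 8..31. */
static uint32_t sym_attr_index(uint32_t st_size)
{
    return st_size >> 8;
}
```

Assuming, purely for illustration, that `a` sits at symbol index 5, its location attribute would be carried by a metadata symbol with st_size 0x502 and st_value 0x1000. Note that the 24-bit index caps the number of addressable symbols per object file at about 16.7 million.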
>> > >     >
>> > >     >
>> > >     >
>> > >     > If the compiler is generating assembly code, a new directive
>> similar to the .eabi_attribute can be used:
>> > >     >
>> > >     >
>> > >     >
>> > >     >         .symbol_attribute <symbol name>, <attribute kind>,
>> <attribute value>
>> > >     >
>> > >     >
>> > >     >
>> > >     > Where:
>> > >     >
>> > >     > symbol name - will unambiguously identify the symbol that the
>> attribute/value pair applies to
>> > >     > attribute kind - is an unsigned integer between 1 and 255 that
>> specifies the kind of attribute to be applied to the symbol
>> > >     >
>> > >     > I propose a starting base set of 2 attribute IDs: used (1),
>> location (2)
>> > >     > the compiler will emit the integer constant that identifies the
>> attribute kind
>> > >     >
>> > >     > attribute value - a value that is appropriate for the specified
>> attribute kind
>> > >     >
>> > >     >
>> > >     >
>> > >     > Thoughts? Comments? Concerns?
>> > >     >
>> > >
>> > >     Hello Todd,
>> > >
>> > >     Thanks for bringing this up, I've got a few comments for you
>> based on
>> > >     the implementation of a similar attribute in another Embedded
>> Compiler
>> > >     (http://infocenter.arm.com/help/topic/com.arm.doc.dui0472m/chr1359124981140.html).
>> > >      In that case it was __attribute__((at(address))) but the name is
>> not
>> > >     that important.
>> > >
>> > >     The communication with the linker in that case was via section
>> name
>> > >     and not symbol, from memory at(<address>) translated to a section
>> name
>> > >     of .ARM.__at_<address>. For us this had some advantages:
>> > >     - We could use __attribute__((section(".ARM.__at_<address>")))
>> when
>> > >     the compiler didn't support the attribute, it also needed no
>> support
>> > >     in the assembler. This wasn't ideal as it is nice to be able to
>> use
>> > >     expressions for the address, but it gets you most of the way
>> there.
>> > >     - In practice you'd likely need a separate section for each
>> variable
>> > >     to avoid problems at link time. For example if you had two
>> variables
>> > >     with non-contiguous locations you'd most likely not want these in
>> the
>> > >     same section so this mapped quite well to something similar to
>> > >     __attribute__((section(name))).
>> > >     - We did find some properties of __attribute__((section("name")))
>> > >     inconvenient, especially that variables would come out as
>> SHT_PROGBITS
>> > >     when in many cases the user wanted SHT_NOBITS (memory mapped
>> > >     peripheral), we had our custom attribute fix that.
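The section-name route described above can be sketched with the plain section attribute. This is a rough illustration assuming an ELF target and a GCC-compatible compiler; `AT_SECTION` is a hypothetical helper macro, and the `.ARM.__at_<address>` spelling is taken from the message above, which itself recalls it from memory.

```c
/* Build the section name ".ARM.__at_<address>" from a literal address
   token. AT_SECTION is a made-up helper, not a toolchain feature. */
#define AT_SECTION(addr) __attribute__((section(".ARM.__at_" #addr)))

/* Each variable gets its own section, so variables at non-contiguous
   addresses never end up sharing one. */
int a AT_SECTION(0x1000) = 4;
char d[] AT_SECTION(0x2000) = {1, 2, 3, 4};
```

Because the address is stringified, it has to be a literal token rather than an expression, which matches the limitation mentioned above. A linker that does not recognize these names simply treats them as ordinary sections and picks the addresses itself.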
>> > >
>> > >     If you used a section name rather than a symbol then you may not
>> need
>> > >     any backend changes and it would generalise over all ELF targets.
>> > >     Linker support is another question entirely though.
>> > >
>> > >     Peter
>> > >
>> > >     >
>> > >     >
>> > >     > The anticipated next steps would be to add support for the
>> location attribute and update the ARM/ELF LLVM back-end to support encoding
>> the used attribute with the new mechanism.
>> > >     >
>> > >     >
>> > >     >
>> > >     > ~ Todd Snider
>> > >     >
>> > >     >
>> > >     >
>> > >     > Code Generation Tools Group
>> > >     >
>> > >     > Texas Instruments Incorporated
>> > >     >
>> > >     >
>> > >     >
>> > >     >
>> > >     >
>> > >     > _______________________________________________
>> > >     > LLVM Developers mailing list
>> > >     > llvm-dev at lists.llvm.org
>> > >     > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> > >
>> > >
>> >
>> > --
>> > Hal Finkel
>> > Lead, Compiler Technology and Programming Languages
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>> >