[llvm-dev] [EXTERNAL] Re: RFC - a proposal to support additional symbol metadata in ELF object files in the ARM compiler

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Fri May 3 14:35:04 PDT 2019


It should result in an object file with a global absolute symbol. E.g.
(here I'm building on x86-64 linux):

$ echo '.globl sym; sym = 0x600' | as -o /tmp/x1.o
$ nm /tmp/x1.o
0000000000000600 A sym

Compiling a binary that uses this, for demonstration:
$ printf $'extern int sym; int main() { sym = 5; }' | clang -c -xc - -o /tmp/x2.o
$ clang -o /tmp/x /tmp/x1.o /tmp/x2.o

And, hey, let's run it and see it crash...

$ gdb /tmp/x
...
(gdb) run
Starting program: /tmp/x

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400486 in main ()
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x600
(gdb) x/i $pc
=> 0x400486 <main+6>:   movl   $0x5,0x600

Yep, crashed writing to 0x600, the invalid address we expected.

On Fri, May 3, 2019 at 5:06 PM Snider, Todd <t-snider at ti.com> wrote:

> Hi James,
>
>
>
> Can you explain further the existing mechanisms in clang for expressing
> placement instructions for an extern symbol? I tried the ".globl a; a =
> 0x1000" asm source suggestion and did not see any information in the
> resulting object file that the linker could interpret as a placement
> instruction.
>
>
>
> With regard to your argument about not needing, or not being able, to
> pre-initialize data: even if a global object is not explicitly initialized,
> it may be generated into a section that is zero-initialized at load or run
> time. In such cases, the location attribute is often combined with a
> “noinit” attribute, supported by some compilers, which tells the linker
> not to initialize a specific object.
>
>
>
> ~ Todd
>
>
>
> *From:* James Y Knight [mailto:jyknight at google.com]
> *Sent:* Friday, May 3, 2019 3:26 PM
> *To:* Snider, Todd
> *Cc:* Peter Smith; Finkel, Hal J.; llvm-dev
> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
> additional symbol metadata in ELF object files in the ARM compiler
>
>
>
> Placing an extern symbol at a particular fixed address can already be
> done just by emitting an absolute symbol. This works today, no
> object-file modifications needed. The source-level attribute isn't really
> necessary either, although having it does make things marginally nicer.
> (Without it, you can just emit ".globl a; a = 0x1000" assembly, either in
> module-level inline-asm or in a separate assembly file.)
>
>
>
> But the new functionality provided by this proposed extension is the
> allowance for placing *initialized* data at a fixed address. That seems
> like a rather strange requirement to me. You don't need (and, generally
> can't even reasonably HAVE) pre-initialized data for something like a
> memory-mapped peripheral register. Perhaps you could say why this would be
> a widely useful feature for the embedded processors you're concerned about?
>
>
>
> The one case I'm aware of where fixed-placement initialized data is useful
> is when setting the "fuses" on an embedded CPU. The fuses are probably not
> actually in accessible memory at all. But, from the point-of-view of the
> flash programming system if you write flash data to a particular address,
> it will write to the config fuses instead. Expressing the fuse
> configuration as initialized data in the code, rather than separate
> metadata, can be convenient. But, for that, an ELF extension isn't needed
> -- you only have one of those, and it's specified by the platform, which
> can simply provide the required linker config.
>
>
>
>
>
> On Fri, May 3, 2019 at 10:42 AM Snider, Todd via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Our motivation for the "location" or "at" attribute is really as simple as
> allowing the user to avoid having to mess with a linker command file.
>
> Working on an application for an embedded processor, they may have special
> hardware features on their board (I/O ports, peripheral registers as Peter
> mentioned, etc.) that they know must reside at a specific memory address.
>
> The location attribute makes it easy for the user to express to the linker
> a constraint on the placement of an object without having to manage the
> placement themselves in the linker command file.
>
> ~ Todd
>
> -----Original Message-----
> From: Peter Smith [mailto:peter.smith at linaro.org]
> Sent: Wednesday, May 1, 2019 10:27 AM
> To: Finkel, Hal J.
> Cc: Christof Douma; Snider, Todd; llvm-dev
> Subject: [EXTERNAL] Re: [llvm-dev] RFC - a proposal to support additional
> symbol metadata in ELF object files in the ARM compiler
>
> On Wed, 1 May 2019 at 15:03, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> >
> > On 5/1/19 7:22 AM, Christof Douma via llvm-dev wrote:
> > > Hi Snider.
> > >
> > > As you and Peter mentioned there are indeed toolchains that allow
> > > location placement from within the C/C++ source code, using attributes
> > > or similar. I always wonder if such an extension is worth the effort.
> > > There are downsides, like the non-standard ways of communicating this
> > > information to the linker and the different places that control the
> > > location of things (linker and compiler sources). I would love to
> > > understand more of what is problematic in the more common approach for
> > > placement that is already available.
> > >
> > > The conceptual model I follow is that the C/C++ source describes the
> > > semantics of the program, and the linker sources (LD scripts or similar,
> > > depending on the toolchain in use) describe the placement of the program
> > > on the system/device. This gives rise to two common ways for placement
> > > that are used a lot that work without any non-standard extensions:
> > >
> > > * Define a variable in C/C++ in a dedicated section that a linker can
> > >   move individually ('section' attribute in the compiler, and regular
> > >   section placement in the linker).
> > > * Define a symbol in the linker at a certain place and use an extern
> > >   declaration in C/C++. At this point you can either take the address
> > >   of it (commonly used) or use it as a regular object (less common).
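
A minimal sketch of the two methods listed above, assuming an ELF target and
a GNU-style linker (GNU ld or lld). The names board_cfg and board_cfg_word
are hypothetical. Instead of a hand-written linker script, the sketch leans
on the __start_<section> / __stop_<section> symbols that such linkers
synthesise for sections whose names are valid C identifiers; a real project
would more likely pin the section to an address in its linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: define the variable in a dedicated section (hypothetical name)
 * so that the linker can place that section individually. */
int board_cfg_word __attribute__((section("board_cfg"), used)) = 4;

/* Method 2: extern declarations of symbols that the linker defines.
 * Assumption: the linker provides these bounds for the section above. */
extern const char __start_board_cfg[];
extern const char __stop_board_cfg[];

/* Returns 1 if board_cfg_word lies inside the linker-reported section. */
int board_cfg_in_section(void)
{
    uintptr_t lo = (uintptr_t)__start_board_cfg;
    uintptr_t hi = (uintptr_t)__stop_board_cfg;
    uintptr_t p = (uintptr_t)&board_cfg_word;

    assert(lo <= hi);
    return p >= lo && p + sizeof board_cfg_word <= hi;
}
```

Here the C source only names the section and the symbols; where they end up
is still decided entirely at link time.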
> > >
> > > I am very interested to hear what the weaknesses in these methods are,
> > > to understand the need for a 'location' attribute.
> >
> >
> > I like the idea of these fixed-location variables being defined as
> > actual global variables. The optimizer can actually reason about them
> > that way. The common alternative that I've seen is that programmers
> > don't generate variables at all, but rather, do something like this:
> >
> >   #define DEV_DATA (*((volatile unsigned long *)(0x2000A000)))
> >
> > and the optimizer needs to make very pessimistic assumptions about the
> > aliasing, etc. in this case. However, in the end, do we actually want
> > symbols that the linker resolves? Or do we want the immediate address?
> > Would the latter be more efficient?
> >
> > Having to define sections for each of these variables and then maintain
> > the location mappings in a linker script can be annoying -- on the other
> > hand, if you target multiple systems for which the addresses might be
> > different, then having the locations in a separate file might be best
> > anyway.
> >
> > What I don't understand about this proposal is how general it is. How
> > much of what is specified in a linker script can be specified this way?
> > Do we really just want a way to embed linker-script fragments into an
> > object file?
> >
>
> I suspect that clang/llvm will be agnostic with respect to what can be
> done in the linker. In effect the linker is given the instruction to
> place a section at a particular address and it is up to the linker to
> work out how to do that or error if it can't.
>
> The majority of the cases I've seen this used for are memory mapped
> peripheral registers that typically live way outside the normal memory
> map covered by the linker script. These cases are not too difficult to
> handle as the linker can generate its own fragment of linker script
> (or equivalent) from the Input Section. The more difficult case is
> where the location is in the middle of an existing OutputSection and
> this can involve changes to the linker's layout to flow non-location
> sections around it; this is a fertile source of corner case bugs. How
> much or little of this to support might be best left to the linker.
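
As a rough illustration of the simplest check a linker would need before
honouring such requests, the sketch below tests whether any two fixed
placements overlap. The placement struct and the function name are
hypothetical, and the sketch does not attempt the harder problem of flowing
other sections around a fixed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One fixed-address request: [addr, addr + size). Hypothetical type. */
struct placement {
    uint64_t addr;
    uint64_t size;
};

/* Returns 1 if any two placements overlap, 0 otherwise.
 * Quadratic in n and ignores address wrap-around; fine for a sketch. */
int placements_overlap(const struct placement *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (p[i].addr < p[j].addr + p[j].size &&
                p[j].addr < p[i].addr + p[i].size)
                return 1;
        }
    }
    return 0;
}
```

A linker that finds an overlap here would presumably report an error rather
than pick one of the two objects.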
>
> Embedding linker script fragments is an interesting idea, and could
> mean that any linker that supports GNU linker scripts could use the
> feature. I think that there would be a number of challenges:
> - Precedence of section selectors, i.e. how to stop an earlier linker
> script pattern from matching the location, I guess a tempname style
> section name might help, although wildcards might pick it up.
> - The linker script fragment would need to not clash with an existing
> OutputSection. I think that this could work for memory mapped
> peripherals but it wouldn't for some of the other use cases that a
> linker might want to support.
> - Embedded ELF linkers may not support GNU Linker Script syntax.
> Although custom targets could change the linker script format as they
> see fit.
>
> Will be interesting to hear what use cases Todd had in mind.
>
> Peter
> >  -Hal
> >
> > >
> > > Thanks,
> > > Christof
> > >
> > > On 30/04/2019, 16:51, "llvm-dev on behalf of Peter Smith via
> llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
> llvm-dev at lists.llvm.org> wrote:
> > >
> > >     On Tue, 30 Apr 2019 at 16:17, Snider, Todd via llvm-dev
> > >     <llvm-dev at lists.llvm.org> wrote:
> > >     >
> > >     >
> > >     >
> > >     > Hello All,
> > >     >
> > >     >
> > >     >
> > >     > In ARM embedded applications, there are some compilers that
> > >     > support useful function and variable attributes that help the
> > >     > compiler communicate information about symbols to downstream
> > >     > object consumers (i.e. linkers).
> > >     >
> > >     >
> > >     >
> > >     > One such attribute is the “location” attribute. This attribute
> > >     > can be applied to a global or local static data object or a
> > >     > function to indicate to the linker that the definition of the data
> > >     > object or function should be placed at a specific address in
> > >     > memory.
> > >     >
> > >     >
> > >     >
> > >     > For example, in the following code:
> > >     >
> > >     >
> > >     >
> > >     > #include <stdio.h>
> > >     >
> > >     > extern int a;
> > >     > int a __attribute__((location(0x1000))) = 4;
> > >     >
> > >     > struct bstruct
> > >     > {
> > >     >     int f1;
> > >     >     int f2;
> > >     > };
> > >     >
> > >     > struct bstruct b __attribute__((location(0x1004))) = {10, 12};
> > >     > double c __attribute__((location(0x1010))) = 1.0;
> > >     > char d[] __attribute__((location(0x2000))) = {1, 2, 3, 4};
> > >     > void foo(double x) __attribute((location(0x4000)));
> > >     >
> > >     > void foo(double x) { printf("%f\n", x); }
> > >     >
> > >     >
> > >     >
> > >     > A location attribute has been applied to several data objects
> > >     > and the function “foo.” The compiler would then encode information
> > >     > into the compiled object file that tells the downstream linker
> > >     > about these memory placement constraints on the data objects and
> > >     > function.
> > >     >
> > >     >
> > >     >
> > >     > Without extending the ELF object format, how would this work?
> > >     >
> > >     >
> > >     >
> > >     > I propose to encode metadata information about a symbol in
> > >     > special absolute symbols, “__sym_attr_metadata.<int>”, that the
> > >     > linker can recognize when scanning the symbol table for an
> > >     > incoming object file. In an ELF symbol table entry:
> > >     >
> > >     >
> > >     >
> > >     > typedef struct {
> > >     >        Elf32_Word     st_name;
> > >     >        Elf32_Addr     st_value;
> > >     >        Elf32_Word     st_size;
> > >     >        unsigned char  st_info;
> > >     >        unsigned char  st_other;
> > >     >        Elf32_Half     st_shndx;
> > >     > } Elf32_Sym;
> > >     >
> > >     > typedef struct {
> > >     >        Elf64_Word     st_name;
> > >     >        unsigned char  st_info;
> > >     >        unsigned char  st_other;
> > >     >        Elf64_Half     st_shndx;
> > >     >        Elf64_Addr     st_value;
> > >     >        Elf64_Xword    st_size;
> > >     > } Elf64_Sym;
> > >     >
> > >     >
> > >     >
> > >     > The st_size and st_value fields could be used to represent
> > >     > attribute information about a given symbol:
> > >     >
> > >     > The st_size field can be split into an attribute ID and a symbol
> > >     > index for the symbol that the attribute applies to:
> > >     >
> > >     >   attribute ID: bits 0..7
> > >     >   symbol index: bits 8..31
> > >     >
> > >     > The st_value field can contain the value associated with the
> > >     > attribute (i.e. the address argument of a location attribute).
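
A minimal sketch of the st_size split described above, for the 32-bit case.
The function names are hypothetical; only the bit layout (attribute ID in
bits 0..7, symbol index in bits 8..31) comes from the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an attribute ID (bits 0..7) and a symbol index (bits 8..31)
 * into a 32-bit st_size value. The index must fit in 24 bits. */
uint32_t sym_attr_pack(uint8_t attr_id, uint32_t sym_index)
{
    assert(sym_index <= 0x00FFFFFFu);
    return (sym_index << 8) | attr_id;
}

/* Extract the attribute ID from bits 0..7. */
uint8_t sym_attr_id(uint32_t st_size)
{
    return (uint8_t)(st_size & 0xFFu);
}

/* Extract the symbol index from bits 8..31. */
uint32_t sym_attr_index(uint32_t st_size)
{
    return st_size >> 8;
}
```

With this layout a 32-bit st_size can refer to at most 2^24 symbols per
object file, which is a limit worth keeping in mind when judging the
encoding.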
> > >     >
> > >     >
> > >     >
> > >     > If the compiler is generating assembly code, a new directive
> > >     > similar to the .eabi_attribute can be used:
> > >     >
> > >     >         .symbol_attribute <symbol name>, <attribute kind>, <attribute value>
> > >     >
> > >     >
> > >     >
> > >     > Where:
> > >     >
> > >     > symbol name - will unambiguously identify the symbol that the
> > >     >   attribute/value pair applies to
> > >     > attribute kind - is an unsigned integer between 1 and 255 that
> > >     >   specifies the kind of attribute to be applied to the symbol
> > >     >   - I propose a starting base set of 2 attribute IDs: used (1),
> > >     >     location (2)
> > >     >   - the compiler will emit the integer constant that identifies
> > >     >     the attribute kind
> > >     > attribute value - a value that is appropriate for the specified
> > >     >   attribute kind
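
A small sketch of how a compiler might print the proposed directive. The
helper name, the enum names, and the choice to print the value in hex are
assumptions; the proposal only fixes the operand order, the 1..255 range,
and the two starting attribute IDs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Starting attribute IDs from the proposal: used (1), location (2). */
enum sym_attr_kind { SYM_ATTR_USED = 1, SYM_ATTR_LOCATION = 2 };

/* Writes "\t.symbol_attribute <name>, <kind>, <value>" into buf.
 * Returns the snprintf result, or -1 for an empty name or a kind
 * outside 1..255. */
int format_symbol_attribute(char *buf, size_t len, const char *sym,
                            unsigned kind, uint64_t value)
{
    assert(buf != NULL && sym != NULL);
    if (strlen(sym) == 0 || kind < 1 || kind > 255)
        return -1;
    return snprintf(buf, len, "\t.symbol_attribute %s, %u, 0x%llx",
                    sym, kind, (unsigned long long)value);
}
```

For the variable "a" in the example, a call with SYM_ATTR_LOCATION and
0x1000 would produce a line naming a, kind 2, and value 0x1000.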
> > >     >
> > >     >
> > >     >
> > >     > Thoughts? Comments? Concerns?
> > >     >
> > >
> > >     Hello Todd,
> > >
> > >     Thanks for bringing this up, I've got a few comments for you based
> > >     on the implementation of a similar attribute in another Embedded
> > >     Compiler
> > >     (http://infocenter.arm.com/help/topic/com.arm.doc.dui0472m/chr1359124981140.html).
> > >     In that case it was __attribute__((at(address))) but the name is
> > >     not that important.
> > >
> > >     The communication with the linker in that case was via section name
> > >     and not symbol; from memory, at(<address>) translated to a section
> > >     name of .ARM.__at_<address>. For us this had some advantages:
> > >     - We could use __attribute__((section(".ARM.__at_<address>")))
> > >     when the compiler didn't support the attribute; it also needed no
> > >     support in the assembler. This wasn't ideal as it is nice to be
> > >     able to use expressions for the address, but it gets you most of
> > >     the way there.
> > >     - In practice you'd likely need a separate section for each
> > >     variable to avoid problems at link time. For example, if you had
> > >     two variables with non-contiguous locations you'd most likely not
> > >     want these in the same section, so this mapped quite well to
> > >     something similar to __attribute__((section(name))).
> > >     - We did find some properties of __attribute__((section("name")))
> > >     inconvenient, especially that variables would come out as
> > >     SHT_PROGBITS when in many cases the user wanted SHT_NOBITS (memory
> > >     mapped peripheral); we had our custom attribute fix that.
> > >
> > >     If you used a section name rather than a symbol then you may not
> > >     need any backend changes and it would generalise over all ELF
> > >     targets. Linker support is another question entirely though.
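
A sketch of how a linker might recognise the section-name convention
described above. The function name is hypothetical, the prefix spelling
follows the name as recalled in this message, and accepting any C-style
integer literal (hex, octal, decimal) for the address is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* If name looks like ".ARM.__at_<address>", stores the address in
 * *addr_out and returns 1; otherwise returns 0 and leaves it untouched. */
int parse_at_section(const char *name, uint64_t *addr_out)
{
    static const char prefix[] = ".ARM.__at_";
    const size_t plen = sizeof prefix - 1;
    char *end;
    unsigned long long v;

    assert(name != NULL && addr_out != NULL);
    if (strncmp(name, prefix, plen) != 0)
        return 0;
    /* Require a digit so strtoull cannot accept a sign or whitespace. */
    if (!isdigit((unsigned char)name[plen]))
        return 0;
    errno = 0;
    v = strtoull(name + plen, &end, 0);
    if (errno != 0 || *end != '\0')
        return 0;
    *addr_out = (uint64_t)v;
    return 1;
}
```

Because the address lives in the section name, nothing in the compiler
back-end or the assembler needs to change; only the linker has to know the
prefix.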
> > >
> > >     Peter
> > >
> > >     >
> > >     >
> > >     > The anticipated next steps would be to add support for the
> > >     > location attribute and update the ARM/ELF LLVM back-end to
> > >     > support encoding the used attribute with the new mechanism.
> > >     >
> > >     >
> > >     >
> > >     > ~ Todd Snider
> > >     >
> > >     >
> > >     >
> > >     > Code Generation Tools Group
> > >     >
> > >     > Texas Instruments Incorporated
> > >     >
> > >     >
> > >     >
> > >     >
> > >     >
> > >     > _______________________________________________
> > >     > LLVM Developers mailing list
> > >     > llvm-dev at lists.llvm.org
> > >     > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> > >
> >
> > --
> > Hal Finkel
> > Lead, Compiler Technology and Programming Languages
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
>
>