[llvm-dev] [EXTERNAL] Re: RFC - a proposal to support additional symbol metadata in ELF object files in the ARM compiler

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Thu May 9 05:02:44 PDT 2019


*From: *Snider, Todd <t-snider at ti.com>
*Date: *Thu, May 9, 2019 at 3:53 AM
*To: *Rui Ueyama, James Y Knight
*Cc: *llvm-dev


>
> James, Rui,
>
>
>
> If we are only talking about addressable hardware registers, peripherals,
> etc., then the absolute address symbol is one way to facilitate access to a
> symbol associated with a specific address.
>
>
>
> And yes, I would agree that data or code can be placed at a specific
> address using linker scripts, but it is not the most user-friendly of
> solutions.
>
>
>
> Consider a simple example, say ex1.c contains:
>
>
>
>   int xyz __attribute__((section(".bss:xyz"))) = 10;
>
>
>
> The compiler will generate a definition of xyz in the section
> “.bss:xyz”. In the linker script, something like this can be added to
> dictate the placement:
>
>
>
>    .special_bss: { ex1.o(.bss:xyz); } > 0x1000
>
>
>
> This is straightforward, but now there is a coupling between the
> application’s source code and the linker script.
>
>
>
> I think the location attribute gives a developer a cleaner, more concise
> means of expressing a placement constraint on a piece of code or data.
>
>
>
> Even applying the location attribute to the above example shows this:
>
>
>
> int xyz __attribute__((location(0x1000))) = 10;
>
>
>
> In addition to the definition of xyz in its own section, the compiler will
> emit metadata that the linker understands as a specific placement
> instruction for xyz’s section. No edit of a linker script is needed.
>
>
>
> The use cases where I’ve seen the location attribute be particularly
> helpful are instances where code in ROM or a boot loader needs to access
> code or data at a particular address. For example, a boot routine in ROM
> may have security requirements for code and data that is loaded into FLASH
> memory and may have hardcoded addresses that it accesses to perform the
> security check. The code that is to be loaded into that FLASH memory can
> use location attributes to reserve space for the data objects at the
> specific addresses that the boot routine needs to access. In this instance
> using location attributes helps to reduce the maintenance that a developer
> may otherwise have to do with linker scripts.
>

In the above scenario I believe you will end up having to write a linker
script anyway. If you write a program that resides in flash memory at a
specific location, I think the linker needs to be told how to lay out the
entire program, not only some specific data. There might be a scenario
where you don't care how other parts of your program are located in
memory, but what if the address of the flash memory collides with the
default layout? What is the expected behavior?

There are many tricky scenarios for which I do not know what the expected
behavior is:

 - If a user attempts to locate a function at 0x1000, data at 0x2000,
another function at 0x3000, and more data at 0x4000, should we create
four segments, one for each function and data object?
 - What if a specified location collides with other data's specified
location?
 - What if a specified location collides with the default layout?
 - What if a user attempts to put data and a function on the same page?

I think if you can just say "place this piece of data at address 0xXXXX"
and everything automagically works, that's great, but putting some piece of
data at a specific location has a global effect on how other pieces of data
and functions are laid out, so it looks like that kind of directive
underspecifies what we actually want.
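
To make the collision questions above concrete, here is a minimal sketch in C of the check a linker would need before honoring fixed-address requests. The `placement` record and `first_overlap` function are hypothetical names for illustration, not part of any existing linker, and the sketch assumes every object has a nonzero size and that `addr + size` does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one object pinned at a fixed address. */
struct placement {
    uint64_t addr;  /* requested start address */
    uint64_t size;  /* size in bytes, assumed nonzero */
};

/* qsort comparator: order placements by ascending start address. */
static int cmp_addr(const void *a, const void *b)
{
    const struct placement *pa = a;
    const struct placement *pb = b;
    return (pa->addr > pb->addr) - (pa->addr < pb->addr);
}

/* Sorts p by address, then returns the index (in sorted order) of the
   first placement that overlaps its predecessor, or -1 if none do.
   After sorting, any overlap at all implies an overlap between two
   neighbors, so checking neighbors is sufficient for detection. */
int first_overlap(struct placement *p, size_t n)
{
    qsort(p, n, sizeof *p, cmp_addr);
    for (size_t i = 1; i < n; i++) {
        assert(p[i - 1].size > 0);
        if (p[i - 1].addr + p[i - 1].size > p[i].addr)
            return (int)i;
    }
    return -1;
}
```

This only answers the second question in the list (two requests colliding with each other). Collisions with the default layout, and the question of how many segments to create, still need a policy that the attribute alone does not supply.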

> I mentioned earlier in this thread that a motivation for the location
> attribute is to allow the user to avoid messing with a linker command file
> or script. While the location attribute does not provide new functionality,
> I am arguing that the location attribute provides enough value in terms of
> usability improvements vs. existing methods to justify adding support for
> it.
>
>
>
> ~ Todd
>
>
>
> *From:* Rui Ueyama [mailto:ruiu at google.com]
> *Sent:* Tuesday, May 7, 2019 1:43 AM
> *To:* James Y Knight
> *Cc:* Snider, Todd; llvm-dev
> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
> additional symbol metadata in ELF object files in the ARM compiler
>
>
>
> I have the same question as James has. It seems to me that you can name
> any address using an absolute symbol, and that should suffice to handle
> memory-mapped peripherals and such. If you really need to define data
> (whether it's in .data or .bss) or a function at a fixed memory address,
> that's not something you can do with absolute symbols (but you can do with
> linker scripts), but is this what you really want?
>
>
>
> *From: *James Y Knight via llvm-dev <llvm-dev at lists.llvm.org>
> *Date: *Tue, May 7, 2019 at 2:39 AM
> *To: *Snider, Todd
> *Cc: *llvm-dev
>
> I don't think it's a "trick" at all -- it's just a definition of the
> symbol at an absolute address. That's what absolute symbols are for. (You
> can also use ".size sym, 4" ".type sym, object", if you want to let the
> linker know that the symbol refers to 4 bytes of data. I'm not sure if
> that's part of your concern about it not being real?)
>
>
>
> For the use-case of accessing memory-mapped peripheral registers, this
> functionality seems already sufficient. Do you disagree?
>
>
>
> However, if you have a requirement to place initialized data at a fixed
> address, then this pre-existing functionality does not address that
> requirement. But, I'm not sure what use-cases you're thinking of where this
> is a requirement. Can you talk about what you have in mind?
>
>
>
>
>
> On Mon, May 6, 2019 at 9:10 AM Snider, Todd <t-snider at ti.com> wrote:
>
>
>
> James,
>
>
>
> What you are doing below is tricking the compiler into believing that it
> is dealing with a real int object that has actual space allocated to it in
> x2.o, but sym is not defined as a real data object in x1.o
>
>
>
> Thanks, but that doesn’t really address my use case. I still contend that
> associating a placement address with an actual data object (whether it be
> initialized or not) or function, where the symbol is defined in a section
> containing the definition of the data object or function, is a useful
> feature for customers.
>
>
>
> ~ Todd
>
>
>
> *From:* James Y Knight [mailto:jyknight at google.com]
> *Sent:* Friday, May 3, 2019 4:35 PM
> *To:* Snider, Todd; Peter Smith; Finkel, Hal J.; llvm-dev
> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
> additional symbol metadata in ELF object files in the ARM compiler
>
>
>
> It should result in an object file with a global absolute symbol. E.g.
> (here I'm building on x86-64 linux):
>
>
>
> $ echo '.globl sym; sym = 0x600' | as -o /tmp/x1.o
>
> $ nm /tmp/x1.o
>
> 0000000000000600 A sym
>
> Compiling a binary that uses this, for demonstration:
>
> $ printf $'extern int sym; int main() { sym = 5; }' | clang -c -xc - -o /tmp/x2.o
>
> $ clang -o /tmp/x /tmp/x1.o /tmp/x2.o
>
>
>
> And, hey, let's run it and see it crash...
>
>
>
> $ gdb /tmp/x
>
> ...
>
> (gdb) run
>
> Starting program: /tmp/x
>
>
>
> Program received signal SIGSEGV, Segmentation fault.
>
> 0x0000000000400486 in main ()
>
> (gdb) p $_siginfo._sifields._sigfault.si_addr
>
> $1 = (void *) 0x600
>
> (gdb) x/i $pc
> => 0x400486 <main+6>:   movl   $0x5,0x600
>
> Yep, crashed writing to 0x600, the invalid address we expected.
>
>
>
> On Fri, May 3, 2019 at 5:06 PM Snider, Todd <t-snider at ti.com> wrote:
>
> Hi James,
>
>
>
> Can you explain further the existing mechanisms in clang for expressing
> placement instructions for an extern symbol? I tried the ".globl a; a =
> 0x1000" asm source suggestion and did not see any information in the
> resulting object file that the linker could interpret as a placement
> instruction.
>
>
>
> With regard to your argument about not needing or being able to
> pre-initialize data: even if a global object is not explicitly initialized,
> it may be generated into a section that is zero initialized at load or run
> time. In such cases, the location attribute is often combined with a
> “noinit” attribute that some compilers support which tells the linker to
> not initialize a specific object.
>
>
>
> ~ Todd
>
>
>
> *From:* James Y Knight [mailto:jyknight at google.com]
> *Sent:* Friday, May 3, 2019 3:26 PM
> *To:* Snider, Todd
> *Cc:* Peter Smith; Finkel, Hal J.; llvm-dev
> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: RFC - a proposal to support
> additional symbol metadata in ELF object files in the ARM compiler
>
>
>
> The need to place an extern symbol at a particular fixed address can
> already be done just by emitting an absolute symbol. This works today, no
> object-file modifications needed. The source-level attribute isn't really
> necessary either, although having it does make things marginally nicer.
> (Without it, you can just emit ".globl a; a = 0x1000" assembly, either in
> module-level inline-asm, or a separate assembly file).
>
>
>
> But the new functionality provided by this proposed extension is the
> allowance for placing *initialized* data at a fixed address. That seems
> like a rather strange requirement to me. You don't need (and, generally
> can't even reasonably HAVE) pre-initialized data for something like a
> memory-mapped peripheral register. Perhaps you could say why this would be
> a widely useful feature for the embedded processors you're concerned about?
>
>
>
> The one case I'm aware of where fixed-placement initialized data is useful
> is when setting the "fuses" on an embedded CPU. The fuses are probably not
> actually in accessible memory at all. But, from the point-of-view of the
> flash programming system if you write flash data to a particular address,
> it will write to the config fuses instead. Expressing the fuse
> configuration as initialized data in the code, rather than separate
> metadata, can be convenient. But, for that, an ELF extension isn't needed
> -- you only have one of those, and it's specified by the platform, which
> can simply provide the required linker config.
>
>
>
>
>
> On Fri, May 3, 2019 at 10:42 AM Snider, Todd via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Our motivation for the "location" or "at" attribute is really as simple as
> allowing the user to avoid having to mess with a linker command file.
>
> When working on an application for an embedded processor, users may have
> special hardware features on their board (I/O ports, peripheral registers
> as Peter mentioned, etc.) that they know must reside at a specific memory
> address.
>
> The location attribute makes it easy for the user to express to the linker
> a constraint on the placement of an object without having to manage the
> placement themselves in the linker command file.
>
> ~ Todd
>
> -----Original Message-----
> From: Peter Smith [mailto:peter.smith at linaro.org]
> Sent: Wednesday, May 1, 2019 10:27 AM
> To: Finkel, Hal J.
> Cc: Christof Douma; Snider, Todd; llvm-dev
> Subject: [EXTERNAL] Re: [llvm-dev] RFC - a proposal to support additional
> symbol metadata in ELF object files in the ARM compiler
>
> On Wed, 1 May 2019 at 15:03, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> >
> > On 5/1/19 7:22 AM, Christof Douma via llvm-dev wrote:
> > > Hi Snider.
> > >
> > > > As you and Peter mentioned, there are indeed toolchains that allow
> > > > location placement from within the C/C++ source code, using attributes
> > > > or similar. I always wonder if such an extension is worth the effort.
> > > > There are downsides, like the non-standard ways of communicating this
> > > > information to the linker and the different places that control the
> > > > location of things (linker and compiler sources). I would love to
> > > > understand more of what is problematic in the more common approach for
> > > > placement that is already available.
> > >
> > > > The conceptual model I follow is that the C/C++ source describes the
> > > > semantics of the program, and the linker sources (LD scripts or
> > > > similar, depending on the toolchain in use) describe the placement of
> > > > the program on the system/device. This gives rise to two commonly used
> > > > ways of placement that work without any non-standard extensions:
> > >
> > > > * Define a variable in C/C++ in a dedicated section that a linker can
> > > >   move individually ('section' attribute in the compiler, and regular
> > > >   section placement in the linker).
> > > > * Define a symbol in the linker at a certain place and use an extern
> > > >   declaration in C/C++. At this point you can either take the address
> > > >   of it (commonly used) or use it as a regular object (less common).
> > >
> > > > I am very interested to hear what the weaknesses in these methods are,
> > > > to understand the need for a 'location' attribute.
> >
> >
> > I like the idea of these fixed-location variables being defined as
> > actual global variables. The optimizer can actually reason about them
> > that way. The common alternative that I've seen is that programmers
> > don't generate variables at all, but rather, do something like this:
> >
> >   #define DEV_DATA (*((volatile unsigned long *)(0x2000A000)))
> >
> > and the optimizer needs to make very pessimistic assumptions about the
> > aliasing, etc. in this case. However, in the end, do we actually want
> > symbols that the linker resolves? Or do we want the immediate address?
> > Would the latter be more efficient?
> >
> > Having to define sections for each of these variables and then maintain
> > the location mappings in a linker script can be annoying -- on the other
> > hand, if you target multiple systems for which the addresses might be
> > > different then having the locations in a separate file might be best
> > > anyway.
> >
> > What I don't understand about this proposal is how general it is. How
> > much of what is specified in a linker script can be specified this way?
> > Do we really just want a way to embed linker-script fragments into an
> > object file?
> >
>
> I suspect that clang/llvm will be agnostic with respect to what can be
> done in the linker. In effect the linker is given the instruction to
> place a section at a particular address and it is up to the linker to
> work out how to do that or error if it can't.
>
> The majority of the cases I've seen this used for are memory mapped
> peripheral registers that typically live way outside the normal memory
> map covered by the linker script. These cases are not too difficult to
> handle as the linker can generate its own fragment of linker script
> (or equivalent) from the Input Section. The more difficult case is
> where the location is in the middle of an existing OutputSection and
> this can involve changes to the linker's layout to flow non-location
> sections around it, this is a fertile source of corner case bugs. How
> much or little of this to support might be best left to the linker.
>
> Embedding linker script fragments is an interesting idea, and could
> mean that any linker that supports GNU linker scripts could use the
> feature. I think that there would be a number of challenges:
> - Precedence of section selectors, i.e. how to stop an earlier linker
> script pattern from matching the location, I guess a tempname style
> section name might help, although wildcards might pick it up.
> - The linker script fragment would need to not clash with an existing
> OutputSection. I think that this could work for memory mapped
> peripherals but it wouldn't for some of the other use cases that a
> linker might want to support.
> - Embedded ELF linkers may not support GNU Linker Script syntax.
> Although custom targets could change the linker script format as they
> see fit.
>
> Will be interesting to hear what use cases Todd had in mind.
>
> Peter
> >  -Hal
> >
> > >
> > > Thanks,
> > > Christof
> > >
> > > > On 30/04/2019, 16:51, "llvm-dev on behalf of Peter Smith via
> > > > llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of
> > > > llvm-dev at lists.llvm.org> wrote:
> > >
> > >     On Tue, 30 Apr 2019 at 16:17, Snider, Todd via llvm-dev
> > >     <llvm-dev at lists.llvm.org> wrote:
> > >     >
> > >     >
> > >     >
> > >     > Hello All,
> > >     >
> > >     >
> > >     >
> > > >     > In ARM embedded applications, there are some compilers that
> > > >     > support useful function and variable attributes that help the
> > > >     > compiler communicate information about symbols to downstream
> > > >     > object consumers (i.e. linkers).
> > >     >
> > >     >
> > >     >
> > > >     > One such attribute is the “location” attribute. This attribute
> > > >     > can be applied to a global or local static data object or a
> > > >     > function to indicate to the linker that the definition of the
> > > >     > data object or function should be placed at a specific address
> > > >     > in memory.
> > >     >
> > >     >
> > >     >
> > >     > For example, in the following code:
> > >     >
> > >     >
> > >     >
> > >     > #include <stdio.h>
> > >     >
> > >     >
> > >     >
> > >     > extern int a;
> > >     >
> > >     > int a __attribute__((location(0x1000))) = 4;
> > >     >
> > >     >
> > >     >
> > >     > struct bstruct
> > >     >
> > >     > {
> > >     >
> > >     >     int f1;
> > >     >
> > >     >     int f2;
> > >     >
> > >     > };
> > >     >
> > >     >
> > >     >
> > >     > struct bstruct b __attribute__((location(0x1004))) = {10, 12};
> > >     >
> > >     > double c __attribute__((location(0x1010))) = 1.0;
> > >     >
> > >     > char d[] __attribute__((location(0x2000)))  = {1, 2, 3, 4};
> > >     >
> > >     > void foo(double x) __attribute((location(0x4000)));
> > >     >
> > >     >
> > >     >
> > >     > void foo(double x) { printf("%f\n", x); }
> > >     >
> > >     >
> > >     >
> > > >     > A location attribute has been applied to several data objects
> > > >     > and the function “foo.” The compiler would then encode
> > > >     > information into the compiled object file that tells the
> > > >     > downstream linker about these memory placement constraints on
> > > >     > the data objects and function.
> > >     >
> > >     >
> > >     >
> > >     > Without extending the ELF object format, how would this work?
> > >     >
> > >     >
> > >     >
> > > >     > I propose to encode metadata information about a symbol in
> > > >     > special absolute symbols, “__sym_attr_metadata.<int>”, that the
> > > >     > linker can recognize when scanning the symbol table for an
> > > >     > incoming object file. In an ELF symbol table entry:
> > >     >
> > >     >
> > >     >
> > >     > typedef struct {
> > >     >
> > >     >        Elf32_Word     st_name;
> > >     >
> > >     >        Elf32_Addr     st_value;
> > >     >
> > >     >        Elf32_Word     st_size;
> > >     >
> > >     >        unsigned char  st_info;
> > >     >
> > >     >        unsigned char  st_other;
> > >     >
> > >     >        Elf32_Half     st_shndx;
> > >     >
> > >     > } Elf32_Sym;
> > >     >
> > >     >
> > >     >
> > >     > typedef struct {
> > >     >
> > >     >        Elf64_Word     st_name;
> > >     >
> > >     >        unsigned char  st_info;
> > >     >
> > >     >        unsigned char  st_other;
> > >     >
> > >     >        Elf64_Half     st_shndx;
> > >     >
> > >     >        Elf64_Addr     st_value;
> > >     >
> > >     >        Elf64_Xword    st_size;
> > >     >
> > >     > } Elf64_Sym;
> > >     >
> > >     >
> > >     >
> > > >     > The st_size and st_value fields could be used to represent
> > > >     > attribute information about a given symbol:
> > >     >
> > >     >
> > >     >
> > > >     > The st_size field can be split into an attribute ID and a symbol
> > > >     > index for the symbol that the attribute applies to:
> > > >     >
> > > >     >   attribute ID: bits 0..7
> > > >     >   symbol index: bits 8..31
> > > >     >
> > > >     > The st_value field can contain the value associated with the
> > > >     > attribute (e.g. the address argument of a location attribute).
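
The bit layout proposed above can be sketched in C as follows. This is only an illustration of the described split of a 32-bit st_size value; the function names and the enum are hypothetical and do not come from the RFC or from any existing LLVM or ELF header. The numeric IDs follow the base set proposed in the RFC (used = 1, location = 2).

```c
#include <assert.h>
#include <stdint.h>

/* Attribute IDs from the RFC's proposed base set; the enum itself is
   a hypothetical name used only in this sketch. */
enum sym_attr_kind { SYM_ATTR_USED = 1, SYM_ATTR_LOCATION = 2 };

/* Pack an attribute ID (bits 0..7) and a symbol index (bits 8..31)
   into a 32-bit st_size value. */
uint32_t sym_attr_pack(uint32_t attr_id, uint32_t sym_index)
{
    assert(attr_id >= 1 && attr_id <= 0xFF);  /* IDs are 1..255 */
    assert(sym_index <= 0xFFFFFF);            /* 24 bits available */
    return (sym_index << 8) | attr_id;
}

/* Extract the attribute ID from bits 0..7. */
uint32_t sym_attr_id(uint32_t st_size)
{
    return st_size & 0xFF;
}

/* Extract the symbol index from bits 8..31. */
uint32_t sym_attr_index(uint32_t st_size)
{
    return st_size >> 8;
}
```

For the first example in the RFC, a metadata symbol describing `a` would carry `sym_attr_pack(SYM_ATTR_LOCATION, <index of a>)` in st_size and 0x1000 in st_value. Note that a 24-bit index caps the number of addressable symbols per object file at 16,777,216 in this layout.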
> > >     >
> > >     >
> > >     >
> > > >     > If the compiler is generating assembly code, a new directive
> > > >     > similar to the .eabi_attribute can be used:
> > > >     >
> > > >     >
> > > >     >
> > > >     >         .symbol_attribute <symbol name>, <attribute kind>, <attribute value>
> > >     >
> > >     >
> > >     >
> > >     > Where:
> > >     >
> > > >     > symbol name - will unambiguously identify the symbol that the
> > > >     >   attribute/value pair applies to
> > > >     > attribute kind - is an unsigned integer between 1 and 255 that
> > > >     >   specifies the kind of attribute to be applied to the symbol
> > > >     >
> > > >     >   I propose a starting base set of 2 attribute IDs: used (1),
> > > >     >   location (2)
> > > >     >   the compiler will emit the integer constant that identifies the
> > > >     >   attribute kind
> > > >     >
> > > >     > attribute value - a value that is appropriate for the specified
> > > >     >   attribute kind
> > >     >
> > >     >
> > >     >
> > >     > Thoughts? Comments? Concerns?
> > >     >
> > >
> > >     Hello Todd,
> > >
> > > >     Thanks for bringing this up. I've got a few comments for you based
> > > >     on the implementation of a similar attribute in another embedded
> > > >     compiler
> > > >     (http://infocenter.arm.com/help/topic/com.arm.doc.dui0472m/chr1359124981140.html).
> > > >     In that case it was __attribute__((at(address))), but the name is
> > > >     not that important.
> > >
> > > >     The communication with the linker in that case was via section name
> > > >     and not symbol; from memory, at(<address>) translated to a section
> > > >     name of .ARM.__at_<address>. For us this had some advantages:
> > > >     - We could use __attribute__((section(".ARM.__at_<address>")))
> > > >     when the compiler didn't support the attribute; it also needed no
> > > >     support in the assembler. This wasn't ideal, as it is nice to be
> > > >     able to use expressions for the address, but it gets you most of
> > > >     the way there.
> > > >     - In practice you'd likely need a separate section for each
> > > >     variable to avoid problems at link time. For example, if you had
> > > >     two variables with non-contiguous locations you'd most likely not
> > > >     want these in the same section, so this mapped quite well to
> > > >     something similar to __attribute__((section(name))).
> > > >     - We did find some properties of __attribute__((section("name")))
> > > >     inconvenient, especially that variables would come out as
> > > >     SHT_PROGBITS when in many cases the user wanted SHT_NOBITS (memory
> > > >     mapped peripheral); we had our custom attribute fix that.
> > >
> > > >     If you used a section name rather than a symbol then you may not
> > > >     need any backend changes and it would generalise over all ELF
> > > >     targets. Linker support is another question entirely though.
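
As a rough illustration of the linker-side half of the section-name approach, the sketch below recovers an address from a section name. The `.ARM.__at_` prefix is taken from the message above, where it is quoted from memory, so treat it as an assumption. `parse_at_section` is a hypothetical helper, not an API of armlink, LLD, or any other linker.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Prefix as recalled in the message above; an assumption here. */
#define AT_PREFIX ".ARM.__at_"

/* Returns 1 and stores the address in *addr if name is AT_PREFIX
   followed by a number; returns 0 otherwise. The number is read with
   base auto-detection: 0x... is hex, a leading 0 is octal, anything
   else is decimal. */
int parse_at_section(const char *name, uint64_t *addr)
{
    assert(name != NULL && addr != NULL);

    size_t plen = strlen(AT_PREFIX);
    if (strncmp(name, AT_PREFIX, plen) != 0)
        return 0;

    const char *digits = name + plen;
    /* Reject signs and whitespace that strtoull would otherwise skip. */
    if (!isdigit((unsigned char)digits[0]))
        return 0;

    char *end;
    errno = 0;
    unsigned long long v = strtoull(digits, &end, 0);
    if (*end != '\0' || errno == ERANGE)
        return 0;

    *addr = (uint64_t)v;
    return 1;
}
```

A linker using this scheme would call something like `parse_at_section` on each input section name and, on a match, treat the section as pinned at the returned address. As noted above, the address has to be a literal in the name, which is why expressions are not supported this way.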
> > >
> > >     Peter
> > >
> > >     >
> > >     >
> > > >     > The anticipated next steps would be to add support for the
> > > >     > location attribute and update the ARM/ELF LLVM back-end to
> > > >     > support encoding the used attribute with the new mechanism.
> > >     >
> > >     >
> > >     >
> > >     > ~ Todd Snider
> > >     >
> > >     >
> > >     >
> > >     > Code Generation Tools Group
> > >     >
> > >     > Texas Instruments Incorporated
> > >     >
> > >     >
> > >     >
> > >     >
> > >     >
> > >     > _______________________________________________
> > >     > LLVM Developers mailing list
> > >     > llvm-dev at lists.llvm.org
> > >     > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> > >
> >
> > --
> > Hal Finkel
> > Lead, Compiler Technology and Programming Languages
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
>
>