[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 4 14:15:37 PDT 2016


On Wed, Oct 26, 2016 at 10:45 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:

> On Wed, Oct 26, 2016 at 9:48 PM, Chris Lattner <clattner at apple.com> wrote:
>
>> On Oct 26, 2016, at 1:34 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>> On Tue, Oct 25, 2016 at 10:48 PM, Chris Lattner <clattner at apple.com>
>> wrote:
>>
>>> Responding to both of your emails in one, sorry for the delay:
>>>
>>> On Oct 25, 2016, at 11:20 AM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>> I think there are a couple of additional considerations we should make
>>> here:
>>>
>>>    - What are we trying to model? To me it's clear that GlobalConstant
>>>    is for modelling integers, not pointers. That alone may not necessarily be
>>>    enough to motivate a representational change, but…
>>>
>>> I understand where you’re coming from, but I think we’re modeling three
>>> different things, and disagreeing about how to clump them together.  The
>>> three things I see in flight are:
>>>
>>> 1) typical globals that are laid out in some unknown way in the address
>>> space.
>>> 2) globals that may be tied to a specific knowable address range due to
>>> a limited compilation model (e.g. a deeply embedded core) that fits into an
>>> immediate range (e.g. 0…255, 0…65536, etc).
>>> 3) Immediates that are treated as symbolic for CFI’s perspective (so
>>> they can’t just be used as a literal immediate) that are resolved at link
>>> time, but are known to have limited range.
>>>
>>> There is also “4) immediates with an obvious known value”, but those are
>>> obviously ConstantInt’s and not interesting to discuss here.
>>>
>>> The design I’m arguing for is to clump #2 and #3 into the same group.
>>>
>>
>> I am not sure if this is sound if we want the no-alias assumption (see
>> also below) to hold for #2 but not for #3.
>>
>>
>>>   This can be done in several different ways, but all of them use the same
>>> “declaration side” reference, which has !range metadata attached to it.
>>> The three approaches I see are:
>>>
>>> a) Introduce a new GlobalConstant definition, whose value is the
>>> concrete address that the linker should resolve.
>>> b) Use an alias as the definition, whose body is a ptrtoint constant of
>>> the same value.
>>> c) Use a zero size globalvariable with a range metadata specifying the
>>> exact address decided.
>>>
>>> I’m not very knowledgeable about why approach b won’t work, but if it
>>> could, it seems preferable because it fits in with our current model.
>>>
>>
>> b would work in that it would give us the right bits in the object file,
>> but it would be a little odd to use a different type for declarations than
>> for definitions. That said, I don't have a strong objection to it.
>>
>>
>> I can understand what you’re saying here, but this is already the case
>> for aliases.  You can never have a “declaration side” for an alias that is
>> an alias (you have to use an external global variable or a function with no
>> body).
>>
>> From the discussion over the last day it sounds to me that “b” is the
>> best approach, except for the (significant) annoyance that these things can
>> possibly be aliased.  However, I don’t understand how this works in
>> practice today for aliases.  By their very name, they are *all about*
>> introducing aliases, so how is AA allowed to assume that two external
>> global variable references are unaliased anyway?  One may be resolved as an
>> alias to the other after all, completely independent of your proposal.
>>
>
> I suppose that one way to think about it is that by using aliases you are
> stepping outside of the bounds of the language, i.e. no valid C/C++
> declaration can be used to take the address of an alias without using
> reserved names or language extensions (clang uses aliases to implement some
> standard language features but they all have reserved names as far as I'm
> aware).
>
> Maybe this hasn't come up simply because language implementations (and
> users of language extensions) happen to never use aliases in a way that
> could expose the AA assumption.
>

Further to this, I think there are three things in play here:
- "absolute": this primarily controls code generation, i.e. we need to know
whether to emit absolute or relative relocations in PIC mode
- "range" i.e. the range of the "address": this also controls code
generation (used for selecting the narrowest possible relocation type) but
could also affect midend and backend optimizers (e.g. computeKnownBits).
This is !range metadata as in D25878 but also in principle could be
modelled as a value with an integer type of a specific width.
- "mayalias", i.e. whether the address may be an alias on the definition
side. That could include an alias of a real global object or an absolute
symbol formed with inttoptr. This is fundamentally a midend attribute that
could be used by AA for example. In practice a global with mayalias should
be treated by the midend like a pointer obtained by calling an external
readnone function.
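
To make the "range" point concrete, here is a minimal sketch (Python, not
LLVM code) of how a known address range could drive both the choice of the
narrowest field width and the known-zero high bits that something like
computeKnownBits could report. The function names and the candidate widths
are assumptions made for illustration; the range is taken to be half-open,
[lo, hi), as !range metadata is.

```python
def narrowest_width(lo: int, hi: int, widths=(8, 16, 32, 64)) -> int:
    """Smallest unsigned field width (in bits) that holds every value in [lo, hi).

    Hypothetical helper: the candidate widths are illustrative, not the
    relocation types of any particular target.
    """
    if not 0 <= lo < hi:
        raise ValueError("expected a non-empty, non-negative half-open range")
    max_value = hi - 1  # largest address the symbol may resolve to
    for width in widths:
        if max_value < (1 << width):
            return width
    raise ValueError("range does not fit in any candidate width")


def known_zero_high_bits(hi: int, pointer_bits: int = 64) -> int:
    """Number of leading bits guaranteed to be zero for any value in [0, hi)."""
    return pointer_bits - (hi - 1).bit_length()
```

For example, a symbol known to lie in [0, 256) fits an 8-bit field and has
56 known-zero high bits on a 64-bit target, while [0, 257) already needs the
16-bit field.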

As a preliminary step I think we can merge "absolute" and "range", since we
can't possibly know the range unless we also know that we can use absolute
relocations. This is as implemented in D25878.

So let's look at !range and mayalias and see how they interact:

- !range alone: this is the "linker script" scenario where something
external to the object provides some absolute memory mapping
- mayalias alone: this could be used by language frontends that allow
aliases at the language level
- mayalias + !range: there are a couple of use cases: a) the combination of
the above two cases, or b) the sort of absolute constant references I'd
like to have for CFI.
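
The combinations above can be written down as a small model. This is only a
sketch of the reasoning in this mail, not of any existing LLVM API: the
class GlobalDecl, its fields, and the helper functions are hypothetical
names, and the alias rule is deliberately simplified to "two distinct
declarations are assumed not to alias unless one of them is mayalias".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GlobalDecl:
    """Hypothetical declaration-side view of a global."""
    name: str
    addr_range: Optional[Tuple[int, int]] = None  # half-open [lo, hi); None = unknown
    mayalias: bool = False


def use_absolute_relocation(g: GlobalDecl) -> bool:
    # "absolute" is merged into "range": a known range implies absolute relocations.
    return g.addr_range is not None


def may_assume_distinct(a: GlobalDecl, b: GlobalDecl) -> bool:
    # Simplified midend rule: mayalias on either side blocks the no-alias assumption.
    return a.name != b.name and not (a.mayalias or b.mayalias)


def scenario(g: GlobalDecl) -> str:
    # Maps the attribute combination to the use case described in the text.
    has_range = g.addr_range is not None
    if has_range and g.mayalias:
        return "linker script plus language-level alias, or CFI constant"
    if has_range:
        return "linker script"
    if g.mayalias:
        return "language-level alias"
    return "ordinary global"
```

A CFI-style constant would then be declared as something like
GlobalDecl("cfi_mask", (0, 256), mayalias=True): it gets absolute
relocations, and the midend may not assume it is distinct from any other
global.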

Looking more closely at the third case: for part a, pointers are the more
accurate model, and for part b, integers are. Given that our model permits
either pointers or integers in this specific case, I would be prepared to
accept that pointer modelling is sufficient for b, in order to avoid
needing to model parts a and b differently.

To be clear, what I think we should do at this point is to extend
GlobalVariable with a mayalias attribute. This would overcome the modeling
issue with D25878 and at that point we would be able to move forward with
it. As Chris proposed, aliases would be used to model definitions.

Thanks,
-- 
Peter