[llvm-dev] RFC: Absolute or "fixed address" symbols as immediate operands

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 4 17:37:19 PDT 2016


On Fri, Nov 4, 2016 at 2:15 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

>
>
> On Wed, Oct 26, 2016 at 10:45 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> On Wed, Oct 26, 2016 at 9:48 PM, Chris Lattner <clattner at apple.com>
>> wrote:
>>
>>> On Oct 26, 2016, at 1:34 AM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>> On Tue, Oct 25, 2016 at 10:48 PM, Chris Lattner <clattner at apple.com>
>>> wrote:
>>>
>>>> Responding to both of your emails in one, sorry for the delay:
>>>>
>>>> On Oct 25, 2016, at 11:20 AM, Peter Collingbourne <peter at pcc.me.uk>
>>>> wrote:
>>>> I think there are a couple of additional considerations we should make
>>>> here:
>>>>
>>>>    - What are we trying to model? To me it's clear that GlobalConstant
>>>>    is for modelling integers, not pointers. That alone may not necessarily be
>>>>    enough to motivate a representational change, but…
>>>>
>>>> I understand where you’re coming from, but I think we’re modeling three
>>>> different things, and disagreeing about how to clump them together.  The
>>>> three things I see in flight are:
>>>>
>>>> 1) typical globals that are laid out in some unknown way in the address
>>>> space.
>>>> 2) globals that may be tied to a specific knowable address range due to
>>>> a limited compilation model (e.g. a deeply embedded core) that fits into an
>>>> immediate range (e.g. 0…255, 0…65536, etc).
>>>> 3) Immediates that are treated as symbolic from CFI’s perspective (so
>>>> they can’t just be used as a literal immediate) that are resolved at link
>>>> time, but are known to have limited range.
>>>>
>>>> There is also "4) immediates with an obvious known value”, but those
>>>> are obviously ConstantInt’s and not interesting to discuss here.
>>>>
>>>> The design I’m arguing for is to clump #2 and #3 into the same group.
>>>>
>>>
>>> I am not sure if this is sound if we want the no-alias assumption (see
>>> also below) to hold for #2 but not for #3.
>>>
>>>
>>>>   This can be done in one of several ways, but all of them use the
>>>> same “declaration side” reference, which has a !range metadata attached to
>>>> it.  The three approaches I see are:
>>>>
>>>> a) Introduce a new GlobalConstant definition, whose value is the
>>>> concrete address that the linker should resolve.
>>>> b) Use an alias as the definition, whose body is a ptrtoint constant of
>>>> the same value.
>>>> c) Use a zero-size GlobalVariable with a range metadata specifying the
>>>> exact address decided.
>>>>
>>>> I’m not very knowledgeable about why approach b won’t work, but if it
>>>> could, it seems preferable because it fits in with our current model.
>>>>
>>>
>>> b would work in that it would give us the right bits in the object file,
>>> but it would be a little odd to use a different type for declarations than
>>> for definitions. That said, I don't have a strong objection to it.
>>>
>>>
>>> I can understand what you’re saying here, but this is already the case
>>> for aliases.  You can never have a “declaration side” for an alias that is
>>> an alias (you have to use an external global variable or a function with no
>>> body).
>>>
>>> From the discussion over the last day it sounds to me that “b” is the
>>> best approach, except for the (significant) annoyance that these things can
>>> be possibly aliased.  However, I don’t understand how this works in
>>> practice today for aliases.  By their very name, they are *all about*
>>> introducing aliases, so how is AA allowed to assume that two external
>>> global variable references are unaliased anyway?  One may be resolved as an
>>> alias to the other after all, completely independent of your proposal.
>>>
>>
>> I suppose that one way to think about it is that by using aliases you are
>> stepping outside of the bounds of the language, i.e. no valid C/C++
>> declaration can be used to take the address of an alias without using
>> reserved names or language extensions (clang uses aliases to implement some
>> standard language features but they all have reserved names as far as I'm
>> aware).
>>
>> Maybe this hasn't come up simply because language implementations (and
>> users of language extensions) happen to never use aliases in a way that
>> could expose the AA assumption.
>>
>
> Further to this, I think there are three things in play here:
> - "absolute": this primarily controls code generation, i.e. we need to
> know whether to emit absolute or relative relocations in PIC mode
> - "range" i.e. the range of the "address": this also controls code
> generation (used for selecting the narrowest possible relocation type) but
> could also affect midend and backend optimizers (e.g. computeKnownBits).
> This is !range metadata as in D25878 but also in principle could be
> modelled as a value with an integer type of a specific width.
> - "mayalias", i.e. whether the address may be an alias on the definition
> side. That could include an alias of a real global object or an absolute
> symbol formed with inttoptr. This is fundamentally a midend attribute that
> could be used by AA for example. In practice a global with mayalias should
> be treated by the midend like a pointer obtained by calling an external
> readnone function.
>
> As a preliminary step I think we can merge "absolute" and "range", i.e. we
> can't possibly know the range unless we also know that we can use absolute
> relocations. This is as implemented in D25878.
>
> So let's look at !range and mayalias and see how they interact:
>
> - !range alone: this is the "linker script" scenario where something
> external to the object provides some absolute memory mapping
> - mayalias alone: this could be used by language frontends that allow
> aliases at the language level
> - mayalias + !range: there are a couple of use cases a) the combination of
> the above two cases, or b) the sort of absolute constant references I'd
> like to have for CFI.
>
> Looking more closely at the third case, for part a pointers are the more
> accurate model and for part b integers are. Given that our model permits
> either pointers or integers in this specific case I would be prepared to
> accept that pointer modelling would be sufficient for b in order to avoid
> needing to model parts a and b differently.
>
> To be clear, what I think we should do at this point is to extend
> GlobalVariable with a mayalias attribute. This would overcome the modeling
> issue with D25878 and at that point we would be able to move forward with
> it. As Chris proposed, aliases would be used to model definitions.
>

Per offline discussion with Chris and Eli: because the absolute constants
are not dereferenceable, there's no real impact on AA. The one case I
raised which could be a problem was equality comparisons, i.e. whether

@a = external global i8, !range !0
@b = external global i8, !range !0

define i1 @foo() {
  ret i1 icmp eq (i8* @a, i8* @b)
}

!0 = !{i64 0, i64 256}

could be simplified to "ret i1 false". However, my understanding is that AA
should have no impact on these comparisons.
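
To make the concern concrete, here is a minimal sketch of a definition side
that would make that fold wrong. It assumes the alias-based definitions from
approach b; the constant 1 is an arbitrary value chosen for illustration, not
something taken from D25878. The typed-pointer syntax matches the snippet
above.

```llvm
; Hypothetical definition-side module: both symbols are absolute constants
; defined as aliases of an integer cast to a pointer (approach b).
; The value 1 is an assumption for illustration only.
@a = alias i8, inttoptr (i64 1 to i8*)
@b = alias i8, inttoptr (i64 1 to i8*)
```

If a module like this were linked against the one above, @a and @b would
resolve to the same value and the comparison would be true, so the
declaration side cannot assume the two symbols are distinct.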

So at this point there's no real need for a mayalias attribute for the
purposes of this feature, and what we're left with is exactly D25878. I'll
go ahead and refresh that patch.
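
For reference, the kind of use this is meant to enable looks roughly like the
sketch below. The names @mask and @apply_mask are hypothetical, and the !range
spelling follows the proposal as discussed in this thread, which may not be
the syntax that is finally accepted.

```llvm
; Declaration side: an absolute symbol whose value is known to lie in
; [0, 256), so a backend could in principle encode it as an 8-bit immediate
; with a suitably narrow absolute relocation.
@mask = external global i8, !range !0

define i8 @apply_mask(i8 %x) {
  ; The symbol's "address" is used as an integer operand, not dereferenced.
  %r = and i8 %x, ptrtoint (i8* @mask to i8)
  ret i8 %r
}

!0 = !{i64 0, i64 256}
```

Since @mask is only ever converted to an integer here and never loaded from or
stored to, it fits the "not dereferenceable" reasoning above.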

Thanks,
-- 
Peter