[cfe-dev] [RFC] Improved address space conversion semantics for targets and language dialects

Mon Mar 18 05:12:11 PDT 2019

On 2019-03-14 18:32, Anastasia Stulova wrote:
>> It's true that the address space overlap semantics can be used for
>> conversion legality (that's how it's used today, after all) but in
>> pretty much all of the locations that use the overlapping/superset
>> accessors today, what we are actually interested in knowing is
>> conversion legality, even for things like overload resolution. If doing
>> such a conversion was not legal, then obviously we cannot consider an
>> overload to be viable, for example. None of the using code really seems
>> interested in knowing about address space overlap per se, so I don't
>> feel like it's the clearest way of asking for the relevant information.
>
> When we rank overloads with various address spaces, we use this
> logic: subsets are preferred to supersets. This could of course be
> changed, but this is one place where we actually use this logic. I'm not
> sure if there are more. Apparently the concept fits the other rules for
> qualifiers in C++. Maybe the following comment can help explain more:
> https://reviews.llvm.org/D55850#inline-496966

Right... But will 'shorter' implicit AS conversions really be considered 
better than 'longer' ones in this case? I could be wrong, but it doesn't 
seem to me like the overload logic makes any considerations about which 
AS conversion is better, just that not having to do an AS conversion is 
better than doing one. OpenCL doesn't really have an example of this 
kind of address space layout, so it's hard to simply test what the 
behavior is.

If there were a '__sublocal' address space which was a subspace of 
__local, and you had overloads for __generic and __local and an argument 
of type pointer-to-__sublocal, then overload resolution would still 
consider the call to be ambiguous rather than prefer the __local overload.
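
To make the ambiguity argument concrete, here is a small standalone C++ model of ranking overloads that differ only in the address space of their pointer parameter. It is not Clang's Sema code: the `AS` enum (including the hypothetical `Sublocal` space), `implicitlyConvertible`, `rankOf` and `pickOverload` are illustrative assumptions. The only rule it encodes is the one described above: an exact address space match beats any conversion, and every legal implicit AS conversion ranks the same.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified address space model; Sublocal is a hypothetical subspace of Local.
enum class AS { Generic, Global, Local, Sublocal, Private, Constant };

// Hypothetical implicit-conversion relation for pointee address spaces.
bool implicitlyConvertible(AS From, AS To) {
  if (From == To)
    return true;
  if (From == AS::Sublocal && To == AS::Local)
    return true;
  // OpenCL-style rule: every space except Constant converts to Generic.
  return To == AS::Generic && From != AS::Constant;
}

// Only two viable ranks: exact match, or "some implicit AS conversion".
enum class Rank { Exact, Conversion, NotViable };

Rank rankOf(AS Arg, AS Param) {
  if (Arg == Param)
    return Rank::Exact;
  return implicitlyConvertible(Arg, Param) ? Rank::Conversion : Rank::NotViable;
}

// Index of the unique best overload; std::nullopt if none is viable or the
// best rank is shared by several overloads (an ambiguous call).
std::optional<std::size_t> pickOverload(AS Arg, const std::vector<AS> &Params) {
  std::optional<std::size_t> Best;
  bool Ambiguous = false;
  for (std::size_t I = 0; I < Params.size(); ++I) {
    Rank R = rankOf(Arg, Params[I]);
    if (R == Rank::NotViable)
      continue;
    Rank BestRank = Best ? rankOf(Arg, Params[*Best]) : Rank::NotViable;
    if (R < BestRank) {
      Best = I;
      Ambiguous = false;
    } else if (R == BestRank) {
      Ambiguous = true;
    }
  }
  if (Ambiguous)
    return std::nullopt;
  return Best;
}
```

In this model, a `Sublocal` argument is ambiguous between the `Generic` and `Local` overloads because both need one implicit conversion of equal rank; preferring `Local` would need an extra tie-breaking rule on top of plain conversion legality.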

Ultimately, 'isAddressSpaceSupersetOf' is mostly used to mean 'implicit 
conversion between these ASes is permitted', even in the overload 
resolution case. I find that it's easier to understand interfaces when 
their name matches their usage, and that simply doesn't seem to be the 
case for this one.

> I am just thinking this change might have a bigger impact than it
> originally seems. But I am not against it if we think it's more intuitive
> and can simplify the code base.
>
> Also, in general it really helps when the implementation follows the logic of the
> specification. It is often the only way to reason about it. Documenting code
> sufficiently has always been a sensitive aspect, so if we are to switch to
> different logic, we should be prepared to provide enough documentation for
> developers.

This is a very good point. It's certainly the case that both OpenCL and 
Embedded-C express the AS compatibility in terms of the superset and 
subset model.

We could change the new design to simply let targets specify which 
address spaces are supersets, just like in today's model. However, I 
find that this gets a bit confusing when dealing with address spaces 
across targets and languages; if a target wants to encode some kind of 
special OpenCL interoperability, it's a bit odd to say that 'my AS42 is 
a subset of __generic' when in reality, they don't have anything to do 
with each other. Or maybe it's not as strange as I feel it is?

It also means that explicit conversion would always be permitted if 
implicit conversion is permitted in one of the directions. In other 
words, targets wouldn't be able to configure explicit conversion for 
their address spaces; the behavior would always be implied. This makes 
it hard to keep the current C/C++ address space model (no implicit 
conversions, all explicit conversions) in a more generic design. That 
model only works currently because some code is locked behind the OpenCL 
language option.
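
The limitation above can be sketched by contrasting two small policies. In a pure superset model, both implicit and explicit legality are derived from one relation; in the proposed model, the target answers each question on its own. `ConvKind`, `SupersetPolicy`, `TargetPolicy` and the plain `unsigned` address space numbers are hypothetical stand-ins, not existing Clang interfaces.

```cpp
#include <cassert>
#include <functional>

enum class ConvKind { Implicit, Explicit };

// Pure superset model: both conversion kinds derive from a single relation.
struct SupersetPolicy {
  std::function<bool(unsigned Super, unsigned Sub)> isSupersetOf;

  bool allows(unsigned From, unsigned To, ConvKind K) const {
    assert(isSupersetOf && "a superset relation must be provided");
    if (From == To)
      return true;
    if (K == ConvKind::Implicit)
      return isSupersetOf(To, From);
    // Explicit conversion is implied by overlap in either direction.
    return isSupersetOf(To, From) || isSupersetOf(From, To);
  }
};

// Proposed model: a target answers each conversion kind independently.
// This default (no implicit, all explicit) is the plain C/C++ behavior.
struct TargetPolicy {
  virtual ~TargetPolicy() = default;
  virtual bool allows(unsigned From, unsigned To, ConvKind K) const {
    return From == To || K == ConvKind::Explicit;
  }
};
```

With an empty superset relation, `SupersetPolicy` rejects explicit casts as well, so the C/C++ behavior of "no implicit, all explicit" needs special-casing outside the relation. Conversely, once an implicit conversion exists in one direction, the explicit conversion in the other direction is always allowed. `TargetPolicy` can state both answers independently.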

>
>> Well, depending on the address space semantics of a particular target,
>> the developer will be forced to do this anyway. It works in OpenCL
>> because of __generic, but there's no guarantee of there being a
>> 'default/generic' address space to use for 'this' in an arbitrary target
>> or language.
>
> Yes, that's why I am wondering whether a generic address space should be
> introduced in C++ as a purely logical address space concept.
If this were a logical concept only, would it be implemented purely 
on the Clang level and not via LLVM addrspaces? I'm not sure how the 
realization of that address space would work out. For targets without 
such an address space, would every use of a pointer to the generic 
address space require a check to determine which 'subspace' the pointer 
belongs to? Would targets have to enumerate their address spaces to 
support this?
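
One way to picture the cost raised in the question above, assuming a target with no hardware generic address space: a purely logical generic pointer could carry a tag naming its concrete subspace, and every access through it would dispatch on that tag at run time. `Subspace`, `GenericPtr`, `Memories`, `toGeneric` and `loadGeneric` are hypothetical names for this sketch; it does not describe any existing Clang or LLVM lowering.

```cpp
#include <cassert>
#include <cstdint>

enum class Subspace : std::uint8_t { Global, Local, Private };

// A "fat" generic pointer: the concrete subspace plus an offset within it.
struct GenericPtr {
  Subspace Space;
  std::uint32_t Offset;
};

// Stand-ins for the target's memories; a real target would instead use a
// different load instruction per address space.
struct Memories {
  std::int32_t Global[16];
  std::int32_t Local[16];
  std::int32_t Private[16];
};

// Converting a concrete pointer to generic records which subspace it is in.
GenericPtr toGeneric(Subspace S, std::uint32_t Offset) { return {S, Offset}; }

// Every dereference of a generic pointer needs this runtime dispatch.
std::int32_t loadGeneric(const Memories &M, GenericPtr P) {
  assert(P.Offset < 16 && "offset out of range for this model");
  switch (P.Space) {
  case Subspace::Global:
    return M.Global[P.Offset];
  case Subspace::Local:
    return M.Local[P.Offset];
  case Subspace::Private:
    return M.Private[P.Offset];
  }
  return 0; // unreachable for valid tags
}
```

Here the price is an extra tag per pointer and a branch per access, and the set of subspaces has to be enumerated up front, which corresponds to the two concerns raised in the question.
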
>> I agree that could be a problem in regular C++. Wasn't there a proposal
>> for letting you template the method qualifiers somehow?
>
> There is this paper
>
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0847r1.html
>
> but it has an issue with superfluous template instantiations, because
> the methods are templated on the fully qualified type of 'this'
> and not just on its address space.
>
> http://lists.llvm.org/pipermail/cfe-dev/2018-December/060545.html
>
> I would quite like to investigate a solution specific to address
> spaces. However, this approach still solves the duplication at the source
> level quite well.
There are also issues like constructors with address spaces and operator 
new with address spaces... but perhaps those are better dealt with 
another time.
>
>> I've been thinking about this as well. I'm not sure if I like the idea
>> of expressing it in the source, though. That would mean that for a
>> particular target or language, you'd always need to include a special
>> header with the AS definitions, which is a bit odd. It would be an
>> extension as well, so I'm not sure how portable across compilers it
>> would be either.
> Do you think this could be simplified by using implicit headers in Clang?
> This is not uncommon; for example, we implicitly include the OpenCL builtin
> function header.

That is a viable idea, but I think that if we were to go through the 
trouble of including an implicit header containing 
implementation-specific configuration, then we might as well encode it 
into the target definitions and save on the header in the first place.

I think it makes more sense for a language extension like OpenCL as it 
needs to encode those regardless of target, so having a reusable set of 
definitions is good for that use case. But for a specific target, it's 
less useful.

>
>> One idea I've been contemplating is a TableGen backend that lets you
>> define address space names, keywords and semantics as TableGen
>> definitions. Both definition kinds for languages and targets would
>> exist. Not sure if it's important enough to warrant a new backend, though.
> Ok, that could work. However, it's still not portable across different compilers.
Yes, it's a bit unfortunate. Embedded-C (and other standards that define 
AS support) make most of it implementation-defined, so I think finding a 
method that works across different compilers is tricky...
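
To give a rough idea of what such declarative definitions might contain, the tables below approximate in plain C++ what a TableGen backend could emit: one record per address space (name, source keyword, and whether a language or a target defines it) and one record per permitted conversion. The record layouts, field names and sample entries, including the `__as42` keyword for a target space, are hypothetical; the choice that an implicit rule also permits the explicit cast in the same direction is an assumption of this sketch.

```cpp
#include <cassert>
#include <string_view>

enum class Owner { Language, Target };
enum class ConvKind { Implicit, Explicit };

// Hypothetical generated record: one per address space...
struct AddrSpaceDef {
  std::string_view Name;
  std::string_view Keyword;
  Owner DefinedBy;
};

// ...and one per permitted conversion.
struct ConversionRule {
  std::string_view From;
  std::string_view To;
  ConvKind Kind;
};

constexpr AddrSpaceDef AddrSpaces[] = {
    {"opencl_generic", "__generic", Owner::Language},
    {"opencl_local", "__local", Owner::Language},
    {"opencl_constant", "__constant", Owner::Language},
    {"target_as42", "__as42", Owner::Target}, // hypothetical target space
};

constexpr ConversionRule Rules[] = {
    {"opencl_local", "opencl_generic", ConvKind::Implicit},
    {"opencl_generic", "opencl_local", ConvKind::Explicit},
    {"target_as42", "opencl_generic", ConvKind::Explicit},
};

// An implicit rule also permits the explicit cast in the same direction.
constexpr bool isConversionAllowed(std::string_view From, std::string_view To,
                                   ConvKind Kind) {
  if (From == To)
    return true;
  for (const ConversionRule &R : Rules)
    if (R.From == From && R.To == To &&
        (R.Kind == Kind || R.Kind == ConvKind::Implicit))
      return true;
  return false;
}

// Lookup a parser might use to map a source keyword to its definition.
constexpr const AddrSpaceDef *findByKeyword(std::string_view Keyword) {
  for (const AddrSpaceDef &D : AddrSpaces)
    if (D.Keyword == Keyword)
      return &D;
  return nullptr;
}
```

Because the tables are `constexpr`, a query such as `isConversionAllowed("target_as42", "opencl_generic", ConvKind::Explicit)` can even be checked at compile time.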
>
> Cheers,
> Anastasia
>
> From: Bevin Hansson <bevin.hansson at ericsson.com>
> Sent: 14 March 2019 09:26
> To: Anastasia Stulova; clang-dev developer list
> Cc: nd
> Subject: Re: [RFC] Improved address space conversion semantics for targets and language dialects
>    
>
> Thank you for the feedback!
>
> On 2019-03-08 15:45, Anastasia Stulova wrote:
>>> The problem with modeling the address space conversion semantics on
>>> superspaces and subspaces is that these two concepts are orthogonal. A
>>> target or language could have address spaces which 'overlap' in some
>>> way, yet disallow implicit or explicit conversion between them. It
>>> could also have address spaces which do not overlap, but for which
>>> explicit or implicit conversion is permitted.
>> I am trying to understand how this could work. I think the current definition
>> of overlapping in the Embedded C TC expresses logical overlap, but not
>> necessarily physical overlap. My understanding is that if address spaces overlap
>> logically they can be converted explicitly in both directions and might be
>> converted implicitly (depending on whether one address space is a superset
>> of another). Logical overlap can imply that memory segments physically
>> overlap, but it might not. In the latter case it's just a logical concept to simplify
>> programming or compiler implementation. That's how we are using the generic
>> address space in OpenCL, for example; it isn't a physical memory segment.
>> So I am trying to understand why something that doesn’t overlap (either
>> logically or physically) would still be convertible. I just found the current logic
>> with overlapping quite useful in various places for C++ (e.g. overload resolution,
>> where a subset is preferable to a superset), and it might have wider implications in
>> case we are to change those.
> It's true that the address space overlap semantics can be used for
> conversion legality (that's how it's used today, after all) but in
> pretty much all of the locations that use the overlapping/superset
> accessors today, what we are actually interested in knowing is
> conversion legality, even for things like overload resolution. If doing
> such a conversion was not legal, then obviously we cannot consider an
> overload to be viable, for example. None of the using code really seems
> interested in knowing about address space overlap per se, so I don't
> feel like it's the clearest way of asking for the relevant information.
>
> For overloading, even a complex address space design like A( B( C ) )
> doesn't really necessitate knowing that C is a subset of both A and B.
> If you have two overloads, one for an A 'this' and another for a B
> 'this', and you try calling a method on a C T*, then it should simply be
> ambiguous anyway, since there are two conversion sequences of equal rank
> from the original C T*.
>
>> Also (maybe this belongs to a separate discussion, though), for C++ specifically
>> the generic address space becomes really key, because it's used for implementing
>> the hidden 'this' parameter/expression. I am not quite convinced that its current
>> semantics, taken from C, are sufficient. It isn't the same as the default
>> address space, where the implementation decides to put objects by default; it is
>> an address space to which every other one should be allowed to convert, unless
>> there is a good reason not to (i.e. a logical superset of all or most of the other
>> address spaces). If there is no such address space, the application developer
>> will be forced to write all the implicit operations/methods for each address space
>> in which a class variable can be declared. That is quite impractical, especially if
>> no special logic is needed for different address spaces!
> Well, depending on the address space semantics of a particular target,
> the developer will be forced to do this anyway. It works in OpenCL
> because of __generic, but there's no guarantee of there being a
> 'default/generic' address space to use for 'this' in an arbitrary target
> or language.
>
> I agree that could be a problem in regular C++. Wasn't there a proposal
> for letting you template the method qualifiers somehow?
>
>>> The method would initially consult any language address space
>>> conversion rules (such as conversion rules in OpenCL), and if no such
>>> rules apply, proceed to fall back on a TargetInfo hook.
>>> The TargetInfo hook would have the same format as the ASTContext
>>> method, but would return the validity of the conversion for the
>>> particular compilation target. The default behavior of this hook would
>>> be that all implicit address space conversions are disallowed, and all
>>> explicit conversions are permitted.
>>> (An alternative setup here would be that the ASTContext method queries
>>> the TargetInfo directly, and have the language semantics be defined in
>>> the TargetInfo base method instead. This would let targets override
>>> language semantics. I don't know if this is necessary, or desirable.)
>> I would vote against target rules ever overriding language ones. This
>> reduces portability of code among targets which defeats the purpose of
>> the language mode in my view.
>>
>> The first idea makes sense to me, i.e. use target rules if either of the address
>> spaces is a target address space (at or above FirstTargetAddressSpace); otherwise,
>> language rules should be used.
> Yes, I agree with that.
>
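
The dispatch agreed on above (language rules for language address spaces, a TargetInfo hook once a target address space is involved, and no target overriding the language) could look roughly like the following; languages without their own rules fall through to the hook. `LangAS`, `FirstTargetAddressSpace`, `getLangASFromTargetAS` and `TargetInfo` echo names from Clang, but this is a simplified standalone model: the hook name `isAddressSpaceConversionLegal`, the `ConvKind` parameter, the reduced OpenCL rule set, `ExampleTarget` and `ASTContextModel` are hypothetical.

```cpp
#include <cassert>

// Simplified stand-in for clang::LangAS: language address spaces first,
// then target address spaces encoded as FirstTargetAddressSpace + N.
enum class LangAS : unsigned {
  Default = 0,
  opencl_global,
  opencl_local,
  opencl_constant,
  opencl_private,
  opencl_generic,
  FirstTargetAddressSpace
};

enum class ConvKind { Implicit, Explicit };

constexpr bool isTargetAddressSpace(LangAS AS) {
  return AS >= LangAS::FirstTargetAddressSpace;
}

constexpr LangAS getLangASFromTargetAS(unsigned TargetAS) {
  return static_cast<LangAS>(
      static_cast<unsigned>(LangAS::FirstTargetAddressSpace) + TargetAS);
}

// Hypothetical target hook with the default proposed in the RFC:
// no implicit conversions, all explicit conversions permitted.
struct TargetInfo {
  virtual ~TargetInfo() = default;
  virtual bool isAddressSpaceConversionLegal(LangAS From, LangAS To,
                                             ConvKind K) const {
    return From == To || K == ConvKind::Explicit;
  }
};

// Example target: target AS 1 converts implicitly to target AS 2.
struct ExampleTarget : TargetInfo {
  bool isAddressSpaceConversionLegal(LangAS From, LangAS To,
                                     ConvKind K) const override {
    if (From == getLangASFromTargetAS(1) && To == getLangASFromTargetAS(2))
      return true;
    return TargetInfo::isAddressSpaceConversionLegal(From, To, K);
  }
};

// Stand-in for the proposed ASTContext query.
struct ASTContextModel {
  bool OpenCL;
  const TargetInfo &Target;

  bool isAddressSpaceConversionLegal(LangAS From, LangAS To,
                                     ConvKind K) const {
    if (From == To)
      return true;
    // Language rules win whenever neither side is a target address space.
    if (OpenCL && !isTargetAddressSpace(From) && !isTargetAddressSpace(To))
      return openCLRule(From, To, K);
    return Target.isAddressSpaceConversionLegal(From, To, K);
  }

private:
  // Reduced OpenCL 2.0-style rules: every space except __constant converts
  // implicitly to __generic; __generic converts to any space except
  // __constant only explicitly; all other pairs are rejected.
  static bool openCLRule(LangAS From, LangAS To, ConvKind K) {
    if (To == LangAS::opencl_generic)
      return From != LangAS::opencl_constant;
    if (From == LangAS::opencl_generic)
      return K == ConvKind::Explicit && To != LangAS::opencl_constant;
    return false;
  }
};
```

In OpenCL mode, only pairs involving a target address space reach the hook, which is what keeps targets from overriding the language rules: `ExampleTarget` can add an implicit conversion between its own spaces, but it is never consulted for `__generic` to `__local`.
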
>>> * A patch which replaces the currently used methods for address space
>>> compatibility mentioned earlier (isAddressSpaceSupersetOf,
>>> isAddressSpaceOverlapping) with calls to the new methods in ASTContext.
>>> There are some other users of these methods, such as
>>> Qualifiers::compatiblyIncludes, but it's not entirely clear how to
>>> update these as a Qualifiers does not have access to ASTContext.
>> I wonder if it could be migrated to ASTContext, or take an ASTContext as a
>> parameter, although both might cause layering violations. :(
> Yes, I feel like it would get messy if we have to start passing
> ASTContexts into Qualifiers methods like that. Might as well just call
> something on the ASTContext directly instead.
>
> I think the approach would be to either
> * remove the AS check from compatiblyIncludes and go through the uses to
> determine if any of the callers care about address spaces and need to be
> amended, or
> * do as you suggest and move the method to ASTContext instead.
>
> They both feel sort of invasive, though.
>
>>> * Possibly a patch to remove the old address space compatibility
>>> methods, if it can be determined that they are no longer needed.
>> I would prefer to migrate to the new implementation completely
>> instead of maintaining two approaches. We just need to make sure that the
>> new logic covers the old one, plus the new functionality where needed.
>> This should work and makes the code base more readable and maintainable.
>> However, if it's not possible to do this directly at the start, we can
>> replace it gradually.
> Sure, we certainly shouldn't have the same thing implemented twice. I
> simply meant that it might not be reasonable to replace/remove
> everything in a single swoop, so eventually there would be a patch that
> removes the old system.
>> Overall, this work would be a good step forward towards better upstream
>> support for embedded and heterogeneous devices. One extra thought I have
>> (again it might belong to a separate RFC) is whether providing some way to
>> define address space compatibility rules using some sort of syntax in a language
>> makes sense? The application of this would be (a) portability of code across
>> different compilers (b) portability of code among accelerators in the same domain
>> (i.e. ML, Graphics, ...).
> I've been thinking about this as well. I'm not sure if I like the idea
> of expressing it in the source, though. That would mean that for a
> particular target or language, you'd always need to include a special
> header with the AS definitions, which is a bit odd. It would be an
> extension as well, so I'm not sure how portable across compilers it
> would be either.
>
> One idea I've been contemplating is a TableGen backend that lets you
> define address space names, keywords and semantics as TableGen
> definitions. Both definition kinds for languages and targets would
> exist. Not sure if it's important enough to warrant a new backend, though.
>
> / Bevin
>