[cfe-dev] [RFC] Improved address space conversion semantics for targets and language dialects

Thu Mar 14 10:32:11 PDT 2019

> It's true that the address space overlap semantics can be used for 
> conversion legality (that's how it's used today, after all) but in 
> pretty much all of the locations that use the overlapping/superset 
> accessors today, what we are actually interested in knowing is 
> conversion legality, even for things like overload resolution. If doing 
> such a conversion was not legal, then obviously we cannot consider an 
> overload to be viable, for example. None of the using code really seems 
> interested in knowing about address space overlap per se, so I don't 
> feel like it's the clearest way of asking for the relevant information.

When we rank overloads with different address spaces we use this
logic: subsets are preferred to supersets. This could of course be
changed, but it is one place where we actually use this logic. Not sure
if there are more. Apparently the concept fits the other C++ rules for
qualifiers. Maybe the following comment can help to understand more:
https://reviews.llvm.org/D55850#inline-496966

I am just thinking this change might have a bigger impact than it first
seems. But I am not against it if we think it's more intuitive and can simplify
the code base.

Also, in general it really helps when the implementation follows the logic of
the specification. It is often the only way to reason about it. Documenting code
sufficiently has always been a sensitive aspect. So if we are to switch to a
different logic we should be prepared to provide enough documentation for the
developers.

> Well, depending on the address space semantics of a particular target, 
> the developer will be forced to do this anyway. It works in OpenCL 
> because of __generic, but there's no guarantee of there being a 
> 'default/generic' address space to use for 'this' in an arbitrary target 
> or language.

Yes, that's why I am wondering whether a generic address space should be
introduced in C++ as a purely logical address space concept?

> I agree that could be a problem in regular C++. Wasn't there a proposal 
> for letting you template the method qualifiers somehow?

There is this paper

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0847r1.html

but it has a problem with superfluous template instantiations,
because the methods are templated on the fully qualified type of 'this'
and not just on address spaces.

http://lists.llvm.org/pipermail/cfe-dev/2018-December/060545.html

I would quite like to investigate a solution specific to address
spaces. However, this approach still solves duplication at the source
level quite well.
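To make the instantiation concern concrete, here is a small C++17 sketch that
emulates the pattern from the paper with a free function template (the paper's
explicit object parameter syntax is not used, so that this compiles without
C++23 support). Widget, instantiationId, value and countInstantiations are
made-up names for illustration only. The point is that Self is deduced as the
fully qualified object type, so const-ness and value category alone already
produce separate instantiations before any address space is involved.

```cpp
#include <cstddef>
#include <set>
#include <utility>

struct Widget {
  int Value = 0;
};

// Each distinct template argument gets its own copy of this function and
// therefore its own static Tag object; its address identifies the
// instantiation.
template <class Self>
const void *instantiationId() {
  static const char Tag = 0;
  return &Tag;
}

// Stand-in for a member declared in the paper's style, roughly
//   template <class Self> auto &&value(this Self &&self);
// Self records const-ness and value category of the object expression.
template <class Self>
const void *value(Self &&self) {
  (void)self.Value; // touch the object so the parameter is used
  return instantiationId<Self>();
}

// Calls value() on four differently qualified Widgets and counts how many
// distinct instantiations were used.
inline std::size_t countInstantiations() {
  Widget W;
  const Widget CW{};
  std::set<const void *> Ids;
  Ids.insert(value(W));             // Self = Widget&
  Ids.insert(value(CW));            // Self = const Widget&
  Ids.insert(value(std::move(W)));  // Self = Widget
  Ids.insert(value(std::move(CW))); // Self = const Widget
  return Ids.size();
}
```

All four calls go through different instantiations here even though the
objects would live in the same address space, which is why a mechanism that
varies only the address space might be preferable.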

> I've been thinking about this as well. I'm not sure if I like the idea 
> of expressing it in the source, though. That would mean that for a 
> particular target or language, you'd always need to include a special 
> header with the AS definitions, which is a bit odd. It would be an 
> extension as well, so I'm not sure how portable across compilers it 
> would be either.

Do you think this can be simplified by the use of implicit headers in Clang?
This is not uncommon. We include the OpenCL builtin function header implicitly,
for example.

> One idea I've been contemplating is a TableGen backend that lets you 
> define address space names, keywords and semantics as TableGen 
> definitions. Both definition kinds for languages and targets would 
> exist. Not sure if it's important enough to warrant a new backend, though.

Ok, that could work. However, it's still not portable across different compilers.

Cheers,
Anastasia

From: Bevin Hansson <bevin.hansson at ericsson.com>
Sent: 14 March 2019 09:26
To: Anastasia Stulova; clang-dev developer list
Cc: nd
Subject: Re: [RFC] Improved address space conversion semantics for targets and language dialects

Thank you for the feedback!

On 2019-03-08 15:45, Anastasia Stulova wrote:
>> The problem with modeling the address space conversion semantics on
>> superspaces and subspaces is that these two concepts are orthogonal. A
>> target or language could have address spaces which 'overlap' in some
>> way, yet disallow implicit or explicit conversion between them. It
>> could also have address spaces which do not overlap, but for which
>> explicit or implicit conversion is permitted.
> I am trying to understand how this could work. I think the current definition
> of overlapping in embedded C TC expresses logical overlapping, but not
> necessarily a physical one. My understanding is that if address spaces overlap
> logically they can be converted explicitly in both directions and might be
> converted implicitly (depending on whether one address space is a superset
> of another). Logical overlapping can imply that memory segments physically
> overlap but might not. In the latter case it's just a logical concept to simplify
> programming or compiler implementation. That's how we are using generic
> address space in OpenCL for example, that isn't a physical memory segment.
> So I am trying to understand why something that doesn’t overlap (either
> logically or physically) would still be convertible? I just found the current logic
> with overlapping quite useful in various places for C++ (i.e. overload resolution
> where subset is preferable to superset) and it might have wider implications in
> case we are to change those.

It's true that the address space overlap semantics can be used for 
conversion legality (that's how it's used today, after all) but in 
pretty much all of the locations that use the overlapping/superset 
accessors today, what we are actually interested in knowing is 
conversion legality, even for things like overload resolution. If doing 
such a conversion was not legal, then obviously we cannot consider an 
overload to be viable, for example. None of the using code really seems 
interested in knowing about address space overlap per se, so I don't 
feel like it's the clearest way of asking for the relevant information.

For overloading, even a complex address space design like A( B( C ) ) 
doesn't really necessitate knowing that C is a subset of both A and B. 
If you have two overloads, one for an A 'this' and another for a B 
'this', and you try calling a method on a C T*, then it should simply be 
ambiguous anyway, since there are two conversion sequences of equal rank 
from the original C T*.
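
A toy model of that argument, not Clang code: the enum AS, the Resolution
type and the functions isImplicitConversionLegal and resolve are all
hypothetical names, and the conversion rule (a space converts implicitly to
itself or to any space that encloses it) is an assumption made for the A/B/C
example only. Ranking is deliberately reduced to "exact match beats any legal
conversion, and all legal conversions rank equally".

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical address spaces: C is nested in B, and B is nested in A.
enum class AS { A = 0, B = 1, C = 2 };

// Assumed rule for this sketch: implicit conversion is legal to the same
// space or to any enclosing space (C -> B, C -> A, B -> A).
inline bool isImplicitConversionLegal(AS From, AS To) {
  return static_cast<int>(From) >= static_cast<int>(To);
}

enum class Resolution { NoViable, Unique, Ambiguous };

// Resolve a call on an object in address space Obj against overloads whose
// 'this' lives in the listed address spaces. Only conversion legality is
// consulted; no subset/superset information is used for ranking.
inline Resolution resolve(AS Obj, std::initializer_list<AS> Overloads) {
  std::size_t Exact = 0, Converting = 0;
  for (AS ThisAS : Overloads) {
    if (ThisAS == Obj)
      ++Exact;
    else if (isImplicitConversionLegal(Obj, ThisAS))
      ++Converting;
  }
  if (Exact == 1)
    return Resolution::Unique;
  if (Exact > 1)
    return Resolution::Ambiguous;
  if (Converting == 1)
    return Resolution::Unique;
  return Converting == 0 ? Resolution::NoViable : Resolution::Ambiguous;
}
```

With overloads for A and B, resolve(AS::C, {AS::A, AS::B}) reports an
ambiguity, while the same call on a B object picks the B overload as an exact
match. This differs from the current behaviour, where the subset is preferred.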

>
> Also (maybe it belongs to a separate discussion though) for C++ specifically
> generic address space becomes really key because it's used for implementing
> hidden 'this' parameter/expression. I am not quite convinced the current
> semantic of it taken from C is sufficient. Because it isn't the same as default
> address space where implementation decides to put objects by default but it is
> an address space to which every other should be allowed to convert, unless
> there is a good reason not to (i.e. logical superset of all or most of the other
> address spaces). If there isn't such address space... the application developer
> will be forced to write all the implicit operations/methods for each address space
> in which a class variable can be declared. It is quite impractical, especially if
> there is no special logic needed for different address spaces!

Well, depending on the address space semantics of a particular target, 
the developer will be forced to do this anyway. It works in OpenCL 
because of __generic, but there's no guarantee of there being a 
'default/generic' address space to use for 'this' in an arbitrary target 
or language.

I agree that could be a problem in regular C++. Wasn't there a proposal 
for letting you template the method qualifiers somehow?

>
>> The method would initially consult any language address space
>> conversion rules (such as conversion rules in OpenCL), and if no such
>> rules apply, proceed to fall back on a TargetInfo hook.
>> The TargetInfo hook would have the same format as the ASTContext
>> method, but would return the validity of the conversion for the
>> particular compilation target. The default behavior of this hook would
>> be that all implicit address space conversions are disallowed, and all
>> explicit conversions are permitted.
>> (An alternative setup here would be that the ASTContext method queries
>> the TargetInfo directly, and have the language semantics be defined in
>> the TargetInfo base method instead. This would let targets override
>> language semantics. I don't know if this is necessary, or desirable.)
>
> I would vote against target rules ever overriding language ones. This
> reduces portability of code among targets which defeats the purpose of
> the language mode in my view.
>
> The first idea makes sense to me i.e. use target rules if any of address
> spaces is larger than FirstTargetAddressSpace otherwise language rules
> should be used.

Yes, I agree with that.
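
As a rough sketch of that dispatch (standalone C++, not written against the
Clang sources): the address space numbering, languageRule, TargetRules,
NestedTarget and isAddressSpaceConversionLegal are all hypothetical names, and
FirstTarget only loosely mirrors the role of FirstTargetAddressSpace. The
language rules are a simplified, assumed OpenCL-like set.

```cpp
using AddrSpace = unsigned;

// Made-up numbering: language address spaces first, target spaces from
// FirstTarget upwards.
constexpr AddrSpace Global = 0, Local = 1, Constant = 2, Private = 3,
                    Generic = 4, FirstTarget = 5;

// Assumed OpenCL-like language rules: every space except Constant converts
// implicitly to Generic; Generic converts back only with an explicit cast;
// everything else between distinct spaces is rejected.
inline bool languageRule(AddrSpace From, AddrSpace To, bool IsExplicit) {
  if (From == To)
    return true;
  if (To == Generic && From != Constant)
    return true;
  if (From == Generic && To != Constant)
    return IsExplicit;
  return false;
}

// Default target hook as described in the RFC: implicit conversions are
// disallowed and explicit conversions are permitted. Identity is not
// treated as a conversion here.
struct TargetRules {
  virtual ~TargetRules() = default;
  virtual bool isConversionLegal(AddrSpace From, AddrSpace To,
                                 bool IsExplicit) const {
    return From == To || IsExplicit;
  }
};

// Hypothetical target in which space FirstTarget + 1 is nested inside
// space FirstTarget and therefore converts to it implicitly.
struct NestedTarget : TargetRules {
  bool isConversionLegal(AddrSpace From, AddrSpace To,
                         bool IsExplicit) const override {
    if (From == FirstTarget + 1 && To == FirstTarget)
      return true;
    return TargetRules::isConversionLegal(From, To, IsExplicit);
  }
};

// Context-level query: target rules apply as soon as either side is a
// target address space; otherwise the language rules decide.
inline bool isAddressSpaceConversionLegal(const TargetRules &Target,
                                          AddrSpace From, AddrSpace To,
                                          bool IsExplicit) {
  if (From >= FirstTarget || To >= FirstTarget)
    return Target.isConversionLegal(From, To, IsExplicit);
  return languageRule(From, To, IsExplicit);
}
```

In this arrangement a target can never change the outcome for two language
address spaces, which keeps the portability property asked for above.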

>> * A patch which replaces the currently used methods for address space
>> compatibility mentioned earlier (isAddressSpaceSupersetOf,
>> isAddressSpaceOverlapping) with calls to the new methods in ASTContext.
>> There are some other users of these methods, such as
>> Qualifiers::compatiblyIncludes, but it's not entirely clear how to
>> update these as a Qualifiers does not have access to ASTContext.
>
> Wondering if it could migrate to ASTContext or it can take ASTContext as
> parameter. Although both might cause the layering violations. :(

Yes, I feel like it would get messy if we have to start passing 
ASTContexts into Qualifiers methods like that. Might as well just call 
something on the ASTContext directly instead.

I think the approach would be to either
* remove the AS check from compatiblyIncludes and go through the uses to 
determine if any of the callers care about address spaces and need to be 
amended, or
* do as you suggest and move the method to ASTContext instead.

They both feel sort of invasive, though.
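
To show roughly what the two options would look like side by side, here is a
sketch with toy stand-ins. Quals and Context are not Clang's Qualifiers and
ASTContext, the method names are hypothetical, and the address space rule
(space 1 converts implicitly to space 0) is an assumption picked only to make
the example do something.

```cpp
// Toy qualifier set: just const, volatile and an address space number.
struct Quals {
  bool Const = false;
  bool Volatile = false;
  unsigned AddrSpace = 0;

  // Option 1: the qualifier check no longer looks at address spaces, so
  // it needs no access to any context object.
  bool compatiblyIncludesIgnoringAS(const Quals &Other) const {
    return (Const || !Other.Const) && (Volatile || !Other.Volatile);
  }
};

// Toy context that knows the language/target conversion rules.
struct Context {
  // Hypothetical rule for this sketch only.
  bool isImplicitASConversionLegal(unsigned From, unsigned To) const {
    return From == To || (From == 1 && To == 0);
  }

  // Option 2: the combined query lives on the context; users that care
  // about address spaces call this instead of the Quals method.
  bool compatiblyIncludes(const Quals &To, const Quals &From) const {
    return To.compatiblyIncludesIgnoringAS(From) &&
           isImplicitASConversionLegal(From.AddrSpace, To.AddrSpace);
  }
};
```

Either way, every existing user has to be checked to see which of the two
queries it actually wants.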

>> * Possibly a patch to remove the old address space compatibility
>> methods, if it can be determined that they are no longer needed.
> I would prefer to migrate to the new implementation completely
> instead of maintaining 2 approaches. We just need to make sure that the
> new logic accommodates the old one with the new functionality if needed.
> This should work and makes code base more readable and maintainable.
> However, if it's not possible to do this directly at the start we can gradually
> replace it.

Sure, we certainly shouldn't have the same thing implemented twice. I 
simply meant that it might not be reasonable to replace/remove 
everything in a single swoop, so eventually there would be a patch that 
removes the old system.

> Overall, this work would be a good step forward towards better upstream
> support for embedded and heterogeneous devices. One extra thought I have
> (again it might belong to a separate RFC) is whether providing some way to
> define address space compatibility rules using some sort of syntax in a language
> makes sense? The application of this would be (a) portability of code across
> different compilers (b) portability of code among accelerators in the same domain
> (i.e. ML, Graphics, ...).

I've been thinking about this as well. I'm not sure if I like the idea 
of expressing it in the source, though. That would mean that for a 
particular target or language, you'd always need to include a special 
header with the AS definitions, which is a bit odd. It would be an 
extension as well, so I'm not sure how portable across compilers it 
would be either.

One idea I've been contemplating is a TableGen backend that lets you 
define address space names, keywords and semantics as TableGen 
definitions. Both definition kinds for languages and targets would 
exist. Not sure if it's important enough to warrant a new backend, though.

/ Bevin