[cfe-dev] [RFC] Improved address space conversion semantics for targets and language dialects

Fri Mar 8 06:45:08 PST 2019

> The problem with modeling the address space conversion semantics on
> superspaces and subspaces is that these two concepts are orthogonal. A
> target or language could have address spaces which 'overlap' in some
> way, yet disallow implicit or explicit conversion between them. It
> could also have address spaces which do not overlap, but for which
> explicit or implicit conversion is permitted.

I am trying to understand how this could work. I think the current definition
of overlapping in the Embedded C TR expresses logical overlap, but not
necessarily physical overlap. My understanding is that if address spaces
overlap logically, they can be converted explicitly in both directions and
might be converted implicitly (depending on whether one address space is a
superset of the other). Logical overlap can imply that the memory segments
physically overlap, but it need not. In the latter case it's just a logical
concept to simplify programming or compiler implementation. That's how we use
the generic address space in OpenCL, for example: it isn't a physical memory
segment. So I am trying to understand why something that doesn't overlap
(either logically or physically) would still be convertible. I have found the
current overlap logic quite useful in various places for C++ (e.g. overload
resolution, where a subset is preferable to a superset), and changing it might
have wider implications.

Also (maybe this belongs in a separate discussion), for C++ specifically the
generic address space becomes really key because it's used for implementing
the hidden 'this' parameter/expression. I am not quite convinced that its
current semantics, taken from C, are sufficient. It isn't the same as the
default address space, where the implementation decides to put objects by
default; it is an address space to which every other one should be allowed to
convert unless there is a good reason not to (i.e. a logical superset of all
or most of the other address spaces). If there is no such address space, the
application developer will be forced to write all the implicit
operations/methods for each address space in which a class variable can be
declared. That is quite impractical, especially if no special logic is needed
for the different address spaces!

> The method would initially consult any language address space
> conversion rules (such as conversion rules in OpenCL), and if no such
> rules apply, proceed to fall back on a TargetInfo hook.

> The TargetInfo hook would have the same format as the ASTContext
> method, but would return the validity of the conversion for the
> particular compilation target. The default behavior of this hook would
> be that all implicit address space conversions are disallowed, and all
> explicit conversions are permitted.

> (An alternative setup here would be that the ASTContext method queries
> the TargetInfo directly, and have the language semantics be defined in
> the TargetInfo base method instead. This would let targets override
> language semantics. I don't know if this is necessary, or desirable.)

I would vote against target rules ever overriding language ones. This
reduces portability of code among targets which defeats the purpose of
the language mode in my view.

The first idea makes sense to me, i.e. use the target rules if either address
space is at or above FirstTargetAddressSpace; otherwise the language rules
should be used.
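
To make that dispatch rule concrete, here is a minimal sketch. It assumes a
simplified stand-in for LangAS in which every value from
FirstTargetAddressSpace upward is a target address space. The helpers
isTargetAddressSpace and getLangASFromTargetAS are named after helpers I
believe exist in clang/Basic/AddressSpaces.h, but they are reimplemented here;
RuleSource and ruleSourceFor are hypothetical names used only for
illustration.

```cpp
#include <cassert>

// Simplified stand-in for clang::LangAS: language-defined spaces come first,
// and every value from FirstTargetAddressSpace upward is a target space.
enum class LangAS : unsigned {
  Default = 0,
  opencl_global,
  opencl_local,
  opencl_constant,
  opencl_private,
  opencl_generic,
  FirstTargetAddressSpace
};

constexpr bool isTargetAddressSpace(LangAS AS) {
  return static_cast<unsigned>(AS) >=
         static_cast<unsigned>(LangAS::FirstTargetAddressSpace);
}

// Maps target address space number N to its LangAS value.
constexpr LangAS getLangASFromTargetAS(unsigned N) {
  return static_cast<LangAS>(
      N + static_cast<unsigned>(LangAS::FirstTargetAddressSpace));
}

// Hypothetical helper: which rule set should answer a conversion query.
enum class RuleSource { Language, Target };

constexpr RuleSource ruleSourceFor(LangAS From, LangAS To) {
  // If either side is a target address space, the target decides.
  return (isTargetAddressSpace(From) || isTargetAddressSpace(To))
             ? RuleSource::Target
             : RuleSource::Language;
}

// Usage: two OpenCL spaces stay with the language rules, while any query
// involving a target space goes to the target.
inline void exampleRuleSource() {
  assert(ruleSourceFor(LangAS::opencl_global, LangAS::opencl_generic) ==
         RuleSource::Language);
  assert(ruleSourceFor(LangAS::opencl_global, getLangASFromTargetAS(1)) ==
         RuleSource::Target);
}
```

With this split a target can never change the answer for a pair of language
address spaces, which is the portability property argued for above.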

> * A patch which replaces the currently used methods for address space
> compatibility mentioned earlier (isAddressSpaceSupersetOf,
> isAddressSpaceOverlapping) with calls to the new methods in ASTContext.

> There are some other users of these methods, such as
> Qualifiers::compatiblyIncludes, but it's not entirely clear how to
> update these as a Qualifiers does not have access to ASTContext.

I wonder if it could migrate to ASTContext, or take an ASTContext as a
parameter. Although both might cause layering violations. :(

> * Possibly a patch to remove the old address space compatibility
> methods, if it can be determined that they are no longer needed.

I would prefer to migrate to the new implementation completely instead of
maintaining two approaches. We just need to make sure that the new logic
covers the old behavior along with any new functionality. This should work
and would make the code base more readable and maintainable. However, if it's
not possible to do this directly at the start, we can replace it gradually.

Overall, this work would be a good step towards better upstream support for
embedded and heterogeneous devices. One extra thought I have (again, it might
belong in a separate RFC) is whether it makes sense to provide some way to
define address space compatibility rules using some sort of syntax in the
language. The applications of this would be (a) portability of code across
different compilers and (b) portability of code among accelerators in the same
domain (e.g. ML, graphics, ...).

Thanks,
Anastasia

From: cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of Bevin Hansson via cfe-dev <cfe-dev at lists.llvm.org>
Sent: 06 March 2019 18:11
To: cfe-dev at lists.llvm.org
Subject: [cfe-dev] [RFC] Improved address space conversion semantics for targets and language dialects

== Introduction ==

During the work that Anastasia has been doing on enabling OpenCL C++, 
points have been raised about the state of address space support in 
Clang. Currently, this support is rather ad-hoc. The representation of 
address spaces in qualifiers and the lowering of Clang address spaces 
to their LLVM counterparts are sound, but the behavioral semantics of 
address spaces given by the Embedded C TR are not really sufficient to 
model address space behaviors for arbitrary target architectures.

Here are some of the reviews in which this has come up:
 * https://reviews.llvm.org/D58346
 * https://reviews.llvm.org/D57464

Many address space semantics are locked behind the OpenCL language 
option, even though those semantics would likely be applicable to 
non-OpenCL cases as well. This means that, when not using any 
particular address space-using language dialect, the address space 
semantics are far too loosely defined. When using address spaces 
outside of the ones defined in LangAS (the 'target' address spaces), 
you can convert between any two address spaces explicitly, even though 
this might not make sense on a particular target. There is no way for a 
target to define which address spaces are compatible with each other.

Technically, this behavior is in accordance with the Embedded-C TR 
(explicitly converting between all address spaces is allowed, but 
undefined if they aren't compatible), but I do not believe this 
behavior is meaningful. If a target's address spaces are disjoint, 
there is no reason to let a user convert between them, even with a 
cast.

In order to make the support for address spaces more complete, general 
and also useful for targets with a need to define more specific rules 
for their address spaces, a generalization of the conversion semantics 
for address spaces is needed.

== Current implementation ==

Currently, address space compatibility is defined in terms of 
superspaces. An address space can encompass others, in which case it 
would be considered a superset/superspace of the other address spaces.

Given two address spaces, Super and Sub, where Super is a superspace of 
Sub, then it is valid to implicitly convert a `Sub T*` to a `Super T*`, 
as all pointers to Sub are encompassed by pointers to Super. It is not 
necessarily safe to implicitly convert in the other direction. Also, an 
address space is a superspace of itself. 

This is currently implemented in Qualifiers::isAddressSpaceSupersetOf. 
The OpenCL __generic address space is the superspace of all other 
address spaces except for the OpenCL __constant address space. This 
method is used when checking pointer compatibility during assignment 
(and other forms of initialization).

Explicitly converting (casting) between two address spaces is permitted 
if either of them is a superspace of the other. This is implemented in 
Qualifiers::isAddressSpaceOverlapping. This check is only done in 
OpenCL mode; when using address spaces in regular C, explicit 
conversion is always permitted.
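
The two checks described in this section can be sketched as below. This is a
simplified reconstruction from the description above, not the code in
Qualifiers: the AS enum is a hypothetical stand-in that lists only the OpenCL
address spaces, and the free functions borrow the names of the Qualifiers
methods purely for readability.

```cpp
#include <cassert>

// Hypothetical stand-in for the OpenCL entries of clang::LangAS.
enum class AS { Global, Local, Constant, Private, Generic };

// A is a superspace of B if they are the same space, or if A is __generic
// and B is anything other than __constant.
constexpr bool isAddressSpaceSupersetOf(AS A, AS B) {
  return A == B || (A == AS::Generic && B != AS::Constant);
}

// Explicit conversion (in OpenCL mode) is allowed when either space is a
// superspace of the other.
constexpr bool isAddressSpaceOverlapping(AS A, AS B) {
  return isAddressSpaceSupersetOf(A, B) || isAddressSpaceSupersetOf(B, A);
}

// Usage: __global converts implicitly to __generic, the reverse needs a
// cast, and __constant does not overlap __generic at all.
inline void exampleCurrentRules() {
  assert(isAddressSpaceSupersetOf(AS::Generic, AS::Global));
  assert(!isAddressSpaceSupersetOf(AS::Global, AS::Generic));
  assert(isAddressSpaceOverlapping(AS::Global, AS::Generic));
  assert(!isAddressSpaceOverlapping(AS::Constant, AS::Generic));
}
```

Note that both answers are derived from the single superspace relation, which
is why implicit and explicit convertibility cannot be varied independently in
the current scheme.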

== Issues ==

The problem with modeling the address space conversion semantics on 
superspaces and subspaces is that these two concepts are orthogonal. A 
target or language could have address spaces which 'overlap' in some 
way, yet disallow implicit or explicit conversion between them. It 
could also have address spaces which do not overlap, but for which 
explicit or implicit conversion is permitted. 

== Suggestion ==

The suggestion in this RFC to improve the way targets and languages in 
Clang can express the semantics of address space conversions is to add 
a mechanism to ASTContext and TargetInfo which lets us query if a 
conversion from one address space to another address space is either:

 * invalid
 * valid implicitly
 * valid explicitly

A suggestion for the interface on ASTContext would be

  bool isAddressSpaceConvertible(LangAS From, LangAS To, bool Explicit)

The method would initially consult any language address space 
conversion rules (such as conversion rules in OpenCL), and if no such 
rules apply, proceed to fall back on a TargetInfo hook.

The TargetInfo hook would have the same format as the ASTContext 
method, but would return the validity of the conversion for the 
particular compilation target. The default behavior of this hook would 
be that all implicit address space conversions are disallowed, and all 
explicit conversions are permitted.
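
A minimal sketch of this layering, under several assumptions: the types
SketchASTContext, SketchTargetInfo and DisjointTargetInfo are hypothetical
stand-ins and not the real Clang classes, LangAS is reduced to a handful of
values, the OpenCL rule is the superspace rule described earlier, and a
conversion from a space to itself is assumed to always be valid. It needs
C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>

// Simplified stand-in for clang::LangAS; target_0 and target_1 are
// hypothetical target-defined address spaces.
enum class LangAS {
  opencl_global, opencl_local, opencl_constant, opencl_generic,
  target_0, target_1
};

// Hypothetical stand-in for the proposed TargetInfo hook.
struct SketchTargetInfo {
  virtual ~SketchTargetInfo() = default;
  // Default from the RFC: no implicit conversions, all explicit ones.
  // Assumption: converting a space to itself is always fine.
  virtual bool isAddressSpaceConvertible(LangAS From, LangAS To,
                                         bool Explicit) const {
    return From == To || Explicit;
  }
};

// Hypothetical target whose address spaces are disjoint, so that even a
// cast between two different spaces is rejected.
struct DisjointTargetInfo : SketchTargetInfo {
  bool isAddressSpaceConvertible(LangAS From, LangAS To,
                                 bool /*Explicit*/) const override {
    return From == To;
  }
};

inline bool isOpenCLSpace(LangAS AS) {
  return AS != LangAS::target_0 && AS != LangAS::target_1;
}

// Language rules: an answer for OpenCL pairs, std::nullopt when the
// language has nothing to say about the pair.
inline std::optional<bool> openCLRule(LangAS From, LangAS To, bool Explicit) {
  if (!isOpenCLSpace(From) || !isOpenCLSpace(To))
    return std::nullopt;
  auto IsSuperset = [](LangAS A, LangAS B) {
    return A == B ||
           (A == LangAS::opencl_generic && B != LangAS::opencl_constant);
  };
  bool Implicit = IsSuperset(To, From);
  return Explicit ? (Implicit || IsSuperset(From, To)) : Implicit;
}

// Hypothetical stand-in for the proposed ASTContext method.
struct SketchASTContext {
  const SketchTargetInfo &Target;
  bool isAddressSpaceConvertible(LangAS From, LangAS To, bool Explicit) const {
    if (std::optional<bool> R = openCLRule(From, To, Explicit))
      return *R;  // The language rule applies and is final.
    return Target.isAddressSpaceConvertible(From, To, Explicit);
  }
};

// Usage: a cast between two target spaces is accepted by the default hook
// but rejected by the disjoint target.
inline void exampleLayeredQuery() {
  SketchTargetInfo Default;
  DisjointTargetInfo Disjoint;
  assert(SketchASTContext{Default}.isAddressSpaceConvertible(
      LangAS::target_0, LangAS::target_1, true));
  assert(!SketchASTContext{Disjoint}.isAddressSpaceConvertible(
      LangAS::target_0, LangAS::target_1, true));
}
```

Returning std::nullopt for "no language rule applies" keeps a language-level
"no" distinct from "not my decision", so the target hook is only reached for
pairs the language does not cover.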

(An alternative setup here would be that the ASTContext method queries 
the TargetInfo directly, and have the language semantics be defined in 
the TargetInfo base method instead. This would let targets override 
language semantics. I don't know if this is necessary, or desirable.)

It's important to point out that implicit validity should imply the
explicit one. If a call to this ASTContext method is made as below, and
returns true:
  Ctx.isAddressSpaceConvertible(From, To, false)
then the following should also return true:
  Ctx.isAddressSpaceConvertible(From, To, true)

If it did not, then `To T* p = from_ptr` would be permitted, but
`To T* p = (To T*)from_ptr` would not be, which is rather
counterintuitive.
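
This invariant is easy to check mechanically. The sketch below is only an
illustration: the LangAS enum is a reduced stand-in, and
implicitImpliesExplicit, consistentRule and brokenRule are hypothetical names.
It walks every (From, To) pair and reports whether any pair is implicitly
convertible without also being explicitly convertible.

```cpp
#include <cassert>
#include <vector>

// Reduced stand-in for clang::LangAS.
enum class LangAS { opencl_global, opencl_constant, opencl_generic };

inline std::vector<LangAS> allSpaces() {
  return {LangAS::opencl_global, LangAS::opencl_constant,
          LangAS::opencl_generic};
}

// Returns true if, for every pair, Query(From, To, false) being true
// implies that Query(From, To, true) is also true.
template <typename QueryFn>
bool implicitImpliesExplicit(QueryFn Query) {
  for (LangAS From : allSpaces())
    for (LangAS To : allSpaces())
      if (Query(From, To, /*Explicit=*/false) &&
          !Query(From, To, /*Explicit=*/true))
        return false;
  return true;
}

// A rule set that respects the invariant: identity is always allowed and
// everything else needs a cast.
inline bool consistentRule(LangAS From, LangAS To, bool Explicit) {
  return From == To || Explicit;
}

// A deliberately broken rule set: implicit conversions are allowed but
// casts are rejected.
inline bool brokenRule(LangAS, LangAS, bool Explicit) { return !Explicit; }

inline void exampleInvariant() {
  assert(implicitImpliesExplicit(consistentRule));
  assert(!implicitImpliesExplicit(brokenRule));
}
```

A check of this shape could run over a target's hook as a sanity test, so a
target cannot accidentally accept an implicit conversion that it rejects as a
cast.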

== Necessary work == 

The steps to implement this RFC should be as follows:

* A patch which adds the aforementioned methods to ASTContext and 
TargetInfo, and defines the necessary semantics for the default and 
language-specific conversions in them.

* A patch which replaces the currently used methods for address space 
compatibility mentioned earlier (isAddressSpaceSupersetOf, 
isAddressSpaceOverlapping) with calls to the new methods in ASTContext. 

There are some other users of these methods, such as 
Qualifiers::compatiblyIncludes, but it's not entirely clear how to 
update these as a Qualifiers does not have access to ASTContext.

* Possibly a patch to remove the old address space compatibility 
methods, if it can be determined that they are no longer needed.

Thank you for reading!