[cfe-dev] [LLVMdev] Language-specific vs target-specific address spaces (was Re: [PATCH] OpenCL support - update on keywords)

Wed Mar 2 06:38:55 PST 2011

On Tue, Mar 1, 2011 at 4:06 PM, David Neto wrote:
> On Mon, Feb 28, 2011 at 4:41 PM, Peter Collingbourne wrote:
>>
>> The more I think about it, the more I become uncomfortable with the
>> concept of language-specific address spaces in LLVM.  These are the
>> main issues I see with language-specific address spaces:
>
> ...
>
>> Instead of language-specific address spaces, each target should
>> concentrate on exposing all of its address spaces as target-specific
>> address spaces, and frontends should use a language -> target mapping
>> in target-specific code.  We can continue to expose the target's main
>> shared writable address space as address space 0 as we do now.
>>
>> For example, Clang could define a set of internal address space
>> constants for OpenCL and use TargetCodeGenInfo to provide the mapping
>> to target address spaces.
>
> In principle this is a fine idea.
>
> I think the difficulty is that LLVM and Clang provide an
> infrastructure for numbered address spaces, but no standard assignment
> on top of that infrastructure.

You can trace back the origins of the addrspace attribute in the
mailing list archives to this thread:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011385.html.
From there, it is pretty clear that addrspace was introduced
specifically as a mechanism for implementing the 'named address space'
extensions defined in the Embedded C standard (ISO/IEC TR 18037,
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf).

The Embedded C standard gives this overview of the 'named address
space' extension:

  Many embedded processors have multiple distinct banks of memory
  and require that data be grouped in different banks to achieve
  maximum performance.  Ensuring the simultaneous flow of data and
  coefficient data to the multiplier/accumulator of processors
  designed for FIR filtering, for example, is critical to their
  operation.  In order to allow the programmer to declare the memory
  space from which a specific data object must be fetched, this
  Technical Report specifies basic support for multiple address
  spaces.  As a result, optimizing compilers can utilize the ability
  of processors that support multiple address spaces, for instance,
  to read data from two separate memories in a single cycle to
  maximize execution speed.

If you dig into the Embedded C standard, you'll find that the 'named
address space' extension is highly target-specific. It is only
portable insofar as two target processors have similar memory
organization and use identical names for their address spaces.

So there aren't any conventions for the address space numbers in
clang/llvm because there aren't any conventions for how chip designers
incorporate memories into the architectures that they design.

The one convention that the Embedded C standard does specify is that
when the address space of a type is unspecified, the type is assumed
to be in the 'generic' space. Clang currently emits an address space
of zero in this case. Arguably, LLVM could define a single enum value,
GENERIC, for use by the code generators.

> The trick is to define some conventions,
> e.g. what the numbers might mean for a language front-end, and whether
> the interpretation of the numbers change as the IR moves to later
> stages.  We're working in a bit of a vacuum.
>
> ...
>
> So I think we need a couple of things:
> - proposals for number assignments and their associated semantics.
> - code to flesh out and embody those semantics. e.g. a sample
> implementation / translation layer

In my opinion, any knowledge that front ends have of address spaces
should be dictated by the target's back end. Perhaps we should add
some virtual methods to LLVM's TargetMachine interface so front ends
can query the back end for the names and numbers of the address spaces
that it recognizes, and expose them to end users in a standard way.
But having front ends impose the requirement on back ends that they
recognize some arbitrary set of language-specific address spaces seems
like a great misuse of the feature to me for reasons that Peter has
already pointed out.
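
To make the idea of querying the back end concrete, here is a minimal
sketch of what such an interface might look like. Everything in it is
hypothetical: TargetAddressSpaceInfo, AddressSpaceDesc, lookup() and the
"xmem"/"ymem" DSP target are invented for illustration, and none of them
exist in LLVM's TargetMachine today.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: one address space as a back end might describe it.
struct AddressSpaceDesc {
  unsigned Number;   // the N in addrspace(N)
  std::string Name;  // user-visible name, in the Embedded C sense
};

// Hypothetical query interface; not part of LLVM's TargetMachine.
class TargetAddressSpaceInfo {
public:
  virtual ~TargetAddressSpaceInfo() = default;

  // Every address space the back end recognizes. The default models a
  // target with a single memory: only the generic space, number 0.
  virtual std::vector<AddressSpaceDesc> getAddressSpaces() const {
    return {AddressSpaceDesc{0, "generic"}};
  }

  // Sets Number and returns true if the target has a space called Name.
  bool lookup(const std::string &Name, unsigned &Number) const {
    assert(!Name.empty() && "address space name must not be empty");
    const std::vector<AddressSpaceDesc> Spaces = getAddressSpaces();
    for (const AddressSpaceDesc &AS : Spaces) {
      if (AS.Name == Name) {
        Number = AS.Number;
        return true;
      }
    }
    return false;
  }
};

// An invented DSP-like target with two extra data memories.
class ExampleDSPAddressSpaceInfo : public TargetAddressSpaceInfo {
public:
  std::vector<AddressSpaceDesc> getAddressSpaces() const override {
    return {AddressSpaceDesc{0, "generic"}, AddressSpaceDesc{1, "xmem"},
            AddressSpaceDesc{2, "ymem"}};
  }
};
```

With something along these lines, a front end would ask the target
whether a name such as "ymem" exists and which number to emit, instead
of hard-coding numbers that the back end may not understand.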

> Basically Anton got the ball rolling: his code patch was a bit of
> both.  And I think he's planning to post a number of OpenCL proposals
> in general.

It seems to me, as Speziale already pointed out, that the OpenCL type
qualifiers aren't address space qualifiers at all (in the Embedded C
sense). They might be better implemented as a separate set of
qualifiers in the way that Objective-C defines its garbage-collection
qualifiers, __strong and __weak. See the Qualifiers class in
AST/Type.h.

> As it is, I hope that backends that do not understand address spaces
> at all know to error out when they receive IR that uses address
> spaces.

This is currently not the case. The back ends for architectures that
don't have multiple address spaces simply ignore the address space
number on the address operands of load and store nodes. The back ends
that do support multiple address spaces treat any address space number
that they don't recognize in the same way that they treat address
space 0.

-Ken