[LLVMdev] Reducing Generic Address Space Usage

Tue Mar 25 14:31:05 PDT 2014

This is a follow-up discussion on
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html.
The front-end change was already pushed in r204677, so we want to continue
with the IR optimization.

In general, we want to write an IR pass to convert generic address space
usage to non-generic address space usage, because accessing the generic
address space in CUDA and OpenCL is significantly slower than accessing
non-generic ones (such as shared and constant).

Here is an example Justin gave:

  %ptr = ...
  %val = load i32* %ptr

In this case, %ptr is a generic address space pointer (assuming an address
space mapping where 0 is generic).  But if an analysis can prove that the
pointer %ptr was originally addrspacecast'd from a specific address space
(or some other mechanism through which the pointer's specific address space
can be determined), it may be beneficial to explicitly convert the IR to
something like:

  %ptr = ...
  %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
  %val = load i32 addrspace(3)* %ptr.0

Such a translation may generate better code for some targets.

There are two major design decisions we need to make:

1. Where does this pass live? Target-independent or target-dependent?

Both the NVPTX and R600 backends want this optimization, which seems a good
justification for making it target-independent.

However, we have three concerns about this:
a) I doubt this optimization is valid for all targets, because the LLVM
language reference (
http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
addrspacecast "can be a no-op cast or a complex value modification,
depending on the target and the address space pair."
b) NVPTX and R600 number the generic address space differently, which
makes things more complicated.
c) We don't have a good understanding of the R600 backend.

Therefore, I would vote for making this optimization NVPTX-specific for
now. If other targets need this, we can later think about how to reuse the
code.

2. How effective do we want this optimization to be?

In the short term, I want it to be able to eliminate unnecessary
non-generic-to-generic addrspacecasts the front-end generates for the NVPTX
target. For example,

%p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
%v = load i32* %p1

=>

%v = load i32 addrspace(3)* %p0

We want similar optimizations for store+addrspacecast and gep+addrspacecast
as well.
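
As a rough sketch of what those two cases might look like, following the
same pattern as the load example above (the value names %x, %i, %q and %q0
are made up for illustration, and addrspace(3) is used only because the
earlier examples use it):

  ; store + addrspacecast
  %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
  store i32 %x, i32* %p1

  =>

  store i32 %x, i32 addrspace(3)* %p0

  ; gep + addrspacecast
  %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
  %q = getelementptr i32* %p1, i64 %i
  %v = load i32* %q

  =>

  %q0 = getelementptr i32 addrspace(3)* %p0, i64 %i
  %v = load i32 addrspace(3)* %q0

The gep case assumes every user of %q can be rewritten to take the
non-generic pointer. If some user still needs a generic pointer, the pass
would presumably have to keep the original gep or cast %q0 back to generic
for that user.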

In the long term, we could certainly extend this optimization to handle more
instructions and more patterns.

Jingyue