[LLVMdev] Reducing Generic Address Space Usage
Philip Reames
listmail at philipreames.com
Wed Mar 26 14:10:46 PDT 2014
On 03/25/2014 02:31 PM, Jingyue Wu wrote:
> This is a follow-up discussion on
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html.
> The front-end change was already pushed in r204677, so we want to
> continue with the IR optimization.
>
> In general, we want to write an IR pass to convert generic address
> space usage to non-generic address space usage, because accessing the
> generic address space in CUDA and OpenCL is significantly slower than
> accessing non-generic ones (such as shared and constant).
>
> Here is an example Justin gave:
>
> %ptr = ...
> %val = load i32* %ptr
>
> In this case, %ptr is a generic address space pointer (assuming an
> address space mapping where 0 is generic). But if an analysis can
> prove that the pointer %ptr was originally addrspacecast'd from a
> specific address space (or some other mechanism through which the
> pointer's specific address space can be determined), it may be
> beneficial to explicitly convert the IR to something like:
>
> %ptr = ...
> %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
> %val = load i32 addrspace(3)* %ptr.0
>
> Such a translation may generate better code for some targets.
Just a note of caution: for some of us, address spaces are semantically
important (i.e., introducing a cast from one to another would be
incorrect). I have no problem with the mechanism you're describing being
implemented, but it needs to be an opt-in feature.
>
> There are two major design decisions we need to make:
>
> 1. Where does this pass live? Target-independent or target-dependent?
>
> Both the NVPTX and R600 backends want this optimization, which seems a good
> justification for making this optimization target-independent.
>
> However, we have three concerns about this:
> a) I doubt this optimization is valid for all targets, because the LLVM
> language reference
> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says
> addrspacecast "can be a no-op cast or a complex value modification,
> depending on the target and the address space pair."
> b) NVPTX and R600 have different address numbering for the generic
> address space, which makes things more complicated.
> c) We don't have a good understanding of the R600 backend.
>
> Therefore, I would vote for making this optimization NVPTX-specific
> for now. If other targets need this, we can later think about how to
> reuse the code.
No opinion, but if it is target-independent, it needs to be behind an
opt-in target hook.
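
A rough illustration of such an opt-in gate, written as a toy Python model
rather than LLVM code. `TargetInfo`, `generic_addrspace`, and
`should_run_addrspace_rewrite` are hypothetical names made up for this sketch,
and the address space numbers are placeholders. Having each target declare its
own generic address space number would also sidestep concern (b) above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetInfo:
    """Hypothetical per-target description; not an LLVM class."""
    name: str
    # None means the target has NOT opted in to the rewrite.
    generic_addrspace: Optional[int] = None


def should_run_addrspace_rewrite(target: TargetInfo) -> bool:
    """Run the rewrite only when the target explicitly opts in."""
    return target.generic_addrspace is not None


# Placeholder targets for the example only.
gpu_target = TargetInfo(name="gpu-like", generic_addrspace=0)
strict_target = TargetInfo(name="semantic-addrspaces")
```

The default is "off", so a target whose address spaces carry semantic meaning
never sees the transformation unless it asks for it.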
>
> 2. How effective do we want this optimization to be?
>
> In the short term, I want it to be able to eliminate unnecessary
> non-generic-to-generic addrspacecasts the front-end generates for the
> NVPTX target. For example,
>
> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
> %v = load i32* %p1
>
> =>
>
> %v = load i32 addrspace(3)* %p0
>
> We want similar optimization for store+addrspacecast and
> gep+addrspacecast as well.
>
> In the long term, we could for sure improve this optimization to handle
> more instructions and more patterns.
Just to note, this last bit raises far fewer worries for me about the
correctness of my work. If you're loading from a pointer which was in a
different address space, it seems very logical to combine that with the
load. We'd also never generate code like that. :)
To restate my concern in general terms, it's the introduction of *new*
casts which worries me, not the exploitation/optimization of existing ones.
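
The distinction can be sketched with a toy instruction model in Python (this
is not the LLVM API; `Cast`, `Load`, `GENERIC`, and `fold_casts_into_loads`
are names invented for the example, and it assumes a target where loading
through the generic pointer is equivalent to loading through the original
one). The rewrite only consults casts that already exist in the block and
never creates one:

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Cast:
    """Toy stand-in for: %dst = addrspacecast <src_as>* %src to <dst_as>*"""
    dst: str
    src: str
    src_as: int
    dst_as: int


@dataclass(frozen=True)
class Load:
    """Toy stand-in for: %dst = load i32 addrspace(<addrspace>)* %ptr"""
    dst: str
    ptr: str
    addrspace: int


Inst = Union[Cast, Load]

GENERIC = 0  # assumed numbering, for this example only


def fold_casts_into_loads(block: List[Inst]) -> List[Inst]:
    """Make loads through a specific-to-generic cast use the source pointer.

    Existing casts are kept (a later dead-code pass could drop unused ones);
    no new cast is ever emitted.
    """
    casts: Dict[str, Cast] = {}
    out: List[Inst] = []
    for inst in block:
        if isinstance(inst, Cast):
            casts[inst.dst] = inst
            out.append(inst)
            continue
        cast = casts.get(inst.ptr)
        foldable = (
            cast is not None
            and cast.dst_as == GENERIC
            and cast.src_as != GENERIC
            and inst.addrspace == GENERIC
        )
        if foldable:
            out.append(Load(inst.dst, cast.src, cast.src_as))
        else:
            out.append(inst)
    return out


# The short-term example from the quoted text.
block = [Cast("%p1", "%p0", 3, GENERIC), Load("%v", "%p1", GENERIC)]
result = fold_casts_into_loads(block)
```

A load whose pointer has no visible cast is left untouched, which is the
property I care about.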
Philip