[LLVMdev] Reducing Generic Address Space Usage

Philip Reames listmail at philipreames.com
Wed Mar 26 14:10:46 PDT 2014


On 03/25/2014 02:31 PM, Jingyue Wu wrote:
> This is a follow-up discussion on 
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html. 
> The front-end change was already pushed in r204677, so we want to 
> continue with the IR optimization.
>
> In general, we want to write an IR pass to convert generic address 
> space usage to non-generic address space usage, because accessing the 
> generic address space in CUDA and OpenCL is significantly slower than 
> accessing non-generic ones (such as shared and constant).
>
> Here is an example Justin gave:
>
> %ptr = ...
> %val = load i32* %ptr
>
> In this case, %ptr is a generic address space pointer (assuming an 
> address space mapping where 0 is generic). But if an analysis can 
> prove that the pointer %ptr was originally addrspacecast'd from a 
> specific address space (or some other mechanism through which the 
> pointer's specific address space can be determined), it may be 
> beneficial to explicitly convert the IR to something like:
>
> %ptr = ...
> %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*
> %val = load i32 addrspace(3)* %ptr.0
>
> Such a translation may generate better code for some targets.
Just a note of caution: for some of us, address spaces are semantically 
important (i.e., introducing a cast from one to another would be 
incorrect).  I have no problem with the mechanism you're describing being 
implemented, but it needs to be an opt-in feature.

>
> There are two major design decisions we need to make:
>
> 1. Where does this pass live? Target-independent or target-dependent?
>
> Both the NVPTX and R600 backends want this optimization, which seems a 
> good justification for making this optimization target-independent.
>
> However, we have three concerns about this:
> a) I doubt this optimization is valid for all targets, because the LLVM 
> language reference 
> (http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction) says 
> addrspacecast "can be a no-op cast or a complex value modification, 
> depending on the target and the address space pair."
> b) NVPTX and R600 have different address numbering for the generic 
> address space, which makes things more complicated.
> c) We don't have a good understanding of the R600 backend.
>
> Therefore, I would vote for making this optimization NVPTX-specific 
> for now. If other targets need this, we can later think about how to 
> reuse the code.
No opinion, but if it is target-independent, it needs to be behind an 
opt-in target hook.
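
To make "opt-in" concrete, here is a minimal sketch in Python rather than 
real LLVM code.  TargetInfo, generic_address_space and 
may_rewrite_address_spaces are hypothetical names invented for this 
illustration; none of them is an existing LLVM API.  The only point is that 
the default answer is "no", so a target has to declare a generic address 
space before the pass may touch its IR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetInfo:
    """Hypothetical per-target description; not an LLVM class."""
    name: str
    # None means the target has not opted in. That is the default.
    generic_address_space: Optional[int] = None


def may_rewrite_address_spaces(target: TargetInfo) -> bool:
    """Allow the rewrite only for targets that declared a generic space."""
    return target.generic_address_space is not None


# Assumption: a target that treats address space 0 as generic, as in the
# example quoted earlier in this thread.
nvptx_like = TargetInfo("nvptx-like", generic_address_space=0)

# A target whose address spaces carry semantics and must be left alone.
semantic = TargetInfo("semantic-address-spaces")
```

With something along these lines, a target that never sets the field sees 
no change in behaviour.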
>
> 2. How effective do we want this optimization to be?
>
> In the short term, I want it to be able to eliminate unnecessary 
> non-generic-to-generic addrspacecasts the front-end generates for the 
> NVPTX target. For example,
>
> %p1 = addrspacecast i32 addrspace(3)* %p0 to i32*
> %v = load i32* %p1
>
> =>
>
> %v = load i32 addrspace(3)* %p0
>
> We want similar optimization for store+addrspacecast and 
> gep+addrspacecast as well.
>
> In the long term, we could for sure improve this optimization to handle 
> more instructions and more patterns.
Just to note, this last bit raises far fewer worries for me about the 
correctness of my work.  If you're loading from a pointer which was in a 
different address space, it seems very logical to combine that with the 
load.  We'd also never generate code like that.  :)

To restate my concern in general terms, it's the introduction of *new* 
casts that worries me, not the exploitation/optimization of existing ones.
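
The distinction can be sketched with a toy model, again in Python and not 
real LLVM code.  Cast, Load and fold_existing_casts are hypothetical names 
made up for this illustration, and address space 0 is assumed to be 
generic, as in the quoted example.  The sketch only retargets loads that 
already go through a specific-to-generic cast; it never creates a cast, so 
a load whose pointer has no visible cast is left as it was.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

GENERIC = 0  # assumption: address space 0 is generic, as in the quoted example


@dataclass(frozen=True)
class Cast:
    """Models: result = addrspacecast src (in src_space) to dst_space."""
    result: str
    src: str
    src_space: int
    dst_space: int


@dataclass(frozen=True)
class Load:
    """Models: result = load through ptr, which lives in ptr_space."""
    result: str
    ptr: str
    ptr_space: int


Inst = Union[Cast, Load]


def fold_existing_casts(insts: List[Inst]) -> List[Inst]:
    """Retarget loads through specific-to-generic casts; never add a cast."""
    foldable: Dict[str, Tuple[str, int]] = {}
    out: List[Inst] = []
    for inst in insts:
        if isinstance(inst, Cast):
            # Remember casts that go from a specific space to the generic one.
            if inst.dst_space == GENERIC and inst.src_space != GENERIC:
                foldable[inst.result] = (inst.src, inst.src_space)
        elif inst.ptr in foldable:
            # Load through a remembered cast: use the original pointer.
            src, space = foldable[inst.ptr]
            inst = Load(inst.result, src, space)
        out.append(inst)
    # Drop folded casts whose result no longer has any user.
    used = {i.ptr for i in out if isinstance(i, Load)}
    used |= {i.src for i in out if isinstance(i, Cast)}
    return [i for i in out
            if not (isinstance(i, Cast)
                    and i.result in foldable
                    and i.result not in used)]


# The addrspacecast + load pattern from the quoted mail.
before = [Cast("%p1", "%p0", 3, GENERIC), Load("%v", "%p1", GENERIC)]
after = fold_existing_casts(before)

# A load with no visible cast is not rewritten.
unknown = fold_existing_casts([Load("%val", "%ptr", GENERIC)])
```

In this model the first input collapses to a single load from %p0 in 
address space 3, while the second comes back unchanged.  Only the first 
kind of rewrite is one I'm comfortable having on by default.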

Philip