PATCH: fix clang to emit correct addrspacecast for CUDA

Mon Mar 24 12:14:58 PDT 2014

On 03/24/2014 03:08 PM, Tom Stellard wrote:
> On Mon, Mar 24, 2014 at 03:01:14PM -0400, Justin Holewinski wrote:
>> On 03/24/2014 02:53 PM, Tom Stellard wrote:
>>> On Mon, Mar 24, 2014 at 02:46:06PM -0400, Justin Holewinski wrote:
>>>> I don't have anything against making this a target-independent
>>>> IR-level, as long as no-one complains about it being a core pass.
>>>> Perhaps the pass could only execute if a target explicitly enables a
>>>> flag.  Something like "preferNonGenericPointers".  The default could
>>>> be 'false', and the pass would only modify the IR if the target sets
>>>> it to 'true'.  Of course, this also assumes address space 0 is
>>>> generic.  This is currently true for the in-tree targets and
>>>> CUDA/OpenCL support in Clang, but I don't believe its a set rule
>>>> anywhere.
>>>>
>>> What do mean by 'generic' address space here?
>> In this context, 'generic' means an address space that encompasses
>> all other address spaces.  This is roughly equivalent to the OpenCL
>> 2.0 generic address space.  Is addrspace(0) not a generic address
>> space for R600?
>>
> No, it's not generic.  It's what we use for OpenCL's private address space.
> Also, I missed the beginning of this discussion, which target-independent pass
> does this impact?

None yet :)

The question is whether a pass that converts generic address space usage 
to non-generic address space usage should be target-independent, or 
specific to a particular back-end.  An example would be an IR sequence 
like the following:

   %ptr = ...
   %val = load i32* %ptr

In this case, %ptr is a generic address space pointer (assuming an 
address space mapping where 0 is generic).  But if an analysis can prove 
that the pointer %ptr was originally addrspacecast'd from a specific 
address space (or some other mechanism through which the pointer's 
specific address space can be determined), it may be beneficial to 
explicitly convert the IR to something like:

   %ptr = ...
   %ptr.0 = addrspacecast i32* to i32 addrspace(3)*
   %val = load i32 addrspace(3)* %ptr.0

Such a translation may generate better code for some targets.

>
> -Tom
>
>>> -Tom
>>>
>>>> On 03/24/2014 02:28 PM, Jingyue Wu wrote:
>>>>> I agree with your concern. However, both CUDA and OpenCL (two most
>>>>> popular users of addrspacecast I believe) support generic address
>>>>> space, and could benefit from this optimization Would we end up
>>>>> with duplicated code (at least one for CUDA one for opencl) if we
>>>>> put it in the back-end?
>>>>>
>>>>> Jingyue
>>>>>
>>>>>
>>>>> On Mon, Mar 24, 2014 at 11:22 AM, Justin Holewinski
>>>>> <jholewinski at nvidia.com <mailto:jholewinski at nvidia.com>> wrote:
>>>>>
>>>>>     The hard part would be making this optimization general enough to
>>>>>     be target-independent.  Optimizing to non-zero address spaces may
>>>>>     not make sense for all targets (or even all future versions of
>>>>>     PTX).  I agree that there should be an IR-level optimization for
>>>>>     this, but perhaps its too target-specific and should actually live
>>>>>     in the back-end.
>>>>>
>>>>>
>>>>>     On 03/24/2014 01:05 PM, Jingyue Wu wrote:
>>>>>>     Right. We are aware of this issue, and think it should be
>>>>>>     addressed in the IR optimizer (similar to InstCombineLoadCast and
>>>>>>     InstCombineStoreToCast) instead of clang. Do you think this is an
>>>>>>     appropriate approach? Is this optimization general enough to stay
>>>>>>     in the IR optimizer or target-dependent?
>>>>>>
>>>>>>     Jingyue
>>>>>>
>>>>>>
>>>>>>     On Mon, Mar 24, 2014 at 4:54 AM, Justin Holewinski
>>>>>>     <justin.holewinski at gmail.com
>>>>>>     <mailto:justin.holewinski at gmail.com>> wrote:
>>>>>>
>>>>>>         Hi Jingyue,
>>>>>>
>>>>>>         I committed the addrspacecast isel patterns to NVPTX.  Also,
>>>>>>         I wanted to point out that your changes in the last test case
>>>>>>         in this patch (address-spaces.cu <http://address-spaces.cu>)
>>>>>>         represent changes that may lead to performance degradation.
>>>>>>          Specific address spaces should be used whenever possible for
>>>>>>         loads/stores.  Casting everything to a generic address is
>>>>>>         still correct, but may lead to additional indirections for
>>>>>>         the hardware.
>>>>>>
>>>>>>
>>>>>>         On Fri, Mar 21, 2014 at 2:25 PM, Justin Holewinski
>>>>>>         <jholewinski at nvidia.com <mailto:jholewinski at nvidia.com>> wrote:
>>>>>>
>>>>>>             addrspacecast support in NVPTX is on my todo list.  I'll
>>>>>>             try to put something together in the next few days.
>>>>>>
>>>>>>
>>>>>>             On 3/21/14, 2:20 PM, Jingyue Wu wrote:
>>>>>>>             Hi,
>>>>>>>
>>>>>>>             Static local variables in CUDA can be declared with
>>>>>>>             address space qualifiers, such as __shared__. Therefore,
>>>>>>>             the codegen needs to potentially addrspacecast a static
>>>>>>>             local variable to the type expected by its declaration.
>>>>>>>             Peter did something similar for global variables in
>>>>>>>             r157167.
>>>>>>>
>>>>>>>             All clang tests passed.
>>>>>>>
>>>>>>>             Justin: The NVPTX backend support for addrspacecast
>>>>>>>             seems not complete. We can send you follow-up patches
>>>>>>>             once this one gets in.
>>>>>>>
>>>>>>>             Jingyue
>>>>>>             --             Thanks,
>>>>>>
>>>>>>             Justin Holewinski
>>>>>>
>>>>>>             ------------------------------------------------------------------------
>>>>>>             This email message is for the sole use of the intended
>>>>>>             recipient(s) and may contain confidential information.
>>>>>>             Any unauthorized review, use, disclosure or distribution
>>>>>>             is prohibited. If you are not the intended recipient,
>>>>>>             please contact the sender by reply email and destroy all
>>>>>>             copies of the original message.
>>>>>>             ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         --
>>>>>>
>>>>>>         Thanks,
>>>>>>
>>>>>>         Justin Holewinski
>>>>>>
>>>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140324/90aa8865/attachment.html>