PATCH: fix clang to emit correct addrspacecast for CUDA

Jingyue Wu jingyue at google.com
Mon Mar 24 12:32:50 PDT 2014


Forgot to reply to all.


On Mon, Mar 24, 2014 at 12:30 PM, Justin Holewinski
<jholewinski at nvidia.com>wrote:

>  Okay, I see.  That makes sense.  Thanks!
>
>
> On 03/24/2014 03:22 PM, Jingyue Wu wrote:
>
>  We discussed this offline with Peter before. __shared__ is a declaration
> qualifier instead of a type qualifier. Therefore, clang will only generate
> (float *) in the AST, because it wants to faithfully model the type system.
> it's true that codegen creates an addrspace(3) allocation for 'a', but "a"
> is still declared and used as float * in the AST. This is why we fix
> codegen to addrspacecast every __shared__ variable. If __shared__ is part
> of the type system, I would agree to add the __shared__ type in the AST and
> emit the code you expected.
>
>  Regarding "ExpectedAddrSpace != AddrSpace", as aforementioned, AddrSpace
> is the allocation address space (3 for __shared__), and ExpectedAddrSpace
> is the declared address space (0 in this case). Therefore, they are
> different, and need an addrspacecast.
>
>  Jingyue
>
>
> On Mon, Mar 24, 2014 at 11:36 AM, Justin Holewinski <
> jholewinski at nvidia.com> wrote:
>
>>  I'm not sure I understand the problem with address-space-conversion.cu.
>> I would expect the frontend to look at the pointer type when determining
>> the type of store to emit.
>>
>> For example, if there is code like the following:
>>
>> __shared__ float a;
>> float *ptr = &a;
>> *ptr = 2.0f;
>>
>>
>> I would expect the frontend to create an addrspace(3) allocation for 'a',
>> store the value of 'addrspacecast float addrspace(3)* to float*' into
>> 'ptr', and then emit a 'store ..., float*' for the final assignment.
>>
>> But if the indirection through 'ptr' was not there, the frontend would
>> see an addrspace(3) pointer and emit the corresponding store instead of
>> first casting to addrspace(0).
>>
>> Maybe I'm just misunderstanding what you're trying to do here.  Why is
>> ExpectedAddrSpace != AddrSpace for the examples in address-spaces.cu?
>>
>>
>>
>> On 03/24/2014 02:22 PM, Jingyue Wu wrote:
>>
>> Justin,
>>
>>  I overlooked that you were referring specifically to the last test case
>> in address-spaces.cu.
>>
>>  In that particular example, although the previous code looks to emit
>> more optimized code, it only worked by chance because the program only
>> loads from and stores to "shared int lk". If the test case had been more
>> complicated, e.g., "int *lkp = &lk" after "shared int lk", the codegen
>> would have emitted a StoreInst with mismatched types, and crashed just as
>> in many cases in address-space-conversion.cu.
>>
>>  Jingyue
>>
>>
>> On Mon, Mar 24, 2014 at 10:05 AM, Jingyue Wu <jingyue at google.com> wrote:
>>
>>> Right. We are aware of this issue, and think it should be addressed in
>>> the IR optimizer (similar to InstCombineLoadCast and
>>> InstCombineStoreToCast) instead of clang. Do you think this is an
>>> appropriate approach? Is this optimization general enough to stay in the IR
>>> optimizer or target-dependent?
>>>
>>>  Jingyue
>>>
>>>
>>> On Mon, Mar 24, 2014 at 4:54 AM, Justin Holewinski <
>>> justin.holewinski at gmail.com> wrote:
>>>
>>>> Hi Jingyue,
>>>>
>>>>  I committed the addrspacecast isel patterns to NVPTX.  Also, I wanted
>>>> to point out that your changes in the last test case in this patch (
>>>> address-spaces.cu) represent changes that may lead to performance
>>>> degradation.  Specific address spaces should be used whenever possible for
>>>> loads/stores.  Casting everything to a generic address is still correct,
>>>> but may lead to additional indirections for the hardware.
>>>>
>>>>
>>>> On Fri, Mar 21, 2014 at 2:25 PM, Justin Holewinski <
>>>> jholewinski at nvidia.com> wrote:
>>>>
>>>>>  addrspacecast support in NVPTX is on my todo list.  I'll try to put
>>>>> something together in the next few days.
>>>>>
>>>>>
>>>>> On 3/21/14, 2:20 PM, Jingyue Wu wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>  Static local variables in CUDA can be declared with address space
>>>>> qualifiers, such as __shared__. Therefore, the codegen needs to potentially
>>>>> addrspacecast a static local variable to the type expected by its
>>>>> declaration. Peter did something similar for global variables in r157167.
>>>>>
>>>>>  All clang tests passed.
>>>>>
>>>>>  Justin: The NVPTX backend support for addrspacecast seems not
>>>>> complete. We can send you follow-up patches once this one gets in.
>>>>>
>>>>>  Jingyue
>>>>>
>>>>>
>>>>>
>>>>>   --
>>>>> Thanks,
>>>>>
>>>>> Justin Holewinski
>>>>>
>>>>>  ------------------------------
>>>>>  This email message is for the sole use of the intended recipient(s)
>>>>> and may contain confidential information.  Any unauthorized review, use,
>>>>> disclosure or distribution is prohibited.  If you are not the intended
>>>>> recipient, please contact the sender by reply email and destroy all copies
>>>>> of the original message.
>>>>>  ------------------------------
>>>>>
>>>>
>>>>
>>>>
>>>>   --
>>>>
>>>> Thanks,
>>>>
>>>>  Justin Holewinski
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140324/12319aca/attachment.html>


More information about the cfe-commits mailing list