[cfe-dev] [RFC] Delayed target-specific diagnostic when compiling for the devices.

Alexey Bataev via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 17 09:44:15 PST 2019


-------------
Best regards,
Alexey Bataev

17.01.2019 12:40, Finkel, Hal J. wrote:
> On 1/17/19 11:11 AM, Alexey Bataev wrote:
>> The compiler does not know anything about the layout on the host when
>> it compiles for the device.
>>
> No, the compiler does know about the host layout (e.g., can't we
> construct this by calling getContext().getAuxTargetInfo(), or similar?).
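>
> Roughly, something along these lines (just an illustrative sketch; the
> helper below is hypothetical and not existing Clang code, though
> getAuxTargetInfo() itself is real):
>
> #include "clang/AST/ASTContext.h"
> #include "clang/Basic/TargetInfo.h"
> #include "llvm/Support/raw_ostream.h"
>
> // While compiling for the device, the host TargetInfo remains reachable
> // as the "aux" target, so the host data layout can still be queried.
> static void dumpHostLongDoubleLayout(const clang::ASTContext &Ctx) {
>   if (const clang::TargetInfo *HostTI = Ctx.getAuxTargetInfo())
>     llvm::errs() << "host long double: " << HostTI->getLongDoubleWidth()
>                  << " bits, align " << HostTI->getLongDoubleAlign() << "\n";
> }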

The compiler knows, but the backend is unable to correctly represent
unsupported types.


>
>> We cannot do anything with the types that are not supported by the
>> target device, and we cannot use the layout from the host. And it is
>> the user's responsibility to write and use code that is compatible with
>> the target devices.
>>
>> He/she does not need to use macros/void* types; there are templates.
>>
> No. This doesn't solve the problem (because you still need to share the
> instantiations between the devices). Also, even if it did, it would not
> address the legacy-code problem that the feature is intended to address.
> The user already has classes and data on the host and wishes to access
> *parts* of that data on the device. We should make as much of that work
> as possible.
>

Design your code to be compatible with both the host and the device.


>> You cannot use classes that use types incompatible with the device.
>> There is a problem with the data layout on the device, and we just
>> don't know how to represent such classes on the device.
>>
> There's no reason for this to be true. To be clear, the model of a
> shared address space only makes sense, from a user perspective, if the
> data layout is the same between the host and the target. Not mostly
> similar, but the same. Otherwise, users will constantly be tracking down
> subtle data-layout incompatibilities.


That's why he/she must use only types that are supported and have the
same data layout on the host and on the device. Otherwise, there is a
data-layout incompatibility.

>
> Thanks again,
>
> Hal
>
>
>> -------------
>> Best regards,
>> Alexey Bataev
>> 17.01.2019 11:47, Finkel, Hal J. wrote:
>>> On 1/17/19 9:52 AM, Alexey Bataev wrote:
>>>> Because the type is not compatible with the target device.
>>> But it's not that simple. The situation is that the programming
>>> environment supports the type, but *operations* on that type are not
>>> supported in certain contexts (e.g., when compiled for a certain
>>> device). As you point out, we already need to move in this explicit
>>> direction by, for example, allowing typedefs for types that are not
>>> supported in all contexts, function declarations, and so on. In the end,
>>> we should allow our users to design their classes and abstractions using
>>> good software-engineering practice without worrying about access-context
>>> partitioning.
>>>
>>> Also, the other problem here is that the function I used as an example
>>> is a very common C++ idiom. There are a lot of classes with functions
>>> that return a reference to themselves. Classes can have lots of data
>>> members, and those members might not be accessed on the device (even if
>>> the class itself might be accessed on the device). We're moving to a
>>> world in which unified memory is common - the promise of this technology
>>> is that configuration data and complex data structures, which might be
>>> occasionally accessed (but for which explicitly managing data movement
>>> is not performance-relevant), are handled transparently. If use of these
>>> data structures is transitively poisoned by use of any type not
>>> supported on the device (including by pointers to types that use those
>>> types), then we'll force unhelpful and technically unnecessary
>>> refactoring, thus reducing the value of the feature.
>>>
>>> In the current implementation we pre-process the source twice, and so we
>>> can:
>>>
>>>  1. Use ifdefs to change the data members when compiling for different
>>> targets. This is hard to get right: to keep the rest of the data layout
>>> unchanged, the user needs to understand the layout rules well enough to
>>> substitute something that is supported on the target and preserves the
>>> layout exactly (this is very error prone). Also, if we move to a
>>> single-preprocessing-stage model, this no longer works.
>>>
>>>  2. Replace all pointers to the relevant types with void*, or similar,
>>> and use a lot of casts. This is also bad. (A rough sketch of both
>>> workarounds follows below.)
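>>>
>>> (A rough sketch of these two workarounds; DEVICE_PASS and the member
>>> names are hypothetical, purely for illustration:)
>>>
>>> // Workaround 1: #ifdef the member and hand-craft a stand-in that must
>>> // reproduce both the size and the alignment of __float128 exactly.
>>> struct Config {
>>>   int a;
>>> #ifdef DEVICE_PASS                  // hypothetical per-target macro
>>>   alignas(16) char b_storage[16];   // a wrong size or alignment here
>>>                                     // silently changes the layout
>>> #else
>>>   __float128 b;
>>> #endif
>>> };
>>>
>>> // Workaround 2: never name the offending type in device-visible code;
>>> // keep only a void* and cast back on the host at every point of use.
>>> struct Holder {
>>>   void *cfg;                        // really a Config *
>>> };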
>>>
>>> We shouldn't be forcing users to play these games. The compiler knows
>>> the layout on the host and it can use it on the target. The fact that
>>> some operations on some types might not be supported on the target is
>>> not relevant to handling pointers/references to containing types.
>>>
>>> Thanks again,
>>>
>>> Hal
>>>
>>>
>>>> -------------
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>> 17.01.2019 10:50, Finkel, Hal J. wrote:
>>>>> On 1/17/19 9:27 AM, Alexey Bataev wrote:
>>>>>> It should be compilable for the device only if function foo is not used
>>>>>> on the device.
>>>>> Says whom? I disagree. This function should work on the device. Why
>>>>> should it not?
>>>>>
>>>>>  -Hal
>>>>>
>>>>>
>>>>>> -------------
>>>>>> Best regards,
>>>>>> Alexey Bataev
>>>>>>
>>>>>> 17.01.2019 10:24, Finkel, Hal J. wrote:
>>>>>>> On 1/17/19 4:05 AM, Alexey Bataev wrote:
>>>>>>>> Best regards,
>>>>>>>> Alexey Bataev
>>>>>>>>
>>>>>>>>> On 17 Jan 2019, at 0:46, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 1/16/19 8:45 AM, Alexey Bataev wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, I thought about this. But we need to delay the diagnostic until
>>>>>>>>>> the CodeGen phase. What I need is a way to associate the diagnostic
>>>>>>>>>> with the function so that this diagnostic is available in CodeGen.
>>>>>>>>>>
>>>>>>>>>> Also, we need to postpone the diagnostics not only for functions,
>>>>>>>>>> but, for example, for some types. For example, the __float128 type is
>>>>>>>>>> not supported by CUDA. We can get error messages when we run into
>>>>>>>>>> something like `typedef __float128 SomeOtherType` (say, in some system
>>>>>>>>>> header files) and get the error diagnostic when we compile for the
>>>>>>>>>> device. Even though the type is not actually used in the device code,
>>>>>>>>>> the diagnostic is still emitted; we need to delay it as well and emit
>>>>>>>>>> it only if the type is used in the device code.
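>>>>>>>>>>
>>>>>>>>>> (A minimal illustration of that situation; plain_add is a
>>>>>>>>>> hypothetical function used only for this sketch:)
>>>>>>>>>>
>>>>>>>>>> // In a host system header seen by both compilation passes:
>>>>>>>>>> typedef __float128 SomeOtherType;  // today: error when compiling
>>>>>>>>>>                                    // for a device without
>>>>>>>>>>                                    // __float128 support
>>>>>>>>>>
>>>>>>>>>> // Device code that never mentions SomeOtherType should still build:
>>>>>>>>>> #pragma omp declare target
>>>>>>>>>> int plain_add(int x, int y) { return x + y; }
>>>>>>>>>> #pragma omp end declare target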
>>>>>>>>>>
>>>>>>>>> This should be fixed for CUDA too, right?
>>>>>>>>>
>>>>>>>>> Also, we still get to have pointers to aggregates containing those types
>>>>>>>>> on the device, right?
>>>>>>>>>
>>>>>>>> No, why? This is not allowed and should be diagnosed too. If somebody somehow tries to use a disallowed type for device variables/functions, it should be diagnosed.
>>>>>>> Because this should be allowed. If I have:
>>>>>>>
>>>>>>> struct X {
>>>>>>>   int a;
>>>>>>>   __float128 b;
>>>>>>> };
>>>>>>>
>>>>>>> and we have some function which does this:
>>>>>>>
>>>>>>> X *foo(X *x) {
>>>>>>>   return x;
>>>>>>> }
>>>>>>>
>>>>>>> We'll certainly want this function to compile for all targets, even if
>>>>>>> there's no __float128 support on some accelerator. The whole model only
>>>>>>> really makes sense if the accelerator shares the aggregate-layout rules
>>>>>>> of the host, and this is a needless hassle for users if this causes an
>>>>>>> error (especially in a unified-memory environment where configuration
>>>>>>> data structures, etc. are shared between devices).
>>>>>>>
>>>>>>> Thanks again,
>>>>>>>
>>>>>>> Hal
>>>>>>>
>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>>
>>>>>>>>> Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -------------
>>>>>>>>>> Best regards,
>>>>>>>>>> Alexey Bataev
>>>>>>>>>> 15.01.2019 17:33, John McCall wrote:
>>>>>>>>>>>> On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
>>>>>>>>>>>> This is not only for asm; we need to delay all target-specific
>>>>>>>>>>>> diagnostics.
>>>>>>>>>>>> I'm not saying that we need to move the host diagnostic, only the
>>>>>>>>>>>> diagnostic for the device compilation.
>>>>>>>>>>>> As for CUDA, it is a little bit different. In CUDA the programmer
>>>>>>>>>>>> must explicitly mark the device functions, while in OpenMP it must
>>>>>>>>>>>> be done implicitly. Thus, we cannot reuse the solution used for CUDA.
>>>>>>>>>>> All it means is that you can't just use the solution used for CUDA
>>>>>>>>>>> "off the shelf".  The basic idea of associating diagnostics with the
>>>>>>>>>>> current function and then emitting those diagnostics later when you
>>>>>>>>>>> realize that you have to emit that function is still completely
>>>>>>>>>>> applicable.
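>>>>>>>>>>>
>>>>>>>>>>> (Very roughly, something along these lines; the names below are
>>>>>>>>>>> hypothetical and this is only a sketch of the idea, not actual
>>>>>>>>>>> Clang code:)
>>>>>>>>>>>
>>>>>>>>>>> #include "clang/AST/Decl.h"
>>>>>>>>>>> #include "clang/Basic/Diagnostic.h"
>>>>>>>>>>> #include "llvm/ADT/DenseMap.h"
>>>>>>>>>>> #include <vector>
>>>>>>>>>>> using namespace clang;
>>>>>>>>>>>
>>>>>>>>>>> // Sema side: record the target-specific diagnostic against the
>>>>>>>>>>> // function being analyzed instead of emitting it right away.
>>>>>>>>>>> struct DelayedDiag { SourceLocation Loc; unsigned DiagID; };
>>>>>>>>>>> static llvm::DenseMap<const FunctionDecl *,
>>>>>>>>>>>                       std::vector<DelayedDiag>> DelayedDiags;
>>>>>>>>>>>
>>>>>>>>>>> static void noteDelayedDiag(const FunctionDecl *FD,
>>>>>>>>>>>                             SourceLocation Loc, unsigned DiagID) {
>>>>>>>>>>>   DelayedDiags[FD].push_back({Loc, DiagID});
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> // CodeGen side: only when the function really is emitted for the
>>>>>>>>>>> // device do we replay whatever was recorded for it.
>>>>>>>>>>> static void emitDelayedDiags(const FunctionDecl *FD,
>>>>>>>>>>>                              DiagnosticsEngine &Diags) {
>>>>>>>>>>>   for (const DelayedDiag &D : DelayedDiags.lookup(FD))
>>>>>>>>>>>     Diags.Report(D.Loc, D.DiagID);
>>>>>>>>>>> }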
>>>>>>>>>>>
>>>>>>>>>>> John.
>>>>>>>>> -- 
>>>>>>>>> Hal Finkel
>>>>>>>>> Lead, Compiler Technology and Programming Languages
>>>>>>>>> Leadership Computing Facility
>>>>>>>>> Argonne National Laboratory
>>>>>>>>>