[cfe-dev] [RFC] Delayed target-specific diagnostic when compiling for the devices.

Mon Feb 25 10:38:59 PST 2019

Hi Hal, I made a simple reproducer, and it seems to me that float128 is
already emitted correctly for the NVPTX target. I just need to check the
alignment and the record layout. This raises another question: what if
the memory layout of some types differs between the host and the device?
Should we use the host memory layout when we compile for the device?
That may lead to problems with memory accesses on the device. Maybe we
should emit an error message if the type layout differs between the host
and the device?
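
A minimal sketch of the kind of layout check I have in mind, written as plain C++ rather than as compiler code. The names HostX, DeviceX, OpaqueF128 and layouts_match are hypothetical and only illustrate the idea: the device-side record replaces the unsupported member with opaque storage of the same size and alignment, and any mismatch in size, alignment or member offset is reported as an error. It assumes a host compiler that defines __SIZEOF_FLOAT128__ when __float128 is available, and falls back to a 16-byte stand-in otherwise.

```cpp
#include <cstddef>

// Host-side member type. Assumption: __SIZEOF_FLOAT128__ is defined when the
// compiler supports __float128; otherwise use a 16-byte, 16-byte-aligned blob.
#if defined(__SIZEOF_FLOAT128__)
using HostF128 = __float128;
#else
struct alignas(16) HostF128 { unsigned char bytes[16]; };
#endif

// The record as the user wrote it for the host.
struct HostX {
  int a;
  HostF128 b;
};

// Hypothetical device-side stand-in: opaque bytes with the host type's size
// and alignment, so the member can be laid out but never used in expressions.
struct alignas(alignof(HostF128)) OpaqueF128 {
  unsigned char bytes[sizeof(HostF128)];
};

struct DeviceX {
  int a;
  OpaqueF128 b;
};

// A layout difference becomes a hard error, as suggested above.
static_assert(sizeof(HostX) == sizeof(DeviceX), "record size differs");
static_assert(alignof(HostX) == alignof(DeviceX), "record alignment differs");
static_assert(offsetof(HostX, b) == offsetof(DeviceX, b), "member offset differs");

// Passing a pointer through never touches the unsupported member.
DeviceX *foo(DeviceX *x) { return x; }

// Runtime form of the same comparison, for use in tests.
bool layouts_match() {
  return sizeof(HostX) == sizeof(DeviceX) &&
         alignof(HostX) == alignof(DeviceX) &&
         offsetof(HostX, b) == offsetof(DeviceX, b);
}
```

In a real implementation the comparison would be between the layouts computed for the two targets, not between two hand-written structs. The sketch only shows which properties would have to agree.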

-------------
Best regards,
Alexey Bataev

25.02.2019 13:33, Finkel, Hal J. wrote:
>
> Hi, Alexey,
>
> Thanks again for continuing to work on this. Yes, this is what we had
> in mind. Regarding how to do it, I *think* that you need to emit it,
> essentially, as an i128. However, it occurs to me that it might not be
> that simple. If the alignment of float128 is the same as that of i128,
> then I think that a single i128 will do. If the alignment of float128
> is less than that of i128, then you need to emit a series of smaller
> fields to match the layout based on the alignment of the unsupported type.
>
>  -Hal
>
> On 2/25/19 12:10 PM, Alexey Bataev wrote:
>>
>> I'm not saying that this is "terribly different", I'm just asking how
>> to represent unsupported types. But it seems I was wrong about
>> codegen. The data types are emitted correctly, but we cannot use them
>> in any expression.
>>
>> -------------
>> Best regards,
>> Alexey Bataev
>> 25.02.2019 12:53, Justin Lebar wrote:
>>> > Hi Justin, no, the idea was different. Hal and David want to be
>>> > able to compile the unsupported types, but emit errors for
>>> > expressions with the unsupported types. It is required for unified
>>> > memory support. I.e., if the type is used in, say, a structure or
>>> > class, this class must be compiled correctly, even if the type is
>>> > not supported. Hal, David, is this your intention?
>>>
>>> To me that idea is not terribly different.
>>>
>>> It seems to me all we're saying is, in addition to the rule of "emit
>>> an error if required to codegen something we don't support", we add
>>> the rule of "learn to codegen structs that contain unsupported
>>> types".  This should be relatively simple so long as one never
>>> *uses* the type, right?
>>>
>>> On Mon, Feb 25, 2019 at 9:27 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>>>
>>>     Hi Justin, no, the idea was different. Hal and David want to be
>>>     able to compile the unsupported types, but emit errors for
>>>     expressions with the unsupported types. It is required for
>>>     unified memory support. I.e., if the type is used in, say, a
>>>     structure or class, this class must be compiled correctly, even
>>>     if the type is not supported. Hal, David, is this your intention?
>>>
>>>     -------------
>>>     Best regards,
>>>     Alexey Bataev
>>>
>>>     25.02.2019 12:24, Justin Lebar wrote:
>>>>     > I can try to disable emission of the error messages about
>>>>     > unsupported types for NVPTX devices, but how are we going to
>>>>     > emit them in the PTX format?
>>>>
>>>>     I thought the idea in this thread was: if we have to codegen
>>>>     something the backend doesn't support, then we emit an error.
>>>>
>>>>     That is, it's not "no errors for unsupported types," it's
>>>>     "deferred errors for unsupported types."
>>>>
>>>>     On Mon, Feb 25, 2019, 7:40 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>>>>
>>>>         Hi Hal, David, I have a question about the unsupported
>>>>         types. OK, I can try to disable emission of the error
>>>>         messages about unsupported types for NVPTX devices, but how
>>>>         are we going to emit them in the PTX format? PTX supports
>>>>         only the f16, f32 and f64 types. If we are going to enable
>>>>         the float128 type, for example, there is no way to emit it
>>>>         correctly for NVPTX. Any ideas how to do this? Currently, I
>>>>         think, it will just lead to incorrect codegen and cause a
>>>>         crash in the NVPTX backend.
>>>>
>>>>         -------------
>>>>         Best regards,
>>>>         Alexey Bataev
>>>>
>>>>         17.01.2019 12:40, Finkel, Hal J. wrote:
>>>>>         On 1/17/19 11:11 AM, Alexey Bataev wrote:
>>>>>>         The compiler does not know anything about the layout on the host when
>>>>>>         it compiles for the device.
>>>>>>
>>>>>         No, the compiler does know about the host layout (e.g., can't we
>>>>>         construct this by calling getContext().getAuxTargetInfo(), or similar?).
>>>>>
>>>>>
>>>>>>         We cannot do anything with the types that are not supported by the
>>>>>>         target device and we cannot use the layout from the host. And it is
>>>>>>         the user's responsibility to write and use code that is compatible
>>>>>>         with the target devices.
>>>>>>
>>>>>>         He/she does not need to use macros/void* types, there are templates.
>>>>>>
>>>>>         No. This doesn't solve the problem (because you still need to share the
>>>>>         instantiations between the devices). Also, even if it did, it would not
>>>>>         address the legacy-code problem that the feature is intended to address.
>>>>>         The user already has classes and data on the host and wishes to access
>>>>>         *parts* of that data on the device. We should make as much of that work
>>>>>         as possible.
>>>>>
>>>>>
>>>>>>         You cannot use classes, which use types incompatible with the device.
>>>>>>         There is a problem with the data layout on the device and we just
>>>>>>         don't know how to represent such classes on the device.
>>>>>>
>>>>>         There's no reason for this to be true. To be clear, the model of a
>>>>>         shared address space only makes sense, from a user perspective, if the
>>>>>         data layout is the same between the host and the target. Not mostly
>>>>>         similar, but the same. Otherwise, users will constantly be tracking down
>>>>>         subtle data-layout incompatibilities.
>>>>>
>>>>>         Thanks again,
>>>>>
>>>>>         Hal
>>>>>
>>>>>
>>>>>>         -------------
>>>>>>         Best regards,
>>>>>>         Alexey Bataev
>>>>>>         17.01.2019 11:47, Finkel, Hal J. wrote:
>>>>>>>         On 1/17/19 9:52 AM, Alexey Bataev wrote:
>>>>>>>>         Because the type is not compatible with the target device.
>>>>>>>         But it's not that simple. The situation is that the programming
>>>>>>>         environment supports the type, but *operations* on that type are not
>>>>>>>         supported in certain contexts (e.g., when compiled for a certain
>>>>>>>         device). As you point out, we already need to move in this explicit
>>>>>>>         direction by, for example, allowing typedefs for types that are not
>>>>>>>         supported in all contexts, function declarations, and so on. In the end,
>>>>>>>         we should allow our users to design their classes and abstractions using
>>>>>>>         good software-engineering practice without worrying about access-context
>>>>>>>         partitioning.
>>>>>>>
>>>>>>>         Also, the other problem here is that the function I used as an example
>>>>>>>         is a very common C++ idiom. There are a lot of classes with functions
>>>>>>>         that return a reference to themselves. Classes can have lots of data
>>>>>>>         members, and those members might not be accessed on the device (even if
>>>>>>>         the class itself might be accessed on the device). We're moving to a
>>>>>>>         world in which unified memory is common - the promise of this technology
>>>>>>>         is that configuration data and complex data structures, which might be
>>>>>>>         occasionally accessed (but for which explicitly managing data movement
>>>>>>>         is not performance relevant) are handled transparently. If use of these
>>>>>>>         data structures is transitively poisoned by use of any type not
>>>>>>>         supported on the device (including by pointers to types that use those
>>>>>>>         types), then we'll force unhelpful and technically-unnecessary
>>>>>>>         refactoring, thus reducing the value of the feature.
>>>>>>>
>>>>>>>         In the current implementation we pre-process the source twice, and so we
>>>>>>>         can:
>>>>>>>
>>>>>>>          1. Use ifdefs to change the data members when compiling for different
>>>>>>>         targets. This is hard to get right because, in order to keep the data
>>>>>>>         layout otherwise the same, the user needs to understand the layout rules
>>>>>>>         in order to put something in the structure that is supported on the
>>>>>>>         target and keeps the layout the same (this is very error prone). Also,
>>>>>>>         if we move to a single-preprocessing-stage model, this no longer works.
>>>>>>>
>>>>>>>          2. Replace all pointers to relevant types with void*, or similar, and
>>>>>>>         use a lot of casts. This is also bad.
>>>>>>>
>>>>>>>         We shouldn't be forcing users to play these games. The compiler knows
>>>>>>>         the layout on the host and it can use it on the target. The fact that
>>>>>>>         some operations on some types might not be supported on the target is
>>>>>>>         not relevant to handling pointers/references to containing types.
>>>>>>>
>>>>>>>         Thanks again,
>>>>>>>
>>>>>>>         Hal
>>>>>>>
>>>>>>>
>>>>>>>>         -------------
>>>>>>>>         Best regards,
>>>>>>>>         Alexey Bataev
>>>>>>>>
>>>>>>>>         17.01.2019 10:50, Finkel, Hal J. wrote:
>>>>>>>>>         On 1/17/19 9:27 AM, Alexey Bataev wrote:
>>>>>>>>>>         It should be compilable for the device only if function foo is not used
>>>>>>>>>>         on the device.
>>>>>>>>>         Says whom? I disagree. This function should work on the device. Why
>>>>>>>>>         should it not?
>>>>>>>>>
>>>>>>>>>          -Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>         -------------
>>>>>>>>>>         Best regards,
>>>>>>>>>>         Alexey Bataev
>>>>>>>>>>
>>>>>>>>>>         17.01.2019 10:24, Finkel, Hal J. wrote:
>>>>>>>>>>>         On 1/17/19 4:05 AM, Alexey Bataev wrote:
>>>>>>>>>>>>         Best regards,
>>>>>>>>>>>>         Alexey Bataev
>>>>>>>>>>>>
>>>>>>>>>>>>>         On 17 Jan 2019, at 0:46, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>         On 1/16/19 8:45 AM, Alexey Bataev wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         Yes, I thought about this. But we need to delay the diagnostic until
>>>>>>>>>>>>>>         the Codegen phase. What I need is the way to associate the diagnostic
>>>>>>>>>>>>>>         with the function so that this diagnostic is available in CodeGen.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         Also, we need to postpone the diagnostics not only for functions
>>>>>>>>>>>>>>         but also, for example, for some types. For example, the __float128
>>>>>>>>>>>>>>         type is not supported by CUDA. We can get error messages when we run
>>>>>>>>>>>>>>         into something like `typedef __float128 SomeOtherType` (say, in some
>>>>>>>>>>>>>>         system header files) while compiling for the device. Although this
>>>>>>>>>>>>>>         type is not actually used in the device code, the diagnostic is still
>>>>>>>>>>>>>>         emitted. We need to delay it too and emit it only if the type is
>>>>>>>>>>>>>>         used in the device code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>         This should be fixed for CUDA too, right?
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Also, we still get to have pointers to aggregates containing those types
>>>>>>>>>>>>>         on the device, right?
>>>>>>>>>>>>>
>>>>>>>>>>>>         No, why? This is not allowed and should be diagnosed too. If somebody somehow tries to use a disallowed type for device variables/functions, it should be diagnosed.
>>>>>>>>>>>         Because this should be allowed. If I have:
>>>>>>>>>>>
>>>>>>>>>>>         struct X {
>>>>>>>>>>>           int a;
>>>>>>>>>>>           __float128 b;
>>>>>>>>>>>         };
>>>>>>>>>>>
>>>>>>>>>>>         and we have some function which does this:
>>>>>>>>>>>
>>>>>>>>>>>         X *foo(X *x) {
>>>>>>>>>>>           return x;
>>>>>>>>>>>         }
>>>>>>>>>>>
>>>>>>>>>>>         We'll certainly want this function to compile for all targets, even if
>>>>>>>>>>>         there's no __float128 support on some accelerator. The whole model only
>>>>>>>>>>>         really makes sense if the accelerator shares the aggregate-layout rules
>>>>>>>>>>>         of the host, and this is a needless hassle for users if this causes an
>>>>>>>>>>>         error (especially in a unified-memory environment where configuration
>>>>>>>>>>>         data structures, etc. are shared between devices).
>>>>>>>>>>>
>>>>>>>>>>>         Thanks again,
>>>>>>>>>>>
>>>>>>>>>>>         Hal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>         Thanks again,
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Hal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>         -------------
>>>>>>>>>>>>>>         Best regards,
>>>>>>>>>>>>>>         Alexey Bataev
>>>>>>>>>>>>>>         15.01.2019 17:33, John McCall wrote:
>>>>>>>>>>>>>>>>         On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
>>>>>>>>>>>>>>>>         This is not only for asm, we need to delay all target-specific
>>>>>>>>>>>>>>>>         diagnostics.
>>>>>>>>>>>>>>>>         I'm not saying that we need to move the host diagnostic, only the
>>>>>>>>>>>>>>>>         diagnostic for the device compilation.
>>>>>>>>>>>>>>>>         As for CUDA, it is a little bit different. In CUDA the programmer
>>>>>>>>>>>>>>>>         must explicitly mark the device functions, while in OpenMP it must
>>>>>>>>>>>>>>>>         be done implicitly. Thus, we cannot reuse the solution used for CUDA.
>>>>>>>>>>>>>>>         All it means is that you can't just use the solution used for CUDA
>>>>>>>>>>>>>>>         "off the shelf".  The basic idea of associating diagnostics with the
>>>>>>>>>>>>>>>         current function and then emitting those diagnostics later when you
>>>>>>>>>>>>>>>         realize that you have to emit that function is still completely
>>>>>>>>>>>>>>>         applicable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         John.
>>>>>>>>>>>>>         -- 
>>>>>>>>>>>>>         Hal Finkel
>>>>>>>>>>>>>         Lead, Compiler Technology and Programming Languages
>>>>>>>>>>>>>         Leadership Computing Facility
>>>>>>>>>>>>>         Argonne National Laboratory
>>>>>>>>>>>>>
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
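
For reference, the basic idea John describes in the quoted thread (associate each diagnostic with the current function, then emit it only once that function has to be emitted) can be modelled with a small table. This is a toy sketch, not Clang's actual data structures: the class DeferredDiags and its methods defer and emitFunction are hypothetical names, and functions are identified by plain strings for simplicity.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of deferred diagnostics: messages are recorded per function
// during analysis and only surface when that function is actually emitted.
class DeferredDiags {
public:
  // Record a target-specific diagnostic against the function being analysed.
  void defer(const std::string &function, const std::string &message) {
    pending_[function].push_back(message);
  }

  // Called when codegen decides that `function` must be emitted for the
  // device. Returns the diagnostics to report now and forgets them, so each
  // message is reported at most once.
  std::vector<std::string> emitFunction(const std::string &function) {
    std::vector<std::string> out;
    auto it = pending_.find(function);
    if (it != pending_.end()) {
      out = std::move(it->second);
      pending_.erase(it);
    }
    return out;
  }

private:
  std::map<std::string, std::vector<std::string>> pending_;
};
```

A function that is never emitted for the device never has emitFunction called on it, so its recorded diagnostics stay silent. That is the behaviour wanted for host-only code that mentions __float128.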