[cfe-dev] [RFC] Delayed target-specific diagnostic when compiling for the devices.

Alexey Bataev via cfe-dev cfe-dev at lists.llvm.org
Mon Feb 25 10:10:15 PST 2019


I'm not saying that this is "terribly different", I'm just asking how to
represent unsupported types. But it seems I was wrong about codegen:
the data types are emitted correctly, but we cannot use them in any
expression.

-------------
Best regards,
Alexey Bataev

25.02.2019 12:53, Justin Lebar wrote:
> > Hi Justin, no, the idea was different. Hal and David want to be able
> to compile the unsupported types, but emit errors for expressions with
> the unsupported types. It is required for unified memory support.
> I.e., if the type is used in, say, a structure or class, this class
> must be compiled correctly, even if the type is not supported. Hal,
> David, is this your intention?
>
> To me that idea is not terribly different.
>
> It seems to me all we're saying is, in addition to the rule of "emit
> an error if required to codegen something we don't support", we add
> the rule of "learn to codegen structs that contain unsupported
> types".  This should be relatively simple so long as one never *uses*
> the type, right?
>
> On Mon, Feb 25, 2019 at 9:27 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>
>     Hi Justin, no, the idea was different. Hal and David want to be
>     able to compile the unsupported types, but emit errors for
>     expressions with the unsupported types. It is required for unified
>     memory support. I.e., if the type is used in, say, a structure or
>     class, this class must be compiled correctly, even if the type is
>     not supported. Hal, David, is this your intention?
>
>     -------------
>     Best regards,
>     Alexey Bataev
>
>     25.02.2019 12:24, Justin Lebar wrote:
>>     > I can try to disable emission of the error messages about
>>     unsupported types for NVPTX devices, but how are we going to emit
>>     them in the PTX format? 
>>
>>     I thought the idea in this thread was: if we have to codegen
>>     something the backend doesn't support, then we emit an error.
>>
>>     That is, it's not "no errors for unsupported types," it's
>>     "deferred errors for unsupported types."
>>
>>     On Mon, Feb 25, 2019, 7:40 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>>
>>         Hi Hal, David, I have a question about the unsupported types.
>>         Ok, I can try to disable emission of the error messages about
>>         unsupported types for NVPTX devices, but how are we going to
>>         emit them in the PTX format? PTX supports only the f16, f32,
>>         and f64 floating-point types. If we are going to enable the
>>         __float128 type, for example, there is no way to emit it for
>>         NVPTX correctly. Any ideas how to do this? Currently, I think,
>>         it will just lead to incorrect codegen and a crash in the
>>         NVPTX backend.
>>
>>         -------------
>>         Best regards,
>>         Alexey Bataev
>>
>>         17.01.2019 12:40, Finkel, Hal J. wrote:
>>>         On 1/17/19 11:11 AM, Alexey Bataev wrote:
>>>>         The compiler does not know anything about the layout on the host when
>>>>         it compiles for the device.
>>>>
>>>         No, the compiler does know about the host layout (e.g., can't we
>>>         construct this by calling getContext().getAuxTargetInfo(), or similar?).
>>>
>>>
>>>>         We cannot do anything with the types that are not supported by the
>>>>         target device and we cannot use the layout from the host. And it is
>>>>         the user's responsibility to write and use code that is compatible
>>>>         with the target devices.
>>>>
>>>>         He/she does not need to use macros/void* types, there are templates.
>>>>
>>>         No. This doesn't solve the problem (because you still need to share the
>>>         instantiations between the devices). Also, even if it did, it would not
>>>         address the legacy-code problem that the feature is intended to address.
>>>         The user already has classes and data on the host and wishes to access
>>>         *parts* of that data on the device. We should make as much of that work
>>>         as possible.
>>>
>>>
>>>>         You cannot use classes that use types incompatible with the device.
>>>>         There is a problem with the data layout on the device and we just
>>>>         don't know how to represent such classes on the device.
>>>>
>>>         There's no reason for this to be true. To be clear, the model of a
>>>         shared address space only makes sense, from a user perspective, if the
>>>         data layout is the same between the host and the target. Not mostly
>>>         similar, but the same. Otherwise, users will constantly be tracking down
>>>         subtle data-layout incompatibilities.
>>>
>>>         Thanks again,
>>>
>>>         Hal
>>>
>>>
>>>>         -------------
>>>>         Best regards,
>>>>         Alexey Bataev
>>>>         17.01.2019 11:47, Finkel, Hal J. wrote:
>>>>>         On 1/17/19 9:52 AM, Alexey Bataev wrote:
>>>>>>         Because the type is not compatible with the target device.
>>>>>         But it's not that simple. The situation is that the programming
>>>>>         environment supports the type, but *operations* on that type are not
>>>>>         supported in certain contexts (e.g., when compiled for a certain
>>>>>         device). As you point out, we already need to move in this explicit
>>>>>         direction by, for example, allowing typedefs for types that are not
>>>>>         supported in all contexts, function declarations, and so on. In the end,
>>>>>         we should allow our users to design their classes and abstractions using
>>>>>         good software-engineering practice without worrying about access-context
>>>>>         partitioning.
>>>>>
>>>>>         Also, the other problem here is that the function I used as an example
>>>>>         is a very common C++ idiom. There are a lot of classes with functions
>>>>>         that return a reference to themselves. Classes can have lots of data
>>>>>         members, and those members might not be accessed on the device (even if
>>>>>         the class itself might be accessed on the device). We're moving to a
>>>>>         world in which unified memory is common - the promise of this technology
>>>>>         is that configuration data and complex data structures, which might be
>>>>>         occasionally accessed (but for which explicitly managing data movement
>>>>>         is not performance-relevant) are handled transparently. If use of these
>>>>>         data structures is transitively poisoned by use of any type not
>>>>>         supported on the device (including by pointers to types that use those
>>>>>         types), then we'll force unhelpful and technically-unnecessary
>>>>>         refactoring, thus reducing the value of the feature.
>>>>>
>>>>>         In the current implementation we pre-process the source twice, and so we
>>>>>         can:
>>>>>
>>>>>          1. Use ifdefs to change the data members when compiling for different
>>>>>         targets. This is hard to get right because, to keep the data layout
>>>>>         otherwise the same, the user needs to understand the layout rules well
>>>>>         enough to put something in the structure that is supported on the
>>>>>         target and keeps the layout the same (this is very error-prone). Also,
>>>>>         if we move to a single-preprocessing-stage model, this no longer works.
>>>>>
>>>>>          2. Replace all pointers to relevant types with void*, or similar, and
>>>>>         use a lot of casts. This is also bad.
>>>>>
>>>>>         We shouldn't be forcing users to play these games. The compiler knows
>>>>>         the layout on the host and it can use it on the target. The fact that
>>>>>         some operations on some types might not be supported on the target is
>>>>>         not relevant to handling pointers/references to containing types.
>>>>>
>>>>>         Thanks again,
>>>>>
>>>>>         Hal
>>>>>
>>>>>
>>>>>>         -------------
>>>>>>         Best regards,
>>>>>>         Alexey Bataev
>>>>>>
>>>>>>         17.01.2019 10:50, Finkel, Hal J. wrote:
>>>>>>>         On 1/17/19 9:27 AM, Alexey Bataev wrote:
>>>>>>>>         It should be compilable for the device only iff function foo is not used
>>>>>>>>         on the device.
>>>>>>>         Says whom? I disagree. This function should work on the device. Why
>>>>>>>         should it not?
>>>>>>>
>>>>>>>          -Hal
>>>>>>>
>>>>>>>
>>>>>>>>         -------------
>>>>>>>>         Best regards,
>>>>>>>>         Alexey Bataev
>>>>>>>>
>>>>>>>>         17.01.2019 10:24, Finkel, Hal J. wrote:
>>>>>>>>>         On 1/17/19 4:05 AM, Alexey Bataev wrote:
>>>>>>>>>>         Best regards,
>>>>>>>>>>         Alexey Bataev
>>>>>>>>>>
>>>>>>>>>>>         On 17 Jan 2019, at 0:46, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>         On 1/16/19 8:45 AM, Alexey Bataev wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>         Yes, I thought about this. But we need to delay the diagnostic until
>>>>>>>>>>>>         the Codegen phase. What I need is the way to associate the diagnostic
>>>>>>>>>>>>         with the function so that this diagnostic is available in CodeGen.
>>>>>>>>>>>>
>>>>>>>>>>>>         Also, we need to postpone the diagnostics not only for functions
>>>>>>>>>>>>         but, for example, for some types. For example, the __float128 type
>>>>>>>>>>>>         is not supported by CUDA. We can get error messages when we run into
>>>>>>>>>>>>         something like `typedef __float128 SomeOtherType` (say, in some system
>>>>>>>>>>>>         header files) and get the error diagnostic when we compile for the
>>>>>>>>>>>>         device. Even though this type is not actually used in the device code,
>>>>>>>>>>>>         the diagnostic is still emitted; we need to delay it too and emit it
>>>>>>>>>>>>         only iff the type is used in the device code.
>>>>>>>>>>>>
>>>>>>>>>>>         This should be fixed for CUDA too, right?
>>>>>>>>>>>
>>>>>>>>>>>         Also, we still get to have pointers to aggregates containing those types
>>>>>>>>>>>         on the device, right?
>>>>>>>>>>>
>>>>>>>>>>         No, why? This is not allowed and should be diagnosed too. If somebody somehow tries to use a disallowed type for device variables/functions, it should be diagnosed.
>>>>>>>>>         Because this should be allowed. If I have:
>>>>>>>>>
>>>>>>>>>         struct X {
>>>>>>>>>           int a;
>>>>>>>>>           __float128 b;
>>>>>>>>>         };
>>>>>>>>>
>>>>>>>>>         and we have some function which does this:
>>>>>>>>>
>>>>>>>>>         X *foo(X *x) {
>>>>>>>>>           return x;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         We'll certainly want this function to compile for all targets, even if
>>>>>>>>>         there's no __float128 support on some accelerator. The whole model only
>>>>>>>>>         really makes sense if the accelerator shares the aggregate-layout rules
>>>>>>>>>         of the host, and this is a needless hassle for users if this causes an
>>>>>>>>>         error (especially in a unified-memory environment where configuration
>>>>>>>>>         data structures, etc. are shared between devices).
>>>>>>>>>
>>>>>>>>>         Thanks again,
>>>>>>>>>
>>>>>>>>>         Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>         Thanks again,
>>>>>>>>>>>
>>>>>>>>>>>         Hal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>         -------------
>>>>>>>>>>>>         Best regards,
>>>>>>>>>>>>         Alexey Bataev
>>>>>>>>>>>>         15.01.2019 17:33, John McCall wrote:
>>>>>>>>>>>>>>         On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
>>>>>>>>>>>>>>         This is not only for asm, we need to delay all target-specific
>>>>>>>>>>>>>>         diagnostics.
>>>>>>>>>>>>>>         I'm not saying that we need to move the host diagnostic, only the
>>>>>>>>>>>>>>         diagnostic for the device compilation.
>>>>>>>>>>>>>>         As for CUDA, it is a little bit different. In CUDA the programmer
>>>>>>>>>>>>>>         must explicitly mark the device functions, while in OpenMP it must
>>>>>>>>>>>>>>         be done implicitly. Thus, we cannot reuse the solution used for CUDA.
>>>>>>>>>>>>>         All it means is that you can't just use the solution used for CUDA
>>>>>>>>>>>>>         "off the shelf".  The basic idea of associating diagnostics with the
>>>>>>>>>>>>>         current function and then emitting those diagnostics later when you
>>>>>>>>>>>>>         realize that you have to emit that function is still completely
>>>>>>>>>>>>>         applicable.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         John.
>>>>>>>>>>>         -- 
>>>>>>>>>>>         Hal Finkel
>>>>>>>>>>>         Lead, Compiler Technology and Programming Languages
>>>>>>>>>>>         Leadership Computing Facility
>>>>>>>>>>>         Argonne National Laboratory
>>>>>>>>>>>