[cfe-dev] [RFC] Delayed target-specific diagnostic when compiling for the devices.

Doerfert, Johannes via cfe-dev cfe-dev at lists.llvm.org
Mon Feb 25 08:11:10 PST 2019


I don't know if this was already discussed, but I recently stumbled over
a "use case" that does not even require us to generate code. When I
built the MiniFE proxy app [1], which includes the <string> header, the
compilation tried to determine properties of __float128 (I forgot where
exactly). I had to manually add "#undef __GLIBCXX_USE_FLOAT128" to "fix"
it, but that is certainly not a general solution.

[1] https://proxyapps.exascaleproject.org/app/minife/
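
To make the idea concrete, here is a minimal toy sketch of the "delay
the diagnostic until we know the function is emitted" scheme discussed
in the thread below. It is not how Clang implements it: the names
DeferredDiagnostics, defer and flush are hypothetical and exist only for
illustration. A diagnostic is attached to the function in which the
unsupported construct is used, and it is reported only if that function
ends up in the set of functions emitted for the device. A header that
merely mentions __float128 records nothing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these names are Clang APIs.
struct DeferredDiagnostics {
  // Diagnostics recorded per function name while its body is analysed.
  std::map<std::string, std::vector<std::string>> pending;

  // Remember a diagnostic instead of reporting it immediately.
  void defer(const std::string &function, const std::string &message) {
    pending[function].push_back(message);
  }

  // Return the diagnostics of exactly those functions that are emitted
  // for the device; everything else stays silent.
  std::vector<std::string> flush(const std::set<std::string> &emitted) const {
    std::vector<std::string> out;
    for (const auto &entry : pending)
      if (emitted.count(entry.first))
        out.insert(out.end(), entry.second.begin(), entry.second.end());
    return out;
  }
};

// Usage sketch: two functions use an unsupported type, only one of them
// is emitted for the device, so only its diagnostic is reported.
inline std::vector<std::string> runExample() {
  DeferredDiagnostics diags;
  diags.defer("device_kernel", "__float128 is not supported on this target");
  diags.defer("host_helper", "__float128 is not supported on this target");
  std::vector<std::string> reported = diags.flush({"device_kernel"});
  assert(reported.size() == 1);
  return reported;
}
```

In this model a typedef in a system header never calls defer at all, so
it cannot produce an error on its own.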


On 02/25, Alexey Bataev wrote:
> Hi Hal, David, I have a question about the unsupported types. OK, I can
> try to disable emission of the error messages about unsupported types for
> NVPTX devices, but how are we going to emit them in the PTX format? PTX
> supports only the f16, f32 and f64 types. If we are going to enable the
> float128 type, for example, there is no way to emit it for NVPTX
> correctly. Any ideas how to do this? Because currently, I think, it will
> just lead to incorrect codegen and cause a crash in the NVPTX backend.
> 
> -------------
> Best regards,
> Alexey Bataev
> 
> 17.01.2019 12:40, Finkel, Hal J. wrote:
> > On 1/17/19 11:11 AM, Alexey Bataev wrote:
> >> The compiler does not know anything about the layout on the host when
> >> it compiles for the device.
> >>
> > No, the compiler does know about the host layout (e.g., can't we
> > construct this by calling getContext().getAuxTargetInfo(), or similar?).
> >
> >
> >> We cannot do anything with the types that are not supported by the
> >> target device and we cannot use the layout from the host. And it is
> >> the user's responsibility to write and use code that is compatible
> >> with the target devices.
> >>
> >> He/she does not need to use macros/void* types, there are templates.
> >>
> > No. This doesn't solve the problem (because you still need to share the
> > instantiations between the devices). Also, even if it did, it would not
> > address the legacy-code problem that the feature is intended to address.
> > The user already has classes and data on the host and wishes to access
> > *parts* of that data on the device. We should make as much of that work
> > as possible.
> >
> >
> >> You cannot use classes, which use types incompatible with the device.
> >> There is a problem with the data layout on the device and we just
> >> don't know how to represent such classes on the device.
> >>
> > There's no reason for this to be true. To be clear, the model of a
> > shared address space only makes sense, from a user perspective, if the
> > data layout is the same between the host and the target. Not mostly
> > similar, but the same. Otherwise, users will constantly be tracking down
> > subtle data-layout incompatibilities.
> >
> > Thanks again,
> >
> > Hal
> >
> >
> >> -------------
> >> Best regards,
> >> Alexey Bataev
> >> 17.01.2019 11:47, Finkel, Hal J. wrote:
> >>> On 1/17/19 9:52 AM, Alexey Bataev wrote:
> >>>> Because the type is not compatible with the target device.
> >>> But it's not that simple. The situation is that the programming
> >>> environment supports the type, but *operations* on that type are not
> >>> supported in certain contexts (e.g., when compiled for a certain
> >>> device). As you point out, we already need to move in this explicit
> >>> direction by, for example, allowing typedefs for types that are not
> >>> supported in all contexts, function declarations, and so on. In the end,
> >>> we should allow our users to design their classes and abstractions using
> >>> good software-engineering practice without worrying about access-context
> >>> partitioning.
> >>>
> >>> Also, the other problem here is that the function I used as an example
> >>> is a very common C++ idiom. There are a lot of classes with functions
> >>> that return a reference to themselves. Classes can have lots of data
> >>> members, and those members might not be accessed on the device (even if
> >>> the class itself might be accessed on the device). We're moving to a
> >>> world in which unified memory is common - the promise of this technology
> >>> is that configuration data and complex data structures, which might be
> >>> occasionally accessed (but for which explicitly managing data movement
> >>> is not performance relevant) are handled transparently. If use of these
> >>> data structures is transitively poisoned by use of any type not
> >>> supported on the device (including by pointers to types that use those
> >>> types), then we'll force unhelpful and technically-unnecessary
> >>> refactoring, thus reducing the value of the feature.
> >>>
> >>> In the current implementation we pre-process the source twice, and so we
> >>> can:
> >>>
> >>>  1. Use ifdefs to change the data members when compiling for different
> >>> targets. This is hard to get right because, in order to keep the data
> >>> layout otherwise the same, the user needs to understand the layout rules
> >>> in order to put something in the structure that is supported on the
> >>> target and keeps the layout the same (this is very error prone). Also,
> >>> if we move to a single-preprocessing-stage model, this no longer works.
> >>>
> >>>  2. Replace all pointers to relevant types with void*, or similar, and
> >>> use a lot of casts. This is also bad.
> >>>
> >>> We shouldn't be forcing users to play these games. The compiler knows
> >>> the layout on the host and it can use it on the target. The fact that
> >>> some operations on some types might not be supported on the target is
> >>> not relevant to handling pointers/references to containing types.
> >>>
> >>> Thanks again,
> >>>
> >>> Hal
> >>>
> >>>
> >>>> -------------
> >>>> Best regards,
> >>>> Alexey Bataev
> >>>>
> >>>> 17.01.2019 10:50, Finkel, Hal J. wrote:
> >>>>> On 1/17/19 9:27 AM, Alexey Bataev wrote:
> >>>>>> It should be compilable for the device only iff function foo is not used
> >>>>>> on the device.
> >>>>> Says whom? I disagree. This function should work on the device. Why
> >>>>> should it not?
> >>>>>
> >>>>>  -Hal
> >>>>>
> >>>>>
> >>>>>> -------------
> >>>>>> Best regards,
> >>>>>> Alexey Bataev
> >>>>>>
> >>>>>> 17.01.2019 10:24, Finkel, Hal J. wrote:
> >>>>>>> On 1/17/19 4:05 AM, Alexey Bataev wrote:
> >>>>>>>> Best regards,
> >>>>>>>> Alexey Bataev
> >>>>>>>>
> >>>>>>>>> On 17 Jan 2019, at 0:46, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 1/16/19 8:45 AM, Alexey Bataev wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes, I thought about this. But we need to delay the diagnostic until
> >>>>>>>>>> the Codegen phase. What I need is the way to associate the diagnostic
> >>>>>>>>>> with the function so that this diagnostic is available in CodeGen.
> >>>>>>>>>>
> >>>>>>>>>> Also, we need to postpone the diagnostics not only for functions
> >>>>>>>>>> but also, for example, for some types. For example, the __float128
> >>>>>>>>>> type is not supported by CUDA. We can run into something like
> >>>>>>>>>> `typedef __float128 SomeOtherType` (say, in some system header
> >>>>>>>>>> files) and get an error diagnostic when we compile for the
> >>>>>>>>>> device. Although this type is not actually used in the device code,
> >>>>>>>>>> the diagnostic is still emitted. We need to delay it too and emit
> >>>>>>>>>> it only if the type is used in the device code.
> >>>>>>>>>>
> >>>>>>>>> This should be fixed for CUDA too, right?
> >>>>>>>>>
> >>>>>>>>> Also, we still get to have pointers to aggregates containing those types
> >>>>>>>>> on the device, right?
> >>>>>>>>>
> >>>>>>>> No, why? This is not allowed and should be diagnosed too. If somebody somehow tries to use a disallowed type for device variables/functions, it should be diagnosed.
> >>>>>>> Because this should be allowed. If I have:
> >>>>>>>
> >>>>>>> struct X {
> >>>>>>>   int a;
> >>>>>>>   __float128 b;
> >>>>>>> };
> >>>>>>>
> >>>>>>> and we have some function which does this:
> >>>>>>>
> >>>>>>> X *foo(X *x) {
> >>>>>>>   return x;
> >>>>>>> }
> >>>>>>>
> >>>>>>> We'll certainly want this function to compile for all targets, even if
> >>>>>>> there's no __float128 support on some accelerator. The whole model only
> >>>>>>> really makes sense if the accelerator shares the aggregate-layout rules
> >>>>>>> of the host, and this is a needless hassle for users if this causes an
> >>>>>>> error (especially in a unified-memory environment where configuration
> >>>>>>> data structures, etc. are shared between devices).
> >>>>>>>
> >>>>>>> Thanks again,
> >>>>>>>
> >>>>>>> Hal
> >>>>>>>
> >>>>>>>
> >>>>>>>>> Thanks again,
> >>>>>>>>>
> >>>>>>>>> Hal
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -------------
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Alexey Bataev
> >>>>>>>>>> 15.01.2019 17:33, John McCall wrote:
> >>>>>>>>>>>> On 15 Jan 2019, at 17:20, Alexey Bataev wrote:
> >>>>>>>>>>>> This is not only for asm, we need to delay all target-specific
> >>>>>>>>>>>> diagnostics.
> >>>>>>>>>>>> I'm not saying that we need to move the host diagnostic, only the
> >>>>>>>>>>>> diagnostic for the device compilation.
> >>>>>>>>>>>> As for CUDA, it is a little bit different. In CUDA the programmer
> >>>>>>>>>>>> must explicitly mark the device functions, while in OpenMP it must
> >>>>>>>>>>>> be done implicitly. Thus, we cannot reuse the solution used for CUDA.
> >>>>>>>>>>> All it means is that you can't just use the solution used for CUDA
> >>>>>>>>>>> "off the shelf".  The basic idea of associating diagnostics with the
> >>>>>>>>>>> current function and then emitting those diagnostics later when you
> >>>>>>>>>>> realize that you have to emit that function is still completely
> >>>>>>>>>>> applicable.
> >>>>>>>>>>>
> >>>>>>>>>>> John.
> >>>>>>>>> -- 
> >>>>>>>>> Hal Finkel
> >>>>>>>>> Lead, Compiler Technology and Programming Languages
> >>>>>>>>> Leadership Computing Facility
> >>>>>>>>> Argonne National Laboratory
> >>>>>>>>>
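
Hal's struct X example quoted above can be checked on a host compiler
with the small sketch below. It only illustrates the argument that
passing a pointer to X around needs the layout of X, not any operation
on the __float128 member. The alias Wide and the helper readA are
hypothetical names added for the example, and the fallback to long
double is an assumption made so that the code also builds on hosts
without __float128.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: use __float128 where the host compiler provides it and
// fall back to long double as a stand-in elsewhere.
#if defined(__SIZEOF_FLOAT128__)
using Wide = __float128;
#else
using Wide = long double;
#endif

struct X {
  int a;
  Wide b;
};

// Only the pointer value is handled; no operation on X::b is required.
X *foo(X *x) { return x; }

// Reads X::a through the pointer. This needs the offset of `a` and
// nothing about `b` beyond its size and alignment.
int readA(const X *x) { return x->a; }

// Pure layout facts: they can be stated without any arithmetic on Wide.
static_assert(offsetof(X, a) == 0, "a is the first member");
static_assert(offsetof(X, b) % alignof(Wide) == 0, "b is suitably aligned");
static_assert(sizeof(X) >= offsetof(X, b) + sizeof(Wide), "b fits inside X");
```

A target that lacks operations on the wide type would still need these
same offsets and sizes to access X::a consistently with the host.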




-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov