[cfe-dev] [RFC] Delayed target-specific diagnostic when compiling for the devices.

Thu Jan 17 07:20:08 PST 2019

On 1/17/19 4:01 AM, Alexey Bataev wrote:
> We cannot build the call graph during parsing. The function may be parsed already, but it may be called later.

Calling this "during parsing" may be a bit simplistic - it also
needs to deal with template instantiation, etc. However, the
parse-before-call ordering is not a problem: every function must be
declared before it is called, so we always have a declaration against
which to record the information (that seems to be the logic of the
current approach).
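
To make the ordering argument concrete, here is a toy model of recording
call edges at the point of use. It is only a sketch under stated
assumptions, not Clang's actual data structures: ToyCallGraph, recordUse
and reachableFrom are hypothetical names, and functions are identified by
plain strings rather than declarations.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a call graph that is filled in as uses are
// checked. An edge is recorded when the *call* is seen, so it does not
// matter that the callee's body was parsed earlier.
struct ToyCallGraph {
  std::map<std::string, std::set<std::string>> callees;

  // Record that `caller` references `callee` (the callee must already be
  // declared, which the language guarantees).
  void recordUse(const std::string &caller, const std::string &callee) {
    callees[caller].insert(callee);
  }

  // Collect every function reachable from the given roots, e.g. the
  // functions already known to be emitted for the device.
  std::set<std::string>
  reachableFrom(const std::vector<std::string> &roots) const {
    std::set<std::string> seen;
    std::vector<std::string> work(roots.begin(), roots.end());
    while (!work.empty()) {
      std::string fn = work.back();
      work.pop_back();
      if (!seen.insert(fn).second)
        continue; // already visited
      auto it = callees.find(fn);
      if (it == callees.end())
        continue; // no recorded callees
      for (const std::string &callee : it->second)
        work.push_back(callee);
    }
    return seen;
  }
};
```

In this model, a function whose body was seen long before any caller is
still picked up once some later caller records an edge to it; a function
that is never referenced from a root simply never shows up in the
reachable set.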

> I have an idea for how to reuse the existing CUDA solution for our purposes without a redesign or significant changes. I need some time to implement a prototype.
> Just one question, though. Should I reuse the existing CUDA classes/objects, or would it be better to add OpenMP-specific ones with almost the same behavior? Or should I just rename the existing CUDA classes to something more general, e.g. CudaDeferredDiags to something like DeviceDeferredDiags?

I definitely vote for renaming and generalizing the current
functionality unless there's some strong reason not to do so.
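
As a rough illustration of what "generalizing" could mean, the sketch
below keeps one table of deferred diagnostics per function and leaves the
question of which functions are known to be emitted to the caller. This is
an assumption-laden toy, not the existing Clang code: the class layout, the
defer/flushFor/pendingCount methods, and the string-keyed table are all
hypothetical, and only the name DeviceDeferredDiags comes from the
suggestion above.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical shared store for diagnostics that should only be emitted
// once a function is known to be compiled for the device.
class DeviceDeferredDiags {
public:
  // Remember a diagnostic against a function without emitting it yet.
  void defer(const std::string &fn, const std::string &message) {
    pending[fn].push_back(message);
  }

  // Emit (return) and drop the diagnostics of every function in
  // `knownEmitted`. How that set is computed is the model-specific part:
  // attributes for CUDA, reachability from target regions for OpenMP.
  std::vector<std::string>
  flushFor(const std::set<std::string> &knownEmitted) {
    std::vector<std::string> out;
    for (auto it = pending.begin(); it != pending.end();) {
      if (knownEmitted.count(it->first)) {
        for (const std::string &message : it->second)
          out.push_back(it->first + ": " + message);
        it = pending.erase(it); // erase returns the next valid iterator
      } else {
        ++it;
      }
    }
    return out;
  }

  // Number of functions that still have unemitted diagnostics.
  std::size_t pendingCount() const { return pending.size(); }

private:
  std::map<std::string, std::vector<std::string>> pending;
};
```

The point of the design is that nothing in the store itself is CUDA- or
OpenMP-specific; diagnostics for functions that never become device
functions just stay pending and are never reported.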

 -Hal

>
> Best regards,
> Alexey Bataev
>
>> On Jan 17, 2019, at 0:42, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>>
>>
>>> On 1/15/19 5:57 PM, Alexey Bataev wrote:
>>> Could you reference such functions in the initializer? Could you call them indirectly? Could you take their addresses?
>>
>> It seems like the answer is yes - the CUDACallGraph is built by
>> Sema::CheckCUDACall, and that's called by Sema::DiagnoseUseOfDecl.
>>
>>
>>> Or they just can be directly called from other functions?
>>>
>>> Best regards,
>>> Alexey Bataev
>>>
>>>> On Jan 15, 2019, at 18:16, Alexey Bataev <a.bataev at outlook.com> wrote:
>>>>
>>>> Because this kind of analysis is currently implemented in CodeGen, and CodeGen decides which functions should be emitted and which should not.
>>>> To implement it in Sema, we would need to reimplement almost everything from CodeGen, because we would need to analyze all the statements in all functions. That would significantly increase compilation time.
>>
>> Interestingly, it seems that this is exactly what is done for
>> host-device functions for CUDA. How much this increases compile time I
>> don't know, but given the way that the call graph is built during
>> initial parsing, it doesn't obviously incur the cost of a full second
>> walk. This seems better than waiting until CodeGen.
>>
>>  -Hal
>>
>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>>>> On Jan 15, 2019, at 18:01, John McCall <jmccall at apple.com> wrote:
>>>>>>
>>>>>> On 15 Jan 2019, at 17:58, Alexey Bataev wrote:
>>>>>> __host__ __device__ functions are still device functions, which means they must be emitted when you compile for the device: the user has explicitly marked them as device functions. In OpenMP, you cannot tell before the CodeGen phase whether a function is used on the device or not. We should not emit every available function, only those that are used (implicitly or explicitly, directly or indirectly) in target regions.
>>>>> I don't see why you couldn't do that analysis in Sema.
>>>>>
>>>>> John.
>> -- 
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory