<div dir="ltr"><div dir="ltr">> __host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device</div><div dir="ltr"><br></div><div>That is not the case for templated or inline __host__ __device__ functions.  They are explicitly not emitted for host/device unless they are called from a host/device context.  CUDA code relies heavily on this fact.  As a result, you are allowed to do "host-only" things from a __host__ __device__ function so long as it's not codegen'ed for device.  Similarly, you can do "device-only" things from a __host__ __device__ function so long as it's not codegen'ed for host.</div><div><br></div><div>The notion of "deferred diagnostics" in clang's CUDA support is explicitly there to handle the case where we do not yet know whether a __host__ __device__ function must be emitted for host or for device, and so we don't know whether to raise an error when you do a "wrong-side" thing (i.e. you're compiling for device and you did a host-only thing, or you're compiling for host and you did a device-only thing).</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jan 15, 2019 at 2:58 PM Alexey Bataev <<a href="mailto:a.bataev@outlook.com">a.bataev@outlook.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">__host__ __device__ functions are still device functions and it means that they must be emitted when you compile for the device. You know that the user marked those functions as device functions. In OpenMP, you cannot say before the codegen phase whether the function is used on the device or not. We should not emit all the functions available, only those that are used (implicitly or explicitly, directly or indirectly) in the target regions.<br>

<br>

Best regards,<br>

Alexey Bataev<br>

<br>

>> On 15 Jan 2019, at 17:34, John McCall <<a href="mailto:jmccall@apple.com" target="_blank">jmccall@apple.com</a>> wrote:<br>

>> <br>

>> On 15 Jan 2019, at 17:20, Alexey Bataev wrote:<br>

>> This is not only for asm, we need to delay all target-specific diagnostics.<br>

>> I'm not saying that we need to move the host diagnostic, only the diagnostic for the device compilation.<br>

>> As for CUDA, it is a little bit different. In CUDA the programmer must explicitly mark the device functions, while in OpenMP it must be done implicitly. Thus, we cannot reuse the solution used for CUDA.<br>

> <br>

> All it means is that you can't just use the solution used for CUDA "off the shelf".  The basic idea of associating diagnostics with the current function and then emitting those diagnostics later when you realize that you have to emit that function is still completely applicable.<br>

> <br>

> John.<br>

</blockquote></div>
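<div dir="ltr"><br></div><div dir="ltr">To make the "deferred diagnostics" idea concrete, here is a minimal toy sketch in plain C++. It is not clang's implementation: the class DeferredDiags, its methods, and the function names in demo() are all hypothetical, invented only for illustration. It shows diagnostics being attached to a function when it is parsed and reported only once that function is known to be emitted, either directly or because something that is emitted calls it.</div>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: every name here is hypothetical and is not clang's API.
class DeferredDiags {
public:
  // Attach a "wrong-side" diagnostic to a function without reporting it yet.
  void defer(const std::string &fn, const std::string &msg) {
    pending_[fn].push_back(msg);
  }

  // Record that `caller` references `callee`.
  void addCall(const std::string &caller, const std::string &callee) {
    calls_[caller].insert(callee);
  }

  // Called once we learn that `root` must be emitted for the current target.
  // Returns the diagnostics of `root` and of everything reachable from it.
  std::vector<std::string> markEmitted(const std::string &root) {
    std::vector<std::string> reported;
    std::vector<std::string> worklist{root};
    while (!worklist.empty()) {
      std::string cur = worklist.back();
      worklist.pop_back();
      if (!emitted_.insert(cur).second)
        continue; // already known to be emitted, diagnostics already flushed
      auto p = pending_.find(cur);
      if (p != pending_.end()) {
        for (const auto &msg : p->second)
          reported.push_back(cur + ": " + msg);
        pending_.erase(p);
      }
      auto c = calls_.find(cur);
      if (c != calls_.end())
        for (const auto &callee : c->second)
          worklist.push_back(callee);
    }
    return reported;
  }

private:
  std::map<std::string, std::vector<std::string>> pending_;
  std::map<std::string, std::set<std::string>> calls_;
  std::set<std::string> emitted_;
};

// Two functions do a host-only thing, but only one is reachable from the
// function we end up emitting for the device.
std::vector<std::string> demo() {
  DeferredDiags diags;
  diags.defer("hd_used", "host-only operation in device code");
  diags.defer("hd_unused", "host-only operation in device code");
  diags.addCall("kernel", "hd_used");
  std::vector<std::string> reported = diags.markEmitted("kernel");
  assert(reported.size() == 1); // hd_unused stays silent
  return reported;
}
```

<div dir="ltr">In this sketch, hd_unused never produces an error because nothing that gets emitted reaches it. An OpenMP variant would presumably differ mainly in how the roots are discovered (functions used in target regions rather than explicitly marked ones), which is the part that cannot be taken "off the shelf".</div>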