<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 7, 2019 at 1:22 PM Siva Chandra <<a href="mailto:sivachandra@google.com">sivachandra@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Hal,<br>

<br>

You had asked me a question about nvptx on<br>

<a href="https://reviews.llvm.org/D67867" rel="noreferrer" target="_blank">https://reviews.llvm.org/D67867</a>. I did some homework on that and below<br>

is what I have learnt.<br>

<br>

For CUDA/nvptx, a libc in general might be irrelevant. However, I<br>

learned from Art (copied in this email) that there is a desire to have<br>

a single library of math functions that clang can rely on for the GPU.<br>

So, even if a libc in general could be irrelevant, a subset of the<br>

libc might indeed become relevant for GPUs.<br>

<br>

We want llvm-libc to expose a thin layer of C symbols over the<br>

underlying C++ implementation library. My patch<br>

(<a href="https://reviews.llvm.org/rL373764" rel="noreferrer" target="_blank">https://reviews.llvm.org/rL373764</a>) showcased one way of doing this<br>

for ELF using the section attribute followed by a post-processing<br>

step. We might have to take a different approach for nvptx because<br>

ELF-like sections and tooling might not be feasible/available (as there is<br>

no linking phase during GPU-side compilation for NVIDIA GPUs). Art<br>

explained to me that device code undergoes whole program analysis by<br>

LLVM. Hence, we can provide an explicit C wrapper layer over the C++<br>

implementation library. If source-level wrappers are not desirable, we<br>

can consider using IR-level aliases (will we have to deal with mangled<br>

names?). This gives the benefit that, while it looks like a normal C<br>

function call from the user's point of view, the whole program<br>

analysis performed by LLVM will eliminate the additional wrapper call<br>

preventing any performance hits.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:verdana,sans-serif">We're currently using the <a href="https://github.com/llvm-mirror/clang/blob/master/lib/Headers/__clang_cuda_device_functions.h">wrappers in Clang headers</a>, so this proposal should not make things worse.</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">The #1 item on my wish list for the standard library is to have libm available to clang/llvm as a bitcode library. That would make it possible to re-enable lowering to various library calls in LLVM when we target NVPTX and, possibly, to avoid the rather precarious dependency on the binary libdevice bitcode blob that comes with the CUDA SDK.</div></div><div> </div><div><div class="gmail_default" style="font-family:verdana,sans-serif">AMDGPU folks are also using <a href="https://github.com/RadeonOpenCompute/ROCm-Device-Libs/tree/master/ocml/src">bitcode libraries</a>, so providing a standard math library as bitcode may benefit them, too.</div></div><div><br></div><div><div class="gmail_default" style="font-family:verdana,sans-serif">--Artem</div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Thanks,<br>

Siva Chandra<br>

</blockquote></div><br></div>
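<div>For illustration, the explicit source-level C wrapper layer discussed in the thread could look roughly like the sketch below. This is a minimal sketch, not the llvm-libc implementation: the namespace <code>impl_sketch</code> and the names <code>fabs_impl</code> and <code>sketch_fabs</code> are hypothetical, and the body is a simplified absolute value that ignores signed zero and NaN handling.</div>

```cpp
// Hypothetical C++ implementation namespace (name is an assumption,
// not taken from llvm-libc).
namespace impl_sketch {

// Simplified absolute value; a conforming fabs would also handle
// signed zero and NaN, which this sketch does not.
inline double fabs_impl(double x) { return x < 0 ? -x : x; }

} // namespace impl_sketch

// Thin C-linkage wrapper: callers see an unmangled C symbol, while the
// body simply forwards to the C++ implementation. When the compiler can
// see both definitions, it is free to inline the forwarding call.
extern "C" double sketch_fabs(double x) {
  return impl_sketch::fabs_impl(x);
}
```

<div>A caller would use <code>sketch_fabs(-2.5)</code> like any ordinary C function. Whether the wrapper call is actually removed depends on the optimization pipeline in use.</div>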