[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

Jonas Hahnfeld via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 8 06:39:07 PDT 2018


Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192134, @gtbercea wrote:

> This patch is concerned with calling device functions when you're on the device. The correctness issues you mention are orthogonal to this and should be handled by another patch. I don't think this patch should be held up any longer.


I'm confused at this point; could you please highlight the point that I'm missing?

IIRC you started working on this to fix the problem with inline assembly (see https://reviews.llvm.org/D47849#1125019). AFAICS this patch fixes the declarations of math functions, but you still cannot include `math.h`, which most "correct" codes do.
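For reference, this is the kind of minimal test case I have in mind (a hypothetical reproducer, not taken from the patch; the file name and compile line below are only for illustration):

    /* repro.c -- hypothetical example: "correct" user code includes math.h
       and calls a math function inside an OpenMP target region. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
      double x = 0.0;
    #pragma omp target map(tofrom : x)
      {
        x = sqrt(2.0); /* should resolve to a device-native call on NVPTX */
      }
      printf("%f\n", x);
      return 0;
    }

Built with something like `clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda repro.c`, this is what I would expect to work before we can call the underlying problem fixed.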

In https://reviews.llvm.org/D47849#1170670, @tra wrote:

> The rumors of "high performance" functions in the libdevice are somewhat exaggerated, IMO. If you take a look at the IR in the libdevice of recent CUDA version, you will see that a lot of the functions just call their llvm counterpart. If it turns out that in some case llvm generates slower code than what nvidia provides, I'm sure it will be possible to implement a reasonably fast replacement.


So regarding performance, it's not yet clear to me which cases actually benefit: is there a particular function that is slow when LLVM's backend resolves the call, compared to the wrapper script calling libdevice directly?
If I understand @tra's comment correctly, I think we should have clear evidence (i.e., a small "benchmark") that this patch actually improves performance.
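As a starting point, I am thinking of something along these lines (a rough sketch only; the choice of `pow`, the array size, and timing via `omp_get_wtime` are placeholders, not a prescribed methodology):

    /* bench.c -- hypothetical microbenchmark: time a target region dominated
       by one math function, built once with this patch and once with the
       current wrapper-header approach, and compare the two runtimes. */
    #include <math.h>
    #include <omp.h>
    #include <stdio.h>

    #define N (1 << 24)
    static double a[N];

    int main(void) {
      double t0 = omp_get_wtime();
    #pragma omp target teams distribute parallel for map(tofrom : a[0:N])
      for (int i = 0; i < N; ++i)
        a[i] = pow((double)i, 1.5); /* math call under test */
      double t1 = omp_get_wtime();
      printf("pow over %d elements: %f s (checksum %f)\n",
             N, t1 - t0, a[N - 1]);
      return 0;
    }

If the libdevice path is measurably faster for some function, numbers like these would settle the question.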


Repository:
  rC Clang

https://reviews.llvm.org/D47849
