[PATCH] D19990: [CUDA] Implement __ldg using intrinsics.

Mon May 9 10:22:10 PDT 2016

jlebar added a comment.

Art pointed me to the fact that CUDA 8 adds a bunch more load intrinsics, and I said ohmygosh maybe we *do* want to do the variadic intrinsic thing here.

But now that I've looked at how __builtin_add_overflow is implemented, I see we'd need special Sema checking to make a variadic builtin work.  We would also need some sort of argument promotion logic to bring the value and the pointer to matching types.  In both cases it seems like it's better to leave this stuff to clang, rather than trying to write a (probably buggy) implementation ourselves?
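For reference, here is a minimal host-side sketch of what "type-generic" means for __builtin_add_overflow.  The helper names add_overflows and check_add_overflow are made up for illustration; only the builtin itself is real (Clang and GCC both provide it).  The three operands have different integer types, and checking that combination is the kind of work the compiler does specially for this builtin.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, for illustration only.
// Returns true when the mathematically exact a + b does not fit in *out.
// The operand types differ on purpose: int, long long, and short*.
inline bool add_overflows(int a, long long b, short *out) {
  return __builtin_add_overflow(a, b, out);
}

// Hypothetical self-check exercising both outcomes.
inline void check_add_overflow() {
  short r = 0;
  // 100 + 200 = 300 fits in a short: no overflow, result is stored.
  assert(!add_overflows(100, 200LL, &r) && r == 300);
  // INT_MAX + 1 cannot be represented in a short: overflow is reported.
  assert(add_overflows(INT_MAX, 1LL, &r));
}
```

A variadic load builtin would need comparable custom checking on the compiler side, which is the cost being weighed here.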

Even with the many new load intrinsics, listing them all is a relatively small part of the code required.  Most of the necessary code is in our CUDA header, and even with a variadic builtin it would be hard to shrink that without some serious template magic, which in turn would be doubly difficult to do without exposing crummy diagnostics to users.
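To make the shape of that header code concrete, here is a rough host-side sketch, not the code from the patch.  The fake_ldg_* functions are hypothetical stand-ins for per-type builtins (on the host they just dereference the pointer), and ldg_sketch is a made-up name for the user-facing wrapper.  The point is only that one plain overload per type lets ordinary overload resolution do the type checking.

```cpp
#include <cassert>

// Hypothetical stand-ins for per-type load builtins.
// These are NOT real clang builtins; on the host they simply load.
inline int fake_ldg_i(const int *p) { return *p; }
inline float fake_ldg_f(const float *p) { return *p; }
inline double fake_ldg_d(const double *p) { return *p; }

// Header-style wrappers: one overload per supported type.
// The compiler picks the right one from the pointer type, and an
// unsupported pointer type produces a normal "no matching function" error.
inline int ldg_sketch(const int *p) { return fake_ldg_i(p); }
inline float ldg_sketch(const float *p) { return fake_ldg_f(p); }
inline double ldg_sketch(const double *p) { return fake_ldg_d(p); }
```

Each new type costs one stand-in line and one wrapper line, so the list of builtins and the header grow together; a single variadic builtin would remove only the first of those.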

What do you all think?

http://reviews.llvm.org/D19990