[PATCH] D19990: [CUDA] Implement __ldg using intrinsics.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Thu May 5 13:43:19 PDT 2016
jlebar added inline comments.
================
Comment at: include/clang/Basic/BuiltinsNVPTX.def:569-603
@@ -568,1 +568,37 @@
+// __ldg. This is not implemented as a builtin by nvcc.
+BUILTIN(__nvvm_ldg_c, "ccC*", "")
+BUILTIN(__nvvm_ldg_s, "ssC*", "")
+BUILTIN(__nvvm_ldg_i, "iiC*", "")
+BUILTIN(__nvvm_ldg_l, "LiLiC*", "")
+BUILTIN(__nvvm_ldg_ll, "LLiLLiC*", "")
+
+BUILTIN(__nvvm_ldg_uc, "UcUcC*", "")
+BUILTIN(__nvvm_ldg_us, "UsUsC*", "")
+BUILTIN(__nvvm_ldg_ui, "UiUiC*", "")
+BUILTIN(__nvvm_ldg_ul, "ULiULiC*", "")
+BUILTIN(__nvvm_ldg_ull, "ULLiULLiC*", "")
+
+BUILTIN(__nvvm_ldg_f, "ffC*", "")
+BUILTIN(__nvvm_ldg_d, "ddC*", "")
+
+BUILTIN(__nvvm_ldg_c2, "E2cE2cC*", "")
+BUILTIN(__nvvm_ldg_c4, "E4cE4cC*", "")
+BUILTIN(__nvvm_ldg_s2, "E2sE2sC*", "")
+BUILTIN(__nvvm_ldg_s4, "E4sE4sC*", "")
+BUILTIN(__nvvm_ldg_i2, "E2iE2iC*", "")
+BUILTIN(__nvvm_ldg_i4, "E4iE4iC*", "")
+BUILTIN(__nvvm_ldg_ll2, "E2LLiE2LLiC*", "")
+
+BUILTIN(__nvvm_ldg_uc2, "E2UcE2UcC*", "")
+BUILTIN(__nvvm_ldg_uc4, "E4UcE4UcC*", "")
+BUILTIN(__nvvm_ldg_us2, "E2UsE2UsC*", "")
+BUILTIN(__nvvm_ldg_us4, "E4UsE4UsC*", "")
+BUILTIN(__nvvm_ldg_ui2, "E2UiE2UiC*", "")
+BUILTIN(__nvvm_ldg_ui4, "E4UiE4UiC*", "")
+BUILTIN(__nvvm_ldg_ull2, "E2ULLiE2ULLiC*", "")
+
+BUILTIN(__nvvm_ldg_f2, "E2fE2fC*", "")
+BUILTIN(__nvvm_ldg_f4, "E4fE4fC*", "")
+BUILTIN(__nvvm_ldg_d2, "E2dE2dC*", "")
+
----------------
jlebar wrote:
> majnemer wrote:
> > Would it be crazy to instead provide a generic builtin? Would cut down on the number of variants...
> >
> > `__builtin_add_overflow` is an example of such a builtin.
> Art is going to send you flowers. :) He and I just had an argument about this.
>
> I think this isn't an unreasonable thing to want, but I think it's beneficial to be consistent with our existing API. So if we offer a generic thing for ldg, it would be nice to have one for atomics above, which are basically the same.
>
> So I told Art I'd prefer to add it to our list.
Oh, another thing: you really see the benefit of a generic builtin once you start hitting the combinatorial explosion of all the different kinds of loads. As-is it's not so bad, but if you want to support all forms of ld.global.nc, there are four different caching behaviors. Supporting all forms of ld is way worse.
Which is to say, if we're going to do the generic thing, it seems like we'd benefit the most by making it generic on more than just the types. But we're not ready to do that; I don't think most of these loads even exist in LLVM at the moment.
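For reference, here is a minimal host-side sketch of the type-generic style that `__builtin_add_overflow` (the example majnemer cites) already provides in GCC and Clang: one builtin name, with the operand and result types taken from the arguments. The wrapper names `add_int`, `add_wide`, `add_u8` and `check_examples` are hypothetical and exist only for illustration; they are not part of this patch, and nothing here touches the `__nvvm_ldg_*` builtins.

```cpp
#include <cassert>
#include <climits>

// Hypothetical wrappers: each calls the SAME builtin, and the builtin
// picks up its behavior from the type of the result pointer.

// Result type int: returns true when a + b does not fit in an int.
bool add_int(int a, int b, int *out) {
    return __builtin_add_overflow(a, b, out);
}

// Result type long long: the same sum is computed as if with infinite
// precision, so it fits as long as long long is wider than int.
bool add_wide(int a, int b, long long *out) {
    return __builtin_add_overflow(a, b, out);
}

// Result type unsigned char: on overflow, *out holds the wrapped value.
bool add_u8(int a, int b, unsigned char *out) {
    return __builtin_add_overflow(a, b, out);
}

// Illustrative self-check (assumes 8-bit char and long long wider than int).
void check_examples() {
    int i = 0;
    long long w = 0;
    unsigned char u = 0;
    assert(add_int(INT_MAX, 1, &i));    // overflows int
    assert(!add_wide(INT_MAX, 1, &w));  // fits in long long
    assert(add_u8(200, 100, &u));       // 300 does not fit in 8 bits
}
```

A generic load builtin would presumably follow the same pattern, deriving the element type from its pointer argument rather than from a suffix such as `_i` or `_f4`, but that is a sketch of the idea, not something this patch implements.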
http://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
http://reviews.llvm.org/D19990