[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

Tue Sep 21 23:53:13 PDT 2021

jlebar added a comment.

> One alternative would be to use run-time dispatch, but, given that texture lookup is a single instruction, the overhead would be substantial-to-prohibitive.

I guess I'm confused...  Is the parameter value that we're "overloading" on usually/always a constant?  In that case, there's no overhead with runtime dispatch.  Or, is it not a constant?  In which case, how does nvcc generate a single instruction for this idiom at all?

But then I see `switch` statements in the code, so now I'm extra confused.  :)

Overall, I am unsure of why we need all of this magic.  We can rely on LLVM to optimize away constant integer comparisons, and also even comparisons between string literals.

What specifically would be inefficient if this were a series of "real" overloaded functions, with none of the macros, templates, or builtins?  (Assuming efficiency is the concern here?)

================
Comment at: clang/lib/AST/ExprConstant.cpp:11097

+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations stuuported by '__nv_tex_surf_handler'
----------------
Write a comment explaining what this function does?

(It seems to...translate a string into an integer?  If so, to me, it's strange that it uses a sorted list for this because...what if I add another function?  Won't that mess up all the numbers?  Anyway, to be clarified in the comment.)

Now that I read more, I see that you don't care about this being a stable mapping etc etc...

I don't really get why this has to be a builtin at all, though.  If it's always a string literal, a simple strcmp will do the job, LLVM can optimize this?  And I'm almost sure you can assert that the char* is always a string literal, so you can guarantee that it's always optimized away.

================
Comment at: clang/lib/AST/ExprConstant.cpp:11098
+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations stuuported by '__nv_tex_surf_handler'
+  static constexpr StringRef TextureOps[] = {"__isurf1DLayeredread",
----------------
stuuported

================
Comment at: clang/lib/AST/ExprConstant.cpp:11209
+  const StringLiteral *S =
+      dyn_cast<StringLiteral>(E->getArg(0)->IgnoreParenCasts());
+  auto I = llvm::lower_bound(TextureOps, S->getString());
----------------
how do we know the arg is a string constant?  Looking at the builtin def it doesn't seem that we enforce it there.

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:12
+#ifndef __CUDA__
+#error "This file is for CUDA __compilation only."
+#endif
----------------
is `__compilation` intentional?  (Maybe search-and-replace bug?)

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:41
+
+namespace {
+
----------------
what are you trying to accomplish with an anon ns inside a header?

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:41
+
+namespace {
+
----------------
jlebar wrote:
> what are you trying to accomplish with an anon ns inside a header?
I know you wrote it in the commit message, but this file could really use comments, otherwise I'm afraid you are going to be the only human being on the planet who can edit this...

For starters, it seems that the purpose of this file is to define the __nv_tex_surf_handler "function" -- is that right?

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:57-58
+template <> struct __FT<float> {
+  using __bt = float;
+  using __ft = float4;
+};
----------------
I have no idea what bt and ft are supposed to stand for.  "fetch type" and ...?  But __FT stands for "fundamental type" per the comment?

Oh, I found it later, "base type".

I'm all for brevity, but would `__base_ty` and `__fetch_ty` be too long?

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:90
+// Derived base/fetch types for N-element vectors.
+template <class __T> struct __FT {
+  using __bt = decltype(__T::x);
----------------
There are only a limited number of these.  Could we assert that __T is one of the expected vector types, just for readability and maybe to help the next person who tries to edit this?

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:91
+template <class __T> struct __FT {
+  using __bt = decltype(__T::x);
+  using __ft = typename __FT<__bt>::__ft;
----------------
this is c++11-only.  Which, you know what, fine by me.  But might be worth an explicit #error at least?

================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:92
+  using __bt = decltype(__T::x);
+  using __ft = typename __FT<__bt>::__ft;
+};
----------------
I think this is also C++11 syntax

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089