[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Sep 22 10:30:09 PDT 2021


tra added a comment.

In D110089#3014388 <https://reviews.llvm.org/D110089#3014388>, @jlebar wrote:

>> One alternative would be to use run-time dispatch, but, given that texture lookup is a single instruction, the overhead would be substantial-to-prohibitive.
>
> I guess I'm confused...  Is the parameter value that we're "overloading" on usually/always a constant?  In that case, there's no overhead with runtime dispatch.  Or, is it not a constant?  In which case, how does nvcc generate a single instruction for this idiom at all?

It's a string literal. And you're right: Clang does manage to optimize `strcmp` against a known value. https://godbolt.org/z/h351hfsMf
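A minimal sketch of the kind of `strcmp`-based run-time dispatch being discussed, for illustration only. The operation names and the `TexOp` enum are hypothetical stand-ins, not identifiers from the patch. When `classify` is inlined at a call site that passes a string literal, an optimizing compiler can usually fold the comparisons away. That folding is an optimization, not a language guarantee.

```cpp
#include <cstring>

// Hypothetical operation codes; illustrative only, not the patch's names.
enum class TexOp { Fetch1D, Lookup2D, Gather2D, Unknown };

// Run-time dispatch on the operation name. With a string-literal argument and
// inlining, optimizers typically reduce this to a constant.
inline TexOp classify(const char *op) {
  if (std::strcmp(op, "__tex1Dfetch") == 0) return TexOp::Fetch1D;
  if (std::strcmp(op, "__tex2D") == 0) return TexOp::Lookup2D;
  if (std::strcmp(op, "__tex2Dgather") == 0) return TexOp::Gather2D;
  return TexOp::Unknown;
}
```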

However, that's only part of the problem. The arguments also vary depending on the operation, so I still need templates that are effectively parameterized by that string-literal argument, and I can't easily do that until C++20.
I'd need to push the `strcmp`-based runtime dispatch down into the implementations of the texture lookups that share the same operand signature. That's harder to generalize, because I'd have to implement string-based dispatch for quite a few subsets of the operations: basically one for each element of the Cartesian product of `{dimensionality, Lod, Level, Sparse}`.

Another downside is that the string comparison code will result in functions being much larger than necessary. Probably not a big thing overall, but why add overhead that would be paid for by all users and which does not buy us anything? Having one trivial compiler builtin that simplifies things a lot is a better trade-off, IMO.

> But then I see `switch` statements in the code, so now I'm extra confused.  :)

That switch is for a special case of texture lookup which may result in one of four texture instruction variants. All others map 1:1.
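To illustrate the shape of that one special case, here is a hedged sketch. It assumes the case is a gather-style lookup (along the lines of `tex2Dgather`) whose component argument, 0 to 3, selects between the PTX `tld4.r`, `tld4.g`, `tld4.b`, and `tld4.a` forms. The function name is hypothetical, and the sketch returns the mnemonic instead of emitting inline assembly.

```cpp
// Hypothetical helper: maps a gather component index (0..3) to the PTX tld4
// variant that would be emitted. Returns nullptr for an out-of-range component.
constexpr const char *gather_variant(int comp) {
  switch (comp) {
  case 0: return "tld4.r";
  case 1: return "tld4.g";
  case 2: return "tld4.b";
  case 3: return "tld4.a";
  default: return nullptr;
  }
}
```

Every other operation needs no such switch, because its name alone determines the instruction.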

> Overall, I am unsure of why we need all of this magic.  We can rely on LLVM to optimize away constant integer comparisons, and also even comparisons between string literals.

It makes it possible to use a string literal to parameterize templates, which lets me generate the variants of `__nv_tex_surf_handler` in a relatively concise way.
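A sketch of how that parameterization can work before C++20, assuming the string-to-integer step is done by a `constexpr` function rather than by a builtin. `op_id`, `kOps`, and `Handler` are hypothetical names; `Handler` merely stands in for the `__nv_tex_surf_handler` overloads, and its `kArgs` member is invented for illustration. A pointer to a string literal cannot be a template argument, but `op_id("...")` is a constant expression, so its integer result can be.

```cpp
#include <cstddef>

// Hypothetical operation names; the real list lives in the patch.
constexpr const char *kOps[] = {"__tex1Dfetch", "__tex2D", "__tex2Dgather"};

// Compile-time string equality (std::strcmp is not constexpr).
constexpr bool str_eq(const char *a, const char *b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return *a == *b;
}

// Maps an operation name to its index in kOps, or -1 if the name is unknown.
constexpr int op_id(const char *name) {
  for (std::size_t i = 0; i < sizeof(kOps) / sizeof(kOps[0]); ++i)
    if (str_eq(kOps[i], name))
      return static_cast<int>(i);
  return -1;
}

// Primary template left undefined, so using an unknown name fails to compile.
template <int Op> struct Handler;

// Hypothetical specializations standing in for __nv_tex_surf_handler variants;
// kArgs is an invented member showing that each op can have its own signature.
template <> struct Handler<op_id("__tex1Dfetch")> { static constexpr int kArgs = 1; };
template <> struct Handler<op_id("__tex2D")> { static constexpr int kArgs = 2; };
template <> struct Handler<op_id("__tex2Dgather")> { static constexpr int kArgs = 3; };
```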

> What specifically would be inefficient if this were a series of "real" overloaded functions, with none of the macros, templates, or builtins?  (Assuming efficiency is the concern here?)

It's both efficiency and avoidance of typos in repetitive nearly identical code. 
There are ~2500 high-level texture lookup variants. They call about 600 different `__nv_tex_surf_handler` overloads, which in turn generate ~70 unique inline assembly variants.
The current code structure reflects that hierarchy. This is essentially the reason for the parameterization by the operation name happening early, instead of being used as a key for runtime dispatch at the end.



================
Comment at: clang/lib/AST/ExprConstant.cpp:11097
 
+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations supported by '__nv_tex_surf_handler'
----------------
jlebar wrote:
> Write a comment explaining what this function does?
> 
> (It seems to...translate a string into an integer?  If so, to me, it's strange that it uses a sorted list for this because...what if I add another function?  Won't that mess up all the numbers?  Anyway, to be clarified in the comment.)
> 
> Now that I read more, I see that you don't care about this being a stable mapping etc etc...
> 
> I don't really get why this has to be a builtin at all, though.  If it's always a string literal, a simple strcmp will do the job, LLVM can optimize this?  And I'm almost sure you can assert that the char* is always a string literal, so you can guarantee that it's always optimized away.
Yes, it's just a 1:1 map.  We do not care about specific values as they only matter within one TU. I'll document that.
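One way a sorted list can serve as such a 1:1 map, sketched under assumptions: a name's position in a sorted table is its ID, found by binary search. The table contents and function names are hypothetical. The `static_assert` guards the sort order; inserting a new name renumbers every entry after it, which is harmless as long as the IDs never leave the translation unit.

```cpp
#include <cstddef>

// Hypothetical table of operation names, kept in strcmp order.
constexpr const char *kSortedOps[] = {"__tex1D", "__tex1Dfetch", "__tex2D",
                                      "__tex2Dgather", "__tex3D"};
constexpr std::size_t kNumOps = sizeof(kSortedOps) / sizeof(kSortedOps[0]);

// Compile-time three-way comparison with strcmp semantics.
constexpr int str_cmp(const char *a, const char *b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}

// Fails the build if the table is out of order or contains duplicates.
constexpr bool is_sorted_table() {
  for (std::size_t i = 1; i < kNumOps; ++i)
    if (str_cmp(kSortedOps[i - 1], kSortedOps[i]) >= 0)
      return false;
  return true;
}
static_assert(is_sorted_table(), "kSortedOps must be sorted and duplicate-free");

// Binary search: returns the index of name, or -1 if it is not in the table.
constexpr int sorted_op_id(const char *name) {
  std::size_t lo = 0, hi = kNumOps;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    int c = str_cmp(kSortedOps[mid], name);
    if (c == 0)
      return static_cast<int>(mid);
    if (c < 0)
      lo = mid + 1;  // table entry sorts before name: search the upper half
    else
      hi = mid;      // table entry sorts after name: search the lower half
  }
  return -1;
}
```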

I can't easily use a string literal to parameterize a template.

Hmm. Perhaps I can implement a `constexpr perfect_hash(literal)` in a header. This would eliminate the need for the builtin.
E.g. https://godbolt.org/z/bzzMbaKhe
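A compile-time hash along those lines, as a sketch rather than a reconstruction of the linked example: it uses 32-bit FNV-1a, which is an ordinary hash and not a true perfect hash, so a `static_assert` over the known names stands in for the "perfect" guarantee. The operation names, `kNames`, `TexHandler`, and `kArgs` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a, usable in constant expressions. Unsigned arithmetic wraps, so
// the multiply is well-defined even during constant evaluation.
constexpr std::uint32_t fnv1a(const char *s) {
  std::uint32_t h = 2166136261u;  // FNV offset basis
  while (*s != '\0') {
    h ^= static_cast<unsigned char>(*s++);
    h *= 16777619u;  // FNV prime
  }
  return h;
}

// Hypothetical set of known operation names.
constexpr const char *kNames[] = {"__tex1Dfetch", "__tex2D", "__tex2Dgather"};

// FNV-1a is not collision-free in general; this check fails the build if any
// two known names ever hash to the same value.
constexpr bool hashes_distinct() {
  const std::size_t n = sizeof(kNames) / sizeof(kNames[0]);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (fnv1a(kNames[i]) == fnv1a(kNames[j]))
        return false;
  return true;
}
static_assert(hashes_distinct(), "hash collision among operation names");

// The hash value can parameterize a template directly.
template <std::uint32_t Hash> struct TexHandler;  // undefined for unknown names
template <> struct TexHandler<fnv1a("__tex1Dfetch")> { static constexpr int kArgs = 1; };
template <> struct TexHandler<fnv1a("__tex2D")> { static constexpr int kArgs = 2; };
```

Compared with the builtin, this keeps everything in a header, at the cost of re-checking for collisions whenever the list of names changes.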

Let me give it a try.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089


