[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

Justin Lebar via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 24 13:26:32 PDT 2021


jlebar accepted this revision.
jlebar added a comment.

Okay, I give up on the phab interface.  It's unreadable with all the existing
comments and lint errors.

Hope you don't mind comments this way.  I'm just going to put it all in a giant
code block so it doesn't get wrapped or whatever.

  +// __nv_tex_surf_handler() provided by this header as a macro.
  +#define __nv_tex_surf_handler(__op, __ptr, ...)                                \
  +  __cuda_tex::__tex_fetch<__cuda_tex::__Tag<__cuda_tex::__tex_op_hash(__op)>>( \
  +      __ptr, __VA_ARGS__)
  
  ::__cuda_tex
  
  +// Put all functions into anonymous namespace so they have internal linkage.
  
  Say a little more?  Specifically, you want anon ns because this is device code
  and it has to work even without being linked.
  
  (Also, are you sure that plain `inline` doesn't do the right thing?  Like, we
  have lots of CUDA headers that are `inline`'ed without all being in an anon
  ns.)
  
  +// First, we need a perfect hash function and a few constexpr helper functions
  +// for converting a string literal into a numeric value which can be used to
  +// parametrize a template. We can not use string literals for that as that would
  +// require C++20.
  +//
  +// The hash function was generated with 'gperf' and then manually converted into
  +// its constexpr equivalent.
  +//
  +// NOTE: the perfect hashing scheme comes with inherent self-test. If the hash
  +// function has a collision for any of the texture operations, the compilation
  +// will fail due to an attempt to redefine a tag with the same value. If the
  +// header compiles, then the hash function is good enough for the job.
  
  I guess if it has a self-test then that's fine.  Though is this really better
  than a series of `if` statements with strcmp?  I guess I am scared of this kind
  of thing because I did it once in ccache.  I thought I was very clever and got
  a good speedup.  1 year later I found out I'd broken handling of __DATE__ and
  __TIME__.  o.O



================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:26
+#define __nv_tex_surf_handler(__op, __ptr, ...)                                \
+  __cuda_tex::__tex_fetch<__cuda_tex::__Tag<__cuda_tex::__tex_op_hash(__op)>>( \
+      __ptr, __VA_ARGS__)
----------------
`::__cuda_tex` (appears twice)


================
Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:53
+
+// Put all functions into anonymous namespace so they have internal linkage.
+namespace {
----------------
Write a little more?  This looks super-suspicious, but you need it specifically because these are *device* functions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089



More information about the cfe-commits mailing list