[PATCH] D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions

Mon Nov 29 11:58:23 PST 2021

tra added a comment.

Looks good overall. My main concern is with the test reaching out to LLVM sources and parsing tablegen files.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:16
+#
+# RUN:  %python %s --verify --gen-list=%t.list --llvm-tablegen=%S/../../../include/llvm/IR/IntrinsicsNVVM.td  --inst-tablegen=%S/../../../lib/Target/NVPTX/NVPTXIntrinsics.td
+
----------------
`%S` only points to the source directory for the test itself. It does not guarantee that anything outside of it is available.
E.g. a test could be run remotely, with only the test directory and the tested binaries being shipped.
While it may work within developer's sandbox, it will break for some users and buildbots.

We should probably disable this run or figure out how to incorporate the needed info w/o reaching out to the LLVM's source tree.
E.g. extract the list of instructions in tablegen files now and commit it as the input data for the test.
Parsing tablegen files themselves will be fragile anyways. If/when we eventually clean that copy/paste mess, we'd need to expand all the tablegen macros in order to figure out the list of instructions they will produce. We may as well scrip that now and just use the results for the tests.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:111
+  if target == "cuda":
+    print("target triple = \"nvptx-unknown-cuda\"\n")
+  elif target == "nvcl":
----------------
`nvptx64` would be a better choice as that's what we actually use. 

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:190-192
+  # With 'cuda' environment surface is copied with ld.param, so the
+  # instruction uses a register. For 'nvcl' the instruction uses the
+  # parameter directly.
----------------
Interesting. PTX spec is a bit vague about what's OK and not OK to do with function parameters. We do know that `ld.param` is the only way to read them. Taking address forces creation of a local copy. I wonder what surface lookup with direct use of parameter does under the hood -- treats it as a taken address and forces a local copy, too?

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:207-210
+    # When a parameter is lowered as a .surfref, it still has the
+    # corresponding ld.param.u64, which is illegal. Do not emit the
+    # metadata to keep the parameter as .b64 instead.
+    has_surface_param = False
----------------
Can you elaborate? an example on godbolt.org would be helpful.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:485-486
+    #
+    # So if I'm reading this correctly, the type should be {%rN, %rM},
+    # i.e. v2b32, but NVPTX generates it as {%rN, %fM}.
+    #
----------------
`b32` does not prescribe any specific type, so f32 registers are fine to use, too. Having a mix of registers in what's notionally a vector does seem odd, but it does make sense for NVIDIA GPUs. The actual registers are type-agnostic and can be used for both FP and integer ops.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:606
+
+    # FIXME: missing intrinsics.
+    # Multi-sample textures and multi-sample texture arrays
----------------
We didn't touch texture/surface intrinsics since the original code drop by NVIDIA. We are likely to be missing a bunch of newer ones, added by NVIDIA since then.
Whether it's worth bothering to add them is the question. Unless we have specific use case for them in LLVM (e.g. if we can use them to optimize something), it may be easier to just implement the instructions in clang's headers as inline asm.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114367/new/

https://reviews.llvm.org/D114367