[PATCH] D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions

Mon Dec 6 09:31:52 PST 2021

asavonic added inline comments.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:16
+#
+# RUN:  %python %s --verify --gen-list=%t.list --llvm-tablegen=%S/../../../include/llvm/IR/IntrinsicsNVVM.td  --inst-tablegen=%S/../../../lib/Target/NVPTX/NVPTXIntrinsics.td
+
----------------
tra wrote:
> `%S` only points to the source directory for the test itself. It does not guarantee that anything outside of it is available.
> E.g. a test could be run remotely, with only the test directory and the tested binaries being shipped.
> While it may work within developer's sandbox, it will break for some users and buildbots.
> 
> We should probably disable this run or figure out how to incorporate the needed info w/o reaching out to the LLVM's source tree.
> E.g. extract the list of instructions in tablegen files now and commit it as the input data for the test.
> Parsing tablegen files themselves will be fragile anyways. If/when we eventually clean that copy/paste mess, we'd need to expand all the tablegen macros in order to figure out the list of instructions they will produce. We may as well scrip that now and just use the results for the tests.
Agree. I added this to make sure that the script covers all intrinsics/instructions we currently have. Lets disable this RUN line for now.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:111
+  if target == "cuda":
+    print("target triple = \"nvptx-unknown-cuda\"\n")
+  elif target == "nvcl":
----------------
tra wrote:
> `nvptx64` would be a better choice as that's what we actually use. 
Done.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:190-192
+  # With 'cuda' environment surface is copied with ld.param, so the
+  # instruction uses a register. For 'nvcl' the instruction uses the
+  # parameter directly.
----------------
tra wrote:
> Interesting. PTX spec is a bit vague about what's OK and not OK to do with function parameters. We do know that `ld.param` is the only way to read them. Taking address forces creation of a local copy. I wonder what surface lookup with direct use of parameter does under the hood -- treats it as a taken address and forces a local copy, too?
PTX spec s5.3 "Texture Sampler and Surface Types" summarizes how surface/texture variables can be used, but does not include a lot of details on how they actually work.
```
Variable definition within global (module) scope and in kernel entry parameter lists. 
```
It does not say this explicitly, but I *guess* that `.surfref` as a kernel parameter works the same as a global `.surfref` and therefore can be used directly.
```
Creating pointers to opaque variables using mov, e.g., mov.u64 reg, opaque_var;. The resulting pointer may be stored to and loaded from memory, passed as a parameter to functions, and de-referenced by texture and surface load, store, and query instructions, but the pointer cannot otherwise be treated as an address, i.e., accessing the pointer with ld and st instructions, or performing pointer arithmetic will result in undefined results. 
```
However, when we take a "pointer" to a surface/texture as a register, it now works like any other type and needs an `ld.param` if it is passed to a function.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:207-210
+    # When a parameter is lowered as a .surfref, it still has the
+    # corresponding ld.param.u64, which is illegal. Do not emit the
+    # metadata to keep the parameter as .b64 instead.
+    has_surface_param = False
----------------
tra wrote:
> Can you elaborate? an example on godbolt.org would be helpful.
If `has_surface_param` is set, we emit the metadata that turns the first parameter into a `.surfref` instead of `.u64`. However, NVPTX specifically keeps `ld.param` instructions when compiling for CUDA target (see NVPTXReplaceImageHandles.cpp:1809), and ptxas does not accept this.
```
.visible .entry test_suld_1db8trap_param(
	.param .surfref test_suld_1db8trap_param_param_0,
	.param .u64 test_suld_1db8trap_param_param_1,
	.param .u32 test_suld_1db8trap_param_param_2
)
{
[...]
  // ptxas: error   : Illegal operand type to instruction 'ld'
  ld.param.u64 	%rd1, [test_suld_1db8trap_param_param_0];
[...]
  suld.b.1d.b8.trap {%rs1}, [%rd1, {%r1}];
[...]
}
```
By not emitting the metadata, we lower the argument as a `.u64` and it can now be used with `ld.param`. I think this is considered to be an indirect access to a surface (PTX spec s5.3 Texture Sampler and Surface Types).
```
.visible .entry test_suld_1db8trap_param(
	.param .u64 test_suld_1db8trap_param_param_0,
	.param .u64 test_suld_1db8trap_param_param_1,
	.param .u32 test_suld_1db8trap_param_param_2
)
{
[...]
  ld.param.u64 	%rd1, [test_suld_1db8trap_param_param_0];
[...]
  suld.b.1d.b8.trap {%rs1}, [%rd1, {%r1}];
[...]
}
```

For NVCL environment the `ld.param` instruction is not generated and the `.surfref` parameter is used directly:
```
.entry test_suld_1db8trap_param(
	.param .surfref test_suld_1db8trap_param_param_0,
	.param .u64 .ptr .align 2 test_suld_1db8trap_param_param_1,
	.param .u32 test_suld_1db8trap_param_param_2
)
{
[...]
  suld.b.1d.b8.trap {%rs1}, [test_suld_1db8trap_param_param_0, {%r1}];
[...]
}
```
To be honest, I don't know if there is any reason for this difference between CUDA and NVCL handling. Maybe the check at NVPTXReplaceImageHandles.cpp:1809 is overly restrictive and can be improved to allow both variants (.surfref and .u64).

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:485-486
+    #
+    # So if I'm reading this correctly, the type should be {%rN, %rM},
+    # i.e. v2b32, but NVPTX generates it as {%rN, %fM}.
+    #
----------------
tra wrote:
> `b32` does not prescribe any specific type, so f32 registers are fine to use, too. Having a mix of registers in what's notionally a vector does seem odd, but it does make sense for NVIDIA GPUs. The actual registers are type-agnostic and can be used for both FP and integer ops.
Thank you for clarifying this. Removed the FIXME.

================
Comment at: llvm/test/CodeGen/NVPTX/surf-tex.py:606
+
+    # FIXME: missing intrinsics.
+    # Multi-sample textures and multi-sample texture arrays
----------------
tra wrote:
> We didn't touch texture/surface intrinsics since the original code drop by NVIDIA. We are likely to be missing a bunch of newer ones, added by NVIDIA since then.
> Whether it's worth bothering to add them is the question. Unless we have specific use case for them in LLVM (e.g. if we can use them to optimize something), it may be easier to just implement the instructions in clang's headers as inline asm.
> 
> 
I see. We can keep these FIXMEs here as a documentation on what is not supported.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114367/new/

https://reviews.llvm.org/D114367