[llvm] [NVPTX] Add TMA bulk tensor copy intrinsics (PR #96083)

Mon Jul 22 11:00:13 PDT 2024

================
@@ -0,0 +1,40 @@
+//===--- NVVMIntrinsicFlags.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file contains the definitions of the enumerations and flags
+/// associated with NVVM Intrinsics.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+#define LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+
+#include <stdint.h>
+
+namespace llvm {
+namespace nvvm {
+
+enum class CpAsyncBulkTensorLoadMode {
+  TILE = 0,
+  IM2COL = 1,
+};
+
+typedef union {
+  int V;
+  struct {
+    unsigned CacheHint : 1;
+    unsigned MultiCast : 1;
+    unsigned LoadMode : 3; // CpAsyncBulkTensorLoadMode
+    unsigned reserved : 27;
+  } U;
+} CpAsyncBulkTensorFlags;
----------------
Artem-B wrote:

OK. It's a reasonable approach, though multiple intrinsics would not be unreasonable either, especially if we could generate them automatically via tablegen. PTX instructions do tend to be ... rather non-uniform in their variants, so I can see the attractiveness of just passing a single opaque value and handling its bits as needed.
My concern was mostly about the mechanics of conversion between the opaque integer and the bits. Considering that the value never leaves the boundaries of the compilation, we do not need to worry about the bitfield order in the integer, but I'd still prefer to do it without type-punning.

AFAICT, before C++20 the only valid way to convert between the union fields is via `memcpy`. In C++20 one would use `std::bit_cast` to do the job, but we're not there yet, as LLVM builds with C++17 at the moment.
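For illustration, a minimal sketch of the `memcpy` route under C++17. The names `FlagBits`, `toInt` and `fromInt` are hypothetical and not part of the patch, and the sketch assumes the bitfield struct occupies exactly 32 bits on the host ABI:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace memcpy_sketch {

// Same fields as the union's inner struct in the patch, without the union.
struct FlagBits {
  unsigned CacheHint : 1;
  unsigned MultiCast : 1;
  unsigned LoadMode : 3; // CpAsyncBulkTensorLoadMode
  unsigned Reserved : 27;
};

// Assumption: holds on the mainstream ABIs, checked at compile time.
static_assert(sizeof(FlagBits) == sizeof(uint32_t),
              "FlagBits is expected to occupy exactly 32 bits");

// Copy the object representation instead of reading an inactive union member.
inline uint32_t toInt(const FlagBits &F) {
  assert(F.Reserved == 0 && "reserved bits are expected to be zero");
  uint32_t V;
  std::memcpy(&V, &F, sizeof(V));
  return V;
}

inline FlagBits fromInt(uint32_t V) {
  FlagBits F;
  std::memcpy(&F, &V, sizeof(F));
  return F;
}

} // namespace memcpy_sketch
```

The numeric value of the integer still depends on how the host compiler lays out the bitfields, which is fine as long as the same build both produces and consumes it.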

Alternatively, you could use a class/struct with the fields, and define explicit methods to convert to/from an integer, and avoid the type-punning issues altogether.
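Roughly what I have in mind, as a sketch only. The names `TensorFlags`, `encode` and `decode`, as well as the specific bit positions, are made up here for illustration and are not what the patch defines:

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

enum class LoadMode : uint32_t { TILE = 0, IM2COL = 1 };

// Plain fields plus explicit conversions; no union and no bitfields, so the
// integer layout is spelled out in code rather than left to the compiler.
struct TensorFlags {
  bool CacheHint = false;
  bool MultiCast = false;
  LoadMode Mode = LoadMode::TILE;

  // Layout assumed for this sketch:
  // bit 0 = CacheHint, bit 1 = MultiCast, bits 2..4 = Mode, rest reserved.
  static constexpr uint32_t CacheHintBit = 1u << 0;
  static constexpr uint32_t MultiCastBit = 1u << 1;
  static constexpr uint32_t ModeShift = 2;
  static constexpr uint32_t ModeMask = 0x7;

  // Pack the fields into the opaque integer passed to the intrinsic.
  constexpr uint32_t encode() const {
    return (CacheHint ? CacheHintBit : 0u) | (MultiCast ? MultiCastBit : 0u) |
           ((static_cast<uint32_t>(Mode) & ModeMask) << ModeShift);
  }

  // Unpack the opaque integer back into named fields.
  static TensorFlags decode(uint32_t Bits) {
    assert((Bits >> 5) == 0 && "reserved bits must be zero");
    TensorFlags F;
    F.CacheHint = (Bits & CacheHintBit) != 0;
    F.MultiCast = (Bits & MultiCastBit) != 0;
    F.Mode = static_cast<LoadMode>((Bits >> ModeShift) & ModeMask);
    return F;
  }
};

// CacheHint (bit 0) plus IM2COL (1 << 2) gives 5.
static_assert(TensorFlags{true, false, LoadMode::IM2COL}.encode() == 5u,
              "encode() should follow the documented bit layout");

} // namespace sketch
```

With explicit shifts and masks the encoding is the same on every host, and adding a new flag is a matter of adding one field and one constant.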


https://github.com/llvm/llvm-project/pull/96083