[llvm] [NVPTX] Add TMA bulk tensor copy intrinsics (PR #96083)

Fri Jul 19 13:26:23 PDT 2024

================
@@ -0,0 +1,40 @@
+//===--- NVVMIntrinsicFlags.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file contains the definitions of the enumerations and flags
+/// associated with NVVM Intrinsics.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+#define LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+
+#include <stdint.h>
+
+namespace llvm {
+namespace nvvm {
+
+enum class CpAsyncBulkTensorLoadMode {
+  TILE = 0,
+  IM2COL = 1,
+};
+
+typedef union {
+  int V;
+  struct {
+    unsigned CacheHint : 1;
+    unsigned MultiCast : 1;
+    unsigned LoadMode : 3; // CpAsyncBulkTensorLoadMode
+    unsigned reserved : 27;
+  } U;
+} CpAsyncBulkTensorFlags;
----------------
Artem-B wrote:

Type-punning through union fields is generally not a good idea in C++.
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c183-dont-use-a-union-for-type-punning

Can you elaborate on why we need to do that? AFAICT, we seem to use it to pass just one i32 argument to the instruction node, as opposed to specifying cache/multicast/loadmode separately.

Is there a reason we cannot pass each parameter separately?
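
If a single packed i32 does turn out to be necessary, one option that avoids the union is to do the packing with explicit shifts and masks. The following is only a rough sketch of that idea, not a proposal for the final API: the `sketch` namespace, `TensorCopyFlags`, `encode`, and `decode` are hypothetical names, and the bit positions are an assumption chosen to follow the field order in the patch (bit 0 cache hint, bit 1 multicast, bits 2-4 load mode).

```cpp
#include <cstdint>

namespace sketch { // hypothetical namespace, not part of the patch

enum class LoadMode : uint32_t { TILE = 0, IM2COL = 1 };

// Plain struct holding the three parameters; no union, no bitfields.
struct TensorCopyFlags {
  bool CacheHint;
  bool MultiCast;
  LoadMode Mode;
};

// Pack the fields into one 32-bit value with explicit shifts.
// Assumed layout: bit 0 = CacheHint, bit 1 = MultiCast, bits 2-4 = Mode.
constexpr uint32_t encode(TensorCopyFlags F) {
  return (F.CacheHint ? 1u : 0u) |
         ((F.MultiCast ? 1u : 0u) << 1) |
         ((static_cast<uint32_t>(F.Mode) & 0x7u) << 2);
}

// Unpack the same layout; the remaining upper bits are ignored.
constexpr TensorCopyFlags decode(uint32_t V) {
  return {(V & 1u) != 0,
          ((V >> 1) & 1u) != 0,
          static_cast<LoadMode>((V >> 2) & 0x7u)};
}

// Compile-time check of the assumed layout: CacheHint + IM2COL = 0b00101.
static_assert(encode({true, false, LoadMode::IM2COL}) == 0x5u,
              "unexpected flag encoding");

} // namespace sketch
```

Unlike bitfields, whose layout is implementation-defined, the shifts pin the encoding down explicitly, so the producer and the consumer of the immediate agree on it regardless of compiler.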

https://github.com/llvm/llvm-project/pull/96083