[llvm] [NVPTX] Add TMA bulk tensor copy intrinsics (PR #96083)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 19 13:26:23 PDT 2024


================
@@ -0,0 +1,40 @@
+//===--- NVVMIntrinsicFlags.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// This file contains the definitions of the enumerations and flags
+/// associated with NVVM Intrinsics.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+#define LLVM_SUPPORT_NVVMINTRINSICFLAGS_H
+
+#include <stdint.h>
+
+namespace llvm {
+namespace nvvm {
+
+enum class CpAsyncBulkTensorLoadMode {
+  TILE = 0,
+  IM2COL = 1,
+};
+
+typedef union {
+  int V;
+  struct {
+    unsigned CacheHint : 1;
+    unsigned MultiCast : 1;
+    unsigned LoadMode : 3; // CpAsyncBulkTensorLoadMode
+    unsigned reserved : 27;
+  } U;
+} CpAsyncBulkTensorFlags;
----------------
Artem-B wrote:

Type-punning through union fields is generally not a good idea in C++: reading a union member other than the one most recently written is undefined behavior.
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c183-dont-use-a-union-for-type-punning

Can you elaborate on why we need to do that? AFAICT, we seem to use it to pass just one i32 argument to the instruction node, as opposed to specifying cache/multicast/loadmode separately.

Is there a reason we cannot pass each parameter separately?
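
If a single i32 operand does turn out to be necessary, one possible alternative to the union is to pack and unpack the fields with explicit shifts and masks. The following is only a rough sketch: `TensorFlagFields`, `encodeFlags`, `decodeFlags`, the `sketch` namespace and the bit positions are hypothetical names and choices made for illustration, not anything in the PR. The layout assumes the bit-fields above are meant to be allocated from the least significant bit upwards, which the union version leaves implementation-defined.

```cpp
#include <cstdint>

namespace sketch {

// Same enumerators as in the patch, with a fixed underlying type.
enum class CpAsyncBulkTensorLoadMode : uint32_t {
  TILE = 0,
  IM2COL = 1,
};

// Hypothetical plain struct replacing the bit-field half of the union.
struct TensorFlagFields {
  bool CacheHint;
  bool MultiCast;
  CpAsyncBulkTensorLoadMode LoadMode;
};

// Assumed layout: bit 0 = CacheHint, bit 1 = MultiCast, bits 2..4 = LoadMode.
constexpr uint32_t CacheHintBit = 0;
constexpr uint32_t MultiCastBit = 1;
constexpr uint32_t LoadModeShift = 2;
constexpr uint32_t LoadModeMask = 0x7; // 3 bits wide

// Pack the fields into one 32-bit value without touching a union.
constexpr uint32_t encodeFlags(TensorFlagFields F) {
  return (static_cast<uint32_t>(F.CacheHint) << CacheHintBit) |
         (static_cast<uint32_t>(F.MultiCast) << MultiCastBit) |
         ((static_cast<uint32_t>(F.LoadMode) & LoadModeMask) << LoadModeShift);
}

// Recover the fields from the packed value.
constexpr TensorFlagFields decodeFlags(uint32_t V) {
  return {((V >> CacheHintBit) & 1u) != 0,
          ((V >> MultiCastBit) & 1u) != 0,
          static_cast<CpAsyncBulkTensorLoadMode>((V >> LoadModeShift) &
                                                 LoadModeMask)};
}

} // namespace sketch
```

With this shape the encoding is spelled out in one place and does not depend on how a particular compiler lays out bit-fields. Passing the three values as separate operands would remove the need for any packing at all.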

https://github.com/llvm/llvm-project/pull/96083

