[llvm] [NVPTX][Docs] [NFC] Update docs on intrinsics (PR #133136)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 2 09:04:03 PDT 2025


https://github.com/durga4github updated https://github.com/llvm/llvm-project/pull/133136

>From 9976a6f6f2603b71a1eade990cc303ecbde127f5 Mon Sep 17 00:00:00 2001
From: Durgadoss R <durgadossr at nvidia.com>
Date: Mon, 24 Feb 2025 19:52:28 +0530
Subject: [PATCH] [NVPTX][Docs] [NFC] Update docs on intrinsics

Recently, we have added a set of complex intrinsics
on TMA, tcgen05 and Cvt family of instructions.

This patch captures the key learnings from our
experience so far and documents them as guidelines
for future design.

Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
---
 llvm/include/llvm/IR/IntrinsicsNVVM.td | 78 ++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 80e10f33b770d..3e9588a515c9e 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -10,6 +10,84 @@
 //
 //===----------------------------------------------------------------------===//
 
+//===----------------------------------------------------------------------===//
+// Guidelines on NVPTX Intrinsic design
+//===----------------------------------------------------------------------===//
+// 
+// The NVPTX intrinsics are used to model instructions in the PTX ISA.
+// While simpler intrinsics can represent certain features effectively,
+// more complex instructions like TMA and MMA are not as straightforward
+// to model. A single variant of these complex instructions can expand
+// into hundreds of intrinsics. Additionally, any expansion in the
+// corresponding ISA can exponentially increase these numbers, making it
+// difficult to manage them in the IR and backend passes. Therefore,
+// a careful design of intrinsic interfaces can ease maintenance and
+// contribute to a sustainable, long-term solution.
+//
+// The default approach is to have a 1:1 match between the intrinsic and
+// the instruction where the instruction suffixes map to the intrinsic name
+// and the instruction arguments map to the intrinsic arguments or return
+// value.
+//
+// However, when there are too many instruction/intrinsic variants like
+// the TMA/MMA family, it is desirable to encode some variants as a
+// constant argument, referred to as 'flags'.
+// TODO: Add a guideline to quantify the metric on 'how many intrinsics' here.
+//
+// Below are a set of guidelines that may help in choosing
+// an appropriate design for the complex intrinsics:
+// 
+// 1. Each flag argument represents one set of instruction modifiers.
+//    These flags are compile-time integer constants.
+// 
+// 2. When an intrinsic uses flags, document it with details of the
+//    flag usage in the ``NVPTXUsage.rst`` file.
+// 3. Annotate all flag arguments with ImmArg<ArgIdx<>>.
+// 4. Place the flag arguments at the end of the (actual)argument list.
+// 
+// 5. Use `i1` for boolean flags and `i8` for others. Usually,
+//    the `i8` types represent an `enum` encoding the family of
+//    modifiers.
+// 6. Note that, the specific variant for non-boolean flags may not be
+//    obvious in the IR. So, maintain consistency between the enum value
+//    definitions and their usage in the backend.
+//    * Provide a meaningful default value in the enums wherever applicable.
+//    * TODO: Investigate auto-upgrade capability for intrinsics
+//      when only flag value mappings change.
+//
+// 7. Identify the key features of an intrinsic and distinguish between
+//    first-order and supplementary information. Typically, encoding the
+//    first-order information in the intrinsic name while using flags
+//    for supplementary details improves readability.
+//    For example:
+// 
+//    i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature,
+//    whereas an optional scaling applied to matrices is relatively secondary.
+// 
+//    ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order
+//    information, while features like an optional cache hint tend to be
+//    secondary.
+// 
+// 8. If there are invalid combinations within a set of modifiers, avoid
+//    encoding them as flags, as much as possible. This helps reduce the
+//    need for error handling of unsupported cases in the backend.
+//    For example, some 'cvt' intrinsics support only a subset of the
+//    possible rounding modes; so it is preferable not to encode the
+//    rounding modes as flags.
+// 9. Similarly, when there are invalid combinations across a set of
+//    modifiers, avoid encoding them as flags to prevent additional
+//    complexity in error handling.
+// 
+// 10. Maintain a consistent design within an intrinsic family, including
+//     argument ordering as well as the usage and ordering of flags.
+// 11. When designing an intrinsic corresponding to an instruction or its variant,
+//     consider the entire instruction family. This may reveal common features
+//     that can be modelled consistently across the family.
+// 
+// In summary, strive to balance the aspects mentioned above, to achieve
+// a scalable design with maximum readability.
+//===----------------------------------------------------------------------===//
+
 // The following intrinsics were once defined here, but are now auto-upgraded
 // to target-generic LLVM intrinsics.
 //



More information about the llvm-commits mailing list