[llvm] [NVPTX] Update architecture support checks for tcgen05 intrinsics (PR #161519)

Fri Oct 3 13:03:07 PDT 2025

================
@@ -99,6 +99,45 @@ def PrmtMode : Operand<i32> {
 // NVPTX Instruction Predicate Definitions
 //===----------------------------------------------------------------------===//
 
+// Helper predicate to compose multiple predicates.
+class AnyPred<list<Predicate> Preds>
+    : Predicate<"(" #
+                !interleave(!foreach(pred, Preds, pred.CondString),
+                            ") || (") #
+                ")">;
+
+// Checks PTX version and family-specific and architecture-specific SM versions.
+// For example, sm_100{f/a} and any future variants in the same family will match.
+class PTXWithFamilySMs<int PTXVersion, list<int> SMVersions> :
+  Predicate<"Subtarget->getPTXVersion() >= " # PTXVersion #
+            " && Subtarget->hasFamilySpecificFeatures()" #
+            " && (" #
+            !interleave(!foreach(sm, SMVersions,
+                        "(Subtarget->getSmFamilyVersion() == " # !div(sm, 10) #
+                        " && Subtarget->getSmVersion() >= " # sm # ")"),
+                        " || ") #
+            ")">;
+
+// Checks PTX version and architecture-specific SM versions.
+// For example, sm_100{a} will match.
+class PTXWithAccelSMs<int PTXVersion, list<int> SMVersions> :
+  Predicate<"Subtarget->getPTXVersion() >= " # PTXVersion #
+            " && Subtarget->hasArchAccelFeatures()" #
+            " && (" #
+            !interleave(!foreach(sm, SMVersions,
+                        "Subtarget->getSmVersion() == " # sm),
+                        " || ") #
+            ")">;
+
+// Helper predicate to call a subtarget method.
+class callSubtarget<string SubtargetMethod> : Predicate<"Subtarget->" # SubtargetMethod # "()">;
+
+// Composed predicate to check tcgen05.shift instructions support.
+def hasTcgen05ShiftSupport : AnyPred<[
+                                  PTXWithAccelSMs<90, [100, 110, 103]>,
----------------
Artem-B wrote:

I'm not sure I like full enumeration as a general approach. While it does get the job done now, we'll need to keep updating it for every new PTX and SM variant we add.

On the optimistic assumption that NVIDIA is not going to rename GPU variants again, we can probably treat sm_101/sm_110 as a special case, and figure out how to generalize the other instruction availability cases.

Ideally we want to maintain some sort of automatic applicability where we can.

That said, I'm not quite sure where NVIDIA is going with the "accelerated arch" specific instructions. First, they were only for that particular arch, then they actually got supported by other GPU variants, then NVIDIA came up with the `f` variant, available within the major architecture. Perhaps tcgen05 will never show up in future generations and we will never need to update these predicates. But I suspect we will need to do so for as long as CUDA versions keep supporting Blackwell, and that's going to be a while.

I think the check against the first supporting PTX version needs to be ">=", so that it automatically works for newer PTX versions without us having to add new entries to the list.
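
To illustrate the ">=" point, here is a minimal sketch in plain C++ rather than TableGen. It assumes a PTX floor combined with an exact SM list, which is the shape `PTXWithAccelSMs` has in the hunk above. `FakeSubtarget` and `matchesAccelSMs` are hypothetical stand-ins made up for this example, not `NVPTXSubtarget` API, and PTX version 91 is used only as a placeholder for "some newer PTX".

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for the subtarget queries used by the predicate.
struct FakeSubtarget {
  unsigned PTXVersion; // e.g. 90, same encoding as in the hunk
  unsigned SmVersion;  // e.g. 100, 103, 110
  bool ArchAccel;      // models hasArchAccelFeatures()
};

// PTX version is a lower bound; the SM version must be one of the listed
// arch-specific variants. Newer PTX versions match without a new list entry.
inline bool matchesAccelSMs(const FakeSubtarget &ST, unsigned MinPTX,
                            std::initializer_list<unsigned> SMs) {
  return ST.PTXVersion >= MinPTX && ST.ArchAccel &&
         std::find(SMs.begin(), SMs.end(), ST.SmVersion) != SMs.end();
}
```

With the values from the hunk (`90` and `[100, 110, 103]`), a subtarget on a later PTX version still matches, while an SM version outside the list does not. The SM side stays a manual enumeration, which is the part that would still need updating for new variants.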

https://github.com/llvm/llvm-project/pull/161519