[llvm] [NVPTX] Add family-specific architectures support (PR #141899)

Tue Jun 17 11:51:55 PDT 2025

================
@@ -33,20 +33,66 @@ class FeaturePTX<int version>:
    SubtargetFeature<"ptx"# version, "PTXVersion",
                     "" # version,
                     "Use PTX version " # version>;
-
+// NVPTX Architecture Hierarchy and Ordering:
+//
+// GPU architectures: sm_2Y/sm_3Y/sm_5Y/sm_6Y/sm_7Y/sm_8Y/sm_9Y/sm_10Y/sm_12Y
+// ('Y' represents the version within the architecture)
+// Architecture names have the form sm_XYz, where 'X' represents the generation
+// number, 'Y' represents the version within the architecture, and 'z'
+// represents the optional feature suffix.
+// If X1Y1 <= X2Y2, then GPU capabilities of sm_X1Y1 are included in sm_X2Y2.
+// For example, take sm_90 (9 represents 'X', 0 represents 'Y', and no feature
+// suffix) and sm_103 architectures (10 represents 'X', 3 represents 'Y', and no
+// feature suffix). Since 90 <= 103, sm_90 is compatible with sm_103.
+//
+// Family-specific architectures have the 'f' feature suffix and follow this
+// order:
+// sm_X{Y2}f > sm_X{Y1}f iff Y2 > Y1
+// sm_XY{f} > sm_{XY}{}
+//
+// For example, take sm_100f (10 represents 'X', 0 represents 'Y', and 'f'
+// represents 'z') and sm_103f (10 represents 'X', 3 represents 'Y', and 'f'
+// represents 'z') architectures. Since 0 < 3 (Y1 < Y2), sm_100f is compatible
+// with sm_103f. Similarly, by the second rule (sm_103f > sm_103) and the base
+// ordering (sm_90 <= sm_103), sm_90 is compatible with sm_103f.
+//
+// Architecture-specific architectures have the 'a' feature suffix and follow
+// this order:
+// sm_XY{a} > sm_XY{f} > sm_{XY}{}
+//
+// For example, take sm_103a (10 represents 'X', 3 represents 'Y', and 'a'
+// represents 'z'), sm_103f, and sm_103 architectures. sm_103 is compatible with
+// both sm_103a and sm_103f, and sm_103f is compatible with sm_103a.
+//
+// Encoding := Arch * 100 + 10 (for 'f') + 1 (for 'a')
+// Arch := X * 10 + Y
----------------
Artem-B wrote:

Sort of, but not quite. 
I'm still proposing to keep an encoding that lets us get the `f` variant by masking out `a`, but instead of encoding `a` and `f` per digit, encode them per bit (current encoding -> proposed encoding):

* sm_100 = 10000 -> 1000 
* sm_100f = 10010 -> 1002
* sm_100a = 10011 -> 1003

In both schemes, we can derive a more generic variant by masking out the more specific parts.

The suggestion is largely cosmetic. I just thought that a whole digit for something that's effectively a flag is a bit of overkill. I don't have a strong preference; the current scheme also works for me.

I'll leave it up to you.


https://github.com/llvm/llvm-project/pull/141899