[llvm-branch-commits] [clang] [llvm] AMDGPU: Use module flags to control xnack and sramecc (PR #204595)

via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Thu Jun 18 06:39:32 PDT 2026


llvmorg-github-actions[bot] wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-driver

Author: Matt Arsenault (arsenm)

<details>
<summary>Changes</summary>

This ensures these ABI details are encoded in the IR module
rather than depending on external state from command-line flags.
Previously, these were encoded as function-level subtarget features.
The code object output was a single target ID directive implied
by the global subtarget. The backend would previously check if a
function's subtarget feature mismatched the global subtarget. This
is avoided by making xnack and sramecc module-level properties from
the start. This also provides proper linker compatibility
enforcement, moving the error point earlier.

The old encoding was also an abuse of the subtarget feature system.
Subtarget features are a bitvector, and later features in the string
can override earlier ones. The old handling added a special case
where explicit settings were preserved: ordinarily +feature,-feature
should result in the feature being disabled, but +xnack,-xnack would
preserve the explicit "-xnack" state, which differs from the absence
of any xnack setting.

The new flags are encoded as 0/1, with the "any" case represented
as the absence of the flag. I considered an explicit tri-state unknown
value, but decided against it.

This also removes warnings when using these module flags on targets
that do not support the corresponding feature. Previously, messages
were written directly to stderr instead of using proper diagnostics.
Avoiding the warning reduces burden on frontends to check which targets
require the flags.

For migration purposes, the subtarget features still exist. Currently,
they are still respected in the various binary tools, pending
disassembler changes to determine target ID modifiers from e_flags.
CodeGen requires using the module flags. An error will be raised when
attempting to use the old global subtarget features. These should be
removed after a migration period for frontends to update. Functionality
wise, bitcode autoupgrade should work. Old bitcode will not have the
flags, resulting in a different target ID in the output binary than
expected, but it should run correctly.

New cl::opts exist only because it was inconvenient to update all
tests using multiple xnack modes. Users should never use these.

Co-Authored-By: Claude Opus 4.6 <noreply@<!-- -->anthropic.com>

---

Patch is 162.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/204595.diff


98 Files Affected:

- (modified) clang/include/clang/Basic/TargetOptions.h (+16) 
- (modified) clang/include/clang/Options/Options.td (+12-4) 
- (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+10) 
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+21) 
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+13) 
- (modified) clang/lib/Driver/ToolChains/Clang.h (+2) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+27) 
- (modified) clang/test/CIR/CodeGenHIP/target-features.hip (+9-8) 
- (modified) clang/test/CodeGenCXX/dynamic-cast-address-space.cpp (+2-2) 
- (added) clang/test/CodeGenOpenCL/amdgpu-module-flag-xnack-sramecc.cl (+22) 
- (added) clang/test/CodeGenOpenCL/amdgpu-xnack-any-only.cl (+27) 
- (modified) clang/test/Driver/amdgpu-features.c (+4-4) 
- (modified) clang/test/Driver/amdgpu-openmp-toolchain.c (+1-1) 
- (modified) clang/test/Driver/amdgpu-toolchain.c (+2-2) 
- (modified) clang/test/Driver/amdgpu-xnack-sramecc-flags.c (+36-33) 
- (modified) clang/test/Driver/hip-sanitize-options.hip (+17-17) 
- (modified) clang/test/Driver/hip-target-id.hip (+6-8) 
- (modified) clang/test/Driver/hip-toolchain-features.hip (+6-10) 
- (modified) clang/test/Driver/target-id.cl (+7-3) 
- (modified) llvm/docs/AMDGPUUsage.rst (+39-1) 
- (modified) llvm/docs/ReleaseNotes.md (+2) 
- (modified) llvm/lib/IR/Verifier.cpp (+19) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+33-49) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+45-1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+5) 
- (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+1-1) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.cpp (+20-7) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+5-1) 
- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+8) 
- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (+2-4) 
- (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+47-4) 
- (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+13) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/amdpal-callable.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/break-smem-soft-clauses.mir (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/cluster-flat-loads-postra.mir (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/cluster_stores.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll (+52-52) 
- (modified) llvm/test/CodeGen/AMDGPU/elf-header-flags-sramecc.ll (+8-8) 
- (modified) llvm/test/CodeGen/AMDGPU/elf-header-flags-xnack.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-reg.ll (+11-11) 
- (modified) llvm/test/CodeGen/AMDGPU/gfx902-without-xnack.ll (+3-2) 
- (modified) llvm/test/CodeGen/AMDGPU/greedy-reverse-local-assignment.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/hazard-hidden-bundle.mir (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/hazard-in-bundle.mir (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/hsa-metadata-resource-usage-function-ordering.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll (+4-4) 
- (modified) llvm/test/CodeGen/AMDGPU/immv216.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/limit-soft-clause-reg-pressure.mir (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+3-3) 
- (added) llvm/test/CodeGen/AMDGPU/mattr-xnack-sramecc-legacy.ll (+23) 
- (added) llvm/test/CodeGen/AMDGPU/module-flag-sramecc.ll (+66) 
- (added) llvm/test/CodeGen/AMDGPU/module-flag-xnack-sramecc-combined.ll (+15) 
- (added) llvm/test/CodeGen/AMDGPU/module-flag-xnack.ll (+75) 
- (modified) llvm/test/CodeGen/AMDGPU/nsa-reassign.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/nsa-vmem-hazard.mir (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/occupancy-levels.ll (+4-9) 
- (modified) llvm/test/CodeGen/AMDGPU/post-ra-soft-clause-dbg-info.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/s_addk_i32.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/s_mulk_i32.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/soft-clause-dbg-value.mir (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/sram-ecc-default.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-disabled.ll (+3-2) 
- (modified) llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-enabled.ll (+3-2) 
- (modified) llvm/test/CodeGen/AMDGPU/target-id-xnack-always-on.ll (+7-7) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+5-4) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+2-1) 
- (removed) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-invalid-any-off-on.ll (-24) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-disabled.ll (+3-4) 
- (modified) llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-enabled.ll (+3-4) 
- (added) llvm/test/Linker/Inputs/amdgpu-sramecc-module-flag-0.ll (+6) 
- (added) llvm/test/Linker/Inputs/amdgpu-sramecc-module-flag-1.ll (+6) 
- (added) llvm/test/Linker/Inputs/amdgpu-sramecc-module-flag-any.ll (+6) 
- (added) llvm/test/Linker/Inputs/amdgpu-xnack-module-flag-0.ll (+6) 
- (added) llvm/test/Linker/Inputs/amdgpu-xnack-module-flag-1.ll (+6) 
- (added) llvm/test/Linker/Inputs/amdgpu-xnack-module-flag-any.ll (+6) 
- (added) llvm/test/Linker/amdgpu-sramecc-module-flag-0.ll (+24) 
- (added) llvm/test/Linker/amdgpu-sramecc-module-flag-1.ll (+23) 
- (added) llvm/test/Linker/amdgpu-sramecc-module-flag-any.ll (+25) 
- (added) llvm/test/Linker/amdgpu-xnack-module-flag-0.ll (+24) 
- (added) llvm/test/Linker/amdgpu-xnack-module-flag-1.ll (+23) 
- (added) llvm/test/Linker/amdgpu-xnack-module-flag-any.ll (+25) 
- (modified) llvm/test/MC/AMDGPU/xnack-mask.s (+1-1) 
- (added) llvm/test/Verifier/AMDGPU/module-flag-sramecc.ll (+46) 
- (added) llvm/test/Verifier/AMDGPU/module-flag-xnack.ll (+46) 


``````````diff
diff --git a/clang/include/clang/Basic/TargetOptions.h b/clang/include/clang/Basic/TargetOptions.h
index 51a2cfe14976c..466f0ef22f265 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -92,6 +92,22 @@ class TargetOptions {
   /// \brief AMDGPU Printf lowering scheme
   AMDGPUPrintfKind AMDGPUPrintfKindVal = AMDGPUPrintfKind::Hostcall;
 
+  /// \brief Enumeration values for AMDGPU xnack/sramecc settings
+  enum class AMDGPUFeatureState {
+    /// Feature state not specified and should generate most compatible code.
+    Any = 0,
+    /// Feature explicitly disabled
+    Disabled = 1,
+    /// Feature explicitly enabled
+    Enabled = 2
+  };
+
+  /// \brief AMDGPU xnack setting from -mxnack/-mno-xnack
+  AMDGPUFeatureState AMDGPUXnackState = AMDGPUFeatureState::Any;
+
+  /// \brief AMDGPU sramecc setting from -msramecc/-mno-sramecc
+  AMDGPUFeatureState AMDGPUSramEccState = AMDGPUFeatureState::Any;
+
   // The code model to be used as specified by the user. Corresponds to
   // CodeModel::Model enum defined in include/llvm/Support/CodeGen.h, plus
   // "default" for the case when the user has not explicitly specified a
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 0a24f8b35a3ea..7e2e69cc33582 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -5879,10 +5879,18 @@ defm cumode : SimpleMFlag<"cumode",
   " execution mode (AMDGPU only)", m_amdgpu_Features_Group>;
 defm tgsplit : SimpleMFlag<"tgsplit", "Enable", "Disable",
   " threadgroup split execution mode (AMDGPU only)", m_amdgpu_Features_Group>;
-defm xnack : SimpleMFlag<"xnack", "Enable", "Disable",
-  " XNACK (AMDGPU only)", m_amdgpu_Features_Group>;
-defm sramecc : SimpleMFlag<"sramecc", "Enable", "Disable",
-  " SRAMECC (AMDGPU only)", m_amdgpu_Features_Group>;
+def mxnack : Flag<["-"], "mxnack">, Group<m_Group>,
+  Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Enable XNACK (AMDGPU only)">;
+def mno_xnack : Flag<["-"], "mno-xnack">, Group<m_Group>,
+  Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Disable XNACK (AMDGPU only)">;
+def msramecc : Flag<["-"], "msramecc">, Group<m_Group>,
+  Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Enable SRAMECC (AMDGPU only)">;
+def mno_sramecc : Flag<["-"], "mno-sramecc">, Group<m_Group>,
+  Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Disable SRAMECC (AMDGPU only)">;
 defm wavefrontsize64 : SimpleMFlag<"wavefrontsize64",
   "Specify wavefront size 64", "Specify wavefront size 32",
   " mode (AMDGPU only)">;
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index bfa956fa9a4e3..6a87a67a688f9 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -229,6 +229,16 @@ AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,
       ReadOnlyFeatures.insert(F);
   }
   HalfArgsAndReturns = true;
+
+  if (Opts.AMDGPUXnackState != TargetOptions::AMDGPUFeatureState::Any) {
+    OffloadArchFeatures["xnack"] =
+        Opts.AMDGPUXnackState == TargetOptions::AMDGPUFeatureState::Enabled;
+  }
+
+  if (Opts.AMDGPUSramEccState != TargetOptions::AMDGPUFeatureState::Any) {
+    OffloadArchFeatures["sramecc"] =
+        Opts.AMDGPUSramEccState == TargetOptions::AMDGPUFeatureState::Enabled;
+  }
 }
 
 void AMDGPUTargetInfo::adjust(DiagnosticsEngine &Diags, LangOptions &Opts,
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index ddbebb1d70336..e095716a21f7e 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1163,6 +1163,27 @@ void CodeGenModule::Release() {
       getModule().addModuleFlag(llvm::Module::Error, "amdgpu_printf_kind",
                                 MDStr);
     }
+
+    const TargetOptions &TargetOpts = getTarget().getTargetOpts();
+
+    if (TargetOpts.AMDGPUXnackState != TargetOptions::AMDGPUFeatureState::Any) {
+      // TODO: Avoid emitting the xnack flag on targets which do not support
+      // xnack configuration.
+      getModule().addModuleFlag(
+          llvm::Module::Error, "amdgpu.xnack",
+          llvm::ConstantInt::get(
+              Int32Ty, TargetOpts.AMDGPUXnackState ==
+                           TargetOptions::AMDGPUFeatureState::Enabled));
+    }
+
+    if (TargetOpts.AMDGPUSramEccState !=
+        TargetOptions::AMDGPUFeatureState::Any) {
+      getModule().addModuleFlag(
+          llvm::Module::Error, "amdgpu.sramecc",
+          llvm::ConstantInt::get(
+              Int32Ty, TargetOpts.AMDGPUSramEccState ==
+                           TargetOptions::AMDGPUFeatureState::Enabled));
+    }
   }
 
   // Emit a global array containing all external kernels or device variables
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index e3288c81d4c95..d58b077c85acd 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1535,6 +1535,15 @@ void Clang::AddARMTargetArgs(const llvm::Triple &Triple, const ArgList &Args,
   AddUnalignedAccessWarning(CmdArgs);
 }
 
+void Clang::AddAMDGPUTargetArgs(const ArgList &Args,
+                                ArgStringList &CmdArgs) const {
+  // Pass through -mxnack/-mno-xnack and -msramecc/-mno-sramecc flags to cc1.
+  if (Arg *A = Args.getLastArg(options::OPT_mxnack, options::OPT_mno_xnack))
+    A->render(Args, CmdArgs);
+  if (Arg *A = Args.getLastArg(options::OPT_msramecc, options::OPT_mno_sramecc))
+    A->render(Args, CmdArgs);
+}
+
 void Clang::RenderTargetOptions(const llvm::Triple &EffectiveTriple,
                                 const ArgList &Args, bool KernelOrKext,
                                 ArgStringList &CmdArgs) const {
@@ -1562,6 +1571,10 @@ void Clang::RenderTargetOptions(const llvm::Triple &EffectiveTriple,
     AddAArch64TargetArgs(Args, CmdArgs);
     break;
 
+  case llvm::Triple::amdgcn:
+    AddAMDGPUTargetArgs(Args, CmdArgs);
+    break;
+
   case llvm::Triple::loongarch32:
   case llvm::Triple::loongarch64:
     AddLoongArchTargetArgs(Args, CmdArgs);
diff --git a/clang/lib/Driver/ToolChains/Clang.h b/clang/lib/Driver/ToolChains/Clang.h
index 9adad5c5430f2..23f270ffb76a3 100644
--- a/clang/lib/Driver/ToolChains/Clang.h
+++ b/clang/lib/Driver/ToolChains/Clang.h
@@ -51,6 +51,8 @@ class LLVM_LIBRARY_VISIBILITY Clang : public Tool {
 
   void AddAArch64TargetArgs(const llvm::opt::ArgList &Args,
                             llvm::opt::ArgStringList &CmdArgs) const;
+  void AddAMDGPUTargetArgs(const llvm::opt::ArgList &Args,
+                           llvm::opt::ArgStringList &CmdArgs) const;
   void AddARMTargetArgs(const llvm::Triple &Triple,
                         const llvm::opt::ArgList &Args,
                         llvm::opt::ArgStringList &CmdArgs,
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index d2847739e3143..e8f9bc5a60737 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -5024,6 +5024,18 @@ static void GenerateTargetArgs(const TargetOptions &Opts,
   if (!Opts.DarwinTargetVariantSDKVersion.empty())
     GenerateArg(Consumer, OPT_darwin_target_variant_sdk_version_EQ,
                 Opts.DarwinTargetVariantSDKVersion.getAsString());
+
+  // Generate AMDGPU xnack and sramecc flags.
+  if (Opts.AMDGPUXnackState == TargetOptions::AMDGPUFeatureState::Enabled)
+    GenerateArg(Consumer, OPT_mxnack);
+  else if (Opts.AMDGPUXnackState == TargetOptions::AMDGPUFeatureState::Disabled)
+    GenerateArg(Consumer, OPT_mno_xnack);
+
+  if (Opts.AMDGPUSramEccState == TargetOptions::AMDGPUFeatureState::Enabled)
+    GenerateArg(Consumer, OPT_msramecc);
+  else if (Opts.AMDGPUSramEccState ==
+           TargetOptions::AMDGPUFeatureState::Disabled)
+    GenerateArg(Consumer, OPT_mno_sramecc);
 }
 
 static bool ParseTargetArgs(TargetOptions &Opts, ArgList &Args,
@@ -5055,6 +5067,21 @@ static bool ParseTargetArgs(TargetOptions &Opts, ArgList &Args,
       Opts.DarwinTargetVariantSDKVersion = Version;
   }
 
+  if (Arg *A = Args.getLastArg(options::OPT_mxnack, options::OPT_mno_xnack)) {
+    bool IsEnabled = A->getOption().matches(options::OPT_mxnack);
+    Opts.AMDGPUXnackState = IsEnabled
+                                ? TargetOptions::AMDGPUFeatureState::Enabled
+                                : TargetOptions::AMDGPUFeatureState::Disabled;
+  }
+
+  if (Arg *A =
+          Args.getLastArg(options::OPT_msramecc, options::OPT_mno_sramecc)) {
+    bool IsEnabled = A->getOption().matches(options::OPT_msramecc);
+    Opts.AMDGPUSramEccState = IsEnabled
+                                  ? TargetOptions::AMDGPUFeatureState::Enabled
+                                  : TargetOptions::AMDGPUFeatureState::Disabled;
+  }
+
   return Diags.getNumErrors() == NumErrorsBefore;
 }
 
diff --git a/clang/test/CIR/CodeGenHIP/target-features.hip b/clang/test/CIR/CodeGenHIP/target-features.hip
index 8d414edcd8e2c..afce90caca435 100644
--- a/clang/test/CIR/CodeGenHIP/target-features.hip
+++ b/clang/test/CIR/CodeGenHIP/target-features.hip
@@ -18,17 +18,17 @@
 // only the delta (differing features) is emitted on cir.target-features.
 
 // RUN: %clang_cc1 -triple=amdgcn-amd-amdhsa -x hip -fclangir \
-// RUN:            -fcuda-is-device -target-cpu gfx900 -target-feature +xnack \
+// RUN:            -fcuda-is-device -target-cpu gfx900 -mxnack \
 // RUN:            -emit-cir %s -o %t-delta.cir
 // RUN: FileCheck --check-prefix=CIR-DELTA %s --input-file=%t-delta.cir
 
 // RUN: %clang_cc1 -triple=amdgcn-amd-amdhsa -x hip -fclangir \
-// RUN:            -fcuda-is-device -target-cpu gfx900 -target-feature +xnack \
+// RUN:            -fcuda-is-device -target-cpu gfx900 -mxnack \
 // RUN:            -emit-llvm %s -o %t-delta.cir.ll
 // RUN: FileCheck --check-prefix=LLVM-DELTA %s --input-file=%t-delta.cir.ll
 
 // RUN: %clang_cc1 -triple=amdgcn-amd-amdhsa -x hip \
-// RUN:            -fcuda-is-device -target-cpu gfx900 -target-feature +xnack \
+// RUN:            -fcuda-is-device -target-cpu gfx900 -mxnack \
 // RUN:            -emit-llvm %s -o %t-delta.ll
 // RUN: FileCheck --check-prefix=LLVM-DELTA %s --input-file=%t-delta.ll
 
@@ -49,17 +49,18 @@ __device__ void device_fn() {}
 // LLVM-DAG: attributes #[[K_ATTR]] = {{.*}}"target-cpu"="gfx900"
 // LLVM-DAG: attributes #[[D_ATTR]] = {{.*}}"target-cpu"="gfx900"
 
-// AMDGPU with gfx900 + an extra +xnack feature: only the delta is emitted.
+// AMDGPU with gfx900 + xnack enabled via -mxnack: emitted as module flag.
 
 // CIR-DELTA: cir.func{{.*}} @_Z6kernelv()
 // CIR-DELTA-SAME: "cir.target-cpu" = "gfx900"
-// CIR-DELTA-SAME: "cir.target-features" = "+xnack"
+// CIR-DELTA-NOT: cir.target-features
 
 // CIR-DELTA: cir.func{{.*}} @_Z9device_fnv()
 // CIR-DELTA-SAME: "cir.target-cpu" = "gfx900"
-// CIR-DELTA-SAME: "cir.target-features" = "+xnack"
+// CIR-DELTA-NOT: cir.target-features
 
 // LLVM-DELTA: define{{.*}} void @_Z6kernelv(){{.*}} #[[K_ATTR_D:[0-9]+]]
 // LLVM-DELTA: define{{.*}} void @_Z9device_fnv(){{.*}} #[[D_ATTR_D:[0-9]+]]
-// LLVM-DELTA-DAG: attributes #[[K_ATTR_D]] = {{.*}}"target-cpu"="gfx900"{{.*}}"target-features"="+xnack"
-// LLVM-DELTA-DAG: attributes #[[D_ATTR_D]] = {{.*}}"target-cpu"="gfx900"{{.*}}"target-features"="+xnack"
+// LLVM-DELTA-DAG: attributes #[[K_ATTR_D]] = {{.*}}"target-cpu"="gfx900"
+// LLVM-DELTA-DAG: attributes #[[D_ATTR_D]] = {{.*}}"target-cpu"="gfx900"
+// LLVM-DELTA-DAG: !{i32 1, !"amdgpu.xnack", i32 1}
diff --git a/clang/test/CodeGenCXX/dynamic-cast-address-space.cpp b/clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
index e5873c1a8de90..2d48708601ccf 100644
--- a/clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
+++ b/clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
@@ -107,9 +107,9 @@ const B& f(A *a) {
 // CHECK: attributes #[[ATTR3]] = { nounwind }
 // CHECK: attributes #[[ATTR4]] = { noreturn }
 //.
-// WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR0]] = { mustprogress noinline optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+16-bit-insts,+add-min-max-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bf8-cvt-scale-insts,+bitop3-insts,+ci-insts,+clusters,+cube-insts,+cvt-pknorm-vop2-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+f32-to-f16bf16-cvt-sr-insts,+flat-global-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx1251-gemm-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+gws,+image-insts,+lerp-inst,+mai-insts,+mcast-load-insts,+mqsad-insts,+mqsad-pk-insts,+msad-insts,+permlane16-swap,+permlane32-swap,+pk-add-min-max-insts,+prng-inst,+qsad-insts,+s-memrealtime,+s-memtime-inst,+s-wakeup-barrier-inst,+sad-insts,+setprio-inc-wg-inst,+swmmac-gfx1200-insts,+swmmac-gfx1250-insts,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+vmem-to-lds-load-insts,+wavefrontsize32,+wavefrontsize64,+wmma-128b-insts,+wmma-256b-insts,+xf32-insts" }
+// WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR0]] = { mustprogress noinline optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+16-bit-insts,+add-min-max-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bf8-cvt-scale-insts,+bitop3-insts,+ci-insts,+clusters,+cube-insts,+cvt-pknorm-vop2-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+f32-to-f16bf16-cvt-sr-insts,+flat-global-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx1251-gemm-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+gws,+image-insts,+lerp-inst,+mai-insts,+mcast-load-insts,+mqsad-insts,+mqsad-pk-insts,+msad-insts,+permlane16-swap,+permlane32-swap,+pk-add-min-max-insts,+prng-inst,+qsad-insts,+s-memrealtime,+s-memtime-inst,+s-wakeup-barrier-inst,+sad-insts,+setprio-inc-wg-inst,+swmmac-gfx1200-insts,+swmmac-gfx1250-insts,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+vmem-to-lds-load-insts,+wavefrontsize32,+wavefrontsize64,+wmma-128b-insts,+wmma-256b-insts,+xf32-insts,+xnack-any-only" }
 // WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR1:[0-9]+]] = { nounwind willreturn memory(read) }
-// WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR2:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+16-bit-insts,+add-min-max-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bf8-cvt-scale-insts,+bitop3-insts,+ci-insts,+clusters,+cube-insts,+cvt-pknorm-vop2-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+f32-to-f16bf16-cvt-sr-insts,+flat-global-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx1251-gemm-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+gws,+image-insts,+lerp-inst,+mai-insts,+mcast-load-insts,+mqsad-insts,+mqsad-pk-insts,+msad-insts,+permlane16-swap,+permlane32-swap,+pk-add-min-max-insts,+prng-inst,+qsad-insts,+s-memrealtime,+s-memtime-inst,+s-wakeup-barrier-inst,+sad-insts,+setprio-inc-wg-inst,+swmmac-gfx1200-insts,+swmmac-gfx1250-insts,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+vmem-to-lds-load-insts,+wavefrontsize32,+wavefrontsize64,+wmma-128b-insts,+wmma-256b-insts,+xf32-insts" }
+// WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR2:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+16-bit-insts,+add-min-max-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bf8-cvt-scale-insts,+bitop3-insts,+ci-insts,+clusters,+cube-insts,+cvt-pknorm-vop2-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+f32-to-f16bf16-cvt-sr-insts,+flat-global-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx1251-gemm-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+gws,+image-insts,+lerp-inst,+mai-insts,+mcast-load-insts,+mqsad-insts,+mqsad-pk-insts,+msad-insts,+permlane16-swap,+permlane32-swap,+pk-add-min-max-insts,+prng-inst,+qsad-insts,+s-memrealtime,+s-memtime-inst,+s-wakeup-barrier-inst,+sad-insts,+setprio-inc-wg-inst,+swmmac-gfx1200-insts,+swmmac-gfx1250-insts,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+vmem-to-lds-load-insts,+wavefrontsize32,+wavefrontsize64,+wmma-128b-insts,+wmma-256b-insts,+xf32-insts,+xnack-any-only" }
 // WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR3]] = { nounwind }
 // WITH-NONZERO-DEFAULT-AS: attributes #[[ATTR4]] = { noreturn }
 //.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-module-flag-xnack-sramecc.cl b/clang/test/CodeGenOpenCL/amdgpu-module-flag-xnack-sramecc.cl
new file mode 100644
index 0000000000000..439a815f5c42d
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/amdgpu-module-flag-xnack-sramecc.cl
@@ -0,0 +1,22 @@
+// Test that xnack and sramecc module flags are emitted based on -m flags
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a \
+// RUN:   -mxnack -mno-sramecc \
+// RUN:   -emit-llvm -o - %s | FileCheck %s --check-prefixes=XNACK-ON,SRAMECC-OFF
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a \
+// RUN:   -mno-xnack -msramecc \
+// RUN:   -emit-llvm -o - %s | FileCheck %s --check-prefixes=XNACK-OFF,SRAMECC-ON
+
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx90a \
+// RUN:   -emit-llvm -o - %s | FileCheck %s --check-prefix=NO-FLAGS
+
+// XNACK-ON-DAG: !{i32 1, !"amdgpu.xnack", i32 1}
+// XNACK-OFF-DAG: !{i32 1, !"amdgpu.xnack", i32 0}
+// SRAMECC-ON-DAG: !{i32 1, !"amdgpu.sramecc", i32 1}
+// SRAMECC-OFF-DAG: !{i32 1, !"amdgpu.sramecc", i32 0}
+
+// When no explicit xnack/sramecc feature is set, no module flags are emitted
+// NO-FLAGS-NOT: !"amdgpu.xnack"
+// NO-FLAGS-NOT: !"amdgpu.sramecc"
+
+__attribute__((device)) void test() {}
diff --git a/clang/test/CodeGenOpenCL/amdgpu-xnack-any-only.cl b/clang/test/CodeGenOpenCL/amdgpu-xnack-any-only.cl
new file mode 100644
index 0000000000000..709ca1acf4cb3
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/amdgpu-xnack-any-only.cl
@@ -0,0 +1,27 @@
+// Test that xnack module flags are emitted for all targets, regardless of support.
+// Targets without FEATURE_XNACK_ON_OFF_MODES (like gfx12-5-generic, gfx1250, gfx1251)
+// will ignore the module flag during codegen, but it is still emitted by clang.
+// TODO: In the future, clang ...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/204595


More information about the llvm-branch-commits mailing list