[clang] [llvm] RFC: Implementing new mechanism for hard register operands to inline asm as a constraint. (PR #85846)

via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 19 11:58:58 PDT 2024


https://github.com/tltao created https://github.com/llvm/llvm-project/pull/85846

This recreates the hard register patch originally posted on Phabricator, with some minor additions: https://reviews.llvm.org/D105142

The main idea is to allow Clang users to name specific hardware registers in inline assembly constraints via curly braces ``{}``. As such, this is mainly a Clang change.
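
For readers skimming the list, a minimal usage sketch is shown below. The register names are SystemZ-style and mirror the example this patch adds to LanguageExtensions.rst; `bar` and the `sysint` mnemonic come from that documentation example.

```c
extern int *bar(void);

void foo(void) {
  int *p1 = bar();
  int *p2 = bar();
  int *result;
  /* "={r0}" pins the output to r0, "{r1}" pins the second input to r1;
     "0" ties the first input to the same register as output 0. No
     "register asm" variables are needed. */
  asm("sysint" : "={r0}" (result) : "0" (p1), "{r1}" (p2));
}
```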

Relevant RFCs posted here:

https://lists.llvm.org/pipermail/llvm-dev/2021-June/151370.html
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html

Copying the summary from the Phabricator patch:


- This is put up as an RFC patch to get feedback on the introduction of a new inline asm constraint that supports hard register operands.
- This is mostly a Clang change, since the LLVM inline assembly IR already supports the {...} syntax that the backend recognizes. This change merely completes the loop by introducing a user-facing constraint that maps to it (a short example follows this list).
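
To make the "completes the loop" point concrete, here is the mapping as exercised by the SystemZ CodeGen test updated in this patch: the user-facing ``{r6}`` constraint is carried straight through to the IR constraint string. The wrapper function name below is purely illustrative.

```c
int hard_reg_mapping(void) {
  int m = 0;
  /* Per the updated clang/test/CodeGen/SystemZ/systemz-inline-asm.c, with this
     patch Clang emits:
       call i32 asm "lr $0, $1", "={r6},{r6}"  */
  __asm__("lr %0, %1" : "+{r6}"(m));
  return m;
}
```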

The following design decisions were taken for this patch:

For the Sema side:

- We validate the "{" constraint using a two-phase approach. First, we check whether there is a target-dependent implementation that handles the {...} constraint. If that fails, we fall back to a target-agnostic check that parses and validates the {...} constraint (a short hypothetical example follows this list).
- Why do we do this? Some targets already process {...} as a user-facing constraint. For example, the AMDGPU target supports syntax of the form {register-name} as well as {register-name[...]}. Folding such cases into the target-agnostic implementation would set a precedent of growing the common code with target-specific special cases that are better kept in the respective targets.
- The target-agnostic validation simply checks for the syntax {.*}. The content parsed out from between the curly braces is then checked to see whether it is a "valid GCC register".
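
A hypothetical Sema-side example of what the target-agnostic check accepts and rejects (SystemZ-style register names; the actual coverage lives in clang/test/Sema/z-hard-register-inline-asm.c, which is not reproduced in this message, and the exact diagnostic wording is not shown here):

```c
void sema_examples(void) {
  int x;
  asm("" : "={r5}"(x));       /* accepted: r5 is a valid GCC register name        */
  asm("" : "={notareg}"(x));  /* rejected: contents are not a valid GCC register  */
  asm("" : "={r5"(x));        /* rejected: missing closing '}'                    */
}
```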

For the Clang CodeGen side:

- Most of the work is done in the AddVariableConstraints function in CGStmt.cpp, which is responsible for emitting the constraint that tells the backend which actual register to use. Conveniently, the LLVM IR syntax is also {...}; as mentioned above, this essentially maps the LLVM inline assembly IR back to a user-facing inline asm constraint.
- Within this function we add logic to check whether the constraint is of the form [&]{...}, in addition to the existing "register asm" handling.
- A scenario where both the "register asm" construct and the hard register inline asm constraint apply is diagnosed as an unsupported error, because the compiler has no way of knowing which register the user meant. The safest option is to error out explicitly and put the onus back on the user (see the example after this list).
- To achieve this, and to keep the refactoring clean, most of the logic pertaining to the "register asm" construct has been moved into a separate function, ShouldApplyRegisterVariableConstraint, which decides whether to apply the "register asm" construct.
- Furthermore, the GCCReg field is set with the "Register" only if the register is validated as a "valid GCC register". To me, it doesn't make much sense to set GCCReg to something that might not be a "valid GCC register" as defined by the respective target. Could we have a {.*} constraint whose contents are not a valid register? Yes. For example, the X86 target supports the "@cca" constraint, which is validated on the Sema side as a valid constraint and, before processing, is converted to "{@cca}" (via the convertAsmConstraint function). "@cca" is not a valid "GCC register name". In these cases, we simply emit the constraint without setting GCCReg (with the assumption that the respective backends deal with it appropriately).
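
A hypothetical example of the new diagnosis when both mechanisms name a register for the same operand (again SystemZ-style register names; the wording matches the new err_asm_hard_reg_variable_duplicate diagnostic added in this patch):

```c
void conflict(void) {
  register int a asm("r1") = 0;
  /* The "register asm" variable says r1, the constraint says r2; the compiler
     cannot tell which one the user meant, so this is diagnosed:
       error: hard register operand already defined as register variable */
  asm("" : "+{r2}"(a));
}
```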

Tests:

- Various tests were updated to account for the new behaviour.
- I added a few SystemZ tests because we work on the Z backend, but I have no issues adding testing for multiple targets.
- There were a few mentions of "waiting" for the GCC implementation of the same feature to land before the Clang side does. As mentioned above, the intent to implement it on the GCC side has already been put forward via the RFC. I have no issues "parking" this implementation until it is ready to be merged; however, it might be good to hash out any open questions/concerns in the interim.


From 71c2e7959cff33f9cd6fc98a99b7261d535183af Mon Sep 17 00:00:00 2001
From: Tony Tao <tonytao at ca.ibm.com>
Date: Tue, 19 Mar 2024 14:32:48 -0400
Subject: [PATCH] Revisiting the hard register constraint PR on phabricator:
 https://reviews.llvm.org/D105142

The main idea is to allow Clang users to name specific
hardware registers in inline assembly constraints via
curly braces ``{}``. As such, this is mainly a Clang change.

Relevant RFCs posted here

https://lists.llvm.org/pipermail/llvm-dev/2021-June/151370.html
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html
---
 clang/docs/LanguageExtensions.rst             |  43 ++
 .../clang/Basic/DiagnosticSemaKinds.td        |   2 +
 clang/include/clang/Basic/TargetInfo.h        |  18 +-
 clang/lib/Basic/TargetInfo.cpp                |  54 ++
 clang/lib/Basic/Targets/AArch64.h             |   5 -
 clang/lib/Basic/Targets/ARM.h                 |   5 -
 clang/lib/Basic/Targets/RISCV.h               |   5 -
 clang/lib/Basic/Targets/X86.h                 |   2 +-
 clang/lib/CodeGen/CGStmt.cpp                  | 101 +++-
 .../CodeGen/SystemZ/systemz-inline-asm-02.c   |  10 +-
 .../test/CodeGen/SystemZ/systemz-inline-asm.c |  19 +-
 clang/test/CodeGen/aarch64-inline-asm.c       |  12 +-
 clang/test/CodeGen/asm-goto.c                 |   6 +-
 clang/test/CodeGen/ms-intrinsics.c            |  32 +-
 clang/test/CodeGen/ms-intrinsics.ll           | 533 ++++++++++++++++++
 .../CodeGen/x86-asm-register-constraint-mix.c |  62 ++
 .../test/CodeGen/z-hard-register-inline-asm.c |  52 ++
 clang/test/Sema/z-hard-register-inline-asm.c  |  50 ++
 llvm/test/CodeGen/SystemZ/zos-inline-asm.ll   | 250 ++++++++
 19 files changed, 1200 insertions(+), 61 deletions(-)
 create mode 100644 clang/test/CodeGen/ms-intrinsics.ll
 create mode 100644 clang/test/CodeGen/x86-asm-register-constraint-mix.c
 create mode 100644 clang/test/CodeGen/z-hard-register-inline-asm.c
 create mode 100644 clang/test/Sema/z-hard-register-inline-asm.c
 create mode 100644 llvm/test/CodeGen/SystemZ/zos-inline-asm.ll

diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 13d7261d83d7f1..39d1981a068992 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -1760,6 +1760,49 @@ references can be used instead of numeric references.
       return -1;
   }
 
+Hard Register Operands for ASM Constraints
+==========================================
+
+Clang supports the ability to specify specific hardware registers in inline
+assembly constraints via the use of curly braces ``{}``.
+
+Prior to clang-19, the only way to associate an inline assembly constraint
+with a specific register was via the local register variable feature (`GCC
+Specifying Registers for Local Variables <https://gcc.gnu.org/onlinedocs/gcc-6.5.0/gcc/Local-Register-Variables.html>`_). However, the local register variable association lasts for the entire
+scope of the variable.
+
+Hard register operands will instead only apply to the specific inline ASM
+statement, which improves readability and solves a few other issues
+experienced by local register variables, such as:
+
+* function calls might clobber register variables
+* the constraints for the register operands are superfluous
+* one register variable cannot be used for 2 different inline
+  assemblies if the value is expected in different hard regs
+
+The code below is an example of an inline assembly statement using local
+register variables.
+
+.. code-block:: c++
+
+  void foo() {
+    register int *p1 asm ("r0") = bar();
+    register int *p2 asm ("r1") = bar();
+    register int *result asm ("r0");
+    asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
+  }
+Below is the same code but using hard register operands.
+
+.. code-block:: c++
+
+  void foo() {
+    int *p1 = bar();
+    int *p2 = bar();
+    int *result;
+    asm ("sysint" : "={r0}" (result) : "0" (p1), "{r1}" (p2));
+  }
+
+
 Objective-C Features
 ====================
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 8e97902564af08..09672eb3865742 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9235,6 +9235,8 @@ let CategoryName = "Inline Assembly Issue" in {
     "more than one input constraint matches the same output '%0'">;
   def err_store_value_to_reg : Error<
     "impossible constraint in asm: can't store value into a register">;
+  def err_asm_hard_reg_variable_duplicate : Error<
+    "hard register operand already defined as register variable">;
 
   def warn_asm_label_on_auto_decl : Warning<
     "ignored asm label '%0' on automatic variable">;
diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h
index 374595edd2ce4a..01d43b838414b7 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -1043,9 +1043,17 @@ class TargetInfo : public TransferrableTargetInfo,
   ///
   /// This function is used by Sema in order to diagnose conflicts between
   /// the clobber list and the input/output lists.
+  /// The constraint should already be validated in
+  /// validateHardRegisterAsmConstraint, so just do some basic checking.
   virtual StringRef getConstraintRegister(StringRef Constraint,
                                           StringRef Expression) const {
-    return "";
+    StringRef Reg = Expression;
+    size_t Start = Constraint.find('{');
+    size_t End = Constraint.find('}');
+    if (Start != StringRef::npos && End != StringRef::npos && End > Start)
+      Reg = Constraint.substr(Start + 1, End - Start - 1);
+
+    return Reg;
   }
 
   struct ConstraintInfo {
@@ -1187,6 +1195,14 @@ class TargetInfo : public TransferrableTargetInfo,
   validateAsmConstraint(const char *&Name,
                         TargetInfo::ConstraintInfo &info) const = 0;
 
+  // Validate the "hard register" inline asm constraint. This constraint is
+  // of the form {<reg-name>}. This constraint is meant to be used
+  // as an alternative to the "register asm" construct to put inline
+  // asm operands into specific registers.
+  bool
+  validateHardRegisterAsmConstraint(const char *&Name,
+                                    TargetInfo::ConstraintInfo &info) const;
+
   bool resolveSymbolicName(const char *&Name,
                            ArrayRef<ConstraintInfo> OutputConstraints,
                            unsigned &Index) const;
diff --git a/clang/lib/Basic/TargetInfo.cpp b/clang/lib/Basic/TargetInfo.cpp
index 5d9055174c089a..2190a8b8bb246d 100644
--- a/clang/lib/Basic/TargetInfo.cpp
+++ b/clang/lib/Basic/TargetInfo.cpp
@@ -770,6 +770,18 @@ bool TargetInfo::validateOutputConstraint(ConstraintInfo &Info) const {
     case 'E':
     case 'F':
       break;  // Pass them.
+    case '{': {
+      // First, check the target parser in case it validates
+      // the {...} constraint differently.
+      if (validateAsmConstraint(Name, Info))
+        return true;
+
+      // If not, that's okay, we will try to validate it
+      // using a target agnostic implementation.
+      if (!validateHardRegisterAsmConstraint(Name, Info))
+        return false;
+      break;
+    }
     }
 
     Name++;
@@ -785,6 +797,36 @@ bool TargetInfo::validateOutputConstraint(ConstraintInfo &Info) const {
   return Info.allowsMemory() || Info.allowsRegister();
 }
 
+bool TargetInfo::validateHardRegisterAsmConstraint(
+    const char *&Name, TargetInfo::ConstraintInfo &Info) const {
+  // First, swallow the '{'.
+  Name++;
+
+  // Mark the start of the possible register name.
+  const char *Start = Name;
+
+  // Loop through rest of "Name".
+  // In this loop, we check whether we have a closing curly brace which
+  // validates the constraint. Also, this allows us to get the correct bounds to
+  // set our register name.
+  while (*Name && *Name != '}')
+    Name++;
+
+  // Missing '}' or if there is anything after '}', return false.
+  if (!*Name || *(Name + 1))
+    return false;
+
+  // Now we set the register name.
+  std::string Register(Start, Name - Start);
+
+  // We validate whether it is a valid register to be used.
+  if (!isValidGCCRegisterName(Register))
+    return false;
+
+  Info.setAllowsRegister();
+  return true;
+}
+
 bool TargetInfo::resolveSymbolicName(const char *&Name,
                                      ArrayRef<ConstraintInfo> OutputConstraints,
                                      unsigned &Index) const {
@@ -917,6 +959,18 @@ bool TargetInfo::validateInputConstraint(
     case '!': // Disparage severely.
     case '*': // Ignore for choosing register preferences.
       break;  // Pass them.
+    case '{': {
+      // First, check the target parser in case it validates
+      // the {...} constraint differently.
+      if (validateAsmConstraint(Name, Info))
+        return true;
+
+      // If not, that's okay, we will try to validate it
+      // using a target agnostic implementation.
+      if (!validateHardRegisterAsmConstraint(Name, Info))
+        return false;
+      break;
+    }
     }
 
     Name++;
diff --git a/clang/lib/Basic/Targets/AArch64.h b/clang/lib/Basic/Targets/AArch64.h
index 2dd6b2181e87df..c40ef2a3c13e94 100644
--- a/clang/lib/Basic/Targets/AArch64.h
+++ b/clang/lib/Basic/Targets/AArch64.h
@@ -188,11 +188,6 @@ class LLVM_LIBRARY_VISIBILITY AArch64TargetInfo : public TargetInfo {
                              std::string &SuggestedModifier) const override;
   std::string_view getClobbers() const override;
 
-  StringRef getConstraintRegister(StringRef Constraint,
-                                  StringRef Expression) const override {
-    return Expression;
-  }
-
   int getEHDataRegisterNumber(unsigned RegNo) const override;
 
   bool validatePointerAuthKey(const llvm::APSInt &value) const override;
diff --git a/clang/lib/Basic/Targets/ARM.h b/clang/lib/Basic/Targets/ARM.h
index 71322a094f5edb..9ed452163c6048 100644
--- a/clang/lib/Basic/Targets/ARM.h
+++ b/clang/lib/Basic/Targets/ARM.h
@@ -213,11 +213,6 @@ class LLVM_LIBRARY_VISIBILITY ARMTargetInfo : public TargetInfo {
                              std::string &SuggestedModifier) const override;
   std::string_view getClobbers() const override;
 
-  StringRef getConstraintRegister(StringRef Constraint,
-                                  StringRef Expression) const override {
-    return Expression;
-  }
-
   CallingConvCheckResult checkCallingConvention(CallingConv CC) const override;
 
   int getEHDataRegisterNumber(unsigned RegNo) const override;
diff --git a/clang/lib/Basic/Targets/RISCV.h b/clang/lib/Basic/Targets/RISCV.h
index bfbdafb682c851..89071da7a42776 100644
--- a/clang/lib/Basic/Targets/RISCV.h
+++ b/clang/lib/Basic/Targets/RISCV.h
@@ -70,11 +70,6 @@ class RISCVTargetInfo : public TargetInfo {
 
   std::string_view getClobbers() const override { return ""; }
 
-  StringRef getConstraintRegister(StringRef Constraint,
-                                  StringRef Expression) const override {
-    return Expression;
-  }
-
   ArrayRef<const char *> getGCCRegNames() const override;
 
   int getEHDataRegisterNumber(unsigned RegNo) const override {
diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h
index d2232c7d5275ab..e7fb2706d02590 100644
--- a/clang/lib/Basic/Targets/X86.h
+++ b/clang/lib/Basic/Targets/X86.h
@@ -307,7 +307,7 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo {
       return "di";
     // In case the constraint is 'r' we need to return Expression
     case 'r':
-      return Expression;
+      return TargetInfo::getConstraintRegister(Constraint, Expression);
     // Double letters Y<x> constraints
     case 'Y':
       if ((++I != E) && ((*I == '0') || (*I == 'z')))
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 8898e3f22a7df6..97f61eea620b2a 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -2183,9 +2183,17 @@ void CodeGenFunction::EmitSwitchStmt(const SwitchStmt &S) {
   CaseRangeBlock = SavedCRBlock;
 }
 
-static std::string
-SimplifyConstraint(const char *Constraint, const TargetInfo &Target,
-                 SmallVectorImpl<TargetInfo::ConstraintInfo> *OutCons=nullptr) {
+static std::string SimplifyConstraint(
+    const char *Constraint, const TargetInfo &Target,
+    SmallVectorImpl<TargetInfo::ConstraintInfo> *OutCons = nullptr) {
+  // If we have only the {...} constraint, do not do any simplifications. This
+  // already maps to the lower level LLVM inline assembly IR that tells the
+  // backend to allocate a specific register. Any validations would have already
+  // been done in the Sema stage or will be done in the AddVariableConstraints
+  // function.
+  if (Constraint[0] == '{' || (Constraint[0] == '&' && Constraint[1] == '{'))
+    return std::string(Constraint);
+
   std::string Result;
 
   while (*Constraint) {
@@ -2232,37 +2240,94 @@ SimplifyConstraint(const char *Constraint, const TargetInfo &Target,
 
   return Result;
 }
+/// Is it valid to apply a register constraint for a variable marked with
+/// the "register asm" construct?
+/// Optionally, if it is determined that we can, we set "Register" to the
+/// register name.
+static bool
+ShouldApplyRegisterVariableConstraint(const Expr &AsmExpr,
+                                      std::string *Register = nullptr) {
 
-/// AddVariableConstraints - Look at AsmExpr and if it is a variable declared
-/// as using a particular register add that as a constraint that will be used
-/// in this asm stmt.
-static std::string
-AddVariableConstraints(const std::string &Constraint, const Expr &AsmExpr,
-                       const TargetInfo &Target, CodeGenModule &CGM,
-                       const AsmStmt &Stmt, const bool EarlyClobber,
-                       std::string *GCCReg = nullptr) {
   const DeclRefExpr *AsmDeclRef = dyn_cast<DeclRefExpr>(&AsmExpr);
   if (!AsmDeclRef)
-    return Constraint;
+    return false;
   const ValueDecl &Value = *AsmDeclRef->getDecl();
   const VarDecl *Variable = dyn_cast<VarDecl>(&Value);
   if (!Variable)
-    return Constraint;
+    return false;
   if (Variable->getStorageClass() != SC_Register)
-    return Constraint;
+    return false;
   AsmLabelAttr *Attr = Variable->getAttr<AsmLabelAttr>();
   if (!Attr)
+    return false;
+
+  if (Register != nullptr)
+    // Set the register to return from Attr.
+    *Register = Attr->getLabel().str();
+  return true;
+}
+
+/// AddVariableConstraints:
+/// Look at AsmExpr and if it is a variable declared as using a particular
+/// register add that as a constraint that will be used in this asm stmt.
+/// Whether it can be used or not is dependent on querying
+/// ShouldApplyRegisterVariableConstraint(). Also check whether the "hard
+/// register" inline asm constraint (i.e. "{reg-name}") is specified. If so, add
+/// that as a constraint that will be used in this asm stmt.
+static std::string
+AddVariableConstraints(const std::string &Constraint, const Expr &AsmExpr,
+                       const TargetInfo &Target, CodeGenModule &CGM,
+                       const AsmStmt &Stmt, const bool EarlyClobber,
+                       std::string *GCCReg = nullptr) {
+
+  // Do we have the "hard register" inline asm constraint?
+  bool ApplyHardRegisterConstraint =
+      Constraint[0] == '{' || (EarlyClobber && Constraint[1] == '{');
+
+  // Do we have "register asm" on a variable.
+  std::string Reg = "";
+  bool ApplyRegisterVariableConstraint =
+      ShouldApplyRegisterVariableConstraint(AsmExpr, &Reg);
+
+  // Diagnose the scenario where we apply both the register variable constraint
+  // and a hard register constraint as an unsupported error.
+  // Why? Because we could have a situation where the register passed in through
+  // {...} and the register passed in through the "register asm" construct could
+  // be different, and in this case, there's no way for the compiler to know
+  // which one to emit.
+  if (ApplyHardRegisterConstraint && ApplyRegisterVariableConstraint) {
+    CGM.getDiags().Report(AsmExpr.getExprLoc(),
+                          diag::err_asm_hard_reg_variable_duplicate);
     return Constraint;
-  StringRef Register = Attr->getLabel();
-  assert(Target.isValidGCCRegisterName(Register));
+  }
+
+  if (!ApplyHardRegisterConstraint && !ApplyRegisterVariableConstraint)
+    return Constraint;
+
   // We're using validateOutputConstraint here because we only care if
   // this is a register constraint.
   TargetInfo::ConstraintInfo Info(Constraint, "");
-  if (Target.validateOutputConstraint(Info) &&
-      !Info.allowsRegister()) {
+  if (Target.validateOutputConstraint(Info) && !Info.allowsRegister()) {
     CGM.ErrorUnsupported(&Stmt, "__asm__");
     return Constraint;
   }
+
+  if (ApplyHardRegisterConstraint) {
+    int Start = EarlyClobber ? 2 : 1;
+    int End = Constraint.find('}');
+    Reg = Constraint.substr(Start, End - Start);
+    // If we don't have a valid register name, simply return the constraint.
+    // For example: There are some targets like X86 that use a constraint such
+    // as "@cca", which is validated and then converted into {@cca}. Now this
+    // isn't necessarily a "GCC Register", but in terms of emission, it is
+    // valid since it is lowered appropriately in the X86 backend. For the {..}
+    // constraint, we shouldn't be too strict and error out if the register
+    // itself isn't a valid "GCC register".
+    if (!Target.isValidGCCRegisterName(Reg))
+      return Constraint;
+  }
+
+  StringRef Register(Reg);
   // Canonicalize the register here before returning it.
   Register = Target.getNormalizedGCCRegisterName(Register);
   if (GCCReg != nullptr)
diff --git a/clang/test/CodeGen/SystemZ/systemz-inline-asm-02.c b/clang/test/CodeGen/SystemZ/systemz-inline-asm-02.c
index 754d7e66f04b24..60237c81fd7298 100644
--- a/clang/test/CodeGen/SystemZ/systemz-inline-asm-02.c
+++ b/clang/test/CodeGen/SystemZ/systemz-inline-asm-02.c
@@ -5,9 +5,15 @@
 // Test that an error is given if a physreg is defined by multiple operands.
 int test_physreg_defs(void) {
   register int l __asm__("r7") = 0;
+  int m;
 
   // CHECK: error: multiple outputs to hard register: r7
-  __asm__("" : "+r"(l), "=r"(l));
+  __asm__(""
+          : "+r"(l), "=r"(l));
 
-  return l;
+  // CHECK: error: multiple outputs to hard register: r6
+  __asm__(""
+          : "+{r6}"(m), "={r6}"(m));
+
+  return l + m;
 }
diff --git a/clang/test/CodeGen/SystemZ/systemz-inline-asm.c b/clang/test/CodeGen/SystemZ/systemz-inline-asm.c
index e38d37cd345e26..a3b47700dc30bb 100644
--- a/clang/test/CodeGen/SystemZ/systemz-inline-asm.c
+++ b/clang/test/CodeGen/SystemZ/systemz-inline-asm.c
@@ -134,12 +134,25 @@ long double test_f128(long double f, long double g) {
 int test_physregs(void) {
   // CHECK-LABEL: define{{.*}} signext i32 @test_physregs()
   register int l __asm__("r7") = 0;
+  int m = 0;
 
   // CHECK: call i32 asm "lr $0, $1", "={r7},{r7}"
-  __asm__("lr %0, %1" : "+r"(l));
+  __asm__("lr %0, %1"
+          : "+r"(l));
 
   // CHECK: call i32 asm "$0 $1 $2", "={r7},{r7},{r7}"
-  __asm__("%0 %1 %2" : "+r"(l) : "r"(l));
+  __asm__("%0 %1 %2"
+          : "+r"(l)
+          : "r"(l));
 
-  return l;
+  // CHECK: call i32 asm "lr $0, $1", "={r6},{r6}"
+  __asm__("lr %0, %1"
+          : "+{r6}"(m));
+
+  // CHECK: call i32 asm "$0 $1 $2", "={r6},{r6},{r6}"
+  __asm__("%0 %1 %2"
+          : "+{r6}"(m)
+          : "{r6}"(m));
+
+  return l + m;
 }
diff --git a/clang/test/CodeGen/aarch64-inline-asm.c b/clang/test/CodeGen/aarch64-inline-asm.c
index 8ddee560b11da4..860cc858275ea6 100644
--- a/clang/test/CodeGen/aarch64-inline-asm.c
+++ b/clang/test/CodeGen/aarch64-inline-asm.c
@@ -77,7 +77,15 @@ void test_gcc_registers(void) {
 
 void test_tied_earlyclobber(void) {
   register int a asm("x1");
-  asm("" : "+&r"(a));
+  asm(""
+      : "+&r"(a));
+  // CHECK: call i32 asm "", "=&{x1},0"(i32 %0)
+}
+
+void test_tied_earlyclobber2(void) {
+  int a;
+  asm(""
+      : "+&{x1}"(a));
   // CHECK: call i32 asm "", "=&{x1},0"(i32 %0)
 }
 
@@ -102,4 +110,4 @@ void test_sme_constraints(){
 
   asm("movt zt0[3, mul vl], z0" : : : "zt0");
 // CHECK: call void asm sideeffect "movt zt0[3, mul vl], z0", "~{zt0}"()
-}
\ No newline at end of file
+}
diff --git a/clang/test/CodeGen/asm-goto.c b/clang/test/CodeGen/asm-goto.c
index 4037c1b2a3d7a2..77bd77615f2998 100644
--- a/clang/test/CodeGen/asm-goto.c
+++ b/clang/test/CodeGen/asm-goto.c
@@ -55,14 +55,14 @@ int test3(int out1, int out2) {
 
 int test4(int out1, int out2) {
   // CHECK-LABEL: define{{.*}} i32 @test4(
-  // CHECK: callbr { i32, i32 } asm sideeffect "jne ${5:l}", "={si},={di},r,0,1,!i,!i
+  // CHECK: callbr { i32, i32 } asm sideeffect "jne ${5:l}", "={si},={di},r,{si},{di},!i,!i
   // CHECK: to label %asm.fallthrough [label %label_true.split, label %loop.split]
   // CHECK-LABEL: asm.fallthrough:
   if (out1 < out2)
     asm volatile goto("jne %l5" : "+S"(out1), "+D"(out2) : "r"(out1) :: label_true, loop);
   else
     asm volatile goto("jne %l7" : "+S"(out1), "+D"(out2) : "r"(out1), "r"(out2) :: label_true, loop);
-  // CHECK: callbr { i32, i32 } asm sideeffect "jne ${7:l}", "={si},={di},r,r,0,1,!i,!i
+  // CHECK: callbr { i32, i32 } asm sideeffect "jne ${7:l}", "={si},={di},r,r,{si},{di},!i,!i
   // CHECK: to label %asm.fallthrough6 [label %label_true.split11, label %loop.split14]
   // CHECK-LABEL: asm.fallthrough6:
   return out1 + out2;
@@ -92,7 +92,7 @@ int test5(int addr, int size, int limit) {
 
 int test6(int out1) {
   // CHECK-LABEL: define{{.*}} i32 @test6(
-  // CHECK: callbr i32 asm sideeffect "testl $0, $0; testl $1, $1; jne ${3:l}", "={si},r,0,!i,!i,{{.*}}
+  // CHECK: callbr i32 asm sideeffect "testl $0, $0; testl $1, $1; jne ${3:l}", "={si},r,{si},!i,!i,{{.*}}
   // CHECK: to label %asm.fallthrough [label %label_true.split, label %landing.split]
   // CHECK-LABEL: asm.fallthrough:
   // CHECK-LABEL: landing:
diff --git a/clang/test/CodeGen/ms-intrinsics.c b/clang/test/CodeGen/ms-intrinsics.c
index 5bb003d1f91fc0..375258ca609675 100644
--- a/clang/test/CodeGen/ms-intrinsics.c
+++ b/clang/test/CodeGen/ms-intrinsics.c
@@ -36,12 +36,12 @@ void test__movsb(unsigned char *Dest, unsigned char *Src, size_t Count) {
   return __movsb(Dest, Src, Count);
 }
 // CHECK-I386-LABEL: define{{.*}} void @test__movsb
-// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsb\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
+// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsb\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
 // CHECK-I386:   ret void
 // CHECK-I386: }
 
 // CHECK-X64-LABEL: define{{.*}} void @test__movsb
-// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsb", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
+// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsb", "={di},={si},={cx},{di},{si},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -49,12 +49,12 @@ void test__stosw(unsigned short *Dest, unsigned short Data, size_t Count) {
   return __stosw(Dest, Data, Count);
 }
 // CHECK-I386-LABEL: define{{.*}} void @test__stosw
-// CHECK-I386:   call { ptr, i32 } asm sideeffect "rep stosw", "={di},={cx},{ax},0,1,~{memory},~{dirflag},~{fpsr},~{flags}"(i16 %Data, ptr %Dest, i32 %Count)
+// CHECK-I386:   call { ptr, i32 } asm sideeffect "rep stosw", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i16 %Data, ptr %Dest, i32 %Count)
 // CHECK-I386:   ret void
 // CHECK-I386: }
 
 // CHECK-X64-LABEL: define{{.*}} void @test__stosw
-// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stosw", "={di},={cx},{ax},0,1,~{memory},~{dirflag},~{fpsr},~{flags}"(i16 %Data, ptr %Dest, i64 %Count)
+// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stosw", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i16 %Data, ptr %Dest, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -62,12 +62,12 @@ void test__movsw(unsigned short *Dest, unsigned short *Src, size_t Count) {
   return __movsw(Dest, Src, Count);
 }
 // CHECK-I386-LABEL: define{{.*}} void @test__movsw
-// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsw\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
+// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsw\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
 // CHECK-I386:   ret void
 // CHECK-I386: }
 
 // CHECK-X64-LABEL: define{{.*}} void @test__movsw
-// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsw", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
+// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsw", "={di},={si},={cx},{di},{si},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -75,12 +75,12 @@ void test__stosd(unsigned long *Dest, unsigned long Data, size_t Count) {
   return __stosd(Dest, Data, Count);
 }
 // CHECK-I386-LABEL: define{{.*}} void @test__stosd
-// CHECK-I386:   call { ptr, i32 } asm sideeffect "rep stos$(l$|d$)", "={di},={cx},{ax},0,1,~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %Data, ptr %Dest, i32 %Count)
+// CHECK-I386:   call { ptr, i32 } asm sideeffect "rep stos$(l$|d$)", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %Data, ptr %Dest, i32 %Count)
 // CHECK-I386:   ret void
 // CHECK-I386: }
 
 // CHECK-X64-LABEL: define{{.*}} void @test__stosd
-// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stos$(l$|d$)", "={di},={cx},{ax},0,1,~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %Data, ptr %Dest, i64 %Count)
+// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stos$(l$|d$)", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %Data, ptr %Dest, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -88,12 +88,12 @@ void test__movsd(unsigned long *Dest, unsigned long *Src, size_t Count) {
   return __movsd(Dest, Src, Count);
 }
 // CHECK-I386-LABEL: define{{.*}} void @test__movsd
-// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movs$(l$|d$)\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
+// CHECK-I386:   tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movs$(l$|d$)\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count)
 // CHECK-I386:   ret void
 // CHECK-I386: }
 
 // CHECK-X64-LABEL: define{{.*}} void @test__movsd
-// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movs$(l$|d$)", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
+// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movs$(l$|d$)", "={di},={si},={cx},{di},{si},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -102,7 +102,7 @@ void test__stosq(unsigned __int64 *Dest, unsigned __int64 Data, size_t Count) {
   return __stosq(Dest, Data, Count);
 }
 // CHECK-X64-LABEL: define{{.*}} void @test__stosq
-// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stosq", "={di},={cx},{ax},0,1,~{memory},~{dirflag},~{fpsr},~{flags}"(i64 %Data, ptr %Dest, i64 %Count)
+// CHECK-X64:   call { ptr, i64 } asm sideeffect "rep stosq", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i64 %Data, ptr %Dest, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 
@@ -110,7 +110,7 @@ void test__movsq(unsigned __int64 *Dest, unsigned __int64 *Src, size_t Count) {
   return __movsq(Dest, Src, Count);
 }
 // CHECK-X64-LABEL: define{{.*}} void @test__movsq
-// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsq", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
+// CHECK-X64:   call { ptr, ptr, i64 } asm sideeffect "rep movsq", "={di},={si},={cx},{di},{si},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i64 %Count)
 // CHECK-X64:   ret void
 // CHECK-X64: }
 #endif
@@ -632,13 +632,13 @@ long test_InterlockedExchange_HLERelease(long volatile *Target, long Value) {
 long test_InterlockedCompareExchange_HLEAcquire(long volatile *Destination,
                                                 long Exchange, long Comparand) {
 // CHECK-INTEL: define{{.*}} i32 @test_InterlockedCompareExchange_HLEAcquire(ptr{{[a-z_ ]*}}%Destination, i32{{[a-z_ ]*}}%Exchange, i32{{[a-z_ ]*}}%Comparand)
-// CHECK-INTEL: call i32 asm sideeffect ".byte 0xf2 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination)
+// CHECK-INTEL: call i32 asm sideeffect ".byte 0xf2 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination)
   return _InterlockedCompareExchange_HLEAcquire(Destination, Exchange, Comparand);
 }
 long test_InterlockedCompareExchange_HLERelease(long volatile *Destination,
                                             long Exchange, long Comparand) {
 // CHECK-INTEL: define{{.*}} i32 @test_InterlockedCompareExchange_HLERelease(ptr{{[a-z_ ]*}}%Destination, i32{{[a-z_ ]*}}%Exchange, i32{{[a-z_ ]*}}%Comparand)
-// CHECK-INTEL: call i32 asm sideeffect ".byte 0xf3 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination)
+// CHECK-INTEL: call i32 asm sideeffect ".byte 0xf3 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination)
   return _InterlockedCompareExchange_HLERelease(Destination, Exchange, Comparand);
 }
 #endif
@@ -656,13 +656,13 @@ __int64 test_InterlockedExchange64_HLERelease(__int64 volatile *Target, __int64
 __int64 test_InterlockedCompareExchange64_HLEAcquire(__int64 volatile *Destination,
                                                      __int64 Exchange, __int64 Comparand) {
 // CHECK-X64: define{{.*}} i64 @test_InterlockedCompareExchange64_HLEAcquire(ptr{{[a-z_ ]*}}%Destination, i64{{[a-z_ ]*}}%Exchange, i64{{[a-z_ ]*}}%Comparand)
-// CHECK-X64: call i64 asm sideeffect ".byte 0xf2 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %Destination, i64 %Exchange, i64 %Comparand, ptr elementtype(i64) %Destination)
+// CHECK-X64: call i64 asm sideeffect ".byte 0xf2 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %Destination, i64 %Exchange, i64 %Comparand, ptr elementtype(i64) %Destination)
   return _InterlockedCompareExchange64_HLEAcquire(Destination, Exchange, Comparand);
 }
 __int64 test_InterlockedCompareExchange64_HLERelease(__int64 volatile *Destination,
                                                      __int64 Exchange, __int64 Comparand) {
 // CHECK-X64: define{{.*}} i64 @test_InterlockedCompareExchange64_HLERelease(ptr{{[a-z_ ]*}}%Destination, i64{{[a-z_ ]*}}%Exchange, i64{{[a-z_ ]*}}%Comparand)
-// CHECK-X64: call i64 asm sideeffect ".byte 0xf3 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %Destination, i64 %Exchange, i64 %Comparand, ptr elementtype(i64) %Destination)
+// CHECK-X64: call i64 asm sideeffect ".byte 0xf3 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %Destination, i64 %Exchange, i64 %Comparand, ptr elementtype(i64) %Destination)
   return _InterlockedCompareExchange64_HLERelease(Destination, Exchange, Comparand);
 }
 #endif
diff --git a/clang/test/CodeGen/ms-intrinsics.ll b/clang/test/CodeGen/ms-intrinsics.ll
new file mode 100644
index 00000000000000..28361ec97fff98
--- /dev/null
+++ b/clang/test/CodeGen/ms-intrinsics.ll
@@ -0,0 +1,533 @@
+; ModuleID = '/Users/ttao/local_src/personal_upstream/llvm-project/clang/test/CodeGen/ms-intrinsics.c'
+source_filename = "/Users/ttao/local_src/personal_upstream/llvm-project/clang/test/CodeGen/ms-intrinsics.c"
+target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32-a:0:32-S32"
+target triple = "i686-unknown-windows-msvc"
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: write)
+define dso_local void @test__stosb(ptr noundef writeonly %Dest, i8 noundef zeroext %Data, i32 noundef %Count) local_unnamed_addr #0 {
+entry:
+  tail call void @llvm.memset.p0.i32(ptr align 1 %Dest, i8 %Data, i32 %Count, i1 true)
+  ret void
+}
+
+; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write)
+declare void @llvm.memset.p0.i32(ptr nocapture writeonly, i8, i32, i1 immarg) #1
+
+; Function Attrs: minsize nounwind optsize
+define dso_local void @test__movsb(ptr noundef %Dest, ptr noundef %Src, i32 noundef %Count) local_unnamed_addr #2 {
+entry:
+  %0 = tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsb\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count) #13, !srcloc !4
+  ret void
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local void @test__stosw(ptr noundef %Dest, i16 noundef zeroext %Data, i32 noundef %Count) local_unnamed_addr #2 {
+entry:
+  %0 = tail call { ptr, i32 } asm sideeffect "rep stosw", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i16 %Data, ptr %Dest, i32 %Count) #13, !srcloc !5
+  ret void
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local void @test__movsw(ptr noundef %Dest, ptr noundef %Src, i32 noundef %Count) local_unnamed_addr #2 {
+entry:
+  %0 = tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movsw\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count) #13, !srcloc !6
+  ret void
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local void @test__stosd(ptr noundef %Dest, i32 noundef %Data, i32 noundef %Count) local_unnamed_addr #2 {
+entry:
+  %0 = tail call { ptr, i32 } asm sideeffect "rep stos$(l$|d$)", "={di},={cx},{ax},{di},{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(i32 %Data, ptr %Dest, i32 %Count) #13, !srcloc !7
+  ret void
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local void @test__movsd(ptr noundef %Dest, ptr noundef %Src, i32 noundef %Count) local_unnamed_addr #2 {
+entry:
+  %0 = tail call { ptr, ptr, i32 } asm sideeffect "xchg $(%esi, $1$|$1, esi$)\0Arep movs$(l$|d$)\0Axchg $(%esi, $1$|$1, esi$)", "={di},=r,={cx},{di},1,{cx},~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %Dest, ptr %Src, i32 %Count) #13, !srcloc !8
+  ret void
+}
+
+; Function Attrs: minsize noreturn nounwind optsize memory(inaccessiblemem: write)
+define dso_local void @test__ud2() local_unnamed_addr #3 {
+entry:
+  tail call void @llvm.trap()
+  unreachable
+}
+
+; Function Attrs: cold noreturn nounwind memory(inaccessiblemem: write)
+declare void @llvm.trap() #4
+
+; Function Attrs: minsize noreturn nounwind optsize
+define dso_local void @test__int2c() local_unnamed_addr #5 {
+entry:
+  tail call void asm sideeffect "int $$0x2c", ""() #14
+  unreachable
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(none)
+define dso_local ptr @test_ReturnAddress() local_unnamed_addr #6 {
+entry:
+  %0 = tail call ptr @llvm.returnaddress(i32 0)
+  ret ptr %0
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare ptr @llvm.returnaddress(i32 immarg) #7
+
+; Function Attrs: minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(none)
+define dso_local ptr @test_AddressOfReturnAddress() local_unnamed_addr #6 {
+entry:
+  %0 = tail call ptr @llvm.addressofreturnaddress.p0()
+  ret ptr %0
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare ptr @llvm.addressofreturnaddress.p0() #7
+
+; Function Attrs: minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(argmem: write)
+define dso_local noundef zeroext i8 @test_BitScanForward(ptr nocapture noundef writeonly %Index, i32 noundef %Mask) local_unnamed_addr #8 {
+entry:
+  %0 = icmp eq i32 %Mask, 0
+  br i1 %0, label %bitscan_end, label %bitscan_not_zero
+
+bitscan_end:                                      ; preds = %bitscan_not_zero, %entry
+  %bitscan_result = phi i8 [ 0, %entry ], [ 1, %bitscan_not_zero ]
+  ret i8 %bitscan_result
+
+bitscan_not_zero:                                 ; preds = %entry
+  %incdec.ptr = getelementptr inbounds i8, ptr %Index, i32 4
+  %1 = tail call i32 @llvm.cttz.i32(i32 %Mask, i1 true), !range !9
+  store i32 %1, ptr %incdec.ptr, align 4
+  br label %bitscan_end
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare i32 @llvm.cttz.i32(i32, i1 immarg) #9
+
+; Function Attrs: minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(argmem: write)
+define dso_local noundef zeroext i8 @test_BitScanReverse(ptr nocapture noundef writeonly %Index, i32 noundef %Mask) local_unnamed_addr #8 {
+entry:
+  %0 = icmp eq i32 %Mask, 0
+  br i1 %0, label %bitscan_end, label %bitscan_not_zero
+
+bitscan_end:                                      ; preds = %bitscan_not_zero, %entry
+  %bitscan_result = phi i8 [ 0, %entry ], [ 1, %bitscan_not_zero ]
+  ret i8 %bitscan_result
+
+bitscan_not_zero:                                 ; preds = %entry
+  %incdec.ptr = getelementptr inbounds i8, ptr %Index, i32 4
+  %1 = tail call i32 @llvm.ctlz.i32(i32 %Mask, i1 true), !range !9
+  %2 = xor i32 %1, 31
+  store i32 %2, ptr %incdec.ptr, align 4
+  br label %bitscan_end
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare i32 @llvm.ctlz.i32(i32, i1 immarg) #9
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local ptr @test_InterlockedExchangePointer(ptr nocapture noundef %Target, ptr noundef %Value) local_unnamed_addr #10 {
+entry:
+  %0 = ptrtoint ptr %Value to i32
+  %1 = atomicrmw xchg ptr %Target, i32 %0 seq_cst, align 4
+  %2 = inttoptr i32 %1 to ptr
+  ret ptr %2
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local ptr @test_InterlockedCompareExchangePointer(ptr noundef %Destination, ptr noundef %Exchange, ptr noundef %Comparand) local_unnamed_addr #11 {
+entry:
+  %0 = ptrtoint ptr %Exchange to i32
+  %1 = ptrtoint ptr %Comparand to i32
+  %2 = cmpxchg volatile ptr %Destination, i32 %1, i32 %0 seq_cst seq_cst, align 4
+  %3 = extractvalue { i32, i1 } %2, 0
+  %4 = inttoptr i32 %3 to ptr
+  ret ptr %4
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local ptr @test_InterlockedCompareExchangePointer_nf(ptr noundef %Destination, ptr noundef %Exchange, ptr noundef %Comparand) local_unnamed_addr #11 {
+entry:
+  %0 = ptrtoint ptr %Exchange to i32
+  %1 = ptrtoint ptr %Comparand to i32
+  %2 = cmpxchg volatile ptr %Destination, i32 %1, i32 %0 monotonic monotonic, align 4
+  %3 = extractvalue { i32, i1 } %2, 0
+  %4 = inttoptr i32 %3 to ptr
+  ret ptr %4
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedExchange8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xchg ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedExchange16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xchg ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedExchange(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xchg ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedExchangeAdd8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw add ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedExchangeAdd16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw add ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedExchangeAdd(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw add ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedExchangeSub8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedExchangeSub16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedExchangeSub(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedOr8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw or ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedOr16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw or ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedOr(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw or ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedXor8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xor ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedXor16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xor ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedXor(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xor ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i8 @test_InterlockedAnd8(ptr nocapture noundef %value, i8 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw and ptr %value, i8 %mask seq_cst, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedAnd16(ptr nocapture noundef %value, i16 noundef signext %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw and ptr %value, i16 %mask seq_cst, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedAnd(ptr nocapture noundef %value, i32 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw and ptr %value, i32 %mask seq_cst, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local signext i8 @test_InterlockedCompareExchange8(ptr noundef %Destination, i8 noundef signext %Exchange, i8 noundef signext %Comperand) local_unnamed_addr #11 {
+entry:
+  %0 = cmpxchg volatile ptr %Destination, i8 %Comperand, i8 %Exchange seq_cst seq_cst, align 1
+  %1 = extractvalue { i8, i1 } %0, 0
+  ret i8 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local signext i16 @test_InterlockedCompareExchange16(ptr noundef %Destination, i16 noundef signext %Exchange, i16 noundef signext %Comperand) local_unnamed_addr #11 {
+entry:
+  %0 = cmpxchg volatile ptr %Destination, i16 %Comperand, i16 %Exchange seq_cst seq_cst, align 2
+  %1 = extractvalue { i16, i1 } %0, 0
+  ret i16 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local i32 @test_InterlockedCompareExchange(ptr noundef %Destination, i32 noundef %Exchange, i32 noundef %Comperand) local_unnamed_addr #11 {
+entry:
+  %0 = cmpxchg volatile ptr %Destination, i32 %Comperand, i32 %Exchange seq_cst seq_cst, align 4
+  %1 = extractvalue { i32, i1 } %0, 0
+  ret i32 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local i64 @test_InterlockedCompareExchange64(ptr noundef %Destination, i64 noundef %Exchange, i64 noundef %Comperand) local_unnamed_addr #11 {
+entry:
+  %0 = cmpxchg volatile ptr %Destination, i64 %Comperand, i64 %Exchange seq_cst seq_cst, align 8
+  %1 = extractvalue { i64, i1 } %0, 0
+  ret i64 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedIncrement16(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %incdec.ptr = getelementptr inbounds i8, ptr %Addend, i32 2
+  %0 = atomicrmw add ptr %incdec.ptr, i16 1 seq_cst, align 2
+  %1 = add i16 %0, 1
+  ret i16 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedIncrement(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %incdec.ptr = getelementptr inbounds i8, ptr %Addend, i32 4
+  %0 = atomicrmw add ptr %incdec.ptr, i32 1 seq_cst, align 4
+  %1 = add i32 %0, 1
+  ret i32 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local signext i16 @test_InterlockedDecrement16(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %Addend, i16 1 seq_cst, align 2
+  %1 = add i16 %0, -1
+  ret i16 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i32 @test_InterlockedDecrement(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %Addend, i32 1 seq_cst, align 4
+  %1 = add i32 %0, -1
+  ret i32 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local signext i8 @test_iso_volatile_load8(ptr noundef %p) local_unnamed_addr #11 {
+entry:
+  %0 = load volatile i8, ptr %p, align 1
+  ret i8 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local signext i16 @test_iso_volatile_load16(ptr noundef %p) local_unnamed_addr #11 {
+entry:
+  %0 = load volatile i16, ptr %p, align 2
+  ret i16 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local i32 @test_iso_volatile_load32(ptr noundef %p) local_unnamed_addr #11 {
+entry:
+  %0 = load volatile i32, ptr %p, align 4
+  ret i32 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local i64 @test_iso_volatile_load64(ptr noundef %p) local_unnamed_addr #11 {
+entry:
+  %0 = load volatile i64, ptr %p, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize nofree norecurse nounwind optsize memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local void @test_iso_volatile_store8(ptr noundef %p, i8 noundef signext %v) local_unnamed_addr #12 {
+entry:
+  store volatile i8 %v, ptr %p, align 1
+  ret void
+}
+
+; Function Attrs: minsize nofree norecurse nounwind optsize memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local void @test_iso_volatile_store16(ptr noundef %p, i16 noundef signext %v) local_unnamed_addr #12 {
+entry:
+  store volatile i16 %v, ptr %p, align 2
+  ret void
+}
+
+; Function Attrs: minsize nofree norecurse nounwind optsize memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local void @test_iso_volatile_store32(ptr noundef %p, i32 noundef %v) local_unnamed_addr #12 {
+entry:
+  store volatile i32 %v, ptr %p, align 4
+  ret void
+}
+
+; Function Attrs: minsize nofree norecurse nounwind optsize memory(argmem: readwrite, inaccessiblemem: readwrite)
+define dso_local void @test_iso_volatile_store64(ptr noundef %p, i64 noundef %v) local_unnamed_addr #12 {
+entry:
+  store volatile i64 %v, ptr %p, align 8
+  ret void
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedExchange64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xchg ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedExchangeAdd64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw add ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedExchangeSub64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedOr64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw or ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedXor64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw xor ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedAnd64(ptr nocapture noundef %value, i64 noundef %mask) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw and ptr %value, i64 %mask seq_cst, align 8
+  ret i64 %0
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedIncrement64(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw add ptr %Addend, i64 1 seq_cst, align 8
+  %1 = add i64 %0, 1
+  ret i64 %1
+}
+
+; Function Attrs: minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite)
+define dso_local i64 @test_InterlockedDecrement64(ptr nocapture noundef %Addend) local_unnamed_addr #10 {
+entry:
+  %0 = atomicrmw sub ptr %Addend, i64 1 seq_cst, align 8
+  %1 = add i64 %0, -1
+  ret i64 %1
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local i32 @test_InterlockedExchange_HLEAcquire(ptr noundef %Target, i32 noundef %Value) local_unnamed_addr #2 {
+entry:
+  %0 = tail call i32 asm sideeffect ".byte 0xf2 ; lock ; xchg $($0, $1$|$1, $0$)", "=r,=*m,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Target, i32 %Value, ptr elementtype(i32) %Target) #13, !srcloc !10
+  ret i32 %0
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local i32 @test_InterlockedExchange_HLERelease(ptr noundef %Target, i32 noundef %Value) local_unnamed_addr #2 {
+entry:
+  %0 = tail call i32 asm sideeffect ".byte 0xf3 ; lock ; xchg $($0, $1$|$1, $0$)", "=r,=*m,0,*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Target, i32 %Value, ptr elementtype(i32) %Target) #13, !srcloc !11
+  ret i32 %0
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local i32 @test_InterlockedCompareExchange_HLEAcquire(ptr noundef %Destination, i32 noundef %Exchange, i32 noundef %Comparand) local_unnamed_addr #2 {
+entry:
+  %0 = tail call i32 asm sideeffect ".byte 0xf2 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination) #13, !srcloc !12
+  ret i32 %0
+}
+
+; Function Attrs: minsize nounwind optsize
+define dso_local i32 @test_InterlockedCompareExchange_HLERelease(ptr noundef %Destination, i32 noundef %Exchange, i32 noundef %Comparand) local_unnamed_addr #2 {
+entry:
+  %0 = tail call i32 asm sideeffect ".byte 0xf3 ; lock ; cmpxchg $($2, $1$|$1, $2$)", "={ax},=*m,r,{ax},*m,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %Destination, i32 %Exchange, i32 %Comparand, ptr elementtype(i32) %Destination) #13, !srcloc !13
+  ret i32 %0
+}
+
+; Function Attrs: minsize noreturn nounwind optsize
+define dso_local void @test__fastfail() local_unnamed_addr #5 {
+entry:
+  tail call void asm sideeffect "int $$0x29", "{cx}"(i32 42) #14
+  unreachable
+}
+
+attributes #0 = { minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: write) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) }
+attributes #2 = { minsize nounwind optsize "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #3 = { minsize noreturn nounwind optsize memory(inaccessiblemem: write) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #4 = { cold noreturn nounwind memory(inaccessiblemem: write) }
+attributes #5 = { minsize noreturn nounwind optsize "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #6 = { minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(none) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #7 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #8 = { minsize mustprogress nofree norecurse nosync nounwind optsize willreturn memory(argmem: write) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #9 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #10 = { minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #11 = { minsize mustprogress nofree norecurse nounwind optsize willreturn memory(argmem: readwrite, inaccessiblemem: readwrite) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #12 = { minsize nofree norecurse nounwind optsize memory(argmem: readwrite, inaccessiblemem: readwrite) "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }
+attributes #13 = { nounwind }
+attributes #14 = { noreturn nounwind }
+
+!llvm.module.flags = !{!0, !1, !2}
+!llvm.ident = !{!3}
+
+!0 = !{i32 1, !"NumRegisterParameters", i32 0}
+!1 = !{i32 1, !"wchar_size", i32 2}
+!2 = !{i32 1, !"MaxTLSAlign", i32 65536}
+!3 = !{!"clang version 19.0.0git (git at github.com:tltao/llvm-project.git ea72c082bc29fdceca33f37477b7588f31630a5f)"}
+!4 = !{i64 109437, i64 109490, i64 109527}
+!5 = !{i64 111630}
+!6 = !{i64 110789, i64 110842, i64 110879}
+!7 = !{i64 111264}
+!8 = !{i64 110112, i64 110165, i64 110206}
+!9 = !{i32 0, i32 33}
+!10 = !{i64 168252}
+!11 = !{i64 168520}
+!12 = !{i64 169689}
+!13 = !{i64 170075}
diff --git a/clang/test/CodeGen/x86-asm-register-constraint-mix.c b/clang/test/CodeGen/x86-asm-register-constraint-mix.c
new file mode 100644
index 00000000000000..038a978349c9ac
--- /dev/null
+++ b/clang/test/CodeGen/x86-asm-register-constraint-mix.c
@@ -0,0 +1,62 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -O2 -emit-llvm %s -o - | FileCheck %s
+
+unsigned long foo(unsigned long addr, unsigned long a0,
+                  unsigned long a1, unsigned long a2,
+                  unsigned long a3, unsigned long a4,
+                  unsigned long a5) {
+  register unsigned long result asm("rax");
+  register unsigned long addr1 asm("rax") = addr;
+  register unsigned long b0 asm("rdi") = a0;
+  register unsigned long b1 asm("rsi") = a1;
+  register unsigned long b2 asm("rdx") = a2;
+  register unsigned long b3 asm("rcx") = a3;
+  register unsigned long b4 asm("r8") = a4;
+  register unsigned long b5 asm("r9") = a5;
+
+  // CHECK: tail call i64 asm "call *$1", "={rax},{rax},{rdi},{rsi},{rdx},{rcx},{r8},{r9},{rax},~{dirflag},~{fpsr},~{flags}"(i64 %addr, i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64 undef)
+  asm("call *%1" 
+      : "+r" (result) 
+      : "r"(addr1), "r"(b0), "r"(b1), "r"(b2), "r"(b3), "r"(b4), "r"(b5));
+  return result;
+}
+
+unsigned long foo1(unsigned long addr, unsigned long a0,
+                  unsigned long a1, unsigned long a2,
+                  unsigned long a3, unsigned long a4,
+                  unsigned long a5) {
+  unsigned long result;
+  unsigned long addr1 = addr;
+  unsigned long b0 = a0;
+  unsigned long b1 = a1;
+  unsigned long b2 = a2;
+  unsigned long b3 = a3;
+  unsigned long b4 = a4;
+  unsigned long b5 = a5;
+
+  // CHECK: tail call i64 asm "call *$1", "={rax},{rax},{rdi},{rsi},{rdx},{rcx},{r8},{r9},{rax},~{dirflag},~{fpsr},~{flags}"(i64 %addr, i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64 undef)
+  asm("call *%1" 
+      : "+{rax}" (result) 
+      : "{rax}"(addr1), "{rdi}"(b0), "{rsi}"(b1), "{rdx}"(b2), "{rcx}"(b3), "{r8}"(b4), "{r9}"(b5));
+  return result;
+}
+
+unsigned long foo2(unsigned long addr, unsigned long a0,
+                  unsigned long a1, unsigned long a2,
+                  unsigned long a3, unsigned long a4,
+                  unsigned long a5) {
+  register unsigned long result asm("rax");
+  unsigned long addr1 = addr;
+  unsigned long b0 = a0;
+  register unsigned long b1 asm ("rsi") = a1;
+  unsigned long b2 = a2;
+  unsigned long b3 = a3;
+  register unsigned long b4 asm ("r8") = a4;
+  unsigned long b5 = a5;
+
+  // CHECK: tail call i64 asm "call *$1", "={rax},{rax},{rdi},{rsi},{rdx},{rcx},{r8},{r9},{rax},~{dirflag},~{fpsr},~{flags}"(i64 %addr, i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64 undef)
+  asm("call *%1" 
+      : "+r" (result) 
+      : "{rax}"(addr1), "{rdi}"(b0), "r"(b1), "{rdx}"(b2), "{rcx}"(b3), "r"(b4), "{r9}"(b5));
+  return result;
+}
diff --git a/clang/test/CodeGen/z-hard-register-inline-asm.c b/clang/test/CodeGen/z-hard-register-inline-asm.c
new file mode 100644
index 00000000000000..bf70af5765f64f
--- /dev/null
+++ b/clang/test/CodeGen/z-hard-register-inline-asm.c
@@ -0,0 +1,52 @@
+// RUN: %clang_cc1 -triple s390x-ibm-linux -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-COUNT %s
+// RUN: %clang_cc1 -triple s390x-ibm-zos -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-COUNT-2 %s
+
+void f1() {
+  int a, b;
+  register int c asm("r1");
+  register int d asm("r2");
+
+  // CHECK-COUNT: call i32 asm "lhi $0,5\0A", "={r1}"
+  // CHECK-COUNT-2: call i32 asm "lhi $0,5\0A", "={r1}"
+  __asm("lhi %0,5\n"
+        : "={r1}"(a)
+        :
+        :);
+  __asm("lhi %0,5\n"
+        : "=r"(c)
+        :
+        :);
+
+  // CHECK-COUNT: call i32 asm "lgr $0,$1\0A", "={r1},{r2}"
+  // CHECK-COUNT-2: call i32 asm "lgr $0,$1\0A", "={r1},{r2}"
+  __asm("lgr %0,%1\n"
+        : "={r1}"(a)
+        : "{r2}"(b)
+        :);
+  __asm("lgr %0,%1\n"
+        : "=r"(c)
+        : "r"(d)
+        :);
+
+  // CHECK-COUNT: call i32 asm "lgr $0,$1\0A", "={r1},{r2}"
+  // CHECK-COUNT-2: call i32 asm "lgr $0,$1\0A", "={r1},{r2}"
+  __asm("lgr %0,%1\n"
+        : "={%r1}"(a)
+        : "{%r2}"(b)
+        :);
+  __asm("lgr %0,%1\n"
+        : "={r1}"(a)
+        : "{%r2}"(b)
+        :);
+
+  // CHECK-COUNT: call i32 asm "lgr $0,$1\0A", "=&{r1},{r2}"
+  // CHECK-COUNT-2: call i32 asm "lgr $0,$1\0A", "=&{r1},{r2}"
+  __asm("lgr %0,%1\n"
+        : "=&{r1}"(a)
+        : "{%r2}"(b)
+        :);
+  __asm("lgr %0,%1\n"
+        : "=&r"(c)
+        : "r"(d)
+        :);
+}
diff --git a/clang/test/Sema/z-hard-register-inline-asm.c b/clang/test/Sema/z-hard-register-inline-asm.c
new file mode 100644
index 00000000000000..7a7b821d967f62
--- /dev/null
+++ b/clang/test/Sema/z-hard-register-inline-asm.c
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 %s -triple s390x-ibm-linux -fsyntax-only -verify
+// RUN: %clang_cc1 %s -triple s390x-ibm-zos -fsyntax-only -verify
+
+void f1() {
+  int a, b;
+  register int c asm("r2");
+  __asm("lhi %0,5\n"
+        : "={r2}"(a)
+        :);
+
+  __asm("lgr %0,%1\n"
+        : "={r2}"(a)
+        : "{r1}"(b));
+
+  __asm("lgr %0,%1\n"
+        : "={r2}"(a)
+        : "{%r1}"(b));
+
+  __asm("lgr %0,%1\n"
+        : "=&{r1}"(a)
+        : "{r2}"(b));
+
+  __asm("lhi %0,5\n"
+        : "={r2"(a) // expected-error {{invalid output constraint '={r2' in asm}}
+        :);
+
+  __asm("lhi %0,5\n"
+        : "={r17}"(a) // expected-error {{invalid output constraint '={r17}' in asm}}
+        :);
+
+  __asm("lhi %0,5\n"
+        : "={}"(a) // expected-error {{invalid output constraint '={}' in asm}}
+        :);
+
+  __asm("lhi %0,5\n"
+        : "=&{r2"(a) // expected-error {{invalid output constraint '=&{r2' in asm}}
+        :);
+
+  __asm("lgr %0,%1\n"
+        : "=r"(a)
+        : "{r1"(b)); // expected-error {{invalid input constraint '{r1' in asm}}
+
+  __asm("lgr %0,%1\n"
+        : "=r"(a)
+        : "{}"(b)); // expected-error {{invalid input constraint '{}' in asm}}
+
+  __asm("lgr %0,%1\n"
+        : "={r1}"(a)
+        : "{r17}"(b)); // expected-error {{invalid input constraint '{r17}' in asm}}
+}
diff --git a/llvm/test/CodeGen/SystemZ/zos-inline-asm.ll b/llvm/test/CodeGen/SystemZ/zos-inline-asm.ll
new file mode 100644
index 00000000000000..d5ad2494961fe7
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/zos-inline-asm.ll
@@ -0,0 +1,250 @@
+; RUN: llc -mtriple s390x-ibm-zos < %s | FileCheck %s
+; Source to generate .ll file
+;
+; void f1() {
+;   int a, b;
+;   __asm(" lhi %0,5\n"
+;         : "={r1}"(a)
+;         :
+;         :);
+;
+;   __asm(" lgr %0,%1\n"
+;         : "={r1}"(a)
+;         : "{r2}"(b)
+;         :);
+; }
+;
+; void f2() {
+;   int a, m_b;
+;   __asm(" stg %1,%0\n"
+;         : "=m"(m_b)
+;         : "{r1}"(a)
+;         :);
+; }
+;
+; void f3() {
+;   int r15, r1;
+;
+;   __asm(" svc 109\n"
+;         : "={r15}"(r15)
+;         : "{r1}"(r1), "{r15}"(25)
+;         :);
+; }
+;
+; void f4() {
+;   void *parm;
+;   long long rc, reason;
+;   char *code;
+;
+;   __asm(" pc 0(%3)"
+;         : "={r0}"(reason), "+{r1}"(parm), "={r15}"(rc)
+;         : "r"(code)
+;         :);
+; }
+;
+; void f5() {
+;
+;   int a;
+;   int b;
+;   int c;
+;
+;   __asm(" lhi %0,10\n"
+;         " ar %0,%0\n"
+;         : "=&r"(a)
+;         :
+;         :);
+;
+;   __asm(" lhi %0,10\n"
+;         " ar %0,%0\n"
+;         : "={&r2}"(b)
+;         :
+;         :);
+;
+;   __asm(" lhi %0,10\n"
+;         " ar %0,%0\n"
+;         : "={&r2}"(c)
+;         :
+;         :);
+; }
+;
+; void f7() {
+;   int a, b, res;
+;
+;   a = 2147483640;
+;   b = 10;
+;
+;   __asm(" alr %0,%1\n"
+;         " jo *-4\n"
+;         :"=r"(res)
+;         :"r"(a), "r"(b)
+;         :);
+; }
+;
+; int f8() {
+;
+;   int a, b, res;
+;   a = b = res = -1;
+;
+;   __asm(" lhi 1,5\n"
+;         :
+;         :
+;         : "r1");
+;
+;   __asm(" lgr 2,1\n"
+;         :
+;         :
+;         : "r2");
+;
+;   __asm(" stg 2,%0\n"
+;         :
+;         : "r"(res)
+;         :);
+;
+;  return res;
+; }
+
+define hidden void @f1() {
+; CHECK-LABEL: f1:
+; CHECK: *APP
+; CHECK-NEXT: lhi 1, 5
+; CHECK: *NO_APP
+; CHECK: *APP
+; CHECK-NEXT: lgr 1, 2
+; CHECK: *NO_APP
+entry:
+  %a = alloca i32, align 4
+  %b = alloca i32, align 4
+  %0 = call i32 asm " lhi $0,5\0A", "={r1}"()
+  store i32 %0, ptr %a, align 4
+  %1 = load i32, ptr %b, align 4
+  %2 = call i32 asm " lgr $0,$1\0A", "={r1},{r2}"(i32 %1)
+  store i32 %2, ptr %a, align 4
+  ret void
+}
+
+define hidden void @f2() {
+; CHECK-LABEL: f2:
+; CHECK: *APP
+; CHECK-NEXT: stg 1, {{.*}}(4)
+; CHECK: *NO_APP
+entry:
+  %a = alloca i32, align 4
+  %m_b = alloca i32, align 4
+  %0 = load i32, ptr %a, align 4
+  call void asm " stg $1,$0\0A", "=*m,{r1}"(ptr elementtype(i32) %m_b, i32 %0)
+  ret void
+}
+
+define hidden void @f3() {
+; CHECK-LABEL: f3:
+; CHECK: l 1, {{.*}}(4)
+; CHECK: lhi 15, 25
+; CHECK: *APP
+; CHECK-NEXT: svc 109
+; CHECK: *NO_APP
+entry:
+  %r15 = alloca i32, align 4
+  %r1 = alloca i32, align 4
+  %0 = load i32, ptr %r1, align 4
+  %1 = call i32 asm " svc 109\0A", "={r15},{r1},{r15}"(i32 %0, i32 25)
+  store i32 %1, ptr %r15, align 4
+  ret void
+}
+
+define hidden void @f4() {
+; CHECK-LABEL: f4:
+; CHECK: *APP
+; CHECK-NEXT: pc 0
+; CHECK: *NO_APP
+; CHECK: stg 0, {{.*}}(4)
+; CHECK-NEXT: stg 1, {{.*}}(4)
+; CHECK-NEXT: stg 15, {{.*}}(4)
+entry:
+  %parm = alloca ptr, align 8
+  %rc = alloca i64, align 8
+  %reason = alloca i64, align 8
+  %code = alloca ptr, align 8
+  %0 = load ptr, ptr %parm, align 8
+  %1 = load ptr, ptr %code, align 8
+  %2 = call { i64, ptr, i64 } asm " pc 0($3)", "={r0},={r1},={r15},r,1"(ptr %1, ptr %0)
+  %asmresult = extractvalue { i64, ptr, i64 } %2, 0
+  %asmresult1 = extractvalue { i64, ptr, i64 } %2, 1
+  %asmresult2 = extractvalue { i64, ptr, i64 } %2, 2
+  store i64 %asmresult, ptr %reason, align 8
+  store ptr %asmresult1, ptr %parm, align 8
+  store i64 %asmresult2, ptr %rc, align 8
+  ret void
+}
+
+define hidden void @f5() {
+; CHECK-LABEL: f5:
+; CHECK: *APP
+; CHECK-NEXT: lhi {{[0-9]}}, 10
+; CHECK-NEXT: ar {{[0-9]}}, {{[0-9]}}
+; CHECK: *NO_APP
+; CHECK: *APP
+; CHECK-NEXT: lhi 2, 10
+; CHECK-NEXT: ar 2, 2
+; CHECK: *NO_APP
+; CHECK: *APP
+; CHECK-NEXT: lhi 2, 10
+; CHECK-NEXT: ar 2, 2
+; CHECK: *NO_APP
+entry:
+  %a = alloca i32, align 4
+  %b = alloca i32, align 4
+  %c = alloca i32, align 4
+  %0 = call i32 asm " lhi $0,10\0A ar $0,$0\0A", "=&r"()
+  store i32 %0, ptr %a, align 4
+  %1 = call i32 asm " lhi $0,10\0A ar $0,$0\0A", "=&{r2}"()
+  store i32 %1, ptr %b, align 4
+  %2 = call i32 asm " lhi $0,10\0A ar $0,$0\0A", "=&{r2}"()
+  store i32 %2, ptr %c, align 4
+  ret void
+}
+
+define hidden void @f7() {
+; CHECK-LABEL: f7:
+; CHECK: *APP
+; CHECK-NEXT: alr {{[0-9]}}, {{[0-9]}}
+; CHECK-NEXT: {{.*}}:
+; CHECK-NEXT: jo {{.*}}-4
+; CHECK: *NO_APP
+entry:
+  %a = alloca i32, align 4
+  %b = alloca i32, align 4
+  %res = alloca i32, align 4
+  store i32 2147483640, ptr %a, align 4
+  store i32 10, ptr %b, align 4
+  %0 = load i32, ptr %a, align 4
+  %1 = load i32, ptr %b, align 4
+  %2 = call i32 asm " alr $0,$1\0A jo *-4\0A", "=r,r,r"(i32 %0, i32 %1)
+  store i32 %2, ptr %res, align 4
+  ret void
+}
+
+define hidden signext i32 @f8() {
+; CHECK-LABEL: f8:
+; CHECK: *APP
+; CHECK-NEXT: lhi 1, 5
+; CHECK: *NO_APP
+; CHECK: *APP
+; CHECK-NEXT: lgr 2, 1
+; CHECK: *NO_APP
+; CHECK: *APP
+; CHECK-NEXT: stg 2, {{.*}}(4)
+; CHECK: *NO_APP
+; CHECK: lgf 3, {{.*}}(4)
+entry:
+  %a = alloca i32, align 4
+  %b = alloca i32, align 4
+  %res = alloca i32, align 4
+  store i32 -1, ptr %res, align 4
+  store i32 -1, ptr %b, align 4
+  store i32 -1, ptr %a, align 4
+  call void asm sideeffect " lhi 1,5\0A", "~{r1}"()
+  call void asm sideeffect " lgr 2,1\0A", "~{r2}"()
+  call void asm " stg 2,$0\0A", "=*m"(ptr elementtype(i32) %res)
+  %0 = load i32, ptr %res, align 4
+  ret i32 %0
+}


