[clang] [llvm] [inlineasm] Improve support for multiple inline asm constraints (e.g. "rm") (PR #197798)

Bill Wendling via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 9 01:30:11 PDT 2026


https://github.com/isanbard updated https://github.com/llvm/llvm-project/pull/197798

From e648d01466a924cf658efc2ecb33855cc1ba7110 Mon Sep 17 00:00:00 2001
From: Bill Wendling <morbo at google.com>
Date: Wed, 22 Apr 2026 20:01:52 -0700
Subject: [PATCH 1/8] [inlineasm] Improve support for multiple inline asm
 constraints (e.g. "rm")

Historically, LLVM preferred the most conservative constraint when
multiple options were provided (e.g., "rm" would default to "m"). While
this ensured successful compilation under high register pressure, it
often led to suboptimal code generation, such as unnecessary spills to
the stack even when registers were available.

The register allocators have matured and are now able to fold registers
in such instances. There is no longer a need to restrict ourselves to
the most conservative option, except for the fast register allocator.
To that end, we stop the front-end from preferring the conservative
constraint (unless compiling at '-O0', where we don't care about code
generation quality).

However, simply preferring the least restrictive option doesn't work in
all situations. We could have a situation where the SelectionDAG isn't
able to satisfy the "r" constraint and so we would like it to consider
other constraint options if they exist.

This patch introduces a multi-stage approach to allow LLVM to prefer
registers while maintaining a robust fallback to memory:

1. Clang Front-end: When presented with "rm" constraints, Clang now
   emits a dual-path IR structure using a new 'llvm.asm.constraint.br'
   intrinsic inside a 'callbr'. One path targets the register-preferring
   form, while the other uses the conservative memory form (e.g.,
   "=*rm").

2. InlineAsmPrepare Pass: This updated pass resolves the
   'llvm.asm.constraint.br' intrinsic early in the backend pipeline. It
   prunes the IR based on the optimization level: at -O0 it selects the
   memory-preferring path, while at -O1+ it selects the
   register-preferring path.

3. SelectionDAG Iterative Selection: SelectionDAGBuilder now employs an
   iterative approach to constraint resolution. If the chosen constraint
   (e.g., "r") cannot be satisfied during the DAG building phase (e.g.,
   due to reserved registers), it can now backtrack and attempt the next
   available option (e.g., "m").
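
As a rough illustration of step 3, the fallback can be modelled outside
of LLVM as a loop over an operand's constraint codes. This is a minimal
standalone sketch, not the SelectionDAGBuilder code: 'Operand',
'canSatisfy', and 'selectConstraint' are hypothetical names, and the
free-register counter is an assumed stand-in for the real target
queries.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one inline asm operand: its constraint codes in
// preference order (e.g. {"r", "m"}).
struct Operand {
  std::vector<std::string> Codes;
};

// Assumed stand-in for the target's feasibility check. "r" succeeds only
// while free registers remain; "m" always succeeds; anything else fails.
inline bool canSatisfy(const std::string &Code, unsigned &FreeRegs) {
  if (Code == "r") {
    if (FreeRegs == 0)
      return false;
    --FreeRegs;
    return true;
  }
  return Code == "m";
}

// Try each code in order and return the index of the first one that can be
// satisfied, or std::nullopt if every option fails.
inline std::optional<std::size_t> selectConstraint(const Operand &Op,
                                                   unsigned &FreeRegs) {
  assert(!Op.Codes.empty() && "operand needs at least one constraint code");
  for (std::size_t I = 0; I < Op.Codes.size(); ++I)
    if (canSatisfy(Op.Codes[I], FreeRegs))
      return I;
  return std::nullopt;
}
```

With one free register, a first "rm" operand would pick "r" in this
model and a second one would fall back to "m".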

In the most extreme case, every constraint option of every operand is
tried, making the complexity O(M * N), where M is the number of operands
and N is the maximum number of constraint options per operand. This is
worse than the original O(M) complexity. However, we emphasize that
O(M * N) is the *worst* case scenario.
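
To make the bound concrete, the sketch below counts feasibility checks
under a simplified model in which each operand's options are tried in
order and the search stops at the first success. 'countAttempts' is a
hypothetical helper written for this note and is not part of the patch.

```cpp
#include <cstddef>
#include <vector>

// Each inner vector holds, per constraint option of one operand, whether
// that option can be satisfied. Returns the total number of checks made
// when options are tried in order and each operand stops at its first
// success.
inline std::size_t
countAttempts(const std::vector<std::vector<bool>> &Operands) {
  std::size_t Attempts = 0;
  for (const auto &Options : Operands)
    for (bool Ok : Options) {
      ++Attempts;
      if (Ok)
        break;
    }
  return Attempts;
}
```

In this model, M operands whose first option always succeeds cost M
checks, while M operands that only succeed on their N-th option cost
M * N checks.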

Example:
    void write(unsigned long flags) {
        asm("push %0 ; popf" : : "rm" (flags));
    }

Old codegen:
    movq    %rdi, -8(%rsp)
    pushq   -8(%rsp)
    popfq

New codegen:
    pushq   %rdi
    popfq

Fixes: #20571
---
 clang/lib/CodeGen/CGStmt.cpp                  |  65 +-
 clang/lib/CodeGen/CodeGenFunction.h           |  11 +-
 clang/test/CodeGen/asm-reg-mem-constraints.c  | 124 +++
 clang/test/CodeGen/asm.c                      |  25 -
 llvm/docs/LangRef.rst                         |  51 ++
 llvm/include/llvm/CodeGen/InlineAsmPrepare.h  |  12 +-
 llvm/include/llvm/CodeGen/Passes.h            |   1 +
 llvm/include/llvm/CodeGen/TargetLowering.h    |  16 +
 llvm/include/llvm/IR/InlineAsm.h              |   7 +
 llvm/include/llvm/IR/Intrinsics.td            |  13 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   3 +-
 llvm/lib/CodeGen/InlineAsmPrepare.cpp         | 132 ++-
 .../SelectionDAG/SelectionDAGBuilder.cpp      | 182 ++--
 .../SelectionDAG/SelectionDAGBuilder.h        |   5 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  90 +-
 llvm/lib/IR/Verifier.cpp                      |  26 +-
 llvm/lib/Passes/PassRegistry.def              |   2 +-
 llvm/test/CodeGen/AArch64/O0-pipeline.ll      |   1 +
 llvm/test/CodeGen/AArch64/O3-pipeline.ll      |   1 +
 .../CodeGen/AArch64/inline-asm-prepare.ll     |  60 +-
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll      |   5 +
 llvm/test/CodeGen/ARM/O3-pipeline.ll          |   1 +
 llvm/test/CodeGen/LoongArch/O0-pipeline.ll    |   1 +
 llvm/test/CodeGen/LoongArch/opt-pipeline.ll   |   1 +
 llvm/test/CodeGen/PowerPC/O0-pipeline.ll      |   1 +
 llvm/test/CodeGen/PowerPC/O3-pipeline.ll      |   1 +
 llvm/test/CodeGen/RISCV/O0-pipeline.ll        |   1 +
 llvm/test/CodeGen/RISCV/O3-pipeline.ll        |   1 +
 llvm/test/CodeGen/SPIRV/llc-pipeline.ll       |   2 +
 llvm/test/CodeGen/X86/O0-pipeline.ll          |   1 +
 .../CodeGen/X86/asm-constraints-torture.ll    | 787 ++++++++++++++++++
 llvm/test/CodeGen/X86/asm-modifier.ll         |  36 +-
 llvm/test/CodeGen/X86/inline-asm-callbase.ll  |  75 ++
 llvm/test/CodeGen/X86/inline-asm-rm.ll        | 246 ++++++
 llvm/test/CodeGen/X86/inlineasm-sched-bug.ll  |   5 +-
 llvm/test/CodeGen/X86/opt-pipeline.ll         |   1 +
 36 files changed, 1775 insertions(+), 217 deletions(-)
 create mode 100644 clang/test/CodeGen/asm-reg-mem-constraints.c
 create mode 100644 llvm/test/CodeGen/X86/asm-constraints-torture.ll
 create mode 100644 llvm/test/CodeGen/X86/inline-asm-callbase.ll
 create mode 100644 llvm/test/CodeGen/X86/inline-asm-rm.ll

diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 71f88cdf58954..0c011cc4508db 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -2859,13 +2859,21 @@ void CodeGenFunction::HandleOutputConstraints(const AsmStmt &S,
     if (!AsmInfo.Constraints.empty())
       AsmInfo.Constraints += ',';
 
-    // If this is a register output, then make the inline asm return it
-    // by-value.  If this is a memory result, return the value by-reference.
+    // - If this is a register output, then make the inline asm return it
+    //   by-value.
+    // - If this is a memory output, return the value by reference.
+    // - If this is a register and memory output, treat it like a register
+    //   output at -O[1-3]. This allows the optimizing register allocators to
+    //   choose a register, while the fast register allocator defaults to
+    //   memory.
     QualType QTy = OutExpr->getType();
     const bool IsScalarOrAggregate =
         hasScalarEvaluationKind(QTy) || hasAggregateEvaluationKind(QTy);
+    const bool RegisterMemoryConstraints =
+        AsmInfo.PreferRegs && Info.allowsRegister() && Info.allowsMemory();
 
-    if (!Info.allowsMemory() && IsScalarOrAggregate) {
+    if (IsScalarOrAggregate &&
+        (!Info.allowsMemory() || RegisterMemoryConstraints)) {
       AsmInfo.Constraints += "=" + OutputConstraint;
       AsmInfo.ResultRegQualTys.push_back(QTy);
       AsmInfo.ResultRegDests.push_back(Dest);
@@ -3177,11 +3185,13 @@ bool CodeGenFunction::HandleClobbers(const AsmStmt &S,
 void CodeGenFunction::EmitAsmStmt(
     const AsmStmt &S,
     SmallVectorImpl<TargetInfo::ConstraintInfo> &OutputConstraintInfos,
-    SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos) {
+    SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos,
+    bool PreferRegs) {
   // Assemble the final asm string.
   std::string AsmString = S.generateAsmString(getContext());
 
-  AsmConstraintsInfo AsmInfo(OutputConstraintInfos, InputConstraintInfos);
+  AsmConstraintsInfo AsmInfo(OutputConstraintInfos, InputConstraintInfos,
+                             PreferRegs);
 
   // Handle output constraints.
   HandleOutputConstraints(S, AsmInfo);
@@ -3309,7 +3319,50 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt &S) {
                                     InputConstraintInfos))
     return EmitHipStdParUnsupportedAsm(this, S);
 
-  EmitAsmStmt(S, OutputConstraintInfos, InputConstraintInfos);
+  // If any constraints allow for register and memory options, we
+  // need to delay choosing which constraint option to prefer (register or
+  // memory) until the InlineAsmPrepare pass, where the
+  // 'llvm.asm.constraint.br' intrinsic is resolved.
+  bool HasRegMemConstraints =
+      llvm::all_of(llvm::concat<TargetInfo::ConstraintInfo>(
+                       OutputConstraintInfos, InputConstraintInfos),
+                   [](const TargetInfo::ConstraintInfo &Info) {
+                     // FIXME: Should we allow for alternative constraints?
+                     return !StringRef(Info.getConstraintStr()).contains(",");
+                   }) &&
+      llvm::any_of(llvm::concat<TargetInfo::ConstraintInfo>(
+                       OutputConstraintInfos, InputConstraintInfos),
+                   [](const TargetInfo::ConstraintInfo &Info) {
+                     return Info.allowsRegister() && Info.allowsMemory();
+                   });
+
+  llvm::BasicBlock *PrefRegBlock = nullptr;
+  llvm::BasicBlock *PrefMemBlock = nullptr;
+
+  if (HasRegMemConstraints) {
+    PrefRegBlock = createBasicBlock("asm.pref.reg");
+    PrefMemBlock = createBasicBlock("asm.pref.mem");
+    CurFn->insert(CurFn->end(), PrefRegBlock);
+    CurFn->insert(CurFn->end(), PrefMemBlock);
+
+    Builder.CreateCallBr(CGM.getIntrinsic(llvm::Intrinsic::asm_constraint_br),
+                         PrefRegBlock, PrefMemBlock);
+    Builder.SetInsertPoint(PrefMemBlock);
+  }
+
+  EmitAsmStmt(S, OutputConstraintInfos, InputConstraintInfos, false);
+
+  if (HasRegMemConstraints) {
+    llvm::BasicBlock *MergeBlock = createBasicBlock("asm.merge");
+    CurFn->insert(CurFn->end(), MergeBlock);
+    Builder.CreateBr(MergeBlock);
+
+    Builder.SetInsertPoint(PrefRegBlock);
+    EmitAsmStmt(S, OutputConstraintInfos, InputConstraintInfos, true);
+    Builder.CreateBr(MergeBlock);
+
+    Builder.SetInsertPoint(MergeBlock);
+  }
 }
 
 LValue CodeGenFunction::InitCapturedStruct(const CapturedStmt &S) {
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 77ca3e0fee84f..a7c5b7fbbaa4c 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -5548,17 +5548,22 @@ class CodeGenFunction : public CodeGenTypeCache {
     bool ReadOnly = true;
     bool ReadNone = true;
 
+    // Prefer using registers over memory.
+    bool PreferRegs = false;
+
     AsmConstraintsInfo(
         SmallVectorImpl<TargetInfo::ConstraintInfo> &OutputConstraintInfos,
-        SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos)
+        SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos,
+        bool PreferRegs)
         : OutputConstraintInfos(OutputConstraintInfos),
-          InputConstraintInfos(InputConstraintInfos) {}
+          InputConstraintInfos(InputConstraintInfos), PreferRegs(PreferRegs) {}
   };
 
   void EmitAsmStmt(
       const AsmStmt &S,
       SmallVectorImpl<TargetInfo::ConstraintInfo> &OutputConstraintInfos,
-      SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos);
+      SmallVectorImpl<TargetInfo::ConstraintInfo> &InputConstraintInfos,
+      bool PreferRegs);
   void EmitAsmStores(const AsmStmt &S,
                      const llvm::ArrayRef<llvm::Value *> RegResults,
                      const AsmConstraintsInfo &AsmInfo);
diff --git a/clang/test/CodeGen/asm-reg-mem-constraints.c b/clang/test/CodeGen/asm-reg-mem-constraints.c
new file mode 100644
index 0000000000000..a89ef0456f0a6
--- /dev/null
+++ b/clang/test/CodeGen/asm-reg-mem-constraints.c
@@ -0,0 +1,124 @@
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -O2 %s -o - | FileCheck %s
+
+void test_reg_mem_inputs(unsigned long flags) {
+  // CHECK-LABEL: @test_reg_mem_inputs
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:     tail call void asm sideeffect "", "rm,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
+  // CHECK-NEXT:     br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:     tail call void asm sideeffect "", "rm,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
+  // CHECK-NEXT:     br label %asm.merge
+  asm ("" : : "rm" (flags));
+}
+
+unsigned long test_reg_mem_outputs(void) {
+  // CHECK-LABEL: @test_reg_mem_outputs
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    = tail call i32 asm "", "=rm,~{dirflag},~{fpsr},~{flags}"()
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    call void asm "", "=*rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
+  // CHECK:         = load i32, ptr %out
+  // CHECK-NEXT:    br label %asm.merge
+  unsigned long out;
+  asm ("" : "=rm" (out));
+  return out;
+}
+
+void test_g_inputs(unsigned long flags) {
+  // CHECK-LABEL: @test_g_inputs
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    tail call void asm sideeffect "", "imr,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    tail call void asm sideeffect "", "imr,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
+  // CHECK-NEXT:    br label %asm.merge
+  asm ("" : : "g" (flags));
+}
+
+unsigned long test_g_outputs(void) {
+  // CHECK-LABEL: @test_g_outputs
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    %0 = tail call i32 asm "", "=imr,~{dirflag},~{fpsr},~{flags}"()
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    call void asm "", "=*imr,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
+  // CHECK-NEXT:    = load i32, ptr %out
+  // CHECK-NEXT:    br label %asm.merge
+  unsigned long out;
+  asm ("" : "=g" (out));
+  return out;
+}
+
+void test_reg_mem_earlyclobber(int len) {
+  // CHECK-LABEL: @test_reg_mem_earlyclobber
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    = tail call i32 asm sideeffect "", "=&rm,0,~{dirflag},~{fpsr},~{flags}"(i32 %len)
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    call void asm sideeffect "", "=*&rm,0,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %len.addr, i32 %len)
+  // CHECK-NEXT:    br label %asm.merge
+  __asm__ volatile ("" : "+&&rm" (len));
+}
+
+void test_reg_mem_commutative(int len) {
+  // CHECK-LABEL: @test_reg_mem_commutative
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    = tail call { i32, i32 } asm sideeffect "", "=%rm,=rm,0,1,~{dirflag},~{fpsr},~{flags}"(i32 %len, i32 %len)
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    call void asm sideeffect "", "=*%rm,=*rm,0,1,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %len.addr, ptr nonnull elementtype(i32) %len.addr, i32 %len, i32 %len)
+  // CHECK-NEXT:    br label %asm.merge
+  __asm__ volatile ("" : "+%%rm" (len), "+rm" (len));
+}
+
+unsigned long test_asm_goto(void) {
+  // CHECK-LABEL: @test_asm_goto
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:         to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    = callbr i32 asm "", "=rm,!i,~{dirflag},~{fpsr},~{flags}"()
+  // CHECK-NEXT:         to label %cleanup [label %indirect.split]
+  // CHECK:       asm.pref.mem:
+  // CHECK-NEXT:    callbr void asm "", "=*rm,!i,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
+  // CHECK-NEXT:         to label %asm.pref.mem.asm.merge_crit_edge [label %cleanup]
+  // CHECK:       asm.pref.mem.asm.merge_crit_edge:
+  // CHECK-NEXT:    = load i32, ptr %out, align 4, !tbaa !8
+  // CHECK-NEXT:    br label %cleanup
+  // CHECK:       indirect.split:
+  // CHECK-NEXT:    br label %cleanup
+  unsigned long out;
+  asm goto ("" : "=rm" (out) ::: indirect);
+  return out;
+
+indirect:
+  return 42;
+}
+
+// PR3908
+void test_pr3908(int r) {
+  // CHECK-LABEL: @test_pr3908
+  // CHECK:         callbr void @llvm.asm.constraint.br()
+  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
+  // CHECK:       asm.pref.reg:
+  // CHECK-NEXT:    = tail call i32 asm "# PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"(i32 0, i32 0, double 0.000000e+00, i32 %r)
+  // CHECK-NEXT:    br label %asm.merge
+  // CHECK:       asm.pref.mem:                                     ; preds = %entry
+  // CHECK-NEXT:    = tail call i32 asm "# PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"(i32 0, i32 0, double 0.000000e+00, i32 %r)
+  // CHECK-NEXT:    br label %asm.merge
+  __asm__ ("# PR3908 %[lf] %[xx] %[li] %[r]"
+           : [r] "+r" (r)
+           : [lf] "mx" (0), [li] "mr" (0), [xx] "x" ((double)(0)));
+}
diff --git a/clang/test/CodeGen/asm.c b/clang/test/CodeGen/asm.c
index d7465b22fbbf6..4c1cd2d493136 100644
--- a/clang/test/CodeGen/asm.c
+++ b/clang/test/CodeGen/asm.c
@@ -306,28 +306,3 @@ void t31(void) {
   // CHECK:         call void asm sideeffect "T31 CC NAMED MODIFIER: ${0:c}", "i,~{dirflag},~{fpsr},~{flags}"
   __asm__ volatile ("T31 CC NAMED MODIFIER: %cc[input]" : : [input] "i"  (4));
 }
-
-// TODO: Move the "rm" tests into a new testcase file once work to better
-// support "rm" constraints is done.
-
-void t32(int len) {
-  // CHECK-LABEL: @t32
-  // CHECK:         call void asm sideeffect "", "=*&rm,0,~{dirflag},~{fpsr},~{flags}"
-  __asm__ volatile ("" : "+&&rm" (len));
-}
-
-void t33(int len) {
-  // CHECK-LABEL: @t33
-  // CHECK:         call void asm sideeffect "", "=*%rm,=*rm,0,1,~{dirflag},~{fpsr},~{flags}"
-  __asm__ volatile ("" : "+%%rm" (len), "+rm" (len));
-}
-
-// PR3908
-void t34(int r) {
-  // CHECK-LABEL: @t34
-  // CHECK:         call i32 asm "PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"
-  // CHECK-SAME:      (i32 0, i32 0, double 0.000000e+00, i32 %{{.*}})
-  __asm__ ("PR3908 %[lf] %[xx] %[li] %[r]"
-           : [r] "+r" (r)
-           : [lf] "mx" (0), [li] "mr" (0), [xx] "x" ((double)(0)));
-}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ed20a25cf06c8..b3f4dbca95920 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -16301,6 +16301,57 @@ Example:
         call void @llvm.call.preallocated.teardown(token %cs)
         ret void
 
+.. _int_asm_constraint_br:
+
+'``llvm.asm.constraint.br``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.asm.constraint.br()
+
+Overview:
+"""""""""
+
+The '``llvm.asm.constraint.br``' intrinsic is used when an inline asm
+constraint allows for either register or memory, e.g., '``"rm"``'.
+
+Semantics:
+""""""""""
+
+The '``llvm.asm.constraint.br``' allows the back-end to choose the best
+constraint rather than restricting the preferred constraint to one that may
+produce substandard code or cannot be handled by the register allocators.
+
+
+It can be called only by the '``callbr``' instruction. The default destination
+of the ``callbr`` contains a call to the preferred inline asm, while the single
+indirect destination contains a call to the pessimal inline asm.
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+      %out = alloca i64, align 8
+      callbr void @llvm.asm.constraint.br()
+              to label %asm.pref.reg [label %asm.pref.mem]
+
+    asm.pref.reg:
+      %0 = tail call i64 asm sideeffect "pushf ; pop $0", "=rm,~{dirflag},~{fpsr},~{flags}"()
+      br label %asm.merge
+
+    asm.pref.mem:
+      call void asm sideeffect "pushf ; pop $0", "=*rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i64) %out)
+      %.pre = load i64, ptr %out, align 8, !tbaa !9
+      br label %asm.merge
+
+    asm.merge:
+      %1 = phi i64 [ %0, %asm.pref.reg ], [ %.pre, %asm.pref.mem ]
+
 Standard C/C++ Library Intrinsics
 ---------------------------------
 
diff --git a/llvm/include/llvm/CodeGen/InlineAsmPrepare.h b/llvm/include/llvm/CodeGen/InlineAsmPrepare.h
index fe52cf54259d1..187e7a8130d37 100644
--- a/llvm/include/llvm/CodeGen/InlineAsmPrepare.h
+++ b/llvm/include/llvm/CodeGen/InlineAsmPrepare.h
@@ -13,10 +13,16 @@
 
 namespace llvm {
 
-class InlineAsmPreparePass
-    : public RequiredPassInfoMixin<InlineAsmPreparePass> {
+class TargetMachine;
+
+class InlineAsmPreparePass : public PassInfoMixin<InlineAsmPreparePass> {
+  const TargetMachine *TM;
+
 public:
-  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM);
+  explicit InlineAsmPreparePass(const TargetMachine &TM) : TM(&TM) {}
+  LLVM_ABI PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM);
+
+  static bool isRequired() { return true; }
 };
 
 } // namespace llvm
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index be344c6aaf40e..38ce7ecc8e465 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -622,6 +622,7 @@ LLVM_ABI ModulePass *createJMCInstrumenterPass();
 /// This pass converts conditional moves to conditional jumps when profitable.
 LLVM_ABI FunctionPass *createSelectOptimizePass();
 
+/// Process inline assembly calls to prepare for code generation.
 LLVM_ABI FunctionPass *createInlineAsmPreparePass();
 
 /// Creates Windows Secure Hot Patch pass. \see WindowsSecureHotPatching.cpp
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 318763113fb42..866722d0c239e 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5301,6 +5301,17 @@ class LLVM_ABI TargetLowering : public TargetLoweringBase {
     /// The ValueType for the operand value.
     MVT ConstraintVT = MVT::Other;
 
+    /// The register may be folded. This is used if the constraint has register
+    /// and memory constraints where we prefer using a register, but can fall
+    /// back to a memory slot under register pressure.
+    bool MayFoldRegister = false;
+
+    /// The index to the last matched constraint code.
+    long ConstraintIndex = -1;
+
+    /// The constraint was successfully assigned to the operand.
+    bool Finalized = false;
+
     /// Copy constructor for copying from a ConstraintInfo.
     AsmOperandInfo(InlineAsm::ConstraintInfo Info)
         : InlineAsm::ConstraintInfo(std::move(Info)) {}
@@ -5312,6 +5323,11 @@ class LLVM_ABI TargetLowering : public TargetLoweringBase {
     /// If this is an input matching constraint, this method returns the output
     /// operand it matches.
     LLVM_ABI unsigned getMatchedOperand() const;
+
+    /// Return true if there are no more constraints to try.
+    bool atFinalConstraint() const {
+      return ConstraintIndex >= static_cast<long>(Codes.size() - 1);
+    }
   };
 
   using AsmOperandInfoVector = std::vector<AsmOperandInfo>;
diff --git a/llvm/include/llvm/IR/InlineAsm.h b/llvm/include/llvm/IR/InlineAsm.h
index 564f2e7df2dd3..bab538e852467 100644
--- a/llvm/include/llvm/IR/InlineAsm.h
+++ b/llvm/include/llvm/IR/InlineAsm.h
@@ -181,6 +181,13 @@ class InlineAsm final : public Value {
     bool hasArg() const {
       return Type == isInput || (Type == isOutput && isIndirect);
     }
+
+    /// hasRegMemConstraints - Returns true if the constraint codes have
+    /// register and memory constraints. This is useful to let the register
+    /// allocator know that it can use memory under register pressure.
+    bool hasRegMemConstraints() const {
+      return is_contained(Codes, "r") && is_contained(Codes, "m");
+    }
   };
 
   /// ParseConstraints - Split up the constraint string into the specific
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 993ddd7e33701..d71f1383c2d4a 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2021,6 +2021,19 @@ def int_stepvector : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
 def int_reloc_none : DefaultAttrsIntrinsic<[], [llvm_metadata_ty],
   [IntrNoMem, IntrHasSideEffects]>;
 
+// Intrinsic to select which version of an inline asm with reg-mem constraints
+// to use. The intrinsic must be called by the 'callbr' instruction with only
+// the normal destination and one indirect destination, both of which have only
+// one parent. The destinations are as follows:
+//
+//   - normal dest: the version favoring registers over memory (for '-O[1-3]')
+//   - indirect dest: the version favoring memory over registers (for '-O0')
+//
+// Once code-gen picks which block to take, the other block and the 'callbr'
+// instruction are dead and should be DCE'd. The destination block should then
+// be merged to its parent block.
+def int_asm_constraint_br : DefaultAttrsIntrinsic<[], [], [IntrNoMem]>;
+
 //===---------------- Vector Predication Intrinsics --------------===//
 // Memory Intrinsics
 def int_vp_store : DefaultAttrsIntrinsic<[],
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 9641ac7313c69..6d0b995ca45cf 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -854,7 +854,8 @@ void CodeGenPassBuilder<Derived, TargetMachineT>::addISelPrepare(
   if (Opt.RequiresCodeGenSCCOrder && !AddInCGSCCOrder)
     requireCGSCCOrder(PMW);
 
-  addFunctionPass(InlineAsmPreparePass(), PMW);
+  addFunctionPass(InlineAsmPreparePass(TM), PMW);
+
   // Add both the safe stack and the stack protection passes: each of them will
   // only protect functions that have corresponding attributes.
   addFunctionPass(SafeStackPass(TM), PMW);
diff --git a/llvm/lib/CodeGen/InlineAsmPrepare.cpp b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
index 0bbf55c3d31e4..0f745c5b2e1be 100644
--- a/llvm/lib/CodeGen/InlineAsmPrepare.cpp
+++ b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
@@ -31,6 +31,14 @@
 //   as an IR pass.  (If support for callbr in GlobalISel is implemented, it’s
 //   worth considering whether this is still required.)
 //
+// llvm.asm.constraint.br:
+//
+//   Remove the "llvm.asm.constraint.br" call and opt to prefer either
+//   "registers" (on the callbr's default path) or "memory" (on the callbr's
+//   indirect path). We choose the latter only when compiling at '-O0', because
+//   the fast register allocator isn't equipped to fold registers if register
+//   pressure is too great.
+//
 //===----------------------------------------------------------------------===//
 
 #include "llvm/CodeGen/InlineAsmPrepare.h"
@@ -39,7 +47,9 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/Analysis/CFG.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
@@ -49,6 +59,7 @@
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
+#include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/SSAUpdater.h"
 
@@ -63,7 +74,8 @@ class InlineAsmPrepare : public FunctionPass {
   InlineAsmPrepare() : FunctionPass(ID) {}
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
-    AU.addPreserved<DominatorTreeWrapperPass>();
+    AU.addRequired<TargetPassConfig>();
+    AU.addRequired<DominatorTreeWrapperPass>();
   }
   bool runOnFunction(Function &F) override;
 
@@ -74,16 +86,21 @@ char InlineAsmPrepare::ID = 0;
 
 } // end anonymous namespace
 
-INITIALIZE_PASS_BEGIN(InlineAsmPrepare, "inline-asm-prepare",
-                      "Prepare inline asm insts", false, false)
+INITIALIZE_PASS_BEGIN(InlineAsmPrepare, DEBUG_TYPE, "Prepare inline asm insts",
+                      false, false)
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
-INITIALIZE_PASS_END(InlineAsmPrepare, "inline-asm-prepare",
-                    "Prepare inline asm insts", false, false)
+INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
+INITIALIZE_PASS_END(InlineAsmPrepare, DEBUG_TYPE, "Prepare inline asm insts",
+                    false, false)
 
 FunctionPass *llvm::createInlineAsmPreparePass() {
   return new InlineAsmPrepare();
 }
 
+//===----------------------------------------------------------------------===//
+//                      Process CallBr instructions
+//===----------------------------------------------------------------------===//
+
 #ifndef NDEBUG
 static void printDebugDomInfo(const DominatorTree &DT, const Use &U,
                               const BasicBlock *BB, bool IsDefaultDest) {
@@ -170,7 +187,7 @@ static bool splitCriticalEdges(CallBrInst *CBR, DominatorTree *DT) {
 /// have a location to place the intrinsic. Then remap users of the original
 /// callbr output SSA value to instead point to the appropriate
 /// llvm.callbr.landingpad value.
-static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree &DT) {
+static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree *DT) {
   bool Changed = false;
   SmallPtrSet<const BasicBlock *, 4> Visited;
   IRBuilder<> Builder(CBR->getContext());
@@ -191,7 +208,7 @@ static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree &DT) {
     CallInst *Intrinsic = Builder.CreateIntrinsic(
         CBR->getType(), Intrinsic::callbr_landingpad, {CBR});
     SSAUpdate.AddAvailableValue(IndDest, Intrinsic);
-    updateSSA(DT, CBR, Intrinsic, SSAUpdate);
+    updateSSA(*DT, CBR, Intrinsic, SSAUpdate);
     Changed = true;
   }
 
@@ -202,35 +219,81 @@ static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree *DT) {
   bool Changed = false;
 
   Changed |= splitCriticalEdges(CBR, DT);
-  Changed |= insertIntrinsicCalls(CBR, *DT);
+  Changed |= insertIntrinsicCalls(CBR, DT);
+
+  return Changed;
+}
+
+//===----------------------------------------------------------------------===//
+//               Process 'llvm.asm.constraint.br' instructions
+//===----------------------------------------------------------------------===//
+
+static bool processAsmConstraintBrInst(Function &F, CallBrInst &CBR,
+                                       bool IsOptLevelNone, DomTreeUpdater *DTU,
+                                       const TargetMachine *TM) {
+  bool Changed = false;
+
+  BasicBlock *BB = CBR.getParent();
+  BasicBlock *PrefReg = CBR.getDefaultDest();
+  BasicBlock *PrefMem = CBR.getIndirectDest(0);
+  BasicBlock *Merge = isa<CallBrInst>(PrefReg->getTerminator())
+                          ? nullptr
+                          : PrefReg->getSingleSuccessor();
+
+  CBR.eraseFromParent();
+
+  if (IsOptLevelNone) {
+    DeleteDeadBlock(PrefReg, DTU);
+    IRBuilder(BB).CreateBr(PrefMem);
+    MergeBlockIntoPredecessor(PrefMem, DTU);
+    if (Merge)
+      MergeBlockIntoPredecessor(Merge, DTU);
+  } else {
+    DeleteDeadBlock(PrefMem, DTU);
+    IRBuilder(BB).CreateBr(PrefReg);
+    MergeBlockIntoPredecessor(PrefReg, DTU);
+    if (Merge)
+      MergeBlockIntoPredecessor(Merge, DTU);
+  }
+
+  DTU->flush();
 
   return Changed;
 }
 
-static SmallVector<CallBrInst *, 2> findCallBrs(Function &F) {
-  SmallVector<CallBrInst *, 2> CBRs;
-  for (BasicBlock &BB : F)
-    if (auto *CBR = dyn_cast<CallBrInst>(BB.getTerminator()))
-      if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
-        CBRs.push_back(CBR);
-  return CBRs;
+static void getCallBrInsts(Function &F,
+                           SmallVectorImpl<CallBrInst *> &AsmConstraintBrs,
+                           SmallVectorImpl<CallBrInst *> &OtherCallBrs) {
+  for (auto &BB : F)
+    if (auto *CBR = dyn_cast<CallBrInst>(BB.getTerminator())) {
+      if (CBR->getIntrinsicID() == Intrinsic::asm_constraint_br)
+        AsmConstraintBrs.push_back(CBR);
+      else if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
+        OtherCallBrs.push_back(CBR);
+    }
 }
 
-static bool runImpl(Function &F, ArrayRef<CallBrInst *> CBRs,
-                    DominatorTree *DT) {
+static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater *DTU,
+                    const TargetMachine *TM) {
   bool Changed = false;
+  SmallVector<CallBrInst *, 4> AsmConstraintBrs;
+  SmallVector<CallBrInst *, 4> OtherCallBrs;
+
+  getCallBrInsts(F, AsmConstraintBrs, OtherCallBrs);
 
-  for (CallBrInst *CBR : CBRs)
-    Changed |= processCallBrInst(F, CBR, DT);
+  // Process 'llvm.asm.constraint.br' instructions first.
+  for (auto *CBR : AsmConstraintBrs)
+    Changed |= processAsmConstraintBrInst(F, *CBR, IsOptLevelNone, DTU, TM);
+
+  // Process the rest of the 'callbr' instructions.
+  for (auto *CBR : OtherCallBrs)
+    if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
+      Changed |= processCallBrInst(F, CBR, DTU ? &DTU->getDomTree() : nullptr);
 
   return Changed;
 }
 
 bool InlineAsmPrepare::runOnFunction(Function &F) {
-  SmallVector<CallBrInst *, 2> CBRs = findCallBrs(F);
-  if (CBRs.empty())
-    return false;
-
   // It's highly likely that most programs do not contain CallBrInsts. Follow a
   // similar pattern from SafeStackLegacyPass::runOnFunction to reuse previous
   // domtree analysis if available, otherwise compute it lazily. This avoids
@@ -238,27 +301,26 @@ bool InlineAsmPrepare::runOnFunction(Function &F) {
   // contain CallBrInsts. It does pessimize programs with callbr at higher
   // optimization levels, as the DominatorTree created here is not reused by
   // subsequent passes.
-  DominatorTree *DT;
+  const auto *TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
+  std::optional<DomTreeUpdater> DTU;
   std::optional<DominatorTree> LazilyComputedDomTree;
   if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
-    DT = &DTWP->getDomTree();
-  else {
-    LazilyComputedDomTree.emplace(F);
-    DT = &*LazilyComputedDomTree;
-  }
+    DTU.emplace(DTWP->getDomTree(), DomTreeUpdater::UpdateStrategy::Lazy);
+
+  bool IsOptLevelNone =
+      skipFunction(F) ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  return runImpl(F, CBRs, DT);
+  return runImpl(F, IsOptLevelNone, DTU ? &*DTU : nullptr, TM);
 }
 
 PreservedAnalyses InlineAsmPreparePass::run(Function &F,
                                             FunctionAnalysisManager &FAM) {
-  SmallVector<CallBrInst *, 2> CBRs = findCallBrs(F);
-  if (CBRs.empty())
-    return PreservedAnalyses::all();
-
   auto *DT = &FAM.getResult<DominatorTreeAnalysis>(F);
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
+  bool IsOptLevelNone =
+      F.hasOptNone() ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  if (runImpl(F, CBRs, DT)) {
+  if (runImpl(F, IsOptLevelNone, DT ? &DTU : nullptr, TM)) {
     PreservedAnalyses PA;
     PA.preserve<DominatorTreeAnalysis>();
     return PA;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5753d74168e59..221c97ee83e5d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -1033,7 +1033,8 @@ void RegsForValue::getCopyToRegs(SDValue Val, SelectionDAG &DAG,
 }
 
 void RegsForValue::AddInlineAsmOperands(InlineAsm::Kind Code, bool HasMatching,
-                                        unsigned MatchingIdx, const SDLoc &dl,
+                                        unsigned MatchingIdx,
+                                        bool MayFoldRegister, const SDLoc &dl,
                                         SelectionDAG &DAG,
                                         std::vector<SDValue> &Ops) const {
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
@@ -1050,6 +1051,7 @@ void RegsForValue::AddInlineAsmOperands(InlineAsm::Kind Code, bool HasMatching,
     const MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo();
     const TargetRegisterClass *RC = MRI.getRegClass(Regs.front());
     Flag.setRegClass(RC->getID());
+    Flag.setRegMayBeFolded(MayFoldRegister);
   }
 
   SDValue Res = DAG.getTargetConstant(Flag, dl, MVT::i32);
@@ -10195,6 +10197,8 @@ static bool isFunction(SDValue Op) {
 
 namespace {
 
+/// ConstraintDecisionInfo - A struct that holds information while determining
+/// which constraint to use for an inline asm operand.
 struct ConstraintDecisionInfo {
   SmallVector<SDISelAsmOperandInfo, 16> ConstraintOperands;
   std::vector<SDValue> AsmNodeOperands;
@@ -10206,6 +10210,14 @@ struct ConstraintDecisionInfo {
   raw_svector_ostream ErrorMsg;
 
   ConstraintDecisionInfo() : ErrorMsg(Buffer) {}
+
+  void reset() {
+    ConstraintOperands.clear();
+    AsmNodeOperands.clear();
+    Glue = SDValue();
+    Chain = SDValue();
+    BeginLabel = nullptr;
+  }
 };
 
 } // end anonymous namespace
@@ -10236,8 +10248,9 @@ constructOperandInfo(ConstraintDecisionInfo &Info,
         !isa<ConstantSDNode>(OpInfo.CallOperand)) {
       // We've delayed emitting a diagnostic like the "n" constraint because
       // inlining could cause an integer showing up.
-      Info.ErrorMsg << "constraint '" << T.ConstraintCode
-                    << "' expects an integer constant expression";
+      if (OpInfo.atFinalConstraint())
+        Info.ErrorMsg << "constraint '" << T.ConstraintCode
+                      << "' expects an integer constant expression";
       return true;
     }
 
@@ -10310,9 +10323,9 @@ computeConstraintToUse(ConstraintDecisionInfo &Info, const CallBase &Call,
     // need to provide an address for the memory input.
     if (OpInfo.ConstraintType == TargetLowering::C_Memory &&
         !OpInfo.isIndirect) {
-      assert((OpInfo.isMultipleAlternative ||
-              (OpInfo.Type == InlineAsm::isInput)) &&
-             "Can only indirectify direct input operands!");
+      assert(
+          (OpInfo.isMultipleAlternative || OpInfo.Type == InlineAsm::isInput) &&
+          "Can only indirectify direct input operands!");
 
       // Memory operands really want the address of the value.
       Info.Chain = getAddressForMemoryInput(Info.Chain, Builder.getCurSDLoc(),
@@ -10334,6 +10347,15 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
                                     SelectionDAGBuilder &Builder,
                                     const TargetLowering &TLI,
                                     SelectionDAG &DAG) {
+  // Registers before tied operands can't be folded, because the tied operand
+  // will move, which the back-end isn't able to properly account for.
+  bool Clear = false;
+  for (SDISelAsmOperandInfo &OpInfo : llvm::reverse(Info.ConstraintOperands)) {
+    Clear |= OpInfo.isMatchingInputConstraint();
+    if (Clear)
+      OpInfo.MayFoldRegister = false;
+  }
+
   SDLoc DL = Builder.getCurSDLoc();
   for (SDISelAsmOperandInfo &OpInfo : Info.ConstraintOperands) {
     // Assign Registers.
@@ -10343,12 +10365,14 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
             : OpInfo;
     const auto RegError = getRegistersForValue(DAG, DL, OpInfo, RefOpInfo);
     if (RegError) {
-      const MachineFunction &MF = DAG.getMachineFunction();
-      const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
-      const char *RegName = TRI.getName(*RegError);
-      Info.ErrorMsg << "register '" << RegName << "' allocated for constraint '"
-                    << OpInfo.ConstraintCode
-                    << "' does not match required type";
+      if (OpInfo.atFinalConstraint()) {
+        const MachineFunction &MF = DAG.getMachineFunction();
+        const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
+        const char *RegName = TRI.getName(*RegError);
+        Info.ErrorMsg << "register '" << RegName
+                      << "' allocated for constraint '" << OpInfo.ConstraintCode
+                      << "' does not match required type";
+      }
       return true;
     }
 
@@ -10358,8 +10382,10 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
 
       for (Register Reg : OpInfo.AssignedRegs.Regs) {
         if (Reg.isPhysical() && TRI.isInlineAsmReadOnlyReg(MF, Reg)) {
-          Info.ErrorMsg << "write to reserved register '"
-                        << TRI.getRegAsmName(Reg) << "'";
+          if (OpInfo.atFinalConstraint()) {
+            StringRef RegName = TRI.getRegAsmName(Reg);
+            Info.ErrorMsg << "write to reserved register '" << RegName << "'";
+          }
           return true;
         }
       }
@@ -10390,8 +10416,9 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
         // C_RegisterClass, and a target-defined fashion for
         // C_Immediate/C_Other). Find a register that we can use.
         if (OpInfo.AssignedRegs.Regs.empty()) {
-          Info.ErrorMsg << "could not allocate output register for "
-                        << "constraint '" << OpInfo.ConstraintCode << "'";
+          if (OpInfo.atFinalConstraint())
+            Info.ErrorMsg << "could not allocate output register for "
+                          << "constraint '" << OpInfo.ConstraintCode << "'";
           return true;
         }
 
@@ -10403,7 +10430,7 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
         OpInfo.AssignedRegs.AddInlineAsmOperands(
             OpInfo.isEarlyClobber ? InlineAsm::Kind::RegDefEarlyClobber
                                   : InlineAsm::Kind::RegDef,
-            false, 0, DL, DAG, Info.AsmNodeOperands);
+            false, 0, OpInfo.MayFoldRegister, DL, DAG, Info.AsmNodeOperands);
       }
       break;
 
@@ -10420,8 +10447,9 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
         if (Flag.isRegDefKind() || Flag.isRegDefEarlyClobberKind()) {
           if (OpInfo.isIndirect) {
             // This happens on gcc/testsuite/gcc.dg/pr8788-1.c
-            Info.ErrorMsg << "inline asm not supported yet: cannot handle "
-                          << "tied indirect register inputs";
+            if (OpInfo.atFinalConstraint())
+              Info.ErrorMsg << "inline asm not supported yet: cannot handle "
+                            << "tied indirect register inputs";
             return true;
           }
 
@@ -10444,9 +10472,9 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
           // Use the produced MatchedRegs object to
           MatchedRegs.getCopyToRegs(InOperandVal, DAG, DL, Info.Chain,
                                     &Info.Glue, &Call);
-          MatchedRegs.AddInlineAsmOperands(InlineAsm::Kind::RegUse, true,
-                                           OpInfo.getMatchedOperand(), DL, DAG,
-                                           Info.AsmNodeOperands);
+          MatchedRegs.AddInlineAsmOperands(
+              InlineAsm::Kind::RegUse, true, OpInfo.getMatchedOperand(),
+              OpInfo.MayFoldRegister, DL, DAG, Info.AsmNodeOperands);
           break;
         }
 
@@ -10569,8 +10597,9 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
 
       OpInfo.AssignedRegs.getCopyToRegs(InOperandVal, DAG, DL, Info.Chain,
                                         &Info.Glue, &Call);
-      OpInfo.AssignedRegs.AddInlineAsmOperands(
-          InlineAsm::Kind::RegUse, false, 0, DL, DAG, Info.AsmNodeOperands);
+      OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind::RegUse, false,
+                                               0, OpInfo.MayFoldRegister, DL,
+                                               DAG, Info.AsmNodeOperands);
       break;
     }
 
@@ -10578,16 +10607,29 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
       // Add the clobbered value to the operand list, so that the register
       // allocator is aware that the physreg got clobbered.
       if (!OpInfo.AssignedRegs.Regs.empty())
-        OpInfo.AssignedRegs.AddInlineAsmOperands(
-            InlineAsm::Kind::Clobber, false, 0, DL, DAG, Info.AsmNodeOperands);
+        OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind::Clobber,
+                                                 false, 0, false, DL, DAG,
+                                                 Info.AsmNodeOperands);
       break;
     }
+
+    OpInfo.Finalized = true;
   }
 
   return false;
 }
 
-/// DetermineConstraints - Find the constraints to use for inline asm operands.
+/// determineConstraints - ASM operands may have more than one constraint. We
+/// want to choose the "best" constraint for each operand to avoid horrible
+/// code generation---e.g., for "rm" we would like to use "r". This function
+/// tries different constraints in order from best to worst. If a given
+/// constraint isn't possible, e.g., because no registers are available, then
+/// the function returns 'true' and is rerun on the next constraint.
+///
+/// Each operand which has a suitable constraint is marked as "finalized". This
+/// helps reduce the number of times we need to run this function, keeping the
+/// complexity at O(n), where 'n' is the total number of constraints on inputs
+/// and outputs (i.e., for "rm", n == 2).
 static bool
 determineConstraints(ConstraintDecisionInfo &Info,
                      TargetLowering::AsmOperandInfoVector &TargetConstraints,
@@ -10652,9 +10694,12 @@ void SelectionDAGBuilder::visitInlineAsm(const CallBase &Call,
          "InvokeInst must have an EHPadBB");
 
   ConstraintDecisionInfo Info;
-  if (determineConstraints(Info, TargetConstraints, Call, *this, TLI, TM, DAG,
-                           EHPadBB))
-    return emitInlineAsmError(Call, Info.ErrorMsg.str());
+  while (determineConstraints(Info, TargetConstraints, Call, *this, TLI, TM,
+                              DAG, EHPadBB)) {
+    if (!Info.ErrorMsg.buffer().empty())
+      return emitInlineAsmError(Call, Info.ErrorMsg.str());
+    Info.reset();
+  }
 
   SDValue Glue = Info.Glue;
   SDValue Chain = Info.Chain;
@@ -10717,46 +10762,49 @@ void SelectionDAGBuilder::visitInlineAsm(const CallBase &Call,
 
   // Deal with output operands.
   for (SDISelAsmOperandInfo &OpInfo : Info.ConstraintOperands) {
-    if (OpInfo.Type == InlineAsm::isOutput) {
-      SDValue Val;
-      // Skip trivial output operands.
-      if (OpInfo.AssignedRegs.Regs.empty())
-        continue;
+    if (OpInfo.Type != InlineAsm::isOutput)
+      continue;
 
-      switch (OpInfo.ConstraintType) {
-      case TargetLowering::C_Register:
-      case TargetLowering::C_RegisterClass:
-        Val = OpInfo.AssignedRegs.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(),
-                                                  Chain, &Glue, &Call);
-        break;
-      case TargetLowering::C_Immediate:
-      case TargetLowering::C_Other:
-        Val = TLI.LowerAsmOutputForConstraint(Chain, Glue, getCurSDLoc(),
-                                              OpInfo, DAG);
-        break;
-      case TargetLowering::C_Memory:
-        break; // Already handled.
-      case TargetLowering::C_Address:
-        break; // Silence warning.
-      case TargetLowering::C_Unknown:
-        assert(false && "Unexpected unknown constraint");
-      }
+    SDValue Val;
 
-      // Indirect output manifest as stores. Record output chains.
-      if (OpInfo.isIndirect) {
-        const Value *Ptr = OpInfo.CallOperandVal;
-        assert(Ptr && "Expected value CallOperandVal for indirect asm operand");
-        SDValue Store = DAG.getStore(Chain, getCurSDLoc(), Val, getValue(Ptr),
-                                     MachinePointerInfo(Ptr));
-        OutChains.push_back(Store);
+    // Skip trivial output operands.
+    if (OpInfo.AssignedRegs.Regs.empty())
+      continue;
+
+    switch (OpInfo.ConstraintType) {
+    case TargetLowering::C_Register:
+    case TargetLowering::C_RegisterClass:
+      Val = OpInfo.AssignedRegs.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(),
+                                                Chain, &Glue, &Call);
+      break;
+    case TargetLowering::C_Immediate:
+    case TargetLowering::C_Other:
+      Val = TLI.LowerAsmOutputForConstraint(Chain, Glue, getCurSDLoc(), OpInfo,
+                                            DAG);
+      break;
+    case TargetLowering::C_Memory:
+      break; // Already handled.
+    case TargetLowering::C_Address:
+      break; // Silence warning.
+    case TargetLowering::C_Unknown:
+      assert(false && "Unexpected unknown constraint");
+    }
+
+    // Indirect outputs manifest as stores. Record output chains.
+    if (OpInfo.isIndirect) {
+      const Value *Ptr = OpInfo.CallOperandVal;
+      assert(Ptr && "Expected value CallOperandVal for indirect asm operand");
+      SDValue Store = DAG.getStore(Chain, getCurSDLoc(), Val, getValue(Ptr),
+                                   MachinePointerInfo(Ptr));
+      OutChains.push_back(Store);
+    } else {
+      // Generate CopyFromRegs to associated registers.
+      assert(!Call.getType()->isVoidTy() && "Bad inline asm!");
+      if (Val.getOpcode() == ISD::MERGE_VALUES) {
+        for (const SDValue &V : Val->op_values())
+          handleRegAssign(V);
       } else {
-        // generate CopyFromRegs to associated registers.
-        assert(!Call.getType()->isVoidTy() && "Bad inline asm!");
-        if (Val.getOpcode() == ISD::MERGE_VALUES) {
-          for (const SDValue &V : Val->op_values())
-            handleRegAssign(V);
-        } else
-          handleRegAssign(Val);
+        handleRegAssign(Val);
       }
     }
   }
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index 21aac333a73cd..2a8beacc4cc78 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -827,8 +827,9 @@ struct RegsForValue {
   /// code marker, matching input operand index (if applicable), and includes
   /// the number of values added into it.
   void AddInlineAsmOperands(InlineAsm::Kind Code, bool HasMatching,
-                            unsigned MatchingIdx, const SDLoc &dl,
-                            SelectionDAG &DAG, std::vector<SDValue> &Ops) const;
+                            unsigned MatchingIdx, bool MayFoldRegister,
+                            const SDLoc &dl, SelectionDAG &DAG,
+                            std::vector<SDValue> &Ops) const;
 
   /// Check if the total RegCount is greater than one.
   bool occupiesMultipleRegs() const {
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 7e43794ef224b..988d2aad858ad 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -6027,6 +6027,19 @@ TargetLowering::ParseConstraints(const DataLayout &DL,
 
     OpInfo.ConstraintVT = MVT::Other;
 
+    // Special treatment for all platforms that can fold a register into a
+    // spill. This is used for a register-memory constraint, where we would
+    // vastly prefer to use 'r' over 'm'. The non-fast register allocators are
+    // able to handle the 'r' default by folding. The fast register allocator
+    // needs special handling to convert the instruction to use 'm' instead.
+    //
+    // This also applies to read-write "+rm" constraints (which generate a
+    // direct "=rm" output with a matching tied input). The register allocator
+    // can fold both the output and its tied input to the same memory slot when
+    // under pressure.
+    if (OpInfo.hasRegMemConstraints())
+      OpInfo.MayFoldRegister = true;
+
     // Compute the value type for each operand.
     switch (OpInfo.Type) {
     case InlineAsm::isOutput: {
@@ -6188,31 +6201,6 @@ TargetLowering::ParseConstraints(const DataLayout &DL,
   return ConstraintOperands;
 }
 
-/// Return a number indicating our preference for chosing a type of constraint
-/// over another, for the purpose of sorting them. Immediates are almost always
-/// preferrable (when they can be emitted). A higher return value means a
-/// stronger preference for one constraint type relative to another.
-/// FIXME: We should prefer registers over memory but doing so may lead to
-/// unrecoverable register exhaustion later.
-/// https://github.com/llvm/llvm-project/issues/20571
-static unsigned getConstraintPiority(TargetLowering::ConstraintType CT) {
-  switch (CT) {
-  case TargetLowering::C_Immediate:
-  case TargetLowering::C_Other:
-    return 4;
-  case TargetLowering::C_Memory:
-  case TargetLowering::C_Address:
-    return 3;
-  case TargetLowering::C_RegisterClass:
-    return 2;
-  case TargetLowering::C_Register:
-    return 1;
-  case TargetLowering::C_Unknown:
-    return 0;
-  }
-  llvm_unreachable("Invalid constraint type");
-}
-
 /// Examine constraint type and operand type and determine a weight value.
 /// This object must already have been set up with the operand type
 /// and the current alternative constraint selected.
@@ -6290,6 +6278,7 @@ TargetLowering::ConstraintWeight
 /// operand (e.g. "imr") try to pick the 'best' one.
 /// This is somewhat tricky: constraints (TargetLowering::ConstraintType) fall
 /// into seven classes:
+///
 ///    Register      -> one specific register
 ///    RegisterClass -> a group of regs
 ///    Memory        -> memory
@@ -6297,6 +6286,7 @@ TargetLowering::ConstraintWeight
 ///    Immediate     -> immediate values
 ///    Other         -> magic values (such as "Flag Output Operands")
 ///    Unknown       -> something we don't recognize yet and can't handle
+///
 /// Ideally, we would pick the most specific constraint possible: if we have
 /// something that fits into a register, we would pick it.  The problem here
 /// is that if we have something that could either be in a register or in
@@ -6318,12 +6308,17 @@ TargetLowering::ConstraintGroup TargetLowering::getConstraintPreferences(
   for (StringRef Code : OpInfo.Codes) {
     TargetLowering::ConstraintType CType = getConstraintType(Code);
 
-    // Indirect 'other' or 'immediate' constraints are not allowed.
+    // Indirect 'other' or 'immediate' constraints are not allowed for outputs.
     if (OpInfo.isIndirect && !(CType == TargetLowering::C_Memory ||
                                CType == TargetLowering::C_Register ||
                                CType == TargetLowering::C_RegisterClass))
       continue;
 
+    if (OpInfo.Type == InlineAsm::isOutput &&
+        (CType == TargetLowering::C_Other ||
+         CType == TargetLowering::C_Immediate))
+      continue;
+
     // Things with matching constraints can only be registers, per gcc
     // documentation.  This mainly affects "g" constraints.
     if (CType == TargetLowering::C_Memory && OpInfo.hasMatchingInput())
@@ -6332,8 +6327,33 @@ TargetLowering::ConstraintGroup TargetLowering::getConstraintPreferences(
     Ret.emplace_back(Code, CType);
   }
 
-  llvm::stable_sort(Ret, [](ConstraintPair a, ConstraintPair b) {
-    return getConstraintPiority(a.second) > getConstraintPiority(b.second);
+  // Return a number indicating our preference for choosing a type of
+  // constraint over another, for the purpose of sorting them. Immediates are
+  // almost always preferable (when they can be emitted). A higher return
+  // value means a stronger preference for one constraint type relative to
+  // another.
+  const TargetMachine &TM = getTargetMachine();
+  bool PreferRegs =
+      TM.getOptLevel() != CodeGenOptLevel::None && OpInfo.MayFoldRegister;
+  auto getConstraintPriority = [&](TargetLowering::ConstraintType CT) {
+    switch (CT) {
+    case TargetLowering::C_Immediate:
+    case TargetLowering::C_Other:
+      return 4;
+    case TargetLowering::C_Memory:
+    case TargetLowering::C_Address:
+      return PreferRegs ? 1 : 3;
+    case TargetLowering::C_RegisterClass:
+      return PreferRegs ? 3 : 2;
+    case TargetLowering::C_Register:
+      return PreferRegs ? 2 : 1;
+    case TargetLowering::C_Unknown:
+      return 0;
+    }
+    llvm_unreachable("Invalid constraint type");
+  };
+  llvm::stable_sort(Ret, [&](ConstraintPair a, ConstraintPair b) {
+    return getConstraintPriority(a.second) > getConstraintPriority(b.second);
   });
 
   return Ret;
@@ -6364,22 +6384,29 @@ void TargetLowering::ComputeConstraintToUse(AsmOperandInfo &OpInfo,
                                             SelectionDAG *DAG) const {
   assert(!OpInfo.Codes.empty() && "Must have at least one constraint");
 
+  if (OpInfo.atFinalConstraint())
+    return;
+
   // Single-letter constraints ('r') are very common.
   if (OpInfo.Codes.size() == 1) {
     OpInfo.ConstraintCode = OpInfo.Codes[0];
     OpInfo.ConstraintType = getConstraintType(OpInfo.ConstraintCode);
-  } else {
+    OpInfo.ConstraintIndex = 0;
+  } else if (!OpInfo.Finalized) {
     ConstraintGroup G = getConstraintPreferences(OpInfo);
-    if (G.empty())
+    if (G.empty()) {
+      OpInfo.ConstraintIndex = OpInfo.Codes.size() - 1;
       return;
+    }
 
-    unsigned BestIdx = 0;
+    unsigned BestIdx = OpInfo.ConstraintIndex + 1;
     for (const unsigned E = G.size();
          BestIdx < E && (G[BestIdx].second == TargetLowering::C_Other ||
                          G[BestIdx].second == TargetLowering::C_Immediate);
          ++BestIdx) {
       if (lowerImmediateIfPossible(G[BestIdx], Op, DAG, *this))
         break;
+
       // If we're out of constraints, just pick the first one.
       if (BestIdx + 1 == E) {
         BestIdx = 0;
@@ -6389,6 +6416,7 @@ void TargetLowering::ComputeConstraintToUse(AsmOperandInfo &OpInfo,
 
     OpInfo.ConstraintCode = G[BestIdx].first;
     OpInfo.ConstraintType = G[BestIdx].second;
+    OpInfo.ConstraintIndex = BestIdx;
   }
 
   // 'X' matches anything.
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 438df2b604f3f..cd4f48e0f7ed4 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -3581,6 +3581,25 @@ void Verifier::visitCallBrInst(CallBrInst &CBI) {
             "Callbr amdgcn_kill indirect dest needs to be unreachable");
       break;
     }
+    case Intrinsic::asm_constraint_br: {
+      Check(CBI.getNumIndirectDests() == 1,
+            "Callbr asm_constraint_br supports only one indirect dest");
+      Check(CBI.getDefaultDest()->hasNPredecessors(1),
+            "Callbr asm_constraint_br default dest must have only one "
+            "predecessor");
+      Check(isa<CallBrInst>(CBI.getDefaultDest()->getTerminator()) ||
+                CBI.getDefaultDest()->getSingleSuccessor(),
+            "Callbr asm_constraint_br default dest must have only "
+            "one successor");
+      Check(CBI.getIndirectDest(0)->hasNPredecessors(1),
+            "Callbr asm_constraint_br indirect dest must have only one "
+            "predecessor");
+      Check(isa<CallBrInst>(CBI.getIndirectDest(0)->getTerminator()) ||
+                CBI.getIndirectDest(0)->getSingleSuccessor(),
+            "Callbr asm_constraint_br indirect dest must have only "
+            "one successor");
+      break;
+    }
     default:
       CheckFailed(
           "Callbr currently only supports asm-goto and selected intrinsics");
@@ -7340,7 +7359,12 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
           "llvm.sponentry must return a pointer to the stack", &Call);
     break;
   }
-  };
+  case Intrinsic::asm_constraint_br: {
+    Check(isa<CallBrInst>(Call),
+          "llvm.asm.constraint.br must be called only by callbr", &Call);
+    break;
+  }
+  }
 
   // Verify that there aren't any unmediated control transfers between funclets.
   if (IntrinsicInst::mayLowerToFunctionCall(ID)) {
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 3328bc0fe836f..f50205984090d 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -460,7 +460,7 @@ FUNCTION_PASS("indirectbr-expand", IndirectBrExpandPass(*TM))
 FUNCTION_PASS("infer-address-spaces", InferAddressSpacesPass())
 FUNCTION_PASS("infer-alignment", InferAlignmentPass())
 FUNCTION_PASS("inject-tli-mappings", InjectTLIMappings())
-FUNCTION_PASS("inline-asm-prepare", InlineAsmPreparePass())
+FUNCTION_PASS("inline-asm-prepare", InlineAsmPreparePass(*TM))
 FUNCTION_PASS("instcount", InstCountPass())
 FUNCTION_PASS("instnamer", InstructionNamerPass())
 FUNCTION_PASS("instsimplify", InstSimplifyPass())
diff --git a/llvm/test/CodeGen/AArch64/O0-pipeline.ll b/llvm/test/CodeGen/AArch64/O0-pipeline.ll
index b363e11f62811..d9e408f9728af 100644
--- a/llvm/test/CodeGen/AArch64/O0-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O0-pipeline.ll
@@ -28,6 +28,7 @@
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       AArch64 Stack Tagging
 ; CHECK-NEXT:       Exception handling preparation
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index 1a0ffe234a236..0e4f6274f7b72 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
@@ -103,6 +103,7 @@
 ; CHECK-NEXT:         Dominator Tree Construction
 ; CHECK-NEXT:     FunctionPass Manager
 ; CHECK-NEXT:       Merge internal globals
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/AArch64/inline-asm-prepare.ll b/llvm/test/CodeGen/AArch64/inline-asm-prepare.ll
index 13ed24692b35e..539177bf918d4 100644
--- a/llvm/test/CodeGen/AArch64/inline-asm-prepare.ll
+++ b/llvm/test/CodeGen/AArch64/inline-asm-prepare.ll
@@ -1,25 +1,25 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt %s -inline-asm-prepare -S -o - | FileCheck %s
-; RUN: opt %s -passes=inline-asm-prepare -S -o - | FileCheck %s
+; RUN: opt %s -mtriple=aarch64-unknown-unknown -inline-asm-prepare -S -o - | FileCheck %s
+; RUN: opt %s -mtriple=aarch64-unknown-unknown -passes=inline-asm-prepare -S -o - | FileCheck %s
 
 define i32 @test0() {
 ; CHECK-LABEL: @test0(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OUT:%.*]] = callbr i32 asm "# $0", "=r,!i"()
-; CHECK-NEXT:            to label [[DIRECT:%.*]] [label %entry.indirect_crit_edge]
+; CHECK-NEXT:            to label [[DIRECT:%.*]] [label [[ENTRY_INDIRECT_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.indirect_crit_edge:
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[OUT]])
 ; CHECK-NEXT:    br label [[INDIRECT:%.*]]
 ; CHECK:       direct:
 ; CHECK-NEXT:    [[OUT2:%.*]] = callbr i32 asm "# $0", "=r,!i"()
-; CHECK-NEXT:            to label [[DIRECT2:%.*]] [label %direct.indirect_crit_edge]
+; CHECK-NEXT:            to label [[DIRECT2:%.*]] [label [[DIRECT_INDIRECT_CRIT_EDGE:%.*]]]
 ; CHECK:       direct.indirect_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[OUT2]])
 ; CHECK-NEXT:    br label [[INDIRECT]]
 ; CHECK:       direct2:
 ; CHECK-NEXT:    ret i32 0
 ; CHECK:       indirect:
-; CHECK-NEXT:    [[OUT3:%.*]] = phi i32 [ [[TMP0]], [[ENTRY_INDIRECT_CRIT_EDGE:%.*]] ], [ [[TMP1]], [[DIRECT_INDIRECT_CRIT_EDGE:%.*]] ]
+; CHECK-NEXT:    [[OUT3:%.*]] = phi i32 [ [[TMP0]], [[ENTRY_INDIRECT_CRIT_EDGE]] ], [ [[TMP1]], [[DIRECT_INDIRECT_CRIT_EDGE]] ]
 ; CHECK-NEXT:    ret i32 [[OUT3]]
 ;
 entry:
@@ -42,7 +42,7 @@ define i32 @dont_split0() {
 ; CHECK-LABEL: @dont_split0(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    callbr void asm "", "!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %y]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[Y:%.*]]]
 ; CHECK:       x:
 ; CHECK-NEXT:    ret i32 42
 ; CHECK:       y:
@@ -68,7 +68,7 @@ define i32 @dont_split1() {
 ; CHECK-LABEL: @dont_split1(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %y]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[Y:%.*]]]
 ; CHECK:       x:
 ; CHECK-NEXT:    ret i32 42
 ; CHECK:       y:
@@ -93,9 +93,9 @@ define i32 @dont_split2() {
 ; CHECK-LABEL: @dont_split2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    callbr void asm "", "!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %y]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[Y:%.*]]]
 ; CHECK:       x:
-; CHECK-NEXT:    br label [[Y:%.*]]
+; CHECK-NEXT:    br label [[Y]]
 ; CHECK:       y:
 ; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ 0, [[X]] ], [ 42, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    ret i32 [[TMP0]]
@@ -119,9 +119,9 @@ define i32 @dont_split3() {
 ; CHECK-LABEL: @dont_split3(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %v]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[V:%.*]]]
 ; CHECK:       x:
-; CHECK-NEXT:    br label [[V:%.*]]
+; CHECK-NEXT:    br label [[V]]
 ; CHECK:       v:
 ; CHECK-NEXT:    ret i32 42
 ;
@@ -142,14 +142,14 @@ define i32 @split_me0() {
 ; CHECK-LABEL: @split_me0(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %entry.y_crit_edge]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[ENTRY_Y_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.y_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[Y:%.*]]
 ; CHECK:       x:
 ; CHECK-NEXT:    br label [[Y]]
 ; CHECK:       y:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_Y_CRIT_EDGE:%.*]] ], [ 42, [[X]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_Y_CRIT_EDGE]] ], [ 42, [[X]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -173,7 +173,7 @@ define i32 @split_me1(i1 %z) {
 ; CHECK-NEXT:    br i1 [[Z:%.*]], label [[W:%.*]], label [[V:%.*]]
 ; CHECK:       w:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label [[W_V_CRIT_EDGE:%.*]], label %w.v_crit_edge]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[W_V_CRIT_EDGE:%.*]], label [[W_V_CRIT_EDGE]]]
 ; CHECK:       w.v_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[V]]
@@ -206,7 +206,7 @@ define i32 @split_me2(i1 %z) {
 ; CHECK-NEXT:    br i1 [[Z:%.*]], label [[W:%.*]], label [[V:%.*]]
 ; CHECK:       w:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label [[W_V_CRIT_EDGE:%.*]], label %w.v_crit_edge]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[W_V_CRIT_EDGE:%.*]], label [[W_V_CRIT_EDGE]]]
 ; CHECK:       w.v_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[V]]
@@ -236,14 +236,14 @@ define i32 @dont_split4() {
 ; CHECK-LABEL: @dont_split4(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %y]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[Y:%.*]]]
 ; CHECK:       x:
 ; CHECK-NEXT:    br label [[OUT:%.*]]
 ; CHECK:       y:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[OUT]]
 ; CHECK:       out:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[Y:%.*]] ], [ [[TMP0]], [[X]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[Y]] ], [ [[TMP0]], [[X]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -265,12 +265,12 @@ define i32 @dont_split5() {
 ; CHECK-LABEL: @dont_split5(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[OUT:%.*]] [label %y]
+; CHECK-NEXT:            to label [[OUT:%.*]] [label [[Y:%.*]]]
 ; CHECK:       y:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[OUT]]
 ; CHECK:       out:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[Y:%.*]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[Y]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -289,14 +289,14 @@ define i32 @split_me3() {
 ; CHECK-LABEL: @split_me3(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[Y:%.*]] [label %entry.out_crit_edge]
+; CHECK-NEXT:            to label [[Y:%.*]] [label [[ENTRY_OUT_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.out_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[OUT:%.*]]
 ; CHECK:       y:
 ; CHECK-NEXT:    br label [[OUT]]
 ; CHECK:       out:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_OUT_CRIT_EDGE:%.*]] ], [ [[TMP0]], [[Y]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_OUT_CRIT_EDGE]] ], [ [[TMP0]], [[Y]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -318,7 +318,7 @@ define i32 @dont_split6(i32 %0) {
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ [[TMP0:%.*]], [[ENTRY:%.*]] ], [ [[TMP3:%.*]], [[LOOP_LOOP_CRIT_EDGE:%.*]] ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = callbr i32 asm "", "=r,0,!i"(i32 [[TMP1]])
-; CHECK-NEXT:            to label [[EXIT:%.*]] [label %loop.loop_crit_edge]
+; CHECK-NEXT:            to label [[EXIT:%.*]] [label [[LOOP_LOOP_CRIT_EDGE]]]
 ; CHECK:       loop.loop_crit_edge:
 ; CHECK-NEXT:    [[TMP3]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP2]])
 ; CHECK-NEXT:    br label [[LOOP]]
@@ -339,12 +339,12 @@ define i32 @split_me4() {
 ; CHECK-LABEL: @split_me4(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[SAME:%.*]] [label %entry.same_crit_edge]
+; CHECK-NEXT:            to label [[SAME:%.*]] [label [[ENTRY_SAME_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.same_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[SAME]]
 ; CHECK:       same:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_SAME_CRIT_EDGE:%.*]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_SAME_CRIT_EDGE]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -358,12 +358,12 @@ define i32 @split_me5() {
 ; CHECK-LABEL: @split_me5(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[SAME:%.*]] [label %entry.same_crit_edge]
+; CHECK-NEXT:            to label [[SAME:%.*]] [label [[ENTRY_SAME_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.same_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.callbr.landingpad.i32(i32 [[TMP0]])
 ; CHECK-NEXT:    br label [[SAME]]
 ; CHECK:       same:
-; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_SAME_CRIT_EDGE:%.*]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi i32 [ [[TMP1]], [[ENTRY_SAME_CRIT_EDGE]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
 entry:
@@ -379,18 +379,18 @@ define i64 @split_me6() {
 ; CHECK-LABEL: @split_me6(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i64 asm "# $0 $1", "={dx},!i"()
-; CHECK-NEXT:            to label [[ASM_FALLTHROUGH:%.*]] [label %entry.foo_crit_edge]
+; CHECK-NEXT:            to label [[ASM_FALLTHROUGH:%.*]] [label [[ENTRY_FOO_CRIT_EDGE:%.*]]]
 ; CHECK:       entry.foo_crit_edge:
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.callbr.landingpad.i64(i64 [[TMP0]])
 ; CHECK-NEXT:    br label [[FOO:%.*]]
 ; CHECK:       asm.fallthrough:
 ; CHECK-NEXT:    [[TMP2:%.*]] = callbr i64 asm "# $0 $1", "={bx},!i"()
-; CHECK-NEXT:            to label [[FOO]] [label %asm.fallthrough.foo_crit_edge]
+; CHECK-NEXT:            to label [[FOO]] [label [[ASM_FALLTHROUGH_FOO_CRIT_EDGE:%.*]]]
 ; CHECK:       asm.fallthrough.foo_crit_edge:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.callbr.landingpad.i64(i64 [[TMP2]])
 ; CHECK-NEXT:    br label [[FOO]]
 ; CHECK:       foo:
-; CHECK-NEXT:    [[X_0:%.*]] = phi i64 [ [[TMP1]], [[ENTRY_FOO_CRIT_EDGE:%.*]] ], [ [[TMP3]], [[ASM_FALLTHROUGH_FOO_CRIT_EDGE:%.*]] ], [ [[TMP2]], [[ASM_FALLTHROUGH]] ]
+; CHECK-NEXT:    [[X_0:%.*]] = phi i64 [ [[TMP1]], [[ENTRY_FOO_CRIT_EDGE]] ], [ [[TMP3]], [[ASM_FALLTHROUGH_FOO_CRIT_EDGE]] ], [ [[TMP2]], [[ASM_FALLTHROUGH]] ]
 ; CHECK-NEXT:    ret i64 [[X_0]]
 ;
 entry:
@@ -412,7 +412,7 @@ define i32 @multiple_split() {
 ; CHECK-LABEL: @multiple_split(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=r,!i"()
-; CHECK-NEXT:            to label [[X:%.*]] [label %y]
+; CHECK-NEXT:            to label [[X:%.*]] [label [[Y:%.*]]]
 ; CHECK:       x:
 ; CHECK-NEXT:    ret i32 42
 ; CHECK:       y:
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index 070c873798647..f793663711f95 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -94,6 +94,7 @@
 ; GCN-O0-NEXT:    Call Graph SCC Pass Manager
 ; GCN-O0-NEXT:      DummyCGSCCPass
 ; GCN-O0-NEXT:      FunctionPass Manager
+; GCN-O0-NEXT:        Dominator Tree Construction
 ; GCN-O0-NEXT:        Prepare inline asm insts
 ; GCN-O0-NEXT:        Safe Stack instrumentation pass
 ; GCN-O0-NEXT:        Insert stack protectors
@@ -298,6 +299,7 @@
 ; GCN-O1-NEXT:    Call Graph SCC Pass Manager
 ; GCN-O1-NEXT:      DummyCGSCCPass
 ; GCN-O1-NEXT:      FunctionPass Manager
+; GCN-O1-NEXT:        Dominator Tree Construction
 ; GCN-O1-NEXT:        Prepare inline asm insts
 ; GCN-O1-NEXT:        Safe Stack instrumentation pass
 ; GCN-O1-NEXT:        Insert stack protectors
@@ -609,6 +611,7 @@
 ; GCN-O1-OPTS-NEXT:    Call Graph SCC Pass Manager
 ; GCN-O1-OPTS-NEXT:      DummyCGSCCPass
 ; GCN-O1-OPTS-NEXT:      FunctionPass Manager
+; GCN-O1-OPTS-NEXT:        Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:        Prepare inline asm insts
 ; GCN-O1-OPTS-NEXT:        Safe Stack instrumentation pass
 ; GCN-O1-OPTS-NEXT:        Insert stack protectors
@@ -931,6 +934,7 @@
 ; GCN-O2-NEXT:      Analysis if a function is memory bound
 ; GCN-O2-NEXT:      DummyCGSCCPass
 ; GCN-O2-NEXT:      FunctionPass Manager
+; GCN-O2-NEXT:        Dominator Tree Construction
 ; GCN-O2-NEXT:        Prepare inline asm insts
 ; GCN-O2-NEXT:        Safe Stack instrumentation pass
 ; GCN-O2-NEXT:        Insert stack protectors
@@ -1267,6 +1271,7 @@
 ; GCN-O3-NEXT:      Analysis if a function is memory bound
 ; GCN-O3-NEXT:      DummyCGSCCPass
 ; GCN-O3-NEXT:      FunctionPass Manager
+; GCN-O3-NEXT:        Dominator Tree Construction
 ; GCN-O3-NEXT:        Prepare inline asm insts
 ; GCN-O3-NEXT:        Safe Stack instrumentation pass
 ; GCN-O3-NEXT:        Insert stack protectors
diff --git a/llvm/test/CodeGen/ARM/O3-pipeline.ll b/llvm/test/CodeGen/ARM/O3-pipeline.ll
index 9f4d70531a3f7..b3d18cf77afa6 100644
--- a/llvm/test/CodeGen/ARM/O3-pipeline.ll
+++ b/llvm/test/CodeGen/ARM/O3-pipeline.ll
@@ -66,6 +66,7 @@
 ; CHECK-NEXT:        Transform predicated vector loops to use MVE tail predication
 ; CHECK-NEXT:      A No-Op Barrier Pass
 ; CHECK-NEXT:      FunctionPass Manager
+; CHECK-NEXT:      Dominator Tree Construction
 ; CHECK-NEXT:      Prepare inline asm insts
 ; CHECK-NEXT:      Safe Stack instrumentation pass
 ; CHECK-NEXT:      Insert stack protectors
diff --git a/llvm/test/CodeGen/LoongArch/O0-pipeline.ll b/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
index bf519342fa4cc..b4cd101728c35 100644
--- a/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
+++ b/llvm/test/CodeGen/LoongArch/O0-pipeline.ll
@@ -31,6 +31,7 @@
 ; CHECK-NEXT:       Scalarize Masked Memory Intrinsics
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       Exception handling preparation
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/LoongArch/opt-pipeline.ll b/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
index 2657a575aa8af..300e095397da8 100644
--- a/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
+++ b/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
@@ -78,6 +78,7 @@
 ; LAXX-NEXT:       Safe Stack instrumentation pass
 ; LAXX-NEXT:       Insert stack protectors
 ; LAXX-NEXT:       Module Verifier
+; LAXX-NEXT:       Dominator Tree Construction
 ; LAXX-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; LAXX-NEXT:       Function Alias Analysis Results
 ; LAXX-NEXT:       Natural Loop Information
diff --git a/llvm/test/CodeGen/PowerPC/O0-pipeline.ll b/llvm/test/CodeGen/PowerPC/O0-pipeline.ll
index b0ba623edfb0a..487db616e57d3 100644
--- a/llvm/test/CodeGen/PowerPC/O0-pipeline.ll
+++ b/llvm/test/CodeGen/PowerPC/O0-pipeline.ll
@@ -30,6 +30,7 @@
 ; CHECK-NEXT:       Scalarize Masked Memory Intrinsics
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       Exception handling preparation
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/PowerPC/O3-pipeline.ll b/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
index 3901d122f4494..910723b75678f 100644
--- a/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
+++ b/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
@@ -87,6 +87,7 @@
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:       Function Alias Analysis Results
 ; CHECK-NEXT:       Natural Loop Information
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index 847a8bd96c6d6..2a56f94815db8 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -32,6 +32,7 @@
 ; CHECK-NEXT:       Scalarize Masked Memory Intrinsics
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       Exception handling preparation
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 149764ffedf9e..d8ac426223bee 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -90,6 +90,7 @@
 ; CHECK-NEXT:     A No-Op Barrier Pass
 ; CHECK-NEXT:     FunctionPass Manager
 ; CHECK-NEXT:       Merge internal globals
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/SPIRV/llc-pipeline.ll b/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
index c9234576df30d..b7e0a01035603 100644
--- a/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
+++ b/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
@@ -50,6 +50,7 @@
 ; SPIRV-O0-NEXT:    SPIRV emit intrinsics
 ; SPIRV-O0-NEXT:    FunctionPass Manager
 ; SPIRV-O0-NEXT:      SPIRV legalize pointer cast pass
+; SPIRV-O0-NEXT:      Dominator Tree Construction
 ; SPIRV-O0-NEXT:      Prepare inline asm insts
 ; SPIRV-O0-NEXT:      Safe Stack instrumentation pass
 ; SPIRV-O0-NEXT:      Insert stack protectors
@@ -164,6 +165,7 @@
 ; SPIRV-Opt-NEXT:    SPIRV emit intrinsics
 ; SPIRV-Opt-NEXT:    FunctionPass Manager
 ; SPIRV-Opt-NEXT:      SPIRV legalize pointer cast pass
+; SPIRV-Opt-NEXT:      Dominator Tree Construction
 ; SPIRV-Opt-NEXT:      Prepare inline asm insts
 ; SPIRV-Opt-NEXT:      Safe Stack instrumentation pass
 ; SPIRV-Opt-NEXT:      Insert stack protectors
diff --git a/llvm/test/CodeGen/X86/O0-pipeline.ll b/llvm/test/CodeGen/X86/O0-pipeline.ll
index e8a3084563573..f7a213829dde1 100644
--- a/llvm/test/CodeGen/X86/O0-pipeline.ll
+++ b/llvm/test/CodeGen/X86/O0-pipeline.ll
@@ -32,6 +32,7 @@
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       Expand indirectbr instructions
 ; CHECK-NEXT:       Exception handling preparation
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Prepare inline asm insts
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
diff --git a/llvm/test/CodeGen/X86/asm-constraints-torture.ll b/llvm/test/CodeGen/X86/asm-constraints-torture.ll
new file mode 100644
index 0000000000000..f34d5ae8b222f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/asm-constraints-torture.ll
@@ -0,0 +1,787 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --filter "^\t#" --version 4
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -O0 < %s | FileCheck --check-prefixes=O0 %s
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu     < %s | FileCheck --check-prefixes=O2 %s
+
+; The non-fast register allocators should use registers when there isn't
+; register pressure.
+
+define dso_local i32 @test_rm_input_no_pressure(ptr noundef readonly captures(none) %foo) local_unnamed_addr {
+; O0-LABEL: test_rm_input_no_pressure:
+; O0:    #APP
+; O0:    # rm input: no pressure
+; O0:    # -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_rm_input_no_pressure:
+; O2:    #APP
+; O2:    # rm input: no pressure
+; O2:    # %eax, %ecx, %edx, %esi, %r8d
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %0 = load i32, ptr %foo, align 4
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %1 = load i32, ptr %b2, align 4
+  %c3 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %2 = load i32, ptr %c3, align 4
+  %d4 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %3 = load i32, ptr %d4, align 4
+  %e5 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %4 = load i32, ptr %e5, align 4
+  tail call void asm sideeffect "# rm input: no pressure\0A\09# $0, $1, $2, $3, $4", "rm,rm,rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %5 = load i32, ptr %foo, align 4
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %6 = load i32, ptr %b, align 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %7 = load i32, ptr %c, align 4
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %8 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %9 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# rm input: no pressure\0A\09# $0, $1, $2, $3, $4", "rm,rm,rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %5, i32 %6, i32 %7, i32 %8, i32 %9)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %10 = load i32, ptr %foo, align 4
+  ret i32 %10
+}
+
+define dso_local i32 @test_rm_pressure(ptr noundef readonly captures(none) %foo) local_unnamed_addr {
+; O0-LABEL: test_rm_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # rm input: pressure
+; O0:    # {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_rm_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # rm input: pressure
+; O2:    # %esi, %edi, %r8d, %r9d, %eax
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %1 = load i32, ptr %foo, align 4
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %2 = load i32, ptr %b16, align 4
+  %c17 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %3 = load i32, ptr %c17, align 4
+  %d18 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %4 = load i32, ptr %d18, align 4
+  %e19 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %5 = load i32, ptr %e19, align 4
+  tail call void asm sideeffect "# rm input: pressure\0A\09# $0, $1, $2, $3, $4", "rm,rm,rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %1, i32 %2, i32 %3, i32 %4, i32 %5)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %6 = load i32, ptr %foo, align 4
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %7 = load i32, ptr %b, align 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %8 = load i32, ptr %c, align 4
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %9 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %10 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# rm input: pressure\0A\09# $0, $1, $2, $3, $4", "rm,rm,rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %6, i32 %7, i32 %8, i32 %9, i32 %10)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %11 = load i32, ptr %foo, align 4
+  ret i32 %11
+}
+
+define dso_local i32 @test_output_no_pressure(ptr noundef writeonly captures(none) initializes((0, 20)) %foo) local_unnamed_addr {
+; O0-LABEL: test_output_no_pressure:
+; O0:    #APP
+; O0:    # rm output: no pressure
+; O0:    # (%rdi), (%rax), (%rcx), (%rdx), (%rsi)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_output_no_pressure:
+; O2:    #APP
+; O2:    # rm output: no pressure
+; O2:    # %eax, %ecx, %edx, %esi, %r8d
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c3 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d4 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e5 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %0 = tail call { i32, i32, i32, i32, i32 } asm sideeffect "# rm output: no pressure\0A\09# $0, $1, $2, $3, $4", "=rm,=rm,=rm,=rm,=rm,~{dirflag},~{fpsr},~{flags}"()
+  %asmresult = extractvalue { i32, i32, i32, i32, i32 } %0, 0
+  %asmresult6 = extractvalue { i32, i32, i32, i32, i32 } %0, 1
+  %asmresult7 = extractvalue { i32, i32, i32, i32, i32 } %0, 2
+  %asmresult8 = extractvalue { i32, i32, i32, i32, i32 } %0, 3
+  %asmresult9 = extractvalue { i32, i32, i32, i32, i32 } %0, 4
+  store i32 %asmresult, ptr %foo, align 4
+  store i32 %asmresult6, ptr %b2, align 4
+  store i32 %asmresult7, ptr %c3, align 4
+  store i32 %asmresult8, ptr %d4, align 4
+  store i32 %asmresult9, ptr %e5, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  tail call void asm sideeffect "# rm output: no pressure\0A\09# $0, $1, $2, $3, $4", "=*rm,=*rm,=*rm,=*rm,=*rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e)
+  %.pre = load i32, ptr %foo, align 4
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %1 = phi i32 [ %asmresult, %asm.pref.reg ], [ %.pre, %asm.pref.mem ]
+  ret i32 %1
+}
+
+define dso_local i32 @test_output_pressure(ptr noundef writeonly captures(none) initializes((0, 20)) %foo) local_unnamed_addr {
+; O0-LABEL: test_output_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # rm output: pressure
+; O0:    # (%rdi), (%rax), (%rcx), (%rdx), (%rsi)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_output_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # rm output: pressure
+; O2:    # %esi, %edi, %r8d, %r9d, %eax
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c17 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d18 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e19 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %1 = tail call { i32, i32, i32, i32, i32 } asm sideeffect "# rm output: pressure\0A\09# $0, $1, $2, $3, $4", "=rm,=rm,=rm,=rm,=rm,~{dirflag},~{fpsr},~{flags}"()
+  %asmresult20 = extractvalue { i32, i32, i32, i32, i32 } %1, 0
+  %asmresult21 = extractvalue { i32, i32, i32, i32, i32 } %1, 1
+  %asmresult22 = extractvalue { i32, i32, i32, i32, i32 } %1, 2
+  %asmresult23 = extractvalue { i32, i32, i32, i32, i32 } %1, 3
+  %asmresult24 = extractvalue { i32, i32, i32, i32, i32 } %1, 4
+  store i32 %asmresult20, ptr %foo, align 4
+  store i32 %asmresult21, ptr %b16, align 4
+  store i32 %asmresult22, ptr %c17, align 4
+  store i32 %asmresult23, ptr %d18, align 4
+  store i32 %asmresult24, ptr %e19, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  tail call void asm sideeffect "# rm output: pressure\0A\09# $0, $1, $2, $3, $4", "=*rm,=*rm,=*rm,=*rm,=*rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %2 = load i32, ptr %foo, align 4
+  ret i32 %2
+}
+
+define dso_local i32 @test_tied_output_no_pressure(ptr noundef captures(none) %foo) local_unnamed_addr {
+; O0-LABEL: test_tied_output_no_pressure:
+; O0:    #APP
+; O0:    # rm tied output: no pressure
+; O0:    # %eax, %edx, %r8d, %r10d, %ebx
+; O0:    #NO_APP
+;
+; O2-LABEL: test_tied_output_no_pressure:
+; O2:    #APP
+; O2:    # rm tied output: no pressure
+; O2:    # %eax, %ecx, %edx, %esi, %r8d
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %0 = load i32, ptr %foo, align 4
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %1 = load i32, ptr %b2, align 4
+  %c3 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %2 = load i32, ptr %c3, align 4
+  %d4 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %3 = load i32, ptr %d4, align 4
+  %e5 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %4 = load i32, ptr %e5, align 4
+  %5 = tail call { i32, i32, i32, i32, i32 } asm sideeffect "# rm tied output: no pressure\0A\09# $0, $1, $2, $3, $4", "=rm,=rm,=rm,=rm,=rm,0,1,2,3,4,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4)
+  %asmresult = extractvalue { i32, i32, i32, i32, i32 } %5, 0
+  %asmresult6 = extractvalue { i32, i32, i32, i32, i32 } %5, 1
+  %asmresult7 = extractvalue { i32, i32, i32, i32, i32 } %5, 2
+  %asmresult8 = extractvalue { i32, i32, i32, i32, i32 } %5, 3
+  %asmresult9 = extractvalue { i32, i32, i32, i32, i32 } %5, 4
+  store i32 %asmresult, ptr %foo, align 4
+  store i32 %asmresult6, ptr %b2, align 4
+  store i32 %asmresult7, ptr %c3, align 4
+  store i32 %asmresult8, ptr %d4, align 4
+  store i32 %asmresult9, ptr %e5, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %6 = load i32, ptr %foo, align 4
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %7 = load i32, ptr %b, align 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %8 = load i32, ptr %c, align 4
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %9 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %10 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# rm tied output: no pressure\0A\09# $0, $1, $2, $3, $4", "=*rm,=*rm,=*rm,=*rm,=*rm,0,1,2,3,4,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e, i32 %6, i32 %7, i32 %8, i32 %9, i32 %10)
+  %.pre = load i32, ptr %foo, align 4
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %11 = phi i32 [ %asmresult, %asm.pref.reg ], [ %.pre, %asm.pref.mem ]
+  ret i32 %11
+}
+
+define dso_local i32 @test_tied_output_pressure(ptr noundef captures(none) %foo) local_unnamed_addr {
+; O0-LABEL: test_tied_output_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # rm tied output: pressure
+; O0:    # %eax, %ecx, %edx, %esi, %edi
+; O0:    #NO_APP
+;
+; O2-LABEL: test_tied_output_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # rm tied output: pressure
+; O2:    # %esi, %edi, %r8d, %r9d, %eax
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %1 = load i32, ptr %foo, align 4
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %2 = load i32, ptr %b16, align 4
+  %c17 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %3 = load i32, ptr %c17, align 4
+  %d18 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %4 = load i32, ptr %d18, align 4
+  %e19 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %5 = load i32, ptr %e19, align 4
+  %6 = tail call { i32, i32, i32, i32, i32 } asm sideeffect "# rm tied output: pressure\0A\09# $0, $1, $2, $3, $4", "=rm,=rm,=rm,=rm,=rm,0,1,2,3,4,~{dirflag},~{fpsr},~{flags}"(i32 %1, i32 %2, i32 %3, i32 %4, i32 %5)
+  %asmresult20 = extractvalue { i32, i32, i32, i32, i32 } %6, 0
+  %asmresult21 = extractvalue { i32, i32, i32, i32, i32 } %6, 1
+  %asmresult22 = extractvalue { i32, i32, i32, i32, i32 } %6, 2
+  %asmresult23 = extractvalue { i32, i32, i32, i32, i32 } %6, 3
+  %asmresult24 = extractvalue { i32, i32, i32, i32, i32 } %6, 4
+  store i32 %asmresult20, ptr %foo, align 4
+  store i32 %asmresult21, ptr %b16, align 4
+  store i32 %asmresult22, ptr %c17, align 4
+  store i32 %asmresult23, ptr %d18, align 4
+  store i32 %asmresult24, ptr %e19, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %7 = load i32, ptr %foo, align 4
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %8 = load i32, ptr %b, align 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %9 = load i32, ptr %c, align 4
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %10 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %11 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# rm tied output: pressure\0A\09# $0, $1, $2, $3, $4", "=*rm,=*rm,=*rm,=*rm,=*rm,0,1,2,3,4,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e, i32 %7, i32 %8, i32 %9, i32 %10, i32 %11)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %12 = load i32, ptr %foo, align 4
+  ret i32 %12
+}
+
+define dso_local i32 @test_rm_output_r_input_no_pressure(ptr noundef captures(none) initializes((0, 4)) %foo) local_unnamed_addr {
+; O0-LABEL: test_rm_output_r_input_no_pressure:
+; O0:    #APP
+; O0:    # rm output, r input: no pressure
+; O0:    # (%rdi), %eax
+; O0:    #NO_APP
+;
+; O2-LABEL: test_rm_output_r_input_no_pressure:
+; O2:    #APP
+; O2:    # rm output, r input: no pressure
+; O2:    # %eax, %eax
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %0 = load i32, ptr %b2, align 4
+  %1 = tail call i32 asm sideeffect "# rm output, r input: no pressure\0A\09# $0, $1", "=rm,r,~{dirflag},~{fpsr},~{flags}"(i32 %0)
+  store i32 %1, ptr %foo, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %2 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# rm output, r input: no pressure\0A\09# $0, $1", "=*rm,r,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %2)
+  %.pre = load i32, ptr %foo, align 4
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %3 = phi i32 [ %1, %asm.pref.reg ], [ %.pre, %asm.pref.mem ]
+  ret i32 %3
+}
+
+define dso_local i32 @test_rm_output_r_input_pressure(ptr noundef captures(none) initializes((0, 4)) %foo) local_unnamed_addr {
+; O0-LABEL: test_rm_output_r_input_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # rm output, r input: pressure
+; O0:    # (%rdi), %eax
+; O0:    #NO_APP
+;
+; O2-LABEL: test_rm_output_r_input_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # rm output, r input: pressure
+; O2:    # %esi, %esi
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %1 = load i32, ptr %b16, align 4
+  %2 = tail call i32 asm sideeffect "# rm output, r input: pressure\0A\09# $0, $1", "=rm,r,~{dirflag},~{fpsr},~{flags}"(i32 %1)
+  store i32 %2, ptr %foo, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %3 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# rm output, r input: pressure\0A\09# $0, $1", "=*rm,r,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %3)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %4 = load i32, ptr %foo, align 4
+  ret i32 %4
+}
+
+define dso_local i32 @test_m_output_rm_input_no_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_m_output_rm_input_no_pressure:
+; O0:    #APP
+; O0:    # m output, rm input: no pressure
+; O0:    # (%rdi), -{{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_m_output_rm_input_no_pressure:
+; O2:    #APP
+; O2:    # m output, rm input: no pressure
+; O2:    # (%rdi), %eax
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %0 = load i32, ptr %b2, align 4
+  tail call void asm sideeffect "# m output, rm input: no pressure\0A\09# $0, $1", "=*m,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %0)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %1 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# m output, rm input: no pressure\0A\09# $0, $1", "=*m,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %1)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %2 = load i32, ptr %foo, align 4
+  ret i32 %2
+}
+
+define dso_local i32 @test_m_output_rm_input_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_m_output_rm_input_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # m output, rm input: pressure
+; O0:    # (%rdi), {{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_m_output_rm_input_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # m output, rm input: pressure
+; O2:    # (%rbp), %esi
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %1 = load i32, ptr %b16, align 4
+  tail call void asm sideeffect "# m output, rm input: pressure\0A\09# $0, $1", "=*m,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %1)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %2 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# m output, rm input: pressure\0A\09# $0, $1", "=*m,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, i32 %2)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %3 = load i32, ptr %foo, align 4
+  ret i32 %3
+}
+
+define dso_local i32 @test_mult_m_output_rm_input_no_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_mult_m_output_rm_input_no_pressure:
+; O0:    #APP
+; O0:    # multiple m output, rm input: no pressure
+; O0:    # (%rdi), (%rax), (%rcx), (%rdx), (%rsi), -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_mult_m_output_rm_input_no_pressure:
+; O2:    #APP
+; O2:    # multiple m output, rm input: no pressure
+; O2:    # (%rdi), 4(%rdi), 8(%rdi), 12(%rdi), 16(%rdi), %eax, %ecx
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b4 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c5 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d6 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e7 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %0 = load i32, ptr %foo, align 4
+  %1 = load i32, ptr %b4, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: no pressure\0A\09# $0, $1, $2, $3, $4, $5, $6", "=*m,=*m,=*m,=*m,=*m,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b4, ptr nonnull elementtype(i32) %c5, ptr nonnull elementtype(i32) %d6, ptr nonnull elementtype(i32) %e7, i32 %0, i32 %1)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %2 = load i32, ptr %foo, align 4
+  %3 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: no pressure\0A\09# $0, $1, $2, $3, $4, $5, $6", "=*m,=*m,=*m,=*m,=*m,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e, i32 %2, i32 %3)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %4 = load i32, ptr %foo, align 4
+  ret i32 %4
+}
+
+define dso_local i32 @test_mult_m_output_rm_input_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_mult_m_output_rm_input_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # multiple m output, rm input: pressure
+; O0:    # (%rdi), (%rax), (%rcx), (%rdx), (%rsi), {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_mult_m_output_rm_input_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # multiple m output, rm input: pressure
+; O2:    # (%rbp), 4(%rbp), 8(%rbp), 12(%rbp), 16(%rbp), %esi, %edi
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b18 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c19 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d20 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e21 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %1 = load i32, ptr %foo, align 4
+  %2 = load i32, ptr %b18, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: pressure\0A\09# $0, $1, $2, $3, $4, $5, $6", "=*m,=*m,=*m,=*m,=*m,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b18, ptr nonnull elementtype(i32) %c19, ptr nonnull elementtype(i32) %d20, ptr nonnull elementtype(i32) %e21, i32 %1, i32 %2)
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %3 = load i32, ptr %foo, align 4
+  %4 = load i32, ptr %b, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: pressure\0A\09# $0, $1, $2, $3, $4, $5, $6", "=*m,=*m,=*m,=*m,=*m,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, ptr nonnull elementtype(i32) %d, ptr nonnull elementtype(i32) %e, i32 %3, i32 %4)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %5 = load i32, ptr %foo, align 4
+  ret i32 %5
+}
+
+define dso_local i32 @test_mult_m_early_clobber_output_rm_input_no_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_mult_m_early_clobber_output_rm_input_no_pressure:
+; O0:    #APP
+; O0:    # multiple m output, rm input: no pressure
+; O0:    # (%rdi), (%rax), (%rcx), -{{[0-9]+}}(%rsp), -{{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_mult_m_early_clobber_output_rm_input_no_pressure:
+; O2:    #APP
+; O2:    # multiple m output, rm input: no pressure
+; O2:    # %eax, %esi, %r8d, %ecx, %edx
+; O2:    #NO_APP
+entry:
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b2 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c3 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d4 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %0 = load i32, ptr %d4, align 4
+  %e5 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %1 = load i32, ptr %e5, align 4
+  %2 = tail call { i32, i32, i32 } asm sideeffect "# multiple m output, rm input: no pressure\0A\09# $0, $1, $2, $3, $4", "=&rm,=&rm,=&rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1)
+  %asmresult = extractvalue { i32, i32, i32 } %2, 0
+  %asmresult6 = extractvalue { i32, i32, i32 } %2, 1
+  %asmresult7 = extractvalue { i32, i32, i32 } %2, 2
+  store i32 %asmresult, ptr %foo, align 4
+  store i32 %asmresult6, ptr %b2, align 4
+  store i32 %asmresult7, ptr %c3, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %3 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %4 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: no pressure\0A\09# $0, $1, $2, $3, $4", "=*&rm,=*&rm,=*&rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, i32 %3, i32 %4)
+  %.pre = load i32, ptr %foo, align 4
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %5 = phi i32 [ %asmresult, %asm.pref.reg ], [ %.pre, %asm.pref.mem ]
+  ret i32 %5
+}
+
+define dso_local i32 @test_mult_m_early_clobber_output_rm_input_pressure(ptr noundef %foo) local_unnamed_addr {
+; O0-LABEL: test_mult_m_early_clobber_output_rm_input_pressure:
+; O0:    #APP
+; O0:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O0:    #NO_APP
+; O0:    #APP
+; O0:    # multiple m output, rm input: pressure
+; O0:    # (%rdi), (%rax), (%rcx), {{[0-9]+}}(%rsp), {{[0-9]+}}(%rsp)
+; O0:    #NO_APP
+;
+; O2-LABEL: test_mult_m_early_clobber_output_rm_input_pressure:
+; O2:    #APP
+; O2:    # %rax,%rcx,%rdx,%rsi,%rdi,%rbx,%rbp,%r8,%r9,%r10,%r11, %r12, %r13, %r14, %r15
+; O2:    #NO_APP
+; O2:    #APP
+; O2:    # multiple m output, rm input: pressure
+; O2:    # %r8d, %r9d, %eax, %esi, %edi
+; O2:    #NO_APP
+entry:
+  %0 = tail call { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } asm sideeffect "# $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10, $11, $12, $13, $14", "={rax},={rcx},={rdx},={rsi},={rdi},={rbx},={rbp},={r8},={r9},={r10},={r11},={r12},={r13},={r14},={r15},~{dirflag},~{fpsr},~{flags}"()
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %b16 = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c17 = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d18 = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %1 = load i32, ptr %d18, align 4
+  %e19 = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %2 = load i32, ptr %e19, align 4
+  %3 = tail call { i32, i32, i32 } asm sideeffect "# multiple m output, rm input: pressure\0A\09# $0, $1, $2, $3, $4", "=&rm,=&rm,=&rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(i32 %1, i32 %2)
+  %asmresult20 = extractvalue { i32, i32, i32 } %3, 0
+  %asmresult21 = extractvalue { i32, i32, i32 } %3, 1
+  %asmresult22 = extractvalue { i32, i32, i32 } %3, 2
+  store i32 %asmresult20, ptr %foo, align 4
+  store i32 %asmresult21, ptr %b16, align 4
+  store i32 %asmresult22, ptr %c17, align 4
+  br label %asm.merge
+
+asm.pref.mem:                                     ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %foo, i64 4
+  %c = getelementptr inbounds nuw i8, ptr %foo, i64 8
+  %d = getelementptr inbounds nuw i8, ptr %foo, i64 12
+  %4 = load i32, ptr %d, align 4
+  %e = getelementptr inbounds nuw i8, ptr %foo, i64 16
+  %5 = load i32, ptr %e, align 4
+  tail call void asm sideeffect "# multiple m output, rm input: pressure\0A\09# $0, $1, $2, $3, $4", "=*&rm,=*&rm,=*&rm,rm,rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) %foo, ptr nonnull elementtype(i32) %b, ptr nonnull elementtype(i32) %c, i32 %4, i32 %5)
+  br label %asm.merge
+
+asm.merge:                                        ; preds = %asm.pref.reg, %asm.pref.mem
+  %asmresult14 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 14
+  %asmresult13 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 13
+  %asmresult12 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 12
+  %asmresult11 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 11
+  %asmresult10 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 10
+  %asmresult9 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 9
+  %asmresult8 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 8
+  %asmresult7 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 7
+  %asmresult6 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 6
+  %asmresult5 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 5
+  %asmresult4 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 4
+  %asmresult3 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 3
+  %asmresult2 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 2
+  %asmresult1 = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 1
+  %asmresult = extractvalue { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64 } %0, 0
+  tail call void @g(i64 noundef %asmresult, i64 noundef %asmresult1, i64 noundef %asmresult2, i64 noundef %asmresult3, i64 noundef %asmresult4, i64 noundef %asmresult5, i64 noundef %asmresult6, i64 noundef %asmresult7, i64 noundef %asmresult8, i64 noundef %asmresult9, i64 noundef %asmresult10, i64 noundef %asmresult11, i64 noundef %asmresult12, i64 noundef %asmresult13, i64 noundef %asmresult14)
+  %6 = load i32, ptr %foo, align 4
+  ret i32 %6
+}
+
+declare void @llvm.asm.constraint.br()
+
+declare void @g(i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef, i64 noundef)
diff --git a/llvm/test/CodeGen/X86/asm-modifier.ll b/llvm/test/CodeGen/X86/asm-modifier.ll
index e1aac95a1ff6a..0f744b3dae0c7 100644
--- a/llvm/test/CodeGen/X86/asm-modifier.ll
+++ b/llvm/test/CodeGen/X86/asm-modifier.ll
@@ -66,14 +66,27 @@ define dso_local void @test_c() nounwind {
 
 define dso_local void @test_k() nounwind {
 ; CHECK-LABEL: test_k:
-; CHECK:       # %bb.0:
+; CHECK:       # %bb.0: # %asm.pref.reg
 ; CHECK-NEXT:    #APP
 ; CHECK-NEXT:    movl %fs:0, %eax
 ; CHECK-NEXT:    #NO_APP
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
   %tmp = tail call i64 asm "movl %fs:${1:a}, ${0:k}", "=q,irm,~{dirflag},~{fpsr},~{flags}"(i64 0)
+  br label %asm.merge
+
+asm.pref.mem:
+  %tmp1 = tail call i64 asm "movl %fs:${1:a}, ${0:k}", "=q,irm,~{dirflag},~{fpsr},~{flags}"(i64 0)
+  br label %asm.merge
+
+asm.merge:
   unreachable
 }
 
+declare void @llvm.asm.constraint.br()
+
 define dso_local void @test_n() nounwind {
 ; CHECK-LABEL: test_n:
 ; CHECK:       # %bb.0:
@@ -86,14 +99,21 @@ define dso_local void @test_n() nounwind {
 }
 
 define void @test_q() {
-; CHECK-LABEL: test_q:
-; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    #APP
-; CHECK-NEXT:    #TEST 0
-; CHECK-NEXT:    #NO_APP
-; CHECK-NEXT:    ret{{[l|q]}}
+; X86-LABEL: test_q:
+; X86:       # %bb.0: # %entry
+; X86-NEXT:    #APP
+; X86-NEXT:    #TEST %eax
+; X86-NEXT:    #NO_APP
+; X86-NEXT:    retl
+;
+; X64-LABEL: test_q:
+; X64:       # %bb.0: # %entry
+; X64-NEXT:    #APP
+; X64-NEXT:    #TEST %rax
+; X64-NEXT:    #NO_APP
+; X64-NEXT:    retq
 entry:
-  call void asm sideeffect "#TEST ${0:q}", "=*imr"( ptr elementtype( i64) null )
+  %0 = call i64 asm sideeffect "#TEST ${0:q}", "=imr"()
   ret void
 }
 
diff --git a/llvm/test/CodeGen/X86/inline-asm-callbase.ll b/llvm/test/CodeGen/X86/inline-asm-callbase.ll
new file mode 100644
index 0000000000000..94a9563d4de3b
--- /dev/null
+++ b/llvm/test/CodeGen/X86/inline-asm-callbase.ll
@@ -0,0 +1,75 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
+
+declare i32 @__gxx_personality_v0(...)
+
+define i32 @test_invoke_rm(i32 %x) personality ptr @__gxx_personality_v0 {
+; CHECK-LABEL: test_invoke_rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:  .Ltmp0: # EH_LABEL
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    # %eax, %edi
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .Ltmp1: # EH_LABEL
+; CHECK-NEXT:  # %bb.1: # %normal
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB0_2: # %unwind
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:  .Ltmp2: # EH_LABEL
+; CHECK-NEXT:    movq %rax, %rdi
+; CHECK-NEXT:    callq _Unwind_Resume@PLT
+entry:
+  %0 = invoke i32 asm "# $0, $1", "=r,rm"(i32 %x)
+  to label %normal unwind label %unwind
+
+normal:
+  ret i32 %0
+
+unwind:
+  %1 = landingpad { ptr, i32 }
+  cleanup
+  resume { ptr, i32 } %1
+}
+
+define i32 @test_callbr_rm(i32 %x) {
+; CHECK-LABEL: test_callbr_rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    # %eax, %edi
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .LBB1_1: # Inline asm indirect target
+; CHECK-NEXT:    # %indirect
+; CHECK-NEXT:    # Label of block must be emitted
+; CHECK-NEXT:    retq
+entry:
+  %0 = callbr i32 asm "# $0, $1", "=r,rm,!i"(i32 %x)
+  to label %normal [label %indirect]
+
+normal:
+  ret i32 %0
+
+indirect:
+  ret i32 %0
+}
+
+define i32 @test_callbr_convert(i32 %x) {
+; CHECK-LABEL: test_callbr_convert:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    # %eax, %edi
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .LBB2_1: # Inline asm indirect target
+; CHECK-NEXT:    # %indirect
+; CHECK-NEXT:    # Label of block must be emitted
+; CHECK-NEXT:    retq
+entry:
+  %0 = callbr i32 asm "# $0, $1", "=rm,rm,!i"(i32 %x)
+  to label %normal [label %indirect]
+
+normal:
+  ret i32 %0
+
+indirect:
+  ret i32 %0
+}
diff --git a/llvm/test/CodeGen/X86/inline-asm-rm.ll b/llvm/test/CodeGen/X86/inline-asm-rm.ll
new file mode 100644
index 0000000000000..d5646dac37c77
--- /dev/null
+++ b/llvm/test/CodeGen/X86/inline-asm-rm.ll
@@ -0,0 +1,246 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -O0 < %s | FileCheck --check-prefixes=O0 %s
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu     < %s | FileCheck --check-prefixes=O2 %s
+
+define dso_local void @test_rm_input(i64 noundef %flags) local_unnamed_addr {
+; O0-LABEL: test_rm_input:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    movq %rdi, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
+; O0-NEXT:    movq %rax, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #APP
+; O0-NEXT:    pushq -{{[0-9]+}}(%rsp)
+; O0-NEXT:    popfq
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_rm_input:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    pushq %rdi
+; O2-NEXT:    popfq
+; O2-NEXT:    #NO_APP
+; O2-NEXT:    retq
+entry:
+  %flags.addr = alloca i64, align 8
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
+  tail call void asm sideeffect "push $0 ; popf", "rm,~{dirflag},~{fpsr},~{flags}"(i64 %flags)
+  br label %asm.merge
+
+asm.pref.mem:
+  store i64 %flags, ptr %flags.addr, align 8
+  %0 = load i64, ptr %flags.addr, align 8
+  call void asm sideeffect "push $0 ; popf", "rm,~{dirflag},~{fpsr},~{flags}"(i64 %0)
+  br label %asm.merge
+
+asm.merge:
+  ret void
+}
+
+define dso_local i64 @test_rm_output() local_unnamed_addr {
+; O0-LABEL: test_rm_output:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    #APP
+; O0-NEXT:    pushfq
+; O0-NEXT:    popq -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_rm_output:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    pushfq
+; O2-NEXT:    popq %rax
+; O2-NEXT:    #NO_APP
+; O2-NEXT:    retq
+entry:
+  %out = alloca i64, align 8
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
+  %0 = tail call i64 asm sideeffect "pushf ; pop $0", "=rm,~{dirflag},~{fpsr},~{flags}"()
+  br label %asm.merge
+
+asm.pref.mem:
+  call void asm sideeffect "pushf ; pop $0", "=*rm,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %out)
+  %1 = load i64, ptr %out, align 8
+  br label %asm.merge
+
+asm.merge:
+  %2 = phi i64 [ %0, %asm.pref.reg ], [ %1, %asm.pref.mem ]
+  ret i64 %2
+}
+
+define dso_local void @test_g_input(i64 noundef %flags) local_unnamed_addr {
+; O0-LABEL: test_g_input:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    movq %rdi, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
+; O0-NEXT:    movq %rax, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #APP
+; O0-NEXT:    pushq -{{[0-9]+}}(%rsp)
+; O0-NEXT:    popfq
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_g_input:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    pushq %rdi
+; O2-NEXT:    popfq
+; O2-NEXT:    #NO_APP
+; O2-NEXT:    retq
+entry:
+  %flags.addr = alloca i64, align 8
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
+  tail call void asm sideeffect "push $0 ; popf", "imr,~{dirflag},~{fpsr},~{flags}"(i64 %flags)
+  br label %asm.merge
+
+asm.pref.mem:
+  store i64 %flags, ptr %flags.addr, align 8
+  %0 = load i64, ptr %flags.addr, align 8
+  call void asm sideeffect "push $0 ; popf", "imr,~{dirflag},~{fpsr},~{flags}"(i64 %0)
+  br label %asm.merge
+
+asm.merge:
+  ret void
+}
+
+define dso_local i64 @test_g_output() local_unnamed_addr {
+; O0-LABEL: test_g_output:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    #APP
+; O0-NEXT:    pushfq
+; O0-NEXT:    popq -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_g_output:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    pushfq
+; O2-NEXT:    popq %rax
+; O2-NEXT:    #NO_APP
+; O2-NEXT:    retq
+entry:
+  %out = alloca i64, align 8
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
+  %0 = tail call i64 asm sideeffect "pushf ; pop $0", "=imr,~{dirflag},~{fpsr},~{flags}"()
+  br label %asm.merge
+
+asm.pref.mem:
+  call void asm sideeffect "pushf ; pop $0", "=*imr,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i64) %out)
+  %1 = load i64, ptr %out, align 8
+  br label %asm.merge
+
+asm.merge:
+  %2 = phi i64 [ %0, %asm.pref.reg ], [ %1, %asm.pref.mem ]
+  ret i64 %2
+}
+
+define i32 @test_bundle(i32 %x) {
+; O0-LABEL: test_bundle:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    movl %edi, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #APP
+; O0-NEXT:    # %eax, -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_bundle:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    # %eax, %edi
+; O2-NEXT:    #NO_APP
+; O2-NEXT:    retq
+entry:
+  %out = alloca i64, align 8
+  callbr void @llvm.asm.constraint.br()
+      to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:
+  %0 = call i32 asm sideeffect "# $0, $1", "=r,rm"(i32 %x) [ "bundle"(i32 42) ]
+  br label %asm.merge
+
+asm.pref.mem:
+  %1 = call i32 asm sideeffect "# $0, $1", "=r,rm"(i32 %x) [ "bundle"(i32 42) ]
+  br label %asm.merge
+
+asm.merge:
+  %2 = phi i32 [ %0, %asm.pref.reg ], [ %1, %asm.pref.mem ]
+  ret i32 %2
+}
+
+define dso_local i32 @test_asm_goto() local_unnamed_addr {
+; O0-LABEL: test_asm_goto:
+; O0:       # %bb.0: # %entry
+; O0-NEXT:    movl $42, %eax
+; O0-NEXT:    movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; O0-NEXT:    #APP
+; O0-NEXT:    # -{{[0-9]+}}(%rsp)
+; O0-NEXT:    #NO_APP
+; O0-NEXT:    jmp .LBB5_1
+; O0-NEXT:  .LBB5_1: # %asm.pref.mem.asm.merge_crit_edge
+; O0-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
+; O0-NEXT:    movl %eax, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; O0-NEXT:    jmp .LBB5_3
+; O0-NEXT:  # %bb.2: # %indirect.split
+; O0-NEXT:  .LBB5_3: # Inline asm indirect target
+; O0-NEXT:    # %cleanup
+; O0-NEXT:    # Label of block must be emitted
+; O0-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %eax # 4-byte Reload
+; O0-NEXT:    retq
+;
+; O2-LABEL: test_asm_goto:
+; O2:       # %bb.0: # %entry
+; O2-NEXT:    #APP
+; O2-NEXT:    # %eax
+; O2-NEXT:    #NO_APP
+; O2-NEXT:  # %bb.2: # %cleanup
+; O2-NEXT:    retq
+; O2-NEXT:  .LBB5_1: # Inline asm indirect target
+; O2-NEXT:    # %indirect.split
+; O2-NEXT:    # Label of block must be emitted
+; O2-NEXT:    movl $42, %eax
+; O2-NEXT:    retq
+entry:
+  %out = alloca i32, align 4
+  call void @llvm.lifetime.start.p0(ptr nonnull %out)
+  callbr void @llvm.asm.constraint.br()
+          to label %asm.pref.reg [label %asm.pref.mem]
+
+asm.pref.reg:                                     ; preds = %entry
+  %0 = callbr i32 asm "# $0", "=rm,!i,~{dirflag},~{fpsr},~{flags}"()
+          to label %cleanup [label %indirect.split]
+
+asm.pref.mem:                                     ; preds = %entry
+  callbr void asm "# $0", "=*rm,!i,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
+          to label %asm.pref.mem.asm.merge_crit_edge [label %cleanup]
+
+asm.pref.mem.asm.merge_crit_edge:                 ; preds = %asm.pref.mem
+  %.pre = load i32, ptr %out, align 4
+  br label %cleanup
+
+indirect.split:                                   ; preds = %asm.pref.reg
+  br label %cleanup
+
+cleanup:                                          ; preds = %asm.pref.reg, %asm.pref.mem.asm.merge_crit_edge, %asm.pref.mem, %indirect.split
+  %retval.0 = phi i32 [ 42, %asm.pref.mem ], [ 42, %indirect.split ], [ %.pre, %asm.pref.mem.asm.merge_crit_edge ], [ %0, %asm.pref.reg ]
+  call void @llvm.lifetime.end.p0(ptr nonnull %out)
+  ret i32 %retval.0
+}
+
+
+declare void @llvm.asm.constraint.br()
diff --git a/llvm/test/CodeGen/X86/inlineasm-sched-bug.ll b/llvm/test/CodeGen/X86/inlineasm-sched-bug.ll
index be4d1c29332f7..a322bd3003a58 100644
--- a/llvm/test/CodeGen/X86/inlineasm-sched-bug.ll
+++ b/llvm/test/CodeGen/X86/inlineasm-sched-bug.ll
@@ -6,16 +6,13 @@
 define i32 @foo(i32 %treemap) nounwind {
 ; CHECK-LABEL: foo:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushl %eax
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT:    movl %eax, %ecx
 ; CHECK-NEXT:    negl %ecx
 ; CHECK-NEXT:    andl %eax, %ecx
-; CHECK-NEXT:    movl %ecx, (%esp)
 ; CHECK-NEXT:    #APP
-; CHECK-NEXT:    bsfl (%esp), %eax
+; CHECK-NEXT:    bsfl %ecx, %eax
 ; CHECK-NEXT:    #NO_APP
-; CHECK-NEXT:    popl %ecx
 ; CHECK-NEXT:    retl
 entry:
   %sub = sub i32 0, %treemap
diff --git a/llvm/test/CodeGen/X86/opt-pipeline.ll b/llvm/test/CodeGen/X86/opt-pipeline.ll
index 24390f2d852d3..6e7707a248873 100644
--- a/llvm/test/CodeGen/X86/opt-pipeline.ll
+++ b/llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -78,6 +78,7 @@
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
+; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:       Function Alias Analysis Results
 ; CHECK-NEXT:       Natural Loop Information

>From 7bee045d8d8bbc6d2c36897efebb56308f4e67c2 Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Sun, 24 May 2026 18:03:42 -0700
Subject: [PATCH 2/8] [inlineasm] Fix many bugs, ams fix all de constraintses

Fix a number of bugs in this code. Several functions did not return
'true' when they changed the IR; they now do.

The Finalized write-backs were not being written back to
TargetConstraints, so in the retry loop every operand advanced its
constraint, even the ones that were already fine. Fix that as well.

The DominatorTree was sometimes a nullptr and was dereferenced
anyway. Make it a reference so this can never happen.

Input-only "rm" constraints generated two identical blocks of IR,
which is wasteful and slow. Only output constraints need
asm_constraint_br, so stop emitting the duplicate code for
input-only cases.
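
As a rough illustration of the narrowed check (a sketch only, not the
actual Clang code): the names ConstraintInfo and needsConstraintBr
below are hypothetical stand-ins, and only the fields the predicate
reads are modeled. The assumption is that the dual-path form is wanted
only when no operand uses alternative constraints and at least one
output allows both a register and memory.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's TargetInfo::ConstraintInfo; only the
// pieces the predicate looks at are modeled here.
struct ConstraintInfo {
  std::string Str;     // constraint string, e.g. "rm" or "r,m"
  bool AllowsRegister; // constraint can be satisfied by a register
  bool AllowsMemory;   // constraint can be satisfied by memory
};

// Sketch of the narrowed check: no operand may use alternative
// constraints (a ','), and at least one *output* must allow both a
// register and memory. Inputs alone never trigger the dual-path form.
bool needsConstraintBr(const std::vector<ConstraintInfo> &Outputs,
                       const std::vector<ConstraintInfo> &Inputs) {
  auto NoAlternatives = [](const ConstraintInfo &Info) {
    return Info.Str.find(',') == std::string::npos;
  };
  auto RegAndMem = [](const ConstraintInfo &Info) {
    return Info.AllowsRegister && Info.AllowsMemory;
  };
  return std::all_of(Outputs.begin(), Outputs.end(), NoAlternatives) &&
         std::all_of(Inputs.begin(), Inputs.end(), NoAlternatives) &&
         std::any_of(Outputs.begin(), Outputs.end(), RegAndMem);
}
```

With this shape, an asm statement whose only "rm" operand is an input
yields false, which matches the updated test_reg_mem_inputs
expectations where no callbr is emitted.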

hasRegMemConstraints() only checked for the literal "r" and "m"
codes. Use getConstraintType() instead so that target-specific
codes, such as "x" for the XMM registers and the "o" and "V" memory
variants, are also recognized.

Also fix some typos and remove dead code.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 clang/lib/CodeGen/CGStmt.cpp                  |  12 +-
 clang/test/CodeGen/asm-reg-mem-constraints.c  | 207 +++++++++++-------
 llvm/docs/LangRef.rst                         |   2 +-
 llvm/include/llvm/IR/InlineAsm.h              |   7 -
 llvm/lib/CodeGen/InlineAsmPrepare.cpp         |  54 ++---
 .../SelectionDAG/SelectionDAGBuilder.cpp      |  19 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  13 +-
 llvm/lib/IR/Verifier.cpp                      |   2 +-
 8 files changed, 185 insertions(+), 131 deletions(-)

diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 0c011cc4508db..202858fc4cb6f 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -3319,10 +3319,11 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt &S) {
                                     InputConstraintInfos))
     return EmitHipStdParUnsupportedAsm(this, S);
 
-  // If any constraints allow for register and memory options, we
-  // need to delay choosing which constraint option to prefer (register or
-  // memory) until ISel, where the 'llvm.asm.constraint.br' intrinsic is
-  // resolved.
+  // If any *output* constraints allow for both register and memory options, we
+  // need to delay choosing which to prefer until ISel, where the
+  // 'llvm.asm.constraint.br' intrinsic is resolved. Input-only "rm"
+  // constraints don't need this: PreferRegs only affects output emission, so
+  // both paths would be identical for pure inputs.
   bool HasRegMemConstraints =
       llvm::all_of(llvm::concat<TargetInfo::ConstraintInfo>(
                        OutputConstraintInfos, InputConstraintInfos),
@@ -3330,8 +3331,7 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt &S) {
                      // FIXME: Should we allow for alternative constraints?
                      return !StringRef(Info.getConstraintStr()).contains(",");
                    }) &&
-      llvm::any_of(llvm::concat<TargetInfo::ConstraintInfo>(
-                       OutputConstraintInfos, InputConstraintInfos),
+      llvm::any_of(OutputConstraintInfos,
                    [](const TargetInfo::ConstraintInfo &Info) {
                      return Info.allowsRegister() && Info.allowsMemory();
                    });
diff --git a/clang/test/CodeGen/asm-reg-mem-constraints.c b/clang/test/CodeGen/asm-reg-mem-constraints.c
index a89ef0456f0a6..77a9bc4283399 100644
--- a/clang/test/CodeGen/asm-reg-mem-constraints.c
+++ b/clang/test/CodeGen/asm-reg-mem-constraints.c
@@ -1,104 +1,140 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -O2 %s -o - | FileCheck %s
 
+// CHECK-LABEL: define dso_local void @test_reg_mem_inputs(
+// CHECK-SAME: i32 noundef [[FLAGS:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    tail call void asm sideeffect "", "rm,~{dirflag},~{fpsr},~{flags}"(i32 [[FLAGS]]) #[[ATTR4:[0-9]+]], !srcloc [[META6:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_reg_mem_inputs(unsigned long flags) {
-  // CHECK-LABEL: @test_reg_mem_inputs
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:     tail call void asm sideeffect "", "rm,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
-  // CHECK-NEXT:     br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:     tail call void asm sideeffect "", "rm,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
-  // CHECK-NEXT:     br label %asm.merge
   asm ("" : : "rm" (flags));
 }
 
+// CHECK-LABEL: define dso_local i32 @test_reg_mem_outputs(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[OUT:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    call void @llvm.lifetime.start.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    callbr void @llvm.asm.constraint.br()
+// CHECK-NEXT:            to label %[[ASM_PREF_REG:.*]] [label %[[ASM_PREF_MEM:.*]]]
+// CHECK:       [[ASM_PREF_REG]]:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i32 asm "", "=rm,~{dirflag},~{fpsr},~{flags}"() #[[ATTR5:[0-9]+]], !srcloc [[META7:![0-9]+]]
+// CHECK-NEXT:    br label %[[ASM_MERGE:.*]]
+// CHECK:       [[ASM_PREF_MEM]]:
+// CHECK-NEXT:    call void asm "", "=*rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) [[OUT]]) #[[ATTR4]], !srcloc [[META7]]
+// CHECK-NEXT:    [[DOTPRE:%.*]] = load i32, ptr [[OUT]], align 4, !tbaa [[LONG_TBAA8:![0-9]+]]
+// CHECK-NEXT:    br label %[[ASM_MERGE]]
+// CHECK:       [[ASM_MERGE]]:
+// CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ [[TMP0]], %[[ASM_PREF_REG]] ], [ [[DOTPRE]], %[[ASM_PREF_MEM]] ]
+// CHECK-NEXT:    call void @llvm.lifetime.end.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    ret i32 [[TMP1]]
+//
 unsigned long test_reg_mem_outputs(void) {
-  // CHECK-LABEL: @test_reg_mem_outputs
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    = tail call i32 asm "", "=rm,~{dirflag},~{fpsr},~{flags}"()
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    call void asm "", "=*rm,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
-  // CHECK:         = load i32, ptr %out
-  // CHECK-NEXT:    br label %asm.merge
   unsigned long out;
   asm ("" : "=rm" (out));
   return out;
 }
 
+// CHECK-LABEL: define dso_local void @test_g_inputs(
+// CHECK-SAME: i32 noundef [[FLAGS:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    tail call void asm sideeffect "", "imr,~{dirflag},~{fpsr},~{flags}"(i32 [[FLAGS]]) #[[ATTR4]], !srcloc [[META10:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_g_inputs(unsigned long flags) {
-  // CHECK-LABEL: @test_g_inputs
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    tail call void asm sideeffect "", "imr,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    tail call void asm sideeffect "", "imr,~{dirflag},~{fpsr},~{flags}"(i32 %flags)
-  // CHECK-NEXT:    br label %asm.merge
   asm ("" : : "g" (flags));
 }
 
+// CHECK-LABEL: define dso_local i32 @test_g_outputs(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[OUT:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    call void @llvm.lifetime.start.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    callbr void @llvm.asm.constraint.br()
+// CHECK-NEXT:            to label %[[ASM_PREF_REG:.*]] [label %[[ASM_PREF_MEM:.*]]]
+// CHECK:       [[ASM_PREF_REG]]:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i32 asm "", "=imr,~{dirflag},~{fpsr},~{flags}"() #[[ATTR5]], !srcloc [[META11:![0-9]+]]
+// CHECK-NEXT:    br label %[[ASM_MERGE:.*]]
+// CHECK:       [[ASM_PREF_MEM]]:
+// CHECK-NEXT:    call void asm "", "=*imr,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) [[OUT]]) #[[ATTR4]], !srcloc [[META11]]
+// CHECK-NEXT:    [[DOTPRE:%.*]] = load i32, ptr [[OUT]], align 4, !tbaa [[LONG_TBAA8]]
+// CHECK-NEXT:    br label %[[ASM_MERGE]]
+// CHECK:       [[ASM_MERGE]]:
+// CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ [[TMP0]], %[[ASM_PREF_REG]] ], [ [[DOTPRE]], %[[ASM_PREF_MEM]] ]
+// CHECK-NEXT:    call void @llvm.lifetime.end.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    ret i32 [[TMP1]]
+//
 unsigned long test_g_outputs(void) {
-  // CHECK-LABEL: @test_g_outputs
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    %0 = tail call i32 asm "", "=imr,~{dirflag},~{fpsr},~{flags}"()
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    call void asm "", "=*imr,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
-  // CHECK-NEXT:    = load i32, ptr %out
-  // CHECK-NEXT:    br label %asm.merge
   unsigned long out;
   asm ("" : "=g" (out));
   return out;
 }
 
+// CHECK-LABEL: define dso_local void @test_reg_mem_earlyclobber(
+// CHECK-SAME: i32 noundef [[LEN:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[LEN_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[LEN]], ptr [[LEN_ADDR]], align 4, !tbaa [[INT_TBAA2:![0-9]+]]
+// CHECK-NEXT:    callbr void @llvm.asm.constraint.br()
+// CHECK-NEXT:            to label %[[ASM_PREF_REG:.*]] [label %[[ASM_PREF_MEM:.*]]]
+// CHECK:       [[ASM_PREF_REG]]:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i32 asm sideeffect "", "=&rm,0,~{dirflag},~{fpsr},~{flags}"(i32 [[LEN]]) #[[ATTR4]], !srcloc [[META12:![0-9]+]]
+// CHECK-NEXT:    br label %[[ASM_MERGE:.*]]
+// CHECK:       [[ASM_PREF_MEM]]:
+// CHECK-NEXT:    call void asm sideeffect "", "=*&rm,0,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) [[LEN_ADDR]], i32 [[LEN]]) #[[ATTR4]], !srcloc [[META12]]
+// CHECK-NEXT:    br label %[[ASM_MERGE]]
+// CHECK:       [[ASM_MERGE]]:
+// CHECK-NEXT:    ret void
+//
 void test_reg_mem_earlyclobber(int len) {
-  // CHECK-LABEL: @test_reg_mem_earlyclobber
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    = tail call i32 asm sideeffect "", "=&rm,0,~{dirflag},~{fpsr},~{flags}"(i32 %len)
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    call void asm sideeffect "", "=*&rm,0,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %len.addr, i32 %len)
-  // CHECK-NEXT:    br label %asm.merge
   __asm__ volatile ("" : "+&&rm" (len));
 }
 
+// CHECK-LABEL: define dso_local void @test_reg_mem_commutative(
+// CHECK-SAME: i32 noundef [[LEN:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[LEN_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[LEN]], ptr [[LEN_ADDR]], align 4, !tbaa [[INT_TBAA2]]
+// CHECK-NEXT:    callbr void @llvm.asm.constraint.br()
+// CHECK-NEXT:            to label %[[ASM_PREF_REG:.*]] [label %[[ASM_PREF_MEM:.*]]]
+// CHECK:       [[ASM_PREF_REG]]:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { i32, i32 } asm sideeffect "", "=[[RM:%.*]],=rm,0,1,~{dirflag},~{fpsr},~{flags}"(i32 [[LEN]], i32 [[LEN]]) #[[ATTR4]], !srcloc [[META13:![0-9]+]]
+// CHECK-NEXT:    br label %[[ASM_MERGE:.*]]
+// CHECK:       [[ASM_PREF_MEM]]:
+// CHECK-NEXT:    call void asm sideeffect "", "=*[[RM]],=*rm,0,1,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) [[LEN_ADDR]], ptr nonnull elementtype(i32) [[LEN_ADDR]], i32 [[LEN]], i32 [[LEN]]) #[[ATTR4]], !srcloc [[META13]]
+// CHECK-NEXT:    br label %[[ASM_MERGE]]
+// CHECK:       [[ASM_MERGE]]:
+// CHECK-NEXT:    ret void
+//
 void test_reg_mem_commutative(int len) {
-  // CHECK-LABEL: @test_reg_mem_commutative
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    = tail call { i32, i32 } asm sideeffect "", "=%rm,=rm,0,1,~{dirflag},~{fpsr},~{flags}"(i32 %len, i32 %len)
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    call void asm sideeffect "", "=*%rm,=*rm,0,1,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %len.addr, ptr nonnull elementtype(i32) %len.addr, i32 %len, i32 %len)
-  // CHECK-NEXT:    br label %asm.merge
   __asm__ volatile ("" : "+%%rm" (len), "+rm" (len));
 }
 
+// CHECK-LABEL: define dso_local i32 @test_asm_goto(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[OUT:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    call void @llvm.lifetime.start.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    callbr void @llvm.asm.constraint.br()
+// CHECK-NEXT:            to label %[[ASM_PREF_REG:.*]] [label %[[ASM_PREF_MEM:.*]]]
+// CHECK:       [[ASM_PREF_REG]]:
+// CHECK-NEXT:    [[TMP0:%.*]] = callbr i32 asm "", "=rm,!i,~{dirflag},~{fpsr},~{flags}"() #[[ATTR5]]
+// CHECK-NEXT:            to label %[[CLEANUP:.*]] [label %[[INDIRECT_SPLIT:.*]]], !srcloc [[META14:![0-9]+]]
+// CHECK:       [[ASM_PREF_MEM]]:
+// CHECK-NEXT:    callbr void asm "", "=*rm,!i,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:            to label %[[ASM_PREF_MEM_ASM_MERGE_CRIT_EDGE:.*]] [label %[[CLEANUP]]], !srcloc [[META14]]
+// CHECK:       [[ASM_PREF_MEM_ASM_MERGE_CRIT_EDGE]]:
+// CHECK-NEXT:    [[DOTPRE:%.*]] = load i32, ptr [[OUT]], align 4, !tbaa [[LONG_TBAA8]]
+// CHECK-NEXT:    br label %[[CLEANUP]]
+// CHECK:       [[INDIRECT_SPLIT]]:
+// CHECK-NEXT:    br label %[[CLEANUP]]
+// CHECK:       [[CLEANUP]]:
+// CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i32 [ 42, %[[ASM_PREF_MEM]] ], [ 42, %[[INDIRECT_SPLIT]] ], [ [[DOTPRE]], %[[ASM_PREF_MEM_ASM_MERGE_CRIT_EDGE]] ], [ [[TMP0]], %[[ASM_PREF_REG]] ]
+// CHECK-NEXT:    call void @llvm.lifetime.end.p0(ptr nonnull [[OUT]]) #[[ATTR4]]
+// CHECK-NEXT:    ret i32 [[RETVAL_0]]
+//
 unsigned long test_asm_goto(void) {
-  // CHECK-LABEL: @test_asm_goto
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:         to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    = callbr i32 asm "", "=rm,!i,~{dirflag},~{fpsr},~{flags}"()
-  // CHECK-NEXT:         to label %cleanup [label %indirect.split]
-  // CHECK:       asm.pref.mem:
-  // CHECK-NEXT:    callbr void asm "", "=*rm,!i,~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32) %out)
-  // CHECK-NEXT:         to label %asm.pref.mem.asm.merge_crit_edge [label %cleanup]
-  // CHECK:       asm.pref.mem.asm.merge_crit_edge:
-  // CHECK-NEXT:    = load i32, ptr %out, align 4, !tbaa !8
-  // CHECK-NEXT:    br label %cleanup
-  // CHECK:       indirect.split:
-  // CHECK-NEXT:    br label %cleanup
   unsigned long out;
   asm goto ("" : "=rm" (out) ::: indirect);
   return out;
@@ -108,17 +144,30 @@ unsigned long test_asm_goto(void) {
 }
 
 // PR3908
+// CHECK-LABEL: define dso_local void @test_pr3908(
+// CHECK-SAME: i32 noundef [[R:%.*]]) local_unnamed_addr #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call i32 asm "# PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"(i32 0, i32 0, double 0.000000e+00, i32 [[R]]) #[[ATTR6:[0-9]+]], !srcloc [[META15:![0-9]+]]
+// CHECK-NEXT:    ret void
+//
 void test_pr3908(int r) {
-  // CHECK-LABEL: @test_pr3908
-  // CHECK:         callbr void @llvm.asm.constraint.br()
-  // CHECK-NEXT:            to label %asm.pref.reg [label %asm.pref.mem]
-  // CHECK:       asm.pref.reg:
-  // CHECK-NEXT:    = tail call i32 asm "# PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"(i32 0, i32 0, double 0.000000e+00, i32 %r)
-  // CHECK-NEXT:    br label %asm.merge
-  // CHECK:       asm.pref.mem:                                     ; preds = %entry
-  // CHECK-NEXT:    = tail call i32 asm "# PR3908 $1 $3 $2 $0", "=r,mx,mr,x,0,~{dirflag},~{fpsr},~{flags}"(i32 0, i32 0, double 0.000000e+00, i32 %r)
-  // CHECK-NEXT:    br label %asm.merge
   __asm__ ("# PR3908 %[lf] %[xx] %[li] %[r]"
            : [r] "+r" (r)
            : [lf] "mx" (0), [li] "mr" (0), [xx] "x" ((double)(0)));
 }
+//.
+// CHECK: [[INT_TBAA2]] = !{[[META3:![0-9]+]], [[META3]], i64 0}
+// CHECK: [[META3]] = !{!"int", [[META4:![0-9]+]], i64 0}
+// CHECK: [[META4]] = !{!"omnipotent char", [[META5:![0-9]+]], i64 0}
+// CHECK: [[META5]] = !{!"Simple C/C++ TBAA"}
+// CHECK: [[META6]] = !{i64 590}
+// CHECK: [[META7]] = !{i64 1887}
+// CHECK: [[LONG_TBAA8]] = !{[[META9:![0-9]+]], [[META9]], i64 0}
+// CHECK: [[META9]] = !{!"long", [[META4]], i64 0}
+// CHECK: [[META10]] = !{i64 2300}
+// CHECK: [[META11]] = !{i64 3573}
+// CHECK: [[META12]] = !{i64 4671}
+// CHECK: [[META13]] = !{i64 5849}
+// CHECK: [[META14]] = !{i64 7485}
+// CHECK: [[META15]] = !{i64 8004}
+//.
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b3f4dbca95920..5f7e9e5e6e615 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -16317,7 +16317,7 @@ Overview:
 """""""""
 
 The '``llvm.asm.constraint.br``' intrinsic is used when an inline asm
-constriant allows for either register or memory, e.g., '``"rm"``'.
+constraint allows for either register or memory, e.g., '``"rm"``'.
 
 Semantics:
 """"""""""
diff --git a/llvm/include/llvm/IR/InlineAsm.h b/llvm/include/llvm/IR/InlineAsm.h
index bab538e852467..564f2e7df2dd3 100644
--- a/llvm/include/llvm/IR/InlineAsm.h
+++ b/llvm/include/llvm/IR/InlineAsm.h
@@ -181,13 +181,6 @@ class InlineAsm final : public Value {
     bool hasArg() const {
       return Type == isInput || (Type == isOutput && isIndirect);
     }
-
-    /// hasRegMemConstraints - Returns true if the constraint codes have
-    /// register and memory constraints. This is useful to let the register
-    /// allocator that it can use memory under register pressure.
-    bool hasRegMemConstraints() const {
-      return is_contained(Codes, "r") && is_contained(Codes, "m");
-    }
   };
 
   /// ParseConstraints - Split up the constraint string into the specific
diff --git a/llvm/lib/CodeGen/InlineAsmPrepare.cpp b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
index 0f745c5b2e1be..a289338c81db1 100644
--- a/llvm/lib/CodeGen/InlineAsmPrepare.cpp
+++ b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
@@ -155,10 +155,10 @@ static void updateSSA(DominatorTree &DT, CallBrInst *CBR, CallInst *Intrinsic,
   }
 }
 
-static bool splitCriticalEdges(CallBrInst *CBR, DominatorTree *DT) {
+static bool splitCriticalEdges(CallBrInst *CBR, DominatorTree &DT) {
   bool Changed = false;
 
-  CriticalEdgeSplittingOptions Options(DT);
+  CriticalEdgeSplittingOptions Options(&DT);
   Options.setMergeIdenticalEdges();
 
   // The indirect destination might be duplicated between another parameter...
@@ -187,7 +187,7 @@ static bool splitCriticalEdges(CallBrInst *CBR, DominatorTree *DT) {
 /// have a location to place the intrinsic. Then remap users of the original
 /// callbr output SSA value to instead point to the appropriate
 /// llvm.callbr.landingpad value.
-static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree *DT) {
+static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree &DT) {
   bool Changed = false;
   SmallPtrSet<const BasicBlock *, 4> Visited;
   IRBuilder<> Builder(CBR->getContext());
@@ -208,14 +208,14 @@ static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree *DT) {
     CallInst *Intrinsic = Builder.CreateIntrinsic(
         CBR->getType(), Intrinsic::callbr_landingpad, {CBR});
     SSAUpdate.AddAvailableValue(IndDest, Intrinsic);
-    updateSSA(*DT, CBR, Intrinsic, SSAUpdate);
+    updateSSA(DT, CBR, Intrinsic, SSAUpdate);
     Changed = true;
   }
 
   return Changed;
 }
 
-static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree *DT) {
+static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree &DT) {
   bool Changed = false;
 
   Changed |= splitCriticalEdges(CBR, DT);
@@ -229,10 +229,8 @@ static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree *DT) {
 //===----------------------------------------------------------------------===//
 
 static bool processAsmConstraintBrInst(Function &F, CallBrInst &CBR,
-                                       bool IsOptLevelNone, DomTreeUpdater *DTU,
+                                       bool IsOptLevelNone, DomTreeUpdater &DTU,
                                        const TargetMachine *TM) {
-  bool Changed = false;
-
   BasicBlock *BB = CBR.getParent();
   BasicBlock *PrefReg = CBR.getDefaultDest();
   BasicBlock *PrefMem = CBR.getIndirectDest(0);
@@ -243,22 +241,22 @@ static bool processAsmConstraintBrInst(Function &F, CallBrInst &CBR,
   CBR.eraseFromParent();
 
   if (IsOptLevelNone) {
-    DeleteDeadBlock(PrefReg, DTU);
+    DeleteDeadBlock(PrefReg, &DTU);
     IRBuilder(BB).CreateBr(PrefMem);
-    MergeBlockIntoPredecessor(PrefMem, DTU);
+    MergeBlockIntoPredecessor(PrefMem, &DTU);
     if (Merge)
-      MergeBlockIntoPredecessor(Merge, DTU);
+      MergeBlockIntoPredecessor(Merge, &DTU);
   } else {
-    DeleteDeadBlock(PrefMem, DTU);
+    DeleteDeadBlock(PrefMem, &DTU);
     IRBuilder(BB).CreateBr(PrefReg);
-    MergeBlockIntoPredecessor(PrefReg, DTU);
+    MergeBlockIntoPredecessor(PrefReg, &DTU);
     if (Merge)
-      MergeBlockIntoPredecessor(Merge, DTU);
+      MergeBlockIntoPredecessor(Merge, &DTU);
   }
 
-  DTU->flush();
+  DTU.flush();
 
-  return Changed;
+  return true;
 }
 
 static void getCallBrInsts(Function &F,
@@ -273,7 +271,7 @@ static void getCallBrInsts(Function &F,
     }
 }
 
-static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater *DTU,
+static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater &DTU,
                     const TargetMachine *TM) {
   bool Changed = false;
   SmallVector<CallBrInst *, 4> AsmConstraintBrs;
@@ -287,40 +285,30 @@ static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater *DTU,
 
   // Process the rest of the 'callbr' instructions.
   for (auto *CBR : OtherCallBrs)
-    if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
-      Changed |= processCallBrInst(F, CBR, DTU ? &DTU->getDomTree() : nullptr);
+    Changed |= processCallBrInst(F, CBR, DTU.getDomTree());
 
   return Changed;
 }
 
 bool InlineAsmPrepare::runOnFunction(Function &F) {
-  // It's highly likely that most programs do not contain CallBrInsts. Follow a
-  // similar pattern from SafeStackLegacyPass::runOnFunction to reuse previous
-  // domtree analysis if available, otherwise compute it lazily. This avoids
-  // forcing Dominator Tree Construction at -O0 for programs that likely do not
-  // contain CallBrInsts. It does pessimize programs with callbr at higher
-  // optimization levels, as the DominatorTree created here is not reused by
-  // subsequent passes.
   const auto *TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
-  std::optional<DomTreeUpdater> DTU;
-  std::optional<DominatorTree> LazilyComputedDomTree;
-  if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
-    DTU.emplace(DTWP->getDomTree(), DomTreeUpdater::UpdateStrategy::Lazy);
+  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
 
   bool IsOptLevelNone =
       skipFunction(F) ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  return runImpl(F, IsOptLevelNone, DTU ? &*DTU : nullptr, TM);
+  return runImpl(F, IsOptLevelNone, DTU, TM);
 }
 
 PreservedAnalyses InlineAsmPreparePass::run(Function &F,
                                             FunctionAnalysisManager &FAM) {
-  auto *DT = &FAM.getResult<DominatorTreeAnalysis>(F);
+  auto &DT = FAM.getResult<DominatorTreeAnalysis>(F);
   DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
   bool IsOptLevelNone =
       F.hasOptNone() ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  if (runImpl(F, IsOptLevelNone, DT ? &DTU : nullptr, TM)) {
+  if (runImpl(F, IsOptLevelNone, DTU, TM)) {
     PreservedAnalyses PA;
     PA.preserve<DominatorTreeAnalysis>();
     return PA;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 221c97ee83e5d..edd710650f148 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10214,6 +10214,7 @@ struct ConstraintDecisionInfo {
   void reset() {
     ConstraintOperands.clear();
     AsmNodeOperands.clear();
+    Buffer.clear();
     Glue = SDValue();
     Chain = SDValue();
     BeginLabel = nullptr;
@@ -10241,8 +10242,11 @@ constructOperandInfo(ConstraintDecisionInfo &Info,
     // Determine if this InlineAsm MayLoad or MayStore based on the constraints.
     // FIXME: Could we compute this on OpInfo rather than T?
 
-    // Compute the constraint code and ConstraintType to use.
-    TLI.ComputeConstraintToUse(T, SDValue());
+    // Compute the constraint code and ConstraintType to use. Skip finalized
+    // operands — their constraint was validated in a previous attempt and must
+    // not advance to the next alternative.
+    if (!T.Finalized)
+      TLI.ComputeConstraintToUse(T, SDValue());
 
     if (T.ConstraintType == TargetLowering::C_Immediate && OpInfo.CallOperand &&
         !isa<ConstantSDNode>(OpInfo.CallOperand)) {
@@ -10680,7 +10684,16 @@ determineConstraints(ConstraintDecisionInfo &Info,
                             TLI.getPointerTy(DAG.getDataLayout())));
 
   // Third pass: Prepare DAG-level operands
-  return prepareDAGLevelOperands(Info, Call, Builder, TLI, DAG);
+  bool Result = prepareDAGLevelOperands(Info, Call, Builder, TLI, DAG);
+
+  // Write back the Finalized state to TargetConstraints. On a non-fatal retry,
+  // operands that were successfully assigned a constraint keep their selection
+  // instead of advancing to the next alternative in constructOperandInfo.
+  for (size_t I = 0, N = Info.ConstraintOperands.size(); I != N; ++I)
+    if (Info.ConstraintOperands[I].Finalized)
+      TargetConstraints[I].Finalized = true;
+
+  return Result;
 }
 
 /// visitInlineAsm - Handle a call to an InlineAsm object.
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 988d2aad858ad..6df060efaa36f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -6037,7 +6037,18 @@ TargetLowering::ParseConstraints(const DataLayout &DL,
     // direct "=rm" output with a matching tied input). The register allocator
     // can fold both the output and its tied input to the same memory slot when
     // under pressure.
-    if (OpInfo.hasRegMemConstraints())
+    // Use getConstraintType() rather than checking for literal "r"/"m" codes so
+    // that target-specific register codes (e.g. "x" for x86 XMM) and memory
+    // variants beyond "m" (e.g. "o", "V") are also recognised.
+    bool HasReg = false, HasMem = false;
+    for (StringRef Code : OpInfo.Codes) {
+      ConstraintType CT = getConstraintType(Code);
+      if (CT == C_Register || CT == C_RegisterClass)
+        HasReg = true;
+      else if (CT == C_Memory)
+        HasMem = true;
+    }
+    if (HasReg && HasMem)
       OpInfo.MayFoldRegister = true;
 
     // Compute the value type for each operand.
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index cd4f48e0f7ed4..20e883f5d5ffe 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -3583,7 +3583,7 @@ void Verifier::visitCallBrInst(CallBrInst &CBI) {
     }
     case Intrinsic::asm_constraint_br: {
       Check(CBI.getNumIndirectDests() == 1,
-            "Callbr asm_constraint_br only supports only one indirect dest");
+            "Callbr asm_constraint_br only supports one indirect dest");
       Check(CBI.getDefaultDest()->hasNPredecessors(1),
             "Callbr asm_constraint_br default dest must have only one "
             "predecessor");

>From bfcc259277936f2719fe3605f7303d48a6772538 Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Sun, 24 May 2026 18:15:00 -0700
Subject: [PATCH 3/8] Re-add mistakenly deleted pass.

---
 llvm/lib/Passes/PassRegistry.def | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index c589926513dac..dee83be6f8564 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -461,6 +461,7 @@ FUNCTION_PASS("infer-address-spaces", InferAddressSpacesPass())
 FUNCTION_PASS("infer-alignment", InferAlignmentPass())
 FUNCTION_PASS("inject-tli-mappings", InjectTLIMappings())
 FUNCTION_PASS("inline-asm-prepare", InlineAsmPreparePass(*TM))
+FUNCTION_PASS("instcount", InstCountPass())
 FUNCTION_PASS("instnamer", InstructionNamerPass())
 FUNCTION_PASS("instsimplify", InstSimplifyPass())
 FUNCTION_PASS("interleaved-access", InterleavedAccessPass(*TM))

>From bd99c192e814ee125ca83847a6f00e6fcb2a6a8e Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Sun, 24 May 2026 18:17:50 -0700
Subject: [PATCH 4/8] Mistakenly re-added the pass that wasn't supposed to be
 there.

---
 llvm/lib/Passes/PassRegistry.def | 1 -
 1 file changed, 1 deletion(-)

diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index dee83be6f8564..c589926513dac 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -461,7 +461,6 @@ FUNCTION_PASS("infer-address-spaces", InferAddressSpacesPass())
 FUNCTION_PASS("infer-alignment", InferAlignmentPass())
 FUNCTION_PASS("inject-tli-mappings", InjectTLIMappings())
 FUNCTION_PASS("inline-asm-prepare", InlineAsmPreparePass(*TM))
-FUNCTION_PASS("instcount", InstCountPass())
 FUNCTION_PASS("instnamer", InstructionNamerPass())
 FUNCTION_PASS("instsimplify", InstSimplifyPass())
 FUNCTION_PASS("interleaved-access", InterleavedAccessPass(*TM))

>From 8a683cb47d5ce0eb2114529da070ac3e1e2bf1e1 Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Sun, 24 May 2026 18:36:29 -0700
Subject: [PATCH 5/8] Use llvm::any_of instead of looping, also fixes de unused
 params

Skwisgaar he notice dat de loopings was ugly and slow like a total
amateurs, ya? So now we uses de llvm::any_of which is much more
elegants and befittings of de fastest code-writer in de worlds.

Also de functions was havings de unused parametres like F and TM
dat just sits dere doing nothings, douchebags. Skwisgaar removes
dem because he does not tolerates de dead weights in his codes.

And de addPreserveds was missings from getAnalysisUsage so de
legacy PM was throwings away de DominatorTree like a stupids
even though de pass was keepings it valids. Skwisgaar fixes dat
too because he is thoroughs and de best, ya.

Also fixed de typo "getConstriantType" because someone cannot
spells and Skwisgaar he cannot stands de sloppiness.
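
For readers following along, the loop-to-any_of change can be sketched
outside of LLVM roughly as below. This is only an illustration: the
ConstraintKind enum and the classify() function are hypothetical
stand-ins for TargetLowering::ConstraintType and getConstraintType(),
and std::any_of is used in place of llvm::any_of.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for TargetLowering::ConstraintType.
enum class ConstraintKind { Register, RegisterClass, Memory, Other };

// Hypothetical classifier. The real getConstraintType() is target-specific;
// this table only covers the codes mentioned in the patch comments.
ConstraintKind classify(const std::string &Code) {
  if (Code == "r" || Code == "x")
    return ConstraintKind::RegisterClass;
  if (Code == "m" || Code == "o" || Code == "V")
    return ConstraintKind::Memory;
  return ConstraintKind::Other;
}

// True when the codes offer both a register alternative and a memory
// alternative, expressed as two any_of queries instead of one manual loop.
bool hasRegAndMem(const std::vector<std::string> &Codes) {
  return std::any_of(Codes.begin(), Codes.end(),
                     [](const std::string &Code) {
                       ConstraintKind K = classify(Code);
                       return K == ConstraintKind::Register ||
                              K == ConstraintKind::RegisterClass;
                     }) &&
         std::any_of(Codes.begin(), Codes.end(), [](const std::string &Code) {
           return classify(Code) == ConstraintKind::Memory;
         });
}
```

Note that in this form any memory-class code counts as the memory
side, not only the literal "m".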

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 llvm/lib/CodeGen/InlineAsmPrepare.cpp         | 21 +++++++++----------
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 19 ++++++++---------
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/CodeGen/InlineAsmPrepare.cpp b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
index a289338c81db1..7a14b11309053 100644
--- a/llvm/lib/CodeGen/InlineAsmPrepare.cpp
+++ b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
@@ -6,7 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// This pass lowers inline asm calls in LLVM IR in order to to assist
+// This pass lowers inline asm calls in LLVM IR in order to assist
 // SelectionDAG's codegen.
 //
 // CallBrInst:
@@ -76,6 +76,7 @@ class InlineAsmPrepare : public FunctionPass {
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<TargetPassConfig>();
     AU.addRequired<DominatorTreeWrapperPass>();
+    AU.addPreserved<DominatorTreeWrapperPass>();
   }
   bool runOnFunction(Function &F) override;
 
@@ -215,7 +216,7 @@ static bool insertIntrinsicCalls(CallBrInst *CBR, DominatorTree &DT) {
   return Changed;
 }
 
-static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree &DT) {
+static bool processCallBrInst(CallBrInst *CBR, DominatorTree &DT) {
   bool Changed = false;
 
   Changed |= splitCriticalEdges(CBR, DT);
@@ -228,9 +229,8 @@ static bool processCallBrInst(Function &F, CallBrInst *CBR, DominatorTree &DT) {
 //               Process 'llvm.asm.constraint.br' instructions
 //===----------------------------------------------------------------------===//
 
-static bool processAsmConstraintBrInst(Function &F, CallBrInst &CBR,
-                                       bool IsOptLevelNone, DomTreeUpdater &DTU,
-                                       const TargetMachine *TM) {
+static bool processAsmConstraintBrInst(CallBrInst &CBR, bool IsOptLevelNone,
+                                       DomTreeUpdater &DTU) {
   BasicBlock *BB = CBR.getParent();
   BasicBlock *PrefReg = CBR.getDefaultDest();
   BasicBlock *PrefMem = CBR.getIndirectDest(0);
@@ -271,8 +271,7 @@ static void getCallBrInsts(Function &F,
     }
 }
 
-static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater &DTU,
-                    const TargetMachine *TM) {
+static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater &DTU) {
   bool Changed = false;
   SmallVector<CallBrInst *, 4> AsmConstraintBrs;
   SmallVector<CallBrInst *, 4> OtherCallBrs;
@@ -281,11 +280,11 @@ static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater &DTU,
 
   // Process 'llvm.asm.constraint.br' instructions first.
   for (auto *CBR : AsmConstraintBrs)
-    Changed |= processAsmConstraintBrInst(F, *CBR, IsOptLevelNone, DTU, TM);
+    Changed |= processAsmConstraintBrInst(*CBR, IsOptLevelNone, DTU);
 
   // Process the rest of the 'callbr' instructions.
   for (auto *CBR : OtherCallBrs)
-    Changed |= processCallBrInst(F, CBR, DTU.getDomTree());
+    Changed |= processCallBrInst(CBR, DTU.getDomTree());
 
   return Changed;
 }
@@ -298,7 +297,7 @@ bool InlineAsmPrepare::runOnFunction(Function &F) {
   bool IsOptLevelNone =
       skipFunction(F) ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  return runImpl(F, IsOptLevelNone, DTU, TM);
+  return runImpl(F, IsOptLevelNone, DTU);
 }
 
 PreservedAnalyses InlineAsmPreparePass::run(Function &F,
@@ -308,7 +307,7 @@ PreservedAnalyses InlineAsmPreparePass::run(Function &F,
   bool IsOptLevelNone =
       F.hasOptNone() ? true : TM->getOptLevel() == CodeGenOptLevel::None;
 
-  if (runImpl(F, IsOptLevelNone, DTU, TM)) {
+  if (runImpl(F, IsOptLevelNone, DTU)) {
     PreservedAnalyses PA;
     PA.preserve<DominatorTreeAnalysis>();
     return PA;
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index dbe16b6ece2d1..6dc167edb3895 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -6042,16 +6042,15 @@ TargetLowering::ParseConstraints(const DataLayout &DL,
     // Use getConstraintType() rather than checking for literal "r"/"m" codes so
     // that target-specific register codes (e.g. "x" for x86 XMM) and memory
     // variants beyond "m" (e.g. "o", "V") are also recognised.
-    bool HasReg = false, HasMem = false;
-    for (StringRef Code : OpInfo.Codes) {
-      ConstraintType CT = getConstraintType(Code);
-      if (CT == C_Register || CT == C_RegisterClass)
-        HasReg = true;
-      else if (CT == C_Memory)
-        HasMem = true;
-    }
-    if (HasReg && HasMem)
-      OpInfo.MayFoldRegister = true;
+    OpInfo.MayFoldRegister =
+        llvm::any_of(OpInfo.Codes,
+                     [&](StringRef Code) {
+                       ConstraintType CT = getConstraintType(Code);
+                       return CT == C_Register || CT == C_RegisterClass;
+                     }) &&
+        llvm::any_of(OpInfo.Codes, [&](StringRef Code) {
+          return getConstraintType(Code) == C_Memory;
+        });
 
     // Compute the value type for each operand.
     switch (OpInfo.Type) {

>From da400580a98275cda4ac0ec4b92c0b63bcdeb3cf Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Sun, 24 May 2026 19:29:49 -0700
Subject: [PATCH 6/8] Dey are ze bugs, und dey must be slaughtered like ze
 weakest strings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three bugs, all of dem inferior, all of dem dying now:

- InlineAsmPrepare: OtherCallBrs was collected before AsmConstraintBrs
  were processed, leaving dangling pointers into blocks deleted by
  DeleteDeadBlock at -O0. Now we collect OtherCallBrs after, when only
  living blocks remain. Simple. Like alternate picking, but for memory.

- TargetLowering: MayFoldRegister was using getConstraintType() on both
  sides of ze check, which made it fire for "ro" and "nZr" inputs — not
  just "rm". Dis caused constants to be materialized to registers instead
  of going to ze constant pool. Now ze register side uses getConstraintType()
  (to catch "x" and friends), but ze memory side requires literal "m".
  Precise. Controlled. Swedish.

- Pipeline tests: addPreserved<DominatorTreeWrapperPass>() correctly
  eliminated redundant Dominator Tree Construction passes after
  InlineAsmPrepare. Updated nine pipeline tests across AArch64, AMDGPU,
  ARM, LoongArch, PowerPC, RISCV, SPIRV, WebAssembly, and X86 to reflect
  ze new, superior scheduling.
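
The ordering fix in the first item can be sketched outside of LLVM as
below. This is a minimal illustration under stated assumptions, not the
pass itself: Block, BlockList, eraseBlock() and collectOtherCallBrs()
are hypothetical names, with a vector of owned blocks standing in for
the function's block list and eraseBlock() standing in for
DeleteDeadBlock.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical miniature of a basic block; not LLVM's BasicBlock.
struct Block {
  int Id;
  bool IsConstraintBr; // ends in a callbr to llvm.asm.constraint.br
  bool HasUsedCallBr;  // ends in some other callbr whose result is used
};

using BlockList = std::vector<std::unique_ptr<Block>>;

// Phase 1 helper: resolving a constraint branch deletes the losing block,
// which destroys the Block object immediately.
void eraseBlock(BlockList &F, int Id) {
  F.erase(std::remove_if(F.begin(), F.end(),
                         [Id](const std::unique_ptr<Block> &B) {
                           return B->Id == Id;
                         }),
          F.end());
}

// Phase 2: collect the remaining callbr blocks. Because this runs only
// after phase 1 has finished, every pointer returned refers to a block
// that still exists.
std::vector<Block *> collectOtherCallBrs(const BlockList &F) {
  std::vector<Block *> Out;
  for (const std::unique_ptr<Block> &B : F)
    if (!B->IsConstraintBr && B->HasUsedCallBr)
      Out.push_back(B.get());
  return Out;
}
```

Collecting both lists up front, as the earlier code did, would have
stored a pointer to the block that phase 1 later erases.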

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 llvm/lib/CodeGen/InlineAsmPrepare.cpp         | 32 +++++++++----------
 .../CodeGen/SelectionDAG/TargetLowering.cpp   | 13 ++++----
 llvm/test/CodeGen/AArch64/O3-pipeline.ll      |  1 -
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll      |  5 ---
 llvm/test/CodeGen/ARM/O3-pipeline.ll          |  1 -
 llvm/test/CodeGen/LoongArch/opt-pipeline.ll   |  1 -
 llvm/test/CodeGen/PowerPC/O3-pipeline.ll      |  1 -
 llvm/test/CodeGen/RISCV/O3-pipeline.ll        |  1 -
 llvm/test/CodeGen/SPIRV/llc-pipeline.ll       |  2 --
 .../GlobalISel/gisel-commandline-option.ll    |  1 -
 llvm/test/CodeGen/X86/opt-pipeline.ll         |  1 -
 11 files changed, 23 insertions(+), 36 deletions(-)

diff --git a/llvm/lib/CodeGen/InlineAsmPrepare.cpp b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
index 7a14b11309053..c81627c3feee0 100644
--- a/llvm/lib/CodeGen/InlineAsmPrepare.cpp
+++ b/llvm/lib/CodeGen/InlineAsmPrepare.cpp
@@ -259,30 +259,30 @@ static bool processAsmConstraintBrInst(CallBrInst &CBR, bool IsOptLevelNone,
   return true;
 }
 
-static void getCallBrInsts(Function &F,
-                           SmallVectorImpl<CallBrInst *> &AsmConstraintBrs,
-                           SmallVectorImpl<CallBrInst *> &OtherCallBrs) {
-  for (auto &BB : F)
-    if (auto *CBR = dyn_cast<CallBrInst>(BB.getTerminator())) {
-      if (CBR->getIntrinsicID() == Intrinsic::asm_constraint_br)
-        AsmConstraintBrs.push_back(CBR);
-      else if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
-        OtherCallBrs.push_back(CBR);
-    }
-}
-
 static bool runImpl(Function &F, bool IsOptLevelNone, DomTreeUpdater &DTU) {
   bool Changed = false;
   SmallVector<CallBrInst *, 4> AsmConstraintBrs;
-  SmallVector<CallBrInst *, 4> OtherCallBrs;
 
-  getCallBrInsts(F, AsmConstraintBrs, OtherCallBrs);
+  // Collect asm_constraint_br instructions first.
+  for (auto &BB : F)
+    if (auto *CBR = dyn_cast<CallBrInst>(BB.getTerminator()))
+      if (CBR->getIntrinsicID() == Intrinsic::asm_constraint_br)
+        AsmConstraintBrs.push_back(CBR);
 
-  // Process 'llvm.asm.constraint.br' instructions first.
+  // Process 'llvm.asm.constraint.br' instructions first. At -O0 this deletes
+  // the PrefReg block (and its callbr) via DeleteDeadBlock, which immediately
+  // removes it from the function's block list. Collect OtherCallBrs only
+  // after this loop to avoid holding dangling pointers into deleted blocks.
   for (auto *CBR : AsmConstraintBrs)
     Changed |= processAsmConstraintBrInst(*CBR, IsOptLevelNone, DTU);
 
-  // Process the rest of the 'callbr' instructions.
+  // Collect and process the remaining 'callbr' instructions.
+  SmallVector<CallBrInst *, 4> OtherCallBrs;
+  for (auto &BB : F)
+    if (auto *CBR = dyn_cast<CallBrInst>(BB.getTerminator()))
+      if (!CBR->getType()->isVoidTy() && !CBR->use_empty())
+        OtherCallBrs.push_back(CBR);
+
   for (auto *CBR : OtherCallBrs)
     Changed |= processCallBrInst(CBR, DTU.getDomTree());
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 6dc167edb3895..2dcc356512734 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -6039,18 +6039,19 @@ TargetLowering::ParseConstraints(const DataLayout &DL,
     // direct "=rm" output with a matching tied input). The register allocator
     // can fold both the output and its tied input to the same memory slot when
     // under pressure.
-    // Use getConstraintType() rather than checking for literal "r"/"m" codes so
-    // that target-specific register codes (e.g. "x" for x86 XMM) and memory
-    // variants beyond "m" (e.g. "o", "V") are also recognised.
+    //
+    // Use getConstraintType() for the register side so that target-specific
+    // register codes (e.g. "x" for x86 XMM) are also recognised. Require the
+    // literal code "m" for the memory side — broader memory alternatives like
+    // "o" (offsettable) or "Z" (PowerPC) intentionally select memory and should
+    // not activate this register-preference optimisation.
     OpInfo.MayFoldRegister =
         llvm::any_of(OpInfo.Codes,
                      [&](StringRef Code) {
                        ConstraintType CT = getConstraintType(Code);
                        return CT == C_Register || CT == C_RegisterClass;
                      }) &&
-        llvm::any_of(OpInfo.Codes, [&](StringRef Code) {
-          return getConstraintType(Code) == C_Memory;
-        });
+        llvm::is_contained(OpInfo.Codes, "m");
 
     // Compute the value type for each operand.
     switch (OpInfo.Type) {
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index 0e4f6274f7b72..e32881e306f0f 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
@@ -109,7 +109,6 @@
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
 ; CHECK-NEXT:       Analysis containing CSE Info
-; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Natural Loop Information
 ; CHECK-NEXT:       Post-Dominator Tree Construction
 ; CHECK-NEXT:       Branch Probability Analysis
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index f793663711f95..fea612f1bdc2a 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -98,7 +98,6 @@
 ; GCN-O0-NEXT:        Prepare inline asm insts
 ; GCN-O0-NEXT:        Safe Stack instrumentation pass
 ; GCN-O0-NEXT:        Insert stack protectors
-; GCN-O0-NEXT:        Dominator Tree Construction
 ; GCN-O0-NEXT:        Cycle Info Analysis
 ; GCN-O0-NEXT:        Uniformity Analysis
 ; GCN-O0-NEXT:        Assignment Tracking Analysis
@@ -303,7 +302,6 @@
 ; GCN-O1-NEXT:        Prepare inline asm insts
 ; GCN-O1-NEXT:        Safe Stack instrumentation pass
 ; GCN-O1-NEXT:        Insert stack protectors
-; GCN-O1-NEXT:        Dominator Tree Construction
 ; GCN-O1-NEXT:        Cycle Info Analysis
 ; GCN-O1-NEXT:        Uniformity Analysis
 ; GCN-O1-NEXT:        Basic Alias Analysis (stateless AA impl)
@@ -615,7 +613,6 @@
 ; GCN-O1-OPTS-NEXT:        Prepare inline asm insts
 ; GCN-O1-OPTS-NEXT:        Safe Stack instrumentation pass
 ; GCN-O1-OPTS-NEXT:        Insert stack protectors
-; GCN-O1-OPTS-NEXT:        Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:        Cycle Info Analysis
 ; GCN-O1-OPTS-NEXT:        Uniformity Analysis
 ; GCN-O1-OPTS-NEXT:        Basic Alias Analysis (stateless AA impl)
@@ -938,7 +935,6 @@
 ; GCN-O2-NEXT:        Prepare inline asm insts
 ; GCN-O2-NEXT:        Safe Stack instrumentation pass
 ; GCN-O2-NEXT:        Insert stack protectors
-; GCN-O2-NEXT:        Dominator Tree Construction
 ; GCN-O2-NEXT:        Cycle Info Analysis
 ; GCN-O2-NEXT:        Uniformity Analysis
 ; GCN-O2-NEXT:        Basic Alias Analysis (stateless AA impl)
@@ -1275,7 +1271,6 @@
 ; GCN-O3-NEXT:        Prepare inline asm insts
 ; GCN-O3-NEXT:        Safe Stack instrumentation pass
 ; GCN-O3-NEXT:        Insert stack protectors
-; GCN-O3-NEXT:        Dominator Tree Construction
 ; GCN-O3-NEXT:        Cycle Info Analysis
 ; GCN-O3-NEXT:        Uniformity Analysis
 ; GCN-O3-NEXT:        Basic Alias Analysis (stateless AA impl)
diff --git a/llvm/test/CodeGen/ARM/O3-pipeline.ll b/llvm/test/CodeGen/ARM/O3-pipeline.ll
index b3d18cf77afa6..8bd6e3cf9c271 100644
--- a/llvm/test/CodeGen/ARM/O3-pipeline.ll
+++ b/llvm/test/CodeGen/ARM/O3-pipeline.ll
@@ -71,7 +71,6 @@
 ; CHECK-NEXT:      Safe Stack instrumentation pass
 ; CHECK-NEXT:      Insert stack protectors
 ; CHECK-NEXT:      Module Verifier
-; CHECK-NEXT:      Dominator Tree Construction
 ; CHECK-NEXT:      Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:      Function Alias Analysis Results
 ; CHECK-NEXT:      Natural Loop Information
diff --git a/llvm/test/CodeGen/LoongArch/opt-pipeline.ll b/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
index 300e095397da8..2657a575aa8af 100644
--- a/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
+++ b/llvm/test/CodeGen/LoongArch/opt-pipeline.ll
@@ -78,7 +78,6 @@
 ; LAXX-NEXT:       Safe Stack instrumentation pass
 ; LAXX-NEXT:       Insert stack protectors
 ; LAXX-NEXT:       Module Verifier
-; LAXX-NEXT:       Dominator Tree Construction
 ; LAXX-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; LAXX-NEXT:       Function Alias Analysis Results
 ; LAXX-NEXT:       Natural Loop Information
diff --git a/llvm/test/CodeGen/PowerPC/O3-pipeline.ll b/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
index 910723b75678f..3901d122f4494 100644
--- a/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
+++ b/llvm/test/CodeGen/PowerPC/O3-pipeline.ll
@@ -87,7 +87,6 @@
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
-; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:       Function Alias Analysis Results
 ; CHECK-NEXT:       Natural Loop Information
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index d8ac426223bee..302ab99d8b451 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -95,7 +95,6 @@
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
-; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:       Function Alias Analysis Results
 ; CHECK-NEXT:       Natural Loop Information
diff --git a/llvm/test/CodeGen/SPIRV/llc-pipeline.ll b/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
index b7e0a01035603..0b3455f154349 100644
--- a/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
+++ b/llvm/test/CodeGen/SPIRV/llc-pipeline.ll
@@ -64,7 +64,6 @@
 ; SPIRV-O0-NEXT:      Legalizer
 ; SPIRV-O0-NEXT:      SPIRV post legalizer
 ; SPIRV-O0-NEXT:      Analysis for ComputingKnownBits
-; SPIRV-O0-NEXT:      Dominator Tree Construction
 ; SPIRV-O0-NEXT:      Natural Loop Information
 ; SPIRV-O0-NEXT:      Lazy Branch Probability Analysis
 ; SPIRV-O0-NEXT:      Lazy Block Frequency Analysis
@@ -170,7 +169,6 @@
 ; SPIRV-Opt-NEXT:      Safe Stack instrumentation pass
 ; SPIRV-Opt-NEXT:      Insert stack protectors
 ; SPIRV-Opt-NEXT:      Analysis containing CSE Info
-; SPIRV-Opt-NEXT:      Dominator Tree Construction
 ; SPIRV-Opt-NEXT:      Natural Loop Information
 ; SPIRV-Opt-NEXT:      Post-Dominator Tree Construction
 ; SPIRV-Opt-NEXT:      Branch Probability Analysis
diff --git a/llvm/test/CodeGen/WebAssembly/GlobalISel/gisel-commandline-option.ll b/llvm/test/CodeGen/WebAssembly/GlobalISel/gisel-commandline-option.ll
index bff1f6912d48b..47b05c8160fd1 100644
--- a/llvm/test/CodeGen/WebAssembly/GlobalISel/gisel-commandline-option.ll
+++ b/llvm/test/CodeGen/WebAssembly/GlobalISel/gisel-commandline-option.ll
@@ -25,7 +25,6 @@
 ; ENABLED-O1-NEXT:  WebAssemblyPostLegalizerCombiner
 ; ENABLED-NEXT:  RegBankSelect
 ; ENABLED-NEXT:  Analysis for ComputingKnownBits
-; ENABLED-O1-NEXT:  Dominator Tree Construction
 ; ENABLED-O1-NEXT:  Natural Loop Information
 ; ENABLED-O1-NEXT:  Lazy Branch Probability Analysis
 ; ENABLED-O1-NEXT:  Lazy Block Frequency Analysis
diff --git a/llvm/test/CodeGen/X86/opt-pipeline.ll b/llvm/test/CodeGen/X86/opt-pipeline.ll
index 6e7707a248873..24390f2d852d3 100644
--- a/llvm/test/CodeGen/X86/opt-pipeline.ll
+++ b/llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -78,7 +78,6 @@
 ; CHECK-NEXT:       Safe Stack instrumentation pass
 ; CHECK-NEXT:       Insert stack protectors
 ; CHECK-NEXT:       Module Verifier
-; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; CHECK-NEXT:       Function Alias Analysis Results
 ; CHECK-NEXT:       Natural Loop Information

>From ba5517d75f9c79759a065a05abbdb2378d082ebd Mon Sep 17 00:00:00 2001
From: Bill Wendling <morbo at google.com>
Date: Thu, 4 Jun 2026 14:32:33 -0700
Subject: [PATCH 7/8] Remove extraneous newline.

---
 llvm/docs/LangRef.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8cb51936a6184..79e0855871025 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -16328,7 +16328,6 @@ The '``llvm.asm.constraint.br``' allows the back-end to choose the best
 constraint rather than restricting the preferred constraint to one that may
 produce substandard code or cannot be handled by the register allocators.
 
-
 It can be called only by the '``callbr``' instruction. The default destination
 of the ``callbr`` contains a call to the preferred inline asm, while the single
 indirect destination contains a call to the pessimal inline asm.

>From 5373473434d9e8797b4bedde8c29404dc8aad035 Mon Sep 17 00:00:00 2001
From: Bill Wendling <isanbard at gmail.com>
Date: Tue, 9 Jun 2026 01:28:49 -0700
Subject: [PATCH 8/8] [SelectionDAG] Clear MayFoldRegister only for outputs
 that threaten each tie
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Dude, we were totally wiping out MayFoldRegister on like every output
before the first matched input, even ones that had nothing to do with
the tie. That's so not metal.

The real deal: a tied input points to output T by MIR position. If any
output at P < T gets folded (register → memory), its operand count
changes and shifts T's MIR position, corrupting the tie encoding. Output
T itself can't be folded either — it IS the tie target. Outputs at
positions > T? Totally fine, dude, leave 'em alone.

Replace the reverse cascade with a targeted forward loop that clears
only positions 0..T (inclusive) for each matched input whose target is T.
---
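Note (not part of the commit): the difference between the old reverse
cascade and the new forward loop can be sketched in isolation. The
snippet below is a minimal standalone model, not the SelectionDAG code.
`OperandSketch`, `clearReverseCascade`, `clearForwardTargeted`, and
`makeExample` are hypothetical names invented for illustration, and the
default of `MayFoldRegister = true` is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SDISelAsmOperandInfo; only the fields the
// two loops touch are modelled here.
struct OperandSketch {
  bool IsMatchingInput = false; // true for a tied input such as "0"
  unsigned MatchedOperand = 0;  // index of the output it is tied to
  bool MayFoldRegister = true;  // assumed default for this sketch
};

// Old behaviour: walk the operands in reverse; once a tied input has been
// seen, clear the flag on it and on every operand that precedes it.
void clearReverseCascade(std::vector<OperandSketch> &Ops) {
  bool Clear = false;
  for (auto It = Ops.rbegin(); It != Ops.rend(); ++It) {
    Clear |= It->IsMatchingInput;
    if (Clear)
      It->MayFoldRegister = false;
  }
}

// New behaviour: for each tied input with target T, clear only the
// operands at positions 0..T inclusive.
void clearForwardTargeted(std::vector<OperandSketch> &Ops) {
  for (const OperandSketch &Op : Ops) {
    if (!Op.IsMatchingInput)
      continue;
    assert(Op.MatchedOperand < Ops.size() && "tie target out of range");
    for (unsigned I = 0; I <= Op.MatchedOperand; ++I)
      Ops[I].MayFoldRegister = false;
  }
}

// Three outputs followed by one input tied to output 0.
std::vector<OperandSketch> makeExample() {
  std::vector<OperandSketch> Ops(4);
  Ops[3].IsMatchingInput = true;
  Ops[3].MatchedOperand = 0;
  return Ops;
}
```

With this operand list, the reverse cascade clears the flag on all four
operands, while the forward loop clears it only on output 0 and leaves
outputs 1 and 2 (and, in this sketch, the tied input itself) untouched.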
 .../SelectionDAG/SelectionDAGBuilder.cpp       | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 2bb1d5725e695..d9158485bc134 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -10354,14 +10354,16 @@ static bool prepareDAGLevelOperands(ConstraintDecisionInfo &Info,
                                     SelectionDAGBuilder &Builder,
                                     const TargetLowering &TLI,
                                     SelectionDAG &DAG) {
-  // Registers before tied operands can't be folded, because the tied operand
-  // will move, which the back-end isn't able to properly account for.
-  bool Clear = false;
-  for (SDISelAsmOperandInfo &OpInfo : llvm::reverse(Info.ConstraintOperands)) {
-    Clear |= OpInfo.isMatchingInputConstraint();
-    if (Clear)
-      OpInfo.MayFoldRegister = false;
-  }
+  // A tied input points to output T by MIR operand position. If any output at
+  // position P < T is folded (register → memory), its operand count changes and
+  // shifts T's MIR position, corrupting the tie encoding. Output T itself
+  // cannot be folded either — it IS the tie target. Outputs after T are
+  // unaffected. Clear only the operands that actually threaten each tie, rather
+  // than conservatively clearing everything before the first matched input.
+  for (SDISelAsmOperandInfo &OpInfo : Info.ConstraintOperands)
+    if (OpInfo.isMatchingInputConstraint())
+      for (unsigned i = 0, T = OpInfo.getMatchedOperand(); i <= T; ++i)
+        Info.ConstraintOperands[i].MayFoldRegister = false;
 
   SDLoc DL = Builder.getCurSDLoc();
   for (SDISelAsmOperandInfo &OpInfo : Info.ConstraintOperands) {


