[clang] df75b5d - [win][x64] Add support for Windows x64 unwind v3 (#200249)

via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 9 12:02:59 PDT 2026


Author: Daniel Paoliello
Date: 2026-06-09T12:02:51-07:00
New Revision: df75b5d458b9faef2007485e8348d83b32798a5c

URL: https://github.com/llvm/llvm-project/commit/df75b5d458b9faef2007485e8348d83b32798a5c
DIFF: https://github.com/llvm/llvm-project/commit/df75b5d458b9faef2007485e8348d83b32798a5c.diff

LOG: [win][x64] Add support for Windows x64 unwind v3 (#200249)

The extended registers in Intel APX cannot be described by the current
Windows x64 unwind information formats, which has made it necessary to
introduce a new version. The design of this new version also incorporates
lessons learnt from the existing unwind information formats (both x64 and
AArch64).

Documentation for unwind v3 is available at:
<https://learn.microsoft.com/en-us/cpp/build/x64-unwind-information-v3>

This change:
* Implements encoding unwind v3 information in MC. This includes support
for non-mirror epilogs, WOD pool sharing and using large infos if
required.
* Adds support for encoding push2/pop2 SEH info: a new `SEH_Push2Regs`
pseudo-instruction and `.seh_push2regs` assembly directive.
* Changes the prolog/epilog codegen to support unwind v3. Specifically,
adding pseudos into the epilog (as non-mirror epilogs are permitted) and
placing the pseudos in the prolog before the instruction instead of
after (as v3 measures its offsets from the start rather than the end of
an instruction).
* Adds an unwind v3 pass that chains unwind info if there are too many
epilogs in a function and raises a fatal error if there are too many
instructions in a prolog or epilog.
* Changes the module flag and code gen enums to allow selecting
"Default" vs "v1" vs "v2 best effort" vs "v2 required" vs "v3" unwind
info version.
* Adds new Clang and cc1 flags for selecting the unwind info version and
deprecates the old v2 flag. The default unwind info version stays as v1
unless the EGPR target feature is enabled, in which case v3 is used.
Enabling EGPR when v1 or v2 has been explicitly selected results in an
error.
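
As a rough illustration of the version selection described in the last two
bullets, the sketch below models it as standalone C++. The `UnwindMode` enum
copies the values of `llvm::WinX64EHUnwindMode` from this change. The helper
names `resolveUnwindMode`, `egprConflicts` and `translateLegacyFlag` are
hypothetical and do not exist in the tree, and the boolean parameters stand
in for the triple, target-feature and unwind-table checks that the real code
performs.

```cpp
#include <cassert>
#include <string>

// Same numeric values as llvm::WinX64EHUnwindMode in this commit; redeclared
// here so the sketch compiles without any LLVM headers.
enum class UnwindMode {
  V1 = 0,
  V2BestEffort = 1,
  V2Required = 2,
  V3 = 3,
  Default = 4
};

// Hypothetical helper: pick the module-wide version. Default becomes V3 only
// for Windows x64 with EGPR enabled; otherwise it becomes V1. An explicit
// request is returned unchanged.
UnwindMode resolveUnwindMode(UnwindMode Requested, bool IsWin64,
                             bool HasEGPR) {
  UnwindMode Resolved = Requested;
  if (Requested == UnwindMode::Default)
    Resolved = (IsWin64 && HasEGPR) ? UnwindMode::V3 : UnwindMode::V1;
  assert(Resolved != UnwindMode::Default && "Default must be resolved");
  return Resolved;
}

// Hypothetical helper: true when an explicit v1/v2 request conflicts with
// EGPR for a function that needs an unwind table entry. Functions without
// unwind info are exempt.
bool egprConflicts(UnwindMode Requested, bool IsWin64, bool HasEGPR,
                   bool NeedsUnwindTable) {
  return IsWin64 && HasEGPR && NeedsUnwindTable &&
         Requested != UnwindMode::Default && Requested != UnwindMode::V3;
}

// Hypothetical helper: translate a legacy -fwinx64-eh-unwindv2= value.
// Returns false for an unrecognized value. "disabled" forwards nothing to
// cc1, so the mode is left as Default.
bool translateLegacyFlag(const std::string &Val, UnwindMode &Out) {
  if (Val == "best-effort") {
    Out = UnwindMode::V2BestEffort;
    return true;
  }
  if (Val == "required") {
    Out = UnwindMode::V2Required;
    return true;
  }
  if (Val == "disabled") {
    Out = UnwindMode::Default;
    return true;
  }
  return false;
}
```

Under these assumptions, `resolveUnwindMode(UnwindMode::Default, true, true)`
yields `UnwindMode::V3`, while `egprConflicts(UnwindMode::V1, true, true,
true)` is true, which corresponds to the "EGPR target feature requires unwind
version 3" error.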

Windows 11 [build
29576](https://learn.microsoft.com/en-us/windows-insider/release-notes/experimental-future-platforms/preview-build-29576-1000)
and above support unwind v3 in user-mode, but do not yet support large
info.

Added: 
    clang/test/CodeGen/winx64-eh-unwind-egpr.c
    clang/test/Driver/winx64-eh-unwind.c
    llvm/lib/Target/X86/X86WinEHUnwindV3.cpp
    llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh-v3.ll
    llvm/test/CodeGen/X86/win64-eh-unwindv3-egpr-required.ll
    llvm/test/CodeGen/X86/win64-eh-unwindv3-funclet-prolog.ll
    llvm/test/CodeGen/X86/win64-eh-unwindv3-push2pop2.ll
    llvm/test/CodeGen/X86/win64-eh-unwindv3-split.ll
    llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-epilog-ops.mir
    llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-prolog-ops.mir
    llvm/test/CodeGen/X86/win64-eh-unwindv3.ll
    llvm/test/MC/COFF/seh-unwindv3-error.s
    llvm/test/MC/COFF/seh-unwindv3-inheritance.s
    llvm/test/MC/COFF/seh-unwindv3-large.s
    llvm/test/MC/COFF/seh-unwindv3-nonmirror.s
    llvm/test/MC/COFF/seh-unwindv3-pool-sharing.s
    llvm/test/MC/COFF/seh-unwindv3.s
    llvm/unittests/MC/WODRoundTripTest.cpp

Modified: 
    clang/docs/ReleaseNotes.rst
    clang/include/clang/Basic/CodeGenOptions.def
    clang/include/clang/Basic/CodeGenOptions.h
    clang/include/clang/Options/Options.td
    clang/lib/CodeGen/CodeGenModule.cpp
    clang/lib/Driver/ToolChains/Clang.cpp
    clang/test/CodeGen/epilog-unwind.c
    clang/test/Driver/cl-options.c
    llvm/docs/ReleaseNotes.md
    llvm/include/llvm/IR/Module.h
    llvm/include/llvm/MC/MCStreamer.h
    llvm/include/llvm/MC/MCWin64EH.h
    llvm/include/llvm/MC/MCWinEH.h
    llvm/include/llvm/Support/CodeGen.h
    llvm/lib/IR/Module.cpp
    llvm/lib/MC/MCAsmStreamer.cpp
    llvm/lib/MC/MCStreamer.cpp
    llvm/lib/MC/MCWin64EH.cpp
    llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
    llvm/lib/Target/X86/CMakeLists.txt
    llvm/lib/Target/X86/X86.h
    llvm/lib/Target/X86/X86AsmPrinter.cpp
    llvm/lib/Target/X86/X86FrameLowering.cpp
    llvm/lib/Target/X86/X86InstrCompiler.td
    llvm/lib/Target/X86/X86MCInstLower.cpp
    llvm/lib/Target/X86/X86TargetMachine.cpp
    llvm/lib/Target/X86/X86WinEHUnwindV2.cpp
    llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
    llvm/test/DebugInfo/COFF/apx-egpr.ll
    llvm/test/MC/AsmParser/seh-directive-errors.s
    llvm/unittests/MC/CMakeLists.txt

Removed: 
    


################################################################################
diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index a00f7212512d4..ad36eb7c7a5b0 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -839,6 +839,30 @@ Windows Support
 - ``-fmacro-prefix-map=`` (``-ffile-prefix-map=``) now affects an anonymous namespace hash generation
   for the MSVC targets and allows deterministic symbol mangling for reproducible builds.
 
+- Added the ``-fwinx64-eh-unwind=`` flag to select the x64 Windows unwind info
+  version (``v1``, ``v2-best-effort``, ``v2-required``, or ``v3``). The legacy
+  ``-fwinx64-eh-unwindv2=`` flag is deprecated; it is still accepted and mapped
+  onto the new flag as follows:
+
+  .. list-table::
+     :header-rows: 1
+
+     * - Legacy ``-fwinx64-eh-unwindv2=``
+       - New ``-fwinx64-eh-unwind=``
+     * - ``disabled``
+       - ``v1`` (default; no flag forwarded)
+     * - ``best-effort``
+       - ``v2-best-effort``
+     * - ``required``
+       - ``v2-required``
+
+  The MSVC-compatible ``/d2epilogunwind`` and ``/d2epilogunwindrequirev2``
+  options map to ``v2-best-effort`` and ``v2-required`` respectively.
+
+- When targeting Windows x64 with EGPR (`-mapx-features=egpr`), Clang now
+  automatically enables V3 unwind info (`-fwinx64-eh-unwind=v3`) if no
+  explicit unwind version was specified.
+
 LoongArch Support
 ^^^^^^^^^^^^^^^^^
 

diff  --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 6cce4ada1dfd1..073daeeb4b87a 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -521,10 +521,9 @@ CODEGENOPT(ResMayAlias, 1, 0, Benign)
 /// Assume that all resources are bound if enabled
 CODEGENOPT(AllResourcesBound, 1, 0, Benign)
 
-/// Controls how unwind v2 (epilog) information should be generated for x64
-/// Windows.
-ENUM_CODEGENOPT(WinX64EHUnwindV2, WinX64EHUnwindV2Mode,
-                2, WinX64EHUnwindV2Mode::Disabled, Benign)
+/// Controls the x64 Windows unwind info version.
+ENUM_CODEGENOPT(WinX64EHUnwind, WinX64EHUnwindMode, 3,
+                WinX64EHUnwindMode::Default, Benign)
 
 /// Controls the mechanism used for Control Flow Guard (CFG) on Windows.
 ENUM_CODEGENOPT(WinControlFlowGuardMechanism, ControlFlowGuardMechanism,

diff  --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index e43112b4bb98b..835ea728e9c56 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -66,7 +66,7 @@ class CodeGenOptionsBase {
   using AsanDtorKind = llvm::AsanDtorKind;
   using VectorLibrary = llvm::driver::VectorLibrary;
   using ZeroCallUsedRegsKind = llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind;
-  using WinX64EHUnwindV2Mode = llvm::WinX64EHUnwindV2Mode;
+  using WinX64EHUnwindMode = llvm::WinX64EHUnwindMode;
   using ControlFlowGuardMechanism = llvm::ControlFlowGuardMechanism;
 
   using DebugCompressionType = llvm::DebugCompressionType;

diff  --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 4fd892e58df86..a39ef7793d876 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -2341,14 +2341,20 @@ defm assume_nothrow_exception_dtor: BoolFOption<"assume-nothrow-exception-dtor",
   LangOpts<"AssumeNothrowExceptionDtor">, DefaultFalse,
   PosFlag<SetTrue, [], [ClangOption, CC1Option], "Assume that exception objects' destructors are non-throwing">,
   NegFlag<SetFalse>>;
-def winx64_eh_unwindv2
-    : Joined<["-"], "fwinx64-eh-unwindv2=">, Group<f_Group>,
+def winx64_eh_unwind_EQ
+    : Joined<["-"], "fwinx64-eh-unwind=">, Group<f_Group>,
     Visibility<[ClangOption, CC1Option]>,
-      HelpText<"Generate unwind v2 (epilog) information for x64 Windows">,
-      Values<"disabled,best-effort,required">,
-      NormalizedValues<["Disabled", "BestEffort", "Required"]>,
-      NormalizedValuesScope<"llvm::WinX64EHUnwindV2Mode">,
-      MarshallingInfoEnum<CodeGenOpts<"WinX64EHUnwindV2">, "Disabled">;
+      HelpText<"Set x64 Windows unwind info version">,
+      Values<"v1,v2-best-effort,v2-required,v3">,
+      NormalizedValues<["V1", "V2BestEffort", "V2Required", "V3"]>,
+      NormalizedValuesScope<"llvm::WinX64EHUnwindMode">,
+      MarshallingInfoEnum<CodeGenOpts<"WinX64EHUnwind">, "Default">;
+// Legacy flag — driver only, not passed to cc1. The driver translates it.
+def winx64_eh_unwindv2_EQ
+    : Joined<["-"], "fwinx64-eh-unwindv2=">, Group<f_Group>,
+    Visibility<[ClangOption]>,
+      HelpText<"(Legacy) Generate unwind v2 (epilog) information for x64 Windows">,
+      Values<"disabled,best-effort,required">;
 def win_cfg_mechanism
     : Joined<["-"], "fwin-cfg-mechanism=">, Group<f_Group>,
     Visibility<[ClangOption, CC1Option]>,

diff  --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 432e60e84fa92..41049d85121be 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1592,11 +1592,24 @@ void CodeGenModule::Release() {
     getModule().addModuleFlag(llvm::Module::Warning, "import-call-optimization",
                               1);
 
-  // Enable unwind v2 (epilog).
-  if (CodeGenOpts.getWinX64EHUnwindV2() != llvm::WinX64EHUnwindV2Mode::Disabled)
-    getModule().addModuleFlag(
-        llvm::Module::Warning, "winx64-eh-unwindv2",
-        static_cast<unsigned>(CodeGenOpts.getWinX64EHUnwindV2()));
+  // Enable unwind v2/v3.
+  // Set the module flag here based on the user's requested mode (or auto-
+  // promote to V3 when EGPR is enabled module-wide, since V1/V2 cannot encode
+  // R16-R31). The per-function EGPR compatibility check is performed in
+  // EmitGlobalFunctionDefinition so that `__attribute__((target("egpr")))`
+  // and `nounwind` are respected.
+
+  auto UnwindMode = CodeGenOpts.getWinX64EHUnwind();
+  if (UnwindMode == llvm::WinX64EHUnwindMode::Default) {
+    if (T.isOSWindows() && T.isX86_64() &&
+        Context.getTargetInfo().hasFeature("egpr"))
+      UnwindMode = llvm::WinX64EHUnwindMode::V3;
+    else
+      UnwindMode = llvm::WinX64EHUnwindMode::V1;
+  }
+  if (UnwindMode != llvm::WinX64EHUnwindMode::V1)
+    getModule().addModuleFlag(llvm::Module::Warning, "winx64-eh-unwind",
+                              static_cast<unsigned>(UnwindMode));
 
   // Indicate whether this Module was compiled with -fopenmp
   if (getLangOpts().OpenMP && !getLangOpts().OpenMPSimd)
@@ -6928,6 +6941,38 @@ void CodeGenModule::EmitGlobalFunctionDefinition(GlobalDecl GD,
 
   SetLLVMFunctionAttributesForDefinition(D, Fn);
 
+  // EGPR (R16-R31) requires V3 unwind info on Windows x64 because V1/V2 cannot
+  // encode extended register numbers. Check per-function so that `target`
+  // attribute and `nounwind`/no-unwind-table functions are respected.
+  if (getTriple().isOSWindows() && getTriple().isX86_64()) {
+    auto UnwindMode = CodeGenOpts.getWinX64EHUnwind();
+    if (UnwindMode != llvm::WinX64EHUnwindMode::Default &&
+        UnwindMode != llvm::WinX64EHUnwindMode::V3 &&
+        Fn->needsUnwindTableEntry()) {
+      bool HasEGPR = false;
+      if (Fn->hasFnAttribute("target-features")) {
+        StringRef Feats =
+            Fn->getFnAttribute("target-features").getValueAsString();
+        SmallVector<StringRef, 16> Tokens;
+        Feats.split(Tokens, ',', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
+        for (StringRef Tok : Tokens) {
+          if (Tok == "+egpr")
+            HasEGPR = true;
+          else if (Tok == "-egpr")
+            HasEGPR = false;
+        }
+      } else {
+        HasEGPR = Context.getTargetInfo().hasFeature("egpr");
+      }
+      if (HasEGPR) {
+        unsigned DiagID = Diags.getCustomDiagID(
+            DiagnosticsEngine::Error,
+            "EGPR target feature requires unwind version 3");
+        Diags.Report(D->getLocation(), DiagID);
+      }
+    }
+  }
+
   auto GetPriority = [this](const auto *Attr) -> int {
     Expr *E = Attr->getPriority();
     if (E) {

diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8a0efd70e6c0d..4e29179f57d95 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7707,8 +7707,24 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
     }
   }
 
-  // Unwind v2 (epilog) information for x64 Windows.
-  Args.AddLastArg(CmdArgs, options::OPT_winx64_eh_unwindv2);
+  // Unwind information version for x64 Windows.
+  // Forward the new unified flag if present, otherwise translate legacy flags.
+  if (const Arg *A = Args.getLastArg(options::OPT_winx64_eh_unwind_EQ)) {
+    A->claim();
+    CmdArgs.push_back(
+        Args.MakeArgString(Twine("-fwinx64-eh-unwind=") + A->getValue()));
+  } else if (const Arg *A =
+                 Args.getLastArg(options::OPT_winx64_eh_unwindv2_EQ)) {
+    A->claim();
+    StringRef Val = A->getValue();
+    if (Val == "best-effort")
+      CmdArgs.push_back("-fwinx64-eh-unwind=v2-best-effort");
+    else if (Val == "required")
+      CmdArgs.push_back("-fwinx64-eh-unwind=v2-required");
+    // "disabled" maps to v1 default, nothing to forward.
+    else if (Val != "disabled")
+      D.Diag(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
+  }
 
   // Control Flow Guard mechanism for Windows.
   Args.AddLastArg(CmdArgs, options::OPT_win_cfg_mechanism);
@@ -8815,11 +8831,12 @@ void Clang::AddClangCLArgs(const ArgList &Args, types::ID InputType,
   if (Args.hasArg(options::OPT__SLASH_kernel))
     CmdArgs.push_back("-fms-kernel");
 
-  // Unwind v2 (epilog) information for x64 Windows.
+  // Unwind v2 (epilog) information for x64 Windows. MSVC's behavior is not
+  // order-dependent: /d2epilogunwindrequirev2 always wins over /d2epilogunwind.
   if (Args.hasArg(options::OPT__SLASH_d2epilogunwindrequirev2))
-    CmdArgs.push_back("-fwinx64-eh-unwindv2=required");
+    CmdArgs.push_back("-fwinx64-eh-unwind=v2-required");
   else if (Args.hasArg(options::OPT__SLASH_d2epilogunwind))
-    CmdArgs.push_back("-fwinx64-eh-unwindv2=best-effort");
+    CmdArgs.push_back("-fwinx64-eh-unwind=v2-best-effort");
 
   // Handle the various /guard options. We don't immediately push back clang
   // args since there are /d2 args that can modify the behavior of /guard:cf.

diff  --git a/clang/test/CodeGen/epilog-unwind.c b/clang/test/CodeGen/epilog-unwind.c
index b2f7497b455b6..55c22a55ad374 100644
--- a/clang/test/CodeGen/epilog-unwind.c
+++ b/clang/test/CodeGen/epilog-unwind.c
@@ -1,11 +1,13 @@
 // RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s -check-prefix=DISABLED
-// RUN: %clang_cc1 -fwinx64-eh-unwindv2=disabled -emit-llvm %s -o - | FileCheck %s -check-prefix=DISABLED
-// RUN: %clang_cc1 -fwinx64-eh-unwindv2=best-effort -emit-llvm %s -o - | FileCheck %s -check-prefix=BESTEFFORT
-// RUN: %clang_cc1 -fwinx64-eh-unwindv2=required -emit-llvm %s -o - | FileCheck %s -check-prefix=REQUIRED
-// RUN: %clang -fwinx64-eh-unwindv2=best-effort -S -emit-llvm %s -o - | FileCheck %s -check-prefix=BESTEFFORT
+// RUN: %clang_cc1 -fwinx64-eh-unwind=v1 -emit-llvm %s -o - | FileCheck %s -check-prefix=DISABLED
+// RUN: %clang_cc1 -fwinx64-eh-unwind=v2-best-effort -emit-llvm %s -o - | FileCheck %s -check-prefix=BESTEFFORT
+// RUN: %clang_cc1 -fwinx64-eh-unwind=v2-required -emit-llvm %s -o - | FileCheck %s -check-prefix=REQUIRED
+// RUN: %clang_cc1 -fwinx64-eh-unwind=v3 -emit-llvm %s -o - | FileCheck %s -check-prefix=V3
+// RUN: %clang -fwinx64-eh-unwind=v2-best-effort -S -emit-llvm %s -o - | FileCheck %s -check-prefix=BESTEFFORT
 
 void f(void) {}
 
-// BESTEFFORT: !"winx64-eh-unwindv2", i32 1}
-// REQUIRED: !"winx64-eh-unwindv2", i32 2}
-// DISABLED-NOT: "winx64-eh-unwindv2"
+// BESTEFFORT: !"winx64-eh-unwind", i32 1}
+// REQUIRED: !"winx64-eh-unwind", i32 2}
+// V3: !"winx64-eh-unwind", i32 3}
+// DISABLED-NOT: "winx64-eh-unwind"

diff  --git a/clang/test/CodeGen/winx64-eh-unwind-egpr.c b/clang/test/CodeGen/winx64-eh-unwind-egpr.c
new file mode 100644
index 0000000000000..f87888db6610b
--- /dev/null
+++ b/clang/test/CodeGen/winx64-eh-unwind-egpr.c
@@ -0,0 +1,38 @@
+// REQUIRES: x86-registered-target
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -emit-llvm %s -o - | FileCheck %s -check-prefix=EGPR-DEFAULT
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -fwinx64-eh-unwind=v3 -emit-llvm %s -o - | FileCheck %s -check-prefix=EGPR-V3
+// RUN: not %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -fwinx64-eh-unwind=v1 -fexceptions -emit-llvm %s -o - 2>&1 | FileCheck %s -check-prefix=EGPR-V1-ERROR
+// RUN: not %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -fwinx64-eh-unwind=v2-best-effort -fexceptions -emit-llvm %s -o - 2>&1 | FileCheck %s -check-prefix=EGPR-V2-ERROR
+// RUN: not %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -fwinx64-eh-unwind=v2-required -fexceptions -emit-llvm %s -o - 2>&1 | FileCheck %s -check-prefix=EGPR-V2-ERROR
+// Per-function check should NOT fire when the function is nounwind (no unwind
+// info is emitted) — this run compiles successfully without -fexceptions.
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +egpr -fwinx64-eh-unwind=v1 -emit-llvm %s -o - | FileCheck %s -check-prefix=EGPR-V1-NOUNWIND
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -emit-llvm %s -o - | FileCheck %s -check-prefix=NO-EGPR
+// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -target-feature +egpr -emit-llvm %s -o - | FileCheck %s -check-prefix=EGPR-LINUX
+
+void g(void);
+void f(void) { g(); }
+
+// EGPR on Windows x64 with no explicit unwind mode should auto-promote to V3.
+// EGPR-DEFAULT: !"winx64-eh-unwind", i32 3}
+
+// EGPR on Windows x64 with explicit V3 should still emit V3.
+// EGPR-V3: !"winx64-eh-unwind", i32 3}
+
+// EGPR on Windows x64 with explicit V1 + exceptions should produce an error
+// (the function needs unwind info).
+// EGPR-V1-ERROR: error: EGPR target feature requires unwind version 3
+
+// EGPR on Windows x64 with explicit V2 + exceptions should produce an error.
+// EGPR-V2-ERROR: error: EGPR target feature requires unwind version 3
+
+// EGPR on Windows x64 with explicit V1 and no exceptions: the function is
+// nounwind, so no error and no module flag (V1 is the default).
+// EGPR-V1-NOUNWIND-NOT: error
+// EGPR-V1-NOUNWIND-NOT: "winx64-eh-unwind"
+
+// Without EGPR on Windows x64, default should be V1 (no flag emitted).
+// NO-EGPR-NOT: "winx64-eh-unwind"
+
+// EGPR on non-Windows should not emit the flag.
+// EGPR-LINUX-NOT: "winx64-eh-unwind"

diff  --git a/clang/test/Driver/cl-options.c b/clang/test/Driver/cl-options.c
index c0f57ae768252..63a423f5b3114 100644
--- a/clang/test/Driver/cl-options.c
+++ b/clang/test/Driver/cl-options.c
@@ -856,11 +856,20 @@
 // ARM64EC_OVERRIDE: warning: /arm64EC has been overridden by specified target: x86_64-pc-windows-msvc; option ignored
 
 // RUN: %clang_cl /d2epilogunwind /c -### -- %s 2>&1 | FileCheck %s --check-prefix=EPILOGUNWIND
-// EPILOGUNWIND: -fwinx64-eh-unwindv2=best-effort
+// EPILOGUNWIND: -fwinx64-eh-unwind=v2-best-effort
 
 // RUN: %clang_cl /d2epilogunwindrequirev2 /c -### -- %s 2>&1 | FileCheck %s --check-prefix=EPILOGUNWINDREQUIREV2
+// /d2epilogunwindrequirev2 always wins over /d2epilogunwind, regardless of
+// the order they appear on the command line (matches MSVC).
+// RUN: %clang_cl /d2epilogunwind /d2epilogunwindrequirev2 /c -### -- %s 2>&1 | FileCheck %s --check-prefix=EPILOGUNWINDREQUIREV2
 // RUN: %clang_cl /d2epilogunwindrequirev2 /d2epilogunwind /c -### -- %s 2>&1 | FileCheck %s --check-prefix=EPILOGUNWINDREQUIREV2
-// EPILOGUNWINDREQUIREV2: -fwinx64-eh-unwindv2=require
+// EPILOGUNWINDREQUIREV2: -fwinx64-eh-unwind=v2-required
+
+// RUN: not %clang --target=x86_64-windows-msvc -fsyntax-only -fwinx64-eh-unwind=invalid %s 2>&1 | FileCheck %s --check-prefix=UNWIND_INVALID
+// UNWIND_INVALID: error: invalid value 'invalid' in '-fwinx64-eh-unwind=invalid'
+
+// RUN: not %clang --target=x86_64-windows-msvc -fwinx64-eh-unwindv2=invalid -### %s 2>&1 | FileCheck %s --check-prefix=UNWINDV2_INVALID
+// UNWINDV2_INVALID: error: invalid value 'invalid' in '-fwinx64-eh-unwindv2=invalid'
 
 // RUN: %clang_cl /funcoverride:override_me1 /funcoverride:override_me2 /c -### -- %s 2>&1 | FileCheck %s --check-prefix=FUNCOVERRIDE
 // FUNCOVERRIDE: -loader-replaceable-function=override_me1

diff  --git a/clang/test/Driver/winx64-eh-unwind.c b/clang/test/Driver/winx64-eh-unwind.c
new file mode 100644
index 0000000000000..036c44aeed87c
--- /dev/null
+++ b/clang/test/Driver/winx64-eh-unwind.c
@@ -0,0 +1,22 @@
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwind=v1 %s 2>&1 | FileCheck --check-prefix=V1 %s
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwind=v2-best-effort %s 2>&1 | FileCheck --check-prefix=V2BE %s
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwind=v2-required %s 2>&1 | FileCheck --check-prefix=V2REQ %s
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwind=v3 %s 2>&1 | FileCheck --check-prefix=V3 %s
+
+// Legacy -fwinx64-eh-unwindv2= translation.
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwindv2=best-effort %s 2>&1 | FileCheck --check-prefix=V2BE %s
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwindv2=required %s 2>&1 | FileCheck --check-prefix=V2REQ %s
+// Legacy disabled maps to v1 default — no flag should be forwarded.
+// RUN: %clang -### --target=x86_64-windows-msvc -fwinx64-eh-unwindv2=disabled %s 2>&1 | FileCheck --check-prefix=V1DISABLED %s
+
+// MSVC compatibility flags.
+// RUN: %clang_cl -### --target=x86_64-windows-msvc /d2epilogunwind -- %s 2>&1 | FileCheck --check-prefix=V2BE %s
+// RUN: %clang_cl -### --target=x86_64-windows-msvc /d2epilogunwindrequirev2 -- %s 2>&1 | FileCheck --check-prefix=V2REQ %s
+
+// V1:          "-fwinx64-eh-unwind=v1"
+// V1DISABLED-NOT: "-fwinx64-eh-unwind=
+// V2BE:        "-fwinx64-eh-unwind=v2-best-effort"
+// V2REQ:       "-fwinx64-eh-unwind=v2-required"
+// V3:          "-fwinx64-eh-unwind=v3"
+
+void f(void) {}

diff  --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 61b0dd3220285..398e46214dc98 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -262,6 +262,8 @@ Makes programs 10x faster by doing Special New Thing.
   in use. This matches the behaviour of Intel syntax and aids with
   compatibility when changing the default Clang syntax to the Intel syntax.
 
+* EGPR (R16-R31) now requires V3 unwind info on Windows x64. Using EGPR
+  without V3 unwind produces a fatal error.
 * Implemented Win64 APX ABI callee-saved registers: R30 and R31 are now
   treated as non-volatile in the Win64 calling convention when APX is
   available, per the Microsoft x64 calling convention specification.
@@ -271,6 +273,10 @@ Makes programs 10x faster by doing Special New Thing.
   registers across longjmp. A warning is emitted for large functions
   where this reservation may impact performance.
 
+* Added ``.seh_push2regs`` assembly directive for explicitly encoding a
+  two-register push in Windows x64 V3 unwind info. The directive takes two
+  register operands: ``.seh_push2regs %r12, %r13``.
+
 ### Changes to the OCaml bindings
 
 ### Changes to the Python bindings

diff  --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index 2032c0ceb2088..4f7e33969f16f 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -1063,9 +1063,8 @@ class LLVM_ABI Module {
   /// Returns target-abi from MDString, null if target-abi is absent.
   StringRef getTargetABIFromMD();
 
-  /// Get how unwind v2 (epilog) information should be generated for x64
-  /// Windows.
-  WinX64EHUnwindV2Mode getWinX64EHUnwindV2Mode() const;
+  /// Get how unwind information should be generated for x64 Windows.
+  WinX64EHUnwindMode getWinX64EHUnwindMode() const;
 
   /// Gets the Control Flow Guard mode.
   ControlFlowGuardMode getControlFlowGuardMode() const;

diff  --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 3d249408d07a8..750e3269cc0e3 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -236,6 +236,9 @@ class LLVM_ABI MCStreamer {
   WinEH::FrameInfo *CurrentWinFrameInfo;
   size_t CurrentProcWinFrameInfoStartIndex;
 
+  /// Default unwind version for new WinCFI frames.
+  uint8_t DefaultWinCFIUnwindVersion = 1;
+
   /// This is stack of current and previous section values saved by
   /// pushSection.
   SmallVector<std::pair<MCSectionSubPair, MCSectionSubPair>, 4> SectionStack;
@@ -1065,6 +1068,8 @@ class LLVM_ABI MCStreamer {
   virtual void emitWinCFIFuncletOrFuncEnd(SMLoc Loc = SMLoc());
   virtual void emitWinCFISplitChained(SMLoc Loc = SMLoc());
   virtual void emitWinCFIPushReg(MCRegister Register, SMLoc Loc = SMLoc());
+  virtual void emitWinCFIPush2Regs(MCRegister Reg1, MCRegister Reg2,
+                                   SMLoc Loc = SMLoc());
   virtual void emitWinCFISetFrame(MCRegister Register, unsigned Offset,
                                   SMLoc Loc = SMLoc());
   virtual void emitWinCFIAllocStack(unsigned Size, SMLoc Loc = SMLoc());
@@ -1078,6 +1083,11 @@ class LLVM_ABI MCStreamer {
   virtual void emitWinCFIEndEpilogue(SMLoc Loc = SMLoc());
   virtual void emitWinCFIUnwindV2Start(SMLoc Loc = SMLoc());
   virtual void emitWinCFIUnwindVersion(uint8_t Version, SMLoc Loc = SMLoc());
+
+  /// Set the default unwind version for new WinCFI frames.
+  void setDefaultWinCFIUnwindVersion(uint8_t V) {
+    DefaultWinCFIUnwindVersion = V;
+  }
   virtual void emitWinEHHandler(const MCSymbol *Sym, bool Unwind, bool Except,
                                 SMLoc Loc = SMLoc());
   virtual void emitWinEHHandlerData(SMLoc Loc = SMLoc());

diff  --git a/llvm/include/llvm/MC/MCWin64EH.h b/llvm/include/llvm/MC/MCWin64EH.h
index f9af9857d6867..dcd6c6497d111 100644
--- a/llvm/include/llvm/MC/MCWin64EH.h
+++ b/llvm/include/llvm/MC/MCWin64EH.h
@@ -26,6 +26,9 @@ struct Instruction {
   static WinEH::Instruction PushNonVol(MCSymbol *L, unsigned Reg) {
     return WinEH::Instruction(Win64EH::UOP_PushNonVol, L, Reg, -1);
   }
+  static WinEH::Instruction Push2(MCSymbol *L, unsigned Reg1, unsigned Reg2) {
+    return WinEH::Instruction(Win64EH::UOP_Push2, L, Reg1, Reg2, -1);
+  }
   static WinEH::Instruction Alloc(MCSymbol *L, unsigned Size) {
     return WinEH::Instruction(Size > 128 ? UOP_AllocLarge : UOP_AllocSmall, L,
                               -1, Size);
@@ -70,6 +73,9 @@ class ARM64UnwindEmitter : public WinEH::UnwindEmitter {
   void EmitUnwindInfo(MCStreamer &Streamer, WinEH::FrameInfo *FI,
                       bool HandlerData) const override;
 };
+/// Encode a single WinEH::Instruction as V3 WOD bytes.
+/// Appends encoded bytes to Out.
+void EncodeWOD(const WinEH::Instruction &Inst, SmallVectorImpl<uint8_t> &Out);
 } // namespace Win64EH
 } // namespace llvm
 

diff  --git a/llvm/include/llvm/MC/MCWinEH.h b/llvm/include/llvm/MC/MCWinEH.h
index 9fbfd34da8e64..3d96f1d842fae 100644
--- a/llvm/include/llvm/MC/MCWinEH.h
+++ b/llvm/include/llvm/MC/MCWinEH.h
@@ -23,17 +23,22 @@ namespace WinEH {
 struct Instruction {
   const MCSymbol *Label;
   unsigned Offset;
-  unsigned Register;
-  unsigned Operation;
+  uint16_t Register;
+  uint16_t Register2; // For 2-register ops (e.g. PUSH2)
+  uint8_t Operation;
 
   Instruction(unsigned Op, MCSymbol *L, unsigned Reg, unsigned Off)
-    : Label(L), Offset(Off), Register(Reg), Operation(Op) {}
+      : Label(L), Offset(Off), Register(Reg), Register2(0), Operation(Op) {}
+
+  Instruction(unsigned Op, MCSymbol *L, unsigned Reg1, unsigned Reg2,
+              unsigned Off)
+      : Label(L), Offset(Off), Register(Reg1), Register2(Reg2), Operation(Op) {}
 
   bool operator==(const Instruction &I) const {
     // Check whether two instructions refer to the same operation
     // applied at a different spot (i.e. pointing at a different label).
     return Offset == I.Offset && Register == I.Register &&
-           Operation == I.Operation;
+           Register2 == I.Register2 && Operation == I.Operation;
   }
   bool operator!=(const Instruction &I) const { return !(*this == I); }
 };

diff  --git a/llvm/include/llvm/Support/CodeGen.h b/llvm/include/llvm/Support/CodeGen.h
index 52f00c3258c0f..62be46ffb8bd3 100644
--- a/llvm/include/llvm/Support/CodeGen.h
+++ b/llvm/include/llvm/Support/CodeGen.h
@@ -164,13 +164,14 @@ namespace llvm {
     Invalid = 2, ///< Not used.
   };
 
-  enum class WinX64EHUnwindV2Mode {
-    // Don't use unwind v2 (i.e., use v1).
-    Disabled = 0,
-    // Use unwind v2 here possible, otherwise fallback to v1.
-    BestEffort = 1,
-    // Use unwind v2 everywhere, otherwise raise an error.
-    Required = 2,
+  enum class WinX64EHUnwindMode {
+    Default = 4, // Toolchain default/auto.
+                 // Using '4' to avoid renumbering the existing values.
+
+    V1 = 0,           // V1 unwind info.
+    V2BestEffort = 1, // V2 where possible, fall back to V1.
+    V2Required = 2,   // V2 required — error if a function cannot use V2.
+    V3 = 3,           // V3 unwind info.
   };
 
   enum class ControlFlowGuardMode {

diff  --git a/llvm/lib/IR/Module.cpp b/llvm/lib/IR/Module.cpp
index faa0fbd3ab81c..40ef7e354da72 100644
--- a/llvm/lib/IR/Module.cpp
+++ b/llvm/lib/IR/Module.cpp
@@ -957,11 +957,18 @@ StringRef Module::getTargetABIFromMD() {
   return TargetABI;
 }
 
-WinX64EHUnwindV2Mode Module::getWinX64EHUnwindV2Mode() const {
-  Metadata *MD = getModuleFlag("winx64-eh-unwindv2");
-  if (auto *CI = mdconst::dyn_extract_or_null<ConstantInt>(MD))
-    return static_cast<WinX64EHUnwindV2Mode>(CI->getZExtValue());
-  return WinX64EHUnwindV2Mode::Disabled;
+WinX64EHUnwindMode Module::getWinX64EHUnwindMode() const {
+  // Check the new unified flag first.
+  if (Metadata *MD = getModuleFlag("winx64-eh-unwind")) {
+    if (auto *CI = mdconst::dyn_extract_or_null<ConstantInt>(MD))
+      return static_cast<WinX64EHUnwindMode>(CI->getZExtValue());
+  }
+  // Fall back to the legacy V2 flag.
+  if (Metadata *MD = getModuleFlag("winx64-eh-unwindv2")) {
+    if (auto *CI = mdconst::dyn_extract_or_null<ConstantInt>(MD))
+      return static_cast<WinX64EHUnwindMode>(CI->getZExtValue());
+  }
+  return WinX64EHUnwindMode::V1;
 }
 
 ControlFlowGuardMode Module::getControlFlowGuardMode() const {

diff  --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 05e3dc28a502b..68929ddc135f3 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -419,6 +419,8 @@ class MCAsmStreamer final : public MCAsmBaseStreamer {
   void emitWinCFIFuncletOrFuncEnd(SMLoc Loc) override;
   void emitWinCFISplitChained(SMLoc Loc) override;
   void emitWinCFIPushReg(MCRegister Register, SMLoc Loc) override;
+  void emitWinCFIPush2Regs(MCRegister Reg1, MCRegister Reg2,
+                           SMLoc Loc) override;
   void emitWinCFISetFrame(MCRegister Register, unsigned Offset,
                           SMLoc Loc) override;
   void emitWinCFIAllocStack(unsigned Size, SMLoc Loc) override;
@@ -2391,6 +2393,17 @@ void MCAsmStreamer::emitWinCFIPushReg(MCRegister Register, SMLoc Loc) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitWinCFIPush2Regs(MCRegister Reg1, MCRegister Reg2,
+                                        SMLoc Loc) {
+  MCStreamer::emitWinCFIPush2Regs(Reg1, Reg2, Loc);
+
+  OS << "\t.seh_push2regs ";
+  InstPrinter->printRegName(OS, Reg1);
+  OS << ", ";
+  InstPrinter->printRegName(OS, Reg2);
+  EmitEOL();
+}
+
 void MCAsmStreamer::emitWinCFISetFrame(MCRegister Register, unsigned Offset,
                                        SMLoc Loc) {
   MCStreamer::emitWinCFISetFrame(Register, Offset, Loc);

diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index c4f607a8c11f7..a02ba1fe31f8c 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -805,6 +805,8 @@ void MCStreamer::emitWinCFIStartProc(const MCSymbol *Symbol, SMLoc Loc) {
   CurrentWinFrameInfo = WinFrameInfos.back().get();
   CurrentWinFrameInfo->TextSection = getCurrentSectionOnly();
   CurrentWinFrameInfo->FunctionLoc = Loc;
+  // Inherit the module-wide default unwind version.
+  CurrentWinFrameInfo->Version = DefaultWinCFIUnwindVersion;
 }
 
 void MCStreamer::emitWinCFIEndProc(SMLoc Loc) {
@@ -962,7 +964,35 @@ void MCStreamer::emitWinCFIPushReg(MCRegister Register, SMLoc Loc) {
 
   WinEH::Instruction Inst = Win64EH::Instruction::PushNonVol(
       Label, encodeSEHRegNum(Context, Register));
-  CurFrame->Instructions.push_back(Inst);
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_pushreg inside epilog requires unwind v3");
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  } else {
+    CurFrame->Instructions.push_back(Inst);
+  }
+}
+
+void MCStreamer::emitWinCFIPush2Regs(MCRegister Reg1, MCRegister Reg2,
+                                     SMLoc Loc) {
+  WinEH::FrameInfo *CurFrame = EnsureValidWinFrameInfo(Loc);
+  if (!CurFrame)
+    return;
+
+  // UOP_Push2 is V3-only - reject for V1/V2.
+  if (CurFrame->Version < 3)
+    return getContext().reportError(
+        Loc, ".seh_push2regs is only supported for unwind v3");
+
+  MCSymbol *Label = emitCFILabel();
+
+  WinEH::Instruction Inst = Win64EH::Instruction::Push2(
+      Label, encodeSEHRegNum(Context, Reg1), encodeSEHRegNum(Context, Reg2));
+  if (CurrentWinEpilog)
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  else
+    CurFrame->Instructions.push_back(Inst);
 }
 
 void MCStreamer::emitWinCFISetFrame(MCRegister Register, unsigned Offset,
@@ -970,7 +1000,7 @@ void MCStreamer::emitWinCFISetFrame(MCRegister Register, unsigned Offset,
   WinEH::FrameInfo *CurFrame = EnsureValidWinFrameInfo(Loc);
   if (!CurFrame)
     return;
-  if (CurFrame->LastFrameInst >= 0)
+  if (!CurrentWinEpilog && CurFrame->LastFrameInst >= 0)
     return getContext().reportError(
         Loc, "frame register and offset can be set at most once");
   if (Offset & 0x0F)
@@ -983,8 +1013,15 @@ void MCStreamer::emitWinCFISetFrame(MCRegister Register, unsigned Offset,
 
   WinEH::Instruction Inst = Win64EH::Instruction::SetFPReg(
       Label, encodeSEHRegNum(getContext(), Register), Offset);
-  CurFrame->LastFrameInst = CurFrame->Instructions.size();
-  CurFrame->Instructions.push_back(Inst);
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_setframe inside epilog requires unwind v3");
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  } else {
+    CurFrame->LastFrameInst = CurFrame->Instructions.size();
+    CurFrame->Instructions.push_back(Inst);
+  }
 }
 
 void MCStreamer::emitWinCFIAllocStack(unsigned Size, SMLoc Loc) {
@@ -1001,7 +1038,14 @@ void MCStreamer::emitWinCFIAllocStack(unsigned Size, SMLoc Loc) {
   MCSymbol *Label = emitCFILabel();
 
   WinEH::Instruction Inst = Win64EH::Instruction::Alloc(Label, Size);
-  CurFrame->Instructions.push_back(Inst);
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_stackalloc inside epilog requires unwind v3");
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  } else {
+    CurFrame->Instructions.push_back(Inst);
+  }
 }
 
 void MCStreamer::emitWinCFISaveReg(MCRegister Register, unsigned Offset,
@@ -1018,7 +1062,14 @@ void MCStreamer::emitWinCFISaveReg(MCRegister Register, unsigned Offset,
 
   WinEH::Instruction Inst = Win64EH::Instruction::SaveNonVol(
       Label, encodeSEHRegNum(Context, Register), Offset);
-  CurFrame->Instructions.push_back(Inst);
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_savereg inside epilog requires unwind v3");
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  } else {
+    CurFrame->Instructions.push_back(Inst);
+  }
 }
 
 void MCStreamer::emitWinCFISaveXMM(MCRegister Register, unsigned Offset,
@@ -1033,13 +1084,29 @@ void MCStreamer::emitWinCFISaveXMM(MCRegister Register, unsigned Offset,
 
   WinEH::Instruction Inst = Win64EH::Instruction::SaveXMM(
       Label, encodeSEHRegNum(Context, Register), Offset);
-  CurFrame->Instructions.push_back(Inst);
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_savexmm inside epilog requires unwind v3");
+    CurrentWinEpilog->Instructions.push_back(Inst);
+  } else {
+    CurFrame->Instructions.push_back(Inst);
+  }
 }
 
 void MCStreamer::emitWinCFIPushFrame(bool Code, SMLoc Loc) {
   WinEH::FrameInfo *CurFrame = EnsureValidWinFrameInfo(Loc);
   if (!CurFrame)
     return;
+  if (CurrentWinEpilog) {
+    if (CurFrame->Version < 3)
+      return getContext().reportError(
+          Loc, ".seh_pushframe inside epilog requires unwind v3");
+    MCSymbol *Label = emitCFILabel();
+    WinEH::Instruction Inst = Win64EH::Instruction::PushMachFrame(Label, Code);
+    CurrentWinEpilog->Instructions.push_back(Inst);
+    return;
+  }
   if (!CurFrame->Instructions.empty())
     return getContext().reportError(
         Loc, "If present, PushMachFrame must be the first UOP");
@@ -1090,7 +1157,7 @@ void MCStreamer::emitWinCFIEndEpilogue(SMLoc Loc) {
     return getContext().reportError(Loc, "Stray .seh_endepilogue in " +
                                              CurFrame->Function->getName());
 
-  if ((CurFrame->Version >= 2) && !CurrentWinEpilog->UnwindV2Start) {
+  if ((CurFrame->Version == 2) && !CurrentWinEpilog->UnwindV2Start) {
     // Set UnwindV2Start to... something... to prevent crashes later.
     CurrentWinEpilog->UnwindV2Start = CurrentWinEpilog->Start;
     getContext().reportError(Loc, "Missing .seh_unwindv2start in " +
@@ -1119,15 +1186,26 @@ void MCStreamer::emitWinCFIUnwindV2Start(SMLoc Loc) {
 }
 
 void MCStreamer::emitWinCFIUnwindVersion(uint8_t Version, SMLoc Loc) {
-  WinEH::FrameInfo *CurFrame = EnsureValidWinFrameInfo(Loc);
-  if (!CurFrame)
+  bool SupportedVersion = (Version >= 1 && Version <= 3);
+
+  // If called outside a proc, set the module-level default.
+  if (!CurrentWinFrameInfo || CurrentWinFrameInfo->End) {
+    if (!SupportedVersion)
+      return getContext().reportError(
+          Loc, "Unsupported version for .seh_unwindversion");
+    setDefaultWinCFIUnwindVersion(Version);
     return;
+  }
+
+  // Per-function override (existing behaviour).
+  WinEH::FrameInfo *CurFrame = CurrentWinFrameInfo;
 
-  if (CurFrame->Version != WinEH::FrameInfo::DefaultVersion)
+  if (CurFrame->Version != DefaultWinCFIUnwindVersion &&
+      CurFrame->Version != WinEH::FrameInfo::DefaultVersion)
     return getContext().reportError(Loc, "Duplicate .seh_unwindversion in " +
                                              CurFrame->Function->getName());
 
-  if (Version != 2)
+  if (!SupportedVersion)
     return getContext().reportError(
         Loc, "Unsupported version specified in .seh_unwindversion in " +
                  CurFrame->Function->getName());
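
Note on the MCStreamer.cpp hunks above: most of the SEH directive handlers
(.seh_pushreg, .seh_setframe, .seh_stackalloc, .seh_savereg, .seh_savexmm) now
share one routing pattern. Inside an epilog the op goes to the epilog's list,
but only for v3. Outside an epilog it goes to the prolog list as before. A
rough standalone sketch of that pattern follows. `Frame`, `Epilog` and
`recordOp` are hypothetical simplifications, not the real WinEH types, and ops
are stored as strings purely for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for WinEH::FrameInfo and its Epilog.
struct Frame {
  uint8_t Version = 1;
  std::vector<std::string> PrologOps;
};
struct Epilog {
  std::vector<std::string> Ops;
};

// Records an op for `Directive`. Returns an empty string on success or the
// diagnostic text if the op is rejected.
std::string recordOp(Frame &F, Epilog *CurrentEpilog,
                     const std::string &Directive) {
  if (CurrentEpilog) {
    // Epilog ops can only be described by v3 unwind info.
    if (F.Version < 3)
      return Directive + " inside epilog requires unwind v3";
    CurrentEpilog->Ops.push_back(Directive);
    return "";
  }
  F.PrologOps.push_back(Directive);
  return "";
}
```

.seh_push2regs differs from this pattern: in the patch it is rejected for any
frame whose version is below 3, whether or not an epilog is open.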

diff --git a/llvm/lib/MC/MCWin64EH.cpp b/llvm/lib/MC/MCWin64EH.cpp
index 32f701019e528..edc54235d8e9e 100644
--- a/llvm/lib/MC/MCWin64EH.cpp
+++ b/llvm/lib/MC/MCWin64EH.cpp
@@ -15,6 +15,7 @@
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSymbol.h"
 #include "llvm/MC/MCValue.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Win64EH.h"
 
 namespace llvm {
@@ -106,6 +107,28 @@ static void EmitAbsDifference(MCStreamer &Streamer, const MCSymbol *LHS,
   Streamer.emitValue(Diff, 1);
 }
 
+/// Emit a 16-bit (2-byte LE) label difference. If the difference is
+/// evaluatable at this point, validate that it fits in [0, UINT16_MAX]
+/// and emit it as a constant; otherwise emit a 16-bit fixup.
+static void EmitAbsDifference16(MCStreamer &Streamer, const MCSymbol *LHS,
+                                const MCSymbol *RHS) {
+  MCContext &Context = Streamer.getContext();
+  const MCExpr *Diff =
+      MCBinaryExpr::createSub(MCSymbolRefExpr::create(LHS, Context),
+                              MCSymbolRefExpr::create(RHS, Context), Context);
+  int64_t Value;
+  if (Diff->evaluateAsAbsolute(
+          Value, static_cast<MCObjectStreamer &>(Streamer).getAssembler())) {
+    if (Value < 0 || Value > UINT16_MAX)
+      Context.reportError(
+          SMLoc(),
+          "Label difference out of 16-bit unsigned range for V3 unwind info");
+  }
+  // Always emit a 2-byte value so subsequent emission stays in sync; if a
+  // diagnostic was reported, the object file will be discarded by the caller.
+  Streamer.emitValue(Diff, 2);
+}
+
 static void EmitUnwindCode(MCStreamer &streamer, const MCSymbol *begin,
                            WinEH::Instruction &inst) {
   uint8_t b2;
@@ -114,6 +137,15 @@ static void EmitUnwindCode(MCStreamer &streamer, const MCSymbol *begin,
   switch (static_cast<Win64EH::UnwindOpcodes>(inst.Operation)) {
   default:
     llvm_unreachable("Unsupported unwind code");
+  case Win64EH::UOP_Push2:
+    // Reachable from hand-written .s if a UOP_Push2 ends up in a V1/V2
+    // frame (e.g. via a per-function `.seh_unwindversion` downgrade after
+    // `.seh_push2regs`). Emit a recoverable diagnostic and skip the op so
+    // the assembler doesn't keep writing malformed bytes.
+    streamer.getContext().reportError(
+        SMLoc(), "UOP_Push2 (PUSH2 with two registers) requires V3 unwind "
+                 "info. Use `.seh_unwindversion 3`.");
+    return;
   case Win64EH::UOP_PushNonVol:
     EmitAbsDifference(streamer, inst.Label, begin);
     b2 |= (inst.Register & 0x0F) << 4;
@@ -226,11 +258,573 @@ GetOptionalAbsDifference(const MCAssembler &Assembler, const MCSymbol *LHS,
   return value;
 }
 
+//===----------------------------------------------------------------------===//
+// V3 UNWIND_INFO Emission
+// See https://learn.microsoft.com/en-us/cpp/build/x64-unwind-information-v3
+//===----------------------------------------------------------------------===//
+
+/// Encode a single WinEH::Instruction as V3 WOD bytes.
+/// Appends encoded bytes to Out.
+void Win64EH::EncodeWOD(const WinEH::Instruction &Inst,
+                        SmallVectorImpl<uint8_t> &Out) {
+  switch (static_cast<Win64EH::UnwindOpcodes>(Inst.Operation)) {
+  case Win64EH::UOP_PushNonVol: {
+    // WOD_PUSH: 1 byte, bits [2:0] = 4, bits [7:3] = register (5-bit)
+    uint8_t Reg = Inst.Register & 0x1F;
+    Out.push_back((Reg << 3) | Win64EH::WOD_PUSH);
+    break;
+  }
+  case Win64EH::UOP_AllocSmall: {
+    // WOD_ALLOC_SMALL: 1 byte, bits [3:0] = 8, bits [7:4] = (size/8 - 1)
+    // V1/V2 stores (size-8)/8 in OpInfo; actual size = Offset.
+    // Inst.Offset is the raw allocation size.
+    if (Inst.Offset < 8 || Inst.Offset > 128 || Inst.Offset % 8 != 0)
+      reportFatalInternalError(
+          "UOP_AllocSmall outside expected range or alignment");
+    uint8_t Encoded = ((Inst.Offset / 8 - 1) & 0x0F);
+    Out.push_back((Encoded << 4) | Win64EH::WOD_ALLOC_SMALL);
+    break;
+  }
+  case Win64EH::UOP_AllocLarge: {
+    if (Inst.Offset > 512 * 1024 - 8) {
+      // WOD_ALLOC_HUGE: 5 bytes, byte[0] = 1, bytes[1:4] = LE32(size)
+      Out.push_back(Win64EH::WOD_ALLOC_HUGE);
+      uint32_t Size = Inst.Offset;
+      Out.push_back(Size & 0xFF);
+      Out.push_back((Size >> 8) & 0xFF);
+      Out.push_back((Size >> 16) & 0xFF);
+      Out.push_back((Size >> 24) & 0xFF);
+    } else {
+      // WOD_ALLOC_LARGE: 3 bytes, byte[0] = 2, bytes[1:2] = LE16(size/8)
+      Out.push_back(Win64EH::WOD_ALLOC_LARGE);
+      uint16_t Scaled = Inst.Offset / 8;
+      Out.push_back(Scaled & 0xFF);
+      Out.push_back((Scaled >> 8) & 0xFF);
+    }
+    break;
+  }
+  case Win64EH::UOP_SetFPReg: {
+    // WOD_SET_FPREG: 2 bytes, byte[0] = 0, byte[1] = reg(4) | (offset/16)(4)
+    // The frame register field is only 4 bits, so EGPR (R16-R31) cannot be
+    // used as the frame pointer in V3 unwind info.
+    if (Inst.Register > 0x0F)
+      reportFatalInternalError(
+          "SET_FPREG frame register does not fit in 4 bits");
+    Out.push_back(Win64EH::WOD_SET_FPREG);
+    uint8_t Reg = Inst.Register & 0x0F;
+    uint8_t Off = (Inst.Offset / 16) & 0x0F;
+    Out.push_back(Reg | (Off << 4));
+    break;
+  }
+  case Win64EH::UOP_SaveNonVol: {
+    // WOD_SAVE_NONVOL: 3 bytes, bits [2:0] = 6, bits [7:3] = reg,
+    // bytes[1:2] = LE16(displacement/8)
+    uint8_t Reg = Inst.Register & 0x1F;
+    Out.push_back((Reg << 3) | Win64EH::WOD_SAVE_NONVOL);
+    uint16_t Disp = Inst.Offset / 8;
+    Out.push_back(Disp & 0xFF);
+    Out.push_back((Disp >> 8) & 0xFF);
+    break;
+  }
+  case Win64EH::UOP_SaveNonVolBig: {
+    // WOD_SAVE_NONVOL_FAR: 5 bytes, bits [2:0] = 5, bits [7:3] = reg,
+    // bytes[1:4] = LE32(displacement)
+    uint8_t Reg = Inst.Register & 0x1F;
+    Out.push_back((Reg << 3) | Win64EH::WOD_SAVE_NONVOL_FAR);
+    uint32_t Disp = Inst.Offset;
+    Out.push_back(Disp & 0xFF);
+    Out.push_back((Disp >> 8) & 0xFF);
+    Out.push_back((Disp >> 16) & 0xFF);
+    Out.push_back((Disp >> 24) & 0xFF);
+    break;
+  }
+  case Win64EH::UOP_SaveXMM128: {
+    // WOD_SAVE_XMM128: 3 bytes, bits [3:0] = 10, bits [7:4] = reg,
+    // bytes[1:2] = LE16(displacement/16)
+    // The XMM register field is only 4 bits, so XMM16-XMM31 (AVX-512 EVEX)
+    // cannot be encoded. Such registers are caller-saved on Win64 and should
+    // never reach here from codegen.
+    if (Inst.Register > 0x0F)
+      reportFatalInternalError(
+          "SAVE_XMM128 register does not fit in 4 bits (XMM16-31 unsupported)");
+    uint8_t Reg = Inst.Register & 0x0F;
+    Out.push_back((Reg << 4) | Win64EH::WOD_SAVE_XMM128);
+    uint16_t Disp = Inst.Offset / 16;
+    Out.push_back(Disp & 0xFF);
+    Out.push_back((Disp >> 8) & 0xFF);
+    break;
+  }
+  case Win64EH::UOP_SaveXMM128Big: {
+    // WOD_SAVE_XMM128_FAR: 5 bytes, bits [3:0] = 9, bits [7:4] = reg,
+    // bytes[1:4] = LE32(displacement)
+    if (Inst.Register > 0x0F)
+      reportFatalInternalError("SAVE_XMM128_FAR register does not fit in 4 "
+                               "bits (XMM16-31 unsupported)");
+    uint8_t Reg = Inst.Register & 0x0F;
+    Out.push_back((Reg << 4) | Win64EH::WOD_SAVE_XMM128_FAR);
+    uint32_t Disp = Inst.Offset;
+    Out.push_back(Disp & 0xFF);
+    Out.push_back((Disp >> 8) & 0xFF);
+    Out.push_back((Disp >> 16) & 0xFF);
+    Out.push_back((Disp >> 24) & 0xFF);
+    break;
+  }
+  case Win64EH::UOP_PushMachFrame: {
+    // WOD_PUSH_CANONICAL_FRAME: 2 bytes, byte[0] = 3, byte[1] = type
+    Out.push_back(Win64EH::WOD_PUSH_CANONICAL_FRAME);
+    Out.push_back(Inst.Offset == 1 ? 1 : 0);
+    break;
+  }
+  case Win64EH::UOP_Push2: {
+    uint8_t Reg1 = Inst.Register & 0x1F;
+    uint8_t Reg2 = Inst.Register2 & 0x1F;
+    // Optimization: if registers are consecutive, use WOD_PUSH_CONSECUTIVE_2
+    // (opcode 7, 1 byte) instead of WOD_PUSH2 (opcode 32, 2 bytes).
+    if (Reg2 == Reg1 + 1) {
+      // WOD_PUSH_CONSECUTIVE_2: bits [2:0] = 111b, bits [7:3] = Register
+      Out.push_back((Reg1 << 3) | Win64EH::WOD_PUSH_CONSECUTIVE_2);
+    } else {
+      // WOD_PUSH2: 2 bytes
+      // Byte 0: [5:0] = 100000b (opcode 32), [7:6] = Register1[1:0]
+      // Byte 1: [2:0] = Register1[4:2], [7:3] = Register2
+      Out.push_back(((Reg1 & 0x03) << 6) | Win64EH::WOD_PUSH2);
+      Out.push_back(((Reg2 & 0x1F) << 3) | ((Reg1 >> 2) & 0x07));
+    }
+    break;
+  }
+  default:
+    llvm_unreachable("Unsupported unwind operation for V3 encoding");
+  }
+}
+
+/// Try to find Needle as a contiguous subsequence within Haystack.
+/// Returns the byte offset if found, or std::nullopt.
+static std::optional<uint16_t> FindInPool(ArrayRef<uint8_t> Haystack,
+                                          ArrayRef<uint8_t> Needle) {
+  assert(!Needle.empty() && "FindInPool called with empty Needle");
+  auto It = std::search(Haystack.begin(), Haystack.end(), Needle.begin(),
+                        Needle.end());
+  if (It == Haystack.end())
+    return std::nullopt;
+  return static_cast<uint16_t>(std::distance(Haystack.begin(), It));
+}
+
+/// Compare the relative IP offset arrays of two epilogs.
+static bool EpilogIpOffsetsMatch(const WinEH::FrameInfo::Epilog &A,
+                                 const WinEH::FrameInfo::Epilog &B,
+                                 const MCAssembler &Asm) {
+  if (A.Instructions.size() != B.Instructions.size())
+    return false;
+  for (unsigned I = 0; I < A.Instructions.size(); ++I) {
+    auto OffA = GetOptionalAbsDifference(Asm, A.Instructions[I].Label, A.Start);
+    auto OffB = GetOptionalAbsDifference(Asm, B.Instructions[I].Label, B.Start);
+    if (!OffA || !OffB || *OffA != *OffB)
+      return false;
+  }
+  return true;
+}
+
+/// Emit V3 UNWIND_INFO for a single frame.
+static void EmitUnwindInfoV3(MCStreamer &Streamer, WinEH::FrameInfo *Info) {
+  // Should have been checked by our caller.
+  assert(!Info->Symbol && "UNWIND_INFO already has a symbol");
+
+  MCContext &Context = Streamer.getContext();
+  MCObjectStreamer *OS = static_cast<MCObjectStreamer *>(&Streamer);
+  const MCAssembler &Asm = OS->getAssembler();
+
+  MCSymbol *Label = Context.createTempSymbol();
+  Streamer.emitValueToAlignment(Align(4));
+  Streamer.emitLabel(Label);
+  Info->Symbol = Label;
+
+  // ===================================================================
+  // Phase 1: Data preparation — compute all metadata before emitting.
+  // ===================================================================
+
+  // --- Build prolog WOD pool (body-to-entry order) ---
+  SmallVector<uint8_t, 64> WODPool;
+  for (auto It = Info->Instructions.rbegin(); It != Info->Instructions.rend();
+       ++It)
+    Win64EH::EncodeWOD(*It, WODPool);
+
+  // --- Build prolog IP offset label pairs (body-to-entry order) ---
+  SmallVector<std::pair<const MCSymbol *, const MCSymbol *>, 16> PrologIpLabels;
+  unsigned PrologOpCount = Info->Instructions.size();
+  if (PrologOpCount > 31) {
+    reportFatalUsageError(
+        "Too many prolog unwind codes for V3 encoding. Maximum "
+        "is 31. This function has " +
+        Twine(PrologOpCount));
+  }
+  for (auto It = Info->Instructions.rbegin(); It != Info->Instructions.rend();
+       ++It)
+    PrologIpLabels.push_back({It->Label, Info->Begin});
+
+  // --- Determine if UNW_FLAG_LARGE is needed for the prolog ---
+  // Conservative: if we KNOW a value exceeds 255 or can't measure, use LARGE.
+  // Reject evaluatable values that are negative or exceed the 16-bit unsigned
+  // range supported by LARGE, since those would be silently truncated.
+  bool NeedsLargeProlog = false;
+  if (Info->PrologEnd) {
+    auto MaybePrologSize =
+        GetOptionalAbsDifference(Asm, Info->PrologEnd, Info->Begin);
+    if (MaybePrologSize) {
+      if (*MaybePrologSize < 0)
+        reportFatalUsageError("Negative SizeOfProlog in V3 unwind info");
+      if (*MaybePrologSize > UINT16_MAX)
+        reportFatalUsageError(
+            "SizeOfProlog exceeds 16-bit range for V3 unwind info");
+      NeedsLargeProlog = (*MaybePrologSize > 255);
+    } else {
+      NeedsLargeProlog = true; // Can't measure -> be conservative
+    }
+  }
+  for (auto &[InstLabel, BeginLabel] : PrologIpLabels) {
+    if (NeedsLargeProlog)
+      break;
+    auto MaybeOffset = GetOptionalAbsDifference(Asm, InstLabel, BeginLabel);
+    if (MaybeOffset) {
+      if (*MaybeOffset < 0)
+        reportFatalUsageError("Negative prolog IP offset in V3 unwind info");
+      if (*MaybeOffset > UINT16_MAX)
+        reportFatalUsageError(
+            "Prolog IP offset exceeds 16-bit range for V3 unwind info");
+      NeedsLargeProlog = (*MaybeOffset > 255);
+    } else {
+      NeedsLargeProlog = true; // Can't measure -> be conservative
+    }
+  }
+
+  // --- Per-epilog data preparation ---
+  struct EpilogEmitInfo {
+    const WinEH::FrameInfo::Epilog *Epilog;
+    SmallVector<uint8_t, 32> WODBytes;
+    uint16_t FirstOp;
+    uint8_t NumberOfOps;
+    bool Inherited;
+    bool NeedsLarge; // EPILOG_INFO_LARGE needed for this epilog
+  };
+
+  SmallVector<EpilogEmitInfo, 8> EpilogInfos;
+  for (const auto &[EpilogSym, Epilog] : Info->EpilogMap) {
+    if (Epilog.Instructions.empty())
+      continue;
+
+    EpilogEmitInfo EI;
+    EI.Epilog = &Epilog;
+    EI.NumberOfOps = Epilog.Instructions.size();
+    if (EI.NumberOfOps > 31)
+      reportFatalUsageError(
+          "Too many epilog unwind codes for V3 encoding. Maximum "
+          "is 31. This epilog has " +
+          Twine(EI.NumberOfOps));
+    EI.Inherited = false;
+    EI.NeedsLarge = false;
+
+    // Determine if EPILOG_INFO_LARGE is needed.
+    // Check IpOffsetOfLastInstruction (EpilogEnd - EpilogStart).
+    // Reject negative or out-of-range evaluatable values.
+    auto MaybeLastInstOfs =
+        GetOptionalAbsDifference(Asm, Epilog.End, Epilog.Start);
+    if (MaybeLastInstOfs) {
+      if (*MaybeLastInstOfs < 0)
+        reportFatalUsageError(
+            "Negative IpOffsetOfLastInstruction in V3 unwind info");
+      if (*MaybeLastInstOfs > UINT16_MAX)
+        reportFatalUsageError(
+            "IpOffsetOfLastInstruction exceeds 16-bit range for "
+            "V3 unwind info");
+      EI.NeedsLarge = (*MaybeLastInstOfs > 255);
+    } else {
+      EI.NeedsLarge = true; // Can't measure -> be conservative
+    }
+
+    // Check each epilog IP offset.
+    for (const auto &EpiInst : Epilog.Instructions) {
+      if (EI.NeedsLarge)
+        break;
+      auto MaybeOffset =
+          GetOptionalAbsDifference(Asm, EpiInst.Label, Epilog.Start);
+      if (MaybeOffset) {
+        if (*MaybeOffset < 0)
+          reportFatalUsageError("Negative epilog IP offset in V3 unwind info");
+        if (*MaybeOffset > UINT16_MAX)
+          reportFatalUsageError(
+              "Epilog IP offset exceeds 16-bit range for V3 unwind info");
+        EI.NeedsLarge = (*MaybeOffset > 255);
+      } else {
+        EI.NeedsLarge = true; // Can't measure -> be conservative
+      }
+    }
+
+    // Encode this epilog's WODs (forward order: body-to-terminator).
+    for (const auto &Inst : Epilog.Instructions)
+      Win64EH::EncodeWOD(Inst, EI.WODBytes);
+
+    // Pool sharing: try to re-use existing bytes in the pool rather than
+    // appending.
+    if (auto Offset = FindInPool(WODPool, EI.WODBytes)) {
+      EI.FirstOp = *Offset;
+    } else {
+      EI.FirstOp = WODPool.size();
+      WODPool.append(EI.WODBytes.begin(), EI.WODBytes.end());
+    }
+
+    EpilogInfos.push_back(std::move(EI));
+  }
+  if (EpilogInfos.size() > 7)
+    reportFatalUsageError("Too many epilogs for V3 encoding. Maximum is 7."
+                          " This function has " +
+                          Twine(EpilogInfos.size()));
+
+  // --- Inheritance decisions ---
+  // An epilog can use the inherited (3-byte) descriptor when FirstOp,
+  // NumberOfOps, NeedsLarge, and relative IP offsets all match the previous
+  // epilog.
+  for (unsigned I = 1; I < EpilogInfos.size(); ++I) {
+    auto &Prev = EpilogInfos[I - 1];
+    auto &Curr = EpilogInfos[I];
+    if (Curr.FirstOp == Prev.FirstOp && Curr.NumberOfOps == Prev.NumberOfOps &&
+        Curr.NeedsLarge == Prev.NeedsLarge &&
+        EpilogIpOffsetsMatch(*Curr.Epilog, *Prev.Epilog, OS->getAssembler()))
+      Curr.Inherited = true;
+  }
+
+  // --- Compute payload sizes ---
+  unsigned PrologIpEntrySize = NeedsLargeProlog ? 2 : 1;
+  unsigned EpilogDescBytes = 0;
+  for (const auto &EI : EpilogInfos) {
+    if (EI.Inherited) {
+      EpilogDescBytes += 3;
+    } else if (EI.NeedsLarge) {
+      // EPILOG_INFO_V3 (3) + EPILOG_INFO_LARGE_EX_V3 (4) + IP offsets (N*2)
+      EpilogDescBytes += 7 + EI.NumberOfOps * 2;
+    } else {
+      // EPILOG_INFO_V3 (3) + EPILOG_INFO_EX_V3 (3) + IP offsets (N*1)
+      EpilogDescBytes += 6 + EI.NumberOfOps;
+    }
+  }
+
+  unsigned PrologIpBytes = PrologOpCount * PrologIpEntrySize;
+  unsigned WODPoolBytes = WODPool.size();
+  // When UNW_FLAG_LARGE is set, the SizeOfPrologHighByte sits at the start
+  // of the payload (immediately after the 4-byte fixed header) and is
+  // counted in PayloadWords. See decodeUnwindInfoV3 / the V3 spec:
+  //   handler_offset = ALIGN_UP(sizeof(UNWIND_INFO_V3) + PayloadWords * 2, 4)
+  unsigned LargeHeaderBytes = NeedsLargeProlog ? 1 : 0;
+  unsigned TotalPayloadBytes =
+      LargeHeaderBytes + PrologIpBytes + EpilogDescBytes + WODPoolBytes;
+  if (TotalPayloadBytes > 255 * 2) {
+    reportFatalUsageError("Too much unwind info for V3 encoding. Maximum is "
+                          "510 bytes. This function has " +
+                          Twine(TotalPayloadBytes));
+  }
+  uint8_t PayloadWords = (TotalPayloadBytes + 1) / 2;
+
+  // ===================================================================
+  // Phase 2: Emission — emit header, payload, and trailer.
+  // ===================================================================
+
+  // --- Emit header (4 bytes, or 5 when UNW_FLAG_LARGE) ---
+  uint8_t Flags = 0;
+  if (Info->ChainedParent)
+    Flags |= Win64EH::UNW_ChainInfo;
+  else {
+    if (Info->HandlesUnwind)
+      Flags |= Win64EH::UNW_TerminateHandler;
+    if (Info->HandlesExceptions)
+      Flags |= Win64EH::UNW_ExceptionHandler;
+  }
+  if (NeedsLargeProlog)
+    Flags |= Win64EH::UNW_FlagLarge;
+
+  // Byte 0: (Flags << 3) | Version(3)
+  Streamer.emitInt8((Flags << 3) | 3);
+
+  // Byte 1: SizeOfProlog (low byte, or full 8-bit value when not LARGE)
+  if (Info->PrologEnd) {
+    if (NeedsLargeProlog) {
+      // Emit low byte as a fixup; we'll emit the high byte after Byte 3.
+      // Use a 2-byte value at a temp symbol and extract bytes, OR just emit
+      // the known value if evaluable.
+      auto MaybePrologSize =
+          GetOptionalAbsDifference(Asm, Info->PrologEnd, Info->Begin);
+      if (MaybePrologSize) {
+        Streamer.emitInt8(*MaybePrologSize & 0xFF);
+      } else {
+        // Emit as 1-byte fixup for the low byte.
+        EmitAbsDifference(Streamer, Info->PrologEnd, Info->Begin);
+      }
+    } else {
+      EmitAbsDifference(Streamer, Info->PrologEnd, Info->Begin);
+    }
+  } else {
+    Streamer.emitInt8(0);
+  }
+
+  // Byte 2: PayloadWords
+  Streamer.emitInt8(PayloadWords);
+
+  // Byte 3: (NumberOfEpilogs << 5) | NumberOfPrologOps
+  uint8_t NumberOfEpilogs = EpilogInfos.size();
+  Streamer.emitInt8((NumberOfEpilogs << 5) | (PrologOpCount & 0x1F));
+
+  // Byte 4 (LARGE only): SizeOfPrologHighByte
+  if (NeedsLargeProlog) {
+    if (Info->PrologEnd) {
+      auto MaybePrologSize =
+          GetOptionalAbsDifference(Asm, Info->PrologEnd, Info->Begin);
+      if (MaybePrologSize) {
+        Streamer.emitInt8((*MaybePrologSize >> 8) & 0xFF);
+      } else {
+        // Can't evaluate at this point — emit a fixup that shifts the
+        // difference right by 8 to extract the high byte.
+        const MCExpr *Diff = MCBinaryExpr::createSub(
+            MCSymbolRefExpr::create(Info->PrologEnd, Context),
+            MCSymbolRefExpr::create(Info->Begin, Context), Context);
+        const MCExpr *HighByte = MCBinaryExpr::createLShr(
+            Diff, MCConstantExpr::create(8, Context), Context);
+        Streamer.emitValue(HighByte, 1);
+      }
+    } else {
+      Streamer.emitInt8(0);
+    }
+  }
+
+  // --- Emit prolog IP offsets (8-bit or 16-bit) ---
+  for (auto &[InstLabel, BeginLabel] : PrologIpLabels) {
+    if (NeedsLargeProlog)
+      EmitAbsDifference16(Streamer, InstLabel, BeginLabel);
+    else
+      EmitAbsDifference(Streamer, InstLabel, BeginLabel);
+  }
+
+  // --- Emit epilog descriptors ---
+  const MCSymbol *PrevEpilogStart = nullptr;
+  for (const auto &EI : EpilogInfos) {
+    const auto &Epilog = *EI.Epilog;
+
+    // FlagsAndNumOps: bits [2:0] = flags, bits [7:3] = NumberOfOps.
+    // For inherited descriptors, NumberOfOps = 0.
+    uint8_t EpiFlags = 0;
+    if (EI.NeedsLarge && !EI.Inherited)
+      EpiFlags |= Win64EH::EPILOG_INFO_LARGE;
+    uint8_t EpiNumOps = EI.Inherited ? 0 : EI.NumberOfOps;
+    Streamer.emitInt8((EpiNumOps << 3) | EpiFlags);
+
+    // EpilogOffset: signed 16-bit.
+    // For the first epilog: byte offset from fragment start to epilog start.
+    // For subsequent epilogs: delta from the previous epilog's start position.
+    // Emit as a fixup since we may not know the exact distance yet.
+    {
+      const MCSymbol *Base = PrevEpilogStart ? PrevEpilogStart : Info->Begin;
+      const MCExpr *EpilogOffsetExpr = MCBinaryExpr::createSub(
+          MCSymbolRefExpr::create(Epilog.Start, Context),
+          MCSymbolRefExpr::create(Base, Context), Context);
+      // Validate the epilog offset fits in a signed 16-bit field if we can
+      // evaluate it now.
+      int64_t OffsetValue;
+      if (EpilogOffsetExpr->evaluateAsAbsolute(OffsetValue,
+                                               OS->getAssembler())) {
+        if (OffsetValue < INT16_MIN || OffsetValue > INT16_MAX)
+          reportFatalUsageError(
+              "Epilog offset out of signed 16-bit range for V3 encoding");
+      }
+      OS->ensureHeadroom(2);
+      OS->addFixup(EpilogOffsetExpr, FK_Data_2);
+      OS->appendContents(2, 0);
+    }
+    PrevEpilogStart = Epilog.Start;
+
+    // Full descriptor fields (only for non-inherited epilogs).
+    if (!EI.Inherited) {
+      // FirstOp: byte offset into WOD pool (2 bytes LE).
+      Streamer.emitInt8(EI.FirstOp & 0xFF);
+      Streamer.emitInt8((EI.FirstOp >> 8) & 0xFF);
+
+      // IpOffsetOfLastInstruction: 8-bit or 16-bit depending on
+      // EPILOG_INFO_LARGE.
+      {
+        const MCExpr *LastInstOffsetExpr = MCBinaryExpr::createSub(
+            MCSymbolRefExpr::create(Epilog.End, Context),
+            MCSymbolRefExpr::create(Epilog.Start, Context), Context);
+        unsigned FixupSize = EI.NeedsLarge ? 2 : 1;
+        OS->ensureHeadroom(FixupSize);
+        OS->addFixup(LastInstOffsetExpr, EI.NeedsLarge ? FK_Data_2 : FK_Data_1);
+        OS->appendContents(FixupSize, 0);
+      }
+
+      // Epilog IP offsets (forward order: body-to-terminator).
+      for (const auto &EpiInst : Epilog.Instructions) {
+        if (EI.NeedsLarge)
+          EmitAbsDifference16(Streamer, EpiInst.Label, Epilog.Start);
+        else
+          EmitAbsDifference(Streamer, EpiInst.Label, Epilog.Start);
+      }
+    }
+  }
+
+  // --- Emit WOD pool ---
+  for (uint8_t B : WODPool)
+    Streamer.emitInt8(B);
+
+  // --- Pad to PayloadWords * 2 bytes ---
+  // PayloadWords = (TotalPayloadBytes + 1) / 2, so at most 1 byte of padding.
+  if (TotalPayloadBytes % 2 != 0)
+    Streamer.emitInt8(0);
+
+  // --- Pad to 4-byte boundary before handler/chain info ---
+  // Per the V3 spec, the handler RVA / chained RUNTIME_FUNCTION begins at
+  //   handler_offset = ALIGN_UP(sizeof(UNWIND_INFO_V3) + PayloadWords * 2, 4)
+  // The unwind info structure itself is 4-byte aligned, so when PayloadWords
+  // is odd, the natural end of the payload sits at +2 mod 4 and requires 2
+  // additional zero bytes of padding before the handler/chain.
+  if (PayloadWords % 2 != 0)
+    Streamer.emitInt16(0);
+
+  // --- Emit handler/chained info (same position as V1/V2) ---
+  if (Flags & Win64EH::UNW_ChainInfo)
+    EmitRuntimeFunction(Streamer, Info->ChainedParent);
+  else if (Flags &
+           (Win64EH::UNW_TerminateHandler | Win64EH::UNW_ExceptionHandler))
+    Streamer.emitValue(
+        MCSymbolRefExpr::create(Info->ExceptionHandler,
+                                MCSymbolRefExpr::VK_COFF_IMGREL32, Context),
+        4);
+  else if (PayloadWords == 0) {
+    // Minimum size: pad to 8 bytes total.
+    Streamer.emitInt32(0);
+  }
+}
+
 static void EmitUnwindInfo(MCStreamer &streamer, WinEH::FrameInfo *info) {
   // If this UNWIND_INFO already has a symbol, it's already been emitted.
   if (info->Symbol)
     return;
 
+  // V3 has a completely different binary layout; dispatch to separate emitter.
+  if (info->Version == 3) {
+    EmitUnwindInfoV3(streamer, info);
+    return;
+  }
+
+  // UOP_Push2 is V3-only and cannot be encoded in V1/V2. Detect this early
+  // (before counting codes) so the error is reported cleanly. This is
+  // reachable from hand-written .s if `.seh_push2regs` is followed by a
+  // per-function `.seh_unwindversion 1` or `2` downgrade.
+  for (const auto &Inst : info->Instructions) {
+    if (Inst.Operation == Win64EH::UOP_Push2) {
+      streamer.getContext().reportError(
+          SMLoc(), "UOP_Push2 (PUSH2 with two registers) requires V3 unwind "
+                   "info. Use `.seh_unwindversion 3`.");
+      // Mark the frame as emitted (with no UNWIND_INFO) and bail so we don't
+      // emit malformed bytes or hit a downstream assertion.
+      info->Symbol = streamer.getContext().createTempSymbol();
+      return;
+    }
+  }
+
   MCContext &context = streamer.getContext();
   MCObjectStreamer *OS = (MCObjectStreamer *)(&streamer);
   MCSymbol *Label = context.createTempSymbol();
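
Editorial note on the padding logic in EmitUnwindInfoV3 above: the two padding steps can be checked with a little arithmetic. The sketch below is not part of the commit. The names V3Padding, computeV3Padding and kAssumedHeaderBytes are hypothetical, and the 4-byte header size is an assumption inferred from the "pad to 8 bytes total" branch rather than taken from the spec.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the fixed UNWIND_INFO_V3 header occupies 4 bytes. This is
// inferred from the "pad to 8 bytes total" branch in the diff, not from the
// published specification.
constexpr uint32_t kAssumedHeaderBytes = 4;

// Hypothetical summary of the layout decisions made after the WOD pool.
struct V3Padding {
  uint32_t PayloadWords;  // payload size in 16-bit words, rounded up
  uint32_t BytePad;       // 0 or 1 byte to complete the last word
  uint32_t WordPad;       // 0 or 2 bytes to reach a 4-byte boundary
  uint32_t HandlerOffset; // offset of the handler RVA or chained entry
};

inline V3Padding computeV3Padding(uint32_t TotalPayloadBytes) {
  V3Padding P;
  // Same rounding as the comment in the diff: (TotalPayloadBytes + 1) / 2.
  P.PayloadWords = (TotalPayloadBytes + 1) / 2;
  // An odd byte count needs one zero byte to fill the final word.
  P.BytePad = TotalPayloadBytes % 2;
  // An odd word count ends at +2 mod 4, so two more zero bytes are needed.
  P.WordPad = (P.PayloadWords % 2 != 0) ? 2 : 0;
  P.HandlerOffset = kAssumedHeaderBytes + P.PayloadWords * 2 + P.WordPad;
  assert(P.HandlerOffset % 4 == 0 && "handler info must be 4-byte aligned");
  return P;
}
```

Under these assumptions, a 5-byte payload rounds up to 3 words, takes one byte of word padding plus two bytes of alignment padding, and places the handler or chain info at offset 12.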

diff --git a/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp b/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
index 1741efac3a8c7..58c6dd2b709a1 100644
--- a/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
+++ b/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
@@ -1232,6 +1232,7 @@ class X86AsmParser : public MCTargetAsmParser {
   /// SEH directives.
   bool parseSEHRegisterNumber(unsigned RegClassID, MCRegister &RegNo);
   bool parseDirectiveSEHPushReg(SMLoc);
+  bool parseDirectiveSEHPush2Regs(SMLoc);
   bool parseDirectiveSEHSetFrame(SMLoc);
   bool parseDirectiveSEHSaveReg(SMLoc);
   bool parseDirectiveSEHSaveXMM(SMLoc);
@@ -4839,6 +4840,8 @@ bool X86AsmParser::ParseDirective(AsmToken DirectiveID) {
   else if (IDVal == ".seh_pushreg" ||
            (Parser.isParsingMasm() && IDVal.equals_insensitive(".pushreg")))
     return parseDirectiveSEHPushReg(DirectiveID.getLoc());
+  else if (IDVal == ".seh_push2regs")
+    return parseDirectiveSEHPush2Regs(DirectiveID.getLoc());
   else if (IDVal == ".seh_setframe" ||
            (Parser.isParsingMasm() && IDVal.equals_insensitive(".setframe")))
     return parseDirectiveSEHSetFrame(DirectiveID.getLoc());
@@ -5075,6 +5078,27 @@ bool X86AsmParser::parseDirectiveSEHPushReg(SMLoc Loc) {
   return false;
 }
 
+bool X86AsmParser::parseDirectiveSEHPush2Regs(SMLoc Loc) {
+  MCRegister Reg1;
+  if (parseSEHRegisterNumber(X86::GR64RegClassID, Reg1))
+    return true;
+
+  if (getLexer().isNot(AsmToken::Comma))
+    return TokError("expected comma between registers");
+  getParser().Lex();
+
+  MCRegister Reg2;
+  if (parseSEHRegisterNumber(X86::GR64RegClassID, Reg2))
+    return true;
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("expected end of directive");
+
+  getParser().Lex();
+  getStreamer().emitWinCFIPush2Regs(Reg1, Reg2, Loc);
+  return false;
+}
+
 bool X86AsmParser::parseDirectiveSEHSetFrame(SMLoc Loc) {
   MCRegister Reg;
   int64_t Off;
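
Editorial note on parseDirectiveSEHPush2Regs above: the directive takes a register, a comma, a second register, and then the end of the statement. The standalone sketch below mirrors only that operand grammar on a plain string. It is not the LLVM parser: readOperand, skipSpaces and parsePush2RegsOperands are hypothetical names, and no register-class validation is attempted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: advance Pos past any whitespace.
static void skipSpaces(const std::string &S, std::size_t &Pos) {
  while (Pos < S.size() && std::isspace(static_cast<unsigned char>(S[Pos])))
    ++Pos;
}

// Hypothetical helper: read one operand token such as "%rbx" or "rbx".
static std::optional<std::string> readOperand(const std::string &S,
                                              std::size_t &Pos) {
  skipSpaces(S, Pos);
  std::size_t Start = Pos;
  while (Pos < S.size() &&
         (std::isalnum(static_cast<unsigned char>(S[Pos])) || S[Pos] == '%'))
    ++Pos;
  if (Pos == Start)
    return std::nullopt;
  return S.substr(Start, Pos - Start);
}

// Returns both operands, or std::nullopt where the real parser would report
// "expected comma between registers" or "expected end of directive".
inline std::optional<std::pair<std::string, std::string>>
parsePush2RegsOperands(const std::string &S) {
  std::size_t Pos = 0;
  auto Reg1 = readOperand(S, Pos);
  if (!Reg1)
    return std::nullopt;
  skipSpaces(S, Pos);
  if (Pos >= S.size() || S[Pos] != ',')
    return std::nullopt; // missing comma
  ++Pos;
  auto Reg2 = readOperand(S, Pos);
  if (!Reg2)
    return std::nullopt;
  skipSpaces(S, Pos);
  if (Pos != S.size())
    return std::nullopt; // trailing tokens after the second register
  return std::make_pair(*Reg1, *Reg2);
}
```

The sketch accepts operand text like "%rbx, %rbp" and rejects a missing comma, a missing second register, or a third operand.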

diff --git a/llvm/lib/Target/X86/CMakeLists.txt b/llvm/lib/Target/X86/CMakeLists.txt
index e81d0b29f019e..62987bdbd1c2b 100644
--- a/llvm/lib/Target/X86/CMakeLists.txt
+++ b/llvm/lib/Target/X86/CMakeLists.txt
@@ -90,6 +90,7 @@ set(sources
   X86TargetTransformInfo.cpp
   X86WinEHState.cpp
   X86WinEHUnwindV2.cpp
+  X86WinEHUnwindV3.cpp
   GISel/X86CallLowering.cpp
   GISel/X86InstructionSelector.cpp
   GISel/X86LegalizerInfo.cpp

diff --git a/llvm/lib/Target/X86/X86.h b/llvm/lib/Target/X86/X86.h
index f8b6a952d7232..3ecc7a6ec498b 100644
--- a/llvm/lib/Target/X86/X86.h
+++ b/llvm/lib/Target/X86/X86.h
@@ -399,6 +399,9 @@ class X86LowerAMXIntrinsicsPass
 
 FunctionPass *createX86LowerAMXIntrinsicsLegacyPass();
 
+/// Capacity check and sub-fragment splitting for Win x64 Unwind V3.
+FunctionPass *createX86WinEHUnwindV3Pass();
+
 InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM,
                                                   const X86Subtarget &,
                                                   const X86RegisterBankInfo &);
@@ -507,6 +510,7 @@ void initializeX86TileConfigLegacyPass(PassRegistry &);
 void initializeX86WinEHUnwindV2LegacyPass(PassRegistry &);
 void initializeX86PreLegalizerCombinerLegacyPass(PassRegistry &);
 void initializeX86PostLegalizerCombinerLegacyPass(PassRegistry &);
+void initializeX86WinEHUnwindV3Pass(PassRegistry &);
 
 namespace X86AS {
 enum : unsigned {

diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index e9fdb3d3a148d..6d213b8c213aa 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -957,6 +957,10 @@ void X86AsmPrinter::emitStartOfAsmFile(Module &M) {
 
     if (M.getModuleFlag("import-call-optimization"))
       EnableImportCallOptimization = true;
+
+    // Unwind v3 is set for the entire module, not just individual functions.
+    if (M.getWinX64EHUnwindMode() == WinX64EHUnwindMode::V3)
+      OutStreamer->emitWinCFIUnwindVersion(3);
   }
 
   // TODO: Support prefixed registers for the Intel syntax.

diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index 3dec37bb87ea9..8fe09bea456c8 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -1617,6 +1617,9 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
                      MF.getFunction().getParent()->getCodeViewFlag();
   bool NeedsWinCFI = NeedsWin64CFI || NeedsWinFPO;
   bool NeedsDwarfCFI = needsDwarfCFI(MF);
+  bool IsWin64UnwindV3 =
+      NeedsWin64CFI &&
+      Fn.getParent()->getWinX64EHUnwindMode() == WinX64EHUnwindMode::V3;
   Register FramePtr = TRI->getFrameRegister(MF);
   const Register MachineFramePtr =
       STI.isTarget64BitILP32() ? Register(getX86SubSuperRegister(FramePtr, 64))
@@ -1624,6 +1627,26 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   Register BasePtr = TRI->getBaseRegister();
   bool HasWinCFI = false;
 
+  // Helpers to emit Windows x64 unwind SEH pseudos with the correct placement.
+  // V1/V2: pseudo goes after the real instruction.
+  // V3:    pseudo goes before the real instruction.
+  // Usage:
+  //   EmitSEHBefore([&]{ BuildMI(...SEH_PushReg...); });
+  //   BuildMI(... real instruction ...);
+  //   EmitSEHAfter([&]{ BuildMI(...SEH_PushReg...); });
+  auto EmitSEHBefore = [&](auto EmitFn) {
+    if (NeedsWinCFI && IsWin64UnwindV3) {
+      HasWinCFI = true;
+      EmitFn();
+    }
+  };
+  auto EmitSEHAfter = [&](auto EmitFn) {
+    if (NeedsWinCFI && !IsWin64UnwindV3) {
+      HasWinCFI = true;
+      EmitFn();
+    }
+  };
+
   // Debug location must be unknown since the first debug location is used
   // to determine the end of the prologue.
   DebugLoc DL;
@@ -1806,10 +1829,17 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
       NumBytes = alignTo(NumBytes, MaxAlign);
 
     // Save EBP/RBP into the appropriate stack slot.
+    auto EmitSEHPushFramePtr = [&]() {
+      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
+          .addImm(FramePtr)
+          .setMIFlag(MachineInstr::FrameSetup);
+    };
+    EmitSEHBefore(EmitSEHPushFramePtr);
     BuildMI(MBB, MBBI, DL,
             TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
         .addReg(MachineFramePtr, RegState::Kill)
         .setMIFlag(MachineInstr::FrameSetup);
+    EmitSEHAfter(EmitSEHPushFramePtr);
 
     if (NeedsDwarfCFI && !ArgBaseReg.isValid()) {
       // Mark the place where EBP/RBP was saved.
@@ -1829,13 +1859,6 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
                MachineInstr::FrameSetup);
     }
 
-    if (NeedsWinCFI) {
-      HasWinCFI = true;
-      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
-          .addImm(FramePtr)
-          .setMIFlag(MachineInstr::FrameSetup);
-    }
-
     if (!IsFunclet) {
       if (X86FI->hasSwiftAsyncContext()) {
         assert(!IsWin64Prologue &&
@@ -1845,6 +1868,12 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
         // Before we update the live frame pointer we have to ensure there's a
         // valid (or null) asynchronous context in its slot just before FP in
         // the frame record, so store it now.
+        auto EmitSEHPushR14 = [&]() {
+          BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
+              .addImm(X86::R14)
+              .setMIFlag(MachineInstr::FrameSetup);
+        };
+        EmitSEHBefore(EmitSEHPushR14);
         if (Attrs.hasAttrSomewhere(Attribute::SwiftAsync)) {
           // We have an initial context in r14, store it just before the frame
           // pointer.
@@ -1859,13 +1888,7 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
               .addImm(0)
               .setMIFlag(MachineInstr::FrameSetup);
         }
-
-        if (NeedsWinCFI) {
-          HasWinCFI = true;
-          BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
-              .addImm(X86::R14)
-              .setMIFlag(MachineInstr::FrameSetup);
-        }
+        EmitSEHAfter(EmitSEHPushR14);
 
         BuildMI(MBB, MBBI, DL, TII.get(X86::LEA64r), FramePtr)
             .addUse(X86::RSP)
@@ -1916,6 +1939,9 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
 
         if (NeedsWinFPO) {
           // .cv_fpo_setframe $FramePtr
+          // NeedsWinFPO is Win32 only, so we're never using Unwind v3, hence it
+          // is always inserted afterwards.
+          assert(!IsWin64UnwindV3);
           HasWinCFI = true;
           BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))
               .addImm(FramePtr)
@@ -1961,8 +1987,23 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
     PushedRegs = true;
     Register Reg = MBBI->getOperand(0).getReg();
     LastCSPush = MBBI;
-    ++MBBI;
     unsigned Opc = LastCSPush->getOpcode();
+    bool IsPush2 = Opc == X86::PUSH2 || Opc == X86::PUSH2P;
+
+    // V3: emit SEH pseudo before the real instruction.
+    EmitSEHBefore([&]() {
+      if (IsPush2) {
+        BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Push2Regs))
+            .addImm(Reg)
+            .addImm(LastCSPush->getOperand(1).getReg())
+            .setMIFlag(MachineInstr::FrameSetup);
+      } else {
+        BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
+            .addImm(Reg)
+            .setMIFlag(MachineInstr::FrameSetup);
+      }
+    });
+    ++MBBI;
 
     if (!HasFP && NeedsDwarfCFI) {
       // Mark callee-saved push instruction.
@@ -1970,7 +2011,7 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
       assert(StackSize);
       // Compared to push, push2 introduces more stack offset (one more
       // register).
-      if (Opc == X86::PUSH2 || Opc == X86::PUSH2P)
+      if (IsPush2)
         StackOffset += stackGrowth;
       BuildCFI(MBB, MBBI, DL,
                MCCFIInstruction::cfiDefCfaOffset(nullptr, -StackOffset),
@@ -1978,16 +2019,16 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
       StackOffset += stackGrowth;
     }
 
-    if (NeedsWinCFI) {
-      HasWinCFI = true;
+    // V1/V2: emit SEH pseudo after the real instruction.
+    EmitSEHAfter([&]() {
       BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
           .addImm(Reg)
           .setMIFlag(MachineInstr::FrameSetup);
-      if (Opc == X86::PUSH2 || Opc == X86::PUSH2P)
+      if (IsPush2)
         BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
             .addImm(LastCSPush->getOperand(1).getReg())
             .setMIFlag(MachineInstr::FrameSetup);
-    }
+    });
   }
 
   // Realign stack after we pushed callee-saved registers (so that we'll be
@@ -1996,14 +2037,14 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   if (!IsWin64Prologue && !IsFunclet && TRI->hasStackRealignment(MF) &&
       !ArgBaseReg.isValid()) {
     assert(HasFP && "There should be a frame pointer if stack is realigned.");
-    BuildStackAlignAND(MBB, MBBI, DL, StackPtr, MaxAlign);
-
-    if (NeedsWinCFI) {
-      HasWinCFI = true;
+    auto EmitSEHStackAlign = [&]() {
       BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlign))
           .addImm(MaxAlign)
           .setMIFlag(MachineInstr::FrameSetup);
-    }
+    };
+    EmitSEHBefore(EmitSEHStackAlign);
+    BuildStackAlignAND(MBB, MBBI, DL, StackPtr, MaxAlign);
+    EmitSEHAfter(EmitSEHStackAlign);
   }
 
   // If there is an SUB32ri of ESP immediately before this instruction, merge
@@ -2026,6 +2067,15 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   uint64_t AlignedNumBytes = NumBytes;
   if (IsWin64Prologue && !IsFunclet && TRI->hasStackRealignment(MF))
     AlignedNumBytes = alignTo(AlignedNumBytes, MaxAlign);
+
+  auto EmitSEHStackAlloc = [&]() {
+    BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
+        .addImm(NumBytes)
+        .setMIFlag(MachineInstr::FrameSetup);
+  };
+  if (NumBytes)
+    EmitSEHBefore(EmitSEHStackAlloc);
+
   if (AlignedNumBytes >= StackProbeSize && EmitStackProbeCall) {
     assert(!X86FI->getUsesRedZone() &&
            "The Red Zone is not accounted for in stack probes");
@@ -2081,12 +2131,8 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
     emitSPUpdate(MBB, MBBI, DL, -(int64_t)NumBytes, /*InEpilogue=*/false);
   }
 
-  if (NeedsWinCFI && NumBytes) {
-    HasWinCFI = true;
-    BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
-        .addImm(NumBytes)
-        .setMIFlag(MachineInstr::FrameSetup);
-  }
+  if (NumBytes)
+    EmitSEHAfter(EmitSEHStackAlloc);
 
   int SEHFrameOffset = 0;
   Register SPOrEstablisher;
@@ -2124,24 +2170,39 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
     // this calculation on the incoming establisher, which holds the value of
     // RSP from the parent frame at the end of the prologue.
     SEHFrameOffset = calculateSetFPREG(ParentFrameNumBytes);
-    if (SEHFrameOffset)
-      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::LEA64r), FramePtr),
-                   SPOrEstablisher, false, SEHFrameOffset);
-    else
-      BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64rr), FramePtr)
-          .addReg(SPOrEstablisher);
 
     // If this is not a funclet, emit the CFI describing our frame pointer.
     if (NeedsWinCFI && !IsFunclet) {
       assert(!NeedsWinFPO && "this setframe incompatible with FPO data");
       HasWinCFI = true;
+      if (isAsynchronousEHPersonality(Personality) || MF.hasEHFunclets()) {
+        if (TRI->hasBasePointer(MF))
+          MF.getWinEHFuncInfo()->SEHSetFrameOffset =
+              getWinEHParentFrameOffset(MF);
+        else
+          MF.getWinEHFuncInfo()->SEHSetFrameOffset = SEHFrameOffset;
+      }
+    }
+
+    auto EmitSEHSetFrame = [&]() {
       BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))
           .addImm(FramePtr)
           .addImm(SEHFrameOffset)
           .setMIFlag(MachineInstr::FrameSetup);
-      if (isAsynchronousEHPersonality(Personality))
-        MF.getWinEHFuncInfo()->SEHSetFrameOffset = SEHFrameOffset;
-    }
+    };
+
+    if (!IsFunclet)
+      EmitSEHBefore(EmitSEHSetFrame);
+
+    if (SEHFrameOffset)
+      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::LEA64r), FramePtr),
+                   SPOrEstablisher, false, SEHFrameOffset);
+    else
+      BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64rr), FramePtr)
+          .addReg(SPOrEstablisher);
+
+    if (!IsFunclet)
+      EmitSEHAfter(EmitSEHSetFrame);
   } else if (IsFunclet && STI.is32Bit()) {
     // Reset EBP / ESI to something good for funclets.
     MBBI = restoreWin32EHStackPointers(MBB, MBBI, DL);
@@ -2161,7 +2222,6 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
 
   while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup)) {
     const MachineInstr &FrameInstr = *MBBI;
-    ++MBBI;
 
     if (NeedsWinCFI) {
       int FI;
@@ -2176,20 +2236,27 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
                 getFrameIndexReference(MF, FI, IgnoredFrameReg).getFixed() +
                 SEHFrameOffset;
 
-          HasWinCFI = true;
           assert(!NeedsWinFPO && "SEH_SaveXMM incompatible with FPO data");
-          BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SaveXMM))
-              .addImm(Reg)
-              .addImm(Offset)
-              .setMIFlag(MachineInstr::FrameSetup);
+          auto EmitSEHSaveXMM = [&]() {
+            BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SaveXMM))
+                .addImm(Reg)
+                .addImm(Offset)
+                .setMIFlag(MachineInstr::FrameSetup);
+          };
+          EmitSEHBefore(EmitSEHSaveXMM);
+          ++MBBI;
+          EmitSEHAfter(EmitSEHSaveXMM);
+          continue;
         }
       }
     }
+    ++MBBI;
   }
 
-  if (NeedsWinCFI && HasWinCFI)
+  if (NeedsWinCFI && HasWinCFI) {
     BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_EndPrologue))
         .setMIFlag(MachineInstr::FrameSetup);
+  }
 
   if (FnHasClrFunclet && !IsFunclet) {
     // Save the so-called Initial-SP (i.e. the value of the stack pointer
@@ -2430,6 +2497,12 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
   bool IsWin64Prologue = MF.getTarget().getMCAsmInfo().usesWindowsCFI();
   bool NeedsWin64CFI =
       IsWin64Prologue && MF.getFunction().needsUnwindTableEntry();
+  // For V3 unwind, epilog SEH pseudos are emitted inline before each
+  // unwind-affecting instruction.
+  bool IsWin64UnwindV3 =
+      NeedsWin64CFI && MF.hasWinCFI() &&
+      MF.getFunction().getParent()->getWinX64EHUnwindMode() ==
+          WinX64EHUnwindMode::V3;
   bool IsFunclet = MBBI == MBB.end() ? false : isFuncletReturnInstr(*MBBI);
 
   // Get the number of bytes to allocate from the FrameInfo.
@@ -2490,6 +2563,10 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
   }
   uint64_t SEHStackAllocAmt = NumBytes;
 
+  unsigned SEHFrameOffset = 0;
+  if (IsWin64Prologue && HasFP)
+    SEHFrameOffset = calculateSetFPREG(SEHStackAllocAmt);
+
   // AfterPop is the position to insert .cfi_restore.
   MachineBasicBlock::iterator AfterPop = MBBI;
   if (HasFP) {
@@ -2499,6 +2576,10 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       emitSPUpdate(MBB, MBBI, DL, Offset, /*InEpilogue*/ true);
     }
     // Pop EBP.
+    if (IsWin64UnwindV3)
+      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
+          .addImm(FramePtr)
+          .setMIFlag(MachineInstr::FrameDestroy);
     BuildMI(MBB, MBBI, DL,
             TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())),
             MachineFramePtr)
@@ -2543,7 +2624,8 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       if (!PI->getFlag(MachineInstr::FrameDestroy) ||
           (Opc != X86::POP32r && Opc != X86::POP64r && Opc != X86::BTR64ri8 &&
            Opc != X86::ADD64ri32 && Opc != X86::POPP64r && Opc != X86::POP2 &&
-           Opc != X86::POP2P && Opc != X86::LEA64r))
+           Opc != X86::POP2P && Opc != X86::LEA64r && Opc != X86::SEH_PushReg &&
+           Opc != X86::SEH_Push2Regs && Opc != X86::SEH_StackAlloc))
         break;
       FirstCSPop = PI;
     }
@@ -2571,6 +2653,44 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
   if (NumBytes || MFI.hasVarSizedObjects())
     NumBytes = mergeSPAdd(MBB, MBBI, NumBytes, true);
 
+  if (IsWin64UnwindV3 && NeedsWin64CFI && MF.hasWinCFI()) {
+    // Find the XMM restores that were tagged with FrameDestroy, now that we
+    // know the offset we can emit the SEH pseudos for them.
+    auto EpilogStart = MBBI;
+    {
+      auto ScanIt = MBBI;
+      while (ScanIt != MBB.begin()) {
+        auto PI = std::prev(ScanIt);
+        int FI;
+        if (PI->getFlag(MachineInstr::FrameDestroy) &&
+            TII.isLoadFromStackSlot(*PI, FI)) {
+          Register Reg = PI->getOperand(0).getReg();
+          if (X86::FR64RegClass.contains(Reg)) {
+            Register IgnoredFrameReg;
+            int Offset =
+                getFrameIndexReference(MF, FI, IgnoredFrameReg).getFixed() +
+                SEHFrameOffset;
+            BuildMI(MBB, PI, DL, TII.get(X86::SEH_SaveXMM))
+                .addImm(Reg)
+                .addImm(Offset)
+                .setMIFlag(MachineInstr::FrameDestroy);
+            // std::prev(PI) is the SEH_SaveXMM we just inserted (before PI).
+            // We start ScanIt from that point so that the next
+            // std::prev(ScanIt) will examine the instruction before the pseudo,
+            // i.e. the next potential XMM restore further up the block.
+            EpilogStart = std::prev(PI);
+            ScanIt = EpilogStart;
+            continue;
+          }
+        }
+        break;
+      }
+    }
+
+    // For V3, SEH_BeginEpilogue must be emitted before any epilog SEH pseudos.
+    BuildMI(MBB, EpilogStart, DL, TII.get(X86::SEH_BeginEpilogue));
+  }
+
   // If dynamic alloca is used, then reset esp to point to the last callee-saved
   // slot before popping them off! Same applies for the case, when stack was
   // realigned. Don't do this if this was a funclet epilogue, since the funclets
@@ -2579,7 +2699,6 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       !IsFunclet) {
     if (TRI->hasStackRealignment(MF))
       MBBI = FirstCSPop;
-    unsigned SEHFrameOffset = calculateSetFPREG(SEHStackAllocAmt);
     uint64_t LEAAmount =
         IsWin64Prologue ? SEHStackAllocAmt - SEHFrameOffset : -CSSize;
 
@@ -2593,6 +2712,16 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
     // 'mov %FramePtr, %rsp' will not be recognized as an epilogue sequence.
     // However, we may use this sequence if we have a frame pointer because the
     // effects of the prologue can safely be undone.
+    if (IsWin64UnwindV3) {
+      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))
+          .addImm(FramePtr)
+          .addImm(SEHFrameOffset)
+          .setMIFlag(MachineInstr::FrameDestroy);
+      if (SEHStackAllocAmt)
+        BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
+            .addImm(SEHStackAllocAmt)
+            .setMIFlag(MachineInstr::FrameDestroy);
+    }
     if (LEAAmount != 0) {
       unsigned Opc = getLEArOpcode(Uses64BitFramePtr);
       addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr), FramePtr,
@@ -2605,6 +2734,10 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
     }
   } else if (NumBytes) {
     // Adjust stack pointer back: ESP += numbytes.
+    if (IsWin64UnwindV3)
+      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
+          .addImm(NumBytes)
+          .setMIFlag(MachineInstr::FrameDestroy);
     emitSPUpdate(MBB, MBBI, DL, NumBytes, /*InEpilogue=*/true);
     if (!HasFP && NeedsDwarfCFI) {
       // Define the current CFA rule to use the provided offset.
@@ -2616,7 +2749,8 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
     --MBBI;
   }
 
-  if (NeedsWin64CFI && MF.hasWinCFI())
+  // For V1/V2, emit SEH_BeginEpilogue after stack restore code.
+  if (!IsWin64UnwindV3 && NeedsWin64CFI && MF.hasWinCFI())
     BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_BeginEpilogue));
 
   if (!HasFP && NeedsDwarfCFI) {
@@ -3151,6 +3285,14 @@ bool X86FrameLowering::restoreCalleeSavedRegisters(
   }
 
   DebugLoc DL = MBB.findDebugLoc(MI);
+  MachineFunction &MF = *MBB.getParent();
+  const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
+
+  bool NeedsWin64CFI =
+      isWin64Prologue(MF) && MF.getFunction().needsUnwindTableEntry();
+  bool IsWin64UnwindV3 =
+      NeedsWin64CFI && MF.getFunction().getParent()->getWinX64EHUnwindMode() ==
+                           WinX64EHUnwindMode::V3;
 
   // Reload XMMs from stack frame.
   for (const CalleeSavedInfo &I : CSI) {
@@ -3164,9 +3306,11 @@ bool X86FrameLowering::restoreCalleeSavedRegisters(
   }
 
   // Clear the stack slot for spill base pointer register.
-  MachineFunction &MF = *MBB.getParent();
-  const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
   if (X86FI->getRestoreBasePointer()) {
+    if (IsWin64UnwindV3)
+      BuildMI(MBB, MI, DL, TII.get(X86::SEH_PushReg))
+          .addImm(this->TRI->getBaseRegister())
+          .setMIFlag(MachineInstr::FrameDestroy);
     unsigned Opc = STI.is64Bit() ? X86::POP64r : X86::POP32r;
     Register BaseReg = this->TRI->getBaseRegister();
     BuildMI(MBB, MI, DL, TII.get(Opc), BaseReg)
@@ -3179,16 +3323,33 @@ bool X86FrameLowering::restoreCalleeSavedRegisters(
     if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
       continue;
 
-    if (X86FI->isCandidateForPush2Pop2(Reg))
+    if (X86FI->isCandidateForPush2Pop2(Reg)) {
+      MCRegister Reg2 = (++I)->getReg();
+      if (IsWin64UnwindV3) {
+        BuildMI(MBB, MI, DL, TII.get(X86::SEH_Push2Regs))
+            .addImm(Reg)
+            .addImm(Reg2)
+            .setMIFlag(MachineInstr::FrameDestroy);
+      }
       BuildMI(MBB, MI, DL, TII.get(getPOP2Opcode(STI)), Reg)
-          .addReg((++I)->getReg(), RegState::Define)
+          .addReg(Reg2, RegState::Define)
           .setMIFlag(MachineInstr::FrameDestroy);
-    else
+    } else {
+      if (IsWin64UnwindV3)
+        BuildMI(MBB, MI, DL, TII.get(X86::SEH_PushReg))
+            .addImm(Reg)
+            .setMIFlag(MachineInstr::FrameDestroy);
       BuildMI(MBB, MI, DL, TII.get(getPOPOpcode(STI)), Reg)
           .setMIFlag(MachineInstr::FrameDestroy);
+    }
   }
-  if (X86FI->padForPush2Pop2())
+  if (X86FI->padForPush2Pop2()) {
+    if (IsWin64UnwindV3)
+      BuildMI(MBB, MI, DL, TII.get(X86::SEH_StackAlloc))
+          .addImm(SlotSize)
+          .setMIFlag(MachineInstr::FrameDestroy);
     emitSPUpdate(MBB, MI, DL, SlotSize, /*InEpilogue=*/true);
+  }
 
   return true;
 }
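
Editorial note on the EmitSEHBefore/EmitSEHAfter helpers in emitPrologue above: one lambda builds the pseudo, and exactly one of the two helpers runs it depending on the unwind version. The sketch below is a simplified stand-in, not the commit's code. It records strings in place of machine instructions, emitPushSequence is a hypothetical name, and it omits the NeedsWinCFI check and HasWinCFI bookkeeping that the real helpers perform.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the order in which a push and its SEH pseudo would be emitted.
inline std::vector<std::string> emitPushSequence(bool IsUnwindV3) {
  std::vector<std::string> Out;

  // V3: the pseudo goes before the real instruction.
  auto EmitSEHBefore = [&](auto EmitFn) {
    if (IsUnwindV3)
      EmitFn();
  };
  // V1/V2: the pseudo goes after the real instruction.
  auto EmitSEHAfter = [&](auto EmitFn) {
    if (!IsUnwindV3)
      EmitFn();
  };

  // One definition of the pseudo, shared by both placements.
  auto EmitSEHPushFramePtr = [&]() { Out.push_back("SEH_PushReg rbp"); };

  EmitSEHBefore(EmitSEHPushFramePtr);
  Out.push_back("push rbp");
  EmitSEHAfter(EmitSEHPushFramePtr);

  assert(Out.size() == 2 && "exactly one pseudo per real instruction");
  return Out;
}
```

Because each call site invokes both helpers around the real instruction, exactly one pseudo is produced whichever version is selected, which is what lets the commit remove the duplicated `if (NeedsWinCFI)` blocks.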

diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index b228f4deb704e..c7ea2985d1353 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -244,6 +244,8 @@ let isBranch = 1, isTerminator = 1, isCodeGenOnly = 1 in {
 let isPseudo = 1, isMeta = 1, isNotDuplicable = 1, SchedRW = [WriteSystem] in {
   def SEH_PushReg : I<0, Pseudo, (outs), (ins i32imm:$reg),
                             "#SEH_PushReg $reg", []>;
+  def SEH_Push2Regs : I<0, Pseudo, (outs), (ins i32imm:$reg1, i32imm:$reg2),
+                            "#SEH_Push2Regs $reg1, $reg2", []>;
   def SEH_SaveReg : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),
                             "#SEH_SaveReg $reg, $dst", []>;
   def SEH_SaveXMM : I<0, Pseudo, (outs), (ins i32imm:$reg, i32imm:$dst),
@@ -276,6 +278,8 @@ let isPseudo = 1, isMeta = 1, SchedRW = [WriteSystem] in {
 let isPseudo = 1, isMeta = 1, SchedRW = [WriteSystem] in {
   def SEH_SplitChainedAtEndOfBlock : I<0, Pseudo, (outs), (ins),
                             "#SEH_SplitChainedAtEndOfBlock", []>;
+  def SEH_SplitChained : I<0, Pseudo, (outs), (ins),
+                            "#SEH_SplitChained", []>;
 }
 
 //===----------------------------------------------------------------------===//

diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 467d8009732ad..9d285b5e2dcf0 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -1760,6 +1760,7 @@ void X86AsmPrinter::EmitSEHInstruction(const MachineInstr *MI) {
     case X86::SEH_SaveReg:
     case X86::SEH_SaveXMM:
     case X86::SEH_PushFrame:
+    case X86::SEH_Push2Regs:
       llvm_unreachable("SEH_ directive incompatible with FPO");
       break;
     default:
@@ -1774,6 +1775,11 @@ void X86AsmPrinter::EmitSEHInstruction(const MachineInstr *MI) {
     OutStreamer->emitWinCFIPushReg(MI->getOperand(0).getImm());
     break;
 
+  case X86::SEH_Push2Regs:
+    OutStreamer->emitWinCFIPush2Regs(MI->getOperand(0).getImm(),
+                                     MI->getOperand(1).getImm());
+    break;
+
   case X86::SEH_SaveReg:
     OutStreamer->emitWinCFISaveReg(MI->getOperand(0).getImm(),
                                    MI->getOperand(1).getImm());
@@ -2554,6 +2560,7 @@ void X86AsmPrinter::emitInstruction(const MachineInstr *MI) {
     return;
 
   case X86::SEH_PushReg:
+  case X86::SEH_Push2Regs:
   case X86::SEH_SaveReg:
   case X86::SEH_SaveXMM:
   case X86::SEH_StackAlloc:
@@ -2573,6 +2580,11 @@ void X86AsmPrinter::emitInstruction(const MachineInstr *MI) {
     SplitChainedAtEndOfBlock = true;
     return;
 
+  case X86::SEH_SplitChained:
+    assert(MF->hasWinCFI() && "SEH_ instruction in function without WinCFI?");
+    OutStreamer->emitWinCFISplitChained();
+    return;
+
   case X86::SEH_BeginEpilogue: {
     assert(MF->hasWinCFI() && "SEH_ instruction in function without WinCFI?");
     EmitSEHInstruction(MI);

diff --git a/llvm/lib/Target/X86/X86TargetMachine.cpp b/llvm/lib/Target/X86/X86TargetMachine.cpp
index 5e729b89ad630..932669b5cbac6 100644
--- a/llvm/lib/Target/X86/X86TargetMachine.cpp
+++ b/llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -109,6 +109,7 @@ extern "C" LLVM_C_ABI void LLVMInitializeX86Target() {
   initializeX86WinEHUnwindV2LegacyPass(PR);
   initializeX86PreLegalizerCombinerLegacyPass(PR);
   initializeX86PostLegalizerCombinerLegacyPass(PR);
+  initializeX86WinEHUnwindV3Pass(PR);
 }
 
 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
@@ -630,8 +631,10 @@ void X86PassConfig::addPreEmitPass2() {
 
   // Analyzes and emits pseudos to support Win x64 Unwind V2. This pass must run
   // after all real instructions have been added to the epilog.
-  if (TT.isOSWindows() && TT.isX86_64())
+  if (TT.isOSWindows() && TT.isX86_64()) {
     addPass(createX86WinEHUnwindV2LegacyPass());
+    addPass(createX86WinEHUnwindV3Pass());
+  }
 }
 
 bool X86PassConfig::addPostFastRegAllocRewrite() {
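
Editorial note on the V3 pass scheduled above: the commit log and the pass's header comment describe two limits, at most 31 prolog/epilog operations and at most 7 epilogs, with chaining used for too many epilogs and a fatal error for too many operations. The sketch below is one reading of that policy, not the pass itself. V3Action, classifyV3Fragment and the constant names are hypothetical, and applying the 31-operation limit separately to the prolog and to each epilog is an assumption.

```cpp
#include <cassert> // for callers that want to assert on the result
#include <vector>

// Limits quoted in the X86WinEHUnwindV3.cpp header comment.
constexpr unsigned kMaxOpsPerPrologOrEpilog = 31;
constexpr unsigned kMaxEpilogsPerFragment = 7;

// Hypothetical outcome of the capacity check.
enum class V3Action { Ok, SplitFragment, FatalError };

// PrologOps: number of unwind operations in the prolog.
// EpilogOps: one entry per epilog, holding that epilog's operation count.
inline V3Action classifyV3Fragment(unsigned PrologOps,
                                   const std::vector<unsigned> &EpilogOps) {
  // Too many operations in a single prolog or epilog cannot be fixed by
  // splitting, so this is reported as an error.
  if (PrologOps > kMaxOpsPerPrologOrEpilog)
    return V3Action::FatalError;
  for (unsigned Ops : EpilogOps)
    if (Ops > kMaxOpsPerPrologOrEpilog)
      return V3Action::FatalError;

  // Too many epilogs is handled by chaining additional unwind info.
  if (EpilogOps.size() > kMaxEpilogsPerFragment)
    return V3Action::SplitFragment;

  return V3Action::Ok;
}
```

In this reading, a function with eight small epilogs is split and chained, while a single 32-operation prolog is rejected outright.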

diff --git a/llvm/lib/Target/X86/X86WinEHUnwindV2.cpp b/llvm/lib/Target/X86/X86WinEHUnwindV2.cpp
index f4a5bdd842c65..254a91174de7c 100644
--- a/llvm/lib/Target/X86/X86WinEHUnwindV2.cpp
+++ b/llvm/lib/Target/X86/X86WinEHUnwindV2.cpp
@@ -79,9 +79,9 @@ class X86WinEHUnwindV2Legacy : public MachineFunctionPass {
 
 /// Rejects the current function due to an internal error within LLVM.
 std::nullopt_t rejectCurrentFunctionInternalError(const MachineFunction &MF,
-                                                  WinX64EHUnwindV2Mode Mode,
+                                                  WinX64EHUnwindMode Mode,
                                                   StringRef Reason) {
-  if (Mode == WinX64EHUnwindV2Mode::Required)
+  if (Mode == WinX64EHUnwindMode::V2Required)
     reportFatalInternalError("Windows x64 Unwind v2 is required, but LLVM has "
                              "generated incompatible code in function '" +
                              MF.getName() + "': " + Reason);
@@ -120,7 +120,7 @@ DebugLoc findDebugLoc(const MachineBasicBlock &MBB) {
 // Continues running the analysis on the given function or funclet.
 std::optional<FrameInfo>
 runAnalysisOnFuncOrFunclet(MachineFunction &MF, MachineFunction::iterator &Iter,
-                           WinX64EHUnwindV2Mode Mode) {
+                           WinX64EHUnwindMode Mode) {
   const TargetFrameLowering &TFL = *MF.getSubtarget().getFrameLowering();
 
   // Current state of processing the function. We'll assume that all functions
@@ -373,12 +373,14 @@ runAnalysisOnFuncOrFunclet(MachineFunction &MF, MachineFunction::iterator &Iter,
 }
 
 bool runX86WinEHUnwindV2(MachineFunction &MF) {
-  WinX64EHUnwindV2Mode Mode =
+  WinX64EHUnwindMode Mode =
       ForceMode.getNumOccurrences()
-          ? static_cast<WinX64EHUnwindV2Mode>(ForceMode.getValue())
-          : MF.getFunction().getParent()->getWinX64EHUnwindV2Mode();
+          ? static_cast<WinX64EHUnwindMode>(ForceMode.getValue())
+          : MF.getFunction().getParent()->getWinX64EHUnwindMode();
 
-  if (Mode == WinX64EHUnwindV2Mode::Disabled)
+  // Only act on V2 modes; V1 = disabled, V3 handled by the V3 pass.
+  if (Mode != WinX64EHUnwindMode::V2BestEffort &&
+      Mode != WinX64EHUnwindMode::V2Required)
     return false;
 
   // Requested changes.

diff  --git a/llvm/lib/Target/X86/X86WinEHUnwindV3.cpp b/llvm/lib/Target/X86/X86WinEHUnwindV3.cpp
new file mode 100644
index 0000000000000..66d43d2bad217
--- /dev/null
+++ b/llvm/lib/Target/X86/X86WinEHUnwindV3.cpp
@@ -0,0 +1,258 @@
+//===-- X86WinEHUnwindV3.cpp - Win x64 Unwind v3 ----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// Implements the capacity-checking and sub-fragment splitting pass for
+/// Unwind v3 information. Unlike the V2 pass, V3 does not need to validate
+/// epilog structure (V3 can encode any prolog/epilog pattern). This pass
+/// only needs to:
+///   1. Count prolog/epilog operations and epilogs.
+///   2. Check V3 capacity limits (<=31 prolog/epilog ops, <=7 epilogs).
+///   3. Insert sub-fragment split points if limits are exceeded.
+///
+/// The unwind version is set module-wide, not per-function.
+///
+/// See https://learn.microsoft.com/en-us/cpp/build/x64-unwind-information-v3
+///
+//===----------------------------------------------------------------------===//
+
+#include "MCTargetDesc/X86BaseInfo.h"
+#include "X86.h"
+#include "X86Subtarget.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "x86-wineh-unwindv3"
+
+STATISTIC(FunctionsProcessed,
+          "Number of functions processed by Unwind v3 pass");
+STATISTIC(SubFragmentSplits,
+          "Number of sub-fragment splits inserted for Unwind v3");
+
+/// V3 limits from the format specification.
+static constexpr unsigned MaxV3PrologOps = 31;
+static constexpr unsigned MaxV3Epilogs = 7;
+static constexpr unsigned MaxV3EpilogOps = 31;
+
+/// After reporting a recoverable error for `MF`, erase all SEH pseudo-
+/// instructions and clear the WinCFI flag so the AsmPrinter doesn't try to
+/// emit (potentially malformed) unwind information. The LLVMContext
+/// diagnostic recorded by the caller will prevent the object file from
+/// actually being written.
+static void suppressWinCFI(MachineFunction &MF) {
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
+      switch (MI.getOpcode()) {
+      case X86::SEH_PushReg:
+      case X86::SEH_Push2Regs:
+      case X86::SEH_SaveReg:
+      case X86::SEH_SaveXMM:
+      case X86::SEH_StackAlloc:
+      case X86::SEH_StackAlign:
+      case X86::SEH_SetFrame:
+      case X86::SEH_PushFrame:
+      case X86::SEH_EndPrologue:
+      case X86::SEH_BeginEpilogue:
+      case X86::SEH_EndEpilogue:
+      case X86::SEH_SplitChained:
+      case X86::SEH_SplitChainedAtEndOfBlock:
+        MI.eraseFromParent();
+        break;
+      default:
+        break;
+      }
+    }
+  }
+  MF.setHasWinCFI(false);
+}
+
+namespace {
+
+/// Per-funclet analysis results.
+struct FuncletInfo {
+  unsigned PrologOpCount = 0;
+  unsigned EpilogCount = 0;
+  unsigned MaxEpilogOpCount = 0;
+  /// SEH_BeginEpilogue instructions, used as insertion points for splitting.
+  SmallVector<MachineInstr *, 8> EpilogBegins;
+};
+
+class X86WinEHUnwindV3 : public MachineFunctionPass {
+public:
+  static char ID;
+
+  X86WinEHUnwindV3() : MachineFunctionPass(ID) {
+    initializeX86WinEHUnwindV3Pass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override { return "WinEH Unwind V3"; }
+
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+private:
+  /// Analyze one funclet (or the main function body) starting at Iter.
+  /// Advances Iter past the analyzed region, stopping at the next funclet
+  /// entry or the end of the function.
+  static FuncletInfo analyzeFunclet(MachineFunction &MF,
+                                    MachineFunction::iterator &Iter);
+};
+
+} // end anonymous namespace
+
+char X86WinEHUnwindV3::ID = 0;
+
+INITIALIZE_PASS(X86WinEHUnwindV3, "x86-wineh-unwindv3",
+                "Capacity check and sub-fragment splitting for Win64 Unwind v3",
+                false, false)
+
+FunctionPass *llvm::createX86WinEHUnwindV3Pass() {
+  return new X86WinEHUnwindV3();
+}
+
+FuncletInfo X86WinEHUnwindV3::analyzeFunclet(MachineFunction &MF,
+                                             MachineFunction::iterator &Iter) {
+  FuncletInfo Info;
+  bool InEpilog = false;
+  bool SeenProlog = false;
+  unsigned CurrentEpilogOpCount = 0;
+
+  for (; Iter != MF.end(); ++Iter) {
+    MachineBasicBlock &MBB = *Iter;
+
+    // If we've already been processing a funclet's prolog/body and encounter
+    // another funclet entry, stop - that funclet gets its own analysis.
+    if (MBB.isEHFuncletEntry() && SeenProlog)
+      break;
+
+    for (MachineInstr &MI : MBB) {
+      switch (MI.getOpcode()) {
+      case X86::SEH_PushReg:
+      case X86::SEH_Push2Regs:
+      case X86::SEH_StackAlloc:
+      case X86::SEH_SetFrame:
+      case X86::SEH_SaveReg:
+      case X86::SEH_SaveXMM:
+      case X86::SEH_PushFrame:
+        if (InEpilog)
+          CurrentEpilogOpCount++;
+        else
+          Info.PrologOpCount++;
+        break;
+      case X86::SEH_EndPrologue:
+        SeenProlog = true;
+        break;
+      case X86::SEH_BeginEpilogue:
+        InEpilog = true;
+        CurrentEpilogOpCount = 0;
+        Info.EpilogCount++;
+        Info.EpilogBegins.push_back(&MI);
+        break;
+      case X86::SEH_EndEpilogue:
+        InEpilog = false;
+        Info.MaxEpilogOpCount =
+            std::max(Info.MaxEpilogOpCount, CurrentEpilogOpCount);
+        break;
+      default:
+        break;
+      }
+    }
+  }
+
+  return Info;
+}
+
+bool X86WinEHUnwindV3::runOnMachineFunction(MachineFunction &MF) {
+  WinX64EHUnwindMode Mode =
+      MF.getFunction().getParent()->getWinX64EHUnwindMode();
+
+  Function &F = MF.getFunction();
+  LLVMContext &Ctx = F.getContext();
+
+  // EGPR (R16-R31) requires V3 unwind info because V1/V2 cannot encode
+  // registers beyond R15. Only enforce this for functions that actually
+  // emit SEH unwind info — `nounwind` functions and targets that don't
+  // require unwind tables (e.g. cross-compilation host defaults) can use
+  // EGPR with any unwind mode since no SEH metadata is generated.
+  if (Mode != WinX64EHUnwindMode::V3) {
+    if (!F.needsUnwindTableEntry())
+      return false;
+    const auto &STI = MF.getSubtarget<X86Subtarget>();
+    if (STI.hasEGPR()) {
+      Ctx.diagnose(DiagnosticInfoUnsupported(
+          F, "EGPR (R16-R31) requires V3 unwind info on Windows x64"));
+      // Stripping the SEH pseudos modifies the function, so report a change.
+      suppressWinCFI(MF);
+      return true;
+    }
+    return false;
+  }
+
+  bool Changed = false;
+  MachineFunction::iterator Iter = MF.begin();
+
+  // Process each funclet (and the main function body) independently.
+  // Each funclet gets its own UNWIND_INFO, so V3 limits apply per funclet.
+  while (Iter != MF.end()) {
+    FuncletInfo Info = analyzeFunclet(MF, Iter);
+
+    if (Info.PrologOpCount > MaxV3PrologOps) {
+      Ctx.diagnose(DiagnosticInfoResourceLimit(
+          F, "number of unwind v3 prolog operations required",
+          Info.PrologOpCount, MaxV3PrologOps, DS_Error, DK_ResourceLimit));
+      Ctx.diagnose(DiagnosticInfoGenericWithLoc(
+          "sub-fragment splitting for prolog overflow is not yet implemented",
+          F, F.getSubprogram(), DS_Note));
+      // Stripping the SEH pseudos modifies the function, so report a change.
+      suppressWinCFI(MF);
+      return true;
+    }
+
+    if (Info.MaxEpilogOpCount > MaxV3EpilogOps) {
+      Ctx.diagnose(DiagnosticInfoResourceLimit(
+          F, "number of unwind v3 epilog operations required",
+          Info.MaxEpilogOpCount, MaxV3EpilogOps, DS_Error, DK_ResourceLimit));
+      Ctx.diagnose(DiagnosticInfoGenericWithLoc(
+          "sub-fragment splitting for epilog overflow is not yet implemented",
+          F, F.getSubprogram(), DS_Note));
+      // Stripping the SEH pseudos modifies the function, so report a change.
+      suppressWinCFI(MF);
+      return true;
+    }
+
+    if (Info.EpilogCount > MaxV3Epilogs) {
+      const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
+      unsigned Count = 0;
+      for (MachineInstr *BeginEpilog : Info.EpilogBegins) {
+        Count++;
+        if (Count > MaxV3Epilogs) {
+          MachineBasicBlock *MBB = BeginEpilog->getParent();
+          BuildMI(*MBB, BeginEpilog, BeginEpilog->getDebugLoc(),
+                  TII->get(X86::SEH_SplitChained));
+          BuildMI(*MBB, BeginEpilog, BeginEpilog->getDebugLoc(),
+                  TII->get(X86::SEH_EndPrologue));
+          SubFragmentSplits++;
+          Count = 1;
+        }
+      }
+      Changed = true;
+    }
+  }
+
+  if (Changed)
+    FunctionsProcessed++;
+
+  return Changed;
+}

diff  --git a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh-v3.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh-v3.ll
new file mode 100644
index 0000000000000..7f3df90123de2
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh-v3.ll
@@ -0,0 +1,167 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-windows-msvc | FileCheck %s --check-prefix=WIN-V3-REF
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2 | FileCheck %s --check-prefix=WIN-V3
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=WIN-V3-PPX
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mcpu=diamondrapids | FileCheck %s --check-prefix=WIN-V3-DR
+
+; V3 unwind info is enabled module-wide here. diamondrapids (which enables
+; EGPR) requires V3, but with V3 enabled the SEH prolog/epilogue ordering
+; differs from V1, so this is split from push2-pop2-cfi-seh.ll.
+
+define i32 @csr6_alloc16(ptr %argv) {
+; WIN-V3-REF-LABEL: csr6_alloc16:
+; WIN-V3-REF:       # %bb.0: # %entry
+; WIN-V3-REF-NEXT:    .seh_pushreg %r15
+; WIN-V3-REF-NEXT:    pushq %r15
+; WIN-V3-REF-NEXT:    .seh_pushreg %r14
+; WIN-V3-REF-NEXT:    pushq %r14
+; WIN-V3-REF-NEXT:    .seh_pushreg %r13
+; WIN-V3-REF-NEXT:    pushq %r13
+; WIN-V3-REF-NEXT:    .seh_pushreg %r12
+; WIN-V3-REF-NEXT:    pushq %r12
+; WIN-V3-REF-NEXT:    .seh_pushreg %rbp
+; WIN-V3-REF-NEXT:    pushq %rbp
+; WIN-V3-REF-NEXT:    .seh_pushreg %rbx
+; WIN-V3-REF-NEXT:    pushq %rbx
+; WIN-V3-REF-NEXT:    .seh_stackalloc 56
+; WIN-V3-REF-NEXT:    subq $56, %rsp
+; WIN-V3-REF-NEXT:    .seh_endprologue
+; WIN-V3-REF-NEXT:    #APP
+; WIN-V3-REF-NEXT:    #NO_APP
+; WIN-V3-REF-NEXT:    xorl %eax, %eax
+; WIN-V3-REF-NEXT:    callq *%rax
+; WIN-V3-REF-NEXT:    nop
+; WIN-V3-REF-NEXT:    .seh_startepilogue
+; WIN-V3-REF-NEXT:    .seh_stackalloc 56
+; WIN-V3-REF-NEXT:    addq $56, %rsp
+; WIN-V3-REF-NEXT:    .seh_pushreg %rbx
+; WIN-V3-REF-NEXT:    popq %rbx
+; WIN-V3-REF-NEXT:    .seh_pushreg %rbp
+; WIN-V3-REF-NEXT:    popq %rbp
+; WIN-V3-REF-NEXT:    .seh_pushreg %r12
+; WIN-V3-REF-NEXT:    popq %r12
+; WIN-V3-REF-NEXT:    .seh_pushreg %r13
+; WIN-V3-REF-NEXT:    popq %r13
+; WIN-V3-REF-NEXT:    .seh_pushreg %r14
+; WIN-V3-REF-NEXT:    popq %r14
+; WIN-V3-REF-NEXT:    .seh_pushreg %r15
+; WIN-V3-REF-NEXT:    popq %r15
+; WIN-V3-REF-NEXT:    .seh_endepilogue
+; WIN-V3-REF-NEXT:    retq
+; WIN-V3-REF-NEXT:    .seh_endproc
+;
+; WIN-V3-LABEL: csr6_alloc16:
+; WIN-V3:       # %bb.0: # %entry
+; WIN-V3-NEXT:    .seh_pushreg %rax
+; WIN-V3-NEXT:    pushq %rax
+; WIN-V3-NEXT:    .seh_push2regs %r15, %r14
+; WIN-V3-NEXT:    push2 %r14, %r15
+; WIN-V3-NEXT:    .seh_push2regs %r13, %r12
+; WIN-V3-NEXT:    push2 %r12, %r13
+; WIN-V3-NEXT:    .seh_push2regs %rbp, %rbx
+; WIN-V3-NEXT:    push2 %rbx, %rbp
+; WIN-V3-NEXT:    .seh_stackalloc 64
+; WIN-V3-NEXT:    subq $64, %rsp
+; WIN-V3-NEXT:    .seh_endprologue
+; WIN-V3-NEXT:    #APP
+; WIN-V3-NEXT:    #NO_APP
+; WIN-V3-NEXT:    xorl %eax, %eax
+; WIN-V3-NEXT:    callq *%rax
+; WIN-V3-NEXT:    nop
+; WIN-V3-NEXT:    .seh_startepilogue
+; WIN-V3-NEXT:    .seh_stackalloc 64
+; WIN-V3-NEXT:    addq $64, %rsp
+; WIN-V3-NEXT:    .seh_push2regs %rbx, %rbp
+; WIN-V3-NEXT:    pop2 %rbp, %rbx
+; WIN-V3-NEXT:    .seh_push2regs %r12, %r13
+; WIN-V3-NEXT:    pop2 %r13, %r12
+; WIN-V3-NEXT:    .seh_push2regs %r14, %r15
+; WIN-V3-NEXT:    pop2 %r15, %r14
+; WIN-V3-NEXT:    .seh_stackalloc 8
+; WIN-V3-NEXT:    popq %rax
+; WIN-V3-NEXT:    .seh_endepilogue
+; WIN-V3-NEXT:    retq
+; WIN-V3-NEXT:    .seh_endproc
+;
+; WIN-V3-PPX-LABEL: csr6_alloc16:
+; WIN-V3-PPX:       # %bb.0: # %entry
+; WIN-V3-PPX-NEXT:    .seh_pushreg %rax
+; WIN-V3-PPX-NEXT:    pushq %rax
+; WIN-V3-PPX-NEXT:    .seh_push2regs %r15, %r14
+; WIN-V3-PPX-NEXT:    push2p %r14, %r15
+; WIN-V3-PPX-NEXT:    .seh_push2regs %r13, %r12
+; WIN-V3-PPX-NEXT:    push2p %r12, %r13
+; WIN-V3-PPX-NEXT:    .seh_push2regs %rbp, %rbx
+; WIN-V3-PPX-NEXT:    push2p %rbx, %rbp
+; WIN-V3-PPX-NEXT:    .seh_stackalloc 64
+; WIN-V3-PPX-NEXT:    subq $64, %rsp
+; WIN-V3-PPX-NEXT:    .seh_endprologue
+; WIN-V3-PPX-NEXT:    #APP
+; WIN-V3-PPX-NEXT:    #NO_APP
+; WIN-V3-PPX-NEXT:    xorl %eax, %eax
+; WIN-V3-PPX-NEXT:    callq *%rax
+; WIN-V3-PPX-NEXT:    nop
+; WIN-V3-PPX-NEXT:    .seh_startepilogue
+; WIN-V3-PPX-NEXT:    .seh_stackalloc 64
+; WIN-V3-PPX-NEXT:    addq $64, %rsp
+; WIN-V3-PPX-NEXT:    .seh_push2regs %rbx, %rbp
+; WIN-V3-PPX-NEXT:    pop2p %rbp, %rbx
+; WIN-V3-PPX-NEXT:    .seh_push2regs %r12, %r13
+; WIN-V3-PPX-NEXT:    pop2p %r13, %r12
+; WIN-V3-PPX-NEXT:    .seh_push2regs %r14, %r15
+; WIN-V3-PPX-NEXT:    pop2p %r15, %r14
+; WIN-V3-PPX-NEXT:    .seh_stackalloc 8
+; WIN-V3-PPX-NEXT:    popq %rax
+; WIN-V3-PPX-NEXT:    .seh_endepilogue
+; WIN-V3-PPX-NEXT:    retq
+; WIN-V3-PPX-NEXT:    .seh_endproc
+;
+; WIN-V3-DR-LABEL: csr6_alloc16:
+; WIN-V3-DR:       # %bb.0: # %entry
+; WIN-V3-DR-NEXT:    .seh_pushreg %r15
+; WIN-V3-DR-NEXT:    pushq %r15
+; WIN-V3-DR-NEXT:    .seh_pushreg %r14
+; WIN-V3-DR-NEXT:    pushq %r14
+; WIN-V3-DR-NEXT:    .seh_pushreg %r13
+; WIN-V3-DR-NEXT:    pushq %r13
+; WIN-V3-DR-NEXT:    .seh_pushreg %r12
+; WIN-V3-DR-NEXT:    pushq %r12
+; WIN-V3-DR-NEXT:    .seh_pushreg %rbp
+; WIN-V3-DR-NEXT:    pushq %rbp
+; WIN-V3-DR-NEXT:    .seh_pushreg %rbx
+; WIN-V3-DR-NEXT:    pushq %rbx
+; WIN-V3-DR-NEXT:    .seh_stackalloc 56
+; WIN-V3-DR-NEXT:    subq $56, %rsp
+; WIN-V3-DR-NEXT:    .seh_endprologue
+; WIN-V3-DR-NEXT:    #APP
+; WIN-V3-DR-NEXT:    #NO_APP
+; WIN-V3-DR-NEXT:    xorl %eax, %eax
+; WIN-V3-DR-NEXT:    callq *%rax
+; WIN-V3-DR-NEXT:    nop
+; WIN-V3-DR-NEXT:    .seh_startepilogue
+; WIN-V3-DR-NEXT:    .seh_stackalloc 56
+; WIN-V3-DR-NEXT:    addq $56, %rsp
+; WIN-V3-DR-NEXT:    .seh_pushreg %rbx
+; WIN-V3-DR-NEXT:    popq %rbx
+; WIN-V3-DR-NEXT:    .seh_pushreg %rbp
+; WIN-V3-DR-NEXT:    popq %rbp
+; WIN-V3-DR-NEXT:    .seh_pushreg %r12
+; WIN-V3-DR-NEXT:    popq %r12
+; WIN-V3-DR-NEXT:    .seh_pushreg %r13
+; WIN-V3-DR-NEXT:    popq %r13
+; WIN-V3-DR-NEXT:    .seh_pushreg %r14
+; WIN-V3-DR-NEXT:    popq %r14
+; WIN-V3-DR-NEXT:    .seh_pushreg %r15
+; WIN-V3-DR-NEXT:    popq %r15
+; WIN-V3-DR-NEXT:    .seh_endepilogue
+; WIN-V3-DR-NEXT:    retq
+; WIN-V3-DR-NEXT:    .seh_endproc
+entry:
+  tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{r12},~{rbx},~{dirflag},~{fpsr},~{flags}"()
+  %a = alloca [3 x ptr], align 8
+  %b = call ptr (...) null()
+  ret i32 poison
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"winx64-eh-unwind", i32 3}

diff  --git a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
index 071b297b49dc1..bd81d36f2db0f 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
@@ -4,10 +4,14 @@
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=LIN-PPX
 ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=diamondrapids | FileCheck %s --check-prefix=LIN-PPX
 ; RUN: llc < %s -mtriple=x86_64-windows-msvc | FileCheck %s --check-prefix=WIN-REF
-; RUN: llc < %s -mtriple=x86_64-windows-msvc -mcpu=diamondrapids | FileCheck %s --check-prefix=WIN-REF
 ; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2 | FileCheck %s --check-prefix=WIN
 ; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=WIN-PPX
 
+; EGPR normally requires unwind v3 info, but that changes the SEH directives
+; that get emitted, so disable EGPR so that we can validate that diamondrapids
+; enables push2pop2.
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mcpu=diamondrapids -mattr=-egpr | FileCheck %s --check-prefix=WIN-REF
+
 define i32 @csr6_alloc16(ptr %argv) {
 ; LIN-REF-LABEL: csr6_alloc16:
 ; LIN-REF:       # %bb.0: # %entry

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-egpr-required.ll b/llvm/test/CodeGen/X86/win64-eh-unwindv3-egpr-required.ll
new file mode 100644
index 0000000000000..f33893a9e1d88
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-egpr-required.ll
@@ -0,0 +1,16 @@
+; RUN: not llc -mtriple=x86_64-unknown-windows-msvc -mattr=+egpr -o /dev/null %s 2>&1 | FileCheck %s
+; CHECK: error: {{.*}}: in function func {{.*}}: EGPR (R16-R31) requires V3 unwind info on Windows x64
+
+; EGPR enabled without V3 unwind (default V1) should produce a recoverable
+; backend diagnostic (no crash, no stack trace).
+; The uwtable attribute and the call site force a stack frame and SEH unwind
+; info emission so the V3 pass runs regardless of the host platform's default
+; (matters for cross-compilation on Linux/macOS hosts targeting Windows).
+
+declare void @other()
+
+define dso_local void @func() uwtable {
+entry:
+  call void @other()
+  ret void
+}

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-funclet-prolog.ll b/llvm/test/CodeGen/X86/win64-eh-unwindv3-funclet-prolog.ll
new file mode 100644
index 0000000000000..b23780126da04
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-funclet-prolog.ll
@@ -0,0 +1,165 @@
+; RUN: llc -mtriple=x86_64-unknown-windows-msvc -o - %s | FileCheck %s
+
+; Test that the V3 pass correctly handles funclets. Each funclet gets its own
+; UNWIND_INFO, so V3 capacity limits (<=31 prolog ops, <=7 epilogs) apply per
+; funclet, not summed across the entire function.
+;
+; This function has 15 cleanup funclets, each with its own prolog/epilog.
+; Each funclet generates ~2 prolog ops (push rbp + stack alloc), plus the
+; main function has ~3 (push rbp + stack alloc + set frame). The total across
+; all funclets (~33) exceeds the 31-op V3 limit, but each individual funclet
+; is well within limits. The V3 pass should handle this without errors.
+
+declare i32 @c(i32) local_unnamed_addr
+declare void @cleanup_helper(i32) local_unnamed_addr
+
+; CHECK-LABEL: many_funclets:
+; CHECK:       .seh_proc many_funclets
+; CHECK:       .seh_endprologue
+; CHECK-NOT:   .seh_splitchained
+; CHECK:       .seh_endproc
+
+define dso_local i32 @many_funclets(i32 %x) local_unnamed_addr personality ptr @__C_specific_handler {
+entry:
+  %v0 = invoke i32 @c(i32 %x)
+    to label %try1 unwind label %cleanup0
+
+try1:
+  %v1 = invoke i32 @c(i32 %v0)
+    to label %try2 unwind label %cleanup1
+
+try2:
+  %v2 = invoke i32 @c(i32 %v1)
+    to label %try3 unwind label %cleanup2
+
+try3:
+  %v3 = invoke i32 @c(i32 %v2)
+    to label %try4 unwind label %cleanup3
+
+try4:
+  %v4 = invoke i32 @c(i32 %v3)
+    to label %try5 unwind label %cleanup4
+
+try5:
+  %v5 = invoke i32 @c(i32 %v4)
+    to label %try6 unwind label %cleanup5
+
+try6:
+  %v6 = invoke i32 @c(i32 %v5)
+    to label %try7 unwind label %cleanup6
+
+try7:
+  %v7 = invoke i32 @c(i32 %v6)
+    to label %try8 unwind label %cleanup7
+
+try8:
+  %v8 = invoke i32 @c(i32 %v7)
+    to label %try9 unwind label %cleanup8
+
+try9:
+  %v9 = invoke i32 @c(i32 %v8)
+    to label %try10 unwind label %cleanup9
+
+try10:
+  %v10 = invoke i32 @c(i32 %v9)
+    to label %try11 unwind label %cleanup10
+
+try11:
+  %v11 = invoke i32 @c(i32 %v10)
+    to label %try12 unwind label %cleanup11
+
+try12:
+  %v12 = invoke i32 @c(i32 %v11)
+    to label %try13 unwind label %cleanup12
+
+try13:
+  %v13 = invoke i32 @c(i32 %v12)
+    to label %try14 unwind label %cleanup13
+
+try14:
+  %v14 = invoke i32 @c(i32 %v13)
+    to label %done unwind label %cleanup14
+
+done:
+  ret i32 %v14
+
+cleanup0:
+  %tok0 = cleanuppad within none []
+  call void @cleanup_helper(i32 0) [ "funclet"(token %tok0) ]
+  cleanupret from %tok0 unwind to caller
+
+cleanup1:
+  %tok1 = cleanuppad within none []
+  call void @cleanup_helper(i32 1) [ "funclet"(token %tok1) ]
+  cleanupret from %tok1 unwind to caller
+
+cleanup2:
+  %tok2 = cleanuppad within none []
+  call void @cleanup_helper(i32 2) [ "funclet"(token %tok2) ]
+  cleanupret from %tok2 unwind to caller
+
+cleanup3:
+  %tok3 = cleanuppad within none []
+  call void @cleanup_helper(i32 3) [ "funclet"(token %tok3) ]
+  cleanupret from %tok3 unwind to caller
+
+cleanup4:
+  %tok4 = cleanuppad within none []
+  call void @cleanup_helper(i32 4) [ "funclet"(token %tok4) ]
+  cleanupret from %tok4 unwind to caller
+
+cleanup5:
+  %tok5 = cleanuppad within none []
+  call void @cleanup_helper(i32 5) [ "funclet"(token %tok5) ]
+  cleanupret from %tok5 unwind to caller
+
+cleanup6:
+  %tok6 = cleanuppad within none []
+  call void @cleanup_helper(i32 6) [ "funclet"(token %tok6) ]
+  cleanupret from %tok6 unwind to caller
+
+cleanup7:
+  %tok7 = cleanuppad within none []
+  call void @cleanup_helper(i32 7) [ "funclet"(token %tok7) ]
+  cleanupret from %tok7 unwind to caller
+
+cleanup8:
+  %tok8 = cleanuppad within none []
+  call void @cleanup_helper(i32 8) [ "funclet"(token %tok8) ]
+  cleanupret from %tok8 unwind to caller
+
+cleanup9:
+  %tok9 = cleanuppad within none []
+  call void @cleanup_helper(i32 9) [ "funclet"(token %tok9) ]
+  cleanupret from %tok9 unwind to caller
+
+cleanup10:
+  %tok10 = cleanuppad within none []
+  call void @cleanup_helper(i32 10) [ "funclet"(token %tok10) ]
+  cleanupret from %tok10 unwind to caller
+
+cleanup11:
+  %tok11 = cleanuppad within none []
+  call void @cleanup_helper(i32 11) [ "funclet"(token %tok11) ]
+  cleanupret from %tok11 unwind to caller
+
+cleanup12:
+  %tok12 = cleanuppad within none []
+  call void @cleanup_helper(i32 12) [ "funclet"(token %tok12) ]
+  cleanupret from %tok12 unwind to caller
+
+cleanup13:
+  %tok13 = cleanuppad within none []
+  call void @cleanup_helper(i32 13) [ "funclet"(token %tok13) ]
+  cleanupret from %tok13 unwind to caller
+
+cleanup14:
+  %tok14 = cleanuppad within none []
+  call void @cleanup_helper(i32 14) [ "funclet"(token %tok14) ]
+  cleanupret from %tok14 unwind to caller
+}
+
+declare i32 @__C_specific_handler(...)
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"winx64-eh-unwind", i32 3}

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-push2pop2.ll b/llvm/test/CodeGen/X86/win64-eh-unwindv3-push2pop2.ll
new file mode 100644
index 0000000000000..f24050f2db77c
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-push2pop2.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=x86_64-unknown-windows-msvc -mattr=+push2pop2 -o - %s | FileCheck %s
+
+; Test that push2/pop2 padding cleanup in the epilog emits the correct
+; SEH pseudo for unwind v3. The padding PUSH in the prolog gets an
+; SEH_PushReg, so the corresponding cleanup in the epilog (an ADD RSP,8
+; or POP into a dead register) must also have an SEH pseudo.
+
+; CHECK:        .seh_unwindversion 3
+
+declare void @a() local_unnamed_addr
+declare i32 @c(i32) local_unnamed_addr
+
+; Function with 6 callee-saved GPRs (even count) which triggers push2/pop2
+; padding (extra push to align stack for push2).
+define dso_local i32 @push2pop2_padding(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: push2pop2_padding:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rax
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    .seh_push2regs %r15, %r14
+; CHECK-NEXT:    push2 %r14, %r15
+; CHECK-NEXT:    .seh_push2regs %r13, %r12
+; CHECK-NEXT:    push2 %r12, %r13
+; CHECK-NEXT:    .seh_push2regs %rbp, %rbx
+; CHECK-NEXT:    push2 %rbx, %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_push2regs %rbx, %rbp
+; CHECK-NEXT:    pop2 %rbp, %rbx
+; CHECK-NEXT:    .seh_push2regs %r12, %r13
+; CHECK-NEXT:    pop2 %r13, %r12
+; CHECK-NEXT:    .seh_push2regs %r14, %r15
+; CHECK-NEXT:    pop2 %r15, %r14
+; CHECK-NEXT:    .seh_stackalloc 8
+; CHECK-NEXT:    popq %rax
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    jmp c # TAILCALL
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void asm sideeffect "", "~{rbx},~{rbp},~{r12},~{r13},~{r14},~{r15}"()
+  %call = tail call i32 @c(i32 %x)
+  ret i32 %call
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"winx64-eh-unwind", i32 3}

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-split.ll b/llvm/test/CodeGen/X86/win64-eh-unwindv3-split.ll
new file mode 100644
index 0000000000000..a85eee851ea78
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-split.ll
@@ -0,0 +1,67 @@
+; RUN: llc -mtriple=x86_64-unknown-windows-msvc -o - %s | FileCheck %s
+
+; Test that the V3 pass splits a function with >7 epilogs into chained
+; sub-fragments. The switch has 7 cases plus a default, each with its own epilog.
+; The pass should insert a .seh_splitchained before the 8th epilog.
+
+declare i32 @c(i32) local_unnamed_addr
+
+; CHECK-LABEL: eight_epilogs:
+; CHECK:       .seh_proc eight_epilogs
+; CHECK:       .seh_stackalloc 40
+; CHECK:       .seh_endprologue
+
+; Epilogs 1-7 in the main fragment.
+; CHECK-COUNT-7: .seh_startepilogue
+
+; Split before the 8th epilog.
+; CHECK:       .seh_splitchained
+; CHECK-NEXT:  .seh_endprologue
+
+; The 8th epilog is in the chained fragment.
+; CHECK:       .seh_startepilogue
+; CHECK:       .seh_endepilogue
+; CHECK:       .seh_endproc
+
+define dso_local i32 @eight_epilogs(i32 %x) #0 {
+entry:
+  switch i32 %x, label %sw.default [
+    i32 0, label %sw.0
+    i32 1, label %sw.1
+    i32 2, label %sw.2
+    i32 3, label %sw.3
+    i32 4, label %sw.4
+    i32 5, label %sw.5
+    i32 6, label %sw.6
+  ]
+
+sw.0:
+  %r0 = call i32 @c(i32 0)
+  ret i32 %r0
+sw.1:
+  %r1 = call i32 @c(i32 1)
+  ret i32 %r1
+sw.2:
+  %r2 = call i32 @c(i32 2)
+  ret i32 %r2
+sw.3:
+  %r3 = call i32 @c(i32 3)
+  ret i32 %r3
+sw.4:
+  %r4 = call i32 @c(i32 4)
+  ret i32 %r4
+sw.5:
+  %r5 = call i32 @c(i32 5)
+  ret i32 %r5
+sw.6:
+  %r6 = call i32 @c(i32 6)
+  ret i32 %r6
+sw.default:
+  %rd = call i32 @c(i32 7)
+  ret i32 %rd
+}
+
+attributes #0 = { optnone noinline }
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"winx64-eh-unwind", i32 3}

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-epilog-ops.mir b/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-epilog-ops.mir
new file mode 100644
index 0000000000000..99d72faa834a1
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-epilog-ops.mir
@@ -0,0 +1,61 @@
+# RUN: not llc -mtriple=x86_64-pc-windows-msvc -o /dev/null %s \
+# RUN:    -run-pass=x86-wineh-unwindv3 2>&1 | FileCheck %s
+
+# Exceeding the V3 epilog op limit (31 ops in a single epilog) is a
+# recoverable diagnostic - llc exits non-zero but does not crash or produce
+# a stack trace.
+
+# CHECK: error: {{.*}}: number of unwind v3 epilog operations required ({{[0-9]+}}) exceeds limit (31) in function 'too_many_epilog_ops'
+# CHECK: note: {{.*}}: sub-fragment splitting for epilog overflow is not yet implemented
+
+--- |
+  define dso_local void @too_many_epilog_ops() local_unnamed_addr uwtable {
+  entry:
+    ret void
+  }
+
+  !llvm.module.flags = !{!0}
+  !0 = !{i32 1, !"winx64-eh-unwind", i32 3}
+...
+---
+name:            too_many_epilog_ops
+body:             |
+  bb.0.entry:
+    frame-setup SEH_PushReg 51
+    frame-setup SEH_EndPrologue
+    SEH_BeginEpilogue
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_StackAlloc 8
+    SEH_EndEpilogue
+    RET64
+...

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-prolog-ops.mir b/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-prolog-ops.mir
new file mode 100644
index 0000000000000..fcfd32ab00625
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3-too-many-prolog-ops.mir
@@ -0,0 +1,59 @@
+# RUN: not llc -mtriple=x86_64-pc-windows-msvc -o /dev/null %s \
+# RUN:    -run-pass=x86-wineh-unwindv3 2>&1 | FileCheck %s
+
+# Exceeding the V3 prolog op limit (31) is a recoverable diagnostic - llc
+# exits non-zero but does not crash or produce a stack trace.
+
+# CHECK: error: {{.*}}: number of unwind v3 prolog operations required ({{[0-9]+}}) exceeds limit (31) in function 'too_many_prolog_ops'
+# CHECK: note: {{.*}}: sub-fragment splitting for prolog overflow is not yet implemented
+
+--- |
+  define dso_local void @too_many_prolog_ops() local_unnamed_addr uwtable {
+  entry:
+    ret void
+  }
+
+  !llvm.module.flags = !{!0}
+  !0 = !{i32 1, !"winx64-eh-unwind", i32 3}
+...
+---
+name:            too_many_prolog_ops
+body:             |
+  bb.0.entry:
+    frame-setup SEH_PushReg 51
+    frame-setup SEH_PushReg 52
+    frame-setup SEH_PushReg 53
+    frame-setup SEH_PushReg 54
+    frame-setup SEH_PushReg 55
+    frame-setup SEH_PushReg 56
+    frame-setup SEH_PushReg 57
+    frame-setup SEH_PushReg 58
+    frame-setup SEH_PushReg 109
+    frame-setup SEH_PushReg 110
+    frame-setup SEH_PushReg 111
+    frame-setup SEH_PushReg 112
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_StackAlloc 8
+    frame-setup SEH_EndPrologue
+    SEH_BeginEpilogue
+    SEH_EndEpilogue
+    RET64
+...

diff  --git a/llvm/test/CodeGen/X86/win64-eh-unwindv3.ll b/llvm/test/CodeGen/X86/win64-eh-unwindv3.ll
new file mode 100644
index 0000000000000..799d4704e1601
--- /dev/null
+++ b/llvm/test/CodeGen/X86/win64-eh-unwindv3.ll
@@ -0,0 +1,536 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=x86_64-unknown-windows-msvc -mattr=+egpr -o - %s | FileCheck %s
+
+; EGPR is enabled to verify V3 + EGPR compiles without errors.
+; R16-R31 are caller-saved on Win64, so they won't appear in SEH push/pop.
+
+; V3 uses a module-wide default via a file-level .seh_unwindversion 3 directive.
+; Functions should NOT have per-function .seh_unwindversion or .seh_unwindv2start.
+
+; Unlike V1/V2, there is a .seh_* directive *before* each real instruction in
+; the prolog AND before each instruction in the epilog.
+
+; CHECK:        .seh_unwindversion 3
+
+define dso_local void @no_epilog() local_unnamed_addr {
+; CHECK-LABEL: no_epilog:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    retq
+entry:
+  ret void
+}
+
+define dso_local void @stack_alloc_no_pushes() local_unnamed_addr {
+; CHECK-LABEL: stack_alloc_no_pushes:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void @a()
+  ret void
+}
+
+define dso_local i32 @stack_alloc_and_pushes(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: stack_alloc_and_pushes:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    pushq %rsi
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    pushq %rdi
+; CHECK-NEXT:    .seh_pushreg %rbx
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    subq $32, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    movl %ecx, %esi
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    movl %eax, %edi
+; CHECK-NEXT:    movl %esi, %ecx
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    movl %eax, %ebx
+; CHECK-NEXT:    addl %edi, %ebx
+; CHECK-NEXT:    movl %esi, %ecx
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    movl %eax, %ecx
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    addl %ebx, %eax
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    addq $32, %rsp
+; CHECK-NEXT:    .seh_pushreg %rbx
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    popq %rdi
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    popq %rsi
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %call = tail call i32 @c(i32 %x)
+  %call1 = tail call i32 @c(i32 %x)
+  %add = add nsw i32 %call1, %call
+  %call2 = tail call i32 @c(i32 %x)
+  %call3 = tail call i32 @c(i32 %call2)
+  %add4 = add nsw i32 %add, %call3
+  ret i32 %add4
+}
+
+define dso_local i32 @multiple_epilogs(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: multiple_epilogs:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    testl %eax, %eax
+; CHECK-NEXT:    jle .LBB3_2
+; CHECK-NEXT:  # %bb.1: # %if.then
+; CHECK-NEXT:    movl %eax, %ecx
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    jmp c # TAILCALL
+; CHECK-NEXT:  .LBB3_2: # %if.else
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    jmp b # TAILCALL
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %call = tail call i32 @c(i32 noundef %x)
+  %cmp = icmp sgt i32 %call, 0
+  br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+  %call1 = tail call i32 @c(i32 noundef %call)
+  ret i32 %call1
+
+if.else:
+  %call2 = tail call i32 @b()
+  ret i32 %call2
+}
+
+define dso_local i32 @tail_call(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: tail_call:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    movl %eax, %ecx
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    jmp c # TAILCALL
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %call = tail call i32 @c(i32 %x)
+  %call1 = tail call i32 @c(i32 %call)
+  ret i32 %call1
+}
+
+define dso_local void @dynamic_stack_alloc(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: dynamic_stack_alloc:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_setframe %rbp, 0
+; CHECK-NEXT:    movq %rsp, %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    movl %ecx, %eax
+; CHECK-NEXT:    leaq 15(,%rax,4), %rax
+; CHECK-NEXT:    andq $-16, %rax
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_setframe %rbp, 0
+; CHECK-NEXT:    movq %rbp, %rsp
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %y = alloca i32, i32 %x
+  ret void
+}
+
+define dso_local void @large_aligned_alloc() align 16 {
+; CHECK-LABEL: large_aligned_alloc:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_stackalloc 176
+; CHECK-NEXT:    subq $176, %rsp
+; CHECK-NEXT:    .seh_setframe %rbp, 128
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    andq $-64, %rsp
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_setframe %rbp, 128
+; CHECK-NEXT:    .seh_stackalloc 176
+; CHECK-NEXT:    leaq 48(%rbp), %rsp
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+  %1 = alloca [128 x i8], align 64
+  ret void
+}
+
+attributes #1 = { noreturn }
+define dso_local void @no_return_func() local_unnamed_addr #1 {
+; CHECK-LABEL: no_return_func:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 40
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    callq d
+; CHECK-NEXT:    int3
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void @d()
+  unreachable
+}
+
+define dso_local i32 @has_funclet(i32 %x) local_unnamed_addr personality ptr @__C_specific_handler {
+; CHECK-LABEL: has_funclet:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    pushq %rsi
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    pushq %rdi
+; CHECK-NEXT:    .seh_stackalloc 48
+; CHECK-NEXT:    subq $48, %rsp
+; CHECK-NEXT:    .seh_setframe %rbp, 48
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:  .Ltmp0: # EH_LABEL
+; CHECK-NEXT:    movl %ecx, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    nop
+; CHECK-NEXT:  .Ltmp1: # EH_LABEL
+; CHECK-NEXT:  # %bb.1: # %call.block.1
+; CHECK-NEXT:  .Ltmp2: # EH_LABEL
+; CHECK-NEXT:    movl %eax, %esi
+; CHECK-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    nop
+; CHECK-NEXT:  .Ltmp3: # EH_LABEL
+; CHECK-NEXT:  # %bb.2: # %call.block.2
+; CHECK-NEXT:  .Ltmp4: # EH_LABEL
+; CHECK-NEXT:    movl %eax, %edi
+; CHECK-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    nop
+; CHECK-NEXT:  .Ltmp5: # EH_LABEL
+; CHECK-NEXT:  # %bb.3: # %call.block.3
+; CHECK-NEXT:  .Ltmp6: # EH_LABEL
+; CHECK-NEXT:    movl %eax, %ecx
+; CHECK-NEXT:    callq c
+; CHECK-NEXT:    nop
+; CHECK-NEXT:  .Ltmp7: # EH_LABEL
+; CHECK-NEXT:  # %bb.4: # %call.block.4
+; CHECK-NEXT:    addl %esi, %edi
+; CHECK-NEXT:    addl %eax, %edi
+; CHECK-NEXT:    movl %edi, %eax
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 48
+; CHECK-NEXT:    addq $48, %rsp
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    popq %rdi
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    popq %rsi
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_handlerdata
+; CHECK-NEXT:  .Lhas_funclet$parent_frame_offset = 48
+; CHECK-NEXT:    .long (.Llsda_end0-.Llsda_begin0)/16 # Number of call sites
+; CHECK-NEXT:  .Llsda_begin0:
+; CHECK-NEXT:    .long .Ltmp0@IMGREL # LabelStart
+; CHECK-NEXT:    .long .Ltmp7@IMGREL # LabelEnd
+; CHECK-NEXT:    .long "?dtor$5@?0?has_funclet@4HA"@IMGREL # FinallyFunclet
+; CHECK-NEXT:    .long 0 # Null
+; CHECK-NEXT:  .Llsda_end0:
+; CHECK-NEXT:    .text
+; CHECK-NEXT:    .seh_endproc
+; CHECK-NEXT:    .def "?dtor$5@?0?has_funclet@4HA";
+; CHECK-NEXT:    .scl 3;
+; CHECK-NEXT:    .type 32;
+; CHECK-NEXT:    .endef
+; CHECK-NEXT:    .p2align 4
+; CHECK-NEXT:  "?dtor$5@?0?has_funclet@4HA":
+; CHECK-NEXT:  .seh_proc "?dtor$5@?0?has_funclet@4HA"
+; CHECK-NEXT:  .LBB8_5: # %cleanup
+; CHECK-NEXT:    movq %rdx, {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    pushq %rsi
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    pushq %rdi
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    subq $32, %rsp
+; CHECK-NEXT:    leaq 48(%rdx), %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %ecx # 4-byte Reload
+; CHECK-NEXT:    callq cleanup_helper
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    addq $32, %rsp
+; CHECK-NEXT:    .seh_pushreg %rdi
+; CHECK-NEXT:    popq %rdi
+; CHECK-NEXT:    .seh_pushreg %rsi
+; CHECK-NEXT:    popq %rsi
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq # CLEANUPRET
+; CHECK-NEXT:  .Lfunc_end0:
+; CHECK-NEXT:    .seh_handlerdata
+; CHECK-NEXT:    .text
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %call = invoke i32 @c(i32 %x)
+    to label %call.block.1 unwind label %cleanup
+
+call.block.1:
+  %call1 = invoke i32 @c(i32 %x)
+    to label %call.block.2 unwind label %cleanup
+
+call.block.2:
+  %add = add nsw i32 %call1, %call
+  %call2 = invoke i32 @c(i32 %x)
+    to label %call.block.3 unwind label %cleanup
+
+call.block.3:
+  %call3 = invoke i32 @c(i32 %call2)
+    to label %call.block.4 unwind label %cleanup
+
+call.block.4:
+  %add4 = add nsw i32 %add, %call3
+  ret i32 %add4
+
+cleanup:
+  %cleanup_token = cleanuppad within none []
+  call fastcc void @cleanup_helper(i32 %x) #18 [ "funclet"(token %cleanup_token) ]
+  cleanupret from %cleanup_token unwind to caller
+}
+
+define internal fastcc void @cleanup_helper(i32 %x) local_unnamed_addr {
+; CHECK-LABEL: cleanup_helper:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_setframe %rbp, 0
+; CHECK-NEXT:    movq %rsp, %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    movl %ecx, %eax
+; CHECK-NEXT:    leaq 15(,%rax,4), %rax
+; CHECK-NEXT:    andq $-16, %rax
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_setframe %rbp, 0
+; CHECK-NEXT:    movq %rbp, %rsp
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %y = alloca i32, i32 %x
+  ret void
+}
+
+declare void @a() local_unnamed_addr
+declare i32 @b() local_unnamed_addr
+declare i32 @c(i32) local_unnamed_addr
+declare void @d() local_unnamed_addr #1
+declare dso_local i32 @__C_specific_handler(...)
+declare i64 @llvm.x86.flags.read.u64()
+
+; ---- XMM callee-saved saves: SEH_SaveXMM before movaps in V3 ----
+define dso_local void @xmm_saves() local_unnamed_addr {
+; CHECK-LABEL: xmm_saves:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 72
+; CHECK-NEXT:    subq $72, %rsp
+; CHECK-NEXT:    .seh_savexmm %xmm7, 48
+; CHECK-NEXT:    movaps %xmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    .seh_savexmm %xmm6, 32
+; CHECK-NEXT:    movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_savexmm %xmm6, 32
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
+; CHECK-NEXT:    .seh_savexmm %xmm7, 48
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm7 # 16-byte Reload
+; CHECK-NEXT:    .seh_stackalloc 72
+; CHECK-NEXT:    addq $72, %rsp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void asm sideeffect "", "~{xmm6},~{xmm7}"()
+  call void @a()
+  ret void
+}
+
+define dso_local void @xmm_and_gpr_saves(ptr %p) local_unnamed_addr {
+; CHECK-LABEL: xmm_and_gpr_saves:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbx
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    .seh_stackalloc 48
+; CHECK-NEXT:    subq $48, %rsp
+; CHECK-NEXT:    .seh_savexmm %xmm6, 32
+; CHECK-NEXT:    movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_savexmm %xmm6, 32
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
+; CHECK-NEXT:    .seh_stackalloc 48
+; CHECK-NEXT:    addq $48, %rsp
+; CHECK-NEXT:    .seh_pushreg %rbx
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void asm sideeffect "", "~{xmm6},~{rbx}"()
+  call void @a()
+  ret void
+}
+
+define dso_local void @large_stack_alloc() local_unnamed_addr {
+; CHECK-LABEL: large_stack_alloc:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_stackalloc 8232
+; CHECK-NEXT:    movl $8232, %eax # imm = 0x2028
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_stackalloc 8232
+; CHECK-NEXT:    addq $8232, %rsp # imm = 0x2028
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %buf = alloca [8192 x i8], align 1
+  call void @a()
+  ret void
+}
+
+define dso_local void @large_dynalloc_frame(i32 %n) local_unnamed_addr {
+; CHECK-LABEL: large_dynalloc_frame:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_stackalloc 4096
+; CHECK-NEXT:    movl $4096, %eax # imm = 0x1000
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    .seh_setframe %rbp, 128
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    movl %ecx, %eax
+; CHECK-NEXT:    leaq 15(,%rax,4), %rax
+; CHECK-NEXT:    andq $-16, %rax
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    subq $32, %rsp
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_setframe %rbp, 128
+; CHECK-NEXT:    .seh_stackalloc 4096
+; CHECK-NEXT:    leaq 3968(%rbp), %rsp
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  %buf = alloca [4096 x i8], align 16
+  %dyn = alloca i32, i32 %n
+  call void @a()
+  ret void
+}
+
+define dso_local void @xmm_with_dynalloc(i32 %n) local_unnamed_addr {
+; CHECK-LABEL: xmm_with_dynalloc:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    subq $32, %rsp
+; CHECK-NEXT:    .seh_setframe %rbp, 32
+; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; CHECK-NEXT:    .seh_savexmm %xmm7, 16
+; CHECK-NEXT:    movaps %xmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    .seh_savexmm %xmm6, 0
+; CHECK-NEXT:    movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    .seh_endprologue
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    movl %ecx, %eax
+; CHECK-NEXT:    leaq 15(,%rax,4), %rax
+; CHECK-NEXT:    andq $-16, %rax
+; CHECK-NEXT:    callq __chkstk
+; CHECK-NEXT:    subq %rax, %rsp
+; CHECK-NEXT:    subq $32, %rsp
+; CHECK-NEXT:    callq a
+; CHECK-NEXT:    addq $32, %rsp
+; CHECK-NEXT:    .seh_startepilogue
+; CHECK-NEXT:    .seh_savexmm %xmm6, 0
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
+; CHECK-NEXT:    .seh_savexmm %xmm7, 16
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm7 # 16-byte Reload
+; CHECK-NEXT:    .seh_setframe %rbp, 32
+; CHECK-NEXT:    .seh_stackalloc 32
+; CHECK-NEXT:    movq %rbp, %rsp
+; CHECK-NEXT:    .seh_pushreg %rbp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .seh_endepilogue
+; CHECK-NEXT:    retq
+; CHECK-NEXT:    .seh_endproc
+entry:
+  call void asm sideeffect "", "~{xmm6},~{xmm7}"()
+  %dyn = alloca i32, i32 %n
+  call void @a()
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"winx64-eh-unwind", i32 3}

diff  --git a/llvm/test/DebugInfo/COFF/apx-egpr.ll b/llvm/test/DebugInfo/COFF/apx-egpr.ll
index 119ad8d38edba..4f8bc288b576c 100644
--- a/llvm/test/DebugInfo/COFF/apx-egpr.ll
+++ b/llvm/test/DebugInfo/COFF/apx-egpr.ll
@@ -14,7 +14,7 @@
 ; OBJ-NEXT: }
 
 ; This test is to check CodeView register IDs for APX EGPR.
-define i32 @test() !dbg !4 {
+define i32 @test() nounwind !dbg !4 {
 entry:
   %0 = call i32 asm sideeffect "nop", "={r16},~{dirflag},~{fpsr},~{flags}"(), !dbg !7
   #dbg_value(i32 %0, !8, !DIExpression(), !10)

diff  --git a/llvm/test/MC/AsmParser/seh-directive-errors.s b/llvm/test/MC/AsmParser/seh-directive-errors.s
index a5fc2be018801..70037fb498276 100644
--- a/llvm/test/MC/AsmParser/seh-directive-errors.s
+++ b/llvm/test/MC/AsmParser/seh-directive-errors.s
@@ -142,7 +142,7 @@ i:
 
 j:
     .seh_proc j
-    .seh_unwindversion 1
+    .seh_unwindversion 4
 # CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: Unsupported version specified in .seh_unwindversion in j
     .seh_unwindversion 2
     .seh_unwindversion 2
@@ -177,3 +177,25 @@ k:
     .seh_endepilogue
     ret
     .seh_endproc
+
+# --- Test: .seh_pushreg inside epilog errors for V2 ---
+l:
+    .seh_proc l
+    .seh_unwindversion 2
+    pushq   %rbx
+    .seh_pushreg %rbx
+    subq    $32, %rsp
+    .seh_stackalloc 32
+    .seh_endprologue
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+# CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: .seh_stackalloc inside epilog requires unwind v3
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+# CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: .seh_pushreg inside epilog requires unwind v3
+    popq    %rbx
+    .seh_unwindv2start
+    .seh_endepilogue
+    retq
+    .seh_endproc

diff  --git a/llvm/test/MC/COFF/seh-unwindv3-error.s b/llvm/test/MC/COFF/seh-unwindv3-error.s
new file mode 100644
index 0000000000000..0cc0e13b172bc
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3-error.s
@@ -0,0 +1,72 @@
+// RUN: not llvm-mc -triple x86_64-pc-win32 -filetype=obj %s -o /dev/null 2>&1 | FileCheck %s
+
+// Test: invalid unwind version (not 2 or 3) inside a function.
+.text
+bad_version:
+    .seh_proc bad_version
+    .seh_unwindversion 4
+// CHECK: error: Unsupported version specified in .seh_unwindversion
+    .seh_endprologue
+    retq
+    .seh_endproc
+
+// Test: .seh_push2regs in a V1 function (no .seh_unwindversion) produces error.
+push2_in_v1:
+    .seh_proc push2_in_v1
+    .seh_push2regs %r12, %r13
+// CHECK: error: .seh_push2regs is only supported for unwind v3
+    .seh_endprologue
+    retq
+    .seh_endproc
+
+// Test: .seh_push2regs missing comma between registers.
+push2_missing_comma:
+    .seh_proc push2_missing_comma
+    .seh_unwindversion 3
+    .seh_push2regs %r12 %r13
+// CHECK: error: expected comma between registers
+    .seh_endprologue
+    retq
+    .seh_endproc
+
+// Test: .seh_push2regs missing second register.
+push2_missing_reg2:
+    .seh_proc push2_missing_reg2
+    .seh_unwindversion 3
+    .seh_push2regs %r12,
+// CHECK: error: invalid register name
+    .seh_endprologue
+    retq
+    .seh_endproc
+
+// Test: .seh_push2regs with trailing junk.
+push2_trailing_junk:
+    .seh_proc push2_trailing_junk
+    .seh_unwindversion 3
+    .seh_push2regs %r12, %r13 extra
+// CHECK: error: expected end of directive
+    .seh_endprologue
+    retq
+    .seh_endproc
+
+// Test: UOP_Push2 recorded under V3 then frame downgraded to V2 — the
+// directive-level check passes (since the frame is already V3), so the
+// error must come from the unwind-info emitter as a recoverable diagnostic
+// rather than a fatal crash.
+.seh_unwindversion 3
+push2_downgrade_v2:
+    .seh_proc push2_downgrade_v2
+    .seh_push2regs %r12, %r13
+    push2   %r13, %r12
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    // Downgrade this frame to V2; UOP_Push2 cannot be encoded for V1/V2.
+    .seh_unwindversion 2
+    nop
+    .seh_startepilogue
+    .seh_unwindv2start
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK: error: UOP_Push2 (PUSH2 with two registers) requires V3 unwind info

diff  --git a/llvm/test/MC/COFF/seh-unwindv3-inheritance.s b/llvm/test/MC/COFF/seh-unwindv3-inheritance.s
new file mode 100644
index 0000000000000..4c768e275f3ce
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3-inheritance.s
@@ -0,0 +1,137 @@
+// RUN: llvm-mc -triple x86_64-pc-win32 -filetype=obj %s | llvm-readobj -u - | FileCheck %s
+
+// Tests for epilog descriptor inheritance in V3 unwind info.
+
+// CHECK:       UnwindInformation [
+
+.text
+
+// --- Test 1: two identical mirror epilogs -> second inherits ---
+// Both epilogs have the same WODs, same FirstOp, same IP offsets.
+same_ops_inherit:
+    .seh_proc same_ops_inherit
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_INHERIT_ELSE
+    // Epilog 0: mirror
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+.L_INHERIT_ELSE:
+    // Epilog 1: identical mirror
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: same_ops_inherit
+// CHECK:        NumberOfOps: 2
+// CHECK:        NumberOfEpilogs: 2
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 2
+// CHECK:          FirstOp: 0x0
+// CHECK:          ALLOC_SMALL Size=0x20
+// CHECK:          PUSH Reg=RBX
+// CHECK:        }
+// Epilog 1 inherits (same FirstOp, same NumberOfOps, same IP offsets)
+// CHECK:        Epilog [1] {
+// CHECK:          NumberOfOps: 0
+// CHECK:          (inherits
+
+// --- Test 2: two epilogs, different NumberOfOps -> no inheritance ---
+// Epilog 0: mirror (2 ops), Epilog 1: partial (1 op)
+different_numops:
+    .seh_proc different_numops
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_DIFFOPS_ELSE
+    // Epilog 0: full mirror
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+.L_DIFFOPS_ELSE:
+    // Epilog 1: partial - only dealloc, no pop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: different_numops
+// CHECK:        NumberOfOps: 2
+// CHECK:        NumberOfEpilogs: 2
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 2
+// CHECK:          FirstOp: 0x0
+// CHECK:        }
+// Epilog 1: different NumberOfOps - gets its own full descriptor, not inherited
+// CHECK:        Epilog [1] {
+// CHECK:          NumberOfOps: 1
+// CHECK:          FirstOp: 0x0
+// CHECK:          ALLOC_SMALL Size=0x20
+
+// --- Test 3: two epilogs, same NumberOfOps but different WODs -> no inheritance ---
+// Epilog 0: pop rdi, Epilog 1: pop rbx (different register in single-op epilog)
+different_wods:
+    .seh_proc different_wods
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_DIFFWOD_ELSE
+    // Epilog 0: pop rdi only
+    .seh_startepilogue
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    jmp     c
+.L_DIFFWOD_ELSE:
+    // Epilog 1: pop rbx only (different register)
+    .seh_startepilogue
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: different_wods
+// CHECK:        NumberOfOps: 3
+// CHECK:        NumberOfEpilogs: 2
+// Epilog 0: PUSH RDI, found in prolog pool
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 1
+// CHECK:          PUSH Reg=RDI
+// CHECK:        }
+// Epilog 1: PUSH RBX - different FirstOp, so NOT inherited
+// CHECK:        Epilog [1] {
+// CHECK:          NumberOfOps: 1
+// CHECK:          PUSH Reg=RBX
+// CHECK:       ]
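
Reviewer's aside, not part of the patch: the three inheritance tests above share one rule, stated in the Test 1 comment (same FirstOp, same NumberOfOps, same IP offsets). The Python sketch below models only that comparison. The `Epilog` type and `inherits` helper are hypothetical names, the choice to compare against the immediately preceding epilog is an assumption, and the FirstOp values used for the different-WODs case are illustrative because the test does not check them.

```python
from typing import NamedTuple, Tuple


class Epilog(NamedTuple):
    first_op: int                # index of the first op in the shared pool
    num_ops: int                 # number of unwind ops in this epilog
    ip_offsets: Tuple[int, ...]  # start offset of each op's instruction


def inherits(prev: Epilog, cur: Epilog) -> bool:
    """True when cur can reuse prev's descriptor (assumed comparison rule)."""
    return (
        cur.first_op == prev.first_op
        and cur.num_ops == prev.num_ops
        and cur.ip_offsets == prev.ip_offsets
    )


# same_ops_inherit: addq at +0 and popq at +4 in both epilogs.
mirror = Epilog(first_op=0, num_ops=2, ip_offsets=(0, 4))
same = inherits(mirror, Epilog(first_op=0, num_ops=2, ip_offsets=(0, 4)))

# different_numops: the second epilog only deallocates.
fewer_ops = inherits(mirror, Epilog(first_op=0, num_ops=1, ip_offsets=(0,)))

# different_wods: one op each, but located at different pool positions
# (the values 1 and 2 are illustrative, not taken from the test output).
other_wod = inherits(Epilog(1, 1, (0,)), Epilog(2, 1, (0,)))

print(same, fewer_ops, other_wod)
```

Only the first pair compares equal on all three fields, which lines up with the test expecting `NumberOfOps: 0` and an `(inherits` marker for that case alone.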

diff  --git a/llvm/test/MC/COFF/seh-unwindv3-large.s b/llvm/test/MC/COFF/seh-unwindv3-large.s
new file mode 100644
index 0000000000000..c0f35b0ba0e5f
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3-large.s
@@ -0,0 +1,197 @@
+// RUN: llvm-mc -triple x86_64-pc-win32 -filetype=obj %s | llvm-readobj -u - | FileCheck %s
+
+// Tests for V3 UNW_FLAG_LARGE emission when prolog exceeds 255 bytes.
+// This exercises the LARGE prolog header (5-byte) and 16-bit IP offsets.
+
+// CHECK:       UnwindInformation [
+
+.text
+
+// --- Test 1: prolog with IP offset > 255 (evaluatable case) ---
+// Uses a large .space to push the second prolog instruction past offset 255.
+// This should trigger UNW_FLAG_LARGE with known values.
+large_prolog_known:
+    .seh_proc large_prolog_known
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    // Pad with 260 bytes of NOPs to push the next IP offset past 255.
+    .rept 260
+    nop
+    .endr
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: large_prolog_known
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x8)
+// CHECK-NEXT:       Large (0x8)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0x109
+// CHECK-NEXT:     PayloadWords: 8
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK-NEXT:       [0] IP +0x0105: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0000: PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            EpilogOffset: +0x10A
+// CHECK-NEXT:       NumberOfOps: 2
+// CHECK-NEXT:       FirstOp: 0x0
+// CHECK-NEXT:       IpOffsetOfLastInstruction: 0x5
+// CHECK-NEXT:       [0] IP +0x0000: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0004: PUSH Reg=RBX
+// Uses .p2align inside the prolog to create a relaxable fragment that makes
+// GetOptionalAbsDifference return nullopt, triggering the conservative
+// NeedsLargeProlog=true path with fixup-based emission.
+large_prolog_unevaluatable:
+    .seh_proc large_prolog_unevaluatable
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    // Alignment directive creates a relaxable fragment - the distance from
+    // the function start to subsequent instructions becomes unevaluatable.
+    .p2align 8, 0x90
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// When the expression is unevaluatable, we conservatively set LARGE.
+// The output should still be valid V3 unwind info with the Large flag.
+// The .p2align 8 aligns to 256; with the function starting at offset 0x110
+// in the section, the pushq is at func+0, and the .p2align pads to the
+// next 256-byte boundary at func+0xF0, so the subq is at func+0xF0.
+// CHECK-LABEL:  StartAddress: large_prolog_unevaluatable
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x8)
+// CHECK-NEXT:       Large (0x8)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0xF4
+// CHECK-NEXT:     PayloadWords: 8
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK-NEXT:       [0] IP +0x00F0: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0000: PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            EpilogOffset: +0xF5
+// CHECK-NEXT:       NumberOfOps: 2
+// CHECK-NEXT:       FirstOp: 0x0
+// CHECK-NEXT:       IpOffsetOfLastInstruction: 0x5
+// CHECK-NEXT:       [0] IP +0x0000: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0004: PUSH Reg=RBX
+
+// --- Test 3: epilog with IP offset > 255 (evaluatable case) ---
+// The epilog has 260 NOPs between the two unwind operations, pushing the
+// second epilog IP offset and IpOffsetOfLastInstruction past 255.
+// This should trigger EPILOG_INFO_LARGE (bit 1 in epilog flags) with 16-bit
+// epilog IP offsets and IpOffsetOfLastInstruction.
+// The prolog is small, so UNW_FLAG_LARGE should NOT be set.
+large_epilog_known:
+    .seh_proc large_epilog_known
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    // Pad with 260 NOPs inside the epilog to push the popq past offset 255.
+    .rept 260
+    nop
+    .endr
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: large_epilog_known
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0x5
+// CHECK-NEXT:     PayloadWords: 8
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK-NEXT:       [0] IP +0x0001: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0000: PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK-NEXT:       Flags [ (0x2)
+// CHECK-NEXT:         Large (0x2)
+// CHECK-NEXT:       ]
+// CHECK-NEXT:       EpilogOffset: +0x6
+// CHECK-NEXT:       NumberOfOps: 2
+// CHECK-NEXT:       FirstOp: 0x0
+// CHECK-NEXT:       IpOffsetOfLastInstruction: 0x109
+// CHECK-NEXT:       [0] IP +0x0000: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0108: PUSH Reg=RBX
+
+// --- Test 4: epilog with alignment directive (unevaluatable case) ---
+// Uses .p2align inside the epilog to create a relaxable fragment, making
+// the epilog IP offsets unevaluatable and triggering EPILOG_INFO_LARGE
+// conservatively.
+large_epilog_unevaluatable:
+    .seh_proc large_epilog_unevaluatable
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    // Alignment directive inside epilog makes offsets unevaluatable.
+    .p2align 8, 0x90
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: large_epilog_unevaluatable
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0x5
+// CHECK-NEXT:     PayloadWords: 8
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK-NEXT:       [0] IP +0x0001: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x0000: PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK-NEXT:       Flags [ (0x2)
+// CHECK-NEXT:         Large (0x2)
+// CHECK-NEXT:       ]
+// CHECK-NEXT:       EpilogOffset: +0x6
+// CHECK-NEXT:       NumberOfOps: 2
+// CHECK-NEXT:       FirstOp: 0x0
+// CHECK-NEXT:       IpOffsetOfLastInstruction: 0xE0
+// CHECK-NEXT:       [0] IP +0x0000: ALLOC_SMALL Size=0x20
+// CHECK-NEXT:       [1] IP +0x00DF: PUSH Reg=RBX
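Tests 3 and 4 above both expect the Large epilog flag (0x2): once because an IP offset exceeds 255, and once because the offsets cannot be evaluated when the format is chosen. The decision can be sketched as below. This is only an illustration of the rule described in the test comments, not the MC implementation; the function name and the use of `None` for an unevaluatable offset are assumptions.

```python
from typing import Optional, Sequence

SMALL_OFFSET_MAX = 255  # largest value an 8-bit IP offset can hold


def needs_large_epilog(ip_offsets: Sequence[Optional[int]]) -> bool:
    """Return True if any offset is unknown (None) or does not fit in 8 bits."""
    return any(off is None or off > SMALL_OFFSET_MAX for off in ip_offsets)


# Test 3: op offsets 0x0 and 0x108, last instruction at 0x109.
print(needs_large_epilog([0x0, 0x108, 0x109]))  # True
# Test 4: offsets after .p2align are treated as unknown (hypothetical encoding).
print(needs_large_epilog([0x0, None, None]))    # True
# A short epilog with small, known offsets keeps the compact form.
print(needs_large_epilog([0x0, 0x4, 0x5]))      # False
```

Treating an unknown offset the same as an oversized one matches the "conservatively" wording in the Test 4 comment.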

diff --git a/llvm/test/MC/COFF/seh-unwindv3-nonmirror.s b/llvm/test/MC/COFF/seh-unwindv3-nonmirror.s
new file mode 100644
index 0000000000000..9c9d59e7e4e6d
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3-nonmirror.s
@@ -0,0 +1,327 @@
+// RUN: llvm-mc -triple x86_64-pc-win32 -filetype=obj %s | llvm-readobj -u - | FileCheck %s
+
+// Tests for non-mirror epilog support in V3 unwind info emission.
+// These test cases exercise epilogs that differ from the prolog in
+// operation count, operation type, or both.
+
+// CHECK:       UnwindInformation [
+
+.text
+
+// --- Test 1: partial restore (epilog has fewer ops than prolog) ---
+// Prolog: push rbx, push rdi, sub rsp, 32
+// Epilog: add rsp, 32, pop rdi (no pop rbx — tail call path)
+partial_restore:
+    .seh_proc partial_restore
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: partial_restore
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 3
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [3 ops]:
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RBX
+// Epilog has 2 ops — partial restore (suffix of prolog).
+// FirstOp should point into the prolog's WODs (offset 0 = ALLOC_SMALL).
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 2
+// CHECK:            FirstOp: 0x0
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+
+// --- Test 2: different operation type in epilog ---
+// Prolog: push rax (padding), sub rsp, 32
+// Epilog: add rsp, 32, add rsp, 8 (stackalloc instead of pop for padding)
+different_op_type:
+    .seh_proc different_op_type
+    .seh_unwindversion 3
+    .seh_pushreg %rax
+    pushq   %rax
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_stackalloc 8
+    addq    $8, %rsp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: different_op_type
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RAX
+// Epilog has alloc+alloc instead of alloc+push — FirstOp should be past
+// the prolog WODs (appended to pool).
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 2
+// CHECK-NOT:       FirstOp: 0x0
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            ALLOC_SMALL Size=0x8
+
+// --- Test 3: mixed mirror + non-mirror epilogs ---
+// Prolog: push rbx, push rdi, sub rsp, 32
+// Epilog 0: mirror (add rsp, 32; pop rdi; pop rbx) -> FirstOp=0
+// Epilog 1: partial (add rsp, 32; pop rdi) -> FirstOp=0 (suffix)
+mixed_epilogs:
+    .seh_proc mixed_epilogs
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_MIXED_ELSE
+    // Mirror epilog (full restore)
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+.L_MIXED_ELSE:
+    // Partial epilog (no pop rbx — tail call)
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: mixed_epilogs
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 3
+// CHECK-NEXT:     NumberOfEpilogs: 2
+// CHECK:          Prolog [3 ops]:
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RBX
+// Mirror epilog (FirstOp=0, NumberOfOps=3)
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 3
+// CHECK:            FirstOp: 0x0
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RBX
+// Partial epilog — different NumberOfOps, so NOT inherited.
+// CHECK:          Epilog [1] {
+// CHECK:            NumberOfOps: 2
+// CHECK:            FirstOp: 0x0
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+
+// --- Test 4: reordered epilog ---
+// Prolog: push rbx, push rdi, sub rsp, 32
+// Epilog: add rsp, 32, pop rbx, pop rdi (swapped pop order)
+// The epilog WODs are different from the prolog's because the registers
+// are in a different order.
+reordered_epilog:
+    .seh_proc reordered_epilog
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: reordered_epilog
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 3
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [3 ops]:
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RBX
+// Epilog has same ops but different register order — distinct WODs.
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 3
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RBX
+// CHECK:            PUSH Reg=RDI
+
+// --- Test 5: setframe omitted in epilog ---
+// Prolog: push rbp, push rsi, push rdi, sub rsp, 48, setframe rbp, 48
+// Epilog: add rsp, 48, pop rdi, pop rsi, pop rbp (no setframe)
+// This is the pattern that funclets produce.
+setframe_omitted:
+    .seh_proc setframe_omitted
+    .seh_unwindversion 3
+    .seh_pushreg %rbp
+    pushq   %rbp
+    .seh_pushreg %rsi
+    pushq   %rsi
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 48
+    subq    $48, %rsp
+    .seh_setframe %rbp, 48
+    leaq    48(%rsp), %rbp
+    .seh_endprologue
+    callq   a
+    nop
+    // Epilog omits setframe — ADD RSP subsumes it
+    .seh_startepilogue
+    .seh_stackalloc 48
+    addq    $48, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_pushreg %rsi
+    popq    %rsi
+    .seh_pushreg %rbp
+    popq    %rbp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: setframe_omitted
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 5
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [5 ops]:
+// CHECK:            SET_FPREG Reg=RBP, Offset=0x30
+// CHECK:            ALLOC_SMALL Size=0x30
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RSI
+// CHECK:            PUSH Reg=RBP
+// Epilog has 4 ops (no setframe), FirstOp should be 2 (suffix past setframe WOD).
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 4
+// CHECK:            FirstOp: 0x2
+// CHECK:            ALLOC_SMALL Size=0x30
+// CHECK:            PUSH Reg=RDI
+// CHECK:            PUSH Reg=RSI
+// CHECK:            PUSH Reg=RBP
+
+// --- Test 6: chained child with its own prolog instructions ---
+// Main fragment: push rbx, sub rsp, 32
+// Chained child has its own prolog (sub rsp, 16) and a mirror epilog.
+// The child's NumberOfOps should be >0 and its pool should contain
+// the child's own prolog WODs (not the parent's).
+chained_with_prolog:
+    .seh_proc chained_with_prolog
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    // Chained child with its own prolog
+    .seh_splitchained
+    .seh_stackalloc 16
+    subq    $16, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 16
+    addq    $16, %rsp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// Main fragment
+// CHECK-LABEL:  StartAddress: chained_with_prolog
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK:          NumberOfOps: 2
+// CHECK:          NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 2
+// CHECK:            FirstOp: 0x0
+// Chained child — has its own prolog
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x4)
+// CHECK-NEXT:       ChainInfo (0x4)
+// CHECK-NEXT:     ]
+// CHECK:          NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [1 ops]:
+// CHECK:            ALLOC_SMALL Size=0x10
+// CHECK:          Epilog [0] {
+// CHECK:            NumberOfOps: 1
+// CHECK:            FirstOp: 0x0
+// CHECK:            ALLOC_SMALL Size=0x10
+// CHECK:       ]
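The FirstOp expectations in the tests above (0x0 when the epilog's WODs already appear in the pool, a later offset when they have to be appended) suggest a search-then-append placement. A minimal sketch of that idea follows. It assumes a plain byte-substring search, which may not be how the MC emitter matches WODs, and the one-byte encodings are placeholders rather than real v3 encodings.

```python
def place_wods(pool: bytearray, wods: bytes) -> int:
    """Return the pool offset for `wods`, appending them if no match exists."""
    offset = pool.find(wods)
    if offset != -1:
        return offset          # reuse bytes already in the pool
    offset = len(pool)
    pool.extend(wods)          # no match: append and point at the new bytes
    return offset


# Hypothetical one-byte encodings, for illustration only.
ALLOC_32, PUSH_RDI, PUSH_RBX, ALLOC_8 = b"\x01", b"\x02", b"\x03", b"\x04"

# Pool seeded with the prolog WODs in the order llvm-readobj prints them.
pool = bytearray(ALLOC_32 + PUSH_RDI + PUSH_RBX)

print(place_wods(pool, ALLOC_32 + PUSH_RDI))  # 0 (partial restore reuses the pool)
print(place_wods(pool, ALLOC_32 + ALLOC_8))   # 3 (different op type is appended)
print(len(pool))                              # 5
```

A raw substring search could match in the middle of a multi-byte WOD, so a real implementation would presumably compare on WOD boundaries.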

diff --git a/llvm/test/MC/COFF/seh-unwindv3-pool-sharing.s b/llvm/test/MC/COFF/seh-unwindv3-pool-sharing.s
new file mode 100644
index 0000000000000..8273843b15607
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3-pool-sharing.s
@@ -0,0 +1,153 @@
+// RUN: llvm-mc -triple x86_64-pc-win32 -filetype=obj %s | llvm-readobj -u - | FileCheck %s
+
+// Tests for WOD pool sharing / deduplication in V3 unwind info.
+
+// CHECK:       UnwindInformation [
+
+.text
+
+// --- Test 1: suffix sharing ---
+// Prolog: push rbx, push rdi, sub rsp, 32 (3 WODs)
+// Epilog: add rsp, 32, pop rdi (2 WODs — suffix of prolog)
+// Pool should contain only the prolog's 3 WODs; epilog reuses bytes 0..1.
+suffix_sharing:
+    .seh_proc suffix_sharing
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: suffix_sharing
+// Payload = 3 (prolog IPs) + 8 (epilog desc: 6+2) + 3 (WOD pool) = 14 bytes, 7 codes
+// CHECK:        PayloadWords: 7
+// CHECK:        NumberOfOps: 3
+// CHECK:        NumberOfEpilogs: 1
+// Prolog WODs: ALLOC_SMALL(32), PUSH(RDI), PUSH(RBX) — 3 bytes
+// Epilog uses FirstOp=0 (ALLOC_SMALL + PUSH RDI is a prefix of the pool)
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 2
+// CHECK:          FirstOp: 0x0
+
+// --- Test 2: two epilogs sharing pool ---
+// Prolog: push rbx, sub rsp, 32
+// Epilog 0: add rsp, 32 (1 WOD, distinct from prolog — no push)
+// Epilog 1: add rsp, 32 (1 WOD, same as epilog 0)
+// Epilog 1 should share epilog 0's FirstOp (not duplicate in pool).
+cross_epilog_sharing:
+    .seh_proc cross_epilog_sharing
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_CROSS_ELSE
+    // Epilog 0: only dealloc (no pop rbx — tail call)
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_endepilogue
+    jmp     c
+.L_CROSS_ELSE:
+    // Epilog 1: same as epilog 0
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: cross_epilog_sharing
+// Payload = 2 (prolog IPs) + 10 (epilog descs: 7+3) + 2 (WOD pool) = 14 bytes, 7 codes
+// CHECK:        PayloadWords: 7
+// CHECK:        NumberOfOps: 2
+// CHECK:        NumberOfEpilogs: 2
+// Epilog 0: ALLOC_SMALL(32) — found in prolog pool at offset 0
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 1
+// CHECK:          FirstOp: 0x0
+// CHECK:          ALLOC_SMALL Size=0x20
+// Epilog 1: same WODs and same IP offsets — inherited from epilog 0
+// CHECK:        Epilog [1] {
+// CHECK:          NumberOfOps: 0
+// CHECK:          (inherits
+
+// --- Test 3: no-match append ---
+// Prolog: push rbx, sub rsp, 32
+// Epilog: add rsp, 64 (different alloc size — no pool match)
+// Epilog WOD must be appended to the pool.
+no_match_append:
+    .seh_proc no_match_append
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 64
+    addq    $64, %rsp
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: no_match_append
+// Payload = 2 (prolog IPs) + 7 (epilog desc: 6+1) + 3 (WOD pool: 2+1) = 12 bytes, 6 codes
+// CHECK:        PayloadWords: 6
+// CHECK:        NumberOfOps: 2
+// CHECK:        NumberOfEpilogs: 1
+// Prolog pool: ALLOC_SMALL(32)=1 byte, PUSH(RBX)=1 byte => 2 bytes
+// Epilog WOD: ALLOC_SMALL(64)=1 byte at pool offset 2 (appended)
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 1
+// CHECK:          FirstOp: 0x2
+// CHECK:          ALLOC_SMALL Size=0x40
+
+// --- Test 4: epilog mirrors prolog exactly ---
+// Prolog: push rdi, sub rsp, 32
+// Epilog: add rsp, 32, pop rdi (mirror)
+// FirstOp should be 0 (exact match in pool).
+exact_mirror:
+    .seh_proc exact_mirror
+    .seh_unwindversion 3
+    .seh_pushreg %rdi
+    pushq   %rdi
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rdi
+    popq    %rdi
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: exact_mirror
+// Payload = 2 (prolog IPs) + 8 (epilog desc: 6+2) + 2 (WOD pool) = 12 bytes, 6 codes
+// CHECK:        PayloadWords: 6
+// CHECK:        NumberOfOps: 2
+// CHECK:        NumberOfEpilogs: 1
+// CHECK:        Epilog [0] {
+// CHECK:          NumberOfOps: 2
+// CHECK:          FirstOp: 0x0
+// CHECK:          ALLOC_SMALL Size=0x20
+// CHECK:          PUSH Reg=RDI
+// CHECK:       ]
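The payload arithmetic in the comments of seh-unwindv3-pool-sharing.s can be reproduced with a short sketch. The byte costs used here are read off those comments and are assumptions, not the v3 specification: one byte per prolog IP offset, 6 bytes plus one per op for a full small-format epilog descriptor, and 3 bytes for an inherited one. Rounding up to whole 16-bit words and the function name are likewise hypothetical.

```python
def payload_size(prolog_ops, full_epilog_ops, inherited_epilogs, pool_bytes):
    """Return (payload_bytes, payload_words) under the assumed byte costs."""
    prolog_ip_bytes = prolog_ops                           # assumed: 1 byte per op
    full_desc_bytes = sum(6 + n for n in full_epilog_ops)  # assumed: 6 + ops each
    inherited_bytes = 3 * inherited_epilogs                # assumed: 3 bytes each
    total = prolog_ip_bytes + full_desc_bytes + inherited_bytes + pool_bytes
    return total, (total + 1) // 2                         # assumed: round up to words


print(payload_size(3, [2], 0, 3))  # suffix_sharing       -> (14, 7)
print(payload_size(2, [1], 1, 2))  # cross_epilog_sharing -> (14, 7)
print(payload_size(2, [1], 0, 3))  # no_match_append      -> (12, 6)
print(payload_size(2, [2], 0, 2))  # exact_mirror         -> (12, 6)
```

The four results match the PayloadWords values checked in Tests 1 to 4 of that file. Large-format epilogs are not modelled.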

diff --git a/llvm/test/MC/COFF/seh-unwindv3.s b/llvm/test/MC/COFF/seh-unwindv3.s
new file mode 100644
index 0000000000000..a2d7249dec021
--- /dev/null
+++ b/llvm/test/MC/COFF/seh-unwindv3.s
@@ -0,0 +1,758 @@
+// RUN: llvm-mc -triple x86_64-pc-win32 -mattr=+push2pop2,+egpr -filetype=obj %s | llvm-readobj -u - | FileCheck %s
+
+// CHECK:       UnwindInformation [
+
+.text
+
+// --- Test 0: file-scope .seh_unwindversion 3 applies to subsequent functions ---
+// This sets the default so tests 14+ don't need per-function .seh_unwindversion.
+// Tests 1-13 use per-function .seh_unwindversion 3 for explicit testing.
+
+// --- Test 1: simple stack alloc, single epilog at end ---
+simple_alloc:
+    .seh_proc simple_alloc
+    .seh_unwindversion 3
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: simple_alloc
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [1 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x28
+// CHECK:          Epilog [0] {
+// CHECK:            IpOffsetOfLastInstruction:
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x28
+
+// --- Test 2: push + alloc, single epilog ---
+push_and_alloc:
+    .seh_proc push_and_alloc
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: push_and_alloc
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          ALLOC_SMALL Size=0x20
+// CHECK:          PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RBX
+
+// --- Test 3: multiple pushes + alloc + frame register ---
+frame_register:
+    .seh_proc frame_register
+    .seh_unwindversion 3
+    .seh_pushreg %rbp
+    pushq   %rbp
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_setframe %rbp, 32
+    leaq    32(%rsp), %rbp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_setframe %rbp, 32
+    leaq    -32(%rbp), %rsp
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_pushreg %rbp
+    popq    %rbp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: frame_register
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 4
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [4 ops]:
+// CHECK:          SET_FPREG Reg=RBP, Offset=0x20
+// CHECK:          ALLOC_SMALL Size=0x20
+// CHECK:          PUSH Reg=RBX
+// CHECK:          PUSH Reg=RBP
+// CHECK:          Epilog [0] {
+// CHECK:            SET_FPREG Reg=RBP, Offset=0x20
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RBX
+// CHECK:            PUSH Reg=RBP
+
+// --- Test 4: multiple epilogs ---
+multiple_epilogs:
+    .seh_proc multiple_epilogs
+    .seh_unwindversion 3
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_ELSE_1
+    movl    %eax, %ecx
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     c
+.L_ELSE_1:
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: multiple_epilogs
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 2
+// CHECK:          Prolog [1 ops]:
+// CHECK:          ALLOC_SMALL Size=0x28
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x28
+// CHECK:          Epilog [1] {
+// CHECK:            (inherits
+
+// --- Test 5: large alloc ---
+large_alloc:
+    .seh_proc large_alloc
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 4096
+    subq    $4096, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 4096
+    addq    $4096, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: large_alloc
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          ALLOC_LARGE Size=0x1000
+// CHECK:          PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_LARGE Size=0x1000
+// CHECK:            PUSH Reg=RBX
+
+// --- Test 6: handler ---
+with_handler:
+    .seh_proc with_handler
+    .seh_handler __C_specific_handler, @unwind, @except
+    .seh_unwindversion 3
+    .seh_pushreg %rbp
+    pushq   %rbp
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+.with_handler_callsite:
+    callq   a
+    nop
+.with_handler_finish:
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbp
+    popq    %rbp
+    .seh_endepilogue
+    retq
+.with_handler_handler:
+    jmp     .with_handler_finish
+    .seh_handlerdata
+    .long   1
+    .long   .with_handler_callsite@IMGREL
+    .long   .with_handler_finish@IMGREL
+    .long   1
+    .long   .with_handler_handler@IMGREL
+    .text
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: with_handler
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [
+// CHECK:            ExceptionHandler
+// CHECK:          ]
+// CHECK:          NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Handler: __C_specific_handler
+
+// --- Test 7: XMM register save ---
+save_xmm:
+    .seh_proc save_xmm
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 48
+    subq    $48, %rsp
+    .seh_savexmm %xmm6, 32
+    movaps  %xmm6, 32(%rsp)
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_savexmm %xmm6, 32
+    movaps  32(%rsp), %xmm6
+    .seh_stackalloc 48
+    addq    $48, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: save_xmm
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 3
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [3 ops]:
+// CHECK:          SAVE_XMM128 Reg=XMM6, Disp=0x20
+// CHECK:          ALLOC_SMALL Size=0x30
+// CHECK:          PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            SAVE_XMM128 Reg=XMM6, Disp=0x20
+// CHECK:            ALLOC_SMALL Size=0x30
+// CHECK:            PUSH Reg=RBX
+
+// --- Test 8: non-volatile register save (mov to stack) ---
+save_nonvol:
+    .seh_proc save_nonvol
+    .seh_unwindversion 3
+    .seh_stackalloc 48
+    subq    $48, %rsp
+    .seh_savereg %rbx, 40
+    movq    %rbx, 40(%rsp)
+    .seh_savereg %rsi, 32
+    movq    %rsi, 32(%rsp)
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_savereg %rsi, 32
+    movq    32(%rsp), %rsi
+    .seh_savereg %rbx, 40
+    movq    40(%rsp), %rbx
+    .seh_stackalloc 48
+    addq    $48, %rsp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: save_nonvol
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 3
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [3 ops]:
+// CHECK:          SAVE_NONVOL Reg=RSI, Disp=0x20
+// CHECK:          SAVE_NONVOL Reg=RBX, Disp=0x28
+// CHECK:          ALLOC_SMALL Size=0x30
+// CHECK:          Epilog [0] {
+// CHECK:            SAVE_NONVOL Reg=RSI, Disp=0x20
+// CHECK:            SAVE_NONVOL Reg=RBX, Disp=0x28
+// CHECK:            ALLOC_SMALL Size=0x30
+
+// --- Test 9: pushframe (machine frame) ---
+pushframe:
+    .seh_proc pushframe
+    .seh_unwindversion 3
+    .seh_pushframe @code
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    nop
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: pushframe
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 0
+// CHECK:          Prolog [2 ops]:
+// CHECK:          ALLOC_SMALL Size=0x28
+// CHECK:          PUSH_CANONICAL_FRAME Type=1
+
+// --- Test 10: chained unwind info (sub-fragment split) ---
+chained:
+    .seh_proc chained
+    .seh_unwindversion 3
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_CHAIN_ELSE
+    movl    %eax, %ecx
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     c
+    .seh_splitchained
+    .seh_endprologue
+.L_CHAIN_ELSE:
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     b
+    .seh_endproc
+// First fragment: the main function with one epilog.
+// CHECK-LABEL:  StartAddress: chained
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0x4
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          ALLOC_SMALL Size=0x28
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x28
+// Second fragment: chained, with one epilog.
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x4)
+// CHECK-NEXT:       ChainInfo (0x4)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog: 0
+// CHECK:          NumberOfOps: 0
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x28
+// CHECK:          Chained {
+
+// --- Test 11: huge alloc (512 KiB, uses ALLOC_HUGE) ---
+huge_alloc:
+    .seh_proc huge_alloc
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 524288
+    subq    $524288, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 524288
+    addq    $524288, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: huge_alloc
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          ALLOC_HUGE Size=0x80000
+// CHECK:          PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_HUGE Size=0x80000
+// CHECK:            PUSH Reg=RBX
+
+// --- Test 12: handler + chaining combined ---
+handler_and_chain:
+    .seh_proc handler_and_chain
+    .seh_handler __C_specific_handler, @unwind, @except
+    .seh_unwindversion 3
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_HC_ELSE
+.handler_chain_callsite:
+    callq   a
+    nop
+.handler_chain_finish:
+    movl    %eax, %ecx
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     c
+    .seh_splitchained
+    .seh_endprologue
+.L_HC_ELSE:
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    jmp     b
+.handler_chain_handler:
+    jmp     .handler_chain_finish
+    .seh_handlerdata
+    .long   1
+    .long   .handler_chain_callsite@IMGREL
+    .long   .handler_chain_finish@IMGREL
+    .long   1
+    .long   .handler_chain_handler@IMGREL
+    .text
+    .seh_endproc
+// Main fragment has handler.
+// CHECK-LABEL:  StartAddress: handler_and_chain
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [
+// CHECK:            ExceptionHandler
+// CHECK:          ]
+// CHECK:          NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Handler: __C_specific_handler
+// Chained fragment.
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x4)
+// CHECK-NEXT:       ChainInfo (0x4)
+// CHECK-NEXT:     ]
+
+// --- Test 13: no epilog (no-return function, empty payload) ---
+no_epilog:
+    .seh_proc no_epilog
+    .seh_unwindversion 3
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   a
+    int3
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: no_epilog
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 0
+// CHECK:          Prolog [1 ops]:
+// CHECK:          ALLOC_SMALL Size=0x28
+
+// --- Test 14: file-scope default applies to function without per-function directive ---
+.seh_unwindversion 3
+file_scope_default:
+    .seh_proc file_scope_default
+    .seh_stackalloc 40
+    subq    $40, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 40
+    addq    $40, %rsp
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: file_scope_default
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 1
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          ALLOC_SMALL Size=0x28
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x28
+
+// --- Test 15: three epilogs - first is full, 2nd and 3rd inherit ---
+three_epilogs:
+    .seh_proc three_epilogs
+    .seh_unwindversion 3
+    .seh_pushreg %rbx
+    pushq   %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   c
+    testl   %eax, %eax
+    jle     .L_3E_ELSE
+    cmpl    $10, %eax
+    jge     .L_3E_LARGE
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+.L_3E_LARGE:
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+.L_3E_ELSE:
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %rbx
+    popq    %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: three_epilogs
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 3
+// CHECK:          Prolog [2 ops]:
+// CHECK:          ALLOC_SMALL Size=0x20
+// CHECK:          PUSH Reg=RBX
+// CHECK:          Epilog [0] {
+// CHECK:            ALLOC_SMALL Size=0x20
+// CHECK:            PUSH Reg=RBX
+// CHECK:          Epilog [1] {
+// CHECK:            (inherits
+// CHECK:          Epilog [2] {
+// CHECK:            (inherits
+
+// --- Test 16: push2regs with non-consecutive registers -> WOD_PUSH2 ---
+push2_nonconsecutive:
+    .seh_proc push2_nonconsecutive
+    .seh_push2regs %rbx, %rdi
+    push2   %rdi, %rbx
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_push2regs %rbx, %rdi
+    pop2    %rdi, %rbx
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: push2_nonconsecutive
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:          [1] IP +0x{{[0-9A-F]+}}: PUSH2 Reg1=RBX, Reg2=RDI
+// CHECK:          Epilog [0] {
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:            [1] IP +0x{{[0-9A-F]+}}: PUSH2 Reg1=RBX, Reg2=RDI
+
+// --- Test 17: push2regs with consecutive registers -> WOD_PUSH_CONSECUTIVE_2 ---
+push2_consecutive:
+    .seh_proc push2_consecutive
+    .seh_push2regs %r12, %r13
+    push2   %r13, %r12
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_push2regs %r12, %r13
+    pop2    %r13, %r12
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: push2_consecutive
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:          [1] IP +0x{{[0-9A-F]+}}: PUSH_CONSECUTIVE_2 Reg=R12 (+R13)
+// CHECK:          Epilog [0] {
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:            [1] IP +0x{{[0-9A-F]+}}: PUSH_CONSECUTIVE_2 Reg=R12 (+R13)
+
+// --- Test 18: EGPR push (register > 15) uses 5-bit encoding ---
+egpr_push:
+    .seh_proc egpr_push
+    .seh_pushreg %r16
+    pushq   %r16
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_pushreg %r16
+    popq    %r16
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: egpr_push
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK-NEXT:     Flags [ (0x0)
+// CHECK-NEXT:     ]
+// CHECK-NEXT:     SizeOfProlog:
+// CHECK-NEXT:     PayloadWords:
+// CHECK-NEXT:     NumberOfOps: 2
+// CHECK-NEXT:     NumberOfEpilogs: 1
+// CHECK:          Prolog [2 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:          [1] IP +0x{{[0-9A-F]+}}: PUSH Reg=R16
+// CHECK:          Epilog [0] {
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:            [1] IP +0x{{[0-9A-F]+}}: PUSH Reg=R16
+
+// --- Test 19: EGPR push2regs with non-consecutive EGPRs ---
+egpr_push2_nonconsecutive:
+    .seh_proc egpr_push2_nonconsecutive
+    .seh_push2regs %r16, %r20
+    push2   %r20, %r16
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_push2regs %r16, %r20
+    pop2    %r20, %r16
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: egpr_push2_nonconsecutive
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK:          NumberOfOps: 2
+// CHECK:          Prolog [2 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:          [1] IP +0x{{[0-9A-F]+}}: PUSH2 Reg1=R16, Reg2=R20
+// CHECK:          Epilog [0] {
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:            [1] IP +0x{{[0-9A-F]+}}: PUSH2 Reg1=R16, Reg2=R20
+
+// --- Test 20: EGPR push2regs with consecutive EGPRs -> PUSH_CONSECUTIVE_2 ---
+egpr_push2_consecutive:
+    .seh_proc egpr_push2_consecutive
+    .seh_push2regs %r16, %r17
+    push2   %r17, %r16
+    .seh_stackalloc 32
+    subq    $32, %rsp
+    .seh_endprologue
+    callq   a
+    nop
+    .seh_startepilogue
+    .seh_stackalloc 32
+    addq    $32, %rsp
+    .seh_push2regs %r16, %r17
+    pop2    %r17, %r16
+    .seh_endepilogue
+    retq
+    .seh_endproc
+// CHECK-LABEL:  StartAddress: egpr_push2_consecutive
+// CHECK:        UnwindInfo {
+// CHECK-NEXT:     Version: 3
+// CHECK:          NumberOfOps: 2
+// CHECK:          Prolog [2 ops]:
+// CHECK:          [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:          [1] IP +0x{{[0-9A-F]+}}: PUSH_CONSECUTIVE_2 Reg=R16 (+R17)
+// CHECK:          Epilog [0] {
+// CHECK:            [0] IP +0x{{[0-9A-F]+}}: ALLOC_SMALL Size=0x20
+// CHECK:            [1] IP +0x{{[0-9A-F]+}}: PUSH_CONSECUTIVE_2 Reg=R16 (+R17)

diff  --git a/llvm/unittests/MC/CMakeLists.txt b/llvm/unittests/MC/CMakeLists.txt
index 4881888f03742..4958396c731b8 100644
--- a/llvm/unittests/MC/CMakeLists.txt
+++ b/llvm/unittests/MC/CMakeLists.txt
@@ -23,5 +23,6 @@ add_llvm_unittest(MCTests
   StringTableBuilderTest.cpp
   TargetRegistry.cpp
   MCDisassemblerTest.cpp
+  WODRoundTripTest.cpp
   )
 

diff  --git a/llvm/unittests/MC/WODRoundTripTest.cpp b/llvm/unittests/MC/WODRoundTripTest.cpp
new file mode 100644
index 0000000000000..3492aa37c8281
--- /dev/null
+++ b/llvm/unittests/MC/WODRoundTripTest.cpp
@@ -0,0 +1,206 @@
+//===- WODRoundTripTest.cpp - V3 WOD encode/decode round-trip tests -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/MC/MCWin64EH.h"
+#include "llvm/MC/MCWinEH.h"
+#include "llvm/Support/Win64EH.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::Win64EH;
+
+/// Helper: encode a WinEH::Instruction via EncodeWOD, then decode the resulting
+/// bytes via decodeWOD, returning the DecodedWOD. Fails the test on error.
+static DecodedWOD roundTrip(const WinEH::Instruction &Inst) {
+  SmallVector<uint8_t, 8> Pool;
+  Win64EH::EncodeWOD(Inst, Pool);
+
+  auto Result = Win64EH::decodeWOD(Pool, 0);
+  EXPECT_TRUE(!!Result) << "decodeWOD failed: " << toString(Result.takeError());
+  EXPECT_EQ(Result->ByteSize, Pool.size())
+      << "Decoded byte size doesn't match encoded pool size";
+  return *Result;
+}
+
+TEST(WODRoundTrip, PushNonVol) {
+  // push rbx (reg 3)
+  WinEH::Instruction Inst(UOP_PushNonVol, nullptr, 3, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH);
+  EXPECT_EQ(W.Register, 3u);
+  EXPECT_EQ(W.ByteSize, 1u);
+}
+
+TEST(WODRoundTrip, PushNonVolHighReg) {
+  // push r15 (reg 15)
+  WinEH::Instruction Inst(UOP_PushNonVol, nullptr, 15, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH);
+  EXPECT_EQ(W.Register, 15u);
+}
+
+TEST(WODRoundTrip, AllocSmall) {
+  // sub rsp, 40 (ALLOC_SMALL covers 8..128 in steps of 8)
+  WinEH::Instruction Inst(UOP_AllocSmall, nullptr, 0, 40);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_ALLOC_SMALL);
+  EXPECT_EQ(W.Size, 40u);
+  EXPECT_EQ(W.ByteSize, 1u);
+}
+
+TEST(WODRoundTrip, AllocSmallMin) {
+  // sub rsp, 8
+  WinEH::Instruction Inst(UOP_AllocSmall, nullptr, 0, 8);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_ALLOC_SMALL);
+  EXPECT_EQ(W.Size, 8u);
+}
+
+TEST(WODRoundTrip, AllocSmallMax) {
+  // sub rsp, 128 (max for ALLOC_SMALL: (15+1)*8 = 128)
+  WinEH::Instruction Inst(UOP_AllocSmall, nullptr, 0, 128);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_ALLOC_SMALL);
+  EXPECT_EQ(W.Size, 128u);
+}
+
+TEST(WODRoundTrip, AllocLarge) {
+  // sub rsp, 4096 (fits in ALLOC_LARGE: 3 bytes, size/8)
+  WinEH::Instruction Inst(UOP_AllocLarge, nullptr, 0, 4096);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_ALLOC_LARGE);
+  EXPECT_EQ(W.Size, 4096u);
+  EXPECT_EQ(W.ByteSize, 3u);
+}
+
+TEST(WODRoundTrip, AllocHuge) {
+  // sub rsp, 524288 (>= 512*1024-8, uses ALLOC_HUGE: 5 bytes)
+  WinEH::Instruction Inst(UOP_AllocLarge, nullptr, 0, 524288);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_ALLOC_HUGE);
+  EXPECT_EQ(W.Size, 524288u);
+  EXPECT_EQ(W.ByteSize, 5u);
+}
+
+TEST(WODRoundTrip, SetFPReg) {
+  // lea rbp, [rsp+32]  =>  SetFPReg reg=5(RBP), offset=32
+  WinEH::Instruction Inst(UOP_SetFPReg, nullptr, 5, 32);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_SET_FPREG);
+  EXPECT_EQ(W.Register, 5u);
+  EXPECT_EQ(W.Displacement, 32u);
+  EXPECT_EQ(W.ByteSize, 2u);
+}
+
+TEST(WODRoundTrip, SaveNonVol) {
+  // mov [rsp+40], rbx  =>  SaveNonVol reg=3(RBX), disp=40
+  WinEH::Instruction Inst(UOP_SaveNonVol, nullptr, 3, 40);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_SAVE_NONVOL);
+  EXPECT_EQ(W.Register, 3u);
+  EXPECT_EQ(W.Displacement, 40u);
+  EXPECT_EQ(W.ByteSize, 3u);
+}
+
+TEST(WODRoundTrip, SaveNonVolBig) {
+  // mov [rsp+0x90000], rbx  =>  SaveNonVolBig reg=3, disp=0x90000
+  WinEH::Instruction Inst(UOP_SaveNonVolBig, nullptr, 3, 0x90000);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_SAVE_NONVOL_FAR);
+  EXPECT_EQ(W.Register, 3u);
+  EXPECT_EQ(W.Displacement, 0x90000u);
+  EXPECT_EQ(W.ByteSize, 5u);
+}
+
+TEST(WODRoundTrip, SaveXMM128) {
+  // movaps [rsp+32], xmm6  =>  SaveXMM128 reg=6, disp=32
+  WinEH::Instruction Inst(UOP_SaveXMM128, nullptr, 6, 32);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_SAVE_XMM128);
+  EXPECT_EQ(W.Register, 6u);
+  EXPECT_EQ(W.Displacement, 32u);
+  EXPECT_EQ(W.ByteSize, 3u);
+}
+
+TEST(WODRoundTrip, SaveXMM128Big) {
+  // movaps [rsp+0x90000], xmm6  =>  SaveXMM128Big reg=6, disp=0x90000
+  WinEH::Instruction Inst(UOP_SaveXMM128Big, nullptr, 6, 0x90000);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_SAVE_XMM128_FAR);
+  EXPECT_EQ(W.Register, 6u);
+  EXPECT_EQ(W.Displacement, 0x90000u);
+  EXPECT_EQ(W.ByteSize, 5u);
+}
+
+TEST(WODRoundTrip, PushMachFrameCode) {
+  // .seh_pushframe @code  =>  PushMachFrame offset=1
+  WinEH::Instruction Inst(UOP_PushMachFrame, nullptr, 0, 1);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH_CANONICAL_FRAME);
+  EXPECT_EQ(W.Type, 1u);
+  EXPECT_EQ(W.ByteSize, 2u);
+}
+
+TEST(WODRoundTrip, PushMachFrameNoCode) {
+  // .seh_pushframe  =>  PushMachFrame offset=0
+  WinEH::Instruction Inst(UOP_PushMachFrame, nullptr, 0, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH_CANONICAL_FRAME);
+  EXPECT_EQ(W.Type, 0u);
+  EXPECT_EQ(W.ByteSize, 2u);
+}
+
+TEST(WODRoundTrip, MultipleOpsInPool) {
+  // Encode push rbx + sub rsp, 32 into one pool, then decode both.
+  SmallVector<uint8_t, 8> Pool;
+  WinEH::Instruction Push(UOP_PushNonVol, nullptr, 3, 0);
+  WinEH::Instruction Alloc(UOP_AllocSmall, nullptr, 0, 32);
+  Win64EH::EncodeWOD(Push, Pool);
+  Win64EH::EncodeWOD(Alloc, Pool);
+  EXPECT_EQ(Pool.size(), 2u); // 1 byte each
+
+  auto W0 = Win64EH::decodeWOD(Pool, 0);
+  ASSERT_TRUE(!!W0);
+  EXPECT_EQ(W0->Opcode, WOD_PUSH);
+  EXPECT_EQ(W0->Register, 3u);
+
+  auto W1 = Win64EH::decodeWOD(Pool, W0->ByteSize);
+  ASSERT_TRUE(!!W1);
+  EXPECT_EQ(W1->Opcode, WOD_ALLOC_SMALL);
+  EXPECT_EQ(W1->Size, 32u);
+}
+
+TEST(WODRoundTrip, Push2NonConsecutive) {
+  // push2 rbx, rdi (regs 3, 7 - non-consecutive) => WOD_PUSH2 (2 bytes)
+  WinEH::Instruction Inst(UOP_Push2, nullptr, 3, 7, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH2);
+  EXPECT_EQ(W.Register, 3u);
+  EXPECT_EQ(W.Register2, 7u);
+  EXPECT_EQ(W.ByteSize, 2u);
+}
+
+TEST(WODRoundTrip, Push2Consecutive) {
+  // push2 r12, r13 (regs 12, 13 - consecutive) => WOD_PUSH_CONSECUTIVE_2
+  // (1 byte)
+  WinEH::Instruction Inst(UOP_Push2, nullptr, 12, 13, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH_CONSECUTIVE_2);
+  EXPECT_EQ(W.Register, 12u);
+  EXPECT_EQ(W.ByteSize, 1u);
+}
+
+TEST(WODRoundTrip, Push2HighRegs) {
+  // push2 r14, r8 (regs 14, 8 - non-consecutive, high regs)
+  WinEH::Instruction Inst(UOP_Push2, nullptr, 14, 8, 0);
+  auto W = roundTrip(Inst);
+  EXPECT_EQ(W.Opcode, WOD_PUSH2);
+  EXPECT_EQ(W.Register, 14u);
+  EXPECT_EQ(W.Register2, 8u);
+  EXPECT_EQ(W.ByteSize, 2u);
+}
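
The size and register expectations in the round-trip tests above can be read as two small rules. The sketch below is illustrative only and is not part of this commit: `allocSmallField`, `allocSmallSize` and `isConsecutivePair` are hypothetical helper names, the 4-bit "(N+1)*8" mapping is inferred from the AllocSmallMax comment, and the "second register is first register plus one" rule is an assumption that merely agrees with the register pairs used in the Push2 tests.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): map an allocation size to the
// 4-bit field value assumed for ALLOC_SMALL, where field N means (N+1)*8
// bytes. Returns nullopt when the size is outside 8..128 or not a
// multiple of 8, i.e. when a larger opcode would be needed.
std::optional<uint8_t> allocSmallField(uint32_t Size) {
  if (Size < 8 || Size > 128 || Size % 8 != 0)
    return std::nullopt;
  return static_cast<uint8_t>(Size / 8 - 1);
}

// Inverse of the mapping above: recover the byte count from the field.
uint32_t allocSmallSize(uint8_t Field) { return (Field + 1u) * 8u; }

// Hypothetical predicate (an assumption, not taken from the encoder):
// a push2 pair is treated as "consecutive" when the second register
// number directly follows the first, as in (12, 13) or (16, 17).
bool isConsecutivePair(unsigned Reg1, unsigned Reg2) {
  return Reg2 == Reg1 + 1;
}
```

Under these assumptions, sizes 8, 40 and 128 map to field values 0, 4 and 15, and the pairs (3, 7) and (14, 8) fall back to the two-byte PUSH2 form that the tests expect.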


        

