[Lldb-commits] [clang] [flang] [llvm] [libc] [openmp] [lldb] [mlir] [libcxx] Add security group 2023 transparency report. (PR #80272)

Thu Feb 1 10:14:38 PST 2024

https://github.com/smithp35 updated https://github.com/llvm/llvm-project/pull/80272

>From d755f870e53c08c009dcdc9c05f3896e40f432f5 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.smith at arm.com>
Date: Wed, 17 Jan 2024 18:13:04 +0000
Subject: [PATCH 01/42] Add security group 2023 transparency report.

---
 llvm/docs/SecurityTransparencyReports.rst | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/llvm/docs/SecurityTransparencyReports.rst b/llvm/docs/SecurityTransparencyReports.rst
index a857e676880f8..c8e3cd45c98ef 100644
--- a/llvm/docs/SecurityTransparencyReports.rst
+++ b/llvm/docs/SecurityTransparencyReports.rst
@@ -76,3 +76,42 @@ the time of writing this transparency report.
 
 No dedicated LLVM releases were made for any of the above issues.
 
+2023
+----
+
+In this section we report on the issues the group received in 2023, or on issues
+that were received earlier, but were disclosed in 2023.
+
+9 of these were judged to be security issues:
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=36 reports the presence of
+a .git folder at https://llvm.org/.git.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=66 reports the presence of
+a GitHub Personal Access token in a DockerHub image.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=42 reports a potential gap
+in the Armv8.1-m BTI protection, involving a combination of large switch statements
+and __builtin_unreachable() in the default case.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=43 reports a dependency
+on an old version of xml2js with a CVE filed against it.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=45 reports a number of
+dependencies that have had vulnerabilities reported against them.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=46 is related to issue 43.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=48 reports a buffer overflow
+in std::format from -fexperimental-library.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=54 reports a memory leak in
+basic_string move assignment when built with libc++ versions <=6.0 and run against
+newer libc++ shared/dylibs.
+
+https://bugs.chromium.org/p/llvm/issues/detail?id=56 reports an out-of-bounds buffer
+store introduced by LLVM backends, that regressed due to a procedural oversight.
+
+No dedicated LLVM releases were made for any of the above issues.
+
+Over the course of 2023 we had one person join the LLVM Security Group.

>From 64fbd7b858a6cec9df5bdb91577008a459cc125f Mon Sep 17 00:00:00 2001
From: David Spickett <david.spickett at linaro.org>
Date: Thu, 1 Feb 2024 10:43:34 +0000
Subject: [PATCH 02/42] [GitHub][workflows] Reflow some text in buildbot info
 PR comment

When the markdown link renders the line gets a lot shorter.
---
 llvm/utils/git/github-automation.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/llvm/utils/git/github-automation.py b/llvm/utils/git/github-automation.py
index f9d48ae5fb477..04698cacbff92 100755
--- a/llvm/utils/git/github-automation.py
+++ b/llvm/utils/git/github-automation.py
@@ -280,8 +280,7 @@ def run(self) -> bool:
 @{self.author} Congratulations on having your first Pull Request (PR) merged into the LLVM Project!
 
 Your changes will be combined with recent changes from other authors, then tested
-by our [build bots](https://lab.llvm.org/buildbot/). If there is a problem with a build,
-you may recieve a report in an email or a comment on this PR.
+by our [build bots](https://lab.llvm.org/buildbot/). If there is a problem with a build, you may receive a report in an email or a comment on this PR.
 
 Please check whether problems have been caused by your change specifically, as
 the builds can include changes from many authors. It is not uncommon for your

>From 8afe4766da7ecfce212964182a68142149a82829 Mon Sep 17 00:00:00 2001
From: Jay Foad <jay.foad at amd.com>
Date: Thu, 1 Feb 2024 10:49:42 +0000
Subject: [PATCH 03/42] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins
 (#79980)

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  | 34 +++++++++----------
 .../builtins-amdgcn-wmma-w32-gfx10-err.cl     | 16 ++++-----
 .../builtins-amdgcn-wmma-w64-gfx10-err.cl     | 18 +++++-----
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl |  2 +-
 4 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 74dfd1d214e84..e9dd8dcd0b60e 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -292,23 +292,23 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wait_event_export_ready, "v", "n", "gfx11-inst
 // Postfix w32 indicates the builtin requires wavefront size of 32.
 // Postfix w64 indicates the builtin requires wavefront size of 64.
 //===----------------------------------------------------------------------===//
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, "V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, "V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts")
-
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, "V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts")
-TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, "V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, "V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, "V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, "V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, "V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts,wavefrontsize64")
 
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtn, "UiUIi", "n", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl, "UWiUIi", "n", "gfx11-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl
index 49cb797df4233..a1a56f0d8417d 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl
@@ -21,14 +21,14 @@ void test_amdgcn_wmma_f32_16x16x16_bf16_w32(global v8f* out8f, v16s a16s, v16s b
                                             global v16s* out16s, v2i a2i, v2i b2i, v16s c16s,
                                             global v8i* out8i, v4i a4i, v4i b4i, v8i c8i)
 {
- *out8f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w32(a16h, b16h, c8f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w32' needs target feature gfx11-insts}}
- *out8f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w32(a16s, b16s, c8f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32' needs target feature gfx11-insts}}
- *out16h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32(a16h, b16h, c16h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w32' needs target feature gfx11-insts}}
- *out16s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32(a16s, b16s, c16s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32' needs target feature gfx11-insts}}
- *out16h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32(a16h, b16h, c16h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32' needs target feature gfx11-insts}}
- *out16s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32(a16s, b16s, c16s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32' needs target feature gfx11-insts}}
- *out8i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w32(true, a4i, true, b4i, c8i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32' needs target feature gfx11-insts}}
- *out8i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w32(true, a2i, true, b2i, c8i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32' needs target feature gfx11-insts}}
+ *out8f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w32(a16h, b16h, c8f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out8f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w32(a16s, b16s, c8f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out16h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32(a16h, b16h, c16h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out16s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32(a16s, b16s, c16s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out16h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32(a16h, b16h, c16h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out16s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32(a16s, b16s, c16s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out8i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w32(true, a4i, true, b4i, c8i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32' needs target feature gfx11-insts,wavefrontsize32}}
+ *out8i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w32(true, a2i, true, b2i, c8i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32' needs target feature gfx11-insts,wavefrontsize32}}
 }
 
 #endif
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl
index d5d9d973eb300..d995b1dc46be7 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl
@@ -21,14 +21,14 @@ void test_amdgcn_wmma_f32_16x16x16_bf16_w64(global v4f* out4f, v16h a16h, v16h b
                                             global v8s* out8s, v4i a4i, v4i b4i, v8s c8s,
                                             global v4i* out4i, v2i a2i, v2i b2i, v4i c4i)
 {
- *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target feature gfx11-insts}}
- *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target feature gfx11-insts}}
- *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target feature gfx11-insts}}
- *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target feature gfx11-insts}}
- *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' needs target feature gfx11-insts}}
- *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' needs target feature gfx11-insts}}
- *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' needs target feature gfx11-insts}}
- *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' needs target feature gfx11-insts}}
+ *out4f = __builtin_amdgcn_wmma_f32_16x16x16_f16_w64(a16h, b16h, c4f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_f16_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out4f = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w64(a16s, b16s, c4f);  // expected-error{{'__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out8h = __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64(a16h, b16h, c8h, true); // expected-error{{'__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out8s = __builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64(a16s, b16s, c8s, true); // expected-error{{'__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w64(true, a4i, true, b4i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64' needs target feature gfx11-insts,wavefrontsize64}}
+ *out4i = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w64(true, a2i, true, b2i, c4i, false); // expected-error{{'__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64' needs target feature gfx11-insts,wavefrontsize64}}
 }
 
-#endif
\ No newline at end of file
+#endif
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl
index 1490f14fd17b6..af0d4ce371080 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl
@@ -1,6 +1,6 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // REQUIRES: amdgpu-registered-target
-// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -DWMMA_GFX1100_TESTS -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-GFX1100
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -target-feature +wavefrontsize64 -DWMMA_GFX1100_TESTS -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-GFX1100
 
 typedef float  v4f   __attribute__((ext_vector_type(4)));
 typedef half   v8h   __attribute__((ext_vector_type(8)));

>From 00cfbf59dec46711ff176f3faa49dab5021c75e0 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Thu, 1 Feb 2024 11:01:29 +0000
Subject: [PATCH 04/42] [SCEVExp] Keep NUW/NSW if both original inc and
 isomorphic inc agree. (#79512)

We are replacing with a wider increment. If both OrigInc and
IsomorphicInc are NUW/NSW, then we can preserve them on the wider
increment; the narrower IsomorphicInc would wrap before the wider
OrigInc, so the replacement won't make IsomorphicInc's uses more
poisonous.
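
The wrap-ordering intuition above can be checked with a small standalone
sketch. It is not the LLVM implementation: the helper names are hypothetical,
and it assumes both counters start from the same value (representable in the
narrow type) and step by 1, which is simpler than the SCEV/truncation
relationship the patch actually reasons about.

```cpp
#include <cstdint>

// Hypothetical helpers, not LLVM APIs. Each returns how many "+1" steps a
// counter of the given bit width can take from Start before the add wraps.
constexpr std::int64_t stepsBeforeUnsignedWrap(unsigned Bits,
                                               std::int64_t Start) {
  const std::int64_t Max = (std::int64_t{1} << Bits) - 1; // unsigned maximum
  return Max - Start;
}

constexpr std::int64_t stepsBeforeSignedWrap(unsigned Bits,
                                             std::int64_t Start) {
  const std::int64_t Max = (std::int64_t{1} << (Bits - 1)) - 1; // signed maximum
  return Max - Start;
}

// Assumption: both counters start from the same value, which fits in the
// narrow type. Returns true if the narrow counter never outlasts the wide one.
constexpr bool narrowWrapsNoLaterThanWide(unsigned NarrowBits,
                                          unsigned WideBits) {
  // Unsigned view: every start value in [0, 2^NarrowBits - 1].
  for (std::int64_t S = 0; S < (std::int64_t{1} << NarrowBits); ++S)
    if (stepsBeforeUnsignedWrap(NarrowBits, S) >
        stepsBeforeUnsignedWrap(WideBits, S))
      return false;
  // Signed view: every start value in [-2^(NarrowBits-1), 2^(NarrowBits-1) - 1].
  const std::int64_t Half = std::int64_t{1} << (NarrowBits - 1);
  for (std::int64_t S = -Half; S < Half; ++S)
    if (stepsBeforeSignedWrap(NarrowBits, S) >
        stepsBeforeSignedWrap(WideBits, S))
      return false;
  return true;
}

static_assert(narrowWrapsNoLaterThanWide(4, 8),
              "i4 counter should wrap no later than i8 counter");
static_assert(narrowWrapsNoLaterThanWide(8, 16),
              "i8 counter should wrap no later than i16 counter");
```

Under those assumptions, any iteration on which the wide increment would wrap
comes after one on which the narrow increment has already wrapped, which is
the sense in which keeping the flags does not add new poison for the narrow
increment's users.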

PR: https://github.com/llvm/llvm-project/pull/79512
---
 .../Utils/ScalarEvolutionExpander.cpp         | 29 +++++++++++++++++--
 .../Transforms/IndVarSimplify/iv-poison.ll    | 12 ++++----
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
index 2cff57facbf78..ed55a13072aaa 100644
--- a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
+++ b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
@@ -1604,11 +1604,36 @@ void SCEVExpander::replaceCongruentIVInc(
   const SCEV *TruncExpr =
       SE.getTruncateOrNoop(SE.getSCEV(OrigInc), IsomorphicInc->getType());
   if (OrigInc == IsomorphicInc || TruncExpr != SE.getSCEV(IsomorphicInc) ||
-      !SE.LI.replacementPreservesLCSSAForm(IsomorphicInc, OrigInc) ||
-      !hoistIVInc(OrigInc, IsomorphicInc,
+      !SE.LI.replacementPreservesLCSSAForm(IsomorphicInc, OrigInc))
+    return;
+
+  bool BothHaveNUW = false;
+  bool BothHaveNSW = false;
+  auto *OBOIncV = dyn_cast<OverflowingBinaryOperator>(OrigInc);
+  auto *OBOIsomorphic = dyn_cast<OverflowingBinaryOperator>(IsomorphicInc);
+  if (OBOIncV && OBOIsomorphic) {
+    BothHaveNUW =
+        OBOIncV->hasNoUnsignedWrap() && OBOIsomorphic->hasNoUnsignedWrap();
+    BothHaveNSW =
+        OBOIncV->hasNoSignedWrap() && OBOIsomorphic->hasNoSignedWrap();
+  }
+
+  if (!hoistIVInc(OrigInc, IsomorphicInc,
                   /*RecomputePoisonFlags*/ true))
     return;
 
+  // We are replacing with a wider increment. If both OrigInc and IsomorphicInc
+  // are NUW/NSW, then we can preserve them on the wider increment; the narrower
+  // IsomorphicInc would wrap before the wider OrigInc, so the replacement won't
+  // make IsomorphicInc's uses more poisonous.
+  assert(OrigInc->getType()->getScalarSizeInBits() >=
+             IsomorphicInc->getType()->getScalarSizeInBits() &&
+         "Should only replace an increment with a wider one.");
+  if (BothHaveNUW || BothHaveNSW) {
+    OrigInc->setHasNoUnsignedWrap(OBOIncV->hasNoUnsignedWrap() || BothHaveNUW);
+    OrigInc->setHasNoSignedWrap(OBOIncV->hasNoSignedWrap() || BothHaveNSW);
+  }
+
   SCEV_DEBUG_WITH_TYPE(DebugType,
                        dbgs() << "INDVARS: Eliminated congruent iv.inc: "
                               << *IsomorphicInc << '\n');
diff --git a/llvm/test/Transforms/IndVarSimplify/iv-poison.ll b/llvm/test/Transforms/IndVarSimplify/iv-poison.ll
index 38299e0a6b353..383599f614357 100644
--- a/llvm/test/Transforms/IndVarSimplify/iv-poison.ll
+++ b/llvm/test/Transforms/IndVarSimplify/iv-poison.ll
@@ -64,7 +64,7 @@ define i2 @iv_hoist_both_adds_nsw(i2 %arg) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i2 [ 1, [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i2 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw nsw i2 [[IV_0]], 1
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i2 1, [[ARG:%.*]]
 ; CHECK-NEXT:    br i1 [[DOTNOT_NOT]], label [[EXIT:%.*]], label [[LOOP]]
 ; CHECK:       exit:
@@ -92,7 +92,7 @@ define i4 @iv_hoist_both_adds_nsw_extra_use(i4 %arg) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i4 [ 1, [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i4 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw nsw i4 [[IV_0]], 1
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i4 1, [[ARG:%.*]]
@@ -124,7 +124,7 @@ define i4 @iv_hoist_both_adds_nsw_extra_use_incs_reordered(i4 %arg) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i4 [ 1, [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i4 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw nsw i4 [[IV_0]], 1
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i4 1, [[ARG:%.*]]
@@ -244,7 +244,7 @@ define i2 @iv_hoist_both_adds_nuw(i2 %arg, i2 %start) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i2 [ [[START:%.*]], [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add i2 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i2 [[IV_0]], 1
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i2 [[START]], [[ARG:%.*]]
 ; CHECK-NEXT:    br i1 [[DOTNOT_NOT]], label [[EXIT:%.*]], label [[LOOP]]
 ; CHECK:       exit:
@@ -272,7 +272,7 @@ define i4 @iv_hoist_both_adds_nuw_extra_use(i4 %arg, i4 %start) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i4 [ [[START:%.*]], [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add i4 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i4 [[IV_0]], 1
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i4 [[START]], [[ARG:%.*]]
@@ -304,7 +304,7 @@ define i4 @iv_hoist_both_adds_nuw_extra_use_incs_reordered(i4 %arg, i4 %start) {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV_0:%.*]] = phi i4 [ [[START:%.*]], [[BB:%.*]] ], [ [[IV_0_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[IV_0_NEXT]] = add i4 [[IV_0]], 1
+; CHECK-NEXT:    [[IV_0_NEXT]] = add nuw i4 [[IV_0]], 1
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    call void @use(i4 [[IV_0_NEXT]])
 ; CHECK-NEXT:    [[DOTNOT_NOT:%.*]] = icmp ult i4 [[START]], [[ARG:%.*]]

>From 2e8dd80ccb9b1550041bed0ae7de19d15f2c09bf Mon Sep 17 00:00:00 2001
From: Hristo Hristov <hghristov.rmm at gmail.com>
Date: Thu, 1 Feb 2024 13:31:25 +0200
Subject: [PATCH 05/42] [libc++][memory] P2652R2: Disallow Specialization of
 `allocator_traits` (#79978)

Implements P2652R2 <https://wg21.link/P2652R2>:
- https://eel.is/c++draft/allocator.requirements.general
- https://eel.is/c++draft/memory.syn
- https://eel.is/c++draft/allocator.traits.general
- https://eel.is/c++draft/allocator.traits.members
- https://eel.is/c++draft/diff.cpp20.concepts
- https://eel.is/c++draft/diff.cpp20.utilities
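
As a rough illustration of the dispatch this patch moves into
allocator_traits, here is a standalone sketch. Everything in namespace
`sketch`, as well as `RoundingAlloc` and `PlainAlloc`, is hypothetical and
not part of libc++. The sketch uses C++17 `void_t` detection in place of the
requires-expression used in the patch, so it builds without C++23 library
support.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch {

// Mirrors the two-parameter shape of allocation_result from the patch.
template <class Pointer, class SizeType = std::size_t>
struct allocation_result {
  Pointer ptr;
  SizeType count;
};

// Detects whether Alloc has a member allocate_at_least(size_type).
template <class Alloc, class = void>
struct has_allocate_at_least : std::false_type {};

template <class Alloc>
struct has_allocate_at_least<
    Alloc,
    std::void_t<decltype(std::declval<Alloc&>().allocate_at_least(
        std::declval<typename std::allocator_traits<Alloc>::size_type>()))>>
    : std::true_type {};

// Uses the allocator's own allocate_at_least when present; otherwise falls
// back to allocate(n) and reports exactly n elements.
template <class Alloc>
auto allocate_at_least(Alloc& alloc,
                       typename std::allocator_traits<Alloc>::size_type n)
    -> allocation_result<typename std::allocator_traits<Alloc>::pointer,
                         typename std::allocator_traits<Alloc>::size_type> {
  using Traits = std::allocator_traits<Alloc>;
  if constexpr (has_allocate_at_least<Alloc>::value) {
    auto r = alloc.allocate_at_least(n);
    return {r.ptr, r.count};
  } else {
    return {Traits::allocate(alloc, n), n};
  }
}

} // namespace sketch

// Hypothetical allocator that rounds every request up to a multiple of 8.
template <class T>
struct RoundingAlloc {
  using value_type = T;
  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
  sketch::allocation_result<T*> allocate_at_least(std::size_t n) {
    const std::size_t rounded = (n + 7) / 8 * 8;
    return {allocate(rounded), rounded};
  }
};

// Hypothetical allocator with no allocate_at_least member.
template <class T>
struct PlainAlloc {
  using value_type = T;
  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};
```

With these definitions, a request for 10 elements through
`sketch::allocate_at_least` reports a count of 16 for `RoundingAlloc<int>`
and a count of 10 for `PlainAlloc<int>`; in both cases the returned count is
the size to pass back to `deallocate`.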

---------

Co-authored-by: Zingam <zingam at outlook.com>
---
 libcxx/docs/FeatureTestMacroTable.rst         |  2 +-
 libcxx/docs/ReleaseNotes/19.rst               |  2 ++
 libcxx/docs/Status/Cxx23Papers.csv            |  2 +-
 libcxx/docs/Status/Cxx2cIssues.csv            |  2 +-
 libcxx/include/__memory/allocate_at_least.h   | 20 +++-------------
 libcxx/include/__memory/allocator_traits.h    | 24 +++++++++++++++++++
 libcxx/include/memory                         | 13 +++++-----
 libcxx/include/version                        |  4 ++--
 libcxx/modules/std/memory.inc                 |  2 --
 .../memory.version.compile.pass.cpp           | 10 ++++----
 .../version.version.compile.pass.cpp          | 10 ++++----
 .../allocate_at_least.pass.cpp                | 10 ++++----
 .../generate_feature_test_macro_components.py |  3 +--
 13 files changed, 56 insertions(+), 48 deletions(-)
 rename libcxx/test/std/utilities/memory/allocator.traits/{ => allocator.traits.members}/allocate_at_least.pass.cpp (91%)

diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index d0d057e6bbaf0..a5c6fa22cec06 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -304,7 +304,7 @@ Status
     ---------------------------------------------------------------------
     ``__cpp_lib_adaptor_iterator_pair_constructor``     ``202106L``
     --------------------------------------------------- -----------------
-    ``__cpp_lib_allocate_at_least``                     ``202106L``
+    ``__cpp_lib_allocate_at_least``                     ``202302L``
     --------------------------------------------------- -----------------
     ``__cpp_lib_associative_heterogeneous_erasure``     *unimplemented*
     --------------------------------------------------- -----------------
diff --git a/libcxx/docs/ReleaseNotes/19.rst b/libcxx/docs/ReleaseNotes/19.rst
index e96abc72c1648..db731de2e4399 100644
--- a/libcxx/docs/ReleaseNotes/19.rst
+++ b/libcxx/docs/ReleaseNotes/19.rst
@@ -37,7 +37,9 @@ What's New in Libc++ 19.0.0?
 
 Implemented Papers
 ------------------
+
 - P2637R3 - Member ``visit``
+- P2652R2 - Disallow User Specialization of ``allocator_traits``
 
 
 Improvements and New Features
diff --git a/libcxx/docs/Status/Cxx23Papers.csv b/libcxx/docs/Status/Cxx23Papers.csv
index ebab3ef735b61..eb415ed8c031f 100644
--- a/libcxx/docs/Status/Cxx23Papers.csv
+++ b/libcxx/docs/Status/Cxx23Papers.csv
@@ -115,7 +115,7 @@
 "`P2679R2 <https://wg21.link/P2679R2>`__","LWG", "Fixing ``std::start_lifetime_as`` for arrays","February 2023","","",""
 "`P2674R1 <https://wg21.link/P2674R1>`__","LWG", "A trait for implicit lifetime types","February 2023","","",""
 "`P2655R3 <https://wg21.link/P2655R3>`__","LWG", "``common_reference_t`` of ``reference_wrapper`` Should Be a Reference Type","February 2023","","",""
-"`P2652R2 <https://wg21.link/P2652R2>`__","LWG", "Disallow User Specialization of ``allocator_traits``","February 2023","","",""
+"`P2652R2 <https://wg21.link/P2652R2>`__","LWG", "Disallow User Specialization of ``allocator_traits``","February 2023","|Complete|","19.0",""
 "`P2787R1 <https://wg21.link/P2787R1>`__","LWG", "``pmr::generator`` - Promise Types are not Values","February 2023","","",""
 "`P2614R2 <https://wg21.link/P2614R2>`__","LWG", "Deprecate ``numeric_limits::has_denorm``","February 2023","|Complete|","18.0",""
 "`P2588R3 <https://wg21.link/P2588R3>`__","LWG", "``barrier``’s phase completion guarantees","February 2023","","",""
diff --git a/libcxx/docs/Status/Cxx2cIssues.csv b/libcxx/docs/Status/Cxx2cIssues.csv
index b69b094832541..58e995809777c 100644
--- a/libcxx/docs/Status/Cxx2cIssues.csv
+++ b/libcxx/docs/Status/Cxx2cIssues.csv
@@ -2,7 +2,7 @@
 "`2994 <https://wg21.link/LWG2994>`__","Needless UB for ``basic_string`` and ``basic_string_view``","Varna June 2023","|Complete|","5.0",""
 "`3884 <https://wg21.link/LWG3884>`__","``flat_foo`` is missing allocator-extended copy/move constructors","Varna June 2023","","","|flat_containers|"
 "`3885 <https://wg21.link/LWG3885>`__","``op`` should be in [zombie.names]","Varna June 2023","|Nothing To Do|","",""
-"`3887 <https://wg21.link/LWG3887>`__","Version macro for ``allocate_at_least``","Varna June 2023","","",""
+"`3887 <https://wg21.link/LWG3887>`__","Version macro for ``allocate_at_least``","Varna June 2023","|Complete|","19.0",""
 "`3893 <https://wg21.link/LWG3893>`__","LWG 3661 broke ``atomic<shared_ptr<T>> a; a = nullptr;``","Varna June 2023","","",""
 "`3894 <https://wg21.link/LWG3894>`__","``generator::promise_type::yield_value(ranges::elements_of<Rng, Alloc>)`` should not be ``noexcept``","Varna June 2023","","",""
 "`3903 <https://wg21.link/LWG3903>`__","span destructor is redundantly noexcept","Varna June 2023","|Complete|","7.0",""
diff --git a/libcxx/include/__memory/allocate_at_least.h b/libcxx/include/__memory/allocate_at_least.h
index 05cbdee828839..b2e5dd3ff98a0 100644
--- a/libcxx/include/__memory/allocate_at_least.h
+++ b/libcxx/include/__memory/allocate_at_least.h
@@ -20,28 +20,14 @@
 _LIBCPP_BEGIN_NAMESPACE_STD
 
 #if _LIBCPP_STD_VER >= 23
-template <class _Pointer>
-struct allocation_result {
-  _Pointer ptr;
-  size_t count;
-};
-_LIBCPP_CTAD_SUPPORTED_FOR_TYPE(allocation_result);
-
-template <class _Alloc>
-[[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr allocation_result<typename allocator_traits<_Alloc>::pointer>
-allocate_at_least(_Alloc& __alloc, size_t __n) {
-  if constexpr (requires { __alloc.allocate_at_least(__n); }) {
-    return __alloc.allocate_at_least(__n);
-  } else {
-    return {__alloc.allocate(__n), __n};
-  }
-}
 
 template <class _Alloc>
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr auto __allocate_at_least(_Alloc& __alloc, size_t __n) {
-  return std::allocate_at_least(__alloc, __n);
+  return std::allocator_traits<_Alloc>::allocate_at_least(__alloc, __n);
 }
+
 #else
+
 template <class _Pointer>
 struct __allocation_result {
   _Pointer ptr;
diff --git a/libcxx/include/__memory/allocator_traits.h b/libcxx/include/__memory/allocator_traits.h
index c4482872ea810..3c7fc863b77bc 100644
--- a/libcxx/include/__memory/allocator_traits.h
+++ b/libcxx/include/__memory/allocator_traits.h
@@ -22,6 +22,7 @@
 #include <__type_traits/void_t.h>
 #include <__utility/declval.h>
 #include <__utility/forward.h>
+#include <cstddef>
 #include <limits>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -231,6 +232,17 @@ struct __has_select_on_container_copy_construction<
 
 _LIBCPP_SUPPRESS_DEPRECATED_POP
 
+#if _LIBCPP_STD_VER >= 23
+
+template <class _Pointer, class _SizeType = size_t>
+struct allocation_result {
+  _Pointer ptr;
+  _SizeType count;
+};
+_LIBCPP_CTAD_SUPPORTED_FOR_TYPE(allocation_result);
+
+#endif // _LIBCPP_STD_VER
+
 template <class _Alloc>
 struct _LIBCPP_TEMPLATE_VIS allocator_traits {
   using allocator_type     = _Alloc;
@@ -284,6 +296,18 @@ struct _LIBCPP_TEMPLATE_VIS allocator_traits {
     return __a.allocate(__n);
   }
 
+#if _LIBCPP_STD_VER >= 23
+  template <class _Ap = _Alloc>
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI static constexpr allocation_result<pointer, size_type>
+  allocate_at_least(_Ap& __alloc, size_type __n) {
+    if constexpr (requires { __alloc.allocate_at_least(__n); }) {
+      return __alloc.allocate_at_least(__n);
+    } else {
+      return {__alloc.allocate(__n), __n};
+    }
+  }
+#endif
+
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 static void
   deallocate(allocator_type& __a, pointer __p, size_type __n) _NOEXCEPT {
     __a.deallocate(__p, __n);
diff --git a/libcxx/include/memory b/libcxx/include/memory
index 19c11ee949872..0ada7cdfa2069 100644
--- a/libcxx/include/memory
+++ b/libcxx/include/memory
@@ -88,6 +88,9 @@ struct allocator_traits
     static pointer allocate(allocator_type& a, size_type n);                          // constexpr and [[nodiscard]] in C++20
     static pointer allocate(allocator_type& a, size_type n, const_void_pointer hint); // constexpr and [[nodiscard]] in C++20
 
+    [[nodiscard]] static constexpr allocation_result<pointer, size_type>
+      allocate_at_least(Alloc& a, size_type n);                                 // Since C++23
+
     static void deallocate(allocator_type& a, pointer p, size_type n) noexcept; // constexpr in C++20
 
     template <class T, class... Args>
@@ -100,15 +103,11 @@ struct allocator_traits
     static allocator_type select_on_container_copy_construction(const allocator_type& a); // constexpr in C++20
 };
 
-template<class Pointer>
+template<class Pointer, class SizeType = size_t>
 struct allocation_result {
     Pointer ptr;
-    size_t count;
-}; // since C++23
-
-template<class Allocator>
-[[nodiscard]] constexpr allocation_result<typename allocator_traits<Allocator>::pointer>
-    allocate_at_least(Allocator& a, size_t n); // since C++23
+    SizeType count;
+}; // Since C++23
 
 template <>
 class allocator<void> // removed in C++20
diff --git a/libcxx/include/version b/libcxx/include/version
index 9e26da8c1b242..e4dbb7bdd5fc2 100644
--- a/libcxx/include/version
+++ b/libcxx/include/version
@@ -16,7 +16,7 @@
 Macro name                                              Value   Headers
 __cpp_lib_adaptor_iterator_pair_constructor             202106L <queue> <stack>
 __cpp_lib_addressof_constexpr                           201603L <memory>
-__cpp_lib_allocate_at_least                             202106L <memory>
+__cpp_lib_allocate_at_least                             202302L <memory>
 __cpp_lib_allocator_traits_is_always_equal              201411L <deque> <forward_list> <list>
                                                                 <map> <memory> <scoped_allocator>
                                                                 <set> <string> <unordered_map>
@@ -433,7 +433,7 @@ __cpp_lib_within_lifetime                               202306L <type_traits>
 
 #if _LIBCPP_STD_VER >= 23
 # define __cpp_lib_adaptor_iterator_pair_constructor    202106L
-# define __cpp_lib_allocate_at_least                    202106L
+# define __cpp_lib_allocate_at_least                    202302L
 // # define __cpp_lib_associative_heterogeneous_erasure    202110L
 // # define __cpp_lib_bind_back                            202202L
 # define __cpp_lib_byteswap                             202110L
diff --git a/libcxx/modules/std/memory.inc b/libcxx/modules/std/memory.inc
index ef89845457fbb..56c621c0cf17f 100644
--- a/libcxx/modules/std/memory.inc
+++ b/libcxx/modules/std/memory.inc
@@ -43,8 +43,6 @@ export namespace std {
 
 #if _LIBCPP_STD_VER >= 23
   using std::allocation_result;
-
-  using std::allocate_at_least;
 #endif
 
   // [default.allocator], the default allocator
diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/memory.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/memory.version.compile.pass.cpp
index b1f6c76d84739..45d9271faa578 100644
--- a/libcxx/test/std/language.support/support.limits/support.limits.general/memory.version.compile.pass.cpp
+++ b/libcxx/test/std/language.support/support.limits/support.limits.general/memory.version.compile.pass.cpp
@@ -17,7 +17,7 @@
 
 /*  Constant                                      Value
     __cpp_lib_addressof_constexpr                 201603L [C++17]
-    __cpp_lib_allocate_at_least                   202106L [C++23]
+    __cpp_lib_allocate_at_least                   202302L [C++23]
     __cpp_lib_allocator_traits_is_always_equal    201411L [C++17]
     __cpp_lib_assume_aligned                      201811L [C++20]
     __cpp_lib_atomic_value_initialization         201911L [C++20]
@@ -432,8 +432,8 @@
 # ifndef __cpp_lib_allocate_at_least
 #   error "__cpp_lib_allocate_at_least should be defined in c++23"
 # endif
-# if __cpp_lib_allocate_at_least != 202106L
-#   error "__cpp_lib_allocate_at_least should have the value 202106L in c++23"
+# if __cpp_lib_allocate_at_least != 202302L
+#   error "__cpp_lib_allocate_at_least should have the value 202302L in c++23"
 # endif
 
 # ifndef __cpp_lib_allocator_traits_is_always_equal
@@ -569,8 +569,8 @@
 # ifndef __cpp_lib_allocate_at_least
 #   error "__cpp_lib_allocate_at_least should be defined in c++26"
 # endif
-# if __cpp_lib_allocate_at_least != 202106L
-#   error "__cpp_lib_allocate_at_least should have the value 202106L in c++26"
+# if __cpp_lib_allocate_at_least != 202302L
+#   error "__cpp_lib_allocate_at_least should have the value 202302L in c++26"
 # endif
 
 # ifndef __cpp_lib_allocator_traits_is_always_equal
diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp
index c319940fe6e49..29f0ba89330bb 100644
--- a/libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp
+++ b/libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp
@@ -18,7 +18,7 @@
 /*  Constant                                         Value
     __cpp_lib_adaptor_iterator_pair_constructor      202106L [C++23]
     __cpp_lib_addressof_constexpr                    201603L [C++17]
-    __cpp_lib_allocate_at_least                      202106L [C++23]
+    __cpp_lib_allocate_at_least                      202302L [C++23]
     __cpp_lib_allocator_traits_is_always_equal       201411L [C++17]
     __cpp_lib_any                                    201606L [C++17]
     __cpp_lib_apply                                  201603L [C++17]
@@ -4279,8 +4279,8 @@
 # ifndef __cpp_lib_allocate_at_least
 #   error "__cpp_lib_allocate_at_least should be defined in c++23"
 # endif
-# if __cpp_lib_allocate_at_least != 202106L
-#   error "__cpp_lib_allocate_at_least should have the value 202106L in c++23"
+# if __cpp_lib_allocate_at_least != 202302L
+#   error "__cpp_lib_allocate_at_least should have the value 202302L in c++23"
 # endif
 
 # ifndef __cpp_lib_allocator_traits_is_always_equal
@@ -5846,8 +5846,8 @@
 # ifndef __cpp_lib_allocate_at_least
 #   error "__cpp_lib_allocate_at_least should be defined in c++26"
 # endif
-# if __cpp_lib_allocate_at_least != 202106L
-#   error "__cpp_lib_allocate_at_least should have the value 202106L in c++26"
+# if __cpp_lib_allocate_at_least != 202302L
+#   error "__cpp_lib_allocate_at_least should have the value 202302L in c++26"
 # endif
 
 # ifndef __cpp_lib_allocator_traits_is_always_equal
diff --git a/libcxx/test/std/utilities/memory/allocator.traits/allocate_at_least.pass.cpp b/libcxx/test/std/utilities/memory/allocator.traits/allocator.traits.members/allocate_at_least.pass.cpp
similarity index 91%
rename from libcxx/test/std/utilities/memory/allocator.traits/allocate_at_least.pass.cpp
rename to libcxx/test/std/utilities/memory/allocator.traits/allocator.traits.members/allocate_at_least.pass.cpp
index ad9a2381cbfa1..88ae44c627584 100644
--- a/libcxx/test/std/utilities/memory/allocator.traits/allocate_at_least.pass.cpp
+++ b/libcxx/test/std/utilities/memory/allocator.traits/allocator.traits.members/allocate_at_least.pass.cpp
@@ -39,22 +39,22 @@ struct has_allocate_at_least {
 
   constexpr T* allocate(std::size_t) { return &t1; }
   constexpr void deallocate(T*, std::size_t) {}
-  constexpr std::allocation_result<T*> allocate_at_least(std::size_t) {
-    return {&t2, 2};
-  }
+  constexpr std::allocation_result<T*> allocate_at_least(std::size_t) { return {&t2, 2}; }
 };
 
 constexpr bool test() {
   { // check that std::allocate_at_least forwards to allocator::allocate if no allocate_at_least exists
     no_allocate_at_least<int> alloc;
-    std::same_as<std::allocation_result<int*>> decltype(auto) ret = std::allocate_at_least(alloc, 1);
+    std::same_as<std::allocation_result<int*>> decltype(auto) ret =
+        std::allocator_traits<decltype(alloc)>::allocate_at_least(alloc, 1);
     assert(ret.count == 1);
     assert(ret.ptr == &alloc.t);
   }
 
   { // check that std::allocate_at_least forwards to allocator::allocate_at_least if allocate_at_least exists
     has_allocate_at_least<int> alloc;
-    std::same_as<std::allocation_result<int*>> decltype(auto) ret = std::allocate_at_least(alloc, 1);
+    std::same_as<std::allocation_result<int*>> decltype(auto) ret =
+        std::allocator_traits<decltype(alloc)>::allocate_at_least(alloc, 1);
     assert(ret.count == 2);
     assert(ret.ptr == &alloc.t2);
   }
diff --git a/libcxx/utils/generate_feature_test_macro_components.py b/libcxx/utils/generate_feature_test_macro_components.py
index 065b70620cd17..6078e811c8e1e 100755
--- a/libcxx/utils/generate_feature_test_macro_components.py
+++ b/libcxx/utils/generate_feature_test_macro_components.py
@@ -88,9 +88,8 @@ def add_version_header(tc):
         {
             "name": "__cpp_lib_allocate_at_least",
             "values": {
-                "c++23": 202106,
                 # Note LWG3887 Version macro for allocate_at_least
-                # "c++26": 202302, # P2652R2 Disallow User Specialization of allocator_traits
+                "c++23": 202302,  # P2652R2 Disallow User Specialization of allocator_traits
             },
             "headers": ["memory"],
         },
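
Reader's note (not part of the patch): the allocator_traits::allocate_at_least
hunk above prefers the allocator's own allocate_at_least member and otherwise
falls back to allocate(n), reporting n as the count. The sketch below shows that
dispatch in C++17 terms with the detection idiom instead of a requires
expression. Every name ending in _sketch is hypothetical and exists only for
illustration; none of them is libc++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::allocation_result<Pointer, SizeType>.
template <class Pointer, class SizeType = std::size_t>
struct allocation_result_sketch {
  Pointer ptr;
  SizeType count;
};

// Detects whether Alloc has a member allocate_at_least(size_t).
template <class Alloc, class = void>
struct has_allocate_at_least_sketch : std::false_type {};

template <class Alloc>
struct has_allocate_at_least_sketch<
    Alloc, std::void_t<decltype(std::declval<Alloc&>().allocate_at_least(std::size_t{}))>>
    : std::true_type {};

// Forwards to the member when present, otherwise to allocate(n).
template <class Alloc>
allocation_result_sketch<typename std::allocator_traits<Alloc>::pointer,
                         typename std::allocator_traits<Alloc>::size_type>
allocate_at_least_sketch(Alloc& alloc, typename std::allocator_traits<Alloc>::size_type n) {
  if constexpr (has_allocate_at_least_sketch<Alloc>::value) {
    auto result = alloc.allocate_at_least(n);
    return {result.ptr, result.count};
  } else {
    return {std::allocator_traits<Alloc>::allocate(alloc, n), n};
  }
}

// Toy allocator without allocate_at_least: hands out one fixed slot.
template <class T>
struct plain_alloc_sketch {
  using value_type = T;
  T slot{};
  T* allocate(std::size_t) { return &slot; }
  void deallocate(T*, std::size_t) {}
};

// Toy allocator with allocate_at_least: always reports room for two objects.
template <class T>
struct rounding_alloc_sketch {
  using value_type = T;
  T slots[2]{};
  T* allocate(std::size_t) { return slots; }
  void deallocate(T*, std::size_t) {}
  allocation_result_sketch<T*> allocate_at_least(std::size_t) { return {slots, 2}; }
};
```

The two toy allocators play the same roles as no_allocate_at_least and
has_allocate_at_least in the renamed test: the first exercises the fallback
branch, the second the forwarding branch.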

>From 88b15aa60832d2582dfa90946c58792a0356ea24 Mon Sep 17 00:00:00 2001
From: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: Thu, 1 Feb 2024 11:42:07 +0000
Subject: [PATCH 06/42] [ARM] Add ctpop codegen tests

---
 llvm/test/CodeGen/ARM/popcnt.ll | 118 ++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
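
Reader's note (not part of the patch): the constants in the ctpop32 and ctpop64
CHECK lines below (0x55555555, 0x33333333, 0x0f0f0f0f, 0x01010101, then a shift
by 24) match the well-known parallel bit-count expansion. A minimal C++ sketch
of that expansion follows. The function names are hypothetical, and the code
illustrates the arithmetic only; it is not taken from the ARM backend.

```cpp
#include <cassert>
#include <cstdint>

// Parallel (SWAR) population count of a 32-bit value.
constexpr std::uint32_t popcount32_sketch(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // sums of adjacent bit pairs
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // sums per 4-bit group
  x = (x + (x >> 4)) & 0x0f0f0f0fu;                 // sums per byte
  return (x * 0x01010101u) >> 24;                   // add the four bytes together
}

// A 64-bit input is counted as two independent 32-bit halves, which is how
// the ctpop64 sequence below treats r0 and r1 before the final add.
constexpr std::uint32_t popcount64_sketch(std::uint64_t x) {
  return popcount32_sketch(static_cast<std::uint32_t>(x)) +
         popcount32_sketch(static_cast<std::uint32_t>(x >> 32));
}

// Small runtime self-check of the two helpers.
inline void popcount_sketch_selfcheck() {
  assert(popcount32_sketch(0u) == 0u);
  assert(popcount32_sketch(0xffffffffu) == 32u);
  assert(popcount64_sketch(0xffffffffffffffffull) == 64u);
}
```

The i8 and i16 variants in the test use the same steps with narrower masks
(85, 51 and 15 for i8) and replace the final multiply with shifted adds.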

diff --git a/llvm/test/CodeGen/ARM/popcnt.ll b/llvm/test/CodeGen/ARM/popcnt.ll
index 0a96daaeb5710..edcae5e141e73 100644
--- a/llvm/test/CodeGen/ARM/popcnt.ll
+++ b/llvm/test/CodeGen/ARM/popcnt.ll
@@ -281,6 +281,121 @@ define <4 x i32> @vclsQs32(ptr %A) nounwind {
 	ret <4 x i32> %tmp2
 }
 
+define i32 @ctpop8(i8 %x) nounwind readnone {
+; CHECK-LABEL: ctpop8:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    mov r1, #85
+; CHECK-NEXT:    and r1, r1, r0, lsr #1
+; CHECK-NEXT:    sub r0, r0, r1
+; CHECK-NEXT:    mov r1, #51
+; CHECK-NEXT:    and r1, r1, r0, lsr #2
+; CHECK-NEXT:    and r0, r0, #51
+; CHECK-NEXT:    add r0, r0, r1
+; CHECK-NEXT:    add r0, r0, r0, lsr #4
+; CHECK-NEXT:    and r0, r0, #15
+; CHECK-NEXT:    mov pc, lr
+  %count = tail call i8 @llvm.ctpop.i8(i8 %x)
+  %conv = zext i8 %count to i32
+  ret i32 %conv
+}
+
+define i32 @ctpop16(i16 %x) nounwind readnone {
+; CHECK-LABEL: ctpop16:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    mov r1, #85
+; CHECK-NEXT:    orr r1, r1, #21760
+; CHECK-NEXT:    and r1, r1, r0, lsr #1
+; CHECK-NEXT:    sub r0, r0, r1
+; CHECK-NEXT:    mov r1, #51
+; CHECK-NEXT:    orr r1, r1, #13056
+; CHECK-NEXT:    and r2, r0, r1
+; CHECK-NEXT:    and r0, r1, r0, lsr #2
+; CHECK-NEXT:    add r0, r2, r0
+; CHECK-NEXT:    add r0, r0, r0, lsr #4
+; CHECK-NEXT:    and r1, r0, #3840
+; CHECK-NEXT:    and r0, r0, #15
+; CHECK-NEXT:    add r0, r0, r1, lsr #8
+; CHECK-NEXT:    mov pc, lr
+  %count = tail call i16 @llvm.ctpop.i16(i16 %x)
+  %conv = zext i16 %count to i32
+  ret i32 %conv
+}
+
+define i32 @ctpop32(i32 %x) nounwind readnone {
+; CHECK-LABEL: ctpop32:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    ldr r1, .LCPI22_0
+; CHECK-NEXT:    ldr r2, .LCPI22_3
+; CHECK-NEXT:    and r1, r1, r0, lsr #1
+; CHECK-NEXT:    ldr r12, .LCPI22_1
+; CHECK-NEXT:    sub r0, r0, r1
+; CHECK-NEXT:    ldr r3, .LCPI22_2
+; CHECK-NEXT:    and r1, r0, r2
+; CHECK-NEXT:    and r0, r2, r0, lsr #2
+; CHECK-NEXT:    add r0, r1, r0
+; CHECK-NEXT:    add r0, r0, r0, lsr #4
+; CHECK-NEXT:    and r0, r0, r12
+; CHECK-NEXT:    mul r1, r0, r3
+; CHECK-NEXT:    lsr r0, r1, #24
+; CHECK-NEXT:    mov pc, lr
+; CHECK-NEXT:    .p2align 2
+; CHECK-NEXT:  @ %bb.1:
+; CHECK-NEXT:  .LCPI22_0:
+; CHECK-NEXT:    .long 1431655765 @ 0x55555555
+; CHECK-NEXT:  .LCPI22_1:
+; CHECK-NEXT:    .long 252645135 @ 0xf0f0f0f
+; CHECK-NEXT:  .LCPI22_2:
+; CHECK-NEXT:    .long 16843009 @ 0x1010101
+; CHECK-NEXT:  .LCPI22_3:
+; CHECK-NEXT:    .long 858993459 @ 0x33333333
+  %count = tail call i32 @llvm.ctpop.i32(i32 %x)
+  ret i32 %count
+}
+
+define i32 @ctpop64(i64 %x) nounwind readnone {
+; CHECK-LABEL: ctpop64:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    .save {r4, lr}
+; CHECK-NEXT:    push {r4, lr}
+; CHECK-NEXT:    ldr r2, .LCPI23_0
+; CHECK-NEXT:    ldr r3, .LCPI23_3
+; CHECK-NEXT:    and r4, r2, r0, lsr #1
+; CHECK-NEXT:    and r2, r2, r1, lsr #1
+; CHECK-NEXT:    sub r0, r0, r4
+; CHECK-NEXT:    sub r1, r1, r2
+; CHECK-NEXT:    and r4, r0, r3
+; CHECK-NEXT:    and r2, r1, r3
+; CHECK-NEXT:    and r0, r3, r0, lsr #2
+; CHECK-NEXT:    and r1, r3, r1, lsr #2
+; CHECK-NEXT:    add r0, r4, r0
+; CHECK-NEXT:    ldr lr, .LCPI23_1
+; CHECK-NEXT:    add r1, r2, r1
+; CHECK-NEXT:    ldr r12, .LCPI23_2
+; CHECK-NEXT:    add r0, r0, r0, lsr #4
+; CHECK-NEXT:    and r0, r0, lr
+; CHECK-NEXT:    add r1, r1, r1, lsr #4
+; CHECK-NEXT:    mul r2, r0, r12
+; CHECK-NEXT:    and r0, r1, lr
+; CHECK-NEXT:    mul r1, r0, r12
+; CHECK-NEXT:    lsr r0, r2, #24
+; CHECK-NEXT:    add r0, r0, r1, lsr #24
+; CHECK-NEXT:    pop {r4, lr}
+; CHECK-NEXT:    mov pc, lr
+; CHECK-NEXT:    .p2align 2
+; CHECK-NEXT:  @ %bb.1:
+; CHECK-NEXT:  .LCPI23_0:
+; CHECK-NEXT:    .long 1431655765 @ 0x55555555
+; CHECK-NEXT:  .LCPI23_1:
+; CHECK-NEXT:    .long 252645135 @ 0xf0f0f0f
+; CHECK-NEXT:  .LCPI23_2:
+; CHECK-NEXT:    .long 16843009 @ 0x1010101
+; CHECK-NEXT:  .LCPI23_3:
+; CHECK-NEXT:    .long 858993459 @ 0x33333333
+  %count = tail call i64 @llvm.ctpop.i64(i64 %x)
+  %conv = trunc i64 %count to i32
+  ret i32 %conv
+}
+
 define i32 @ctpop_eq_one(i64 %x) nounwind readnone {
 ; CHECK-LABEL: ctpop_eq_one:
 ; CHECK:       @ %bb.0:
@@ -299,6 +414,9 @@ define i32 @ctpop_eq_one(i64 %x) nounwind readnone {
   ret i32 %conv
 }
 
+declare i8 @llvm.ctpop.i8(i8) nounwind readnone
+declare i16 @llvm.ctpop.i16(i16) nounwind readnone
+declare i32 @llvm.ctpop.i32(i32) nounwind readnone
 declare i64 @llvm.ctpop.i64(i64) nounwind readnone
 
 declare <8 x i8>  @llvm.arm.neon.vcls.v8i8(<8 x i8>) nounwind readnone

>From 06c33490881aea1a0ea7621970fb88aeb54842e0 Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Thu, 1 Feb 2024 12:57:59 +0100
Subject: [PATCH 07/42] [IndVars] Add tests for #79861 (NFC)

---
 .../test/Transforms/IndVarSimplify/pr79861.ll | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 llvm/test/Transforms/IndVarSimplify/pr79861.ll
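
Reader's note (not part of the patch): in @or_disjoint below, %iv takes the
values 2 and then 1. Per the LLVM LangRef, `or disjoint` yields poison when
both operands have a one in the same bit position, so `or disjoint i64 %iv, 1`
is only interchangeable with `add i64 %iv, 1` while bit 0 of %iv is clear. The
C++ sketch below spells out that arithmetic. The helper names are hypothetical,
and reading the test as a guard against reusing the flagged instruction is an
assumption, not something the commit message states.

```cpp
#include <cstdint>

// True when no bit position is set in both operands, i.e. when
// `or disjoint a, b` would be well defined rather than poison.
constexpr bool bits_disjoint_sketch(std::uint64_t a, std::uint64_t b) {
  return (a & b) == 0;
}

// True when a plain `or` and a wrapping `add` give the same result.
// Since a + b == (a | b) + (a & b), this holds exactly when the
// operands are disjoint.
constexpr bool or_matches_add_sketch(std::uint64_t a, std::uint64_t b) {
  return (a | b) == (a + b);
}

// First iteration, iv = 2: operands are disjoint and or/add agree.
static_assert(bits_disjoint_sketch(2, 1) && or_matches_add_sketch(2, 1), "iv = 2");
// Second iteration, iv = 1: bit 0 is shared, so or/add disagree.
static_assert(!bits_disjoint_sketch(1, 1) && !or_matches_add_sketch(1, 1), "iv = 1");
```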

diff --git a/llvm/test/Transforms/IndVarSimplify/pr79861.ll b/llvm/test/Transforms/IndVarSimplify/pr79861.ll
new file mode 100644
index 0000000000000..a8e2aa42a365c
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr79861.ll
@@ -0,0 +1,104 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+target datalayout = "n64"
+
+declare void @use(i64)
+
+define void @or_disjoint() {
+; CHECK-LABEL: define void @or_disjoint() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 2, [[ENTRY:%.*]] ], [ [[IV_DEC:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[OR:%.*]] = or disjoint i64 [[IV]], 1
+; CHECK-NEXT:    call void @use(i64 [[OR]])
+; CHECK-NEXT:    [[IV_DEC]] = add nsw i64 [[IV]], -1
+; CHECK-NEXT:    [[EXIT_COND:%.*]] = icmp eq i64 [[IV_DEC]], 0
+; CHECK-NEXT:    br i1 [[EXIT_COND]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 2, %entry ], [ %iv.dec, %loop ]
+  %or = or disjoint i64 %iv, 1
+  %add = add nsw i64 %iv, 1
+  %sel = select i1 false, i64 %or, i64 %add
+  call void @use(i64 %sel)
+
+  %iv.dec = add nsw i64 %iv, -1
+  %exit.cond = icmp eq i64 %iv.dec, 0
+  br i1 %exit.cond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+define void @add_nowrap_flags(i64 %n) {
+; CHECK-LABEL: define void @add_nowrap_flags(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[ADD1:%.*]] = add nuw nsw i64 [[IV]], 123
+; CHECK-NEXT:    call void @use(i64 [[ADD1]])
+; CHECK-NEXT:    [[IV_INC]] = add i64 [[IV]], 1
+; CHECK-NEXT:    [[EXIT_COND:%.*]] = icmp eq i64 [[IV_INC]], [[N]]
+; CHECK-NEXT:    br i1 [[EXIT_COND]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.inc, %loop ]
+  %add1 = add nuw nsw i64 %iv, 123
+  %add2 = add i64 %iv, 123
+  %sel = select i1 false, i64 %add1, i64 %add2
+  call void @use(i64 %sel)
+
+  %iv.inc = add i64 %iv, 1
+  %exit.cond = icmp eq i64 %iv.inc, %n
+  br i1 %exit.cond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+
+define void @expander_or_disjoint(i64 %n) {
+; CHECK-LABEL: define void @expander_or_disjoint(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[OR:%.*]] = or i64 [[N]], 1
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_INC:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_INC]] = add i64 [[IV]], 1
+; CHECK-NEXT:    [[ADD:%.*]] = add i64 [[IV]], [[OR]]
+; CHECK-NEXT:    call void @use(i64 [[ADD]])
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[IV_INC]], [[OR]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %or = or disjoint i64 %n, 1
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.inc, %loop ]
+  %iv.inc = add i64 %iv, 1
+  %add = add i64 %iv, %or
+  call void @use(i64 %add)
+  %cmp = icmp ult i64 %iv, %n
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret void
+}

>From 05371d893936c1ab544ff0e4860d226a02849ae6 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng <wangpengcheng.pp at bytedance.com>
Date: Thu, 1 Feb 2024 20:50:20 +0800
Subject: [PATCH 08/42] [RISCV][NFC] Simplify calls.ll and autogenerate checks
 for tail-calls.ll

Split out from #78417.

Reviewers: topperc, asb, kito-cheng

Reviewed By: asb

Pull Request: https://github.com/llvm/llvm-project/pull/79248
---
 llvm/test/CodeGen/RISCV/calls.ll      | 357 +++++++++-----------------
 llvm/test/CodeGen/RISCV/tail-calls.ll | 228 ++++++++++++----
 2 files changed, 294 insertions(+), 291 deletions(-)

diff --git a/llvm/test/CodeGen/RISCV/calls.ll b/llvm/test/CodeGen/RISCV/calls.ll
index 365f255dd8244..58b10cf53971f 100644
--- a/llvm/test/CodeGen/RISCV/calls.ll
+++ b/llvm/test/CodeGen/RISCV/calls.ll
@@ -1,29 +1,20 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefix=RV32I %s
+; RUN:   | FileCheck -check-prefixes=CHECK,RV32I %s
 ; RUN: llc -relocation-model=pic -mtriple=riscv32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefix=RV32I-PIC %s
+; RUN:   | FileCheck -check-prefixes=CHECK,RV32I-PIC %s
 
 declare i32 @external_function(i32)
 
 define i32 @test_call_external(i32 %a) nounwind {
-; RV32I-LABEL: test_call_external:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call external_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_external:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call external_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_external:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call external_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @external_function(i32 %a)
   ret i32 %1
 }
@@ -31,85 +22,51 @@ define i32 @test_call_external(i32 %a) nounwind {
 declare dso_local i32 @dso_local_function(i32)
 
 define i32 @test_call_dso_local(i32 %a) nounwind {
-; RV32I-LABEL: test_call_dso_local:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call dso_local_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_dso_local:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call dso_local_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_dso_local:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call dso_local_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @dso_local_function(i32 %a)
   ret i32 %1
 }
 
 define i32 @defined_function(i32 %a) nounwind {
-; RV32I-LABEL: defined_function:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi a0, a0, 1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: defined_function:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi a0, a0, 1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: defined_function:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a0, a0, 1
+; CHECK-NEXT:    ret
   %1 = add i32 %a, 1
   ret i32 %1
 }
 
 define i32 @test_call_defined(i32 %a) nounwind {
-; RV32I-LABEL: test_call_defined:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    call defined_function
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_defined:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    call defined_function
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_defined:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call defined_function
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @defined_function(i32 %a)
   ret i32 %1
 }
 
 define i32 @test_call_indirect(ptr %a, i32 %b) nounwind {
-; RV32I-LABEL: test_call_indirect:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a0, a1
-; RV32I-NEXT:    jalr a2
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_indirect:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a0, a1
-; RV32I-PIC-NEXT:    jalr a2
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_indirect:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    jalr a2
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 %a(i32 %b)
   ret i32 %1
 }
@@ -117,39 +74,22 @@ define i32 @test_call_indirect(ptr %a, i32 %b) nounwind {
 ; Make sure we don't use t0 as the source for jalr as that is a hint to pop the
 ; return address stack on some microarchitectures.
 define i32 @test_call_indirect_no_t0(ptr %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h) nounwind {
-; RV32I-LABEL: test_call_indirect_no_t0:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv t1, a0
-; RV32I-NEXT:    mv a0, a1
-; RV32I-NEXT:    mv a1, a2
-; RV32I-NEXT:    mv a2, a3
-; RV32I-NEXT:    mv a3, a4
-; RV32I-NEXT:    mv a4, a5
-; RV32I-NEXT:    mv a5, a6
-; RV32I-NEXT:    mv a6, a7
-; RV32I-NEXT:    jalr t1
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_indirect_no_t0:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv t1, a0
-; RV32I-PIC-NEXT:    mv a0, a1
-; RV32I-PIC-NEXT:    mv a1, a2
-; RV32I-PIC-NEXT:    mv a2, a3
-; RV32I-PIC-NEXT:    mv a3, a4
-; RV32I-PIC-NEXT:    mv a4, a5
-; RV32I-PIC-NEXT:    mv a5, a6
-; RV32I-PIC-NEXT:    mv a6, a7
-; RV32I-PIC-NEXT:    jalr t1
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_indirect_no_t0:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv t1, a0
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    mv a1, a2
+; CHECK-NEXT:    mv a2, a3
+; CHECK-NEXT:    mv a3, a4
+; CHECK-NEXT:    mv a4, a5
+; CHECK-NEXT:    mv a5, a6
+; CHECK-NEXT:    mv a6, a7
+; CHECK-NEXT:    jalr t1
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 %a(i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h)
   ret i32 %1
 }
@@ -158,45 +98,27 @@ define i32 @test_call_indirect_no_t0(ptr %a, i32 %b, i32 %c, i32 %d, i32 %e, i32
 ; introduced when compiling with optimisation.
 
 define fastcc i32 @fastcc_function(i32 %a, i32 %b) nounwind {
-; RV32I-LABEL: fastcc_function:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    add a0, a0, a1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: fastcc_function:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    add a0, a0, a1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: fastcc_function:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    add a0, a0, a1
+; CHECK-NEXT:    ret
  %1 = add i32 %a, %b
  ret i32 %1
 }
 
 define i32 @test_call_fastcc(i32 %a, i32 %b) nounwind {
-; RV32I-LABEL: test_call_fastcc:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv s0, a0
-; RV32I-NEXT:    call fastcc_function
-; RV32I-NEXT:    mv a0, s0
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_fastcc:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv s0, a0
-; RV32I-PIC-NEXT:    call fastcc_function
-; RV32I-PIC-NEXT:    mv a0, s0
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_fastcc:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv s0, a0
+; CHECK-NEXT:    call fastcc_function
+; CHECK-NEXT:    mv a0, s0
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call fastcc i32 @fastcc_function(i32 %a, i32 %b)
   ret i32 %a
 }
@@ -204,107 +126,64 @@ define i32 @test_call_fastcc(i32 %a, i32 %b) nounwind {
 declare i32 @external_many_args(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) nounwind
 
 define i32 @test_call_external_many_args(i32 %a) nounwind {
-; RV32I-LABEL: test_call_external_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    mv s0, a0
-; RV32I-NEXT:    sw a0, 4(sp)
-; RV32I-NEXT:    sw a0, 0(sp)
-; RV32I-NEXT:    mv a1, a0
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    mv a4, a0
-; RV32I-NEXT:    mv a5, a0
-; RV32I-NEXT:    mv a6, a0
-; RV32I-NEXT:    mv a7, a0
-; RV32I-NEXT:    call external_many_args
-; RV32I-NEXT:    mv a0, s0
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_external_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    mv s0, a0
-; RV32I-PIC-NEXT:    sw a0, 4(sp)
-; RV32I-PIC-NEXT:    sw a0, 0(sp)
-; RV32I-PIC-NEXT:    mv a1, a0
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a3, a0
-; RV32I-PIC-NEXT:    mv a4, a0
-; RV32I-PIC-NEXT:    mv a5, a0
-; RV32I-PIC-NEXT:    mv a6, a0
-; RV32I-PIC-NEXT:    mv a7, a0
-; RV32I-PIC-NEXT:    call external_many_args
-; RV32I-PIC-NEXT:    mv a0, s0
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_external_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    mv s0, a0
+; CHECK-NEXT:    sw a0, 4(sp)
+; CHECK-NEXT:    sw a0, 0(sp)
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a4, a0
+; CHECK-NEXT:    mv a5, a0
+; CHECK-NEXT:    mv a6, a0
+; CHECK-NEXT:    mv a7, a0
+; CHECK-NEXT:    call external_many_args
+; CHECK-NEXT:    mv a0, s0
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @external_many_args(i32 %a, i32 %a, i32 %a, i32 %a, i32 %a,
                                     i32 %a, i32 %a, i32 %a, i32 %a, i32 %a)
   ret i32 %a
 }
 
 define i32 @defined_many_args(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 %j) nounwind {
-; RV32I-LABEL: defined_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    lw a0, 4(sp)
-; RV32I-NEXT:    addi a0, a0, 1
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: defined_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    lw a0, 4(sp)
-; RV32I-PIC-NEXT:    addi a0, a0, 1
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: defined_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lw a0, 4(sp)
+; CHECK-NEXT:    addi a0, a0, 1
+; CHECK-NEXT:    ret
   %added = add i32 %j, 1
   ret i32 %added
 }
 
 define i32 @test_call_defined_many_args(i32 %a) nounwind {
-; RV32I-LABEL: test_call_defined_many_args:
-; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -16
-; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw a0, 4(sp)
-; RV32I-NEXT:    sw a0, 0(sp)
-; RV32I-NEXT:    mv a1, a0
-; RV32I-NEXT:    mv a2, a0
-; RV32I-NEXT:    mv a3, a0
-; RV32I-NEXT:    mv a4, a0
-; RV32I-NEXT:    mv a5, a0
-; RV32I-NEXT:    mv a6, a0
-; RV32I-NEXT:    mv a7, a0
-; RV32I-NEXT:    call defined_many_args
-; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 16
-; RV32I-NEXT:    ret
-;
-; RV32I-PIC-LABEL: test_call_defined_many_args:
-; RV32I-PIC:       # %bb.0:
-; RV32I-PIC-NEXT:    addi sp, sp, -16
-; RV32I-PIC-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-PIC-NEXT:    sw a0, 4(sp)
-; RV32I-PIC-NEXT:    sw a0, 0(sp)
-; RV32I-PIC-NEXT:    mv a1, a0
-; RV32I-PIC-NEXT:    mv a2, a0
-; RV32I-PIC-NEXT:    mv a3, a0
-; RV32I-PIC-NEXT:    mv a4, a0
-; RV32I-PIC-NEXT:    mv a5, a0
-; RV32I-PIC-NEXT:    mv a6, a0
-; RV32I-PIC-NEXT:    mv a7, a0
-; RV32I-PIC-NEXT:    call defined_many_args
-; RV32I-PIC-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-PIC-NEXT:    addi sp, sp, 16
-; RV32I-PIC-NEXT:    ret
+; CHECK-LABEL: test_call_defined_many_args:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a0, 4(sp)
+; CHECK-NEXT:    sw a0, 0(sp)
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    mv a2, a0
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a4, a0
+; CHECK-NEXT:    mv a5, a0
+; CHECK-NEXT:    mv a6, a0
+; CHECK-NEXT:    mv a7, a0
+; CHECK-NEXT:    call defined_many_args
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
   %1 = call i32 @defined_many_args(i32 %a, i32 %a, i32 %a, i32 %a, i32 %a,
                                    i32 %a, i32 %a, i32 %a, i32 %a, i32 %a)
   ret i32 %1
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32I: {{.*}}
+; RV32I-PIC: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/tail-calls.ll b/llvm/test/CodeGen/RISCV/tail-calls.ll
index e3079424230bc..87d69bfad38c2 100644
--- a/llvm/test/CodeGen/RISCV/tail-calls.ll
+++ b/llvm/test/CodeGen/RISCV/tail-calls.ll
@@ -1,11 +1,13 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple riscv32-unknown-linux-gnu -o - %s | FileCheck %s
 ; RUN: llc -mtriple riscv32-unknown-elf       -o - %s | FileCheck %s
 
 ; Perform tail call optimization for global address.
 declare i32 @callee_tail(i32 %i)
 define i32 @caller_tail(i32 %i) nounwind {
-; CHECK-LABEL: caller_tail
-; CHECK: tail callee_tail
+; CHECK-LABEL: caller_tail:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    tail callee_tail
 entry:
   %r = tail call i32 @callee_tail(i32 %i)
   ret i32 %r
@@ -15,10 +17,16 @@ entry:
 @dest = global [2 x i8] zeroinitializer
 declare void @llvm.memcpy.p0.p0.i32(ptr, ptr, i32, i1)
 define void @caller_extern(ptr %src) optsize {
+; CHECK-LABEL: caller_extern:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lui a1, %hi(dest)
+; CHECK-NEXT:    addi a1, a1, %lo(dest)
+; CHECK-NEXT:    li a2, 7
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    mv a1, a3
+; CHECK-NEXT:    tail memcpy
 entry:
-; CHECK: caller_extern
-; CHECK-NOT: call memcpy
-; CHECK: tail memcpy
   tail call void @llvm.memcpy.p0.p0.i32(ptr @dest, ptr %src, i32 7, i1 false)
   ret void
 }
@@ -26,10 +34,16 @@ entry:
 ; Perform tail call optimization for external symbol.
 @dest_pgso = global [2 x i8] zeroinitializer
 define void @caller_extern_pgso(ptr %src) !prof !14 {
+; CHECK-LABEL: caller_extern_pgso:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lui a1, %hi(dest_pgso)
+; CHECK-NEXT:    addi a1, a1, %lo(dest_pgso)
+; CHECK-NEXT:    li a2, 7
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    mv a1, a3
+; CHECK-NEXT:    tail memcpy
 entry:
-; CHECK: caller_extern_pgso
-; CHECK-NOT: call memcpy
-; CHECK: tail memcpy
   tail call void @llvm.memcpy.p0.p0.i32(ptr @dest_pgso, ptr %src, i32 7, i1 false)
   ret void
 }
@@ -38,19 +52,19 @@ entry:
 declare void @callee_indirect1()
 declare void @callee_indirect2()
 define void @caller_indirect_tail(i32 %a) nounwind {
-; CHECK-LABEL: caller_indirect_tail
-; CHECK-NOT: call callee_indirect1
-; CHECK-NOT: call callee_indirect2
-; CHECK-NOT: tail callee_indirect1
-; CHECK-NOT: tail callee_indirect2
-
-; CHECK: lui a0, %hi(callee_indirect2)
-; CHECK-NEXT: addi t1, a0, %lo(callee_indirect2)
-; CHECK-NEXT: jr t1
-
-; CHECK: lui a0, %hi(callee_indirect1)
-; CHECK-NEXT: addi t1, a0, %lo(callee_indirect1)
-; CHECK-NEXT: jr t1
+; CHECK-LABEL: caller_indirect_tail:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    beqz a0, .LBB3_2
+; CHECK-NEXT:  # %bb.1: # %entry
+; CHECK-NEXT:    lui a0, %hi(callee_indirect2)
+; CHECK-NEXT:    addi t1, a0, %lo(callee_indirect2)
+; CHECK-NEXT:    jr t1
+; CHECK-NEXT:  .LBB3_2:
+; CHECK-NEXT:    lui a0, %hi(callee_indirect1)
+; CHECK-NEXT:    addi t1, a0, %lo(callee_indirect1)
+; CHECK-NEXT:    jr t1
+
+
 entry:
   %tobool = icmp eq i32 %a, 0
   %callee = select i1 %tobool, ptr @callee_indirect1, ptr @callee_indirect2
@@ -79,9 +93,21 @@ define i32 @caller_indirect_no_t0(ptr %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5
 ; Do not tail call optimize functions with varargs passed by stack.
 declare i32 @callee_varargs(i32, ...)
 define void @caller_varargs(i32 %a, i32 %b) nounwind {
-; CHECK-LABEL: caller_varargs
-; CHECK-NOT: tail callee_varargs
-; CHECK: call callee_varargs
+; CHECK-LABEL: caller_varargs:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a0, 0(sp)
+; CHECK-NEXT:    mv a2, a1
+; CHECK-NEXT:    mv a3, a0
+; CHECK-NEXT:    mv a4, a0
+; CHECK-NEXT:    mv a5, a1
+; CHECK-NEXT:    mv a6, a1
+; CHECK-NEXT:    mv a7, a0
+; CHECK-NEXT:    call callee_varargs
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
 entry:
   %call = tail call i32 (i32, ...) @callee_varargs(i32 %a, i32 %b, i32 %b, i32 %a, i32 %a, i32 %b, i32 %b, i32 %a, i32 %a)
   ret void
@@ -90,9 +116,26 @@ entry:
 ; Do not tail call optimize if stack is used to pass parameters.
 declare i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)
 define i32 @caller_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n) nounwind {
-; CHECK-LABEL: caller_args
-; CHECK-NOT: tail callee_args
-; CHECK: call callee_args
+; CHECK-LABEL: caller_args:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -32
+; CHECK-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    lw t0, 32(sp)
+; CHECK-NEXT:    lw t1, 36(sp)
+; CHECK-NEXT:    lw t2, 40(sp)
+; CHECK-NEXT:    lw t3, 44(sp)
+; CHECK-NEXT:    lw t4, 48(sp)
+; CHECK-NEXT:    lw t5, 52(sp)
+; CHECK-NEXT:    sw t5, 20(sp)
+; CHECK-NEXT:    sw t4, 16(sp)
+; CHECK-NEXT:    sw t3, 12(sp)
+; CHECK-NEXT:    sw t2, 8(sp)
+; CHECK-NEXT:    sw t1, 4(sp)
+; CHECK-NEXT:    sw t0, 0(sp)
+; CHECK-NEXT:    call callee_args
+; CHECK-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 32
+; CHECK-NEXT:    ret
 entry:
   %r = tail call i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)
   ret i32 %r
@@ -101,9 +144,20 @@ entry:
 ; Do not tail call optimize if parameters need to be passed indirectly.
 declare i32 @callee_indirect_args(fp128 %a)
 define void @caller_indirect_args() nounwind {
-; CHECK-LABEL: caller_indirect_args
-; CHECK-NOT: tail callee_indirect_args
-; CHECK: call callee_indirect_args
+; CHECK-LABEL: caller_indirect_args:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -32
+; CHECK-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    lui a0, 262128
+; CHECK-NEXT:    sw a0, 12(sp)
+; CHECK-NEXT:    sw zero, 8(sp)
+; CHECK-NEXT:    sw zero, 4(sp)
+; CHECK-NEXT:    mv a0, sp
+; CHECK-NEXT:    sw zero, 0(sp)
+; CHECK-NEXT:    call callee_indirect_args
+; CHECK-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 32
+; CHECK-NEXT:    ret
 entry:
   %call = tail call i32 @callee_indirect_args(fp128 0xL00000000000000003FFF000000000000)
   ret void
@@ -112,8 +166,9 @@ entry:
 ; Perform tail call optimization for external weak symbol.
 declare extern_weak void @callee_weak()
 define void @caller_weak() nounwind {
-; CHECK-LABEL: caller_weak
-; CHECK: tail callee_weak
+; CHECK-LABEL: caller_weak:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    tail callee_weak
 entry:
   tail call void @callee_weak()
   ret void
@@ -123,24 +178,66 @@ entry:
 ; return to the hardware. Tail-calling another function would probably break
 ; this.
 declare void @callee_irq()
-define void @caller_irq() #0 {
-; CHECK-LABEL: caller_irq
-; CHECK-NOT: tail callee_irq
-; CHECK: call callee_irq
+define void @caller_irq() nounwind "interrupt"="machine" {
+; CHECK-LABEL: caller_irq:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -64
+; CHECK-NEXT:    sw ra, 60(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t0, 56(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t1, 52(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t2, 48(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a0, 44(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a1, 40(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a2, 36(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a3, 32(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a4, 28(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a5, 24(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a6, 20(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw a7, 16(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t3, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t4, 8(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t5, 4(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    sw t6, 0(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call callee_irq
+; CHECK-NEXT:    lw ra, 60(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t0, 56(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t1, 52(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t2, 48(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a0, 44(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a1, 40(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a2, 36(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a3, 32(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a4, 28(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a5, 24(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a6, 20(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw a7, 16(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t3, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t4, 8(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t5, 4(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    lw t6, 0(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 64
+; CHECK-NEXT:    mret
 entry:
   tail call void @callee_irq()
   ret void
 }
-attributes #0 = { "interrupt"="machine" }
 
 ; Byval parameters hand the function a pointer directly into the stack area
 ; we want to reuse during a tail call. Do not tail call optimize functions with
 ; byval parameters.
 declare i32 @callee_byval(ptr byval(ptr) %a)
 define i32 @caller_byval() nounwind {
-; CHECK-LABEL: caller_byval
-; CHECK-NOT: tail callee_byval
-; CHECK: call callee_byval
+; CHECK-LABEL: caller_byval:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    lw a0, 8(sp)
+; CHECK-NEXT:    sw a0, 4(sp)
+; CHECK-NEXT:    addi a0, sp, 4
+; CHECK-NEXT:    call callee_byval
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
 entry:
   %a = alloca ptr
   %r = tail call i32 @callee_byval(ptr byval(ptr) %a)
@@ -153,9 +250,16 @@ entry:
 
 declare void @callee_struct(ptr sret(%struct.A) %a)
 define void @caller_nostruct() nounwind {
-; CHECK-LABEL: caller_nostruct
-; CHECK-NOT: tail callee_struct
-; CHECK: call callee_struct
+; CHECK-LABEL: caller_nostruct:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    lui a0, %hi(a)
+; CHECK-NEXT:    addi a0, a0, %lo(a)
+; CHECK-NEXT:    call callee_struct
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
 entry:
   tail call void @callee_struct(ptr sret(%struct.A) @a)
   ret void
@@ -164,9 +268,14 @@ entry:
 ; Do not tail call optimize if caller uses structret semantics.
 declare void @callee_nostruct()
 define void @caller_struct(ptr sret(%struct.A) %a) nounwind {
-; CHECK-LABEL: caller_struct
-; CHECK-NOT: tail callee_nostruct
-; CHECK: call callee_nostruct
+; CHECK-LABEL: caller_struct:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call callee_nostruct
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
 entry:
   tail call void @callee_nostruct()
   ret void
@@ -175,8 +284,13 @@ entry:
 ; Do not tail call optimize if disabled.
 define i32 @disable_tail_calls(i32 %i) nounwind "disable-tail-calls"="true" {
 ; CHECK-LABEL: disable_tail_calls:
-; CHECK-NOT: tail callee_nostruct
-; CHECK: call callee_tail
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -16
+; CHECK-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; CHECK-NEXT:    call callee_tail
+; CHECK-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 16
+; CHECK-NEXT:    ret
 entry:
   %rv = tail call i32 @callee_tail(i32 %i)
   ret i32 %rv
@@ -189,10 +303,20 @@ declare i32 @test2()
 declare i32 @test3()
 define i32 @duplicate_returns(i32 %a, i32 %b) nounwind {
 ; CHECK-LABEL: duplicate_returns:
-; CHECK:    tail test2
-; CHECK:    tail test
-; CHECK:    tail test1
-; CHECK:    tail test3
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    beqz a0, .LBB14_4
+; CHECK-NEXT:  # %bb.1: # %if.else
+; CHECK-NEXT:    beqz a1, .LBB14_5
+; CHECK-NEXT:  # %bb.2: # %if.else4
+; CHECK-NEXT:    bge a1, a0, .LBB14_6
+; CHECK-NEXT:  # %bb.3: # %if.then6
+; CHECK-NEXT:    tail test2
+; CHECK-NEXT:  .LBB14_4: # %if.then
+; CHECK-NEXT:    tail test
+; CHECK-NEXT:  .LBB14_5: # %if.then2
+; CHECK-NEXT:    tail test1
+; CHECK-NEXT:  .LBB14_6: # %if.else8
+; CHECK-NEXT:    tail test3
 entry:
   %cmp = icmp eq i32 %a, 0
   br i1 %cmp, label %if.then, label %if.else

>From 718af44ed20f98d782c5aac37d5296e65919c462 Mon Sep 17 00:00:00 2001
From: Alexandre Ganea <37383324+aganea at users.noreply.github.com>
Date: Thu, 1 Feb 2024 08:14:05 -0500
Subject: [PATCH 09/42] [openmp] On Windows, fix standalone cmake build
 (#80174)

This fixes: https://github.com/llvm/llvm-project/issues/80117
---
 openmp/cmake/HandleOpenMPOptions.cmake | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/openmp/cmake/HandleOpenMPOptions.cmake b/openmp/cmake/HandleOpenMPOptions.cmake
index 201aeabbd3df9..71346201129b6 100644
--- a/openmp/cmake/HandleOpenMPOptions.cmake
+++ b/openmp/cmake/HandleOpenMPOptions.cmake
@@ -9,6 +9,14 @@ if (NOT COMMAND append_if)
   endfunction()
 endif()
 
+if (NOT COMMAND append)
+  function(append value)
+    foreach(variable ${ARGN})
+      set(${variable} "${${variable}} ${value}" PARENT_SCOPE)
+    endforeach(variable)
+  endfunction()
+endif()
+
 # MSVC and clang-cl in compatibility mode map -Wall to -Weverything.
 # TODO: LLVM adds /W4 instead, check if that works for the OpenMP runtimes.
 if (NOT MSVC)

>From ec2bcbe0c2ac5b01db1ffa6f0cf7fea91a639623 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 1 Feb 2024 07:19:57 -0600
Subject: [PATCH 10/42] [AMDGPU] Prefer `s_memtime` for `readcyclecounter` on
 GFX10 (#80211)

Summary:
The old `s_memtime` instruction was supported until the GFX10
architecture. Although this instruction has a higher latency than the
new shader counter, it's much more usable as a processor clock as it is
a full 64-bit counter. The new shader counter is only a 20-bit counter,
which makes it difficult to use as a standard cycle counter as it will
overflow in a few milliseconds. This patch suggests preferring
`s_memtime` for this intrinsic if it is still available.
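
As a rough illustration of the overflow concern above, the sketch below
computes how long a free-running counter of a given width lasts before it
wraps. Only the 20-bit and 64-bit widths come from the text; the tick rate is
a hypothetical value chosen for illustration, since this patch does not state
the actual counter frequency.

```python
# Time until an N-bit free-running counter wraps, for a given tick rate.
# ASSUMED_TICK_HZ is a hypothetical value for illustration only; the real
# rate depends on the hardware and is not stated in this patch.

def wrap_time_seconds(bits: int, tick_hz: float) -> float:
    """Seconds until a counter with `bits` bits overflows at `tick_hz` ticks/s."""
    return (2 ** bits) / tick_hz

ASSUMED_TICK_HZ = 100e6  # hypothetical 100 MHz tick rate

shader_counter = wrap_time_seconds(20, ASSUMED_TICK_HZ)   # 20-bit shader counter
memtime_counter = wrap_time_seconds(64, ASSUMED_TICK_HZ)  # 64-bit s_memtime

SECONDS_PER_YEAR = 365.25 * 24 * 3600

print(f"20-bit counter wraps after {shader_counter * 1e3:.2f} ms")
print(f"64-bit counter wraps after {memtime_counter / SECONDS_PER_YEAR:.0f} years")
```

Under that assumed rate the 20-bit counter wraps on a millisecond scale, while
the 64-bit counter effectively never wraps, which is why it is easier to use
as a general cycle counter.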
---
 llvm/lib/Target/AMDGPU/SMInstructions.td     | 2 --
 llvm/test/CodeGen/AMDGPU/readcyclecounter.ll | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index f082de35b6ae9..f3096962e2f3e 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -1065,8 +1065,6 @@ def : GCNPat <
   (REG_SEQUENCE SReg_64,
     (S_GETREG_B32 getHwRegImm<HWREG.SHADER_CYCLES, 0, -12>.ret), sub0,
     (S_MOV_B32 (i32 0)), sub1)> {
-  // Prefer this to s_memtime because it has lower and more predictable latency.
-  let AddedComplexity = 1;
 }
 } // let OtherPredicates = [HasShaderCyclesRegister]
 
diff --git a/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll b/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
index 046a6d0a8cb7e..fd422b344d834 100644
--- a/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
+++ b/llvm/test/CodeGen/AMDGPU/readcyclecounter.ll
@@ -4,8 +4,8 @@
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=SIVI -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=GCN %s
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=MEMTIME -check-prefix=GCN %s
-; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GETREG,GETREG-SDAG -check-prefix=GCN %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GETREG,GETREG-GISEL -check-prefix=GCN %s
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=MEMTIME -check-prefix=GCN %s
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=MEMTIME -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=GETREG,GETREG-SDAG -check-prefix=GCN %s
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=GETREG,GETREG-GISEL -check-prefix=GCN %s
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX12 %s

>From 41e98d8c8b925eabc801e107037deb93b3448e2d Mon Sep 17 00:00:00 2001
From: Sander de Smalen <sander.desmalen at arm.com>
Date: Thu, 1 Feb 2024 13:37:37 +0000
Subject: [PATCH 11/42] [AArch64] Replace LLVM IR function attributes for
 PSTATE.ZA. (#79166)

Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.

Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.

For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0

We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared` together with
  `aarch64_pstate_za_preserved`)

This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or ZT0 register, both part of PSTATE.ZA, with the
caller).
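
To make the renaming concrete, here is a minimal sketch of the ACLE ZA state
to LLVM IR attribute mapping before and after this patch, transcribed from the
CGCall.cpp hunk, together with the "at most one ZA state attribute" rule from
the AArch64SME.rst hunk. The Python names (`OLD_ZA_ATTRS`, `NEW_ZA_ATTRS`,
`has_valid_za_state`) are hypothetical and exist only for this illustration;
they are not part of LLVM.

```python
# ACLE ZA state -> LLVM IR function attributes, old vs. new encoding.
# Transcribed from the CGCall.cpp hunk in this patch; all Python names
# here are hypothetical and used for illustration only.

OLD_ZA_ATTRS = {
    "out": {"aarch64_pstate_za_shared"},
    "inout": {"aarch64_pstate_za_shared"},
    "in": {"aarch64_pstate_za_shared", "aarch64_pstate_za_preserved"},
    "preserves": {"aarch64_pstate_za_shared", "aarch64_pstate_za_preserved"},
}

NEW_ZA_ATTRS = {
    "in": "aarch64_in_za",
    "out": "aarch64_out_za",
    "inout": "aarch64_inout_za",
    "preserves": "aarch64_preserves_za",
}

# The five mutually exclusive ZA state attributes listed in AArch64SME.rst.
ZA_STATE_ATTRS = set(NEW_ZA_ATTRS.values()) | {"aarch64_new_za"}


def has_valid_za_state(fn_attrs) -> bool:
    """Return True if at most one ZA state attribute is present."""
    return len(ZA_STATE_ATTRS & set(fn_attrs)) <= 1


# The old encoding could not tell "in" from "preserves", or "out" from "inout".
print(OLD_ZA_ATTRS["in"] == OLD_ZA_ATTRS["preserves"])
print(OLD_ZA_ATTRS["out"] == OLD_ZA_ATTRS["inout"])

# The new encoding gives each state its own attribute.
print(has_valid_za_state(["nounwind", "aarch64_inout_za"]))
print(has_valid_za_state(["aarch64_new_za", "aarch64_in_za"]))
```

The point of the sketch is that the old attribute pairs collapsed four ACLE
states into two encodings, whereas the new names are one-to-one and can be
checked with a simple exclusivity rule.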
---
 clang/lib/CodeGen/CGBuiltin.cpp               |   6 +-
 clang/lib/CodeGen/CGCall.cpp                  |  16 +--
 clang/lib/CodeGen/CodeGenModule.cpp           |   2 +-
 .../aarch64-sme-attrs.cpp                     |  18 +--
 .../aarch64-sme-intrinsics/acle_sme_zero.c    |   4 +-
 clang/test/Modules/aarch64-sme-keywords.cppm  |  10 +-
 llvm/docs/AArch64SME.rst                      |  37 +++---
 llvm/lib/IR/Verifier.cpp                      |  18 ++-
 .../AArch64/AArch64TargetTransformInfo.cpp    |   2 +-
 llvm/lib/Target/AArch64/SMEABIPass.cpp        |  12 +-
 .../AArch64/Utils/AArch64SMEAttributes.cpp    |  32 +++--
 .../AArch64/Utils/AArch64SMEAttributes.h      |  33 +++--
 .../AArch64/sme-disable-gisel-fisel.ll        |  10 +-
 .../AArch64/sme-lazy-save-call-remarks.ll     |   6 +-
 .../CodeGen/AArch64/sme-lazy-save-call.ll     |   8 +-
 .../CodeGen/AArch64/sme-new-za-function.ll    |   8 +-
 .../AArch64/sme-shared-za-interface.ll        |   4 +-
 llvm/test/CodeGen/AArch64/sme-zt0-state.ll    |  22 ++--
 .../Inline/AArch64/sme-pstateza-attrs.ll      |  22 ++--
 llvm/test/Verifier/sme-attributes.ll          |  32 ++++-
 .../Target/AArch64/SMEAttributesTest.cpp      | 119 ++++++++++++------
 .../mlir/Dialect/ArmSME/Transforms/Passes.td  |  35 ++++--
 mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td   |   4 +-
 mlir/lib/Target/LLVMIR/ModuleImport.cpp       |  21 ++--
 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp  |  14 ++-
 mlir/test/Dialect/ArmSME/enable-arm-za.mlir   |  16 ++-
 .../LLVMIR/Import/function-attributes.ll      |  26 ++--
 mlir/test/Target/LLVMIR/llvmir.mlir           |  24 +++-
 28 files changed, 350 insertions(+), 211 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f3ab5ad7b08ec..196be813a4896 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -10676,10 +10676,8 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
         llvm::FunctionType::get(StructType::get(CGM.Int64Ty, CGM.Int64Ty), {},
                                 false),
         "__arm_sme_state"));
-    auto Attrs =
-        AttributeList()
-            .addFnAttribute(getLLVMContext(), "aarch64_pstate_sm_compatible")
-            .addFnAttribute(getLLVMContext(), "aarch64_pstate_za_preserved");
+    auto Attrs = AttributeList().addFnAttribute(getLLVMContext(),
+                                                "aarch64_pstate_sm_compatible");
     CI->setAttributes(Attrs);
     CI->setCallingConv(
         llvm::CallingConv::
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 28c211aa631e4..657666c9bda4e 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1774,14 +1774,14 @@ static void AddAttributesFromFunctionProtoType(ASTContext &Ctx,
     FuncAttrs.addAttribute("aarch64_pstate_sm_compatible");
 
   // ZA
-  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_Out ||
-      FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_InOut)
-    FuncAttrs.addAttribute("aarch64_pstate_za_shared");
-  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_Preserves ||
-      FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_In) {
-    FuncAttrs.addAttribute("aarch64_pstate_za_shared");
-    FuncAttrs.addAttribute("aarch64_pstate_za_preserved");
-  }
+  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_Preserves)
+    FuncAttrs.addAttribute("aarch64_preserves_za");
+  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_In)
+    FuncAttrs.addAttribute("aarch64_in_za");
+  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_Out)
+    FuncAttrs.addAttribute("aarch64_out_za");
+  if (FunctionType::getArmZAState(SMEBits) == FunctionType::ARM_InOut)
+    FuncAttrs.addAttribute("aarch64_inout_za");
 
   // ZT0
   if (FunctionType::getArmZT0State(SMEBits) == FunctionType::ARM_Preserves)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 6ec54cc01c923..c63e4ecc3dcba 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2414,7 +2414,7 @@ void CodeGenModule::SetLLVMFunctionAttributesForDefinition(const Decl *D,
 
   if (auto *Attr = D->getAttr<ArmNewAttr>()) {
     if (Attr->isNewZA())
-      B.addAttribute("aarch64_pstate_za_new");
+      B.addAttribute("aarch64_new_za");
     if (Attr->isNewZT0())
       B.addAttribute("aarch64_new_zt0");
   }
diff --git a/clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp b/clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp
index f69703a8a7d89..fdd2de11365dd 100644
--- a/clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp
+++ b/clang/test/CodeGen/aarch64-sme-intrinsics/aarch64-sme-attrs.cpp
@@ -284,20 +284,20 @@ int test_variadic_template() __arm_inout("za") {
 // CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_COMPATIBLE_DECL]] = { "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_pstate_za_shared" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_pstate_za_shared" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_pstate_za_preserved" "aarch64_pstate_za_shared" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_pstate_za_preserved" "aarch64_pstate_za_shared" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_pstate_za_new" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_ENABLED_CALL]] = { "aarch64_pstate_sm_enabled" }
 // CHECK: attributes #[[SM_COMPATIBLE_CALL]] = { "aarch64_pstate_sm_compatible" }
 // CHECK: attributes #[[SM_BODY_CALL]] = { "aarch64_pstate_sm_body" }
-// CHECK: attributes #[[ZA_SHARED_CALL]] = { "aarch64_pstate_za_shared" }
-// CHECK: attributes #[[ZA_PRESERVED_CALL]] = { "aarch64_pstate_za_preserved" "aarch64_pstate_za_shared" }
+// CHECK: attributes #[[ZA_SHARED_CALL]] = { "aarch64_inout_za" }
+// CHECK: attributes #[[ZA_PRESERVED_CALL]] = { "aarch64_preserves_za" }
 // CHECK: attributes #[[NOUNWIND_CALL]] = { nounwind }
 // CHECK: attributes #[[NOUNWIND_SM_ENABLED_CALL]] = { nounwind "aarch64_pstate_sm_enabled" }
 // CHECK: attributes #[[NOUNWIND_SM_COMPATIBLE_CALL]] = { nounwind "aarch64_pstate_sm_compatible" }
-// CHECK: attributes #[[NOUNWIND_ZA_SHARED_CALL]] = { nounwind "aarch64_pstate_za_shared" }
-// CHECK: attributes #[[NOUNWIND_ZA_PRESERVED_CALL]] = { nounwind "aarch64_pstate_za_preserved" "aarch64_pstate_za_shared" }
+// CHECK: attributes #[[NOUNWIND_ZA_SHARED_CALL]] = { nounwind "aarch64_inout_za" }
+// CHECK: attributes #[[NOUNWIND_ZA_PRESERVED_CALL]] = { nounwind "aarch64_preserves_za" }
 
diff --git a/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_zero.c b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_zero.c
index 7f56941108828..9963c0e48b8e7 100644
--- a/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_zero.c
+++ b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_zero.c
@@ -55,13 +55,13 @@ void test_svzero_mask_za_2(void) __arm_inout("za") {
 }
 
 // CHECK-C-LABEL: define dso_local void @test_svzero_za(
-// CHECK-C-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-C-SAME: ) local_unnamed_addr #[[ATTR2:[0-9]+]] {
 // CHECK-C-NEXT:  entry:
 // CHECK-C-NEXT:    tail call void @llvm.aarch64.sme.zero(i32 255)
 // CHECK-C-NEXT:    ret void
 //
 // CHECK-CXX-LABEL: define dso_local void @_Z14test_svzero_zav(
-// CHECK-CXX-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-CXX-SAME: ) local_unnamed_addr #[[ATTR2:[0-9]+]] {
 // CHECK-CXX-NEXT:  entry:
 // CHECK-CXX-NEXT:    tail call void @llvm.aarch64.sme.zero(i32 255)
 // CHECK-CXX-NEXT:    ret void
diff --git a/clang/test/Modules/aarch64-sme-keywords.cppm b/clang/test/Modules/aarch64-sme-keywords.cppm
index df4dd32b16cff..759701a633ceb 100644
--- a/clang/test/Modules/aarch64-sme-keywords.cppm
+++ b/clang/test/Modules/aarch64-sme-keywords.cppm
@@ -43,14 +43,14 @@ import A;
 //
 // CHECK:declare void @_ZW1A22f_streaming_compatiblev() #[[STREAMING_COMPATIBLE_DECL:[0-9]+]]
 //
-// CHECK-DAG: attributes #[[SHARED_ZA_DEF]] = {{{.*}} "aarch64_pstate_za_shared" {{.*}}}
-// CHECK-DAG: attributes #[[SHARED_ZA_DECL]] = {{{.*}} "aarch64_pstate_za_shared" {{.*}}}
-// CHECK-DAG: attributes #[[PRESERVES_ZA_DECL]] = {{{.*}} "aarch64_pstate_za_preserved" {{.*}}}
+// CHECK-DAG: attributes #[[SHARED_ZA_DEF]] = {{{.*}} "aarch64_inout_za" {{.*}}}
+// CHECK-DAG: attributes #[[SHARED_ZA_DECL]] = {{{.*}} "aarch64_inout_za" {{.*}}}
+// CHECK-DAG: attributes #[[PRESERVES_ZA_DECL]] = {{{.*}} "aarch64_preserves_za" {{.*}}}
 // CHECK-DAG: attributes #[[NORMAL_DEF]] = {{{.*}}}
 // CHECK-DAG: attributes #[[STREAMING_DECL]] = {{{.*}} "aarch64_pstate_sm_enabled" {{.*}}}
 // CHECK-DAG: attributes #[[STREAMING_COMPATIBLE_DECL]] = {{{.*}} "aarch64_pstate_sm_compatible" {{.*}}}
-// CHECK-DAG: attributes #[[SHARED_ZA_USE]] = { "aarch64_pstate_za_shared" }
-// CHECK-DAG: attributes #[[PRESERVES_ZA_USE]] = { "aarch64_pstate_za_preserved" "aarch64_pstate_za_shared" }
+// CHECK-DAG: attributes #[[SHARED_ZA_USE]] = { "aarch64_inout_za" }
+// CHECK-DAG: attributes #[[PRESERVES_ZA_USE]] = { "aarch64_preserves_za" }
 // CHECK-DAG: attributes #[[STREAMING_USE]] = { "aarch64_pstate_sm_enabled" }
 // CHECK-DAG: attributes #[[STREAMING_COMPATIBLE_USE]] = { "aarch64_pstate_sm_compatible" }
 
diff --git a/llvm/docs/AArch64SME.rst b/llvm/docs/AArch64SME.rst
index 63573bf91eacb..b5a01cb204b81 100644
--- a/llvm/docs/AArch64SME.rst
+++ b/llvm/docs/AArch64SME.rst
@@ -22,26 +22,32 @@ Below we describe the LLVM IR attributes and their relation to the C/C++
 level ACLE attributes:
 
 ``aarch64_pstate_sm_enabled``
-    is used for functions with ``__attribute__((arm_streaming))``
+    is used for functions with ``__arm_streaming``
 
 ``aarch64_pstate_sm_compatible``
-    is used for functions with ``__attribute__((arm_streaming_compatible))``
+    is used for functions with ``__arm_streaming_compatible``
 
 ``aarch64_pstate_sm_body``
-  is used for functions with ``__attribute__((arm_locally_streaming))`` and is
+  is used for functions with ``__arm_locally_streaming`` and is
   only valid on function definitions (not declarations)
 
-``aarch64_pstate_za_new``
-  is used for functions with ``__attribute__((arm_new_za))``
+``aarch64_new_za``
+  is used for functions with ``__arm_new("za")``
 
-``aarch64_pstate_za_shared``
-  is used for functions with ``__attribute__((arm_shared_za))``
+``aarch64_in_za``
+  is used for functions with ``__arm_in("za")``
 
-``aarch64_pstate_za_preserved``
-  is used for functions with ``__attribute__((arm_preserves_za))``
+``aarch64_out_za``
+  is used for functions with ``__arm_out("za")``
+
+``aarch64_inout_za``
+  is used for functions with ``__arm_inout("za")``
+
+``aarch64_preserves_za``
+  is used for functions with ``__arm_preserves("za")``
 
 ``aarch64_expanded_pstate_za``
-  is used for functions with ``__attribute__((arm_new_za))``
+  is used for functions with ``__arm_new("za")``
 
 Clang must ensure that the above attributes are added both to the
 function's declaration/definition as well as to their call-sites. This is
@@ -89,11 +95,10 @@ Restrictions on attributes
 * It is not allowed for a function to be decorated with both
   ``aarch64_pstate_sm_compatible`` and ``aarch64_pstate_sm_enabled``.
 
-* It is not allowed for a function to be decorated with both
-  ``aarch64_pstate_za_new`` and ``aarch64_pstate_za_preserved``.
-
-* It is not allowed for a function to be decorated with both
-  ``aarch64_pstate_za_new`` and ``aarch64_pstate_za_shared``.
+* It is not allowed for a function to be decorated with more than one of the
+  following attributes:
+  ``aarch64_new_za``, ``aarch64_in_za``, ``aarch64_out_za``, ``aarch64_inout_za``,
+  ``aarch64_preserves_za``.
 
 These restrictions also apply in the higher level SME ACLE, which means we can
 emit diagnostics in Clang to signal users about incorrect behaviour.
@@ -426,7 +431,7 @@ to toggle PSTATE.ZA using intrinsics. This also makes it simpler to setup a
 lazy-save mechanism for calls to private-ZA functions (i.e. functions that may
 either directly or indirectly clobber ZA state).
 
-For the purpose of handling functions marked with ``aarch64_pstate_za_new``,
+For the purpose of handling functions marked with ``aarch64_new_za``,
 we have introduced a new LLVM IR pass (SMEABIPass) that is run just before
 SelectionDAG. Any such functions dealt with by this pass are marked with
 ``aarch64_expanded_pstate_za``.
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index f4c1508e4b7dd..8d992c232ca7c 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -2155,17 +2155,13 @@ void Verifier::verifyFunctionAttrs(FunctionType *FT, AttributeList Attrs,
            V);
   }
 
-  if (Attrs.hasFnAttr("aarch64_pstate_za_new")) {
-    Check(!Attrs.hasFnAttr("aarch64_pstate_za_preserved"),
-           "Attributes 'aarch64_pstate_za_new and aarch64_pstate_za_preserved' "
-           "are incompatible!",
-           V);
-
-    Check(!Attrs.hasFnAttr("aarch64_pstate_za_shared"),
-           "Attributes 'aarch64_pstate_za_new and aarch64_pstate_za_shared' "
-           "are incompatible!",
-           V);
-  }
+  Check((Attrs.hasFnAttr("aarch64_new_za") + Attrs.hasFnAttr("aarch64_in_za") +
+         Attrs.hasFnAttr("aarch64_inout_za") +
+         Attrs.hasFnAttr("aarch64_out_za") +
+         Attrs.hasFnAttr("aarch64_preserves_za")) <= 1,
+        "Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', "
+        "'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive",
+        V);
 
   Check(
       (Attrs.hasFnAttr("aarch64_new_zt0") + Attrs.hasFnAttr("aarch64_in_zt0") +
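The counting check in the Verifier hunk above can be sketched outside of LLVM as follows. This is only an illustration: `AttrSet`, `countZAStateAttrs` and `hasValidZAState` are hypothetical names that do not appear in the patch, and a plain `std::set` of strings stands in for `AttributeList`.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical stand-in for llvm::AttributeList: a set of attribute strings.
using AttrSet = std::set<std::string>;

// Counts how many of the five ZA state attributes are present.
inline unsigned countZAStateAttrs(const AttrSet &Attrs) {
  unsigned N = 0;
  for (const char *Name : {"aarch64_new_za", "aarch64_in_za", "aarch64_out_za",
                           "aarch64_inout_za", "aarch64_preserves_za"})
    N += Attrs.count(Name); // count() is 0 or 1 for a std::set
  return N;
}

// Mirrors the "<= 1" condition that the hunk passes to Check().
inline bool hasValidZAState(const AttrSet &Attrs) {
  return countZAStateAttrs(Attrs) <= 1;
}
```

Summing the presence flags and comparing against 1 rejects any combination of two or more attributes, including sets of three, which an XOR over five independent booleans would accept.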
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 992b11da7eeee..cdd2750521d2c 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -242,7 +242,7 @@ bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
     CalleeAttrs.set(SMEAttrs::SM_Enabled, true);
   }
 
-  if (CalleeAttrs.hasNewZABody())
+  if (CalleeAttrs.isNewZA())
     return false;
 
   if (CallerAttrs.requiresLazySave(CalleeAttrs) ||
diff --git a/llvm/lib/Target/AArch64/SMEABIPass.cpp b/llvm/lib/Target/AArch64/SMEABIPass.cpp
index 0247488ce93f1..23b3cc9ec6215 100644
--- a/llvm/lib/Target/AArch64/SMEABIPass.cpp
+++ b/llvm/lib/Target/AArch64/SMEABIPass.cpp
@@ -60,10 +60,8 @@ FunctionPass *llvm::createSMEABIPass() { return new SMEABI(); }
 void emitTPIDR2Save(Module *M, IRBuilder<> &Builder) {
   auto *TPIDR2SaveTy =
       FunctionType::get(Builder.getVoidTy(), {}, /*IsVarArgs=*/false);
-  auto Attrs =
-      AttributeList()
-          .addFnAttribute(M->getContext(), "aarch64_pstate_sm_compatible")
-          .addFnAttribute(M->getContext(), "aarch64_pstate_za_preserved");
+  auto Attrs = AttributeList().addFnAttribute(M->getContext(),
+                                              "aarch64_pstate_sm_compatible");
   FunctionCallee Callee =
       M->getOrInsertFunction("__arm_tpidr2_save", TPIDR2SaveTy, Attrs);
   CallInst *Call = Builder.CreateCall(Callee);
@@ -78,7 +76,7 @@ void emitTPIDR2Save(Module *M, IRBuilder<> &Builder) {
 }
 
 /// This function generates code at the beginning and end of a function marked
-/// with either `aarch64_pstate_za_new` or `aarch64_new_zt0`.
+/// with either `aarch64_new_za` or `aarch64_new_zt0`.
 /// At the beginning of the function, the following code is generated:
 ///  - Commit lazy-save if active   [Private-ZA Interface*]
 ///  - Enable PSTATE.ZA             [Private-ZA Interface]
@@ -133,7 +131,7 @@ bool SMEABI::updateNewStateFunctions(Module *M, Function *F,
     Builder.CreateCall(EnableZAIntr->getFunctionType(), EnableZAIntr);
   }
 
-  if (FnAttrs.hasNewZABody()) {
+  if (FnAttrs.isNewZA()) {
     Function *ZeroIntr =
         Intrinsic::getDeclaration(M, Intrinsic::aarch64_sme_zero);
     Builder.CreateCall(ZeroIntr->getFunctionType(), ZeroIntr,
@@ -174,7 +172,7 @@ bool SMEABI::runOnFunction(Function &F) {
 
   bool Changed = false;
   SMEAttrs FnAttrs(F);
-  if (FnAttrs.hasNewZABody() || FnAttrs.isNewZT0())
+  if (FnAttrs.isNewZA() || FnAttrs.isNewZT0())
     Changed |= updateNewStateFunctions(M, &F, Builder, FnAttrs);
 
   return Changed;
diff --git a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.cpp b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.cpp
index 3ee54e5df0a13..d399e0ac0794f 100644
--- a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.cpp
+++ b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.cpp
@@ -23,13 +23,15 @@ void SMEAttrs::set(unsigned M, bool Enable) {
          "SM_Enabled and SM_Compatible are mutually exclusive");
 
   // ZA Attrs
-  assert(!(hasNewZABody() && sharesZA()) &&
-         "ZA_New and ZA_Shared are mutually exclusive");
-  assert(!(hasNewZABody() && preservesZA()) &&
-         "ZA_New and ZA_Preserved are mutually exclusive");
-  assert(!(hasNewZABody() && (Bitmask & SME_ABI_Routine)) &&
+  assert(!(isNewZA() && (Bitmask & SME_ABI_Routine)) &&
          "ZA_New and SME_ABI_Routine are mutually exclusive");
 
+  assert(
+      (!sharesZA() ||
+       (isNewZA() ^ isInZA() ^ isInOutZA() ^ isOutZA() ^ isPreservesZA())) &&
+      "Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', "
+      "'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive");
+
   // ZT0 Attrs
   assert(
       (!sharesZT0() || (isNewZT0() ^ isInZT0() ^ isInOutZT0() ^ isOutZT0() ^
@@ -49,8 +51,8 @@ SMEAttrs::SMEAttrs(StringRef FuncName) : Bitmask(0) {
   if (FuncName == "__arm_tpidr2_save" || FuncName == "__arm_sme_state")
     Bitmask |= (SMEAttrs::SM_Compatible | SMEAttrs::SME_ABI_Routine);
   if (FuncName == "__arm_tpidr2_restore")
-    Bitmask |= (SMEAttrs::SM_Compatible | SMEAttrs::ZA_Shared |
-                SMEAttrs::SME_ABI_Routine);
+    Bitmask |= SMEAttrs::SM_Compatible | encodeZAState(StateValue::In) |
+               SMEAttrs::SME_ABI_Routine;
 }
 
 SMEAttrs::SMEAttrs(const AttributeList &Attrs) {
@@ -61,12 +63,16 @@ SMEAttrs::SMEAttrs(const AttributeList &Attrs) {
     Bitmask |= SM_Compatible;
   if (Attrs.hasFnAttr("aarch64_pstate_sm_body"))
     Bitmask |= SM_Body;
-  if (Attrs.hasFnAttr("aarch64_pstate_za_shared"))
-    Bitmask |= ZA_Shared;
-  if (Attrs.hasFnAttr("aarch64_pstate_za_new"))
-    Bitmask |= ZA_New;
-  if (Attrs.hasFnAttr("aarch64_pstate_za_preserved"))
-    Bitmask |= ZA_Preserved;
+  if (Attrs.hasFnAttr("aarch64_in_za"))
+    Bitmask |= encodeZAState(StateValue::In);
+  if (Attrs.hasFnAttr("aarch64_out_za"))
+    Bitmask |= encodeZAState(StateValue::Out);
+  if (Attrs.hasFnAttr("aarch64_inout_za"))
+    Bitmask |= encodeZAState(StateValue::InOut);
+  if (Attrs.hasFnAttr("aarch64_preserves_za"))
+    Bitmask |= encodeZAState(StateValue::Preserved);
+  if (Attrs.hasFnAttr("aarch64_new_za"))
+    Bitmask |= encodeZAState(StateValue::New);
   if (Attrs.hasFnAttr("aarch64_in_zt0"))
     Bitmask |= encodeZT0State(StateValue::In);
   if (Attrs.hasFnAttr("aarch64_out_zt0"))
diff --git a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
index 27b7075a0944f..4c7c1c9b07953 100644
--- a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
+++ b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
@@ -41,10 +41,9 @@ class SMEAttrs {
     SM_Enabled = 1 << 0,      // aarch64_pstate_sm_enabled
     SM_Compatible = 1 << 1,   // aarch64_pstate_sm_compatible
     SM_Body = 1 << 2,         // aarch64_pstate_sm_body
-    ZA_Shared = 1 << 3,       // aarch64_pstate_sm_shared
-    ZA_New = 1 << 4,          // aarch64_pstate_sm_new
-    ZA_Preserved = 1 << 5,    // aarch64_pstate_sm_preserved
-    SME_ABI_Routine = 1 << 6, // Used for SME ABI routines to avoid lazy saves
+    SME_ABI_Routine = 1 << 3, // Used for SME ABI routines to avoid lazy saves
+    ZA_Shift = 4,
+    ZA_Mask = 0b111 << ZA_Shift,
     ZT0_Shift = 7,
     ZT0_Mask = 0b111 << ZT0_Shift
   };
@@ -77,13 +76,29 @@ class SMEAttrs {
   /// streaming mode.
   bool requiresSMChange(const SMEAttrs &Callee) const;
 
-  // Interfaces to query PSTATE.ZA
-  bool hasNewZABody() const { return Bitmask & ZA_New; }
-  bool sharesZA() const { return Bitmask & ZA_Shared; }
+  // Interfaces to query ZA
+  static StateValue decodeZAState(unsigned Bitmask) {
+    return static_cast<StateValue>((Bitmask & ZA_Mask) >> ZA_Shift);
+  }
+  static unsigned encodeZAState(StateValue S) {
+    return static_cast<unsigned>(S) << ZA_Shift;
+  }
+
+  bool isNewZA() const { return decodeZAState(Bitmask) == StateValue::New; }
+  bool isInZA() const { return decodeZAState(Bitmask) == StateValue::In; }
+  bool isOutZA() const { return decodeZAState(Bitmask) == StateValue::Out; }
+  bool isInOutZA() const { return decodeZAState(Bitmask) == StateValue::InOut; }
+  bool isPreservesZA() const {
+    return decodeZAState(Bitmask) == StateValue::Preserved;
+  }
+  bool sharesZA() const {
+    StateValue State = decodeZAState(Bitmask);
+    return State == StateValue::In || State == StateValue::Out ||
+           State == StateValue::InOut || State == StateValue::Preserved;
+  }
   bool hasSharedZAInterface() const { return sharesZA() || sharesZT0(); }
   bool hasPrivateZAInterface() const { return !hasSharedZAInterface(); }
-  bool preservesZA() const { return Bitmask & ZA_Preserved; }
-  bool hasZAState() const { return hasNewZABody() || sharesZA(); }
+  bool hasZAState() const { return isNewZA() || sharesZA(); }
   bool requiresLazySave(const SMEAttrs &Callee) const {
     return hasZAState() && Callee.hasPrivateZAInterface() &&
            !(Callee.Bitmask & SME_ABI_Routine);
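The header change above replaces three independent ZA flag bits with a 3-bit state field. A minimal standalone sketch of that encoding follows. The shift and mask values are copied from the patch, but the `StateValue` enumerator values are an assumption (its definition is not shown in this diff), and the free functions here are simplified stand-ins for the `SMEAttrs` members.

```cpp
#include <cassert>

// Assumed enumerator values; the diff does not show the StateValue definition.
enum class StateValue : unsigned {
  None = 0, In = 1, Out = 2, InOut = 3, Preserved = 4, New = 5
};

// Field layout as in the patch: ZA state in bits 4..6, ZT0 state from bit 7.
constexpr unsigned ZA_Shift = 4;
constexpr unsigned ZA_Mask = 0b111u << ZA_Shift;
constexpr unsigned ZT0_Shift = 7;
constexpr unsigned ZT0_Mask = 0b111u << ZT0_Shift;

constexpr unsigned encodeZAState(StateValue S) {
  return static_cast<unsigned>(S) << ZA_Shift;
}
constexpr StateValue decodeZAState(unsigned Bitmask) {
  return static_cast<StateValue>((Bitmask & ZA_Mask) >> ZA_Shift);
}
constexpr unsigned encodeZT0State(StateValue S) {
  return static_cast<unsigned>(S) << ZT0_Shift;
}

// A function shares ZA when its state is In, Out, InOut or Preserved.
constexpr bool sharesZA(unsigned Bitmask) {
  return decodeZAState(Bitmask) == StateValue::In ||
         decodeZAState(Bitmask) == StateValue::Out ||
         decodeZAState(Bitmask) == StateValue::InOut ||
         decodeZAState(Bitmask) == StateValue::Preserved;
}

// Compile-time sanity checks on the layout.
static_assert((ZA_Mask & ZT0_Mask) == 0, "ZA and ZT0 fields must not overlap");
static_assert(decodeZAState(encodeZAState(StateValue::New)) == StateValue::New,
              "encode/decode must round-trip");
```

Because the field holds a single value, a decoded bitmask can report at most one ZA state at a time. Mutual exclusivity therefore has to be enforced before encoding, which is what the Verifier check does for IR attributes.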
diff --git a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
index 381091b453943..2a78012045ff4 100644
--- a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
+++ b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
@@ -209,9 +209,9 @@ define void @normal_call_to_streaming_callee_ptr(ptr %p) nounwind noinline optno
 ; Check ZA state
 ;
 
-declare double @za_shared_callee(double) "aarch64_pstate_za_shared"
+declare double @za_shared_callee(double) "aarch64_inout_za"
 
-define double  @za_new_caller_to_za_shared_callee(double %x) nounwind noinline optnone "aarch64_pstate_za_new"{
+define double  @za_new_caller_to_za_shared_callee(double %x) nounwind noinline optnone "aarch64_new_za"{
 ; CHECK-COMMON-LABEL: za_new_caller_to_za_shared_callee:
 ; CHECK-COMMON:       // %bb.0: // %prelude
 ; CHECK-COMMON-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -248,7 +248,7 @@ entry:
   ret double %add;
 }
 
-define double  @za_shared_caller_to_za_none_callee(double %x) nounwind noinline optnone "aarch64_pstate_za_shared"{
+define double  @za_shared_caller_to_za_none_callee(double %x) nounwind noinline optnone "aarch64_inout_za"{
 ; CHECK-COMMON-LABEL: za_shared_caller_to_za_none_callee:
 ; CHECK-COMMON:       // %bb.0: // %entry
 ; CHECK-COMMON-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -288,7 +288,7 @@ entry:
 }
 
 ; Ensure we set up and restore the lazy save correctly for instructions which are lowered to lib calls.
-define fp128 @f128_call_za(fp128 %a, fp128 %b) "aarch64_pstate_za_shared" nounwind {
+define fp128 @f128_call_za(fp128 %a, fp128 %b) "aarch64_inout_za" nounwind {
 ; CHECK-COMMON-LABEL: f128_call_za:
 ; CHECK-COMMON:       // %bb.0:
 ; CHECK-COMMON-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -350,7 +350,7 @@ define fp128 @f128_call_sm(fp128 %a, fp128 %b) "aarch64_pstate_sm_enabled" nounw
 }
 
 ; As above this should use Selection DAG to make sure the libcall call is lowered correctly.
-define double @frem_call_za(double %a, double %b) "aarch64_pstate_za_shared" nounwind {
+define double @frem_call_za(double %a, double %b) "aarch64_inout_za" nounwind {
 ; CHECK-COMMON-LABEL: frem_call_za:
 ; CHECK-COMMON:       // %bb.0:
 ; CHECK-COMMON-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
diff --git a/llvm/test/CodeGen/AArch64/sme-lazy-save-call-remarks.ll b/llvm/test/CodeGen/AArch64/sme-lazy-save-call-remarks.ll
index d999311301f94..65e50842d5d78 100644
--- a/llvm/test/CodeGen/AArch64/sme-lazy-save-call-remarks.ll
+++ b/llvm/test/CodeGen/AArch64/sme-lazy-save-call-remarks.ll
@@ -4,13 +4,13 @@
 declare void @private_za_callee()
 declare float @llvm.cos.f32(float)
 
-define void @test_lazy_save_1_callee() nounwind "aarch64_pstate_za_shared" {
+define void @test_lazy_save_1_callee() nounwind "aarch64_inout_za" {
 ; CHECK: remark: <unknown>:0:0: call from 'test_lazy_save_1_callee' to 'private_za_callee' sets up a lazy save for ZA
   call void @private_za_callee()
   ret void
 }
 
-define void @test_lazy_save_2_callees() nounwind "aarch64_pstate_za_shared" {
+define void @test_lazy_save_2_callees() nounwind "aarch64_inout_za" {
 ; CHECK: remark: <unknown>:0:0: call from 'test_lazy_save_2_callees' to 'private_za_callee' sets up a lazy save for ZA
   call void @private_za_callee()
 ; CHECK: remark: <unknown>:0:0: call from 'test_lazy_save_2_callees' to 'private_za_callee' sets up a lazy save for ZA
@@ -18,7 +18,7 @@ define void @test_lazy_save_2_callees() nounwind "aarch64_pstate_za_shared" {
   ret void
 }
 
-define float @test_lazy_save_expanded_intrinsic(float %a) nounwind "aarch64_pstate_za_shared" {
+define float @test_lazy_save_expanded_intrinsic(float %a) nounwind "aarch64_inout_za" {
 ; CHECK: remark: <unknown>:0:0: call from 'test_lazy_save_expanded_intrinsic' to 'cosf' sets up a lazy save for ZA
   %res = call float @llvm.cos.f32(float %a)
   ret float %res
diff --git a/llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll b/llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll
index 9625e139bd0bc..9d635f0b88f19 100644
--- a/llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll
+++ b/llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll
@@ -5,7 +5,7 @@ declare void @private_za_callee()
 declare float @llvm.cos.f32(float)
 
 ; Test lazy-save mechanism for a single callee.
-define void @test_lazy_save_1_callee() nounwind "aarch64_pstate_za_shared" {
+define void @test_lazy_save_1_callee() nounwind "aarch64_inout_za" {
 ; CHECK-LABEL: test_lazy_save_1_callee:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -38,7 +38,7 @@ define void @test_lazy_save_1_callee() nounwind "aarch64_pstate_za_shared" {
 }
 
 ; Test lazy-save mechanism for multiple callees.
-define void @test_lazy_save_2_callees() nounwind "aarch64_pstate_za_shared" {
+define void @test_lazy_save_2_callees() nounwind "aarch64_inout_za" {
 ; CHECK-LABEL: test_lazy_save_2_callees:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-32]! // 16-byte Folded Spill
@@ -85,7 +85,7 @@ define void @test_lazy_save_2_callees() nounwind "aarch64_pstate_za_shared" {
 }
 
 ; Test a call of an intrinsic that gets expanded to a library call.
-define float @test_lazy_save_expanded_intrinsic(float %a) nounwind "aarch64_pstate_za_shared" {
+define float @test_lazy_save_expanded_intrinsic(float %a) nounwind "aarch64_inout_za" {
 ; CHECK-LABEL: test_lazy_save_expanded_intrinsic:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -118,7 +118,7 @@ define float @test_lazy_save_expanded_intrinsic(float %a) nounwind "aarch64_psta
 }
 
 ; Test a combination of streaming-compatible -> normal call with lazy-save.
-define void @test_lazy_save_and_conditional_smstart() nounwind "aarch64_pstate_za_shared" "aarch64_pstate_sm_compatible" {
+define void @test_lazy_save_and_conditional_smstart() nounwind "aarch64_inout_za" "aarch64_pstate_sm_compatible" {
 ; CHECK-LABEL: test_lazy_save_and_conditional_smstart:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp d15, d14, [sp, #-96]! // 16-byte Folded Spill
diff --git a/llvm/test/CodeGen/AArch64/sme-new-za-function.ll b/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
index 0cee26dbb349e..04d26902c536a 100644
--- a/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
+++ b/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
@@ -1,9 +1,9 @@
 ; RUN: opt -S -mtriple=aarch64-linux-gnu -aarch64-sme-abi %s | FileCheck %s
 ; RUN: opt -S -mtriple=aarch64-linux-gnu -aarch64-sme-abi -aarch64-sme-abi %s | FileCheck %s
 
-declare void @shared_za_callee() "aarch64_pstate_za_shared"
+declare void @shared_za_callee() "aarch64_inout_za"
 
-define void @private_za() "aarch64_pstate_za_new" {
+define void @private_za() "aarch64_new_za" {
 ; CHECK-LABEL: @private_za(
 ; CHECK-NEXT:  prelude:
 ; CHECK-NEXT:    [[TPIDR2:%.*]] = call i64 @llvm.aarch64.sme.get.tpidr2()
@@ -24,7 +24,7 @@ define void @private_za() "aarch64_pstate_za_new" {
   ret void
 }
 
-define i32 @private_za_multiple_exit(i32 %a, i32 %b, i64 %cond) "aarch64_pstate_za_new" {
+define i32 @private_za_multiple_exit(i32 %a, i32 %b, i64 %cond) "aarch64_new_za" {
 ; CHECK-LABEL: @private_za_multiple_exit(
 ; CHECK-NEXT:  prelude:
 ; CHECK-NEXT:    [[TPIDR2:%.*]] = call i64 @llvm.aarch64.sme.get.tpidr2()
@@ -62,4 +62,4 @@ if.end:
 }
 
 ; CHECK: declare void @__arm_tpidr2_save() #[[ATTR:[0-9]+]]
-; CHECK: attributes #[[ATTR]] = { "aarch64_pstate_sm_compatible" "aarch64_pstate_za_preserved" }
+; CHECK: attributes #[[ATTR]] = { "aarch64_pstate_sm_compatible" }
diff --git a/llvm/test/CodeGen/AArch64/sme-shared-za-interface.ll b/llvm/test/CodeGen/AArch64/sme-shared-za-interface.ll
index a2e20013d94ff..cd7460b177c4b 100644
--- a/llvm/test/CodeGen/AArch64/sme-shared-za-interface.ll
+++ b/llvm/test/CodeGen/AArch64/sme-shared-za-interface.ll
@@ -4,7 +4,7 @@
 declare void @private_za_callee()
 
 ; Ensure that we don't use tail call optimization when a lazy-save is required.
-define void @disable_tailcallopt() "aarch64_pstate_za_shared" nounwind {
+define void @disable_tailcallopt() "aarch64_inout_za" nounwind {
 ; CHECK-LABEL: disable_tailcallopt:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -37,7 +37,7 @@ define void @disable_tailcallopt() "aarch64_pstate_za_shared" nounwind {
 }
 
 ; Ensure we set up and restore the lazy save correctly for instructions which are lowered to lib calls
-define fp128 @f128_call_za(fp128 %a, fp128 %b) "aarch64_pstate_za_shared" nounwind {
+define fp128 @f128_call_za(fp128 %a, fp128 %b) "aarch64_inout_za" nounwind {
 ; CHECK-LABEL: f128_call_za:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
diff --git a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
index 18d1e40bf4d0f..7f40b5e7e1344 100644
--- a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
+++ b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
@@ -30,7 +30,7 @@ define void @zt0_in_caller_no_state_callee() "aarch64_in_zt0" nounwind {
 ; Expect spill & fill of ZT0 around call
 ; Expect setup and restore lazy-save around call
 ; Expect smstart za after call
-define void @za_zt0_shared_caller_no_state_callee() "aarch64_pstate_za_shared" "aarch64_in_zt0" nounwind {
+define void @za_zt0_shared_caller_no_state_callee() "aarch64_inout_za" "aarch64_in_zt0" nounwind {
 ; CHECK-LABEL: za_zt0_shared_caller_no_state_callee:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-32]! // 16-byte Folded Spill
@@ -84,7 +84,7 @@ define void @zt0_shared_caller_zt0_shared_callee() "aarch64_in_zt0" nounwind {
 }
 
 ; Expect spill & fill of ZT0 around call
-define void @za_zt0_shared_caller_za_shared_callee() "aarch64_pstate_za_shared" "aarch64_in_zt0" nounwind {
+define void @za_zt0_shared_caller_za_shared_callee() "aarch64_inout_za" "aarch64_in_zt0" nounwind {
 ; CHECK-LABEL: za_zt0_shared_caller_za_shared_callee:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-32]! // 16-byte Folded Spill
@@ -106,12 +106,12 @@ define void @za_zt0_shared_caller_za_shared_callee() "aarch64_pstate_za_shared"
 ; CHECK-NEXT:    ldr x19, [sp, #16] // 8-byte Folded Reload
 ; CHECK-NEXT:    ldp x29, x30, [sp], #32 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
-  call void @callee() "aarch64_pstate_za_shared";
+  call void @callee() "aarch64_inout_za";
   ret void;
 }
 
 ; Caller and callee have shared ZA & ZT0
-define void @za_zt0_shared_caller_za_zt0_shared_callee() "aarch64_pstate_za_shared" "aarch64_in_zt0" nounwind {
+define void @za_zt0_shared_caller_za_zt0_shared_callee() "aarch64_inout_za" "aarch64_in_zt0" nounwind {
 ; CHECK-LABEL: za_zt0_shared_caller_za_zt0_shared_callee:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -128,7 +128,7 @@ define void @za_zt0_shared_caller_za_zt0_shared_callee() "aarch64_pstate_za_shar
 ; CHECK-NEXT:    mov sp, x29
 ; CHECK-NEXT:    ldp x29, x30, [sp], #16 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
-  call void @callee() "aarch64_pstate_za_shared" "aarch64_in_zt0";
+  call void @callee() "aarch64_inout_za" "aarch64_in_zt0";
   ret void;
 }
 
@@ -189,7 +189,7 @@ define void @zt0_new_caller() "aarch64_new_zt0" nounwind {
 ; Expect commit of lazy-save if ZA is dormant
 ; Expect smstart ZA, clear ZA & clear ZT0
 ; Before return, expect smstop ZA
-define void @new_za_zt0_caller() "aarch64_pstate_za_new" "aarch64_new_zt0" nounwind {
+define void @new_za_zt0_caller() "aarch64_new_za" "aarch64_new_zt0" nounwind {
 ; CHECK-LABEL: new_za_zt0_caller:
 ; CHECK:       // %bb.0: // %prelude
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -219,12 +219,12 @@ define void @new_za_zt0_caller() "aarch64_pstate_za_new" "aarch64_new_zt0" nounw
 ; CHECK-NEXT:    mov sp, x29
 ; CHECK-NEXT:    ldp x29, x30, [sp], #16 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
-  call void @callee() "aarch64_pstate_za_shared" "aarch64_in_zt0";
+  call void @callee() "aarch64_inout_za" "aarch64_in_zt0";
   ret void;
 }
 
 ; Expect clear ZA on entry
-define void @new_za_shared_zt0_caller() "aarch64_pstate_za_new" "aarch64_in_zt0" nounwind {
+define void @new_za_shared_zt0_caller() "aarch64_new_za" "aarch64_in_zt0" nounwind {
 ; CHECK-LABEL: new_za_shared_zt0_caller:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -242,12 +242,12 @@ define void @new_za_shared_zt0_caller() "aarch64_pstate_za_new" "aarch64_in_zt0"
 ; CHECK-NEXT:    mov sp, x29
 ; CHECK-NEXT:    ldp x29, x30, [sp], #16 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
-  call void @callee() "aarch64_pstate_za_shared" "aarch64_in_zt0";
+  call void @callee() "aarch64_inout_za" "aarch64_in_zt0";
   ret void;
 }
 
 ; Expect clear ZT0 on entry
-define void @shared_za_new_zt0() "aarch64_pstate_za_shared" "aarch64_new_zt0" nounwind {
+define void @shared_za_new_zt0() "aarch64_inout_za" "aarch64_new_zt0" nounwind {
 ; CHECK-LABEL: shared_za_new_zt0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
@@ -265,6 +265,6 @@ define void @shared_za_new_zt0() "aarch64_pstate_za_shared" "aarch64_new_zt0" no
 ; CHECK-NEXT:    mov sp, x29
 ; CHECK-NEXT:    ldp x29, x30, [sp], #16 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
-  call void @callee() "aarch64_pstate_za_shared" "aarch64_in_zt0";
+  call void @callee() "aarch64_inout_za" "aarch64_in_zt0";
   ret void;
 }
diff --git a/llvm/test/Transforms/Inline/AArch64/sme-pstateza-attrs.ll b/llvm/test/Transforms/Inline/AArch64/sme-pstateza-attrs.ll
index 7fca45b1e43f6..816492768cc0f 100644
--- a/llvm/test/Transforms/Inline/AArch64/sme-pstateza-attrs.ll
+++ b/llvm/test/Transforms/Inline/AArch64/sme-pstateza-attrs.ll
@@ -22,7 +22,7 @@ entry:
   ret void
 }
 
-define void @shared_za_callee() "aarch64_pstate_za_shared" {
+define void @shared_za_callee() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_callee
 ; CHECK-SAME: () #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
@@ -34,7 +34,7 @@ entry:
   ret void
 }
 
-define void @new_za_callee() "aarch64_pstate_za_new" {
+define void @new_za_callee() "aarch64_new_za" {
 ; CHECK-LABEL: define void @new_za_callee
 ; CHECK-SAME: () #[[ATTR2:[0-9]+]] {
 ; CHECK-NEXT:    call void @inlined_body()
@@ -84,7 +84,7 @@ entry:
 ; [x] Z -> N
 ; [ ] Z -> S
 ; [ ] Z -> Z
-define void @new_za_caller_nonza_callee_inline() "aarch64_pstate_za_new" {
+define void @new_za_caller_nonza_callee_inline() "aarch64_new_za" {
 ; CHECK-LABEL: define void @new_za_caller_nonza_callee_inline
 ; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:  entry:
@@ -99,7 +99,7 @@ entry:
 ; [ ] Z -> N
 ; [x] Z -> S
 ; [ ] Z -> Z
-define void @new_za_caller_shared_za_callee_inline() "aarch64_pstate_za_new" {
+define void @new_za_caller_shared_za_callee_inline() "aarch64_new_za" {
 ; CHECK-LABEL: define void @new_za_caller_shared_za_callee_inline
 ; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:  entry:
@@ -114,7 +114,7 @@ entry:
 ; [ ] Z -> N
 ; [ ] Z -> S
 ; [x] Z -> Z
-define void @new_za_caller_new_za_callee_dont_inline() "aarch64_pstate_za_new" {
+define void @new_za_caller_new_za_callee_dont_inline() "aarch64_new_za" {
 ; CHECK-LABEL: define void @new_za_caller_new_za_callee_dont_inline
 ; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:  entry:
@@ -129,7 +129,7 @@ entry:
 ; [x] Z -> N
 ; [ ] Z -> S
 ; [ ] Z -> Z
-define void @shared_za_caller_nonza_callee_inline() "aarch64_pstate_za_shared" {
+define void @shared_za_caller_nonza_callee_inline() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_nonza_callee_inline
 ; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:  entry:
@@ -144,7 +144,7 @@ entry:
 ; [ ] S -> N
 ; [x] S -> Z
 ; [ ] S -> S
-define void @shared_za_caller_new_za_callee_dont_inline() "aarch64_pstate_za_shared" {
+define void @shared_za_caller_new_za_callee_dont_inline() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_new_za_callee_dont_inline
 ; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:  entry:
@@ -159,7 +159,7 @@ entry:
 ; [ ] S -> N
 ; [ ] S -> Z
 ; [x] S -> S
-define void @shared_za_caller_shared_za_callee_inline() "aarch64_pstate_za_shared" {
+define void @shared_za_caller_shared_za_callee_inline() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_shared_za_callee_inline
 ; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:  entry:
@@ -181,7 +181,7 @@ define void @private_za_callee_call_za_disable() {
   ret void
 }
 
-define void @shared_za_caller_private_za_callee_call_za_disable() "aarch64_pstate_za_shared" {
+define void @shared_za_caller_private_za_callee_call_za_disable() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_private_za_callee_call_za_disable
 ; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:    call void @private_za_callee_call_za_disable()
@@ -201,7 +201,7 @@ define void @private_za_callee_call_tpidr2_save() {
   ret void
 }
 
-define void @shared_za_caller_private_za_callee_call_tpidr2_save_dont_inline() "aarch64_pstate_za_shared" {
+define void @shared_za_caller_private_za_callee_call_tpidr2_save_dont_inline() "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_private_za_callee_call_tpidr2_save_dont_inline
 ; CHECK-SAME: () #[[ATTR1]] {
 ; CHECK-NEXT:    call void @private_za_callee_call_tpidr2_save()
@@ -221,7 +221,7 @@ define void @private_za_callee_call_tpidr2_restore(ptr %ptr) {
   ret void
 }
 
-define void @shared_za_caller_private_za_callee_call_tpidr2_restore_dont_inline(ptr %ptr) "aarch64_pstate_za_shared" {
+define void @shared_za_caller_private_za_callee_call_tpidr2_restore_dont_inline(ptr %ptr) "aarch64_inout_za" {
 ; CHECK-LABEL: define void @shared_za_caller_private_za_callee_call_tpidr2_restore_dont_inline
 ; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    call void @private_za_callee_call_tpidr2_restore(ptr [[PTR]])
diff --git a/llvm/test/Verifier/sme-attributes.ll b/llvm/test/Verifier/sme-attributes.ll
index 2b949951dc1bb..3d01613ebf2fe 100644
--- a/llvm/test/Verifier/sme-attributes.ll
+++ b/llvm/test/Verifier/sme-attributes.ll
@@ -3,11 +3,35 @@
 declare void @sm_attrs() "aarch64_pstate_sm_enabled" "aarch64_pstate_sm_compatible";
 ; CHECK: Attributes 'aarch64_pstate_sm_enabled and aarch64_pstate_sm_compatible' are incompatible!
 
-declare void @za_preserved() "aarch64_pstate_za_new" "aarch64_pstate_za_preserved";
-; CHECK: Attributes 'aarch64_pstate_za_new and aarch64_pstate_za_preserved' are incompatible!
+declare void @za_new_preserved() "aarch64_new_za" "aarch64_preserves_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
 
-declare void @za_shared() "aarch64_pstate_za_new" "aarch64_pstate_za_shared";
-; CHECK: Attributes 'aarch64_pstate_za_new and aarch64_pstate_za_shared' are incompatible!
+declare void @za_new_in() "aarch64_new_za" "aarch64_in_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_new_inout() "aarch64_new_za" "aarch64_inout_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_new_out() "aarch64_new_za" "aarch64_out_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_preserved_in() "aarch64_preserves_za" "aarch64_in_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_preserved_inout() "aarch64_preserves_za" "aarch64_inout_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_preserved_out() "aarch64_preserves_za" "aarch64_out_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_in_inout() "aarch64_in_za" "aarch64_inout_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_in_out() "aarch64_in_za" "aarch64_out_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
+
+declare void @za_inout_out() "aarch64_inout_za" "aarch64_out_za";
+; CHECK: Attributes 'aarch64_new_za', 'aarch64_in_za', 'aarch64_out_za', 'aarch64_inout_za' and 'aarch64_preserves_za' are mutually exclusive
 
 declare void @zt0_new_preserved() "aarch64_new_zt0" "aarch64_preserves_zt0";
 ; CHECK: Attributes 'aarch64_new_zt0', 'aarch64_in_zt0', 'aarch64_out_zt0', 'aarch64_inout_zt0' and 'aarch64_preserves_zt0' are mutually exclusive
diff --git a/llvm/unittests/Target/AArch64/SMEAttributesTest.cpp b/llvm/unittests/Target/AArch64/SMEAttributesTest.cpp
index 044de72449ec8..3af5e24168c8c 100644
--- a/llvm/unittests/Target/AArch64/SMEAttributesTest.cpp
+++ b/llvm/unittests/Target/AArch64/SMEAttributesTest.cpp
@@ -38,21 +38,21 @@ TEST(SMEAttributes, Constructors) {
               ->getFunction("foo"))
           .hasStreamingCompatibleInterface());
 
-  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_pstate_za_shared\"")
-                      ->getFunction("foo"))
-                  .sharesZA());
-
-  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_pstate_za_shared\"")
-                      ->getFunction("foo"))
-                  .hasSharedZAInterface());
-
-  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_pstate_za_new\"")
+  ASSERT_TRUE(
+      SA(*parseIR("declare void @foo() \"aarch64_in_za\"")->getFunction("foo"))
+          .isInZA());
+  ASSERT_TRUE(
+      SA(*parseIR("declare void @foo() \"aarch64_out_za\"")->getFunction("foo"))
+          .isOutZA());
+  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_inout_za\"")
                       ->getFunction("foo"))
-                  .hasNewZABody());
-
-  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_pstate_za_preserved\"")
+                  .isInOutZA());
+  ASSERT_TRUE(SA(*parseIR("declare void @foo() \"aarch64_preserves_za\"")
                       ->getFunction("foo"))
-                  .preservesZA());
+                  .isPreservesZA());
+  ASSERT_TRUE(
+      SA(*parseIR("declare void @foo() \"aarch64_new_za\"")->getFunction("foo"))
+          .isNewZA());
 
   ASSERT_TRUE(
       SA(*parseIR("declare void @foo() \"aarch64_in_zt0\"")->getFunction("foo"))
@@ -73,10 +73,6 @@ TEST(SMEAttributes, Constructors) {
   // Invalid combinations.
   EXPECT_DEBUG_DEATH(SA(SA::SM_Enabled | SA::SM_Compatible),
                      "SM_Enabled and SM_Compatible are mutually exclusive");
-  EXPECT_DEBUG_DEATH(SA(SA::ZA_New | SA::ZA_Shared),
-                     "ZA_New and ZA_Shared are mutually exclusive");
-  EXPECT_DEBUG_DEATH(SA(SA::ZA_New | SA::ZA_Preserved),
-                     "ZA_New and ZA_Preserved are mutually exclusive");
 
   // Test that the set() methods equally check validity.
   EXPECT_DEBUG_DEATH(SA(SA::SM_Enabled).set(SA::SM_Compatible),
@@ -99,29 +95,69 @@ TEST(SMEAttributes, Basics) {
   ASSERT_TRUE(SA(SA::SM_Compatible | SA::SM_Body).hasStreamingBody());
   ASSERT_FALSE(SA(SA::SM_Compatible | SA::SM_Body).hasNonStreamingInterface());
 
-  // Test PSTATE.ZA interfaces.
-  ASSERT_FALSE(SA(SA::ZA_Shared).hasPrivateZAInterface());
-  ASSERT_TRUE(SA(SA::ZA_Shared).hasSharedZAInterface());
-  ASSERT_TRUE(SA(SA::ZA_Shared).sharesZA());
-  ASSERT_TRUE(SA(SA::ZA_Shared).hasZAState());
-  ASSERT_FALSE(SA(SA::ZA_Shared).preservesZA());
-  ASSERT_TRUE(SA(SA::ZA_Shared | SA::ZA_Preserved).preservesZA());
-  ASSERT_FALSE(SA(SA::ZA_Shared).sharesZT0());
-  ASSERT_FALSE(SA(SA::ZA_Shared).hasZT0State());
-
-  ASSERT_TRUE(SA(SA::ZA_New).hasPrivateZAInterface());
-  ASSERT_FALSE(SA(SA::ZA_New).hasSharedZAInterface());
-  ASSERT_TRUE(SA(SA::ZA_New).hasNewZABody());
-  ASSERT_TRUE(SA(SA::ZA_New).hasZAState());
-  ASSERT_FALSE(SA(SA::ZA_New).preservesZA());
-  ASSERT_FALSE(SA(SA::ZA_New).sharesZT0());
-  ASSERT_FALSE(SA(SA::ZA_New).hasZT0State());
-
-  ASSERT_TRUE(SA(SA::Normal).hasPrivateZAInterface());
-  ASSERT_FALSE(SA(SA::Normal).hasSharedZAInterface());
-  ASSERT_FALSE(SA(SA::Normal).hasNewZABody());
+  // Test ZA State interfaces
+  SA ZA_In = SA(SA::encodeZAState(SA::StateValue::In));
+  ASSERT_TRUE(ZA_In.isInZA());
+  ASSERT_FALSE(ZA_In.isOutZA());
+  ASSERT_FALSE(ZA_In.isInOutZA());
+  ASSERT_FALSE(ZA_In.isPreservesZA());
+  ASSERT_FALSE(ZA_In.isNewZA());
+  ASSERT_TRUE(ZA_In.sharesZA());
+  ASSERT_TRUE(ZA_In.hasZAState());
+  ASSERT_TRUE(ZA_In.hasSharedZAInterface());
+  ASSERT_FALSE(ZA_In.hasPrivateZAInterface());
+
+  SA ZA_Out = SA(SA::encodeZAState(SA::StateValue::Out));
+  ASSERT_TRUE(ZA_Out.isOutZA());
+  ASSERT_FALSE(ZA_Out.isInZA());
+  ASSERT_FALSE(ZA_Out.isInOutZA());
+  ASSERT_FALSE(ZA_Out.isPreservesZA());
+  ASSERT_FALSE(ZA_Out.isNewZA());
+  ASSERT_TRUE(ZA_Out.sharesZA());
+  ASSERT_TRUE(ZA_Out.hasZAState());
+  ASSERT_TRUE(ZA_Out.hasSharedZAInterface());
+  ASSERT_FALSE(ZA_Out.hasPrivateZAInterface());
+
+  SA ZA_InOut = SA(SA::encodeZAState(SA::StateValue::InOut));
+  ASSERT_TRUE(ZA_InOut.isInOutZA());
+  ASSERT_FALSE(ZA_InOut.isInZA());
+  ASSERT_FALSE(ZA_InOut.isOutZA());
+  ASSERT_FALSE(ZA_InOut.isPreservesZA());
+  ASSERT_FALSE(ZA_InOut.isNewZA());
+  ASSERT_TRUE(ZA_InOut.sharesZA());
+  ASSERT_TRUE(ZA_InOut.hasZAState());
+  ASSERT_TRUE(ZA_InOut.hasSharedZAInterface());
+  ASSERT_FALSE(ZA_InOut.hasPrivateZAInterface());
+
+  SA ZA_Preserved = SA(SA::encodeZAState(SA::StateValue::Preserved));
+  ASSERT_TRUE(ZA_Preserved.isPreservesZA());
+  ASSERT_FALSE(ZA_Preserved.isInZA());
+  ASSERT_FALSE(ZA_Preserved.isOutZA());
+  ASSERT_FALSE(ZA_Preserved.isInOutZA());
+  ASSERT_FALSE(ZA_Preserved.isNewZA());
+  ASSERT_TRUE(ZA_Preserved.sharesZA());
+  ASSERT_TRUE(ZA_Preserved.hasZAState());
+  ASSERT_TRUE(ZA_Preserved.hasSharedZAInterface());
+  ASSERT_FALSE(ZA_Preserved.hasPrivateZAInterface());
+
+  SA ZA_New = SA(SA::encodeZAState(SA::StateValue::New));
+  ASSERT_TRUE(ZA_New.isNewZA());
+  ASSERT_FALSE(ZA_New.isInZA());
+  ASSERT_FALSE(ZA_New.isOutZA());
+  ASSERT_FALSE(ZA_New.isInOutZA());
+  ASSERT_FALSE(ZA_New.isPreservesZA());
+  ASSERT_FALSE(ZA_New.sharesZA());
+  ASSERT_TRUE(ZA_New.hasZAState());
+  ASSERT_FALSE(ZA_New.hasSharedZAInterface());
+  ASSERT_TRUE(ZA_New.hasPrivateZAInterface());
+
+  ASSERT_FALSE(SA(SA::Normal).isInZA());
+  ASSERT_FALSE(SA(SA::Normal).isOutZA());
+  ASSERT_FALSE(SA(SA::Normal).isInOutZA());
+  ASSERT_FALSE(SA(SA::Normal).isPreservesZA());
+  ASSERT_FALSE(SA(SA::Normal).isNewZA());
+  ASSERT_FALSE(SA(SA::Normal).sharesZA());
   ASSERT_FALSE(SA(SA::Normal).hasZAState());
-  ASSERT_FALSE(SA(SA::Normal).preservesZA());
 
   // Test ZT0 State interfaces
   SA ZT0_In = SA(SA::encodeZT0State(SA::StateValue::In));
@@ -245,9 +281,10 @@ TEST(SMEAttributes, Transitions) {
                    .requiresSMChange(SA(SA::SM_Compatible | SA::SM_Body)));
 
   SA Private_ZA = SA(SA::Normal);
-  SA ZA_Shared = SA(SA::ZA_Shared);
+  SA ZA_Shared = SA(SA::encodeZAState(SA::StateValue::In));
   SA ZT0_Shared = SA(SA::encodeZT0State(SA::StateValue::In));
-  SA ZA_ZT0_Shared = SA(SA::ZA_Shared | SA::encodeZT0State(SA::StateValue::In));
+  SA ZA_ZT0_Shared = SA(SA::encodeZAState(SA::StateValue::In) |
+                        SA::encodeZT0State(SA::StateValue::In));
 
   // Shared ZA -> Private ZA Interface
   ASSERT_FALSE(ZA_Shared.requiresDisablingZABeforeCall(Private_ZA));
diff --git a/mlir/include/mlir/Dialect/ArmSME/Transforms/Passes.td b/mlir/include/mlir/Dialect/ArmSME/Transforms/Passes.td
index 66027c5ba77bd..7959d291e8926 100644
--- a/mlir/include/mlir/Dialect/ArmSME/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/ArmSME/Transforms/Passes.td
@@ -43,10 +43,16 @@ def ArmZaMode : I32EnumAttr<"ArmZaMode", "Armv9 ZA storage mode",
       I32EnumAttrCase<"Disabled", 0, "disabled">,
       // A function's ZA state is created on entry and destroyed on exit.
       I32EnumAttrCase<"NewZA", 1, "arm_new_za">,
-      // A function that preserves ZA state.
-      I32EnumAttrCase<"PreservesZA", 2, "arm_preserves_za">,
-      // A function that uses ZA state as input and/or output
-      I32EnumAttrCase<"SharedZA", 3, "arm_shared_za">,
+      // A function with a Shared-ZA interface that takes ZA as input.
+      I32EnumAttrCase<"InZA", 2, "arm_in_za">,
+      // A function with a Shared-ZA interface that returns ZA as output.
+      I32EnumAttrCase<"OutZA", 3, "arm_out_za">,
+      // A function with a Shared-ZA interface that takes ZA as input and
+      // returns ZA as output.
+      I32EnumAttrCase<"InOutZA", 4, "arm_inout_za">,
+      // A function with a Shared-ZA interface that does not read ZA and
+      // returns with ZA unchanged.
+      I32EnumAttrCase<"PreservesZA", 5, "arm_preserves_za">,
     ]>{
   let cppNamespace = "mlir::arm_sme";
   let genSpecializedAttr = 0;
@@ -92,14 +98,23 @@ def EnableArmStreaming
                             "new-za",
                             "The function has ZA state. The ZA state is "
                             "created on entry and destroyed on exit."),
+                 clEnumValN(mlir::arm_sme::ArmZaMode::InZA,
+                            "in-za",
+                            "The function uses ZA state. The ZA state may "
+                            "be used for input."),
+                 clEnumValN(mlir::arm_sme::ArmZaMode::OutZA,
+                            "out-za",
+                            "The function uses ZA state. The ZA state may "
+                            "be used for output."),
+                 clEnumValN(mlir::arm_sme::ArmZaMode::InOutZA,
+                            "inout-za",
+                            "The function uses ZA state. The ZA state may "
+                            "be used for input and/or output."),
                  clEnumValN(mlir::arm_sme::ArmZaMode::PreservesZA,
                             "preserves-za",
-                            "The function preserves ZA state. The ZA state is "
-                            "saved on entry and restored on exit."),
-                 clEnumValN(mlir::arm_sme::ArmZaMode::SharedZA,
-                            "shared-za",
-                            "The function uses ZA state. The ZA state may "
-                            "be used for input and/or output.")
+                            "The function shares ZA state. The ZA state may "
+                            "not be used for input and/or output and the "
+                            "function must return with ZA unchanged.")
            )}]>,
     Option<"onlyIfRequiredByOps", "only-if-required-by-ops", "bool",
            /*default=*/"false",
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
index ad67fba5a81cf..d9b130bdf18cb 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
@@ -1420,8 +1420,10 @@ def LLVM_LLVMFuncOp : LLVM_Op<"func", [
     OptionalAttr<UnitAttr>:$arm_locally_streaming,
     OptionalAttr<UnitAttr>:$arm_streaming_compatible,
     OptionalAttr<UnitAttr>:$arm_new_za,
+    OptionalAttr<UnitAttr>:$arm_in_za,
+    OptionalAttr<UnitAttr>:$arm_out_za,
+    OptionalAttr<UnitAttr>:$arm_inout_za,
     OptionalAttr<UnitAttr>:$arm_preserves_za,
-    OptionalAttr<UnitAttr>:$arm_shared_za,
     OptionalAttr<StrAttr>:$section,
     OptionalAttr<UnnamedAddr>:$unnamed_addr,
     OptionalAttr<I64Attr>:$alignment,
diff --git a/mlir/lib/Target/LLVMIR/ModuleImport.cpp b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
index 5ca4a9fd68d65..97ccb2b29f3ae 100644
--- a/mlir/lib/Target/LLVMIR/ModuleImport.cpp
+++ b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
@@ -1640,9 +1640,11 @@ static constexpr std::array ExplicitAttributes{
     StringLiteral("aarch64_pstate_sm_enabled"),
     StringLiteral("aarch64_pstate_sm_body"),
     StringLiteral("aarch64_pstate_sm_compatible"),
-    StringLiteral("aarch64_pstate_za_new"),
-    StringLiteral("aarch64_pstate_za_preserved"),
-    StringLiteral("aarch64_pstate_za_shared"),
+    StringLiteral("aarch64_new_za"),
+    StringLiteral("aarch64_preserves_za"),
+    StringLiteral("aarch64_in_za"),
+    StringLiteral("aarch64_out_za"),
+    StringLiteral("aarch64_inout_za"),
     StringLiteral("vscale_range"),
     StringLiteral("frame-pointer"),
     StringLiteral("target-features"),
@@ -1722,12 +1724,15 @@ void ModuleImport::processFunctionAttributes(llvm::Function *func,
   else if (func->hasFnAttribute("aarch64_pstate_sm_compatible"))
     funcOp.setArmStreamingCompatible(true);
 
-  if (func->hasFnAttribute("aarch64_pstate_za_new"))
+  if (func->hasFnAttribute("aarch64_new_za"))
     funcOp.setArmNewZa(true);
-  else if (func->hasFnAttribute("aarch64_pstate_za_shared"))
-    funcOp.setArmSharedZa(true);
-  // PreservedZA can be used with either NewZA or SharedZA.
-  if (func->hasFnAttribute("aarch64_pstate_za_preserved"))
+  else if (func->hasFnAttribute("aarch64_in_za"))
+    funcOp.setArmInZa(true);
+  else if (func->hasFnAttribute("aarch64_out_za"))
+    funcOp.setArmOutZa(true);
+  else if (func->hasFnAttribute("aarch64_inout_za"))
+    funcOp.setArmInoutZa(true);
+  else if (func->hasFnAttribute("aarch64_preserves_za"))
     funcOp.setArmPreservesZa(true);
 
   llvm::Attribute attr = func->getFnAttribute(llvm::Attribute::VScaleRange);
diff --git a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
index 6364cacbd1924..a54221580b28b 100644
--- a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
@@ -1198,11 +1198,15 @@ LogicalResult ModuleTranslation::convertOneFunction(LLVMFuncOp func) {
     llvmFunc->addFnAttr("aarch64_pstate_sm_compatible");
 
   if (func.getArmNewZa())
-    llvmFunc->addFnAttr("aarch64_pstate_za_new");
-  else if (func.getArmSharedZa())
-    llvmFunc->addFnAttr("aarch64_pstate_za_shared");
-  if (func.getArmPreservesZa())
-    llvmFunc->addFnAttr("aarch64_pstate_za_preserved");
+    llvmFunc->addFnAttr("aarch64_new_za");
+  else if (func.getArmInZa())
+    llvmFunc->addFnAttr("aarch64_in_za");
+  else if (func.getArmOutZa())
+    llvmFunc->addFnAttr("aarch64_out_za");
+  else if (func.getArmInoutZa())
+    llvmFunc->addFnAttr("aarch64_inout_za");
+  else if (func.getArmPreservesZa())
+    llvmFunc->addFnAttr("aarch64_preserves_za");
 
   if (auto targetCpu = func.getTargetCpu())
     llvmFunc->addFnAttr("target-cpu", *targetCpu);
diff --git a/mlir/test/Dialect/ArmSME/enable-arm-za.mlir b/mlir/test/Dialect/ArmSME/enable-arm-za.mlir
index 0aa00f75c3a56..a20203d7e5579 100644
--- a/mlir/test/Dialect/ArmSME/enable-arm-za.mlir
+++ b/mlir/test/Dialect/ArmSME/enable-arm-za.mlir
@@ -1,6 +1,8 @@
 // RUN: mlir-opt %s -enable-arm-streaming=za-mode=new-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=ENABLE-ZA
 // RUN: mlir-opt %s -enable-arm-streaming -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=DISABLE-ZA
-// RUN: mlir-opt %s -enable-arm-streaming=za-mode=shared-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=SHARED-ZA
+// RUN: mlir-opt %s -enable-arm-streaming=za-mode=in-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=IN-ZA
+// RUN: mlir-opt %s -enable-arm-streaming=za-mode=out-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=OUT-ZA
+// RUN: mlir-opt %s -enable-arm-streaming=za-mode=inout-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=INOUT-ZA
 // RUN: mlir-opt %s -enable-arm-streaming=za-mode=preserves-za -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=PRESERVES-ZA
 // RUN: mlir-opt %s -convert-arm-sme-to-llvm | FileCheck %s -check-prefix=NO-ARM-STREAMING
 
@@ -9,8 +11,12 @@ func.func private @declaration()
 
 // ENABLE-ZA-LABEL: @arm_new_za
 // ENABLE-ZA-SAME: attributes {arm_new_za, arm_streaming}
-// SHARED-ZA-LABEL: @arm_new_za
-// SHARED-ZA-SAME: attributes {arm_shared_za, arm_streaming}
+// IN-ZA-LABEL: @arm_new_za
+// IN-ZA-SAME: attributes {arm_in_za, arm_streaming}
+// OUT-ZA-LABEL: @arm_new_za
+// OUT-ZA-SAME: attributes {arm_out_za, arm_streaming}
+// INOUT-ZA-LABEL: @arm_new_za
+// INOUT-ZA-SAME: attributes {arm_inout_za, arm_streaming}
 // PRESERVES-ZA-LABEL: @arm_new_za
 // PRESERVES-ZA-SAME: attributes {arm_preserves_za, arm_streaming}
 // DISABLE-ZA-LABEL: @arm_new_za
@@ -19,6 +25,8 @@ func.func private @declaration()
 // NO-ARM-STREAMING-LABEL: @arm_new_za
 // NO-ARM-STREAMING-NOT: arm_new_za
 // NO-ARM-STREAMING-NOT: arm_streaming
-// NO-ARM-STREAMING-NOT: arm_shared_za
+// NO-ARM-STREAMING-NOT: arm_in_za
+// NO-ARM-STREAMING-NOT: arm_out_za
+// NO-ARM-STREAMING-NOT: arm_inout_za
 // NO-ARM-STREAMING-NOT: arm_preserves_za
 func.func @arm_new_za() { return }
diff --git a/mlir/test/Target/LLVMIR/Import/function-attributes.ll b/mlir/test/Target/LLVMIR/Import/function-attributes.ll
index c46db5e346434..f5fb06df49487 100644
--- a/mlir/test/Target/LLVMIR/Import/function-attributes.ll
+++ b/mlir/test/Target/LLVMIR/Import/function-attributes.ll
@@ -222,20 +222,32 @@ define void @streaming_compatible_func() "aarch64_pstate_sm_compatible" {
 
 ; CHECK-LABEL: @arm_new_za_func
 ; CHECK-SAME: attributes {arm_new_za}
-define void @arm_new_za_func() "aarch64_pstate_za_new" {
+define void @arm_new_za_func() "aarch64_new_za" {
   ret void
 }
 
 
-; CHECK-LABEL: @arm_preserves_za_func
-; CHECK-SAME: attributes {arm_preserves_za}
-define void @arm_preserves_za_func() "aarch64_pstate_za_preserved" {
+; CHECK-LABEL: @arm_in_za_func
+; CHECK-SAME: attributes {arm_in_za}
+define void @arm_in_za_func() "aarch64_in_za" {
+  ret void
+}
+
+; CHECK-LABEL: @arm_out_za_func
+; CHECK-SAME: attributes {arm_out_za}
+define void @arm_out_za_func() "aarch64_out_za" {
   ret void
 }
 
-; CHECK-LABEL: @arm_shared_za_func
-; CHECK-SAME: attributes {arm_shared_za}
-define void @arm_shared_za_func() "aarch64_pstate_za_shared" {
+; CHECK-LABEL: @arm_inout_za_func
+; CHECK-SAME: attributes {arm_inout_za}
+define void @arm_inout_za_func() "aarch64_inout_za" {
+  ret void
+}
+
+; CHECK-LABEL: @arm_preserves_za_func
+; CHECK-SAME: attributes {arm_preserves_za}
+define void @arm_preserves_za_func() "aarch64_preserves_za" {
   ret void
 }
 
diff --git a/mlir/test/Target/LLVMIR/llvmir.mlir b/mlir/test/Target/LLVMIR/llvmir.mlir
index 448aa3a5d85d7..63774bf0baf68 100644
--- a/mlir/test/Target/LLVMIR/llvmir.mlir
+++ b/mlir/test/Target/LLVMIR/llvmir.mlir
@@ -2358,21 +2358,35 @@ llvm.func @streaming_compatible_func() attributes {arm_streaming_compatible} {
 llvm.func @new_za_func() attributes {arm_new_za} {
   llvm.return
 }
-// CHECK #[[ATTR]] = { "aarch64_pstate_za_new" }
+// CHECK #[[ATTR]] = { "aarch64_new_za" }
 
-// CHECK-LABEL: @shared_za_func
+// CHECK-LABEL: @in_za_func
 // CHECK: #[[ATTR:[0-9]*]]
-llvm.func @shared_za_func() attributes {arm_shared_za } {
+llvm.func @in_za_func() attributes {arm_in_za } {
   llvm.return
 }
-// CHECK #[[ATTR]] = { "aarch64_pstate_za_shared" }
+// CHECK #[[ATTR]] = { "aarch64_in_za" }
+
+// CHECK-LABEL: @out_za_func
+// CHECK: #[[ATTR:[0-9]*]]
+llvm.func @out_za_func() attributes {arm_out_za } {
+  llvm.return
+}
+// CHECK #[[ATTR]] = { "aarch64_out_za" }
+
+// CHECK-LABEL: @inout_za_func
+// CHECK: #[[ATTR:[0-9]*]]
+llvm.func @inout_za_func() attributes {arm_inout_za } {
+  llvm.return
+}
+// CHECK #[[ATTR]] = { "aarch64_inout_za" }
 
 // CHECK-LABEL: @preserves_za_func
 // CHECK: #[[ATTR:[0-9]*]]
 llvm.func @preserves_za_func() attributes {arm_preserves_za} {
   llvm.return
 }
-// CHECK #[[ATTR]] = { "aarch64_pstate_za_preserved" }
+// CHECK #[[ATTR]] = { "aarch64_preserves_za" }
 
 // -----
 

>From 49f938896dc4135b3f8f33064b3142cafff37820 Mon Sep 17 00:00:00 2001
From: David Green <david.green at arm.com>
Date: Thu, 1 Feb 2024 13:42:14 +0000
Subject: [PATCH 12/42] [AArch64] Alter latency of FCSEL under Cortex-A510
 (#80178)

As per the Cortex-A510 software optimization guide, the latency of an
fcsel should be 3, not 4. Previously it took its latency from
WriteF.
---
 llvm/lib/Target/AArch64/AArch64SchedA510.td   |   2 +
 llvm/test/CodeGen/AArch64/select_fmf.ll       |  16 +-
 llvm/test/CodeGen/AArch64/tbl-loops.ll        |   8 +-
 .../AArch64/vecreduce-fmax-legalization.ll    | 140 +++++++++---------
 .../AArch64/vecreduce-fmin-legalization.ll    | 140 +++++++++---------
 .../AArch64/Cortex/A510-basic-instructions.s  |   4 +-
 6 files changed, 156 insertions(+), 154 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index 1b66d6bb8fbd4..5e36b6f4d34a2 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -394,6 +394,8 @@ def : InstRW<[CortexA510WriteFSqrtHP], (instregex "^.*SQRT.*16$")>;
 def : InstRW<[CortexA510WriteFSqrtSP], (instregex "^.*SQRT.*32$")>;
 def : InstRW<[CortexA510WriteFSqrtDP], (instregex "^.*SQRT.*64$")>;
 
+def : InstRW<[CortexA510WriteFPALU_F3], (instrs FCSELHrrr, FCSELSrrr, FCSELDrrr)>;
+
 // 4.15. Advanced SIMD integer instructions
 // ASIMD absolute diff
 def : InstRW<[CortexA510Write<3, CortexA510UnitVALU>], (instregex "[SU]ABDv(2i32|4i16|8i8)")>;
diff --git a/llvm/test/CodeGen/AArch64/select_fmf.ll b/llvm/test/CodeGen/AArch64/select_fmf.ll
index 5479e5f3b88d2..92d8676ca04be 100644
--- a/llvm/test/CodeGen/AArch64/select_fmf.ll
+++ b/llvm/test/CodeGen/AArch64/select_fmf.ll
@@ -9,11 +9,11 @@ define float @select_select_fold_select_and(float %w, float %x, float %y, float
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    fminnm s4, s1, s2
 ; CHECK-NEXT:    fcmp s1, s2
-; CHECK-NEXT:    fmaxnm s1, s0, s3
+; CHECK-NEXT:    fmaxnm s2, s0, s3
+; CHECK-NEXT:    fmov s1, #0.50000000
 ; CHECK-NEXT:    fccmp s4, s0, #4, lt
-; CHECK-NEXT:    fmov s4, #0.50000000
-; CHECK-NEXT:    fcsel s2, s1, s0, gt
-; CHECK-NEXT:    fadd s1, s0, s4
+; CHECK-NEXT:    fadd s1, s0, s1
+; CHECK-NEXT:    fcsel s2, s2, s0, gt
 ; CHECK-NEXT:    fadd s4, s1, s2
 ; CHECK-NEXT:    fcmp s4, s1
 ; CHECK-NEXT:    b.le .LBB0_2
@@ -67,11 +67,11 @@ define float @select_select_fold_select_or(float %w, float %x, float %y, float %
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    fminnm s4, s1, s2
 ; CHECK-NEXT:    fcmp s1, s2
-; CHECK-NEXT:    fmaxnm s1, s0, s3
+; CHECK-NEXT:    fmaxnm s2, s0, s3
+; CHECK-NEXT:    fmov s1, #0.50000000
 ; CHECK-NEXT:    fccmp s4, s0, #0, ge
-; CHECK-NEXT:    fmov s4, #0.50000000
-; CHECK-NEXT:    fcsel s2, s0, s1, gt
-; CHECK-NEXT:    fadd s1, s0, s4
+; CHECK-NEXT:    fadd s1, s0, s1
+; CHECK-NEXT:    fcsel s2, s0, s2, gt
 ; CHECK-NEXT:    fadd s4, s1, s2
 ; CHECK-NEXT:    fcmp s4, s1
 ; CHECK-NEXT:    b.le .LBB1_2
diff --git a/llvm/test/CodeGen/AArch64/tbl-loops.ll b/llvm/test/CodeGen/AArch64/tbl-loops.ll
index 365fe03ab0b08..4f8a4f7aede3e 100644
--- a/llvm/test/CodeGen/AArch64/tbl-loops.ll
+++ b/llvm/test/CodeGen/AArch64/tbl-loops.ll
@@ -562,25 +562,25 @@ define void @loop4(ptr noalias nocapture noundef writeonly %dst, ptr nocapture n
 ; CHECK-NEXT:    fcmp s3, s1
 ; CHECK-NEXT:    fcsel s4, s1, s3, gt
 ; CHECK-NEXT:    fcmp s3, #0.0
-; CHECK-NEXT:    ldp s3, s5, [x8, #8]
 ; CHECK-NEXT:    fcvtzs w11, s2
+; CHECK-NEXT:    ldp s3, s5, [x8, #8]
 ; CHECK-NEXT:    add x8, x8, #16
 ; CHECK-NEXT:    fcsel s4, s0, s4, mi
 ; CHECK-NEXT:    fcmp s3, s1
 ; CHECK-NEXT:    strb w11, [x9]
+; CHECK-NEXT:    fcvtzs w12, s4
 ; CHECK-NEXT:    fcsel s6, s1, s3, gt
 ; CHECK-NEXT:    fcmp s3, #0.0
-; CHECK-NEXT:    fcvtzs w12, s4
 ; CHECK-NEXT:    fcsel s3, s0, s6, mi
 ; CHECK-NEXT:    fcmp s5, s1
 ; CHECK-NEXT:    strb w12, [x9, #1]
 ; CHECK-NEXT:    fcsel s6, s1, s5, gt
 ; CHECK-NEXT:    fcmp s5, #0.0
 ; CHECK-NEXT:    fcvtzs w13, s3
-; CHECK-NEXT:    fcsel s5, s0, s6, mi
+; CHECK-NEXT:    fcsel s2, s0, s6, mi
 ; CHECK-NEXT:    subs w10, w10, #1
 ; CHECK-NEXT:    strb w13, [x9, #2]
-; CHECK-NEXT:    fcvtzs w14, s5
+; CHECK-NEXT:    fcvtzs w14, s2
 ; CHECK-NEXT:    strb w14, [x9, #3]
 ; CHECK-NEXT:    add x9, x9, #4
 ; CHECK-NEXT:    b.ne .LBB3_6
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
index 4f1e3fdc34fcd..16b34cce93293 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
@@ -242,34 +242,34 @@ define half @test_v16f16(<16 x half> %a) nounwind {
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, gt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[3]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[3]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fmaxnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, gt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[4]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[4]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fmaxnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, gt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[5]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[5]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fmaxnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, gt
@@ -277,24 +277,24 @@ define half @test_v16f16(<16 x half> %a) nounwind {
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[6]
 ; CHECK-NOFP-SD-NEXT:    mov h1, v1.h[7]
 ; CHECK-NOFP-SD-NEXT:    mov h0, v0.h[7]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
 ; CHECK-NOFP-SD-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-SD-NEXT:    fcvt s0, h0
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fmaxnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, gt
 ; CHECK-NOFP-SD-NEXT:    fcmp s0, s1
+; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s0, s0, s1, gt
 ; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
-; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-SD-NEXT:    fcvt h0, s0
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-SD-NEXT:    fmaxnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcvt h1, s2
@@ -420,6 +420,7 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcvt s17, h17
 ; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-NEXT:    fcmp s1, s16
 ; CHECK-NOFP-NEXT:    fcsel s1, s1, s16, gt
 ; CHECK-NOFP-NEXT:    fcmp s0, s17
@@ -427,8 +428,8 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcsel s0, s0, s17, gt
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
+; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
@@ -436,50 +437,49 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    ldr h2, [x8, :lo12:.LCPI14_0]
 ; CHECK-NOFP-NEXT:    mov w8, #-8388608 // =0xff800000
 ; CHECK-NOFP-NEXT:    fcvt s2, h2
-; CHECK-NOFP-NEXT:    fmov s16, w8
-; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcmp s3, s2
-; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
+; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h4
+; CHECK-NOFP-NEXT:    fmov s1, w8
+; CHECK-NOFP-NEXT:    fcsel s3, s3, s1, gt
+; CHECK-NOFP-NEXT:    fcmp s4, s2
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h5
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h5
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h6
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h6
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h7
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h7
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcsel s1, s4, s1, gt
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
+; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    ret
@@ -527,6 +527,7 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcvt s17, h17
 ; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-NEXT:    fcmp s1, s16
 ; CHECK-NOFP-NEXT:    fcsel s1, s1, s16, gt
 ; CHECK-NOFP-NEXT:    fcmp s0, s17
@@ -534,8 +535,8 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcsel s0, s0, s17, gt
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
+; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
@@ -544,50 +545,49 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    mov w8, #57344 // =0xe000
 ; CHECK-NOFP-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-NEXT:    movk w8, #51071, lsl #16
-; CHECK-NOFP-NEXT:    fmov s16, w8
-; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcmp s3, s2
-; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
+; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h4
+; CHECK-NOFP-NEXT:    fmov s1, w8
+; CHECK-NOFP-NEXT:    fcsel s3, s3, s1, gt
+; CHECK-NOFP-NEXT:    fcmp s4, s2
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h5
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h5
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h6
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h6
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
-; CHECK-NOFP-NEXT:    fcvt s3, h7
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, gt
+; CHECK-NOFP-NEXT:    fcvt s4, h7
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcsel s1, s4, s1, gt
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
+; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s3
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, gt
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fmaxnm s0, s0, s1
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    ret
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll b/llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
index a2bfc3c438da3..497109dfeaf09 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
@@ -242,34 +242,34 @@ define half @test_v16f16(<16 x half> %a) nounwind {
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, lt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[3]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[3]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fminnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, lt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[4]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[4]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fminnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, lt
 ; CHECK-NOFP-SD-NEXT:    mov h4, v1.h[5]
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[5]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fminnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, lt
@@ -277,24 +277,24 @@ define half @test_v16f16(<16 x half> %a) nounwind {
 ; CHECK-NOFP-SD-NEXT:    mov h5, v0.h[6]
 ; CHECK-NOFP-SD-NEXT:    mov h1, v1.h[7]
 ; CHECK-NOFP-SD-NEXT:    mov h0, v0.h[7]
-; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
+; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
 ; CHECK-NOFP-SD-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-SD-NEXT:    fcvt s5, h5
 ; CHECK-NOFP-SD-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-SD-NEXT:    fcvt s0, h0
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcmp s5, s4
 ; CHECK-NOFP-SD-NEXT:    fminnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s3, s5, s4, lt
 ; CHECK-NOFP-SD-NEXT:    fcmp s0, s1
+; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-SD-NEXT:    fcsel s0, s0, s1, lt
 ; CHECK-NOFP-SD-NEXT:    fcvt h2, s2
-; CHECK-NOFP-SD-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-SD-NEXT:    fcvt h0, s0
-; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s3, h3
+; CHECK-NOFP-SD-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-SD-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-SD-NEXT:    fminnm s2, s2, s3
 ; CHECK-NOFP-SD-NEXT:    fcvt h1, s2
@@ -420,6 +420,7 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcvt s17, h17
 ; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-NEXT:    fcmp s1, s16
 ; CHECK-NOFP-NEXT:    fcsel s1, s1, s16, lt
 ; CHECK-NOFP-NEXT:    fcmp s0, s17
@@ -427,8 +428,8 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcsel s0, s0, s17, lt
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
+; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
@@ -436,50 +437,49 @@ define half @test_v11f16(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    ldr h2, [x8, :lo12:.LCPI14_0]
 ; CHECK-NOFP-NEXT:    mov w8, #2139095040 // =0x7f800000
 ; CHECK-NOFP-NEXT:    fcvt s2, h2
-; CHECK-NOFP-NEXT:    fmov s16, w8
-; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcmp s3, s2
-; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
+; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h4
+; CHECK-NOFP-NEXT:    fmov s1, w8
+; CHECK-NOFP-NEXT:    fcsel s3, s3, s1, lt
+; CHECK-NOFP-NEXT:    fcmp s4, s2
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h5
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h5
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h6
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h6
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h7
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h7
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcsel s1, s4, s1, lt
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
+; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    ret
@@ -527,6 +527,7 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcvt s17, h17
 ; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcvt s4, h4
 ; CHECK-NOFP-NEXT:    fcmp s1, s16
 ; CHECK-NOFP-NEXT:    fcsel s1, s1, s16, lt
 ; CHECK-NOFP-NEXT:    fcmp s0, s17
@@ -534,8 +535,8 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    fcvt s16, h16
 ; CHECK-NOFP-NEXT:    fcsel s0, s0, s17, lt
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
+; CHECK-NOFP-NEXT:    fcmp s2, s16
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
@@ -544,50 +545,49 @@ define half @test_v11f16_ninf(<11 x half> %a) nounwind {
 ; CHECK-NOFP-NEXT:    mov w8, #57344 // =0xe000
 ; CHECK-NOFP-NEXT:    fcvt s2, h2
 ; CHECK-NOFP-NEXT:    movk w8, #18303, lsl #16
-; CHECK-NOFP-NEXT:    fmov s16, w8
-; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    fcmp s3, s2
-; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
+; CHECK-NOFP-NEXT:    fcvt s0, h0
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h4
+; CHECK-NOFP-NEXT:    fmov s1, w8
+; CHECK-NOFP-NEXT:    fcsel s3, s3, s1, lt
+; CHECK-NOFP-NEXT:    fcmp s4, s2
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h5
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h5
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h6
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h6
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
-; CHECK-NOFP-NEXT:    fcvt s3, h7
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
+; CHECK-NOFP-NEXT:    fcsel s3, s4, s1, lt
+; CHECK-NOFP-NEXT:    fcvt s4, h7
+; CHECK-NOFP-NEXT:    fcvt h3, s3
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
-; CHECK-NOFP-NEXT:    fcmp s3, s2
+; CHECK-NOFP-NEXT:    fcmp s4, s2
+; CHECK-NOFP-NEXT:    fcvt s3, h3
+; CHECK-NOFP-NEXT:    fcsel s1, s4, s1, lt
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
+; CHECK-NOFP-NEXT:    fcvt h1, s1
+; CHECK-NOFP-NEXT:    fminnm s0, s0, s3
 ; CHECK-NOFP-NEXT:    fcvt s1, h1
-; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
-; CHECK-NOFP-NEXT:    fcsel s1, s3, s16, lt
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
-; CHECK-NOFP-NEXT:    fcvt h1, s1
 ; CHECK-NOFP-NEXT:    fcvt s0, h0
-; CHECK-NOFP-NEXT:    fcvt s1, h1
 ; CHECK-NOFP-NEXT:    fminnm s0, s0, s1
 ; CHECK-NOFP-NEXT:    fcvt h0, s0
 ; CHECK-NOFP-NEXT:    ret
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/A510-basic-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/A510-basic-instructions.s
index 1edb0b8ad21e0..8a5df91ad7973 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Cortex/A510-basic-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/A510-basic-instructions.s
@@ -1886,8 +1886,8 @@ drps
 # CHECK-NEXT:  1      3     0.50                        fccmpe	d9, d31, #0, le
 # CHECK-NEXT:  1      3     0.50                        fccmpe	d3, d0, #15, gt
 # CHECK-NEXT:  1      3     0.50                        fccmpe	d31, d5, #7, ne
-# CHECK-NEXT:  1      4     0.50                        fcsel	s3, s20, s9, pl
-# CHECK-NEXT:  1      4     0.50                        fcsel	d9, d10, d11, mi
+# CHECK-NEXT:  1      3     0.50                        fcsel	s3, s20, s9, pl
+# CHECK-NEXT:  1      3     0.50                        fcsel	d9, d10, d11, mi
 # CHECK-NEXT:  1      4     0.50                        fmov	s0, s1
 # CHECK-NEXT:  1      4     0.50                        fabs	s2, s3
 # CHECK-NEXT:  1      4     0.50                        fneg	s4, s5

>From ce7e896c1fe035b9a3578a9688c2998f56d1f640 Mon Sep 17 00:00:00 2001
From: Hsiangkai Wang <hsiangkai.wang at arm.com>
Date: Thu, 1 Feb 2024 13:57:31 +0000
Subject: [PATCH 13/42] [mlir][scf] Considering defining operators of indices
 when fusing scf::ParallelOp (#80145)

When checking whether the load indices of the second loop coincide with
the store indices of the first loop, the fusion only compares the index
values themselves. However, in some cases the index values are defined
by other operators. In these cases, they are treated as different even
when the results of the defining operators are the same.

We already check that the iteration space is the same in isFusionLegal().
When checking the operands of the defining operators, we only need to
check that the operands come from the same induction variables. If so,
we know the results of the defining operators are the same.
---
 .../SCF/Transforms/ParallelLoopFusion.cpp     | 27 +++++-
 .../Dialect/SCF/parallel-loop-fusion.mlir     | 95 +++++++++++++++++++
 2 files changed, 120 insertions(+), 2 deletions(-)
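
The index check described in the commit message can be modelled outside
of MLIR. The sketch below is a toy illustration under stated assumptions,
not the code in this patch: `Value`, `IndexMap`, `lookupOrDefault` and
`indicesCoincide` are hypothetical stand-ins, every defining op is assumed
to be side-effect free, and op equivalence is reduced to comparing the op
name and operands (the real OperationEquivalence::isEquivalentTo compares
more than that, such as attributes and result types).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an SSA value. An empty opName means the value
// has no defining op (for example, a loop induction variable or a function
// argument). Identity is the object's address.
struct Value {
  std::string opName;
  std::vector<const Value *> operands;
};

// Maps induction variables of the first loop to those of the second loop.
using IndexMap = std::map<const Value *, const Value *>;

// Returns the mapped value, or the value itself when it is not in the map.
static const Value *lookupOrDefault(const IndexMap &map, const Value *v) {
  auto it = map.find(v);
  return it == map.end() ? v : it->second;
}

// Returns true when a store index of the first loop and a load index of the
// second loop can be treated as the same index.
bool indicesCoincide(const IndexMap &firstToSecond, const Value *storeIdx,
                     const Value *loadIdx) {
  assert(storeIdx && loadIdx && "indices must be non-null");

  // Original check: the values match directly after remapping.
  if (lookupOrDefault(firstToSecond, storeIdx) == loadIdx)
    return true;

  // Extended check: both indices must be produced by a defining op.
  if (storeIdx->opName.empty() || loadIdx->opName.empty())
    return false;

  // The defining ops must be the same kind with the same operand count.
  if (storeIdx->opName != loadIdx->opName ||
      storeIdx->operands.size() != loadIdx->operands.size())
    return false;

  // Operands are compared one level deep, after remapping both sides.
  for (std::size_t i = 0; i < storeIdx->operands.size(); ++i) {
    if (lookupOrDefault(firstToSecond, storeIdx->operands[i]) !=
        lookupOrDefault(firstToSecond, loadIdx->operands[i]))
      return false;
  }
  return true;
}
```

With the induction variables of the first loop mapped to those of the
second, two affine.apply-like values over (%i, %j) coincide in this model,
while values over (%i, %OffsetA) and (%i, %OffsetB) do not, which mirrors
the two tests added to parallel-loop-fusion.mlir.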

diff --git a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
index 8f2ab5f5e6dc1..d3dca1427e517 100644
--- a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
@@ -19,6 +19,7 @@
 #include "mlir/IR/Builders.h"
 #include "mlir/IR/IRMapping.h"
 #include "mlir/IR/OpDefinition.h"
+#include "mlir/IR/OperationSupport.h"
 #include "mlir/Interfaces/SideEffectInterfaces.h"
 
 namespace mlir {
@@ -102,8 +103,30 @@ static bool haveNoReadsAfterWriteExceptSameIndex(
       return WalkResult::interrupt();
     for (int i = 0, e = storeIndices.size(); i < e; ++i) {
       if (firstToSecondPloopIndices.lookupOrDefault(storeIndices[i]) !=
-          loadIndices[i])
-        return WalkResult::interrupt();
+          loadIndices[i]) {
+        auto *storeIndexDefOp = storeIndices[i].getDefiningOp();
+        auto *loadIndexDefOp = loadIndices[i].getDefiningOp();
+        if (storeIndexDefOp && loadIndexDefOp) {
+          if (!isMemoryEffectFree(storeIndexDefOp))
+            return WalkResult::interrupt();
+          if (!isMemoryEffectFree(loadIndexDefOp))
+            return WalkResult::interrupt();
+          if (!OperationEquivalence::isEquivalentTo(
+                  storeIndexDefOp, loadIndexDefOp,
+                  [&](Value storeIndex, Value loadIndex) {
+                    if (firstToSecondPloopIndices.lookupOrDefault(storeIndex) !=
+                        firstToSecondPloopIndices.lookupOrDefault(loadIndex))
+                      return failure();
+                    else
+                      return success();
+                  },
+                  /*markEquivalent=*/nullptr,
+                  OperationEquivalence::Flags::IgnoreLocations)) {
+            return WalkResult::interrupt();
+          }
+        } else
+          return WalkResult::interrupt();
+      }
     }
     return WalkResult::advance();
   });
diff --git a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
index 110168ba6eca5..9c136bb635658 100644
--- a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
+++ b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
@@ -480,3 +480,98 @@ func.func @do_not_fuse_multiple_stores_on_diff_indices(
 // CHECK:        scf.reduce
 // CHECK:      }
 // CHECK:      memref.dealloc [[SUM]]
+
+// -----
+
+func.func @fuse_same_indices_by_affine_apply(
+  %A: memref<2x2xf32>, %B: memref<2x2xf32>) {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %c2 = arith.constant 2 : index
+  %sum = memref.alloc()  : memref<2x3xf32>
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%i, %j)
+    memref.store %B_elem, %sum[%i, %1] : memref<2x3xf32>
+    scf.reduce
+  }
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%i, %j)
+    %sum_elem = memref.load %sum[%i, %1] : memref<2x3xf32>
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    %product = arith.mulf %sum_elem, %A_elem : f32
+    memref.store %product, %B[%i, %j] : memref<2x2xf32>
+    scf.reduce
+  }
+  memref.dealloc %sum : memref<2x3xf32>
+  return
+}
+// CHECK:      #[[$MAP:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
+// CHECK-LABEL: fuse_same_indices_by_affine_apply
+// CHECK-SAME:  (%[[ARG0:.*]]: memref<2x2xf32>, %[[ARG1:.*]]: memref<2x2xf32>) {
+// CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
+// CHECK-DAG:   %[[C1:.*]] = arith.constant 1 : index
+// CHECK-DAG:   %[[C2:.*]] = arith.constant 2 : index
+// CHECK:       %[[ALLOC:.*]] = memref.alloc() : memref<2x3xf32>
+// CHECK-NEXT:  scf.parallel (%[[ARG2:.*]], %[[ARG3:.*]]) = (%[[C0]], %[[C0]]) to (%[[C2]], %[[C2]]) step (%[[C1]], %[[C1]]) {
+// CHECK-NEXT:    %[[S0:.*]] = memref.load %[[ARG1]][%[[ARG2]], %[[ARG3]]] : memref<2x2xf32>
+// CHECK-NEXT:    %[[S1:.*]] = affine.apply #[[$MAP]](%[[ARG2]], %[[ARG3]])
+// CHECK-NEXT:    memref.store %[[S0]], %[[ALLOC]][%[[ARG2]], %[[S1]]] : memref<2x3xf32>
+// CHECK-NEXT:    %[[S2:.*]] = affine.apply #[[$MAP]](%[[ARG2]], %[[ARG3]])
+// CHECK-NEXT:    %[[S3:.*]] = memref.load %[[ALLOC]][%[[ARG2]], %[[S2]]] : memref<2x3xf32>
+// CHECK-NEXT:    %[[S4:.*]] = memref.load %[[ARG0]][%[[ARG2]], %[[ARG3]]] : memref<2x2xf32>
+// CHECK-NEXT:    %[[S5:.*]] = arith.mulf %[[S3]], %[[S4]] : f32
+// CHECK-NEXT:    memref.store %[[S5]], %[[ARG1]][%[[ARG2]], %[[ARG3]]] : memref<2x2xf32>
+// CHECK-NEXT:    scf.reduce
+// CHECK-NEXT:  }
+// CHECK-NEXT:  memref.dealloc %[[ALLOC]] : memref<2x3xf32>
+// CHECK-NEXT:  return
+
+// -----
+
+func.func @do_not_fuse_affine_apply_to_non_ind_var(
+  %A: memref<2x2xf32>, %B: memref<2x2xf32>, %OffsetA: index, %OffsetB: index) {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %c2 = arith.constant 2 : index
+  %sum = memref.alloc()  : memref<2x3xf32>
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%i, %OffsetA)
+    memref.store %B_elem, %sum[%i, %1] : memref<2x3xf32>
+    scf.reduce
+  }
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    %1 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%i, %OffsetB)
+    %sum_elem = memref.load %sum[%i, %1] : memref<2x3xf32>
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    %product = arith.mulf %sum_elem, %A_elem : f32
+    memref.store %product, %B[%i, %j] : memref<2x2xf32>
+    scf.reduce
+  }
+  memref.dealloc %sum : memref<2x3xf32>
+  return
+}
+// CHECK:       #[[$MAP:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
+// CHECK-LABEL: do_not_fuse_affine_apply_to_non_ind_var
+// CHECK-SAME:  (%[[ARG0:.*]]: memref<2x2xf32>, %[[ARG1:.*]]: memref<2x2xf32>, %[[ARG2:.*]]: index, %[[ARG3:.*]]: index) {
+// CHECK-DAG:     %[[C0:.*]] = arith.constant 0 : index
+// CHECK-DAG:     %[[C1:.*]] = arith.constant 1 : index
+// CHECK-DAG:     %[[C2:.*]] = arith.constant 2 : index
+// CHECK:         %[[ALLOC:.*]] = memref.alloc() : memref<2x3xf32>
+// CHECK-NEXT:    scf.parallel (%[[ARG4:.*]], %[[ARG5:.*]]) = (%[[C0]], %[[C0]]) to (%[[C2]], %[[C2]]) step (%[[C1]], %[[C1]]) {
+// CHECK-NEXT:      %[[S0:.*]] = memref.load %[[ARG1]][%[[ARG4]], %[[ARG5]]] : memref<2x2xf32>
+// CHECK-NEXT:      %[[S1:.*]] = affine.apply #[[$MAP]](%[[ARG4]], %[[ARG2]])
+// CHECK-NEXT:      memref.store %[[S0]], %[[ALLOC]][%[[ARG4]], %[[S1]]] : memref<2x3xf32>
+// CHECK-NEXT:      scf.reduce
+// CHECK-NEXT:    }
+// CHECK-NEXT:    scf.parallel (%[[ARG4:.*]], %[[ARG5:.*]]) = (%[[C0]], %[[C0]]) to (%[[C2]], %[[C2]]) step (%[[C1]], %[[C1]]) {
+// CHECK-NEXT:      %[[S0:.*]] = affine.apply #[[$MAP]](%[[ARG4]], %[[ARG3]])
+// CHECK-NEXT:      %[[S1:.*]] = memref.load %[[ALLOC]][%[[ARG4]], %[[S0]]] : memref<2x3xf32>
+// CHECK-NEXT:      %[[S2:.*]] = memref.load %[[ARG0]][%[[ARG4]], %[[ARG5]]] : memref<2x2xf32>
+// CHECK-NEXT:      %[[S3:.*]] = arith.mulf %[[S1]], %[[S2]] : f32
+// CHECK-NEXT:      memref.store %[[S3]], %[[ARG1]][%[[ARG4]], %[[ARG5]]] : memref<2x2xf32>
+// CHECK-NEXT:      scf.reduce
+// CHECK-NEXT:    }
+// CHECK-NEXT:    memref.dealloc %[[ALLOC]] : memref<2x3xf32>
+// CHECK-NEXT:    return

>From 32004c89a91eeeb0cb66654ea7a3a69b678ed999 Mon Sep 17 00:00:00 2001
From: Quentin Dian <dianqk at dianqk.net>
Date: Thu, 1 Feb 2024 22:10:52 +0800
Subject: [PATCH 14/42] [MIRPrinter] Don't print line break when there is no
 instructions (NFC) (#80147)

Per #80143, we can remove the extra line break when there is no
instruction.
---
 llvm/lib/CodeGen/MIRPrinter.cpp               |  2 +-
 .../GlobalISel/arm64-irtranslator-switch.ll   | 45 ++++++--------
 .../AArch64/GlobalISel/arm64-pcsections.ll    | 22 -------
 .../AArch64/GlobalISel/legalize-phi.mir       |  1 -
 .../GlobalISel/select-redundant-zext.mir      |  1 -
 .../GlobalISel/select-unreachable-blocks.mir  |  1 -
 .../callbr-asm-outputs-indirect-isel.ll       |  2 -
 ...implicit-def-with-impdef-greedy-assert.mir |  1 -
 ...egalloc-last-chance-recolor-with-split.mir |  1 -
 .../AArch64/tail-dup-redundant-phi.mir        |  2 -
 .../GlobalISel/inline-asm-mismatched-size.ll  |  4 --
 .../AMDGPU/GlobalISel/inst-select-brcond.mir  |  1 -
 .../GlobalISel/inst-select-constant.mir       |  2 -
 .../inst-select-scalar-float-sop2.mir         |  1 -
 .../postlegalizer-combiner-unmerge-undef.mir  |  1 -
 llvm/test/CodeGen/AMDGPU/collapse-endcf.mir   | 11 ----
 .../CodeGen/AMDGPU/dagcombine-fma-crash.ll    |  1 -
 .../CodeGen/AMDGPU/insert-singleuse-vdst.mir  | 20 -------
 .../lower-control-flow-live-intervals.mir     |  2 -
 .../machine-sink-ignorable-exec-use.mir       |  1 -
 ...pt-exec-masking-pre-ra-update-liveness.mir |  3 -
 llvm/test/CodeGen/AMDGPU/optimize-compare.mir | 58 -------------------
 ...ptimize-exec-mask-pre-ra-def-after-use.mir |  1 -
 .../AMDGPU/optimize-exec-masking-pre-ra.mir   |  2 -
 .../ra-inserted-scalar-instructions.mir       |  1 -
 .../ran-out-of-sgprs-allocation-failure.mir   |  1 -
 .../CodeGen/AMDGPU/si-lower-control-flow.mir  |  1 -
 .../AMDGPU/sink-after-control-flow-postra.mir |  8 ---
 llvm/test/CodeGen/AMDGPU/spill-agpr.mir       |  4 --
 llvm/test/CodeGen/AMDGPU/tail-dup-bundle.mir  |  1 -
 llvm/test/CodeGen/AMDGPU/wqm-terminators.mir  |  1 -
 llvm/test/CodeGen/ARM/cmpxchg.mir             |  2 -
 .../CodeGen/ARM/machine-outliner-noreturn.mir |  1 -
 .../MIR/X86/unreachable-block-print.mir       |  1 -
 .../GlobalISel/instruction-select/phi.mir     |  8 ---
 .../legalizer/jump_table_and_brjt.mir         |  1 -
 .../CodeGen/PowerPC/branch_coalescing.mir     |  1 -
 .../CodeGen/PowerPC/machine-cse-rm-pre.mir    |  1 -
 llvm/test/CodeGen/PowerPC/nofpexcept.ll       |  1 -
 .../legalizer/legalize-phi-rv32.mir           |  8 ---
 .../legalizer/legalize-phi-rv64.mir           |  9 ---
 .../test/CodeGen/RISCV/float-select-verify.ll |  2 -
 llvm/test/CodeGen/Thumb2/cmpxchg.mir          |  2 -
 .../CodeGen/X86/GlobalISel/legalize-phi.mir   |  6 --
 .../CodeGen/X86/GlobalISel/select-phi.mir     |  6 --
 .../X86/branchfolding-landingpad-cfg.mir      |  1 -
 ...-remat-with-undef-implicit-def-operand.mir |  2 -
 llvm/test/CodeGen/X86/cse-two-preds.mir       |  2 -
 ...tatepoint-invoke-ra-remove-back-copies.mir |  1 -
 llvm/test/CodeGen/X86/tail-dup-asm-goto.ll    |  1 -
 50 files changed, 20 insertions(+), 239 deletions(-)
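
The effect of the one-line MIRPrinter.cpp change can be modelled with a
toy printer. This is a minimal sketch and not the MIRPrinter code itself:
`Block` and `printBlock` are hypothetical names, and the block attributes
(successors, liveins) are simplified to plain strings. The only point it
illustrates is that the separator line after the attributes is emitted
only when the block also contains instructions.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a machine basic block for printing.
struct Block {
  std::string label;                     // e.g. "bb.2.odd"
  std::vector<std::string> attributes;   // e.g. "successors:", "liveins: $x0"
  std::vector<std::string> instructions; // e.g. "G_BR %bb.5"
};

// Prints the block header, its attribute lines, and its instructions.
// A blank separator line follows the attributes only if there is at least
// one instruction, mirroring `HasLineAttributes && !MBB.empty()`.
std::string printBlock(const Block &block) {
  std::string out = block.label + ":\n";
  for (const std::string &attr : block.attributes)
    out += "  " + attr + "\n";

  const bool hasLineAttributes = !block.attributes.empty();
  if (hasLineAttributes && !block.instructions.empty())
    out += "\n";

  for (const std::string &inst : block.instructions)
    out += "  " + inst + "\n";
  return out;
}
```

In this model a block with attributes but no instructions no longer ends
with a separator line, so only the blank line printed between blocks
remains. That is consistent with the test updates in this patch, where
one of two consecutive `{{  $}}` check lines is dropped.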

diff --git a/llvm/lib/CodeGen/MIRPrinter.cpp b/llvm/lib/CodeGen/MIRPrinter.cpp
index b1ad035739a0d..4ed44d1c06f48 100644
--- a/llvm/lib/CodeGen/MIRPrinter.cpp
+++ b/llvm/lib/CodeGen/MIRPrinter.cpp
@@ -728,7 +728,7 @@ void MIPrinter::print(const MachineBasicBlock &MBB) {
     HasLineAttributes = true;
   }
 
-  if (HasLineAttributes)
+  if (HasLineAttributes && !MBB.empty())
     OS << "\n";
   bool IsInBundle = false;
   for (const MachineInstr &MI : MBB.instrs()) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
index 9371edd439fe4..476b3c709ffc5 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
@@ -136,9 +136,8 @@ define i32 @test_cfg_remap_multiple_preds(i32 %in) {
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.odd:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.3.next:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3.next:
   ; CHECK-NEXT:   G_BR %bb.5
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4.other:
@@ -1147,25 +1146,20 @@ define void @jt_2_tables_phi_edge_from_second() {
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.if.then:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.3.sw.bb2.i41:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3.sw.bb2.i41:
+  ; CHECK-NEXT:   successors:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4.sw.bb7.i44:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.4.sw.bb7.i44:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5.sw.bb8.i45:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.5.sw.bb8.i45:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.6.sw.bb13.i47:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.6.sw.bb13.i47:
-  ; CHECK:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.7.sw.bb14.i48:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.7.sw.bb14.i48:
   ; CHECK-NEXT:   [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[DEF1]](s32), [[C5]]
   ; CHECK-NEXT:   G_BRCOND [[ICMP5]](s1), %bb.10
   ; CHECK-NEXT:   G_BR %bb.24
@@ -1207,9 +1201,8 @@ define void @jt_2_tables_phi_edge_from_second() {
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.8.sw.default.i49:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK: {{  $}}
-  ; CHECK: bb.9.sw.bb1.i:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.9.sw.bb1.i:
   ; CHECK-NEXT:   G_BR %bb.16
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.10.sw.bb4.i:
@@ -1237,8 +1230,8 @@ define void @jt_2_tables_phi_edge_from_second() {
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.17.while.body:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK:   ADJCALLSTACKDOWN 0, 0, implicit-def $sp, implicit $sp
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $sp, implicit $sp
   ; CHECK-NEXT:   BL @jt_2_tables_phi_edge_from_second, csr_aarch64_aapcs, implicit-def $lr, implicit $sp
   ; CHECK-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $sp, implicit $sp
   ; CHECK-NEXT: {{  $}}
@@ -1463,8 +1456,8 @@ define i1 @i1_value_cmp_is_signed(i1) {
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.BadValue:
   ; CHECK-NEXT:   successors:
-  ; CHECK: {{  $}}
-  ; CHECK:   ADJCALLSTACKDOWN 0, 0, implicit-def $sp, implicit $sp
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $sp, implicit $sp
   ; CHECK-NEXT:   BL @bar, csr_aarch64_aapcs, implicit-def $lr, implicit $sp
   ; CHECK-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $sp, implicit $sp
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index 5a7bd6ee20f9b..c7f3bcf640e38 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -8,7 +8,6 @@ define i32 @val_compare_and_swap(ptr %p, i32 %cmp, i32 %new) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7c000000), %bb.3(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
@@ -88,7 +87,6 @@ define i32 @val_compare_and_swap_rel(ptr %p, i32 %cmp, i32 %new) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7c000000), %bb.3(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $w2, $x0
@@ -127,7 +125,6 @@ define i64 @val_compare_and_swap_64(ptr %p, i64 %cmp, i64 %new) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7c000000), %bb.3(0x04000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
@@ -166,7 +163,6 @@ define i64 @val_compare_and_swap_64_monotonic_seqcst(ptr %p, i64 %cmp, i64 %new)
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7c000000), %bb.3(0x04000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
@@ -205,7 +201,6 @@ define i64 @val_compare_and_swap_64_release_acquire(ptr %p, i64 %cmp, i64 %new)
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.cmpxchg.start:
   ; CHECK-NEXT:   successors: %bb.2(0x7c000000), %bb.3(0x04000000)
   ; CHECK-NEXT:   liveins: $x0, $x1, $x2
@@ -244,7 +239,6 @@ define i32 @fetch_and_nand(ptr %p) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $x0
@@ -270,7 +264,6 @@ define i64 @fetch_and_nand_64(ptr %p) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $x0
@@ -322,7 +315,6 @@ define i64 @fetch_and_or_64(ptr %p) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $x0
@@ -730,7 +722,6 @@ define i8 @atomicrmw_add_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -780,7 +771,6 @@ define i8 @atomicrmw_sub_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -805,7 +795,6 @@ define i8 @atomicrmw_and_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -830,7 +819,6 @@ define i8 @atomicrmw_or_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -855,7 +843,6 @@ define i8 @atomicrmw_xor_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -880,7 +867,6 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -907,7 +893,6 @@ define i8 @atomicrmw_max_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -990,7 +975,6 @@ define i16 @atomicrmw_add_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1040,7 +1024,6 @@ define i16 @atomicrmw_sub_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1065,7 +1048,6 @@ define i16 @atomicrmw_and_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1090,7 +1072,6 @@ define i16 @atomicrmw_or_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1115,7 +1096,6 @@ define i16 @atomicrmw_xor_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1140,7 +1120,6 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
@@ -1167,7 +1146,6 @@ define i16 @atomicrmw_max_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.atomicrmw.start:
   ; CHECK-NEXT:   successors: %bb.1(0x7c000000), %bb.2(0x04000000)
   ; CHECK-NEXT:   liveins: $w1, $x0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
index 4154ab7039c2f..d8fa456cf7e94 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-phi.mir
@@ -125,7 +125,6 @@ body:             |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(p0) = G_PHI [[COPY]](p0), %bb.0, [[COPY1]](p0), %bb.1
   ; CHECK-NEXT:   $x0 = COPY [[PHI]](p0)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir
index e167fc52dfa06..4e7affc12b092 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-redundant-zext.mir
@@ -158,7 +158,6 @@ body:             |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   %phi:gpr32 = PHI %copy1, %bb.0, %copy2, %bb.1
   ; CHECK-NEXT:   [[ORRWrs:%[0-9]+]]:gpr32 = ORRWrs $wzr, %phi, 0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-unreachable-blocks.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-unreachable-blocks.mir
index 2ef9f84a308dd..70f08d0685611 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-unreachable-blocks.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-unreachable-blocks.mir
@@ -23,7 +23,6 @@ body:             |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll b/llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll
index 89745f4df4cde..3b7b5dd3fa7a5 100644
--- a/llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll
+++ b/llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll
@@ -174,7 +174,6 @@ define i32 @dont_split3() {
   ; CHECK-NEXT: bb.1.x:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.v (machine-block-address-taken, inlineasm-br-indirect-target):
   ; CHECK-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 42
   ; CHECK-NEXT:   $w0 = COPY [[MOVi32imm]]
@@ -424,7 +423,6 @@ define i32 @split_me3() {
   ; CHECK-NEXT: bb.2.y:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3.out:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY1]], %bb.1, [[COPY]], %bb.2
   ; CHECK-NEXT:   $w0 = COPY [[PHI]]
diff --git a/llvm/test/CodeGen/AArch64/implicit-def-with-impdef-greedy-assert.mir b/llvm/test/CodeGen/AArch64/implicit-def-with-impdef-greedy-assert.mir
index c79c951fdc152..e5395b20afd42 100644
--- a/llvm/test/CodeGen/AArch64/implicit-def-with-impdef-greedy-assert.mir
+++ b/llvm/test/CodeGen/AArch64/implicit-def-with-impdef-greedy-assert.mir
@@ -32,7 +32,6 @@ body:             |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.3(0x0fbefbf0), %bb.4(0x70410410)
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AArch64/regalloc-last-chance-recolor-with-split.mir b/llvm/test/CodeGen/AArch64/regalloc-last-chance-recolor-with-split.mir
index 26b801e9587f3..9bd3ad9165cee 100644
--- a/llvm/test/CodeGen/AArch64/regalloc-last-chance-recolor-with-split.mir
+++ b/llvm/test/CodeGen/AArch64/regalloc-last-chance-recolor-with-split.mir
@@ -371,7 +371,6 @@ body:             |
   ; CHECK-NEXT:   successors: %bb.8(0x80000000)
   ; CHECK-NEXT:   liveins: $fp, $w23, $w24, $x10, $x19, $x20, $x22, $x25, $x26, $x27
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.8.bb79:
   ; CHECK-NEXT:   successors: %bb.9(0x04000000), %bb.8(0x7c000000)
   ; CHECK-NEXT:   liveins: $fp, $w23, $w24, $x10, $x19, $x20, $x22, $x25, $x26, $x27
diff --git a/llvm/test/CodeGen/AArch64/tail-dup-redundant-phi.mir b/llvm/test/CodeGen/AArch64/tail-dup-redundant-phi.mir
index dbb4cc31cf806..bc141ff5084ca 100644
--- a/llvm/test/CodeGen/AArch64/tail-dup-redundant-phi.mir
+++ b/llvm/test/CodeGen/AArch64/tail-dup-redundant-phi.mir
@@ -252,7 +252,6 @@ body:             |
   ; CHECK-NEXT: bb.7.bb24:
   ; CHECK-NEXT:   successors: %bb.8(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.8.bb25:
   ; CHECK-NEXT:   successors: %bb.18(0x30000000), %bb.14(0x50000000)
   ; CHECK-NEXT: {{  $}}
@@ -298,7 +297,6 @@ body:             |
   ; CHECK-NEXT: bb.11.bb35:
   ; CHECK-NEXT:   successors:
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.13.bb40:
   ; CHECK-NEXT:   successors:
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
index 3462693df9a47..136c51d775b43 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inline-asm-mismatched-size.ll
@@ -20,7 +20,6 @@ define amdgpu_kernel void @return_type_is_too_big_vector() {
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
   ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11_sgpr12
   %sgpr = call <4 x i32> asm sideeffect "; def $0", "={s[8:12]}" ()
@@ -40,7 +39,6 @@ define i64 @return_type_is_too_big_scalar() {
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
   ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8
   %reg = call i64 asm sideeffect "; def $0", "={v8}" ()
@@ -64,7 +62,6 @@ define ptr addrspace(1) @return_type_is_too_big_pointer() {
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
   ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8
   %reg = call ptr addrspace(1) asm sideeffect "; def $0", "={v8}" ()
@@ -76,7 +73,6 @@ define ptr addrspace(3) @return_type_is_too_small_pointer() {
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1 (%ir-block.0):
   ; CHECK-NEXT:   INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def $vgpr8_vgpr9
   %reg = call ptr addrspace(3) asm sideeffect "; def $0", "={v[8:9]}" ()
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir
index 23c64f7c7c711..ecb07f79e9fd1 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir
@@ -89,7 +89,6 @@ body: |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   bb.0:
     liveins: $sgpr0, $sgpr1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-constant.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-constant.mir
index 9a9045006edd2..390541ae76849 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-constant.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-constant.mir
@@ -523,7 +523,6 @@ body: |
   ; WAVE64-NEXT: bb.1:
   ; WAVE64-NEXT:   successors: %bb.2(0x80000000)
   ; WAVE64-NEXT: {{  $}}
-  ; WAVE64-NEXT: {{  $}}
   ; WAVE64-NEXT: bb.2:
   ;
   ; WAVE32-LABEL: name: zext_sgpr_s1_to_sgpr_s32
@@ -539,7 +538,6 @@ body: |
   ; WAVE32-NEXT: bb.1:
   ; WAVE32-NEXT:   successors: %bb.2(0x80000000)
   ; WAVE32-NEXT: {{  $}}
-  ; WAVE32-NEXT: {{  $}}
   ; WAVE32-NEXT: bb.2:
   bb.0:
     %0:sgpr(s1) = G_CONSTANT i1 true
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-scalar-float-sop2.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-scalar-float-sop2.mir
index 2660c843814d3..dac85561208d4 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-scalar-float-sop2.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-scalar-float-sop2.mir
@@ -212,7 +212,6 @@ body:             |
 
     ; GFX1150-LABEL: name: fmax_f16
     ; GFX1150: liveins: $sgpr0, $sgpr1
-    ; GFX1150-NEXT: {{  $}}
     %0:sgpr(s32) = COPY $sgpr0
     %1:sgpr(s16) = G_TRUNC %0(s32)
     %2:sgpr(s32) = COPY $sgpr1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-unmerge-undef.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-unmerge-undef.mir
index f0aa0b09a9544..4d8f8b0ec8821 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-unmerge-undef.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/postlegalizer-combiner-unmerge-undef.mir
@@ -10,7 +10,6 @@ body: |
     liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $vgpr4_vgpr5
     ; CHECK-LABEL: name: split_unmerge_undef
     ; CHECK: liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $vgpr4_vgpr5
-    ; CHECK-NEXT: {{  $}}
     %ptr1:_(p1) = COPY $vgpr0_vgpr1
     %ptr2:_(p1) = COPY $vgpr2_vgpr3
     %ptr3:_(p1) = COPY $vgpr4_vgpr5
diff --git a/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir b/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
index e921f53ce320d..48ca53732ed06 100644
--- a/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
+++ b/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
@@ -28,7 +28,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   $exec = S_OR_B64 $exec, [[COPY]], implicit-def $scc
   ; GCN-NEXT:   DBG_VALUE
@@ -83,11 +82,9 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.5:
   ; GCN-NEXT:   $exec = S_OR_B64 $exec, [[COPY]], implicit-def $scc
   ; GCN-NEXT:   S_ENDPGM 0
@@ -139,7 +136,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -199,7 +195,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.3:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -263,7 +258,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.3:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -327,7 +321,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.3:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -387,7 +380,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.3:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -561,7 +553,6 @@ body:             |
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.5:
   ; GCN-NEXT:   successors: %bb.6(0x80000000)
   ; GCN-NEXT: {{  $}}
@@ -640,7 +631,6 @@ body:             |
   ; GCN-NEXT: bb.5:
   ; GCN-NEXT:   successors: %bb.6(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.6:
   ; GCN-NEXT:   successors: %bb.4(0x40000000), %bb.0(0x40000000)
   ; GCN-NEXT: {{  $}}
@@ -704,7 +694,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.4(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll b/llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll
index 6d2361f779ddc..09a1f45557608 100644
--- a/llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll
+++ b/llvm/test/CodeGen/AMDGPU/dagcombine-fma-crash.ll
@@ -44,7 +44,6 @@ define void @main(float %arg) {
   ; CHECK-NEXT: bb.3.bb15:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4.bb17:
   ; CHECK-NEXT:   SI_RETURN
 bb:
diff --git a/llvm/test/CodeGen/AMDGPU/insert-singleuse-vdst.mir b/llvm/test/CodeGen/AMDGPU/insert-singleuse-vdst.mir
index 1af008ffa22cb..833699b4656b6 100644
--- a/llvm/test/CodeGen/AMDGPU/insert-singleuse-vdst.mir
+++ b/llvm/test/CodeGen/AMDGPU/insert-singleuse-vdst.mir
@@ -18,7 +18,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr1 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
@@ -43,7 +42,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr4_vgpr5
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0_vgpr1
     $vgpr2_vgpr3 = V_LSHLREV_B64_e64 0, $vgpr0_vgpr1, implicit $exec
@@ -70,7 +68,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0, $vgpr3
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr1 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
@@ -144,7 +141,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -180,7 +176,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr16, $vgpr18
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr3, $vgpr5, $sgpr0, $sgpr2, $sgpr4, $sgpr5, $sgpr16, $sgpr17, $sgpr18, $sgpr19
     $vgpr14 = V_MUL_F32_e32 $sgpr4, $vgpr3, implicit $exec, implicit $mode
@@ -213,7 +208,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -244,7 +238,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -275,7 +268,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr2, $vgpr3
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -310,7 +302,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   liveins: $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -347,7 +338,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   liveins: $vgpr3
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0, $vgpr1
     $vgpr2 = V_MOV_B32_e32 $vgpr1, implicit $exec
@@ -384,7 +374,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0 = V_MOV_B32_e32 $vgpr0, implicit $exec
@@ -416,7 +405,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $sgpr0_sgpr1
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
@@ -443,7 +431,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $sgpr0
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
@@ -470,7 +457,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr0
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $sgpr0
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
@@ -495,7 +481,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1_lo16
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
     $vgpr1_lo16 = V_MOV_B16_t16_e32 $vgpr0_lo16, implicit $exec
@@ -518,7 +503,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1_hi16
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
     $vgpr1_hi16 = V_MOV_B16_t16_e32 $vgpr0_hi16, implicit $exec
@@ -544,7 +528,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
     $vgpr1_lo16 = V_MOV_B16_t16_e32 $vgpr0_lo16, implicit $exec
@@ -568,7 +551,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1_lo16
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
     $vgpr1_lo16 = V_ADD_F16_t16_e32 $vgpr0_lo16, $vgpr0_hi16, implicit $mode, implicit $exec
@@ -592,7 +574,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0_lo16 = V_MOV_B16_t16_e32 0, implicit $exec
@@ -617,7 +598,6 @@ body: |
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   liveins: $vgpr1
-  ; CHECK-NEXT: {{  $}}
   bb.0:
     liveins: $vgpr0
     $vgpr0_hi16 = V_MOV_B16_t16_e32 0, implicit $exec
diff --git a/llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir b/llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir
index f6233ab45c9f8..1679773c945b8 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir
+++ b/llvm/test/CodeGen/AMDGPU/lower-control-flow-live-intervals.mir
@@ -29,7 +29,6 @@ body:             |
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3:
   ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.1(0x40000000)
   ; CHECK-NEXT: {{  $}}
@@ -157,7 +156,6 @@ body:             |
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4:
   ; CHECK-NEXT:   $exec_lo = S_OR_B32 $exec_lo, [[COPY1]], implicit-def $scc
   ; CHECK-NEXT:   S_ENDPGM 0
diff --git a/llvm/test/CodeGen/AMDGPU/machine-sink-ignorable-exec-use.mir b/llvm/test/CodeGen/AMDGPU/machine-sink-ignorable-exec-use.mir
index 5753d5cb61756..3bd113988af63 100644
--- a/llvm/test/CodeGen/AMDGPU/machine-sink-ignorable-exec-use.mir
+++ b/llvm/test/CodeGen/AMDGPU/machine-sink-ignorable-exec-use.mir
@@ -279,7 +279,6 @@ body:             |
   ; GFX9-NEXT: bb.1:
   ; GFX9-NEXT:   successors: %bb.2(0x80000000)
   ; GFX9-NEXT: {{  $}}
-  ; GFX9-NEXT: {{  $}}
   ; GFX9-NEXT: bb.2:
   ; GFX9-NEXT:   successors: %bb.3(0x80000000)
   ; GFX9-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/opt-exec-masking-pre-ra-update-liveness.mir b/llvm/test/CodeGen/AMDGPU/opt-exec-masking-pre-ra-update-liveness.mir
index ae2c77ca87039..897370b3d9b50 100644
--- a/llvm/test/CodeGen/AMDGPU/opt-exec-masking-pre-ra-update-liveness.mir
+++ b/llvm/test/CodeGen/AMDGPU/opt-exec-masking-pre-ra-update-liveness.mir
@@ -564,7 +564,6 @@ body:             |
   ; CHECK-NEXT: bb.4:
   ; CHECK-NEXT:   successors: %bb.5(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.5:
   bb.0:
     liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr14, $sgpr15, $sgpr16
@@ -615,7 +614,6 @@ body:             |
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3:
   bb.0:
     liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr14, $sgpr15, $sgpr16
@@ -742,7 +740,6 @@ body:             |
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT:   liveins: $sgpr4_sgpr5
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT:   liveins: $sgpr4_sgpr5
diff --git a/llvm/test/CodeGen/AMDGPU/optimize-compare.mir b/llvm/test/CodeGen/AMDGPU/optimize-compare.mir
index 63af3659bdc2d..c1cf06e30c745 100644
--- a/llvm/test/CodeGen/AMDGPU/optimize-compare.mir
+++ b/llvm/test/CodeGen/AMDGPU/optimize-compare.mir
@@ -17,7 +17,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -55,7 +54,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -93,7 +91,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -131,7 +128,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -173,7 +169,6 @@ body:             |
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.3:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -215,7 +210,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -252,7 +246,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -289,7 +282,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -327,7 +319,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -367,7 +358,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -407,7 +397,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -447,7 +436,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -486,7 +474,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -524,7 +511,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -561,7 +547,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -598,7 +583,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -635,7 +619,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -672,7 +655,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -710,7 +692,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -747,7 +728,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -784,7 +764,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -821,7 +800,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -858,7 +836,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -895,7 +872,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -932,7 +908,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -969,7 +944,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1006,7 +980,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1043,7 +1016,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1080,7 +1052,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1117,7 +1088,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1154,7 +1124,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1190,7 +1159,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1226,7 +1194,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1263,7 +1230,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1300,7 +1266,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1337,7 +1302,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1374,7 +1338,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1411,7 +1374,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1489,7 +1451,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1527,7 +1488,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1565,7 +1525,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1604,7 +1563,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1643,7 +1601,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1681,7 +1638,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1718,7 +1674,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1755,7 +1710,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1793,7 +1747,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1830,7 +1783,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1867,7 +1819,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1905,7 +1856,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1942,7 +1892,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -1979,7 +1928,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2016,7 +1964,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2094,7 +2041,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2132,7 +2078,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2169,7 +2114,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2206,7 +2150,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
@@ -2243,7 +2186,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/optimize-exec-mask-pre-ra-def-after-use.mir b/llvm/test/CodeGen/AMDGPU/optimize-exec-mask-pre-ra-def-after-use.mir
index 61a0d47f060fc..3b0aadbd81fda 100644
--- a/llvm/test/CodeGen/AMDGPU/optimize-exec-mask-pre-ra-def-after-use.mir
+++ b/llvm/test/CodeGen/AMDGPU/optimize-exec-mask-pre-ra-def-after-use.mir
@@ -43,7 +43,6 @@ body:             |
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.5:
   ; GCN-NEXT:   S_ENDPGM 0
   ; GCN-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir b/llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir
index 3f4c2d71a12e0..6170fe63f765d 100644
--- a/llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir
+++ b/llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir
@@ -29,7 +29,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   successors: %bb.3(0x40000000), %bb.6(0x40000000)
   ; GCN-NEXT: {{  $}}
@@ -53,7 +52,6 @@ body:             |
   ; GCN-NEXT: bb.4:
   ; GCN-NEXT:   successors: %bb.5(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.5:
   ; GCN-NEXT:   successors: %bb.6(0x80000000)
   ; GCN-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir b/llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
index dca9ffad7e800..d406f2932dc96 100644
--- a/llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
+++ b/llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
@@ -407,7 +407,6 @@ body:             |
   ; GCN-NEXT: bb.11:
   ; GCN-NEXT:   successors: %bb.12(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.12:
   ; GCN-NEXT:   [[SI_SPILL_S64_RESTORE3:%[0-9]+]]:sgpr_64 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s64) from %stack.1, align 4, addrspace 5)
   ; GCN-NEXT:   GLOBAL_STORE_DWORD_SADDR undef [[DEF]], undef [[DEF]], [[SI_SPILL_S64_RESTORE3]], 0, 0, implicit $exec :: (store (s32), addrspace 1)
diff --git a/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir b/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
index 2b613cbbabeee..e72ed2ba99e1a 100644
--- a/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
+++ b/llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
@@ -225,7 +225,6 @@ body:             |
   ; CHECK-NEXT:   successors: %bb.15(0x80000000)
   ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75:0x0000000F00000000
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.15:
   ; CHECK-NEXT:   successors: %bb.11(0x40000000), %bb.16(0x40000000)
   ; CHECK-NEXT:   liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x0000000000000003, $sgpr10_sgpr11, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75:0x0000000F00000000
diff --git a/llvm/test/CodeGen/AMDGPU/si-lower-control-flow.mir b/llvm/test/CodeGen/AMDGPU/si-lower-control-flow.mir
index b99ca2b9fd327..e342c2b83524d 100644
--- a/llvm/test/CodeGen/AMDGPU/si-lower-control-flow.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-lower-control-flow.mir
@@ -38,7 +38,6 @@ body:             |
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   S_ENDPGM 0
   bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/sink-after-control-flow-postra.mir b/llvm/test/CodeGen/AMDGPU/sink-after-control-flow-postra.mir
index 8ade6106ee250..266e67364c82b 100644
--- a/llvm/test/CodeGen/AMDGPU/sink-after-control-flow-postra.mir
+++ b/llvm/test/CodeGen/AMDGPU/sink-after-control-flow-postra.mir
@@ -221,7 +221,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.1(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr8, $vgpr0, $vgpr1
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.1:
   ; GFX10-NEXT:   successors: %bb.7(0x40000000), %bb.2(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr8, $vgpr0
@@ -234,7 +233,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.3(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr8, $vgpr0
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.3:
   ; GFX10-NEXT:   successors: %bb.5(0x40000000), %bb.4(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr8, $vgpr0
@@ -248,7 +246,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.5(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr5, $sgpr8, $vgpr0
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.5:
   ; GFX10-NEXT:   successors: %bb.3(0x40000000), %bb.6(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr5, $sgpr8, $vgpr0
@@ -278,7 +275,6 @@ body:             |
   ; GFX10-NEXT: bb.8:
   ; GFX10-NEXT:   successors: %bb.9(0x80000000)
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.9:
   ; GFX10-NEXT:   successors: %bb.9(0x40000000), %bb.10(0x40000000)
   ; GFX10-NEXT: {{  $}}
@@ -357,7 +353,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.1(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr8, $sgpr9, $sgpr10
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.1:
   ; GFX10-NEXT:   successors: %bb.7(0x40000000), %bb.2(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr8, $sgpr9
@@ -370,7 +365,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.3(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr8, $sgpr9
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.3:
   ; GFX10-NEXT:   successors: %bb.5(0x40000000), %bb.4(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr8, $sgpr9
@@ -384,7 +378,6 @@ body:             |
   ; GFX10-NEXT:   successors: %bb.5(0x80000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr5, $sgpr8, $sgpr9
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.5:
   ; GFX10-NEXT:   successors: %bb.3(0x40000000), %bb.6(0x40000000)
   ; GFX10-NEXT:   liveins: $sgpr4, $sgpr5, $sgpr8, $sgpr9
@@ -415,7 +408,6 @@ body:             |
   ; GFX10-NEXT: bb.8:
   ; GFX10-NEXT:   successors: %bb.9(0x80000000)
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.9:
   ; GFX10-NEXT:   successors: %bb.9(0x40000000), %bb.10(0x40000000)
   ; GFX10-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/spill-agpr.mir b/llvm/test/CodeGen/AMDGPU/spill-agpr.mir
index d6127e50104c7..16f7e15f267ee 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-agpr.mir
+++ b/llvm/test/CodeGen/AMDGPU/spill-agpr.mir
@@ -220,7 +220,6 @@ body: |
   ; GFX908-SPILLED-NEXT: bb.1:
   ; GFX908-SPILLED-NEXT:   successors: %bb.2(0x80000000)
   ; GFX908-SPILLED-NEXT: {{  $}}
-  ; GFX908-SPILLED-NEXT: {{  $}}
   ; GFX908-SPILLED-NEXT: bb.2:
   ; GFX908-SPILLED-NEXT:   $agpr0 = SI_SPILL_A32_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
   ; GFX908-SPILLED-NEXT:   S_NOP 0, implicit undef $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
@@ -253,7 +252,6 @@ body: |
   ; GFX908-EXPANDED-NEXT: bb.1:
   ; GFX908-EXPANDED-NEXT:   successors: %bb.2(0x80000000)
   ; GFX908-EXPANDED-NEXT: {{  $}}
-  ; GFX908-EXPANDED-NEXT: {{  $}}
   ; GFX908-EXPANDED-NEXT: bb.2:
   ; GFX908-EXPANDED-NEXT:   $vgpr63 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
   ; GFX908-EXPANDED-NEXT:   $agpr0 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr63, implicit $exec
@@ -286,7 +284,6 @@ body: |
   ; GFX90A-SPILLED-NEXT: bb.1:
   ; GFX90A-SPILLED-NEXT:   successors: %bb.2(0x80000000)
   ; GFX90A-SPILLED-NEXT: {{  $}}
-  ; GFX90A-SPILLED-NEXT: {{  $}}
   ; GFX90A-SPILLED-NEXT: bb.2:
   ; GFX90A-SPILLED-NEXT:   $agpr0 = SI_SPILL_A32_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
   ; GFX90A-SPILLED-NEXT:   S_NOP 0, implicit undef $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
@@ -318,7 +315,6 @@ body: |
   ; GFX90A-EXPANDED-NEXT: bb.1:
   ; GFX90A-EXPANDED-NEXT:   successors: %bb.2(0x80000000)
   ; GFX90A-EXPANDED-NEXT: {{  $}}
-  ; GFX90A-EXPANDED-NEXT: {{  $}}
   ; GFX90A-EXPANDED-NEXT: bb.2:
   ; GFX90A-EXPANDED-NEXT:   $agpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
   ; GFX90A-EXPANDED-NEXT:   S_NOP 0, implicit undef $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
diff --git a/llvm/test/CodeGen/AMDGPU/tail-dup-bundle.mir b/llvm/test/CodeGen/AMDGPU/tail-dup-bundle.mir
index 125489275bc02..43708d32d4329 100644
--- a/llvm/test/CodeGen/AMDGPU/tail-dup-bundle.mir
+++ b/llvm/test/CodeGen/AMDGPU/tail-dup-bundle.mir
@@ -11,7 +11,6 @@ body:             |
   ; GCN: bb.0:
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   BUNDLE {
   ; GCN-NEXT:     S_NOP 0
diff --git a/llvm/test/CodeGen/AMDGPU/wqm-terminators.mir b/llvm/test/CodeGen/AMDGPU/wqm-terminators.mir
index 3e0e23a09b5e2..b4df02e47d2ac 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm-terminators.mir
+++ b/llvm/test/CodeGen/AMDGPU/wqm-terminators.mir
@@ -46,7 +46,6 @@ body: |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   [[V_SUB_F32_e64_:%[0-9]+]]:vgpr_32 = nofpexcept V_SUB_F32_e64 0, [[IMAGE_SAMPLE_V3_V2_gfx10_]].sub0, 0, [[IMAGE_SAMPLE_V3_V2_gfx10_]].sub1, 0, 0, implicit $mode, implicit $exec
   ; CHECK-NEXT:   BUFFER_STORE_DWORD_OFFSET_exact [[V_SUB_F32_e64_]], [[DEF1]], [[COPY1]], 4, 0, 0, implicit $exec
diff --git a/llvm/test/CodeGen/ARM/cmpxchg.mir b/llvm/test/CodeGen/ARM/cmpxchg.mir
index bb1c998e6eadb..20ab787fb4575 100644
--- a/llvm/test/CodeGen/ARM/cmpxchg.mir
+++ b/llvm/test/CodeGen/ARM/cmpxchg.mir
@@ -10,7 +10,6 @@ body: |
     ; CHECK: successors: %bb.1(0x80000000)
     ; CHECK-NEXT: liveins: $r0_r1, $r4_r5, $r3, $lr
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: .1:
     ; CHECK-NEXT: successors: %bb.3(0x40000000), %bb.2(0x40000000)
     ; CHECK-NEXT: liveins: $r4_r5, $r3
@@ -41,7 +40,6 @@ body: |
     ; CHECK: successors: %bb.1(0x80000000)
     ; CHECK-NEXT: liveins: $r1, $r2, $r3, $r12, $lr
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: .1:
     ; CHECK-NEXT: successors: %bb.3(0x40000000), %bb.2(0x40000000)
     ; CHECK-NEXT: liveins: $lr, $r3, $r12
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-noreturn.mir b/llvm/test/CodeGen/ARM/machine-outliner-noreturn.mir
index cd8313f27d73c..af9cebda122f5 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-noreturn.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-noreturn.mir
@@ -48,7 +48,6 @@ body:             |
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT:   liveins: $r4
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   tBL 14 /* CC::al */, $noreg, @noreturn, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit undef $r0, implicit undef $r1, implicit undef $r2, implicit-def $sp
   bb.0:
diff --git a/llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir b/llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir
index 9fa016f2850f2..c11f20a7ee6bb 100644
--- a/llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir
+++ b/llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir
@@ -7,7 +7,6 @@ body: |
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors:
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   RET 0, $eax
   bb.0:
diff --git a/llvm/test/CodeGen/Mips/GlobalISel/instruction-select/phi.mir b/llvm/test/CodeGen/Mips/GlobalISel/instruction-select/phi.mir
index 59964039c5868..77e5ee2fdcbc2 100644
--- a/llvm/test/CodeGen/Mips/GlobalISel/instruction-select/phi.mir
+++ b/llvm/test/CodeGen/Mips/GlobalISel/instruction-select/phi.mir
@@ -91,7 +91,6 @@ body:             |
   ; MIPS32FP32-NEXT: bb.2.cond.false:
   ; MIPS32FP32-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP32-NEXT: {{  $}}
-  ; MIPS32FP32-NEXT: {{  $}}
   ; MIPS32FP32-NEXT: bb.3.cond.end:
   ; MIPS32FP32-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; MIPS32FP32-NEXT:   $v0 = COPY [[PHI]]
@@ -117,7 +116,6 @@ body:             |
   ; MIPS32FP64-NEXT: bb.2.cond.false:
   ; MIPS32FP64-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP64-NEXT: {{  $}}
-  ; MIPS32FP64-NEXT: {{  $}}
   ; MIPS32FP64-NEXT: bb.3.cond.end:
   ; MIPS32FP64-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; MIPS32FP64-NEXT:   $v0 = COPY [[PHI]]
@@ -179,7 +177,6 @@ body:             |
   ; MIPS32FP32-NEXT: bb.2.cond.false:
   ; MIPS32FP32-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP32-NEXT: {{  $}}
-  ; MIPS32FP32-NEXT: {{  $}}
   ; MIPS32FP32-NEXT: bb.3.cond.end:
   ; MIPS32FP32-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[LW]], %bb.2
   ; MIPS32FP32-NEXT:   [[PHI1:%[0-9]+]]:gpr32 = PHI [[COPY2]], %bb.1, [[LW1]], %bb.2
@@ -211,7 +208,6 @@ body:             |
   ; MIPS32FP64-NEXT: bb.2.cond.false:
   ; MIPS32FP64-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP64-NEXT: {{  $}}
-  ; MIPS32FP64-NEXT: {{  $}}
   ; MIPS32FP64-NEXT: bb.3.cond.end:
   ; MIPS32FP64-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[LW]], %bb.2
   ; MIPS32FP64-NEXT:   [[PHI1:%[0-9]+]]:gpr32 = PHI [[COPY2]], %bb.1, [[LW1]], %bb.2
@@ -274,7 +270,6 @@ body:             |
   ; MIPS32FP32-NEXT: bb.2.cond.false:
   ; MIPS32FP32-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP32-NEXT: {{  $}}
-  ; MIPS32FP32-NEXT: {{  $}}
   ; MIPS32FP32-NEXT: bb.3.cond.end:
   ; MIPS32FP32-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; MIPS32FP32-NEXT:   $f0 = COPY [[PHI]]
@@ -300,7 +295,6 @@ body:             |
   ; MIPS32FP64-NEXT: bb.2.cond.false:
   ; MIPS32FP64-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP64-NEXT: {{  $}}
-  ; MIPS32FP64-NEXT: {{  $}}
   ; MIPS32FP64-NEXT: bb.3.cond.end:
   ; MIPS32FP64-NEXT:   [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; MIPS32FP64-NEXT:   $f0 = COPY [[PHI]]
@@ -358,7 +352,6 @@ body:             |
   ; MIPS32FP32-NEXT: bb.2.cond.false:
   ; MIPS32FP32-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP32-NEXT: {{  $}}
-  ; MIPS32FP32-NEXT: {{  $}}
   ; MIPS32FP32-NEXT: bb.3.cond.end:
   ; MIPS32FP32-NEXT:   [[PHI:%[0-9]+]]:afgr64 = PHI [[COPY]], %bb.1, [[COPY1]], %bb.2
   ; MIPS32FP32-NEXT:   $d0 = COPY [[PHI]]
@@ -385,7 +378,6 @@ body:             |
   ; MIPS32FP64-NEXT: bb.2.cond.false:
   ; MIPS32FP64-NEXT:   successors: %bb.3(0x80000000)
   ; MIPS32FP64-NEXT: {{  $}}
-  ; MIPS32FP64-NEXT: {{  $}}
   ; MIPS32FP64-NEXT: bb.3.cond.end:
   ; MIPS32FP64-NEXT:   [[PHI:%[0-9]+]]:fgr64 = PHI [[COPY]], %bb.1, [[COPY1]], %bb.2
   ; MIPS32FP64-NEXT:   $d0 = COPY [[PHI]]
diff --git a/llvm/test/CodeGen/Mips/GlobalISel/legalizer/jump_table_and_brjt.mir b/llvm/test/CodeGen/Mips/GlobalISel/legalizer/jump_table_and_brjt.mir
index 301fd2b9196e9..fe32504be62db 100644
--- a/llvm/test/CodeGen/Mips/GlobalISel/legalizer/jump_table_and_brjt.mir
+++ b/llvm/test/CodeGen/Mips/GlobalISel/legalizer/jump_table_and_brjt.mir
@@ -112,7 +112,6 @@ body:             |
   ; MIPS32-NEXT: bb.6.sw.default:
   ; MIPS32-NEXT:   successors: %bb.7(0x80000000)
   ; MIPS32-NEXT: {{  $}}
-  ; MIPS32-NEXT: {{  $}}
   ; MIPS32-NEXT: bb.7.sw.epilog:
   ; MIPS32-NEXT:   successors: %bb.13(0x40000000), %bb.8(0x40000000)
   ; MIPS32-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/PowerPC/branch_coalescing.mir b/llvm/test/CodeGen/PowerPC/branch_coalescing.mir
index 028c30661311d..4e789b869250e 100644
--- a/llvm/test/CodeGen/PowerPC/branch_coalescing.mir
+++ b/llvm/test/CodeGen/PowerPC/branch_coalescing.mir
@@ -49,7 +49,6 @@ body:             |
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.6(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.6:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:f8rc = PHI [[LFD]], %bb.1, [[COPY3]], %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:f8rc = PHI [[XXLXORdpz]], %bb.1, [[COPY2]], %bb.0
diff --git a/llvm/test/CodeGen/PowerPC/machine-cse-rm-pre.mir b/llvm/test/CodeGen/PowerPC/machine-cse-rm-pre.mir
index 36484be012362..32f5e0172047e 100644
--- a/llvm/test/CodeGen/PowerPC/machine-cse-rm-pre.mir
+++ b/llvm/test/CodeGen/PowerPC/machine-cse-rm-pre.mir
@@ -72,7 +72,6 @@ body:             |
   ; CHECK-NEXT: bb.2.if.else:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3.if.end:
   ; CHECK-NEXT:   BLR implicit $lr, implicit $rm
   bb.0.for.body:
diff --git a/llvm/test/CodeGen/PowerPC/nofpexcept.ll b/llvm/test/CodeGen/PowerPC/nofpexcept.ll
index 8b998242b28d4..f764dfc2d2498 100644
--- a/llvm/test/CodeGen/PowerPC/nofpexcept.ll
+++ b/llvm/test/CodeGen/PowerPC/nofpexcept.ll
@@ -128,7 +128,6 @@ define void @fptoint_nofpexcept(ppc_fp128 %p, fp128 %m, ptr %addr1, ptr %addr2)
   ; CHECK-NEXT: bb.1.entry:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.entry:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:f8rc = PHI [[COPY13]], %bb.1, [[XXLXORdpz]], %bb.0
   ; CHECK-NEXT:   ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv32.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv32.mir
index 4512d190133ab..c8f73afc0ac7c 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv32.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv32.mir
@@ -135,7 +135,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY2]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s32)
@@ -196,7 +195,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY2]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s32)
@@ -257,7 +255,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY2]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s32)
@@ -315,7 +312,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY2]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s32)
@@ -370,7 +366,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(p0) = G_PHI [[COPY2]](p0), %bb.1, [[COPY1]](p0), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](p0)
@@ -436,7 +431,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY3]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s32) = G_PHI [[COPY4]](s32), %bb.1, [[COPY2]](s32), %bb.0
@@ -510,7 +504,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY3]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s32) = G_PHI [[COPY4]](s32), %bb.1, [[COPY2]](s32), %bb.0
@@ -589,7 +582,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY4]](s32), %bb.1, [[COPY1]](s32), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s32) = G_PHI [[COPY5]](s32), %bb.1, [[COPY2]](s32), %bb.0
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv64.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv64.mir
index 22bd7a306e007..7111f7429e75c 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv64.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-phi-rv64.mir
@@ -147,7 +147,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY2]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s64)
@@ -208,7 +207,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY2]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s64)
@@ -269,7 +267,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY2]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s64)
@@ -330,7 +327,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY2]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s64)
@@ -388,7 +384,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY2]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](s64)
@@ -443,7 +438,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(p0) = G_PHI [[COPY2]](p0), %bb.1, [[COPY1]](p0), %bb.0
   ; CHECK-NEXT:   $x10 = COPY [[PHI]](p0)
@@ -509,7 +503,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY3]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s64) = G_PHI [[COPY4]](s64), %bb.1, [[COPY2]](s64), %bb.0
@@ -583,7 +576,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY3]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s64) = G_PHI [[COPY4]](s64), %bb.1, [[COPY2]](s64), %bb.0
@@ -659,7 +651,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY4]](s64), %bb.1, [[COPY1]](s64), %bb.0
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:_(s64) = G_PHI [[COPY5]](s64), %bb.1, [[COPY2]](s64), %bb.0
diff --git a/llvm/test/CodeGen/RISCV/float-select-verify.ll b/llvm/test/CodeGen/RISCV/float-select-verify.ll
index cf1a2a89229db..2d5d6d7cb4825 100644
--- a/llvm/test/CodeGen/RISCV/float-select-verify.ll
+++ b/llvm/test/CodeGen/RISCV/float-select-verify.ll
@@ -35,7 +35,6 @@ define dso_local void @buz(i1 %pred, float %a, float %b) {
   ; CHECK-NEXT: bb.3.entry:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4.entry:
   ; CHECK-NEXT:   successors: %bb.5(0x40000000), %bb.6(0x40000000)
   ; CHECK-NEXT: {{  $}}
@@ -60,7 +59,6 @@ define dso_local void @buz(i1 %pred, float %a, float %b) {
   ; CHECK-NEXT: bb.7.entry:
   ; CHECK-NEXT:   successors: %bb.8(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.8.entry:
   ; CHECK-NEXT:   [[PHI3:%[0-9]+]]:fpr32 = PHI [[PHI2]], %bb.6, [[FMV_W_X]], %bb.7
   ; CHECK-NEXT:   [[FCVT_L_S:%[0-9]+]]:gpr = nofpexcept FCVT_L_S killed [[PHI3]], 1
diff --git a/llvm/test/CodeGen/Thumb2/cmpxchg.mir b/llvm/test/CodeGen/Thumb2/cmpxchg.mir
index ab606e3dca203..33de25d469a75 100644
--- a/llvm/test/CodeGen/Thumb2/cmpxchg.mir
+++ b/llvm/test/CodeGen/Thumb2/cmpxchg.mir
@@ -10,7 +10,6 @@ body: |
     ; CHECK: successors: %bb.1(0x80000000)
     ; CHECK-NEXT: liveins: $r0_r1, $r4_r5, $r3, $lr
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: .1:
     ; CHECK-NEXT: successors: %bb.3(0x40000000), %bb.2(0x40000000)
     ; CHECK-NEXT: liveins: $r4, $r5, $r3
@@ -41,7 +40,6 @@ body: |
     ; CHECK: successors: %bb.1(0x80000000)
     ; CHECK-NEXT: liveins: $r1, $r2, $r3, $r12, $lr
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: .1:
     ; CHECK-NEXT: successors: %bb.3(0x40000000), %bb.2(0x40000000)
     ; CHECK-NEXT: liveins: $lr, $r3, $r12
diff --git a/llvm/test/CodeGen/X86/GlobalISel/legalize-phi.mir b/llvm/test/CodeGen/X86/GlobalISel/legalize-phi.mir
index fa67c1e6bf352..31de686878a97 100644
--- a/llvm/test/CodeGen/X86/GlobalISel/legalize-phi.mir
+++ b/llvm/test/CodeGen/X86/GlobalISel/legalize-phi.mir
@@ -229,7 +229,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s8) = G_PHI [[TRUNC1]](s8), %bb.1, [[TRUNC]](s8), %bb.0
   ; CHECK-NEXT:   $al = COPY [[PHI]](s8)
@@ -298,7 +297,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s16) = G_PHI [[TRUNC1]](s16), %bb.1, [[TRUNC]](s16), %bb.0
   ; CHECK-NEXT:   $ax = COPY [[PHI]](s16)
@@ -369,7 +367,6 @@ body:             |
   ; CHECK-NEXT: bb.2.cond.false:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[COPY1]](s32), %bb.1, [[COPY2]](s32), %bb.2
   ; CHECK-NEXT:   $eax = COPY [[PHI]](s32)
@@ -444,7 +441,6 @@ body:             |
   ; CHECK-NEXT: bb.2.cond.false:
   ; CHECK-NEXT:   successors: %bb.3(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.3.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[COPY1]](s64), %bb.1, [[COPY2]](s64), %bb.2
   ; CHECK-NEXT:   $rax = COPY [[PHI]](s64)
@@ -518,7 +514,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s32) = G_PHI [[TRUNC1]](s32), %bb.1, [[TRUNC]](s32), %bb.0
   ; CHECK-NEXT:   [[ANYEXT:%[0-9]+]]:_(s128) = G_ANYEXT [[PHI]](s32)
@@ -590,7 +585,6 @@ body:             |
   ; CHECK-NEXT: bb.1.cond.false:
   ; CHECK-NEXT:   successors: %bb.2(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.2.cond.end:
   ; CHECK-NEXT:   [[PHI:%[0-9]+]]:_(s64) = G_PHI [[TRUNC1]](s64), %bb.1, [[TRUNC]](s64), %bb.0
   ; CHECK-NEXT:   [[ANYEXT:%[0-9]+]]:_(s128) = G_ANYEXT [[PHI]](s64)
diff --git a/llvm/test/CodeGen/X86/GlobalISel/select-phi.mir b/llvm/test/CodeGen/X86/GlobalISel/select-phi.mir
index fa52d0b12de77..6626131661b22 100644
--- a/llvm/test/CodeGen/X86/GlobalISel/select-phi.mir
+++ b/llvm/test/CodeGen/X86/GlobalISel/select-phi.mir
@@ -135,7 +135,6 @@ body:             |
   ; ALL-NEXT: bb.1.cond.false:
   ; ALL-NEXT:   successors: %bb.2(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.2.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:gr8 = PHI [[COPY4]], %bb.1, [[COPY2]], %bb.0
   ; ALL-NEXT:   $al = COPY [[PHI]]
@@ -199,7 +198,6 @@ body:             |
   ; ALL-NEXT: bb.1.cond.false:
   ; ALL-NEXT:   successors: %bb.2(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.2.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:gr16 = PHI [[COPY4]], %bb.1, [[COPY2]], %bb.0
   ; ALL-NEXT:   $ax = COPY [[PHI]]
@@ -266,7 +264,6 @@ body:             |
   ; ALL-NEXT: bb.2.cond.false:
   ; ALL-NEXT:   successors: %bb.3(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.3.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:gr32 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; ALL-NEXT:   $eax = COPY [[PHI]]
@@ -337,7 +334,6 @@ body:             |
   ; ALL-NEXT: bb.2.cond.false:
   ; ALL-NEXT:   successors: %bb.3(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.3.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:gr64 = PHI [[COPY1]], %bb.1, [[COPY2]], %bb.2
   ; ALL-NEXT:   $rax = COPY [[PHI]]
@@ -410,7 +406,6 @@ body:             |
   ; ALL-NEXT: bb.1.cond.false:
   ; ALL-NEXT:   successors: %bb.2(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.2.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:fr32 = PHI [[COPY4]], %bb.1, [[COPY2]], %bb.0
   ; ALL-NEXT:   [[COPY5:%[0-9]+]]:vr128 = COPY [[PHI]]
@@ -476,7 +471,6 @@ body:             |
   ; ALL-NEXT: bb.1.cond.false:
   ; ALL-NEXT:   successors: %bb.2(0x80000000)
   ; ALL-NEXT: {{  $}}
-  ; ALL-NEXT: {{  $}}
   ; ALL-NEXT: bb.2.cond.end:
   ; ALL-NEXT:   [[PHI:%[0-9]+]]:fr64 = PHI [[COPY4]], %bb.1, [[COPY2]], %bb.0
   ; ALL-NEXT:   [[COPY5:%[0-9]+]]:vr128 = COPY [[PHI]]
diff --git a/llvm/test/CodeGen/X86/branchfolding-landingpad-cfg.mir b/llvm/test/CodeGen/X86/branchfolding-landingpad-cfg.mir
index 98dadbfcc17bd..8f79c12966415 100644
--- a/llvm/test/CodeGen/X86/branchfolding-landingpad-cfg.mir
+++ b/llvm/test/CodeGen/X86/branchfolding-landingpad-cfg.mir
@@ -7,7 +7,6 @@ body:             |
   ; CHECK: bb.0:
   ; CHECK-NEXT:   successors: %bb.1(0x7ffff800), %bb.3(0x00000800)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1:
   ; CHECK-NEXT:   successors: %bb.2(0x00000800)
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir b/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir
index 42d17130412b6..a09e751ce3ffb 100644
--- a/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir
+++ b/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir
@@ -31,7 +31,6 @@ body:             |
   ; CHECK-NEXT: bb.3:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
@@ -91,7 +90,6 @@ body:             |
   ; CHECK-NEXT: bb.3:
   ; CHECK-NEXT:   successors: %bb.4(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.4:
   ; CHECK-NEXT:   successors: %bb.1(0x80000000)
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/X86/cse-two-preds.mir b/llvm/test/CodeGen/X86/cse-two-preds.mir
index bc1bad5f3daf3..6479747daf426 100644
--- a/llvm/test/CodeGen/X86/cse-two-preds.mir
+++ b/llvm/test/CodeGen/X86/cse-two-preds.mir
@@ -128,7 +128,6 @@ body:             |
   ; CHECK-NEXT: bb.4.EQ:
   ; CHECK-NEXT:   successors: %bb.5(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.5.EQ:
   ; CHECK-NEXT:   successors: %bb.8(0x80000000)
   ; CHECK-NEXT: {{  $}}
@@ -145,7 +144,6 @@ body:             |
   ; CHECK-NEXT: bb.7.ULB:
   ; CHECK-NEXT:   successors: %bb.8(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.8.EXIT:
   ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:fr32 = PHI [[COPY1]], %bb.1, [[COPY]], %bb.2, [[PHI]], %bb.5, [[COPY1]], %bb.6, [[COPY]], %bb.7
   ; CHECK-NEXT:   $xmm0 = COPY [[PHI1]]
diff --git a/llvm/test/CodeGen/X86/statepoint-invoke-ra-remove-back-copies.mir b/llvm/test/CodeGen/X86/statepoint-invoke-ra-remove-back-copies.mir
index d142b5e7374c9..49253968fcca5 100644
--- a/llvm/test/CodeGen/X86/statepoint-invoke-ra-remove-back-copies.mir
+++ b/llvm/test/CodeGen/X86/statepoint-invoke-ra-remove-back-copies.mir
@@ -367,7 +367,6 @@ body:             |
   ; CHECK-NEXT: bb.14.bb39:
   ; CHECK-NEXT:   successors:
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.15.bb40:
   ; CHECK-NEXT:   successors: %bb.16(0x80000000)
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/X86/tail-dup-asm-goto.ll b/llvm/test/CodeGen/X86/tail-dup-asm-goto.ll
index e1f8323cf206f..05fefbe750e51 100644
--- a/llvm/test/CodeGen/X86/tail-dup-asm-goto.ll
+++ b/llvm/test/CodeGen/X86/tail-dup-asm-goto.ll
@@ -42,7 +42,6 @@ define ptr @test1(ptr %arg1, ptr %arg2) {
   ; CHECK-NEXT: bb.4.bb17.i.i.i (machine-block-address-taken, inlineasm-br-indirect-target):
   ; CHECK-NEXT:   successors: %bb.5(0x80000000)
   ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.5.kmem_cache_has_cpu_partial.exit:
   ; CHECK-NEXT:   $rax = COPY [[PHI]]
   ; CHECK-NEXT:   RET 0, $rax

>From b5f40b2fce04729bc92e57403e6a82cd16bacfb0 Mon Sep 17 00:00:00 2001
From: Alexey Bataev <a.bataev at outlook.com>
Date: Thu, 1 Feb 2024 05:44:20 -0800
Subject: [PATCH 15/42] [SLP][NFC]Introduce and use computeCommonAlignment
 function, NFC.

---
 .../Transforms/Vectorize/SLPVectorizer.cpp    | 31 +++++++++----------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index bde65717ac1d4..a8aea112bc28e 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -3869,6 +3869,15 @@ static bool arePointersCompatible(Value *Ptr1, Value *Ptr2,
               .getOpcode());
 }
 
+/// Calculates minimal alignment as a common alignment.
+template <typename T>
+static Align computeCommonAlignment(ArrayRef<Value *> VL) {
+  Align CommonAlignment = cast<T>(VL.front())->getAlign();
+  for (Value *V : VL.drop_front())
+    CommonAlignment = std::min(CommonAlignment, cast<T>(V)->getAlign());
+  return CommonAlignment;
+}
+
 /// Checks if the given array of loads can be represented as a vectorized,
 /// scatter or just simple gather.
 static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
@@ -3939,10 +3948,7 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
           return (IsSorted && !GEP && doesNotNeedToBeScheduled(P)) ||
                  (GEP && GEP->getNumOperands() == 2);
         })) {
-      Align CommonAlignment = cast<LoadInst>(VL0)->getAlign();
-      for (Value *V : VL)
-        CommonAlignment =
-            std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
+      Align CommonAlignment = computeCommonAlignment<LoadInst>(VL);
       auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
       if (TTI.isLegalMaskedGather(VecTy, CommonAlignment) &&
           !TTI.forceScalarizeMaskedGather(VecTy, CommonAlignment))
@@ -7087,10 +7093,8 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
         }
         for (std::pair<unsigned, unsigned> P : ScatterVectorized) {
           auto *LI0 = cast<LoadInst>(VL[P.first]);
-          Align CommonAlignment = LI0->getAlign();
-          for (Value *V : VL.slice(P.first + 1, VF - 1))
-            CommonAlignment =
-                std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
+          Align CommonAlignment =
+              computeCommonAlignment<LoadInst>(VL.slice(P.first + 1, VF - 1));
           GatherCost += TTI.getGatherScatterOpCost(
               Instruction::Load, LoadTy, LI0->getPointerOperand(),
               /*VariableMask=*/false, CommonAlignment, CostKind, LI0);
@@ -8334,10 +8338,8 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
         assert((E->State == TreeEntry::ScatterVectorize ||
                 E->State == TreeEntry::PossibleStridedVectorize) &&
                "Unknown EntryState");
-        Align CommonAlignment = LI0->getAlign();
-        for (Value *V : UniqueValues)
-          CommonAlignment =
-              std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
+        Align CommonAlignment =
+            computeCommonAlignment<LoadInst>(UniqueValues.getArrayRef());
         VecLdCost = TTI->getGatherScatterOpCost(
             Instruction::Load, VecTy, LI0->getPointerOperand(),
             /*VariableMask=*/false, CommonAlignment, CostKind);
@@ -11600,10 +11602,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
           return E->VectorizedValue;
         }
         // Use the minimum alignment of the gathered loads.
-        Align CommonAlignment = LI->getAlign();
-        for (Value *V : E->Scalars)
-          CommonAlignment =
-              std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
+        Align CommonAlignment = computeCommonAlignment<LoadInst>(E->Scalars);
         NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);
       }
       Value *V = propagateMetadata(NewLI, E->Scalars);

>From b528f9ccb566700f8571b90e6326d71b8d878d84 Mon Sep 17 00:00:00 2001
From: Amy Kwan <amy.kwan1 at ibm.com>
Date: Thu, 1 Feb 2024 09:29:21 -0500
Subject: [PATCH 16/42] [AIX][TLS] Optimize the small local-exec access
 sequence for non-zero offsets (#71485)

This patch utilizes the -maix-small-local-exec-tls option to produce a
faster, non-TOC-based access sequence for the local-exec TLS model,
specifically for accesses where the offset from the TLS variable is
non-zero.

In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset, for when
  an address is calculated, or
- load or store off of R13 plus a non-zero offset, for when an address is
  calculated and used for further access,
where R13 is the thread pointer.

In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset, this patch also adds the necessary
support in the assembly printer when printing these instructions.

Specifically:
- The non-zero offset is added to the TLS variable address when the
  address of the TLS variable + its offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
  32KB or greater, the non-zero offset is added and a multiple of 64KB
  is subtracted from the result.

This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset is within [-32768, 32768), so that the
total displacement can fit within the addi/load/store instructions.

This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the optimization occurs
for when the offset is zero).
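
As a rough illustration of the displacement adjustment described above,
the arithmetic can be sketched as follows. This is not code from the
patch: the helper names computeDelta, computeInstDisp and
fitsInDisplacement are made up for this example, and non-negative TLS
addresses and offsets are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): the multiple of 64KB that is
// subtracted once the TLS variable address plus the offset reaches 32768.
// The mask is spelled ~int64_t(0xFFFF) so that it stays 64 bits wide.
constexpr int64_t computeDelta(int64_t FinalAddress) {
  return (FinalAddress + 32768) & ~int64_t(0xFFFF);
}

// Hypothetical helper: the displacement that ends up in the addi/load/store.
// Assumes TLSVarAddress and Offset are both non-negative.
constexpr int64_t computeInstDisp(int64_t TLSVarAddress, int64_t Offset) {
  const int64_t FinalAddress = TLSVarAddress + Offset;
  if (FinalAddress < 32768)
    return FinalAddress; // Already fits; no adjustment is needed.
  return FinalAddress - computeDelta(FinalAddress);
}

// Hypothetical helper: does the displacement fit in a signed 16-bit field,
// i.e. does it lie within [-32768, 32768)?
constexpr bool fitsInDisplacement(int64_t Disp) {
  return Disp >= -32768 && Disp < 32768;
}
```

For example, a TLS variable at address 8 accessed at offset 39992 gives a
final address of 40000; the delta is 65536 and the printed displacement
becomes 40000 - 65536 = -25536, which fits in the signed 16-bit field.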
---
 llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp     | 161 +++++++++++--
 llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp   | 108 ++++++++-
 .../PowerPC/aix-small-local-exec-tls-char.ll  |   6 +-
 .../aix-small-local-exec-tls-double.ll        |   6 +-
 .../PowerPC/aix-small-local-exec-tls-float.ll |   6 +-
 .../PowerPC/aix-small-local-exec-tls-int.ll   |   6 +-
 .../aix-small-local-exec-tls-largeaccess.ll   | 211 ++++++++----------
 .../aix-small-local-exec-tls-largeaccess2.ll  | 160 +++++++++++++
 .../PowerPC/aix-small-local-exec-tls-short.ll |   6 +-
 9 files changed, 515 insertions(+), 155 deletions(-)
 create mode 100644 llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess2.ll

diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index ce600654df72a..d1a167273956c 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -66,9 +66,10 @@
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/Process.h"
-#include "llvm/Support/raw_ostream.h"
 #include "llvm/Support/Threading.h"
+#include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/TargetParser/Triple.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -155,6 +156,11 @@ class PPCAsmPrinter : public AsmPrinter {
       TOC;
   const PPCSubtarget *Subtarget = nullptr;
 
+  // Keep track of the number of TLS variables and their corresponding
+  // addresses, which is then used for the assembly printing of
+  // non-TOC-based local-exec variables.
+  MapVector<const GlobalValue *, uint64_t> TLSVarsToAddressMapping;
+
 public:
   explicit PPCAsmPrinter(TargetMachine &TM,
                          std::unique_ptr<MCStreamer> Streamer)
@@ -199,6 +205,8 @@ class PPCAsmPrinter : public AsmPrinter {
   void LowerPATCHPOINT(StackMaps &SM, const MachineInstr &MI);
   void EmitTlsCall(const MachineInstr *MI, MCSymbolRefExpr::VariantKind VK);
   void EmitAIXTlsCallHelper(const MachineInstr *MI);
+  const MCExpr *getAdjustedLocalExecExpr(const MachineOperand &MO,
+                                         int64_t Offset);
   bool runOnMachineFunction(MachineFunction &MF) override {
     Subtarget = &MF.getSubtarget<PPCSubtarget>();
     bool Changed = AsmPrinter::runOnMachineFunction(MF);
@@ -753,6 +761,7 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
   MCInst TmpInst;
   const bool IsPPC64 = Subtarget->isPPC64();
   const bool IsAIX = Subtarget->isAIXABI();
+  const bool HasAIXSmallLocalExecTLS = Subtarget->hasAIXSmallLocalExecTLS();
   const Module *M = MF->getFunction().getParent();
   PICLevel::Level PL = M->getPICLevel();
 
@@ -1504,12 +1513,70 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
     // Verify alignment is legal, so we don't create relocations
     // that can't be supported.
     unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
+    // For non-TOC-based local-exec TLS accesses with non-zero offsets, the
+    // machine operand (which is a TargetGlobalTLSAddress) is expected to be
+    // the same operand for both loads and stores.
+    for (const MachineOperand &TempMO : MI->operands()) {
+      if (((TempMO.getTargetFlags() == PPCII::MO_TPREL_FLAG)) &&
+          TempMO.getOperandNo() == 1)
+        OpNum = 1;
+    }
     const MachineOperand &MO = MI->getOperand(OpNum);
     if (MO.isGlobal()) {
       const DataLayout &DL = MO.getGlobal()->getParent()->getDataLayout();
       if (MO.getGlobal()->getPointerAlignment(DL) < 4)
         llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
     }
+    // As these load/stores share common code with the following load/stores,
+    // fall through to the subsequent cases in order to either process the
+    // non-TOC-based local-exec sequence or to process the instruction normally.
+    [[fallthrough]];
+  }
+  case PPC::LBZ:
+  case PPC::LBZ8:
+  case PPC::LHA:
+  case PPC::LHA8:
+  case PPC::LHZ:
+  case PPC::LHZ8:
+  case PPC::LWZ:
+  case PPC::LWZ8:
+  case PPC::STB:
+  case PPC::STB8:
+  case PPC::STH:
+  case PPC::STH8:
+  case PPC::STW:
+  case PPC::STW8:
+  case PPC::LFS:
+  case PPC::STFS:
+  case PPC::LFD:
+  case PPC::STFD:
+  case PPC::ADDI8: {
+    // A faster non-TOC-based local-exec sequence is represented by `addi`
+    // or a load/store instruction (that directly loads or stores off of the
+    // thread pointer) with an immediate operand having the MO_TPREL_FLAG.
+    // Such instructions do not otherwise arise.
+    if (!HasAIXSmallLocalExecTLS)
+      break;
+    bool IsMIADDI8 = MI->getOpcode() == PPC::ADDI8;
+    unsigned OpNum = IsMIADDI8 ? 2 : 1;
+    const MachineOperand &MO = MI->getOperand(OpNum);
+    unsigned Flag = MO.getTargetFlags();
+    if (Flag == PPCII::MO_TPREL_FLAG ||
+        Flag == PPCII::MO_GOT_TPREL_PCREL_FLAG ||
+        Flag == PPCII::MO_TPREL_PCREL_FLAG) {
+      LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
+
+      const MCExpr *Expr = getAdjustedLocalExecExpr(MO, MO.getOffset());
+      if (Expr)
+        TmpInst.getOperand(OpNum) = MCOperand::createExpr(Expr);
+
+      // Change the opcode to load address if the original opcode is an `addi`.
+      if (IsMIADDI8)
+        TmpInst.setOpcode(PPC::LA8);
+
+      EmitToStreamer(*OutStreamer, TmpInst);
+      return;
+    }
     // Now process the instruction normally.
     break;
   }
@@ -1523,30 +1590,73 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
     EmitToStreamer(*OutStreamer, MCInstBuilder(PPC::EnforceIEIO));
     return;
   }
-  case PPC::ADDI8: {
-    // The faster non-TOC-based local-exec sequence is represented by `addi`
-    // with an immediate operand having the MO_TPREL_FLAG. Such an instruction
-    // does not otherwise arise.
-    unsigned Flag = MI->getOperand(2).getTargetFlags();
-    if (Flag == PPCII::MO_TPREL_FLAG ||
-        Flag == PPCII::MO_GOT_TPREL_PCREL_FLAG ||
-        Flag == PPCII::MO_TPREL_PCREL_FLAG) {
-      assert(
-          Subtarget->hasAIXSmallLocalExecTLS() &&
-          "addi with thread-pointer only expected with local-exec small TLS");
-      LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
-      TmpInst.setOpcode(PPC::LA8);
-      EmitToStreamer(*OutStreamer, TmpInst);
-      return;
-    }
-    break;
-  }
   }
 
   LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
   EmitToStreamer(*OutStreamer, TmpInst);
 }
 
+// For non-TOC-based local-exec variables that have a non-zero offset,
+// we need to create a new MCExpr that adds the non-zero offset to the address
+// of the local-exec variable that will be used in either an addi, load or
+// store. However, the final displacement for these instructions must be
+// between [-32768, 32768), so if the TLS address + its non-zero offset is
+// greater than 32KB, a new MCExpr is produced to accommodate this situation.
+const MCExpr *PPCAsmPrinter::getAdjustedLocalExecExpr(const MachineOperand &MO,
+                                                      int64_t Offset) {
+  // Non-zero offsets (for loads, stores or `addi`) require additional handling.
+  // When the offset is zero, there is no need to create an adjusted MCExpr.
+  if (!Offset)
+    return nullptr;
+
+  assert(MO.isGlobal() && "Only expecting a global MachineOperand here!");
+  const GlobalValue *GValue = MO.getGlobal();
+  assert(TM.getTLSModel(GValue) == TLSModel::LocalExec &&
+         "Only local-exec accesses are handled!");
+
+  bool IsGlobalADeclaration = GValue->isDeclarationForLinker();
+  // Find the GlobalVariable that corresponds to the particular TLS variable
+  // in the TLS variable-to-address mapping. All TLS variables should exist
+  // within this map, with the exception of TLS variables marked as extern.
+  const auto TLSVarsMapEntryIter = TLSVarsToAddressMapping.find(GValue);
+  if (TLSVarsMapEntryIter == TLSVarsToAddressMapping.end())
+    assert(IsGlobalADeclaration &&
+           "Only expecting to find extern TLS variables not present in the TLS "
+           "variable-to-address map!");
+
+  unsigned TLSVarAddress =
+      IsGlobalADeclaration ? 0 : TLSVarsMapEntryIter->second;
+  ptrdiff_t FinalAddress = (TLSVarAddress + Offset);
+  // If the address of the TLS variable + the offset is less than 32KB,
+  // or if the TLS variable is extern, we simply produce an MCExpr to add the
+  // non-zero offset to the TLS variable address.
+  // When TLS variables are extern, this is safe to do because we can
+  // assume that the address of an extern TLS variable is zero.
+  const MCExpr *Expr = MCSymbolRefExpr::create(
+      getSymbol(GValue), MCSymbolRefExpr::VK_PPC_AIX_TLSLE, OutContext);
+  Expr = MCBinaryExpr::createAdd(
+      Expr, MCConstantExpr::create(Offset, OutContext), OutContext);
+  if (FinalAddress >= 32768) {
+    // Handle the written offset for cases where:
+    //   TLS variable address + Offset >= 32KB.
+
+    // The assembly that is printed will look like:
+    //  TLSVar@le + Offset - Delta
+    // where Delta is a multiple of 64KB: ((FinalAddress + 32768) & ~0xFFFF).
+    ptrdiff_t Delta = ((FinalAddress + 32768) & ~0xFFFF);
+    // Check that the total instruction displacement fits within [-32768,32768).
+    ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta;
+    assert(((InstDisp < 32768) &&
+            (InstDisp >= -32768)) &&
+           "Expecting the instruction displacement for local-exec TLS "
+           "variables to be between [-32768, 32768)!");
+    Expr = MCBinaryExpr::createAdd(
+        Expr, MCConstantExpr::create(-Delta, OutContext), OutContext);
+  }
+
+  return Expr;
+}
+
 void PPCLinuxAsmPrinter::emitGNUAttributes(Module &M) {
   // Emit float ABI into GNU attribute
   Metadata *MD = M.getModuleFlag("float-abi");
@@ -2772,6 +2882,19 @@ bool PPCAIXAsmPrinter::doInitialization(Module &M) {
     Csect->ensureMinAlignment(GOAlign);
   };
 
+  // For all TLS variables, calculate their corresponding addresses and store
+  // them into TLSVarsToAddressMapping, which will be used to determine whether
+  // or not local-exec TLS variables require special assembly printing.
+  uint64_t TLSVarAddress = 0;
+  auto DL = M.getDataLayout();
+  for (const auto &G : M.globals()) {
+    if (G.isThreadLocal() && !G.isDeclaration()) {
+      TLSVarAddress = alignTo(TLSVarAddress, getGVAlignment(&G, DL));
+      TLSVarsToAddressMapping[&G] = TLSVarAddress;
+      TLSVarAddress += DL.getTypeAllocSize(G.getValueType());
+    }
+  }
+
   // We need to know, up front, the alignment of csects for the assembly path,
   // because once a .csect directive gets emitted, we could not change the
   // alignment value on it.
diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 9fad96340737c..97df3000e402c 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -7562,8 +7562,98 @@ static void reduceVSXSwap(SDNode *N, SelectionDAG *DAG) {
   DAG->ReplaceAllUsesOfValueWith(SDValue(N, 0), N->getOperand(0));
 }
 
+// Is an ADDI eligible for folding for non-TOC-based local-exec accesses?
+static bool isEligibleToFoldADDIForLocalExecAccesses(SelectionDAG *DAG,
+                                                     SDValue ADDIToFold) {
+  // Check if ADDIToFold (the ADDI that we want to fold into local-exec
+  // accesses), is truly an ADDI.
+  if (!ADDIToFold.isMachineOpcode() ||
+      (ADDIToFold.getMachineOpcode() != PPC::ADDI8))
+    return false;
+
+  // The first operand of the ADDIToFold should be the thread pointer.
+  // This transformation is only performed if the first operand of the
+  // addi is the thread pointer.
+  SDValue TPRegNode = ADDIToFold.getOperand(0);
+  RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
+  const PPCSubtarget &Subtarget =
+      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
+  if (!TPReg || (TPReg->getReg() != Subtarget.getThreadPointerRegister()))
+    return false;
+
+  // The second operand of the ADDIToFold should be the global TLS address
+  // (the local-exec TLS variable). We only perform the folding if the TLS
+  // variable is the second operand.
+  SDValue TLSVarNode = ADDIToFold.getOperand(1);
+  GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(TLSVarNode);
+  if (!GA)
+    return false;
+
+  // The local-exec TLS variable should only have the MO_TPREL_FLAG target flag,
+  // so this optimization is not performed if the target flags are anything else.
+  unsigned TargetFlags = GA->getTargetFlags();
+  if (TargetFlags != PPCII::MO_TPREL_FLAG)
+    return false;
+
+  // If all conditions are satisfied, the ADDI is valid for folding.
+  return true;
+}
+
+// For non-TOC-based local-exec access where an addi is feeding into another
+// addi, fold this sequence into a single addi if possible.
+// Before this optimization, the sequence appears as:
+//    addi rN, r13, sym@le
+//    addi rM, rN, imm
+// After this optimization, we can fold the two addi into a single one:
+//    addi rM, r13, sym@le + imm
+static void foldADDIForLocalExecAccesses(SDNode *N, SelectionDAG *DAG) {
+  if (N->getMachineOpcode() != PPC::ADDI8)
+    return;
+
+  // InitialADDI is the addi feeding into N (also an addi), and the addi that
+  // we want optimized out.
+  SDValue InitialADDI = N->getOperand(0);
+
+  if (!isEligibleToFoldADDIForLocalExecAccesses(DAG, InitialADDI))
+    return;
+
+  // At this point, InitialADDI can be folded into a non-TOC-based local-exec
+  // access. The first operand of InitialADDI should be the thread pointer,
+  // which has been checked in isEligibleToFoldADDIForLocalExecAccesses().
+  SDValue TPRegNode = InitialADDI.getOperand(0);
+  RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
+  const PPCSubtarget &Subtarget =
+      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
+  assert((TPReg && (TPReg->getReg() == Subtarget.getThreadPointerRegister())) &&
+         "Expecting the first operand to be a thread pointer for folding addi "
+         "in local-exec accesses!");
+
+  // The second operand of the InitialADDI should be the global TLS address
+  // (the local-exec TLS variable), with the MO_TPREL_FLAG target flag.
+  // This has been checked in isEligibleToFoldADDIForLocalExecAccesses().
+  SDValue TLSVarNode = InitialADDI.getOperand(1);
+  GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(TLSVarNode);
+  assert(GA && "Expecting a valid GlobalAddressSDNode when folding addi into "
+               "local-exec accesses!");
+  unsigned TargetFlags = GA->getTargetFlags();
+
+  // The second operand of the addi that we want to preserve will be an
+  // immediate. We add this immediate, together with the address of the TLS
+  // variable found in InitialADDI, in order to preserve the correct TLS address
+  // information during assembly printing. The offset is likely to be non-zero
+  // when we end up in this case.
+  int Offset = N->getConstantOperandVal(1);
+  TLSVarNode = DAG->getTargetGlobalAddress(GA->getGlobal(), SDLoc(GA), MVT::i64,
+                                           Offset, TargetFlags);
+
+  (void)DAG->UpdateNodeOperands(N, TPRegNode, TLSVarNode);
+  if (InitialADDI.getNode()->use_empty())
+    DAG->RemoveDeadNode(InitialADDI.getNode());
+}
+
 void PPCDAGToDAGISel::PeepholePPC64() {
   SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();
+  bool HasAIXSmallLocalExecTLS = Subtarget->hasAIXSmallLocalExecTLS();
 
   while (Position != CurDAG->allnodes_begin()) {
     SDNode *N = &*--Position;
@@ -7574,6 +7664,10 @@ void PPCDAGToDAGISel::PeepholePPC64() {
     if (isVSXSwap(SDValue(N, 0)))
       reduceVSXSwap(N, CurDAG);
 
+    // This optimization is performed for non-TOC-based local-exec accesses.
+    if (HasAIXSmallLocalExecTLS)
+      foldADDIForLocalExecAccesses(N, CurDAG);
+
     unsigned FirstOp;
     unsigned StorageOpcode = N->getMachineOpcode();
     bool RequiresMod4Offset = false;
@@ -7730,7 +7824,19 @@ void PPCDAGToDAGISel::PeepholePPC64() {
         ImmOpnd = CurDAG->getTargetConstant(Offset, SDLoc(ImmOpnd),
                                             ImmOpnd.getValueType());
       } else if (Offset != 0) {
-        continue;
+        // This optimization is performed for non-TOC-based local-exec accesses.
+        if (HasAIXSmallLocalExecTLS &&
+            isEligibleToFoldADDIForLocalExecAccesses(CurDAG, Base)) {
+          // Add the non-zero offset information into the load or store
+          // instruction to be used for non-TOC-based local-exec accesses.
+          GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd);
+          assert(GA && "Expecting a valid GlobalAddressSDNode when folding "
+                       "addi into local-exec accesses!");
+          ImmOpnd = CurDAG->getTargetGlobalAddress(GA->getGlobal(), SDLoc(GA),
+                                                   MVT::i64, Offset,
+                                                   GA->getTargetFlags());
+        } else
+          continue;
       }
     }
 
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-char.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-char.ll
index 6c05fb38ee16d..c938b9485c257 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-char.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-char.ll
@@ -16,14 +16,12 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define nonnull ptr @AddrTest1() local_unnamed_addr #0 {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, c[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r3, r3, 1
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, c[TL]@le+1(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, c[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r3, r3, 1
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, c[TL]@le+1(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
   %0 = tail call align 1 ptr @llvm.threadlocal.address.p0(ptr align 1 @c)
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-double.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-double.ll
index 5cf359f68f8bd..02d794fec75cc 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-double.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-double.ll
@@ -16,14 +16,12 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define nonnull ptr @AddrTest1() local_unnamed_addr #0 {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, f[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r3, r3, 48
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, f[TL]@le+48(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, f[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r3, r3, 48
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, f[TL]@le+48(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
   %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @f)
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-float.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-float.ll
index 1fc014edaf2bb..a1f6f4f974bd8 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-float.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-float.ll
@@ -16,14 +16,12 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define nonnull ptr @AddrTest1() local_unnamed_addr #0 {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, e[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r3, r3, 16
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, e[TL]@le+16(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, e[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r3, r3, 16
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, e[TL]@le+16(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
   %0 = tail call align 4 ptr @llvm.threadlocal.address.p0(ptr align 4 @e)
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-int.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-int.ll
index 40adf27d7ee39..c74abe93c18bf 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-int.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-int.ll
@@ -18,14 +18,12 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define nonnull ptr @AddrTest1() local_unnamed_addr #0 {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, a[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r3, r3, 12
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, a[TL]@le+12(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, a[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r3, r3, 12
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, a[TL]@le+12(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
   %0 = tail call align 4 ptr @llvm.threadlocal.address.p0(ptr align 4 @a)
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess.ll
index 55c69839515c4..22b8503ef403c 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess.ll
@@ -25,43 +25,33 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define signext i32 @StoreArrays1() {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: StoreArrays1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLSv1[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 1
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r5, 4
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, mySmallLocalExecTLSv1[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r4, mySmallLocalExecTLS2[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r5, 24(r3)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 1
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 4
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, mySmallLocalExecTLSv1[TL]@le(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 2
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, 320(r4)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLS3[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 3
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, 324(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLS4[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 88
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r5, 328(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLS5[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, 332(r3)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, mySmallLocalExecTLSv1[TL]@le+24(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, (mySmallLocalExecTLS2[TL]@le+320)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 3
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, (mySmallLocalExecTLS3[TL]@le+324)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 88
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, (mySmallLocalExecTLS4[TL]@le+328)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, (mySmallLocalExecTLS5[TL]@le+332)-65536(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 102
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: StoreArrays1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLSv1[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 1
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r5, 4
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, mySmallLocalExecTLSv1[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r4, mySmallLocalExecTLS2[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r5, 24(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 1
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 4
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, mySmallLocalExecTLSv1[TL]@le(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 2
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, 320(r4)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS3[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 3
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 324(r3)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS4[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 88
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r5, 328(r3)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS5[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 332(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, mySmallLocalExecTLSv1[TL]@le+24(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS2[TL]@le+320)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 3
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS3[TL]@le+324)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 88
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, (mySmallLocalExecTLS4[TL]@le+328)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS5[TL]@le+332)-65536(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 102
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
@@ -98,46 +88,38 @@ entry:
 define signext i32 @StoreArrays2() {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: StoreArrays2:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLSv2
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 1
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r5, 4
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    add r3, r13, r3
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, 0(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r4, mySmallLocalExecTLS2[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r5, 24(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 2
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, 320(r4)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLS3[TL]@le(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    ld r4, L..C0(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLSv2
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 1
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    add r4, r13, r4
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, 0(r4)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 4
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, 24(r4)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 2
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, (mySmallLocalExecTLS2[TL]@le+320)-65536(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 3
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, 324(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, mySmallLocalExecTLS4[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r4, mySmallLocalExecTLS5[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r5, 328(r3)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 88
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, 332(r4)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, (mySmallLocalExecTLS3[TL]@le+324)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 88
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r3, (mySmallLocalExecTLS4[TL]@le+328)-65536(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 102
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stw r4, (mySmallLocalExecTLS5[TL]@le+332)-65536(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: StoreArrays2:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addis r4, L..C0@u(r2)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 1
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r5, 4
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    ld r4, L..C0@l(r4)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    add r4, r13, r4
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, 0(r4)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS2[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r5, 24(r4)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 2
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 320(r3)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS3[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 3
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 324(r3)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, mySmallLocalExecTLS4[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r4, mySmallLocalExecTLS5[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r5, 328(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addis r3, L..C0@u(r2)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 1
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    ld r3, L..C0@l(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    add r3, r13, r3
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 0(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 4
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, 24(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 2
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS2[TL]@le+320)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 3
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS3[TL]@le+324)-65536(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 88
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, 332(r4)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r4, (mySmallLocalExecTLS4[TL]@le+328)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stw r3, (mySmallLocalExecTLS5[TL]@le+332)-65536(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 102
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
@@ -173,77 +155,76 @@ entry:
 ; DIS:      {{.*}}aix-small-local-exec-tls-largeaccess.ll.tmp.o:	file format aix5coff64-rs6000
 ; DIS:      Disassembly of section .text:
 ; DIS:      0000000000000000 (idx: 3) .StoreArrays1:
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, 0
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 1
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 4
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 0(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 15) mySmallLocalExecTLSv1[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 1
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 5, 4
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 0(13)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 2
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 24(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 15) mySmallLocalExecTLSv1[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 4, 13, 32748
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, -32468(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 17) mySmallLocalExecTLS2[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 5, 24(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 2
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 320(4)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, -16788
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 3
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, -16464(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 19) mySmallLocalExecTLS3[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 3
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 324(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, -788
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 88
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, -460(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 21) mySmallLocalExecTLS4[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 88
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 5, 328(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, 15212
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 15544(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 23) mySmallLocalExecTLS5[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 332(3)
 ; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 102
 ; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                blr
 
-; DIS:      0000000000000050 (idx: 5) .StoreArrays2:
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addis 4, 2, 0
+; DIS:      0000000000000040 (idx: 5) .StoreArrays2:
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addis 3, 2, 0
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCU	(idx: 13) mySmallLocalExecTLSv2[TE]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 1
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 5, 4
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                ld 4, 0(4)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 1
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                ld 3, 0(3)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCL	(idx: 13) mySmallLocalExecTLSv2[TE]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                add 4, 13, 4
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 0(4)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, 32748
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                add 3, 13, 3
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 0(3)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 4
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 24(3)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 2
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, -32468(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 17) mySmallLocalExecTLS2[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 5, 24(4)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 2
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 320(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, -16788
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 3
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, -16464(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 19) mySmallLocalExecTLS3[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 3
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, 324(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 3, 13, -788
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 88
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 4, -460(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 21) mySmallLocalExecTLS4[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 4, 13, 15212
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 15544(13)
 ; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 23) mySmallLocalExecTLS5[TL]
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 5, 328(3)
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 88
-; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stw 3, 332(4)
 ; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 102
 ; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                blr
 
 ; DIS:      Disassembly of section .data:
-; DIS:      00000000000000a0 (idx: 7) StoreArrays1[DS]:
+; DIS:      0000000000000080 (idx: 7) StoreArrays1[DS]:
+; DIS-NEXT:       80: 00 00 00 00
+; DIS-NEXT: 0000000000000080:  R_POS	(idx: 3) .StoreArrays1
+; DIS-NEXT:       84: 00 00 00 00
+; DIS-NEXT:       88: 00 00 00 00
+; DIS-NEXT: 0000000000000088:  R_POS        (idx: 11) TOC[TC0]
+; DIS-NEXT:       8c: 00 00 00 b0
+
+; DIS:      0000000000000098 (idx: 9) StoreArrays2[DS]:
+; DIS-NEXT:       98: 00 00 00 00
+; DIS-NEXT: 0000000000000098:  R_POS	(idx: 5) .StoreArrays2
+; DIS-NEXT:       9c: 00 00 00 40
 ; DIS-NEXT:       a0: 00 00 00 00
-; DIS-NEXT: 00000000000000a0:  R_POS	(idx: 3) .StoreArrays1
-; DIS-NEXT:       a4: 00 00 00 00
-; DIS-NEXT:       a8: 00 00 00 00
-; DIS-NEXT: 00000000000000a8:  R_POS        (idx: 11) TOC[TC0]
-; DIS-NEXT:       ac: 00 00 00 d0
+; DIS-NEXT: 00000000000000a0:  R_POS        (idx: 11) TOC[TC0]
+; DIS-NEXT:       a4: 00 00 00 b0
 
-; DIS:      00000000000000b8 (idx: 9) StoreArrays2[DS]:
-; DIS-NEXT:       b8: 00 00 00 00
-; DIS-NEXT: 00000000000000b8:  R_POS	(idx: 5) .StoreArrays2
-; DIS-NEXT:       bc: 00 00 00 50
-; DIS-NEXT:       c0: 00 00 00 00
-; DIS-NEXT: 00000000000000c0:  R_POS        (idx: 11) TOC[TC0]
-; DIS-NEXT:       c4: 00 00 00 d0
+; DIS:      00000000000000b0 (idx: 13) mySmallLocalExecTLSv2[TE]:
+; DIS-NEXT:       b0: 00 00 00 00
+; DIS-NEXT: 00000000000000b0:  R_TLS_LE     (idx: 25) mySmallLocalExecTLSv2[TL]
+; DIS-NEXT:       b4: 00 01 79 ec
 
-; DIS:      00000000000000d0 (idx: 13) mySmallLocalExecTLSv2[TE]:
-; DIS-NEXT:       d0: 00 00 00 00
-; DIS-NEXT: 00000000000000d0:  R_TLS_LE     (idx: 25) mySmallLocalExecTLSv2[TL]
-; DIS-NEXT:       d4: 00 01 79 ec
+; DIS:      Disassembly of section .tdata:
+; DIS:      0000000000000000 (idx: 15) mySmallLocalExecTLSv1[TL]:
+; DIS:      0000000000007fec (idx: 17) mySmallLocalExecTLS2[TL]:
+; DIS:      000000000000be6c (idx: 19) mySmallLocalExecTLS3[TL]:
+; DIS:      000000000000fcec (idx: 21) mySmallLocalExecTLS4[TL]:
+; DIS:      0000000000013b6c (idx: 23) mySmallLocalExecTLS5[TL]:
+; DIS:      00000000000179ec (idx: 25) mySmallLocalExecTLSv2[TL]:
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess2.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess2.ll
new file mode 100644
index 0000000000000..725b680054926
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-largeaccess2.ll
@@ -0,0 +1,160 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff -mattr=+aix-small-local-exec-tls < %s \
+; RUN:      | FileCheck %s --check-prefix=SMALL-LOCAL-EXEC-SMALLCM64
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN:      -mattr=+aix-small-local-exec-tls < %s | FileCheck %s \
+; RUN:      --check-prefix=SMALL-LOCAL-EXEC-LARGECM64
+
+; Test disassembly of object.
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -mattr=+aix-small-local-exec-tls \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff -xcoff-traceback-table=false \
+; RUN:      --code-model=large -filetype=obj -o %t.o < %s
+; RUN: llvm-objdump -D -r --symbol-description %t.o | FileCheck --check-prefix=DIS %s
+
+@mySmallLocalExecTLS6 = external thread_local(localexec) global [60 x i64], align 8
+@mySmallLocalExecTLS2 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@MyTLSGDVar = thread_local global [800 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS3 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS4 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS5 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS = thread_local(localexec) local_unnamed_addr global [7800 x i64] zeroinitializer, align 8
+declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
+
+; All accesses use a "faster" local-exec sequence directly off the thread pointer.
+define i64 @StoreLargeAccess1() {
+; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: StoreLargeAccess1:
+; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    mflr r0
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    stdu r1, -48(r1)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 212
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 203
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r0, 64(r1)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r4, mySmallLocalExecTLS2[TL]@le+1200(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    ld r4, L..C1(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    bla .__tls_get_addr[PR]
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 44
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r4, 440(r3)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 6
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r4, 100
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS3[TL]@le+2000(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 882
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r4, (mySmallLocalExecTLS4[TL]@le+6800)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    li r3, 1191
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r1, r1, 48
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    ld r0, 16(r1)
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    mtlr r0
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
+;
+; SMALL-LOCAL-EXEC-LARGECM64-LABEL: StoreLargeAccess1:
+; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    mflr r0
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    stdu r1, -48(r1)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 212
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r0, 64(r1)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addis r4, L..C0@u(r2)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    ld r4, L..C0@l(r4)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 203
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS2[TL]@le+1200(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addis r3, L..C1@u(r2)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    ld r3, L..C1@l(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    bla .__tls_get_addr[PR]
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 44
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r4, 440(r3)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 6
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r4, 100
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS3[TL]@le+2000(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 882
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r4, (mySmallLocalExecTLS4[TL]@le+6800)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    li r3, 1191
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r1, r1, 48
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    ld r0, 16(r1)
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    mtlr r0
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
+entry:
+  %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS6)
+  %arrayidx = getelementptr inbounds [60 x i64], ptr %0, i64 0, i64 53
+  store i64 212, ptr %arrayidx, align 8
+  %1 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS2)
+  %arrayidx1 = getelementptr inbounds [3000 x i64], ptr %1, i64 0, i64 150
+  store i64 203, ptr %arrayidx1, align 8
+  %2 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @MyTLSGDVar)
+  %arrayidx2 = getelementptr inbounds [800 x i64], ptr %2, i64 0, i64 55
+  store i64 44, ptr %arrayidx2, align 8
+  %3 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS3)
+  %arrayidx3 = getelementptr inbounds [3000 x i64], ptr %3, i64 0, i64 250
+  store i64 6, ptr %arrayidx3, align 8
+  %4 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS4)
+  %arrayidx4 = getelementptr inbounds [3000 x i64], ptr %4, i64 0, i64 850
+  store i64 100, ptr %arrayidx4, align 8
+  %5 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS5)
+  %arrayidx5 = getelementptr inbounds [3000 x i64], ptr %5, i64 0, i64 1050
+  store i64 882, ptr %arrayidx5, align 8
+  %6 = load i64, ptr %arrayidx1, align 8
+  %7 = load i64, ptr %arrayidx3, align 8
+  %8 = load i64, ptr %arrayidx4, align 8
+  %add = add i64 %6, 882
+  %add9 = add i64 %add, %7
+  %add11 = add i64 %add9, %8
+  ret i64 %add11
+}
+
+; DIS:      0000000000000000 (idx: 7) .StoreLargeAccess1:
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                mflr 0
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                stdu 1, -48(1)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 212
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 0, 64(1)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addis 4, 2, 0
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCU	(idx: 13) MyTLSGDVar[TE]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                ld 4, 0(4)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCL	(idx: 13) MyTLSGDVar[TE]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 3, 424(13)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 1) mySmallLocalExecTLS6[UL]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 203
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 3, 1200(13)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE	(idx: 17) mySmallLocalExecTLS2[TL]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addis 3, 2, 0
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCU	(idx: 15) .MyTLSGDVar[TE]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                ld 3, 8(3)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TOCL	(idx: 15) .MyTLSGDVar[TE]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                bla 0
+; DIS-NEXT: {{0*}}[[#ADDR]]: R_RBA  (idx: 3)      .__tls_get_addr[PR]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 44
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 4, 440(3)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 6
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 4, 100
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 3, 32400(13)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE       (idx: 21) mySmallLocalExecTLS3[TL]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 882
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 4, -4336(13)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE       (idx: 23) mySmallLocalExecTLS4[TL]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                std 3, 21264(13)
+; DIS-NEXT: {{0*}}[[#ADDR + 2]]: R_TLS_LE       (idx: 25) mySmallLocalExecTLS5[TL]
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                li 3, 1191
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                addi 1, 1, 48
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                ld 0, 16(1)
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                mtlr 0
+; DIS-NEXT: [[#%x, ADDR:]]: {{.*}}                blr
+
+; DIS:      Disassembly of section .data:
+; DIS:      0000000000000068 (idx: 9) StoreLargeAccess1[DS]:
+; DIS-NEXT:       68: 00 00 00 00
+; DIS-NEXT: 0000000000000068:  R_POS    (idx: 7) .StoreLargeAccess1
+; DIS-NEXT:       6c: 00 00 00 00
+; DIS-NEXT:       70: 00 00 00 00
+; DIS-NEXT: 0000000000000070:  R_POS        (idx: 11) TOC[TC0]
+; DIS-NEXT:       74: 00 00 00 80
+
+; DIS:      Disassembly of section .tdata:
+; DIS:      0000000000000000 (idx: 17) mySmallLocalExecTLS2[TL]:
+; DIS:      0000000000005dc0 (idx: 19) MyTLSGDVar[TL]:
+; DIS:      00000000000076c0 (idx: 21) mySmallLocalExecTLS3[TL]:
+; DIS:      000000000000d480 (idx: 23) mySmallLocalExecTLS4[TL]:
+; DIS:      0000000000013240 (idx: 25) mySmallLocalExecTLS5[TL]:
+; DIS:      0000000000019000 (idx: 27) mySmallLocalExecTLS[TL]:
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-short.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-short.ll
index bf1b7fab30814..b172c2985e695 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-short.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-exec-tls-short.ll
@@ -16,14 +16,12 @@ declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull) #1
 define nonnull ptr @AddrTest1() local_unnamed_addr #0 {
 ; SMALL-LOCAL-EXEC-SMALLCM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-SMALLCM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, b[TL]@le(r13)
-; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    addi r3, r3, 4
+; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    la r3, b[TL]@le+4(r13)
 ; SMALL-LOCAL-EXEC-SMALLCM64-NEXT:    blr
 ;
 ; SMALL-LOCAL-EXEC-LARGECM64-LABEL: AddrTest1:
 ; SMALL-LOCAL-EXEC-LARGECM64:       # %bb.0: # %entry
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, b[TL]@le(r13)
-; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    addi r3, r3, 4
+; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    la r3, b[TL]@le+4(r13)
 ; SMALL-LOCAL-EXEC-LARGECM64-NEXT:    blr
 entry:
   %0 = tail call align 2 ptr @llvm.threadlocal.address.p0(ptr align 2 @b)

>From 67ad39ed6542c83956174b54ebbdef318b1f19a9 Mon Sep 17 00:00:00 2001
From: Jie Fu <jiefu at tencent.com>
Date: Thu, 1 Feb 2024 22:50:14 +0800
Subject: [PATCH 17/42] [PowerPC] Fix -Wunused-variable in PPCAsmPrinter.cpp
 and PPCISelDAGToDAG.cpp (NFC)

llvm-project/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp:1648:15:
error: unused variable 'InstDisp' [-Werror,-Wunused-variable]
    ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta;
              ^
llvm-project/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7624:19:
error: unused variable 'TPReg' [-Werror,-Wunused-variable]
  RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
                  ^
llvm-project/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7625:23:
error: unused variable 'Subtarget' [-Werror,-Wunused-variable]
  const PPCSubtarget &Subtarget =
                      ^
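
A minimal sketch of why these warnings appear and how `[[maybe_unused]]`
silences them, assuming the variables are read only inside `assert`. The
helper name `roundToNearest64K` is hypothetical and not part of the patch. It
mirrors the Delta computation quoted in the diff and checks both displacement
bounds together.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (illustration only): rounds an address to the nearest
// multiple of 64KB, in the style of the Delta computation in the diff.
std::ptrdiff_t roundToNearest64K(std::ptrdiff_t finalAddress) {
  std::ptrdiff_t delta = (finalAddress + 32768) & ~std::ptrdiff_t(0xFFFF);
  // `disp` is read only by the assert below. When NDEBUG is defined the
  // assert expands to nothing, so without [[maybe_unused]] the compiler
  // reports -Wunused-variable (an error under -Werror).
  [[maybe_unused]] std::ptrdiff_t disp = finalAddress - delta;
  assert(disp >= -32768 && disp < 32768 &&
         "displacement must fit in a signed 16-bit field");
  return delta;
}
```

The attribute keeps the declaration next to the assertion that uses it, which
is a smaller change than wrapping the declaration in `#ifndef NDEBUG`.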
---
 llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp   | 2 +-
 llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index d1a167273956c..b75501e8bae52 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -1645,7 +1645,7 @@ const MCExpr *PPCAsmPrinter::getAdjustedLocalExecExpr(const MachineOperand &MO,
     // where Delta is a multiple of 64KB: ((FinalAddress + 32768) & ~0xFFFF).
     ptrdiff_t Delta = ((FinalAddress + 32768) & ~0xFFFF);
     // Check that the total instruction displacement fits within [-32768,32768).
-    ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta;
+    [[maybe_unused]] ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta;
     assert((InstDisp < 32768) ||
            (InstDisp >= -32768) &&
                "Expecting the instruction displacement for local-exec TLS "
diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 97df3000e402c..6a3710407bc4f 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -7621,8 +7621,8 @@ static void foldADDIForLocalExecAccesses(SDNode *N, SelectionDAG *DAG) {
   // access. The first operand of InitialADDI should be the thread pointer,
   // which has been checked in isEligibleToFoldADDIForLocalExecAccesses().
   SDValue TPRegNode = InitialADDI.getOperand(0);
-  RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
-  const PPCSubtarget &Subtarget =
+  [[maybe_unused]] RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
+  [[maybe_unused]] const PPCSubtarget &Subtarget =
       DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
   assert((TPReg && (TPReg->getReg() == Subtarget.getThreadPointerRegister())) &&
          "Expecting the first operand to be a thread pointer for folding addi "

>From 5f3615ff6bfadce543f380c219dd0115ab3aa1ff Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Thu, 1 Feb 2024 16:05:29 +0100
Subject: [PATCH 18/42] [LoopUnroll] Add test for #80289 (NFC)

---
 .../Transforms/LoopUnroll/runtime-i128.ll     | 72 +++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 llvm/test/Transforms/LoopUnroll/runtime-i128.ll

diff --git a/llvm/test/Transforms/LoopUnroll/runtime-i128.ll b/llvm/test/Transforms/LoopUnroll/runtime-i128.ll
new file mode 100644
index 0000000000000..50e09beddb37a
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/runtime-i128.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=loop-unroll -unroll-runtime < %s | FileCheck %s
+
+declare void @foo()
+
+define void @test(i128 %n, i128 %m) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: i128 [[N:%.*]], i128 [[M:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = freeze i128 [[N]]
+; CHECK-NEXT:    [[TMP1:%.*]] = add i128 [[TMP0]], 18446744073709551615
+; CHECK-NEXT:    [[XTRAITER:%.*]] = and i128 [[TMP0]], 7
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i128 [[TMP1]], 7
+; CHECK-NEXT:    br i1 [[TMP2]], label [[EXIT_UNR_LCSSA:%.*]], label [[ENTRY_NEW:%.*]]
+; CHECK:       entry.new:
+; CHECK-NEXT:    [[UNROLL_ITER:%.*]] = sub i128 [[TMP0]], [[XTRAITER]]
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i128 [ 0, [[ENTRY_NEW]] ], [ [[IV_NEXT_7:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[NITER:%.*]] = phi i128 [ 0, [[ENTRY_NEW]] ], [ [[NITER_NEXT_7:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    [[IV_NEXT_7]] = add i128 [[IV]], 8
+; CHECK-NEXT:    [[NITER_NEXT_7]] = add i128 [[NITER]], 8
+; CHECK-NEXT:    [[NITER_NCMP_7:%.*]] = icmp ne i128 [[NITER_NEXT_7]], [[UNROLL_ITER]]
+; CHECK-NEXT:    br i1 [[NITER_NCMP_7]], label [[LOOP]], label [[EXIT_UNR_LCSSA_LOOPEXIT:%.*]]
+; CHECK:       exit.unr-lcssa.loopexit:
+; CHECK-NEXT:    [[IV_UNR_PH:%.*]] = phi i128 [ [[IV_NEXT_7]], [[LOOP]] ]
+; CHECK-NEXT:    br label [[EXIT_UNR_LCSSA]]
+; CHECK:       exit.unr-lcssa:
+; CHECK-NEXT:    [[IV_UNR:%.*]] = phi i128 [ 0, [[ENTRY:%.*]] ], [ [[IV_UNR_PH]], [[EXIT_UNR_LCSSA_LOOPEXIT]] ]
+; CHECK-NEXT:    [[LCMP_MOD:%.*]] = icmp ne i128 [[XTRAITER]], 0
+; CHECK-NEXT:    br i1 [[LCMP_MOD]], label [[LOOP_EPIL_PREHEADER:%.*]], label [[EXIT:%.*]]
+; CHECK:       loop.epil.preheader:
+; CHECK-NEXT:    br label [[LOOP_EPIL:%.*]]
+; CHECK:       loop.epil:
+; CHECK-NEXT:    [[IV_EPIL:%.*]] = phi i128 [ [[IV_UNR]], [[LOOP_EPIL_PREHEADER]] ], [ [[IV_NEXT_EPIL:%.*]], [[LOOP_EPIL]] ]
+; CHECK-NEXT:    [[EPIL_ITER:%.*]] = phi i128 [ 0, [[LOOP_EPIL_PREHEADER]] ], [ [[EPIL_ITER_NEXT:%.*]], [[LOOP_EPIL]] ]
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    [[IV_NEXT_EPIL]] = add i128 [[IV_EPIL]], 1
+; CHECK-NEXT:    [[CMP_EPIL:%.*]] = icmp ne i128 [[IV_NEXT_EPIL]], [[N]]
+; CHECK-NEXT:    [[EPIL_ITER_NEXT]] = add i128 [[EPIL_ITER]], 1
+; CHECK-NEXT:    [[EPIL_ITER_CMP:%.*]] = icmp ne i128 [[EPIL_ITER_NEXT]], [[XTRAITER]]
+; CHECK-NEXT:    br i1 [[EPIL_ITER_CMP]], label [[LOOP_EPIL]], label [[EXIT_EPILOG_LCSSA:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       exit.epilog-lcssa:
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i128 [ 0, %entry ], [ %iv.next, %loop ]
+  call void @foo()
+  %iv.next = add i128 %iv, 1
+  %cmp = icmp ne i128 %iv.next, %n
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.unroll.disable"}
+;.

>From 3c854878023ffa6b2ec3d07d0b2df5c9982892c6 Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Thu, 1 Feb 2024 16:06:58 +0100
Subject: [PATCH 19/42] [LoopUnroll] Fix missing sign extension

For integer types wider than 64 bits, this zero-extended the -1
constant instead of sign-extending it.

Fixes https://github.com/llvm/llvm-project/issues/80289.
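
The effect described above can be sketched without LLVM, assuming a 128-bit
value modelled as two 64-bit halves. The `U128` type and the helpers below are
hypothetical and only illustrate the arithmetic. They are not the APIs touched
by the patch.

```cpp
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves (illustration only).
struct U128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Zero extension: the upper 64 bits are cleared.
U128 zeroExtend(std::uint64_t v) { return {0, v}; }

// Sign extension: the upper 64 bits replicate the sign bit.
U128 signExtend(std::int64_t v) {
  return {v < 0 ? ~std::uint64_t(0) : std::uint64_t(0),
          static_cast<std::uint64_t>(v)};
}

// Wrapping 128-bit addition with a carry out of the low half.
U128 add(U128 a, U128 b) {
  std::uint64_t lo = a.lo + b.lo;
  std::uint64_t carry = lo < a.lo ? 1 : 0;
  return {a.hi + b.hi + carry, lo};
}
```

With a trip count of 8, adding the zero-extended constant gives 2^64 + 7,
while adding the sign-extended constant gives the intended backedge count of
7. This matches the test update, where the i128 constant changes from
18446744073709551615 to -1.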
---
 llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp | 2 +-
 llvm/test/Transforms/LoopUnroll/runtime-i128.ll | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index 612f699708814..650f055356c07 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -776,7 +776,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
       !isGuaranteedNotToBeUndefOrPoison(TripCount, AC, PreHeaderBR, DT)) {
     TripCount = B.CreateFreeze(TripCount);
     BECount =
-        B.CreateAdd(TripCount, ConstantInt::get(TripCount->getType(), -1));
+        B.CreateAdd(TripCount, Constant::getAllOnesValue(TripCount->getType()));
   } else {
     // If we don't need to freeze, use SCEVExpander for BECount as well, to
     // allow slightly better value reuse.
diff --git a/llvm/test/Transforms/LoopUnroll/runtime-i128.ll b/llvm/test/Transforms/LoopUnroll/runtime-i128.ll
index 50e09beddb37a..4cd8e7ca5d16f 100644
--- a/llvm/test/Transforms/LoopUnroll/runtime-i128.ll
+++ b/llvm/test/Transforms/LoopUnroll/runtime-i128.ll
@@ -8,7 +8,7 @@ define void @test(i128 %n, i128 %m) {
 ; CHECK-SAME: i128 [[N:%.*]], i128 [[M:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = freeze i128 [[N]]
-; CHECK-NEXT:    [[TMP1:%.*]] = add i128 [[TMP0]], 18446744073709551615
+; CHECK-NEXT:    [[TMP1:%.*]] = add i128 [[TMP0]], -1
 ; CHECK-NEXT:    [[XTRAITER:%.*]] = and i128 [[TMP0]], 7
 ; CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i128 [[TMP1]], 7
 ; CHECK-NEXT:    br i1 [[TMP2]], label [[EXIT_UNR_LCSSA:%.*]], label [[ENTRY_NEW:%.*]]

>From f39d51fe556c1b8c99862ea5d3d5d7aed4c0ca7c Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Thu, 1 Feb 2024 07:16:40 -0800
Subject: [PATCH 20/42] [BOLT] Add extra staleness logging (#80225)

Report two extra metrics:
- # of stale functions with matching block count,
- # of stale blocks with matching instruction count.
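
Both metrics are printed as a percentage of the corresponding stale total. A
minimal sketch of that calculation follows. The helper name `percentOf` is
hypothetical, and the guard for a zero total is an assumption of this sketch
rather than something the patch does.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only): percentage of `part` within
// `total`. Returns 0 when `total` is zero so the division is always defined.
float percentOf(std::uint64_t part, std::uint64_t total) {
  if (total == 0)
    return 0.0f;
  return static_cast<float>(part) / static_cast<float>(total) * 100.0f;
}
```

For example, 1 stale function with a matching block count out of 4 stale
functions would be reported as 25.0% of all stale functions.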
---
 bolt/include/bolt/Core/BinaryContext.h    |  5 +++++
 bolt/lib/Passes/BinaryPasses.cpp          | 16 ++++++++++++++++
 bolt/lib/Profile/StaleProfileMatching.cpp |  3 +++
 bolt/lib/Profile/YAMLProfileReader.cpp    | 20 ++++++++++----------
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/bolt/include/bolt/Core/BinaryContext.h b/bolt/include/bolt/Core/BinaryContext.h
index b6614917a27c3..f1db1fbded6a4 100644
--- a/bolt/include/bolt/Core/BinaryContext.h
+++ b/bolt/include/bolt/Core/BinaryContext.h
@@ -671,6 +671,11 @@ class BinaryContext {
     uint64_t StaleSampleCount{0};
     ///   the count of matched samples
     uint64_t MatchedSampleCount{0};
+    ///   the number of stale functions that have matching number of blocks in
+    ///   the profile
+    uint64_t NumStaleFuncsWithEqualBlockCount{0};
+    ///   the number of blocks that have matching size but a differing hash
+    uint64_t NumStaleBlocksWithEqualIcount{0};
   } Stats;
 
   // Address of the first allocated segment.
diff --git a/bolt/lib/Passes/BinaryPasses.cpp b/bolt/lib/Passes/BinaryPasses.cpp
index 955cd3726ad41..8505d37491415 100644
--- a/bolt/lib/Passes/BinaryPasses.cpp
+++ b/bolt/lib/Passes/BinaryPasses.cpp
@@ -1420,6 +1420,12 @@ void PrintProgramStats::runOnFunctions(BinaryContext &BC) {
   if (NumAllStaleFunctions) {
     const float PctStale =
         NumAllStaleFunctions / (float)NumAllProfiledFunctions * 100.0f;
+    const float PctStaleFuncsWithEqualBlockCount =
+        (float)BC.Stats.NumStaleFuncsWithEqualBlockCount /
+        NumAllStaleFunctions * 100.0f;
+    const float PctStaleBlocksWithEqualIcount =
+        (float)BC.Stats.NumStaleBlocksWithEqualIcount /
+        BC.Stats.NumStaleBlocks * 100.0f;
     auto printErrorOrWarning = [&]() {
       if (PctStale > opts::StaleThreshold)
         errs() << "BOLT-ERROR: ";
@@ -1442,6 +1448,16 @@ void PrintProgramStats::runOnFunctions(BinaryContext &BC) {
              << "%) belong to functions with invalid"
                 " (possibly stale) profile.\n";
     }
+    outs() << "BOLT-INFO: " << BC.Stats.NumStaleFuncsWithEqualBlockCount
+           << " stale function"
+           << (BC.Stats.NumStaleFuncsWithEqualBlockCount == 1 ? "" : "s")
+           << format(" (%.1f%% of all stale)", PctStaleFuncsWithEqualBlockCount)
+           << " have matching block count.\n";
+    outs() << "BOLT-INFO: " << BC.Stats.NumStaleBlocksWithEqualIcount
+           << " stale block"
+           << (BC.Stats.NumStaleBlocksWithEqualIcount == 1 ? "" : "s")
+           << format(" (%.1f%% of all stale)", PctStaleBlocksWithEqualIcount)
+           << " have matching icount.\n";
     if (PctStale > opts::StaleThreshold) {
       errs() << "BOLT-ERROR: stale functions exceed specified threshold of "
              << opts::StaleThreshold << "%. Exiting.\n";
diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 26180f1321477..631ccaec6ae61 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -418,6 +418,7 @@ void matchWeightsByHashes(BinaryContext &BC,
     if (MatchedBlock == nullptr && YamlBB.Index == 0)
       MatchedBlock = Blocks[0];
     if (MatchedBlock != nullptr) {
+      const BinaryBasicBlock *BB = BlockOrder[MatchedBlock->Index - 1];
       MatchedBlocks[YamlBB.Index] = MatchedBlock;
       BlendedBlockHash BinHash = BlendedHashes[MatchedBlock->Index - 1];
       LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBB.Index << ")"
@@ -433,6 +434,8 @@ void matchWeightsByHashes(BinaryContext &BC,
       } else {
         LLVM_DEBUG(dbgs() << "  loose match\n");
       }
+      if (YamlBB.NumInstructions == BB->size())
+        ++BC.Stats.NumStaleBlocksWithEqualIcount;
     } else {
       LLVM_DEBUG(
           dbgs() << "Couldn't match yaml block (bid = " << YamlBB.Index << ")"
diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp b/bolt/lib/Profile/YAMLProfileReader.cpp
index a4a401fd3cabf..e4673f6e3c301 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -246,20 +246,20 @@ bool YAMLProfileReader::parseFunctionProfile(
 
   ProfileMatched &= !MismatchedBlocks && !MismatchedCalls && !MismatchedEdges;
 
-  if (ProfileMatched)
-    BF.markProfiled(YamlBP.Header.Flags);
+  if (!ProfileMatched) {
+    if (opts::Verbosity >= 1)
+      errs() << "BOLT-WARNING: " << MismatchedBlocks << " blocks, "
+             << MismatchedCalls << " calls, and " << MismatchedEdges
+             << " edges in profile did not match function " << BF << '\n';
 
-  if (!ProfileMatched && opts::Verbosity >= 1)
-    errs() << "BOLT-WARNING: " << MismatchedBlocks << " blocks, "
-           << MismatchedCalls << " calls, and " << MismatchedEdges
-           << " edges in profile did not match function " << BF << '\n';
+    if (YamlBF.NumBasicBlocks != BF.size())
+      ++BC.Stats.NumStaleFuncsWithEqualBlockCount;
 
-  if (!ProfileMatched && opts::InferStaleProfile) {
-    if (inferStaleProfile(BF, YamlBF)) {
+    if (opts::InferStaleProfile && inferStaleProfile(BF, YamlBF))
       ProfileMatched = true;
-      BF.markProfiled(YamlBP.Header.Flags);
-    }
   }
+  if (ProfileMatched)
+    BF.markProfiled(YamlBP.Header.Flags);
 
   return ProfileMatched;
 }

>From b84e9cc72172acc72c00de7b4dc3e6661aba3294 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Thu, 1 Feb 2024 10:33:51 -0500
Subject: [PATCH 21/42] [HIP] fix HIP detection for /usr (#80190)

Skip checking the HIP version file under the parent directory when the
install path is /usr/local, since /usr will be checked after /usr/local.

Fixes: https://github.com/llvm/llvm-project/issues/78344
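
A rough sketch of the candidate selection, assuming the share and bin paths
are `<install>/share` and `<install>/bin` and the parent share path is
`<parent of install>/share`. The function names and the plain `std::string`
paths are hypothetical. The patch itself works with `SmallString` and the
driver's virtual file system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: directory part of a '/'-separated path.
std::string parentDir(const std::string &path) {
  std::size_t pos = path.rfind('/');
  return pos == std::string::npos ? std::string() : path.substr(0, pos);
}

// Hypothetical stand-in for the list of version files to probe, in order.
std::vector<std::string> versionFileCandidates(const std::string &sysRoot,
                                               const std::string &installPath) {
  std::vector<std::string> candidates;
  candidates.push_back(installPath + "/share/hip/version");
  // For <sysroot>/usr/local the parent is <sysroot>/usr, which is probed
  // later as an install path of its own, so it is skipped here.
  if (installPath != sysRoot + "/usr/local")
    candidates.push_back(parentDir(installPath) + "/share/hip/version");
  candidates.push_back(installPath + "/bin/.hipVersion");
  return candidates;
}
```

The patch keeps a fixed three-element list and leaves the skipped entry empty.
This sketch omits the entry instead, which yields the same probing order.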
---
 clang/lib/Driver/ToolChains/AMDGPU.cpp | 14 ++++++++++----
 clang/test/Driver/rocm-detect.hip      | 17 ++++++++++++++++-
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index b3c9d5908654f..4a35da6140b2a 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -486,10 +486,16 @@ void RocmInstallationDetector::detectHIPRuntime() {
       return newpath;
     };
     // If HIP version file can be found and parsed, use HIP version from there.
-    for (const auto &VersionFilePath :
-         {Append(SharePath, "hip", "version"),
-          Append(ParentSharePath, "hip", "version"),
-          Append(BinPath, ".hipVersion")}) {
+    std::vector<SmallString<0>> VersionFilePaths = {
+        Append(SharePath, "hip", "version"),
+        InstallPath != D.SysRoot + "/usr/local"
+            ? Append(ParentSharePath, "hip", "version")
+            : SmallString<0>(),
+        Append(BinPath, ".hipVersion")};
+
+    for (const auto &VersionFilePath : VersionFilePaths) {
+      if (VersionFilePath.empty())
+        continue;
       llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> VersionFile =
           FS.getBufferForFile(VersionFilePath);
       if (!VersionFile)
diff --git a/clang/test/Driver/rocm-detect.hip b/clang/test/Driver/rocm-detect.hip
index 3644f215a345b..0db994af556f3 100644
--- a/clang/test/Driver/rocm-detect.hip
+++ b/clang/test/Driver/rocm-detect.hip
@@ -77,8 +77,18 @@
 // RUN:   --hip-path=%t/myhip --print-rocm-search-dirs %s 2>&1 \
 // RUN:   | FileCheck -check-prefixes=ROCM-ENV,HIP-PATH %s
 
+// Test detecting /usr directory.
+// RUN: rm -rf %t/*
+// RUN: cp -r %S/Inputs/rocm %t/usr
+// RUN: mkdir -p %t/usr/share/hip
+// RUN: mv %t/usr/bin/.hipVersion %t/usr/share/hip/version
+// RUN: mkdir -p %t/usr/local
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx1010 --sysroot=%t \
+// RUN:   --print-rocm-search-dirs --hip-link %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=USR %s
+
 // Test detecting latest /opt/rocm-{release} directory.
-// RUN: rm -rf %t/opt
+// RUN: rm -rf %t/*
 // RUN: mkdir -p %t/opt
 // RUN: cp -r %S/Inputs/rocm %t/opt/rocm-3.9.0-1234
 // RUN: cp -r %S/Inputs/rocm %t/opt/rocm-3.10.0
@@ -130,6 +140,11 @@
 // ROCM-PATH: "-idirafter" "[[ROCM_PATH]]/include"
 // ROCM-PATH: "-L[[ROCM_PATH]]/lib" {{.*}}"-lamdhip64"
 
+// USR: ROCm installation search path: [[ROCM_PATH:.*/usr$]]
+// USR: "-mlink-builtin-bitcode" "[[ROCM_PATH]]/amdgcn/bitcode/oclc_isa_version_1010.bc"
+// USR: "-idirafter" "[[ROCM_PATH]]/include"
+// USR: "-L[[ROCM_PATH]]/lib" {{.*}}"-lamdhip64"
+
 // ROCM-REL: ROCm installation search path: {{.*}}/opt/rocm
 // ROCM-REL: ROCm installation search path: {{.*}}/opt/rocm-3.10.0
 

>From 27ca8c95fcb4b77dd15f6d7aaff5d2915f064442 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Thu, 1 Feb 2024 18:37:19 +0300
Subject: [PATCH 22/42] [mlir][scf] Add reductions support to `scf.parallel`
 fusion (#75955)

Properly handle fusion of loops with reductions:
* Check that there are no users of the first loop's results between the loops
* Create a new loop op with the merged reduction init values
* Update the `scf.reduce` op to contain the reductions from both loops
* Update the users of both loops to use the new loop's results
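
The bookkeeping behind the steps above can be sketched as follows, assuming
values are modelled as plain strings. `mergeInits` and `fusedIndexForSecond`
are hypothetical names used only for illustration. The patch itself operates
on MLIR `Value` ranges.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: the fused loop takes the first loop's init values
// followed by the second loop's init values.
std::vector<std::string> mergeInits(const std::vector<std::string> &first,
                                    const std::vector<std::string> &second) {
  std::vector<std::string> merged(first.begin(), first.end());
  merged.insert(merged.end(), second.begin(), second.end());
  return merged;
}

// Result i of the first loop stays at index i of the fused loop, while
// result i of the second loop moves behind all results of the first loop.
std::size_t fusedIndexForSecond(std::size_t numFirstResults, std::size_t i) {
  return numFirstResults + i;
}
```

With one reduction per loop, the first loop's result maps to result #0 of the
fused loop and the second loop's result maps to result #1, which is what the
`fuse_reductions_two` test checks.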
---
 .../SCF/Transforms/ParallelLoopFusion.cpp     |  74 +++++-
 .../Dialect/SCF/parallel-loop-fusion.mlir     | 240 +++++++++++++++++-
 2 files changed, 304 insertions(+), 10 deletions(-)

diff --git a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
index d3dca1427e517..5934d85373b03 100644
--- a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
@@ -161,29 +161,85 @@ static bool isFusionLegal(ParallelOp firstPloop, ParallelOp secondPloop,
 }
 
 /// Prepends operations of firstPloop's body into secondPloop's body.
-static void fuseIfLegal(ParallelOp firstPloop, ParallelOp secondPloop,
-                        OpBuilder b,
+/// Updates secondPloop with new loop.
+static void fuseIfLegal(ParallelOp firstPloop, ParallelOp &secondPloop,
+                        OpBuilder builder,
                         llvm::function_ref<bool(Value, Value)> mayAlias) {
+  Block *block1 = firstPloop.getBody();
+  Block *block2 = secondPloop.getBody();
   IRMapping firstToSecondPloopIndices;
-  firstToSecondPloopIndices.map(firstPloop.getBody()->getArguments(),
-                                secondPloop.getBody()->getArguments());
+  firstToSecondPloopIndices.map(block1->getArguments(), block2->getArguments());
 
   if (!isFusionLegal(firstPloop, secondPloop, firstToSecondPloopIndices,
                      mayAlias))
     return;
 
-  b.setInsertionPointToStart(secondPloop.getBody());
-  for (auto &op : firstPloop.getBody()->without_terminator())
-    b.clone(op, firstToSecondPloopIndices);
+  DominanceInfo dom;
+  // We are fusing first loop into second, make sure there are no users of the
+  // first loop results between loops.
+  for (Operation *user : firstPloop->getUsers())
+    if (!dom.properlyDominates(secondPloop, user, /*enclosingOpOk*/ false))
+      return;
+
+  ValueRange inits1 = firstPloop.getInitVals();
+  ValueRange inits2 = secondPloop.getInitVals();
+
+  SmallVector<Value> newInitVars(inits1.begin(), inits1.end());
+  newInitVars.append(inits2.begin(), inits2.end());
+
+  IRRewriter b(builder);
+  b.setInsertionPoint(secondPloop);
+  auto newSecondPloop = b.create<ParallelOp>(
+      secondPloop.getLoc(), secondPloop.getLowerBound(),
+      secondPloop.getUpperBound(), secondPloop.getStep(), newInitVars);
+
+  Block *newBlock = newSecondPloop.getBody();
+  auto term1 = cast<ReduceOp>(block1->getTerminator());
+  auto term2 = cast<ReduceOp>(block2->getTerminator());
+
+  b.inlineBlockBefore(block2, newBlock, newBlock->begin(),
+                      newBlock->getArguments());
+  b.inlineBlockBefore(block1, newBlock, newBlock->begin(),
+                      newBlock->getArguments());
+
+  ValueRange results = newSecondPloop.getResults();
+  if (!results.empty()) {
+    b.setInsertionPointToEnd(newBlock);
+
+    ValueRange reduceArgs1 = term1.getOperands();
+    ValueRange reduceArgs2 = term2.getOperands();
+    SmallVector<Value> newReduceArgs(reduceArgs1.begin(), reduceArgs1.end());
+    newReduceArgs.append(reduceArgs2.begin(), reduceArgs2.end());
+
+    auto newReduceOp = b.create<scf::ReduceOp>(term2.getLoc(), newReduceArgs);
+
+    for (auto &&[i, reg] : llvm::enumerate(llvm::concat<Region>(
+             term1.getReductions(), term2.getReductions()))) {
+      Block &oldRedBlock = reg.front();
+      Block &newRedBlock = newReduceOp.getReductions()[i].front();
+      b.inlineBlockBefore(&oldRedBlock, &newRedBlock, newRedBlock.begin(),
+                          newRedBlock.getArguments());
+    }
+
+    firstPloop.replaceAllUsesWith(results.take_front(inits1.size()));
+    secondPloop.replaceAllUsesWith(results.take_back(inits2.size()));
+  }
+  term1->erase();
+  term2->erase();
   firstPloop.erase();
+  secondPloop.erase();
+  secondPloop = newSecondPloop;
 }
 
 void mlir::scf::naivelyFuseParallelOps(
     Region &region, llvm::function_ref<bool(Value, Value)> mayAlias) {
   OpBuilder b(region);
   // Consider every single block and attempt to fuse adjacent loops.
+  SmallVector<SmallVector<ParallelOp>, 1> ploopChains;
   for (auto &block : region) {
-    SmallVector<SmallVector<ParallelOp, 8>, 1> ploopChains{{}};
+    ploopChains.clear();
+    ploopChains.push_back({});
+
     // Not using `walk()` to traverse only top-level parallel loops and also
     // make sure that there are no side-effecting ops between the parallel
     // loops.
@@ -201,7 +257,7 @@ void mlir::scf::naivelyFuseParallelOps(
       // TODO: Handle region side effects properly.
       noSideEffects &= isMemoryEffectFree(&op) && op.getNumRegions() == 0;
     }
-    for (ArrayRef<ParallelOp> ploops : ploopChains) {
+    for (MutableArrayRef<ParallelOp> ploops : ploopChains) {
       for (int i = 0, e = ploops.size(); i + 1 < e; ++i)
         fuseIfLegal(ploops[i], ploops[i + 1], b, mayAlias);
     }
diff --git a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
index 9c136bb635658..0d4ea6f20e8d9 100644
--- a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
+++ b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
@@ -24,6 +24,32 @@ func.func @fuse_empty_loops() {
 
 // -----
 
+func.func @fuse_ops_between(%A: f32, %B: f32) -> f32 {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    scf.reduce
+  }
+  %res = arith.addf %A, %B : f32
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
+    scf.reduce
+  }
+  return %res : f32
+}
+// CHECK-LABEL: func @fuse_ops_between
+// CHECK-DAG:    [[C0:%.*]] = arith.constant 0 : index
+// CHECK-DAG:    [[C1:%.*]] = arith.constant 1 : index
+// CHECK-DAG:    [[C2:%.*]] = arith.constant 2 : index
+// CHECK:        %{{.*}} = arith.addf %{{.*}}, %{{.*}} : f32
+// CHECK:        scf.parallel ([[I:%.*]], [[J:%.*]]) = ([[C0]], [[C0]])
+// CHECK-SAME:       to ([[C2]], [[C2]]) step ([[C1]], [[C1]]) {
+// CHECK:          scf.reduce
+// CHECK:        }
+// CHECK-NOT:    scf.parallel
+
+// -----
+
 func.func @fuse_two(%A: memref<2x2xf32>, %B: memref<2x2xf32>) {
   %c2 = arith.constant 2 : index
   %c0 = arith.constant 0 : index
@@ -89,7 +115,7 @@ func.func @fuse_three(%A: memref<2x2xf32>, %B: memref<2x2xf32>) {
     memref.store %product_elem, %prod[%i, %j] : memref<2x2xf32>
     scf.reduce
   }
-  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) { 
+  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
     %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
     %res_elem = arith.addf %A_elem, %c2fp : f32
     memref.store %res_elem, %B[%i, %j] : memref<2x2xf32>
@@ -575,3 +601,215 @@ func.func @do_not_fuse_affine_apply_to_non_ind_var(
 // CHECK-NEXT:    }
 // CHECK-NEXT:    memref.dealloc %[[ALLOC]] : memref<2x3xf32>
 // CHECK-NEXT:    return
+
+// -----
+
+func.func @fuse_reductions_two(%A: memref<2x2xf32>, %B: memref<2x2xf32>) -> (f32, f32) {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %init1 = arith.constant 1.0 : f32
+  %init2 = arith.constant 2.0 : f32
+  %res1 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init1) -> f32 {
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res2 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init2) -> f32 {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    scf.reduce(%B_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.mulf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  return %res1, %res2 : f32, f32
+}
+
+// CHECK-LABEL: func @fuse_reductions_two
+//  CHECK-SAME:  (%[[A:.*]]: memref<2x2xf32>, %[[B:.*]]: memref<2x2xf32>) -> (f32, f32)
+//   CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
+//   CHECK-DAG:   %[[C1:.*]] = arith.constant 1 : index
+//   CHECK-DAG:   %[[C2:.*]] = arith.constant 2 : index
+//   CHECK-DAG:   %[[INIT1:.*]] = arith.constant 1.000000e+00 : f32
+//   CHECK-DAG:   %[[INIT2:.*]] = arith.constant 2.000000e+00 : f32
+//       CHECK:   %[[RES:.*]]:2 = scf.parallel (%[[I:.*]], %[[J:.*]]) = (%[[C0]], %[[C0]])
+//  CHECK-SAME:   to (%[[C2]], %[[C2]]) step (%[[C1]], %[[C1]])
+//  CHECK-SAME:   init (%[[INIT1]], %[[INIT2]]) -> (f32, f32)
+//       CHECK:   %[[VAL_A:.*]] = memref.load %[[A]][%[[I]], %[[J]]]
+//       CHECK:   %[[VAL_B:.*]] = memref.load %[[B]][%[[I]], %[[J]]]
+//       CHECK:   scf.reduce(%[[VAL_A]], %[[VAL_B]] : f32, f32) {
+//       CHECK:   ^bb0(%[[LHS:.*]]: f32, %[[RHS:.*]]: f32):
+//       CHECK:     %[[R:.*]] = arith.addf %[[LHS]], %[[RHS]] : f32
+//       CHECK:     scf.reduce.return %[[R]] : f32
+//       CHECK:   }
+//       CHECK:   ^bb0(%[[LHS:.*]]: f32, %[[RHS:.*]]: f32):
+//       CHECK:     %[[R:.*]] = arith.mulf %[[LHS]], %[[RHS]] : f32
+//       CHECK:     scf.reduce.return %[[R]] : f32
+//       CHECK:   }
+//       CHECK:   return %[[RES]]#0, %[[RES]]#1 : f32, f32
+
+// -----
+
+func.func @fuse_reductions_three(%A: memref<2x2xf32>, %B: memref<2x2xf32>, %C: memref<2x2xf32>) -> (f32, f32, f32) {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %init1 = arith.constant 1.0 : f32
+  %init2 = arith.constant 2.0 : f32
+  %init3 = arith.constant 3.0 : f32
+  %res1 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init1) -> f32 {
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res2 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init2) -> f32 {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    scf.reduce(%B_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.mulf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res3 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init3) -> f32 {
+    %A_elem = memref.load %C[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  return %res1, %res2, %res3 : f32, f32, f32
+}
+
+// CHECK-LABEL: func @fuse_reductions_three
+//  CHECK-SAME:  (%[[A:.*]]: memref<2x2xf32>, %[[B:.*]]: memref<2x2xf32>, %[[C:.*]]: memref<2x2xf32>) -> (f32, f32, f32)
+//   CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
+//   CHECK-DAG:   %[[C1:.*]] = arith.constant 1 : index
+//   CHECK-DAG:   %[[C2:.*]] = arith.constant 2 : index
+//   CHECK-DAG:   %[[INIT1:.*]] = arith.constant 1.000000e+00 : f32
+//   CHECK-DAG:   %[[INIT2:.*]] = arith.constant 2.000000e+00 : f32
+//   CHECK-DAG:   %[[INIT3:.*]] = arith.constant 3.000000e+00 : f32
+//       CHECK:   %[[RES:.*]]:3 = scf.parallel (%[[I:.*]], %[[J:.*]]) = (%[[C0]], %[[C0]])
+//  CHECK-SAME:   to (%[[C2]], %[[C2]]) step (%[[C1]], %[[C1]])
+//  CHECK-SAME:   init (%[[INIT1]], %[[INIT2]], %[[INIT3]]) -> (f32, f32, f32)
+//       CHECK:   %[[VAL_A:.*]] = memref.load %[[A]][%[[I]], %[[J]]]
+//       CHECK:   %[[VAL_B:.*]] = memref.load %[[B]][%[[I]], %[[J]]]
+//       CHECK:   %[[VAL_C:.*]] = memref.load %[[C]][%[[I]], %[[J]]]
+//       CHECK:   scf.reduce(%[[VAL_A]], %[[VAL_B]], %[[VAL_C]] : f32, f32, f32) {
+//       CHECK:   ^bb0(%[[LHS:.*]]: f32, %[[RHS:.*]]: f32):
+//       CHECK:     %[[R:.*]] = arith.addf %[[LHS]], %[[RHS]] : f32
+//       CHECK:     scf.reduce.return %[[R]] : f32
+//       CHECK:   }
+//       CHECK:   ^bb0(%[[LHS:.*]]: f32, %[[RHS:.*]]: f32):
+//       CHECK:     %[[R:.*]] = arith.mulf %[[LHS]], %[[RHS]] : f32
+//       CHECK:     scf.reduce.return %[[R]] : f32
+//       CHECK:   }
+//       CHECK:   ^bb0(%[[LHS:.*]]: f32, %[[RHS:.*]]: f32):
+//       CHECK:     %[[R:.*]] = arith.addf %[[LHS]], %[[RHS]] : f32
+//       CHECK:     scf.reduce.return %[[R]] : f32
+//       CHECK:   }
+//       CHECK:   return %[[RES]]#0, %[[RES]]#1, %[[RES]]#2 : f32, f32, f32
+
+// -----
+
+func.func @reductions_use_res(%A: memref<2x2xf32>, %B: memref<2x2xf32>) -> (f32, f32) {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %init1 = arith.constant 1.0 : f32
+  %res1 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init1) -> f32 {
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res2 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%res1) -> f32 {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    scf.reduce(%B_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.mulf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  return %res1, %res2 : f32, f32
+}
+
+// %res1 is used as second scf.parallel arg, cannot fuse
+// CHECK-LABEL: func @reductions_use_res
+// CHECK:      scf.parallel
+// CHECK:      scf.parallel
+
+// -----
+
+func.func @reductions_use_res_inside(%A: memref<2x2xf32>, %B: memref<2x2xf32>) -> (f32, f32) {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %init1 = arith.constant 1.0 : f32
+  %init2 = arith.constant 2.0 : f32
+  %res1 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init1) -> f32 {
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res2 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init2) -> f32 {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    %sum = arith.addf %B_elem, %res1 : f32
+    scf.reduce(%sum : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.mulf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  return %res1, %res2 : f32, f32
+}
+
+// %res1 is used inside second scf.parallel, cannot fuse
+// CHECK-LABEL: func @reductions_use_res_inside
+// CHECK:      scf.parallel
+// CHECK:      scf.parallel
+
+// -----
+
+func.func @reductions_use_res_between(%A: memref<2x2xf32>, %B: memref<2x2xf32>) -> (f32, f32, f32) {
+  %c2 = arith.constant 2 : index
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %init1 = arith.constant 1.0 : f32
+  %init2 = arith.constant 2.0 : f32
+  %res1 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init1) -> f32 {
+    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
+    scf.reduce(%A_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.addf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  %res3 = arith.addf %res1, %init2 : f32
+  %res2 = scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) init(%init2) -> f32 {
+    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
+    scf.reduce(%B_elem : f32) {
+    ^bb0(%lhs: f32, %rhs: f32):
+      %1 = arith.mulf %lhs, %rhs : f32
+      scf.reduce.return %1 : f32
+    }
+  }
+  return %res1, %res2, %res3 : f32, f32, f32
+}
+
+// instruction in between the loops uses the first loop result
+// CHECK-LABEL: func @reductions_use_res_between
+// CHECK:      scf.parallel
+// CHECK:      scf.parallel

>From a6be317f7a30f8db56af2fba993771c11701e613 Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Thu, 1 Feb 2024 07:42:19 -0800
Subject: [PATCH 23/42] [LSR] Add tests for restricting term-fold budget based
 on exact trip count

---
 .../LoopStrengthReduce/lsr-term-fold.ll       | 129 ++++++++++++++++--
 1 file changed, 121 insertions(+), 8 deletions(-)

diff --git a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
index c6ffca5f145e4..be24ecf112e30 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
@@ -474,8 +474,11 @@ for.end:                                          ; preds = %for.body
   ret void
 }
 
-define void @expensive_expand_short_tc(ptr %a, i32 %offset, i32 %n) {
-; CHECK-LABEL: @expensive_expand_short_tc(
+;; The next set of tests exercises various cases with the expansion
+;; budget and different trip counts or estimated trip counts.
+
+define void @profiled_short_tc(ptr %a, i32 %offset, i32 %n) {
+; CHECK-LABEL: @profiled_short_tc(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
@@ -514,8 +517,8 @@ for.end:                                          ; preds = %for.body
   ret void
 }
 
-define void @expensive_expand_long_tc(ptr %a, i32 %offset, i32 %n) {
-; CHECK-LABEL: @expensive_expand_long_tc(
+define void @profiled_long_tc(ptr %a, i32 %offset, i32 %n) {
+; CHECK-LABEL: @profiled_long_tc(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
@@ -554,8 +557,8 @@ for.end:                                          ; preds = %for.body
   ret void
 }
 
-define void @expensive_expand_unknown_tc(ptr %a, i32 %offset, i32 %n) {
-; CHECK-LABEL: @expensive_expand_unknown_tc(
+define void @unknown_tc(ptr %a, i32 %offset, i32 %n) {
+; CHECK-LABEL: @unknown_tc(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
@@ -594,8 +597,8 @@ for.end:                                          ; preds = %for.body
   ret void
 }
 
-define void @expensive_expand_unknown_tc2(ptr %a, i32 %offset, i32 %n, i32 %step) mustprogress {
-; CHECK-LABEL: @expensive_expand_unknown_tc2(
+define void @unknown_tc2(ptr %a, i32 %offset, i32 %n, i32 %step) mustprogress {
+; CHECK-LABEL: @unknown_tc2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
@@ -628,3 +631,113 @@ for.body:                                         ; preds = %for.body, %entry
 for.end:                                          ; preds = %for.body
   ret void
 }
+
+define void @small_tc_trivial_loop(ptr %a, i32 %offset) {
+; CHECK-LABEL: @small_tc_trivial_loop(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
+; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
+; CHECK-NEXT:    [[TMP0:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i64 [[TMP0]], 84
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
+; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
+; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK:       for.end:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %offset.nonzero = or i32 %offset, 1
+  %uglygep = getelementptr i8, ptr %a, i64 84
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
+  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
+  store i32 1, ptr %lsr.iv1, align 4
+  %lsr.iv.next = add nsw i32 %lsr.iv, 1
+  %uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset.nonzero
+  %exitcond.not = icmp eq i32 %lsr.iv.next, 1
+  br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end:                                          ; preds = %for.body
+  ret void
+}
+
+define void @small_tc_below_threshold(ptr %a, i32 %offset) {
+; CHECK-LABEL: @small_tc_below_threshold(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
+; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
+; CHECK-NEXT:    [[TMP0:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nsw i64 [[TMP0]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = add nsw i64 [[TMP1]], 84
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
+; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
+; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK:       for.end:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %offset.nonzero = or i32 %offset, 1
+  %uglygep = getelementptr i8, ptr %a, i64 84
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
+  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
+  store i32 1, ptr %lsr.iv1, align 4
+  %lsr.iv.next = add nsw i32 %lsr.iv, 1
+  %uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset.nonzero
+  %exitcond.not = icmp eq i32 %lsr.iv.next, 2
+  br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end:                                          ; preds = %for.body
+  ret void
+}
+
+define void @small_tc_above_threshold(ptr %a, i32 %offset) {
+; CHECK-LABEL: @small_tc_above_threshold(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
+; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
+; CHECK-NEXT:    [[TMP0:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = mul nsw i64 [[TMP0]], 10
+; CHECK-NEXT:    [[TMP2:%.*]] = add nsw i64 [[TMP1]], 84
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
+; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
+; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK:       for.end:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %offset.nonzero = or i32 %offset, 1
+  %uglygep = getelementptr i8, ptr %a, i64 84
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
+  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
+  store i32 1, ptr %lsr.iv1, align 4
+  %lsr.iv.next = add nsw i32 %lsr.iv, 1
+  %uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset.nonzero
+  %exitcond.not = icmp eq i32 %lsr.iv.next, 10
+  br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end:                                          ; preds = %for.body
+  ret void
+}

>From 02fe348847841c283c905c53552579f6404dae2d Mon Sep 17 00:00:00 2001
From: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: Thu, 1 Feb 2024 16:04:58 +0000
Subject: [PATCH 24/42] [X86] X86FixupVectorConstants.cpp - refactor constant
 search loop to take array of sorted candidates

Pulled out of #79815. Refactors the internal FixupConstant logic to accept an array of vzload/broadcast candidates pre-sorted in ascending order of constant pool size.
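
As a rough illustration of the "first match in a size-sorted table" idea, here is a minimal standalone sketch. It does not use the LLVM types: `Candidate`, `fitsInBits`, `pickSmallestCandidate`, `exampleTable`, and the opcode numbers 101/102/103 are all hypothetical stand-ins, and the sortedness check here is a plain `assert` rather than one guarded by `EXPENSIVE_CHECKS`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one fixup table entry (not the LLVM struct).
struct Candidate {
  int Op;       // 0 means "opcode not available on this subtarget"
  int NumElts;  // number of constant elements loaded
  int BitWidth; // width of each element in bits
  std::function<bool(std::uint64_t, int)> CanRebuild;
};

// Toy rebuild predicate: the value must fit in BitWidth bits.
inline bool fitsInBits(std::uint64_t Value, int BitWidth) {
  return BitWidth >= 64 || (Value >> BitWidth) == 0;
}

// Walk the table in order and return the first usable opcode, or 0.
// Because the table is sorted by ascending constant size, the first
// match is also the smallest constant pool entry.
inline int pickSmallestCandidate(const std::vector<Candidate> &Table,
                                 std::uint64_t Value) {
  assert(std::is_sorted(Table.begin(), Table.end(),
                        [](const Candidate &A, const Candidate &B) {
                          return A.NumElts * A.BitWidth <
                                 B.NumElts * B.BitWidth;
                        }) &&
         "candidate table not sorted in ascending constant size");
  for (const Candidate &C : Table) {
    if (!C.Op)
      continue; // skip entries disabled by a missing feature
    if (C.CanRebuild(Value, C.BitWidth))
      return C.Op;
  }
  return 0;
}

// Hypothetical table; the 8-bit entry depends on a feature flag.
inline std::vector<Candidate> exampleTable(bool HasByteOp) {
  return {{HasByteOp ? 101 : 0, 1, 8, fitsInBits},
          {102, 1, 32, fitsInBits},
          {103, 1, 64, fitsInBits}};
}
```

With this layout each call site only lists the candidates that apply to it, instead of passing a fixed positional list padded with zeros.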
---
 .../Target/X86/X86FixupVectorConstants.cpp    | 208 +++++++++++-------
 1 file changed, 125 insertions(+), 83 deletions(-)

diff --git a/llvm/lib/Target/X86/X86FixupVectorConstants.cpp b/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
index 037a745d632fb..be3c4f0b1564c 100644
--- a/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
+++ b/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
@@ -216,8 +216,8 @@ static Constant *rebuildConstant(LLVMContext &Ctx, Type *SclTy,
 
 // Attempt to rebuild a normalized splat vector constant of the requested splat
 // width, built up of potentially smaller scalar values.
-static Constant *rebuildSplatableConstant(const Constant *C,
-                                          unsigned SplatBitWidth) {
+static Constant *rebuildSplatCst(const Constant *C, unsigned /*NumElts*/,
+                                 unsigned SplatBitWidth) {
   std::optional<APInt> Splat = getSplatableConstant(C, SplatBitWidth);
   if (!Splat)
     return nullptr;
@@ -238,8 +238,8 @@ static Constant *rebuildSplatableConstant(const Constant *C,
   return rebuildConstant(OriginalType->getContext(), SclTy, *Splat, NumSclBits);
 }
 
-static Constant *rebuildZeroUpperConstant(const Constant *C,
-                                          unsigned ScalarBitWidth) {
+static Constant *rebuildZeroUpperCst(const Constant *C, unsigned /*NumElts*/,
+                                     unsigned ScalarBitWidth) {
   Type *Ty = C->getType();
   Type *SclTy = Ty->getScalarType();
   unsigned NumBits = Ty->getPrimitiveSizeInBits();
@@ -265,8 +265,6 @@ static Constant *rebuildZeroUpperConstant(const Constant *C,
   return nullptr;
 }
 
-typedef std::function<Constant *(const Constant *, unsigned)> RebuildFn;
-
 bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
                                                      MachineBasicBlock &MBB,
                                                      MachineInstr &MI) {
@@ -277,43 +275,42 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
   bool HasBWI = ST->hasBWI();
   bool HasVLX = ST->hasVLX();
 
-  auto FixupConstant =
-      [&](unsigned OpBcst256, unsigned OpBcst128, unsigned OpBcst64,
-          unsigned OpBcst32, unsigned OpBcst16, unsigned OpBcst8,
-          unsigned OpUpper64, unsigned OpUpper32, unsigned OperandNo) {
-        assert(MI.getNumOperands() >= (OperandNo + X86::AddrNumOperands) &&
-               "Unexpected number of operands!");
-
-        if (auto *C = X86::getConstantFromPool(MI, OperandNo)) {
-          // Attempt to detect a suitable splat/vzload from increasing constant
-          // bitwidths.
-          // Prefer vzload vs broadcast for same bitwidth to avoid domain flips.
-          std::tuple<unsigned, unsigned, RebuildFn> FixupLoad[] = {
-              {8, OpBcst8, rebuildSplatableConstant},
-              {16, OpBcst16, rebuildSplatableConstant},
-              {32, OpUpper32, rebuildZeroUpperConstant},
-              {32, OpBcst32, rebuildSplatableConstant},
-              {64, OpUpper64, rebuildZeroUpperConstant},
-              {64, OpBcst64, rebuildSplatableConstant},
-              {128, OpBcst128, rebuildSplatableConstant},
-              {256, OpBcst256, rebuildSplatableConstant},
-          };
-          for (auto [BitWidth, Op, RebuildConstant] : FixupLoad) {
-            if (Op) {
-              // Construct a suitable constant and adjust the MI to use the new
-              // constant pool entry.
-              if (Constant *NewCst = RebuildConstant(C, BitWidth)) {
-                unsigned NewCPI =
-                    CP->getConstantPoolIndex(NewCst, Align(BitWidth / 8));
-                MI.setDesc(TII->get(Op));
-                MI.getOperand(OperandNo + X86::AddrDisp).setIndex(NewCPI);
-                return true;
-              }
-            }
+  struct FixupEntry {
+    int Op;
+    int NumCstElts;
+    int BitWidth;
+    std::function<Constant *(const Constant *, unsigned, unsigned)>
+        RebuildConstant;
+  };
+  auto FixupConstant = [&](ArrayRef<FixupEntry> Fixups, unsigned OperandNo) {
+#ifdef EXPENSIVE_CHECKS
+    assert(llvm::is_sorted(Fixups,
+                           [](const FixupEntry &A, const FixupEntry &B) {
+                             return (A.NumCstElts * A.BitWidth) <
+                                    (B.NumCstElts * B.BitWidth);
+                           }) &&
+           "Constant fixup table not sorted in ascending constant size");
+#endif
+    assert(MI.getNumOperands() >= (OperandNo + X86::AddrNumOperands) &&
+           "Unexpected number of operands!");
+    if (auto *C = X86::getConstantFromPool(MI, OperandNo)) {
+      for (const FixupEntry &Fixup : Fixups) {
+        if (Fixup.Op) {
+          // Construct a suitable constant and adjust the MI to use the new
+          // constant pool entry.
+          if (Constant *NewCst =
+                  Fixup.RebuildConstant(C, Fixup.NumCstElts, Fixup.BitWidth)) {
+            unsigned NewCPI =
+                CP->getConstantPoolIndex(NewCst, Align(Fixup.BitWidth / 8));
+            MI.setDesc(TII->get(Fixup.Op));
+            MI.getOperand(OperandNo + X86::AddrDisp).setIndex(NewCPI);
+            return true;
           }
         }
-        return false;
-      };
+      }
+    }
+    return false;
+  };
 
   // Attempt to convert full width vector loads into broadcast/vzload loads.
   switch (Opc) {
@@ -323,82 +320,125 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
   case X86::MOVUPDrm:
   case X86::MOVUPSrm:
     // TODO: SSE3 MOVDDUP Handling
-    return FixupConstant(0, 0, 0, 0, 0, 0, X86::MOVSDrm, X86::MOVSSrm, 1);
+    return FixupConstant({{X86::MOVSSrm, 1, 32, rebuildZeroUpperCst},
+                          {X86::MOVSDrm, 1, 64, rebuildZeroUpperCst}},
+                         1);
   case X86::VMOVAPDrm:
   case X86::VMOVAPSrm:
   case X86::VMOVUPDrm:
   case X86::VMOVUPSrm:
-    return FixupConstant(0, 0, X86::VMOVDDUPrm, X86::VBROADCASTSSrm, 0, 0,
-                         X86::VMOVSDrm, X86::VMOVSSrm, 1);
+    return FixupConstant({{X86::VMOVSSrm, 1, 32, rebuildZeroUpperCst},
+                          {X86::VBROADCASTSSrm, 1, 32, rebuildSplatCst},
+                          {X86::VMOVSDrm, 1, 64, rebuildZeroUpperCst},
+                          {X86::VMOVDDUPrm, 1, 64, rebuildSplatCst}},
+                         1);
   case X86::VMOVAPDYrm:
   case X86::VMOVAPSYrm:
   case X86::VMOVUPDYrm:
   case X86::VMOVUPSYrm:
-    return FixupConstant(0, X86::VBROADCASTF128rm, X86::VBROADCASTSDYrm,
-                         X86::VBROADCASTSSYrm, 0, 0, 0, 0, 1);
+    return FixupConstant({{X86::VBROADCASTSSYrm, 1, 32, rebuildSplatCst},
+                          {X86::VBROADCASTSDYrm, 1, 64, rebuildSplatCst},
+                          {X86::VBROADCASTF128rm, 1, 128, rebuildSplatCst}},
+                         1);
   case X86::VMOVAPDZ128rm:
   case X86::VMOVAPSZ128rm:
   case X86::VMOVUPDZ128rm:
   case X86::VMOVUPSZ128rm:
-    return FixupConstant(0, 0, X86::VMOVDDUPZ128rm, X86::VBROADCASTSSZ128rm, 0,
-                         0, X86::VMOVSDZrm, X86::VMOVSSZrm, 1);
+    return FixupConstant({{X86::VMOVSSZrm, 1, 32, rebuildZeroUpperCst},
+                          {X86::VBROADCASTSSZ128rm, 1, 32, rebuildSplatCst},
+                          {X86::VMOVSDZrm, 1, 64, rebuildZeroUpperCst},
+                          {X86::VMOVDDUPZ128rm, 1, 64, rebuildSplatCst}},
+                         1);
   case X86::VMOVAPDZ256rm:
   case X86::VMOVAPSZ256rm:
   case X86::VMOVUPDZ256rm:
   case X86::VMOVUPSZ256rm:
-    return FixupConstant(0, X86::VBROADCASTF32X4Z256rm, X86::VBROADCASTSDZ256rm,
-                         X86::VBROADCASTSSZ256rm, 0, 0, 0, 0, 1);
+    return FixupConstant(
+        {{X86::VBROADCASTSSZ256rm, 1, 32, rebuildSplatCst},
+         {X86::VBROADCASTSDZ256rm, 1, 64, rebuildSplatCst},
+         {X86::VBROADCASTF32X4Z256rm, 1, 128, rebuildSplatCst}},
+        1);
   case X86::VMOVAPDZrm:
   case X86::VMOVAPSZrm:
   case X86::VMOVUPDZrm:
   case X86::VMOVUPSZrm:
-    return FixupConstant(X86::VBROADCASTF64X4rm, X86::VBROADCASTF32X4rm,
-                         X86::VBROADCASTSDZrm, X86::VBROADCASTSSZrm, 0, 0, 0, 0,
+    return FixupConstant({{X86::VBROADCASTSSZrm, 1, 32, rebuildSplatCst},
+                          {X86::VBROADCASTSDZrm, 1, 64, rebuildSplatCst},
+                          {X86::VBROADCASTF32X4rm, 1, 128, rebuildSplatCst},
+                          {X86::VBROADCASTF64X4rm, 1, 256, rebuildSplatCst}},
                          1);
     /* Integer Loads */
   case X86::MOVDQArm:
-  case X86::MOVDQUrm:
-    return FixupConstant(0, 0, 0, 0, 0, 0, X86::MOVQI2PQIrm, X86::MOVDI2PDIrm,
+  case X86::MOVDQUrm: {
+    return FixupConstant({{X86::MOVDI2PDIrm, 1, 32, rebuildZeroUpperCst},
+                          {X86::MOVQI2PQIrm, 1, 64, rebuildZeroUpperCst}},
                          1);
+  }
   case X86::VMOVDQArm:
-  case X86::VMOVDQUrm:
-    return FixupConstant(0, 0, HasAVX2 ? X86::VPBROADCASTQrm : X86::VMOVDDUPrm,
-                         HasAVX2 ? X86::VPBROADCASTDrm : X86::VBROADCASTSSrm,
-                         HasAVX2 ? X86::VPBROADCASTWrm : 0,
-                         HasAVX2 ? X86::VPBROADCASTBrm : 0, X86::VMOVQI2PQIrm,
-                         X86::VMOVDI2PDIrm, 1);
+  case X86::VMOVDQUrm: {
+    FixupEntry Fixups[] = {
+        {HasAVX2 ? X86::VPBROADCASTBrm : 0, 1, 8, rebuildSplatCst},
+        {HasAVX2 ? X86::VPBROADCASTWrm : 0, 1, 16, rebuildSplatCst},
+        {X86::VMOVDI2PDIrm, 1, 32, rebuildZeroUpperCst},
+        {HasAVX2 ? X86::VPBROADCASTDrm : X86::VBROADCASTSSrm, 1, 32,
+         rebuildSplatCst},
+        {X86::VMOVQI2PQIrm, 1, 64, rebuildZeroUpperCst},
+        {HasAVX2 ? X86::VPBROADCASTQrm : X86::VMOVDDUPrm, 1, 64,
+         rebuildSplatCst},
+    };
+    return FixupConstant(Fixups, 1);
+  }
   case X86::VMOVDQAYrm:
-  case X86::VMOVDQUYrm:
-    return FixupConstant(
-        0, HasAVX2 ? X86::VBROADCASTI128rm : X86::VBROADCASTF128rm,
-        HasAVX2 ? X86::VPBROADCASTQYrm : X86::VBROADCASTSDYrm,
-        HasAVX2 ? X86::VPBROADCASTDYrm : X86::VBROADCASTSSYrm,
-        HasAVX2 ? X86::VPBROADCASTWYrm : 0, HasAVX2 ? X86::VPBROADCASTBYrm : 0,
-        0, 0, 1);
+  case X86::VMOVDQUYrm: {
+    FixupEntry Fixups[] = {
+        {HasAVX2 ? X86::VPBROADCASTBYrm : 0, 1, 8, rebuildSplatCst},
+        {HasAVX2 ? X86::VPBROADCASTWYrm : 0, 1, 16, rebuildSplatCst},
+        {HasAVX2 ? X86::VPBROADCASTDYrm : X86::VBROADCASTSSYrm, 1, 32,
+         rebuildSplatCst},
+        {HasAVX2 ? X86::VPBROADCASTQYrm : X86::VBROADCASTSDYrm, 1, 64,
+         rebuildSplatCst},
+        {HasAVX2 ? X86::VBROADCASTI128rm : X86::VBROADCASTF128rm, 1, 128,
+         rebuildSplatCst}};
+    return FixupConstant(Fixups, 1);
+  }
   case X86::VMOVDQA32Z128rm:
   case X86::VMOVDQA64Z128rm:
   case X86::VMOVDQU32Z128rm:
-  case X86::VMOVDQU64Z128rm:
-    return FixupConstant(0, 0, X86::VPBROADCASTQZ128rm, X86::VPBROADCASTDZ128rm,
-                         HasBWI ? X86::VPBROADCASTWZ128rm : 0,
-                         HasBWI ? X86::VPBROADCASTBZ128rm : 0,
-                         X86::VMOVQI2PQIZrm, X86::VMOVDI2PDIZrm, 1);
+  case X86::VMOVDQU64Z128rm: {
+    FixupEntry Fixups[] = {
+        {HasBWI ? X86::VPBROADCASTBZ128rm : 0, 1, 8, rebuildSplatCst},
+        {HasBWI ? X86::VPBROADCASTWZ128rm : 0, 1, 16, rebuildSplatCst},
+        {X86::VMOVDI2PDIZrm, 1, 32, rebuildZeroUpperCst},
+        {X86::VPBROADCASTDZ128rm, 1, 32, rebuildSplatCst},
+        {X86::VMOVQI2PQIZrm, 1, 64, rebuildZeroUpperCst},
+        {X86::VPBROADCASTQZ128rm, 1, 64, rebuildSplatCst}};
+    return FixupConstant(Fixups, 1);
+  }
   case X86::VMOVDQA32Z256rm:
   case X86::VMOVDQA64Z256rm:
   case X86::VMOVDQU32Z256rm:
-  case X86::VMOVDQU64Z256rm:
-    return FixupConstant(0, X86::VBROADCASTI32X4Z256rm, X86::VPBROADCASTQZ256rm,
-                         X86::VPBROADCASTDZ256rm,
-                         HasBWI ? X86::VPBROADCASTWZ256rm : 0,
-                         HasBWI ? X86::VPBROADCASTBZ256rm : 0, 0, 0, 1);
+  case X86::VMOVDQU64Z256rm: {
+    FixupEntry Fixups[] = {
+        {HasBWI ? X86::VPBROADCASTBZ256rm : 0, 1, 8, rebuildSplatCst},
+        {HasBWI ? X86::VPBROADCASTWZ256rm : 0, 1, 16, rebuildSplatCst},
+        {X86::VPBROADCASTDZ256rm, 1, 32, rebuildSplatCst},
+        {X86::VPBROADCASTQZ256rm, 1, 64, rebuildSplatCst},
+        {X86::VBROADCASTI32X4Z256rm, 1, 128, rebuildSplatCst}};
+    return FixupConstant(Fixups, 1);
+  }
   case X86::VMOVDQA32Zrm:
   case X86::VMOVDQA64Zrm:
   case X86::VMOVDQU32Zrm:
-  case X86::VMOVDQU64Zrm:
-    return FixupConstant(X86::VBROADCASTI64X4rm, X86::VBROADCASTI32X4rm,
-                         X86::VPBROADCASTQZrm, X86::VPBROADCASTDZrm,
-                         HasBWI ? X86::VPBROADCASTWZrm : 0,
-                         HasBWI ? X86::VPBROADCASTBZrm : 0, 0, 0, 1);
+  case X86::VMOVDQU64Zrm: {
+    FixupEntry Fixups[] = {
+        {HasBWI ? X86::VPBROADCASTBZrm : 0, 1, 8, rebuildSplatCst},
+        {HasBWI ? X86::VPBROADCASTWZrm : 0, 1, 16, rebuildSplatCst},
+        {X86::VPBROADCASTDZrm, 1, 32, rebuildSplatCst},
+        {X86::VPBROADCASTQZrm, 1, 64, rebuildSplatCst},
+        {X86::VBROADCASTI32X4rm, 1, 128, rebuildSplatCst},
+        {X86::VBROADCASTI64X4rm, 1, 256, rebuildSplatCst}};
+    return FixupConstant(Fixups, 1);
+  }
   }
 
   auto ConvertToBroadcastAVX512 = [&](unsigned OpSrc32, unsigned OpSrc64) {
@@ -423,7 +463,9 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
 
     if (OpBcst32 || OpBcst64) {
       unsigned OpNo = OpBcst32 == 0 ? OpNoBcst64 : OpNoBcst32;
-      return FixupConstant(0, 0, OpBcst64, OpBcst32, 0, 0, 0, 0, OpNo);
+      FixupEntry Fixups[] = {{(int)OpBcst32, 32, 32, rebuildSplatCst},
+                             {(int)OpBcst64, 64, 64, rebuildSplatCst}};
+      return FixupConstant(Fixups, OpNo);
     }
     return false;
   };

>From c6af36d8391aba88108c7cefc327aef278060388 Mon Sep 17 00:00:00 2001
From: Ilya Biryukov <ibiryukov at google.com>
Date: Thu, 1 Feb 2024 17:09:32 +0100
Subject: [PATCH 25/42] [Sema] Fix crash in __datasizeof with unknown types
 (#80300)

Fixes #80284.

Calling `getASTRecordLayout` on invalid types may crash and results of
`__datasizeof` on invalid types can be arbitrary, so just use whatever
`sizeof` returns.
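
The fallback described above can be sketched in isolation as follows. This is only an illustration of the guard, not the Clang code: `RecordInfo` and `dataSizeOrSize` are hypothetical names, and the sizes are made-up numbers.

```cpp
// Hypothetical stand-in for a record declaration and its layout.
struct RecordInfo {
  bool Invalid;  // true when the declaration had errors
  long Size;     // sizeof-like size, including tail padding
  long DataSize; // size without tail padding; meaningless when Invalid
};

// Use the data size only when a valid record is available; otherwise
// fall back to the sizeof-like value, mirroring the guard in the patch.
inline long dataSizeOrSize(const RecordInfo *R, long SizeofFallback) {
  if (R && !R->Invalid)
    return R->DataSize;
  return SizeofFallback;
}
```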
---
 clang/lib/AST/ASTContext.cpp      | 3 ++-
 clang/test/SemaCXX/datasizeof.cpp | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 71c9c0003d18c..78a04b4c69426 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -1749,7 +1749,8 @@ TypeInfoChars ASTContext::getTypeInfoDataSizeInChars(QualType T) const {
   // of a base-class subobject.  We decide whether that's possible
   // during class layout, so here we can just trust the layout results.
   if (getLangOpts().CPlusPlus) {
-    if (const auto *RT = T->getAs<RecordType>()) {
+    if (const auto *RT = T->getAs<RecordType>();
+        RT && !RT->getDecl()->isInvalidDecl()) {
       const ASTRecordLayout &layout = getASTRecordLayout(RT->getDecl());
       Info.Width = layout.getDataSize();
     }
diff --git a/clang/test/SemaCXX/datasizeof.cpp b/clang/test/SemaCXX/datasizeof.cpp
index f96660d2028d0..5baf2ecb24ed7 100644
--- a/clang/test/SemaCXX/datasizeof.cpp
+++ b/clang/test/SemaCXX/datasizeof.cpp
@@ -51,3 +51,11 @@ struct S {
 };
 
 static_assert(S{}.i == 9);
+
+namespace GH80284 {
+struct Bar; // expected-note{{forward declaration}}
+struct Foo {
+  Bar x; // expected-error{{field has incomplete type}}
+};
+constexpr int a = __datasizeof(Foo);
+}

>From c8c44cc485b611100435b5a6d7d53e2ad72cb65c Mon Sep 17 00:00:00 2001
From: Krystian Stasiowski <sdkrystian at gmail.com>
Date: Thu, 1 Feb 2024 11:19:04 -0500
Subject: [PATCH 26/42] [Clang][Parse] Diagnose member template declarations
 with multiple declarators (#78243)

According to [temp.pre] p5:
> In a template-declaration, explicit specialization, or explicit instantiation the init-declarator-list in the declaration shall contain at most one declarator.

A member-declaration that is a template-declaration or explicit-specialization contains a declaration, even though it declares a member. This means it _will_ contain an init-declarator-list (not a member-declarator-list), so [temp.pre] p5 applies.

This diagnoses declarations such as:
```
struct A
{
    template<typename T>
    static const int x = 0, f(); // error: a template declaration can only declare a single entity

    template<typename T>
    static const int g(), y = 0; // error: a template declaration can only declare a single entity
};
```
The diagnostic messages are the same as those of the equivalent namespace scope declarations.
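
For comparison, a declaration of the kind diagnosed above can be split so that each template declaration has a single declarator. This is a sketch, not taken from the patch or its tests; the function body and the dropped `const` on the return type are illustrative choices.

```cpp
struct A {
  // One declarator per template declaration, as [temp.pre] p5 requires.
  template <typename T>
  static const int x = 0; // member variable template (C++14)

  template <typename T>
  static int f() { return 1; } // member function template
};

// The variable template is usable in constant expressions.
static_assert(A::x<int> == 0, "x<int> should be 0");
```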

Note: we currently do not diagnose declarations with multiple abbreviated function template declarators at namespace scope (e.g., `void f(auto), g(auto);`), so this patch does not add diagnostics for the equivalent member declarations.

This patch also refactors `ParseSingleDeclarationAfterTemplate` (now named `ParseDeclarationAfterTemplate`) to call `ParseDeclGroup` and return the resultant `DeclGroup`.
---
 clang/docs/ReleaseNotes.rst                 |   2 +
 clang/include/clang/Parse/Parser.h          |  32 +--
 clang/lib/Parse/ParseDecl.cpp               |  97 +++++++--
 clang/lib/Parse/ParseDeclCXX.cpp            |  35 +++-
 clang/lib/Parse/ParseTemplate.cpp           | 206 ++++----------------
 clang/lib/Parse/Parser.cpp                  |   9 +-
 clang/test/CXX/temp/p3.cpp                  |   6 +
 clang/test/OpenMP/declare_simd_messages.cpp |   3 +-
 8 files changed, 185 insertions(+), 205 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ec9e3ef07057f..efd925e990f43 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -144,6 +144,8 @@ Improvements to Clang's diagnostics
 - Clang now applies syntax highlighting to the code snippets it
   prints.
 
+- Clang now diagnoses member template declarations with multiple declarators.
+
 Improvements to Clang's time-trace
 ----------------------------------
 
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index 4a066acf511a1..da18cf88edcc9 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -2423,6 +2423,7 @@ class Parser : public CodeCompletionHandler {
   bool MightBeDeclarator(DeclaratorContext Context);
   DeclGroupPtrTy ParseDeclGroup(ParsingDeclSpec &DS, DeclaratorContext Context,
                                 ParsedAttributes &Attrs,
+                                ParsedTemplateInfo &TemplateInfo,
                                 SourceLocation *DeclEnd = nullptr,
                                 ForRangeInit *FRI = nullptr);
   Decl *ParseDeclarationAfterDeclarator(Declarator &D,
@@ -3615,16 +3616,15 @@ class Parser : public CodeCompletionHandler {
   // C++ 14: Templates [temp]
 
   // C++ 14.1: Template Parameters [temp.param]
-  Decl *ParseDeclarationStartingWithTemplate(DeclaratorContext Context,
-                                             SourceLocation &DeclEnd,
-                                             ParsedAttributes &AccessAttrs,
-                                             AccessSpecifier AS = AS_none);
-  Decl *ParseTemplateDeclarationOrSpecialization(DeclaratorContext Context,
-                                                 SourceLocation &DeclEnd,
-                                                 ParsedAttributes &AccessAttrs,
-                                                 AccessSpecifier AS);
-  Decl *ParseSingleDeclarationAfterTemplate(
-      DeclaratorContext Context, const ParsedTemplateInfo &TemplateInfo,
+  DeclGroupPtrTy
+  ParseDeclarationStartingWithTemplate(DeclaratorContext Context,
+                                       SourceLocation &DeclEnd,
+                                       ParsedAttributes &AccessAttrs);
+  DeclGroupPtrTy ParseTemplateDeclarationOrSpecialization(
+      DeclaratorContext Context, SourceLocation &DeclEnd,
+      ParsedAttributes &AccessAttrs, AccessSpecifier AS);
+  DeclGroupPtrTy ParseDeclarationAfterTemplate(
+      DeclaratorContext Context, ParsedTemplateInfo &TemplateInfo,
       ParsingDeclRAIIObject &DiagsFromParams, SourceLocation &DeclEnd,
       ParsedAttributes &AccessAttrs, AccessSpecifier AS = AS_none);
   bool ParseTemplateParameters(MultiParseScope &TemplateScopes, unsigned Depth,
@@ -3673,12 +3673,12 @@ class Parser : public CodeCompletionHandler {
                                  TemplateTy Template, SourceLocation OpenLoc);
   ParsedTemplateArgument ParseTemplateTemplateArgument();
   ParsedTemplateArgument ParseTemplateArgument();
-  Decl *ParseExplicitInstantiation(DeclaratorContext Context,
-                                   SourceLocation ExternLoc,
-                                   SourceLocation TemplateLoc,
-                                   SourceLocation &DeclEnd,
-                                   ParsedAttributes &AccessAttrs,
-                                   AccessSpecifier AS = AS_none);
+  DeclGroupPtrTy ParseExplicitInstantiation(DeclaratorContext Context,
+                                            SourceLocation ExternLoc,
+                                            SourceLocation TemplateLoc,
+                                            SourceLocation &DeclEnd,
+                                            ParsedAttributes &AccessAttrs,
+                                            AccessSpecifier AS = AS_none);
   // C++2a: Template, concept definition [temp]
   Decl *
   ParseConceptDefinition(const ParsedTemplateInfo &TemplateInfo,
diff --git a/clang/lib/Parse/ParseDecl.cpp b/clang/lib/Parse/ParseDecl.cpp
index bcbe2d9c635a6..a186253954f68 100644
--- a/clang/lib/Parse/ParseDecl.cpp
+++ b/clang/lib/Parse/ParseDecl.cpp
@@ -1916,9 +1916,7 @@ Parser::DeclGroupPtrTy Parser::ParseDeclaration(DeclaratorContext Context,
   case tok::kw_export:
     ProhibitAttributes(DeclAttrs);
     ProhibitAttributes(DeclSpecAttrs);
-    SingleDecl =
-        ParseDeclarationStartingWithTemplate(Context, DeclEnd, DeclAttrs);
-    break;
+    return ParseDeclarationStartingWithTemplate(Context, DeclEnd, DeclAttrs);
   case tok::kw_inline:
     // Could be the start of an inline namespace. Allowed as an ext in C++03.
     if (getLangOpts().CPlusPlus && NextToken().is(tok::kw_namespace)) {
@@ -1994,8 +1992,9 @@ Parser::DeclGroupPtrTy Parser::ParseSimpleDeclaration(
   ParsingDeclSpec DS(*this);
   DS.takeAttributesFrom(DeclSpecAttrs);
 
+  ParsedTemplateInfo TemplateInfo;
   DeclSpecContext DSContext = getDeclSpecContextFromDeclaratorContext(Context);
-  ParseDeclarationSpecifiers(DS, ParsedTemplateInfo(), AS_none, DSContext);
+  ParseDeclarationSpecifiers(DS, TemplateInfo, AS_none, DSContext);
 
   // If we had a free-standing type definition with a missing semicolon, we
   // may get this far before the problem becomes obvious.
@@ -2027,7 +2026,7 @@ Parser::DeclGroupPtrTy Parser::ParseSimpleDeclaration(
   if (DeclSpecStart)
     DS.SetRangeStart(*DeclSpecStart);
 
-  return ParseDeclGroup(DS, Context, DeclAttrs, &DeclEnd, FRI);
+  return ParseDeclGroup(DS, Context, DeclAttrs, TemplateInfo, &DeclEnd, FRI);
 }
 
 /// Returns true if this might be the start of a declarator, or a common typo
@@ -2184,6 +2183,7 @@ void Parser::SkipMalformedDecl() {
 Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
                                               DeclaratorContext Context,
                                               ParsedAttributes &Attrs,
+                                              ParsedTemplateInfo &TemplateInfo,
                                               SourceLocation *DeclEnd,
                                               ForRangeInit *FRI) {
   // Parse the first declarator.
@@ -2193,8 +2193,19 @@ Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
   ParsedAttributes LocalAttrs(AttrFactory);
   LocalAttrs.takeAllFrom(Attrs);
   ParsingDeclarator D(*this, DS, LocalAttrs, Context);
+  if (TemplateInfo.TemplateParams)
+    D.setTemplateParameterLists(*TemplateInfo.TemplateParams);
+
+  bool IsTemplateSpecOrInst =
+      (TemplateInfo.Kind == ParsedTemplateInfo::ExplicitInstantiation ||
+       TemplateInfo.Kind == ParsedTemplateInfo::ExplicitSpecialization);
+  SuppressAccessChecks SAC(*this, IsTemplateSpecOrInst);
+
   ParseDeclarator(D);
 
+  if (IsTemplateSpecOrInst)
+    SAC.done();
+
   // Bail out if the first declarator didn't seem well-formed.
   if (!D.hasName() && !D.mayOmitIdentifier()) {
     SkipMalformedDecl();
@@ -2262,15 +2273,54 @@ Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
       // need to handle the file scope definition case.
       if (Context == DeclaratorContext::File) {
         if (isStartOfFunctionDefinition(D)) {
+          // C++23 [dcl.typedef] p1:
+          //   The typedef specifier shall not be [...], and it shall not be
+          //   used in the decl-specifier-seq of a parameter-declaration nor in
+          //   the decl-specifier-seq of a function-definition.
           if (DS.getStorageClassSpec() == DeclSpec::SCS_typedef) {
-            Diag(Tok, diag::err_function_declared_typedef);
-
-            // Recover by treating the 'typedef' as spurious.
+            // If the user intended to write 'typename', we should have already
+            // suggested adding it elsewhere. In any case, recover by ignoring
+            // 'typedef' and suggest removing it.
+            Diag(DS.getStorageClassSpecLoc(),
+                 diag::err_function_declared_typedef)
+                << FixItHint::CreateRemoval(DS.getStorageClassSpecLoc());
             DS.ClearStorageClassSpecs();
           }
+          Decl *TheDecl = nullptr;
+
+          if (TemplateInfo.Kind == ParsedTemplateInfo::ExplicitInstantiation) {
+            if (D.getName().getKind() != UnqualifiedIdKind::IK_TemplateId) {
+              // If the declarator-id is not a template-id, issue a diagnostic
+              // and recover by ignoring the 'template' keyword.
+              Diag(Tok, diag::err_template_defn_explicit_instantiation) << 0;
+              TheDecl = ParseFunctionDefinition(D, ParsedTemplateInfo(),
+                                                &LateParsedAttrs);
+            } else {
+              SourceLocation LAngleLoc =
+                  PP.getLocForEndOfToken(TemplateInfo.TemplateLoc);
+              Diag(D.getIdentifierLoc(),
+                   diag::err_explicit_instantiation_with_definition)
+                  << SourceRange(TemplateInfo.TemplateLoc)
+                  << FixItHint::CreateInsertion(LAngleLoc, "<>");
+
+              // Recover as if it were an explicit specialization.
+              TemplateParameterLists FakedParamLists;
+              FakedParamLists.push_back(Actions.ActOnTemplateParameterList(
+                  0, SourceLocation(), TemplateInfo.TemplateLoc, LAngleLoc,
+                  std::nullopt, LAngleLoc, nullptr));
+
+              TheDecl = ParseFunctionDefinition(
+                  D,
+                  ParsedTemplateInfo(&FakedParamLists,
+                                     /*isSpecialization=*/true,
+                                     /*lastParameterListWasEmpty=*/true),
+                  &LateParsedAttrs);
+            }
+          } else {
+            TheDecl =
+                ParseFunctionDefinition(D, TemplateInfo, &LateParsedAttrs);
+          }
 
-          Decl *TheDecl = ParseFunctionDefinition(D, ParsedTemplateInfo(),
-                                                  &LateParsedAttrs);
           return Actions.ConvertDeclToDeclGroup(TheDecl);
         }
 
@@ -2360,8 +2410,8 @@ Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
   }
 
   SmallVector<Decl *, 8> DeclsInGroup;
-  Decl *FirstDecl = ParseDeclarationAfterDeclaratorAndAttributes(
-      D, ParsedTemplateInfo(), FRI);
+  Decl *FirstDecl =
+      ParseDeclarationAfterDeclaratorAndAttributes(D, TemplateInfo, FRI);
   if (LateParsedAttrs.size() > 0)
     ParseLexedAttributeList(LateParsedAttrs, FirstDecl, true, false);
   D.complete(FirstDecl);
@@ -2384,6 +2434,16 @@ Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
       break;
     }
 
+    // C++23 [temp.pre]p5:
+    //   In a template-declaration, explicit specialization, or explicit
+    //   instantiation the init-declarator-list in the declaration shall
+    //   contain at most one declarator.
+    if (TemplateInfo.Kind != ParsedTemplateInfo::NonTemplate &&
+        D.isFirstDeclarator()) {
+      Diag(CommaLoc, diag::err_multiple_template_declarators)
+          << TemplateInfo.Kind;
+    }
+
     // Parse the next declarator.
     D.clear();
     D.setCommaLoc(CommaLoc);
@@ -2413,7 +2473,7 @@ Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
       //        declarator requires-clause
       if (Tok.is(tok::kw_requires))
         ParseTrailingRequiresClause(D);
-      Decl *ThisDecl = ParseDeclarationAfterDeclarator(D);
+      Decl *ThisDecl = ParseDeclarationAfterDeclarator(D, TemplateInfo);
       D.complete(ThisDecl);
       if (ThisDecl)
         DeclsInGroup.push_back(ThisDecl);
@@ -6526,6 +6586,17 @@ void Parser::ParseDirectDeclarator(Declarator &D) {
           /*ObjectHasErrors=*/false, EnteringContext);
     }
 
+    // C++23 [basic.scope.namespace]p1:
+    //   For each non-friend redeclaration or specialization whose target scope
+    //   is or is contained by the scope, the portion after the declarator-id,
+    //   class-head-name, or enum-head-name is also included in the scope.
+    // C++23 [basic.scope.class]p1:
+    //   For each non-friend redeclaration or specialization whose target scope
+    //   is or is contained by the scope, the portion after the declarator-id,
+    //   class-head-name, or enum-head-name is also included in the scope.
+    //
+    // FIXME: We should not be doing this for friend declarations; they have
+    // their own special lookup semantics specified by [basic.lookup.unqual]p6.
     if (D.getCXXScopeSpec().isValid()) {
       if (Actions.ShouldEnterDeclaratorScope(getCurScope(),
                                              D.getCXXScopeSpec()))
diff --git a/clang/lib/Parse/ParseDeclCXX.cpp b/clang/lib/Parse/ParseDeclCXX.cpp
index 06ccc1e3d04e9..cdbfbb1bc9fff 100644
--- a/clang/lib/Parse/ParseDeclCXX.cpp
+++ b/clang/lib/Parse/ParseDeclCXX.cpp
@@ -2855,7 +2855,7 @@ Parser::ParseCXXClassMemberDeclaration(AccessSpecifier AS,
   }
 
   // static_assert-declaration. A templated static_assert declaration is
-  // diagnosed in Parser::ParseSingleDeclarationAfterTemplate.
+  // diagnosed in Parser::ParseDeclarationAfterTemplate.
   if (!TemplateInfo.Kind &&
       Tok.isOneOf(tok::kw_static_assert, tok::kw__Static_assert)) {
     SourceLocation DeclEnd;
@@ -2868,9 +2868,8 @@ Parser::ParseCXXClassMemberDeclaration(AccessSpecifier AS,
            "Nested template improperly parsed?");
     ObjCDeclContextSwitch ObjCDC(*this);
     SourceLocation DeclEnd;
-    return DeclGroupPtrTy::make(
-        DeclGroupRef(ParseTemplateDeclarationOrSpecialization(
-            DeclaratorContext::Member, DeclEnd, AccessAttrs, AS)));
+    return ParseTemplateDeclarationOrSpecialization(DeclaratorContext::Member,
+                                                    DeclEnd, AccessAttrs, AS);
   }
 
   // Handle:  member-declaration ::= '__extension__' member-declaration
@@ -3279,6 +3278,16 @@ Parser::ParseCXXClassMemberDeclaration(AccessSpecifier AS,
       break;
     }
 
+    // C++23 [temp.pre]p5:
+    //   In a template-declaration, explicit specialization, or explicit
+    //   instantiation the init-declarator-list in the declaration shall
+    //   contain at most one declarator.
+    if (TemplateInfo.Kind != ParsedTemplateInfo::NonTemplate &&
+        DeclaratorInfo.isFirstDeclarator()) {
+      Diag(CommaLoc, diag::err_multiple_template_declarators)
+          << TemplateInfo.Kind;
+    }
+
     // Parse the next declarator.
     DeclaratorInfo.clear();
     VS.clear();
@@ -4228,6 +4237,24 @@ void Parser::ParseTrailingRequiresClause(Declarator &D) {
 
   SourceLocation RequiresKWLoc = ConsumeToken();
 
+  // C++23 [basic.scope.namespace]p1:
+  //   For each non-friend redeclaration or specialization whose target scope
+  //   is or is contained by the scope, the portion after the declarator-id,
+  //   class-head-name, or enum-head-name is also included in the scope.
+  // C++23 [basic.scope.class]p1:
+  //   For each non-friend redeclaration or specialization whose target scope
+  //   is or is contained by the scope, the portion after the declarator-id,
+  //   class-head-name, or enum-head-name is also included in the scope.
+  //
+  // FIXME: We should really be calling ParseTrailingRequiresClause in
+  // ParseDirectDeclarator, when we are already in the declarator scope.
+  // This would also correctly suppress access checks for specializations
+  // and explicit instantiations, which we currently do not do.
+  CXXScopeSpec &SS = D.getCXXScopeSpec();
+  DeclaratorScopeObj DeclScopeObj(*this, SS);
+  if (SS.isValid() && Actions.ShouldEnterDeclaratorScope(getCurScope(), SS))
+    DeclScopeObj.EnterDeclaratorScope();
+
   ExprResult TrailingRequiresClause;
   ParseScope ParamScope(this, Scope::DeclScope |
                                   Scope::FunctionDeclarationScope |
diff --git a/clang/lib/Parse/ParseTemplate.cpp b/clang/lib/Parse/ParseTemplate.cpp
index 64fe4d50bba27..d4897f8f66072 100644
--- a/clang/lib/Parse/ParseTemplate.cpp
+++ b/clang/lib/Parse/ParseTemplate.cpp
@@ -36,17 +36,19 @@ unsigned Parser::ReenterTemplateScopes(MultiParseScope &S, Decl *D) {
 
 /// Parse a template declaration, explicit instantiation, or
 /// explicit specialization.
-Decl *Parser::ParseDeclarationStartingWithTemplate(
-    DeclaratorContext Context, SourceLocation &DeclEnd,
-    ParsedAttributes &AccessAttrs, AccessSpecifier AS) {
+Parser::DeclGroupPtrTy
+Parser::ParseDeclarationStartingWithTemplate(DeclaratorContext Context,
+                                             SourceLocation &DeclEnd,
+                                             ParsedAttributes &AccessAttrs) {
   ObjCDeclContextSwitch ObjCDC(*this);
 
   if (Tok.is(tok::kw_template) && NextToken().isNot(tok::less)) {
     return ParseExplicitInstantiation(Context, SourceLocation(), ConsumeToken(),
-                                      DeclEnd, AccessAttrs, AS);
+                                      DeclEnd, AccessAttrs,
+                                      AccessSpecifier::AS_none);
   }
   return ParseTemplateDeclarationOrSpecialization(Context, DeclEnd, AccessAttrs,
-                                                  AS);
+                                                  AccessSpecifier::AS_none);
 }
 
 /// Parse a template declaration or an explicit specialization.
@@ -73,7 +75,7 @@ Decl *Parser::ParseDeclarationStartingWithTemplate(
 ///
 ///       explicit-specialization: [ C++ temp.expl.spec]
 ///         'template' '<' '>' declaration
-Decl *Parser::ParseTemplateDeclarationOrSpecialization(
+Parser::DeclGroupPtrTy Parser::ParseTemplateDeclarationOrSpecialization(
     DeclaratorContext Context, SourceLocation &DeclEnd,
     ParsedAttributes &AccessAttrs, AccessSpecifier AS) {
   assert(Tok.isOneOf(tok::kw_export, tok::kw_template) &&
@@ -161,17 +163,16 @@ Decl *Parser::ParseTemplateDeclarationOrSpecialization(
         TemplateParams, RAngleLoc, OptionalRequiresClauseConstraintER.get()));
   } while (Tok.isOneOf(tok::kw_export, tok::kw_template));
 
+  ParsedTemplateInfo TemplateInfo(&ParamLists, isSpecialization,
+                                  LastParamListWasEmpty);
+
   // Parse the actual template declaration.
   if (Tok.is(tok::kw_concept))
-    return ParseConceptDefinition(
-        ParsedTemplateInfo(&ParamLists, isSpecialization,
-                           LastParamListWasEmpty),
-        DeclEnd);
-
-  return ParseSingleDeclarationAfterTemplate(
-      Context,
-      ParsedTemplateInfo(&ParamLists, isSpecialization, LastParamListWasEmpty),
-      ParsingTemplateParams, DeclEnd, AccessAttrs, AS);
+    return Actions.ConvertDeclToDeclGroup(
+        ParseConceptDefinition(TemplateInfo, DeclEnd));
+
+  return ParseDeclarationAfterTemplate(
+      Context, TemplateInfo, ParsingTemplateParams, DeclEnd, AccessAttrs, AS);
 }
 
 /// Parse a single declaration that declares a template,
@@ -184,8 +185,8 @@ Decl *Parser::ParseTemplateDeclarationOrSpecialization(
 /// declaration. Will be AS_none for namespace-scope declarations.
 ///
 /// \returns the new declaration.
-Decl *Parser::ParseSingleDeclarationAfterTemplate(
-    DeclaratorContext Context, const ParsedTemplateInfo &TemplateInfo,
+Parser::DeclGroupPtrTy Parser::ParseDeclarationAfterTemplate(
+    DeclaratorContext Context, ParsedTemplateInfo &TemplateInfo,
     ParsingDeclRAIIObject &DiagsFromTParams, SourceLocation &DeclEnd,
     ParsedAttributes &AccessAttrs, AccessSpecifier AS) {
   assert(TemplateInfo.Kind != ParsedTemplateInfo::NonTemplate &&
@@ -196,37 +197,29 @@ Decl *Parser::ParseSingleDeclarationAfterTemplate(
     Diag(Tok.getLocation(), diag::err_templated_invalid_declaration)
       << TemplateInfo.getSourceRange();
     // Parse the static_assert declaration to improve error recovery.
-    return ParseStaticAssertDeclaration(DeclEnd);
+    return Actions.ConvertDeclToDeclGroup(
+        ParseStaticAssertDeclaration(DeclEnd));
   }
 
-  if (Context == DeclaratorContext::Member) {
-    // We are parsing a member template.
-    DeclGroupPtrTy D = ParseCXXClassMemberDeclaration(
-        AS, AccessAttrs, TemplateInfo, &DiagsFromTParams);
+  // We are parsing a member template.
+  if (Context == DeclaratorContext::Member)
+    return ParseCXXClassMemberDeclaration(AS, AccessAttrs, TemplateInfo,
+                                          &DiagsFromTParams);
 
-    if (!D || !D.get().isSingleDecl())
-      return nullptr;
-    return D.get().getSingleDecl();
-  }
-
-  ParsedAttributes prefixAttrs(AttrFactory);
+  ParsedAttributes DeclAttrs(AttrFactory);
   ParsedAttributes DeclSpecAttrs(AttrFactory);
 
   // GNU attributes are applied to the declaration specification while the
   // standard attributes are applied to the declaration.  We parse the two
   // attribute sets into different containters so we can apply them during
   // the regular parsing process.
-  while (MaybeParseCXX11Attributes(prefixAttrs) ||
+  while (MaybeParseCXX11Attributes(DeclAttrs) ||
          MaybeParseGNUAttributes(DeclSpecAttrs))
     ;
 
-  if (Tok.is(tok::kw_using)) {
-    auto usingDeclPtr = ParseUsingDirectiveOrDeclaration(Context, TemplateInfo, DeclEnd,
-                                                         prefixAttrs);
-    if (!usingDeclPtr || !usingDeclPtr.get().isSingleDecl())
-      return nullptr;
-    return usingDeclPtr.get().getSingleDecl();
-  }
+  if (Tok.is(tok::kw_using))
+    return ParseUsingDirectiveOrDeclaration(Context, TemplateInfo, DeclEnd,
+                                            DeclAttrs);
 
   // Parse the declaration specifiers, stealing any diagnostics from
   // the template parameters.
@@ -239,7 +232,7 @@ Decl *Parser::ParseSingleDeclarationAfterTemplate(
                              getDeclSpecContextFromDeclaratorContext(Context));
 
   if (Tok.is(tok::semi)) {
-    ProhibitAttributes(prefixAttrs);
+    ProhibitAttributes(DeclAttrs);
     DeclEnd = ConsumeToken();
     RecordDecl *AnonRecord = nullptr;
     Decl *Decl = Actions.ParsedFreeStandingDeclSpec(
@@ -252,7 +245,7 @@ Decl *Parser::ParseSingleDeclarationAfterTemplate(
     assert(!AnonRecord &&
            "Anonymous unions/structs should not be valid with template");
     DS.complete(Decl);
-    return Decl;
+    return Actions.ConvertDeclToDeclGroup(Decl);
   }
 
   if (DS.hasTagDefinition())
@@ -260,125 +253,9 @@ Decl *Parser::ParseSingleDeclarationAfterTemplate(
 
   // Move the attributes from the prefix into the DS.
   if (TemplateInfo.Kind == ParsedTemplateInfo::ExplicitInstantiation)
-    ProhibitAttributes(prefixAttrs);
-
-  // Parse the declarator.
-  ParsingDeclarator DeclaratorInfo(*this, DS, prefixAttrs,
-                                   (DeclaratorContext)Context);
-  if (TemplateInfo.TemplateParams)
-    DeclaratorInfo.setTemplateParameterLists(*TemplateInfo.TemplateParams);
-
-  // Turn off usual access checking for template specializations and
-  // instantiations.
-  // C++20 [temp.spec] 13.9/6.
-  // This disables the access checking rules for function template explicit
-  // instantiation and explicit specialization:
-  // - parameter-list;
-  // - template-argument-list;
-  // - noexcept-specifier;
-  // - dynamic-exception-specifications (deprecated in C++11, removed since
-  //   C++17).
-  bool IsTemplateSpecOrInst =
-      (TemplateInfo.Kind == ParsedTemplateInfo::ExplicitInstantiation ||
-       TemplateInfo.Kind == ParsedTemplateInfo::ExplicitSpecialization);
-  SuppressAccessChecks SAC(*this, IsTemplateSpecOrInst);
-
-  ParseDeclarator(DeclaratorInfo);
-
-  if (IsTemplateSpecOrInst)
-    SAC.done();
-
-  // Error parsing the declarator?
-  if (!DeclaratorInfo.hasName()) {
-    SkipMalformedDecl();
-    return nullptr;
-  }
-
-  LateParsedAttrList LateParsedAttrs(true);
-  if (DeclaratorInfo.isFunctionDeclarator()) {
-    if (Tok.is(tok::kw_requires)) {
-      CXXScopeSpec &ScopeSpec = DeclaratorInfo.getCXXScopeSpec();
-      DeclaratorScopeObj DeclScopeObj(*this, ScopeSpec);
-      if (ScopeSpec.isValid() &&
-          Actions.ShouldEnterDeclaratorScope(getCurScope(), ScopeSpec))
-        DeclScopeObj.EnterDeclaratorScope();
-      ParseTrailingRequiresClause(DeclaratorInfo);
-    }
-
-    MaybeParseGNUAttributes(DeclaratorInfo, &LateParsedAttrs);
-  }
-
-  if (DeclaratorInfo.isFunctionDeclarator() &&
-      isStartOfFunctionDefinition(DeclaratorInfo)) {
+    ProhibitAttributes(DeclAttrs);
 
-    // Function definitions are only allowed at file scope and in C++ classes.
-    // The C++ inline method definition case is handled elsewhere, so we only
-    // need to handle the file scope definition case.
-    if (Context != DeclaratorContext::File) {
-      Diag(Tok, diag::err_function_definition_not_allowed);
-      SkipMalformedDecl();
-      return nullptr;
-    }
-
-    if (DS.getStorageClassSpec() == DeclSpec::SCS_typedef) {
-      // Recover by ignoring the 'typedef'. This was probably supposed to be
-      // the 'typename' keyword, which we should have already suggested adding
-      // if it's appropriate.
-      Diag(DS.getStorageClassSpecLoc(), diag::err_function_declared_typedef)
-        << FixItHint::CreateRemoval(DS.getStorageClassSpecLoc());
-      DS.ClearStorageClassSpecs();
-    }
-
-    if (TemplateInfo.Kind == ParsedTemplateInfo::ExplicitInstantiation) {
-      if (DeclaratorInfo.getName().getKind() !=
-          UnqualifiedIdKind::IK_TemplateId) {
-        // If the declarator-id is not a template-id, issue a diagnostic and
-        // recover by ignoring the 'template' keyword.
-        Diag(Tok, diag::err_template_defn_explicit_instantiation) << 0;
-        return ParseFunctionDefinition(DeclaratorInfo, ParsedTemplateInfo(),
-                                       &LateParsedAttrs);
-      } else {
-        SourceLocation LAngleLoc
-          = PP.getLocForEndOfToken(TemplateInfo.TemplateLoc);
-        Diag(DeclaratorInfo.getIdentifierLoc(),
-             diag::err_explicit_instantiation_with_definition)
-            << SourceRange(TemplateInfo.TemplateLoc)
-            << FixItHint::CreateInsertion(LAngleLoc, "<>");
-
-        // Recover as if it were an explicit specialization.
-        TemplateParameterLists FakedParamLists;
-        FakedParamLists.push_back(Actions.ActOnTemplateParameterList(
-            0, SourceLocation(), TemplateInfo.TemplateLoc, LAngleLoc,
-            std::nullopt, LAngleLoc, nullptr));
-
-        return ParseFunctionDefinition(
-            DeclaratorInfo, ParsedTemplateInfo(&FakedParamLists,
-                                               /*isSpecialization=*/true,
-                                               /*lastParameterListWasEmpty=*/true),
-            &LateParsedAttrs);
-      }
-    }
-    return ParseFunctionDefinition(DeclaratorInfo, TemplateInfo,
-                                   &LateParsedAttrs);
-  }
-
-  // Parse this declaration.
-  Decl *ThisDecl = ParseDeclarationAfterDeclarator(DeclaratorInfo,
-                                                   TemplateInfo);
-
-  if (Tok.is(tok::comma)) {
-    Diag(Tok, diag::err_multiple_template_declarators)
-      << (int)TemplateInfo.Kind;
-    SkipUntil(tok::semi);
-    return ThisDecl;
-  }
-
-  // Eat the semi colon after the declaration.
-  ExpectAndConsumeSemi(diag::err_expected_semi_declaration);
-  if (LateParsedAttrs.size() > 0)
-    ParseLexedAttributeList(LateParsedAttrs, ThisDecl, true, false);
-  DeclaratorInfo.complete(ThisDecl);
-  return ThisDecl;
+  return ParseDeclGroup(DS, Context, DeclAttrs, TemplateInfo, &DeclEnd);
 }
 
 /// \brief Parse a single declaration that declares a concept.
@@ -1686,19 +1563,16 @@ bool Parser::ParseTemplateArgumentList(TemplateArgList &TemplateArgs,
 ///         'extern' [opt] 'template' declaration
 ///
 /// Note that the 'extern' is a GNU extension and C++11 feature.
-Decl *Parser::ParseExplicitInstantiation(DeclaratorContext Context,
-                                         SourceLocation ExternLoc,
-                                         SourceLocation TemplateLoc,
-                                         SourceLocation &DeclEnd,
-                                         ParsedAttributes &AccessAttrs,
-                                         AccessSpecifier AS) {
+Parser::DeclGroupPtrTy Parser::ParseExplicitInstantiation(
+    DeclaratorContext Context, SourceLocation ExternLoc,
+    SourceLocation TemplateLoc, SourceLocation &DeclEnd,
+    ParsedAttributes &AccessAttrs, AccessSpecifier AS) {
   // This isn't really required here.
   ParsingDeclRAIIObject
     ParsingTemplateParams(*this, ParsingDeclRAIIObject::NoParent);
-
-  return ParseSingleDeclarationAfterTemplate(
-      Context, ParsedTemplateInfo(ExternLoc, TemplateLoc),
-      ParsingTemplateParams, DeclEnd, AccessAttrs, AS);
+  ParsedTemplateInfo TemplateInfo(ExternLoc, TemplateLoc);
+  return ParseDeclarationAfterTemplate(
+      Context, TemplateInfo, ParsingTemplateParams, DeclEnd, AccessAttrs, AS);
 }
 
 SourceRange Parser::ParsedTemplateInfo::getSourceRange() const {
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 3dfd44677e5a2..2dd4a73bfbc26 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -1040,8 +1040,8 @@ Parser::ParseExternalDeclaration(ParsedAttributes &Attrs,
              diag::warn_cxx98_compat_extern_template :
              diag::ext_extern_template) << SourceRange(ExternLoc, TemplateLoc);
       SourceLocation DeclEnd;
-      return Actions.ConvertDeclToDeclGroup(ParseExplicitInstantiation(
-          DeclaratorContext::File, ExternLoc, TemplateLoc, DeclEnd, Attrs));
+      return ParseExplicitInstantiation(DeclaratorContext::File, ExternLoc,
+                                        TemplateLoc, DeclEnd, Attrs);
     }
     goto dont_know;
 
@@ -1143,9 +1143,10 @@ Parser::DeclGroupPtrTy Parser::ParseDeclOrFunctionDefInternal(
   DS.SetRangeEnd(DeclSpecAttrs.Range.getEnd());
   DS.takeAttributesFrom(DeclSpecAttrs);
 
+  ParsedTemplateInfo TemplateInfo;
   MaybeParseMicrosoftAttributes(DS.getAttributes());
   // Parse the common declaration-specifiers piece.
-  ParseDeclarationSpecifiers(DS, ParsedTemplateInfo(), AS,
+  ParseDeclarationSpecifiers(DS, TemplateInfo, AS,
                              DeclSpecContext::DSC_top_level);
 
   // If we had a free-standing type definition with a missing semicolon, we
@@ -1241,7 +1242,7 @@ Parser::DeclGroupPtrTy Parser::ParseDeclOrFunctionDefInternal(
     return Actions.ConvertDeclToDeclGroup(TheDecl);
   }
 
-  return ParseDeclGroup(DS, DeclaratorContext::File, Attrs);
+  return ParseDeclGroup(DS, DeclaratorContext::File, Attrs, TemplateInfo);
 }
 
 Parser::DeclGroupPtrTy Parser::ParseDeclarationOrFunctionDefinition(
diff --git a/clang/test/CXX/temp/p3.cpp b/clang/test/CXX/temp/p3.cpp
index b708c613d352d..9e561d0b9a83b 100644
--- a/clang/test/CXX/temp/p3.cpp
+++ b/clang/test/CXX/temp/p3.cpp
@@ -15,3 +15,9 @@ template<typename T> struct B { } f(); // expected-error {{expected ';' after st
 template<typename T> struct C { } // expected-error {{expected ';' after struct}}
 
 A<int> c;
+
+struct D {
+  template<typename T> static const int x = 0, f(); // expected-error {{can only declare a single entity}}
+
+  template<typename T> static const int g(), y = 0; // expected-error {{can only declare a single entity}}
+};
diff --git a/clang/test/OpenMP/declare_simd_messages.cpp b/clang/test/OpenMP/declare_simd_messages.cpp
index dd24322694b69..fea045400e1fa 100644
--- a/clang/test/OpenMP/declare_simd_messages.cpp
+++ b/clang/test/OpenMP/declare_simd_messages.cpp
@@ -33,10 +33,9 @@ int main();
 int main();
 
 struct A {
-// expected-error@+1 {{function declaration is expected after 'declare simd' directive}}
   #pragma omp declare simd
   template<typename T>
-  T infunc1(T a), infunc2(T a);
+  T infunc1(T a);
 };
 
 // expected-error@+1 {{single declaration is expected after 'declare simd' directive}}

>From b0bab58eedacfee97f3ce8aa9644aa80384aa1b8 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko <atrosinenko at accesssoftek.com>
Date: Thu, 1 Feb 2024 19:23:55 +0300
Subject: [PATCH 27/42] [AArch64] Make +pauth enabled in Armv8.3-a by default
 (#78027)

Add AEK_PAUTH to ARMV8_3A in TargetParser and let it propagate to
ARMV8R, as it aligns with GCC defaults.

After adding AEK_PAUTH, several tests from TargetParserTest.cpp crashed
when trying to format an error message, thus update a format string in
AssertSameExtensionFlags to account for bitmask being pre-formatted as
std::string.

The CHECK-PAUTH* lines in aarch64-target-features.c are updated to
account for the fact that FEAT_PAUTH support and pac-ret can be enabled
independently and all four combinations are possible.
---
 clang/lib/Basic/Targets/AArch64.cpp           |  1 -
 clang/test/CodeGen/aarch64-targetattr.c       | 10 ++--
 .../Preprocessor/aarch64-target-features.c    | 35 +++++++-----
 .../llvm/TargetParser/AArch64TargetParser.h   |  2 +-
 .../TargetParser/TargetParserTest.cpp         | 54 +++++++++++--------
 5 files changed, 60 insertions(+), 42 deletions(-)

diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 46f14b47261ae..36d178ea8ae98 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -258,7 +258,6 @@ void AArch64TargetInfo::getTargetDefinesARMV83A(const LangOptions &Opts,
                                                 MacroBuilder &Builder) const {
   Builder.defineMacro("__ARM_FEATURE_COMPLEX", "1");
   Builder.defineMacro("__ARM_FEATURE_JCVT", "1");
-  Builder.defineMacro("__ARM_FEATURE_PAUTH", "1");
   // Also include the Armv8.2 defines
   getTargetDefinesARMV82A(Opts, Builder);
 }
diff --git a/clang/test/CodeGen/aarch64-targetattr.c b/clang/test/CodeGen/aarch64-targetattr.c
index 02da18264da0a..1a3a84a73dbad 100644
--- a/clang/test/CodeGen/aarch64-targetattr.c
+++ b/clang/test/CodeGen/aarch64-targetattr.c
@@ -97,19 +97,19 @@ void minusarch() {}
 // CHECK: attributes #0 = { {{.*}} "target-features"="+crc,+fp-armv8,+lse,+neon,+ras,+rdm,+v8.1a,+v8.2a,+v8a" }
 // CHECK: attributes #1 = { {{.*}} "target-features"="+crc,+fp-armv8,+fullfp16,+lse,+neon,+ras,+rdm,+sve,+v8.1a,+v8.2a,+v8a" }
 // CHECK: attributes #2 = { {{.*}} "target-features"="+crc,+fp-armv8,+fullfp16,+lse,+neon,+ras,+rdm,+sve,+sve2,+v8.1a,+v8.2a,+v8a" }
-// CHECK: attributes #3 = { {{.*}} "target-features"="+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+ras,+rcpc,+rdm,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" }
+// CHECK: attributes #3 = { {{.*}} "target-features"="+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+ras,+rcpc,+rdm,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" }
 // CHECK: attributes #4 = { {{.*}} "target-cpu"="cortex-a710" "target-features"="+bf16,+complxnum,+crc,+dotprod,+flagm,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+mte,+neon,+pauth,+ras,+rcpc,+rdm,+sb,+sve,+sve2,+sve2-bitperm" }
 // CHECK: attributes #5 = { {{.*}} "tune-cpu"="cortex-a710" }
 // CHECK: attributes #6 = { {{.*}} "target-cpu"="generic" }
 // CHECK: attributes #7 = { {{.*}} "tune-cpu"="generic" }
 // CHECK: attributes #8 = { {{.*}} "target-cpu"="neoverse-n1" "target-features"="+aes,+crc,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #9 = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+sve" "tune-cpu"="cortex-a710" }
-// CHECK: attributes #10 = { {{.*}} "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+sve,+sve2" }
-// CHECK: attributes #11 = { {{.*}} "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,-sve" }
+// CHECK: attributes #10 = { {{.*}} "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,+sve,+sve2" }
+// CHECK: attributes #11 = { {{.*}} "target-cpu"="neoverse-v1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+rand,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+spe,+ssbs,-sve" }
 // CHECK: attributes #12 = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+sve" }
 // CHECK: attributes #13 = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+sve,-sve2" }
 // CHECK: attributes #14 = { {{.*}} "target-features"="+fullfp16" }
-// CHECK: attributes #15 = { {{.*}} "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
-// CHECK: attributes #16 = { {{.*}} "branch-target-enforcement"="true" "guarded-control-stack"="true" {{.*}} "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
+// CHECK: attributes #15 = { {{.*}} "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
+// CHECK: attributes #16 = { {{.*}} "branch-target-enforcement"="true" "guarded-control-stack"="true" {{.*}} "target-features"="+aes,+bf16,+complxnum,+crc,+dotprod,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #17 = { {{.*}} "target-features"="-neon" }
 // CHECK: attributes #18 = { {{.*}} "target-features"="-v9.3a" }
diff --git a/clang/test/Preprocessor/aarch64-target-features.c b/clang/test/Preprocessor/aarch64-target-features.c
index 062b802909f16..43402225f0689 100644
--- a/clang/test/Preprocessor/aarch64-target-features.c
+++ b/clang/test/Preprocessor/aarch64-target-features.c
@@ -314,15 +314,15 @@
 // CHECK-MCPU-APPLE-A7: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8a" "-target-feature" "+aes"{{.*}} "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-APPLE-A10: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8a" "-target-feature" "+aes"{{.*}} "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-APPLE-A11: "-cc1"{{.*}} "-triple" "aarch64{{.*}}"{{.*}}"-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.2a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
-// CHECK-MCPU-APPLE-A12: "-cc1"{{.*}} "-triple" "aarch64"{{.*}} "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.3a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
+// CHECK-MCPU-APPLE-A12: "-cc1"{{.*}} "-triple" "aarch64"{{.*}} "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.3a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+pauth" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-A34: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
-// CHECK-MCPU-APPLE-A13: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "apple-a13" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.4a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+neon"
+// CHECK-MCPU-APPLE-A13: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "apple-a13" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.4a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+pauth" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+neon"
 // CHECK-MCPU-A35: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-A53: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-A57: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-A72: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-CORTEX-A73: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
-// CHECK-MCPU-CORTEX-R82: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8r" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sb" "-target-feature" "+neon" "-target-feature" "+ssbs"
+// CHECK-MCPU-CORTEX-R82: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8r" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+pauth" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sb" "-target-feature" "+neon" "-target-feature" "+ssbs"
 // CHECK-MCPU-M3: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-M4: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8.2a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
 // CHECK-MCPU-KRYO: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+sha2" "-target-feature" "+neon"
@@ -331,10 +331,10 @@
 // CHECK-MCPU-CARMEL: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+v8.2a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
 
 // RUN: %clang -target x86_64-apple-macosx -arch arm64 -### -c %s 2>&1 | FileCheck --check-prefix=CHECK-ARCH-ARM64 %s
-// CHECK-ARCH-ARM64: "-target-cpu" "apple-m1" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.5a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+neon"
+// CHECK-ARCH-ARM64: "-target-cpu" "apple-m1" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.5a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+fp16fml" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+pauth" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+neon"
 
 // RUN: %clang -target x86_64-apple-macosx -arch arm64_32 -### -c %s 2>&1 | FileCheck --check-prefix=CHECK-ARCH-ARM64_32 %s
-// CHECK-ARCH-ARM64_32: "-target-cpu" "apple-s4" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.3a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
+// CHECK-ARCH-ARM64_32: "-target-cpu" "apple-s4" "-target-feature" "+zcm" "-target-feature" "+zcz" "-target-feature" "+v8.3a" "-target-feature" "+aes" "-target-feature" "+crc" "-target-feature" "+complxnum" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+pauth" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+neon"
 
 // RUN: %clang -target aarch64 -march=armv8-a+fp+simd+crc+crypto -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MARCH-1 %s
 // RUN: %clang -target aarch64 -march=armv8-a+nofp+nosimd+nocrc+nocrypto+fp+simd+crc+crypto -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MARCH-1 %s
@@ -497,9 +497,10 @@
 // CHECK-MEMTAG: __ARM_FEATURE_MEMORY_TAGGING 1
 
 // ================== Check Pointer Authentication Extension (PAuth).
-// RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-OFF %s
-// RUN: %clang -target arm64-none-linux-gnu -march=armv8.5-a -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-OFF %s
-// RUN: %clang -target arm64-none-linux-gnu -march=armv8-a+pauth -mbranch-protection=none -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-ON %s
+// RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -x c -E -dM %s -o - | FileCheck -check-prefixes=CHECK-PAUTH-OFF,CHECK-CPU-NOPAUTH %s
+// RUN: %clang -target arm64-none-linux-gnu -march=armv8.5-a+nopauth -x c -E -dM %s -o - | FileCheck -check-prefixes=CHECK-PAUTH-OFF,CHECK-CPU-NOPAUTH %s
+// RUN: %clang -target arm64-none-linux-gnu -march=armv8.5-a -x c -E -dM %s -o - | FileCheck -check-prefixes=CHECK-PAUTH-OFF,CHECK-CPU-PAUTH %s
+// RUN: %clang -target arm64-none-linux-gnu -march=armv8-a+pauth -mbranch-protection=none -x c -E -dM %s -o - | FileCheck -check-prefixes=CHECK-PAUTH-OFF,CHECK-CPU-PAUTH %s
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=none -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-OFF %s
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=bti -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-OFF %s
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=standard -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH %s
@@ -507,12 +508,18 @@
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=pac-ret+b-key -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-BKEY %s
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=pac-ret+leaf -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-ALL %s
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -mbranch-protection=pac-ret+leaf+b-key -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-PAUTH-BKEY-ALL %s
-// CHECK-PAUTH-OFF-NOT:  __ARM_FEATURE_PAC_DEFAULT
-// CHECK-PAUTH:          #define __ARM_FEATURE_PAC_DEFAULT 1
-// CHECK-PAUTH-BKEY:     #define __ARM_FEATURE_PAC_DEFAULT 2
-// CHECK-PAUTH-ALL:      #define __ARM_FEATURE_PAC_DEFAULT 5
-// CHECK-PAUTH-BKEY-ALL: #define __ARM_FEATURE_PAC_DEFAULT 6
-// CHECK-PAUTH-ON:       #define __ARM_FEATURE_PAUTH 1
+//
+// Note: PAUTH-OFF - pac-ret is disabled
+//       CPU-NOPAUTH - FEAT_PAUTH support is disabled (but pac-ret can still use HINT-encoded instructions)
+//
+// CHECK-CPU-NOPAUTH-NOT: __ARM_FEATURE_PAUTH
+// CHECK-PAUTH-OFF-NOT:   __ARM_FEATURE_PAC_DEFAULT
+// CHECK-PAUTH:           #define __ARM_FEATURE_PAC_DEFAULT 1
+// CHECK-PAUTH-BKEY:      #define __ARM_FEATURE_PAC_DEFAULT 2
+// CHECK-PAUTH-ALL:       #define __ARM_FEATURE_PAC_DEFAULT 5
+// CHECK-PAUTH-BKEY-ALL:  #define __ARM_FEATURE_PAC_DEFAULT 6
+// CHECK-CPU-PAUTH:       #define __ARM_FEATURE_PAUTH 1
+// CHECK-CPU-NOPAUTH-NOT: __ARM_FEATURE_PAUTH
 
 // ================== Check Branch Target Identification (BTI).
 // RUN: %clang -target arm64-none-linux-gnu -march=armv8-a -x c -E -dM %s -o - | FileCheck -check-prefix=CHECK-BTI-OFF %s
diff --git a/llvm/include/llvm/TargetParser/AArch64TargetParser.h b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
index 92039c20c9044..cce9d6db260d7 100644
--- a/llvm/include/llvm/TargetParser/AArch64TargetParser.h
+++ b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
@@ -478,7 +478,7 @@ inline constexpr ArchInfo ARMV8_1A  = { VersionTuple{8, 1}, AProfile, "armv8.1-a
 inline constexpr ArchInfo ARMV8_2A  = { VersionTuple{8, 2}, AProfile, "armv8.2-a", "+v8.2a", (ARMV8_1A.DefaultExts |
                                         AArch64::ExtensionBitset({AArch64::AEK_RAS}))};
 inline constexpr ArchInfo ARMV8_3A  = { VersionTuple{8, 3}, AProfile, "armv8.3-a", "+v8.3a", (ARMV8_2A.DefaultExts |
-                                        AArch64::ExtensionBitset({AArch64::AEK_RCPC, AArch64::AEK_JSCVT, AArch64::AEK_FCMA}))};
+                                        AArch64::ExtensionBitset({AArch64::AEK_FCMA, AArch64::AEK_JSCVT, AArch64::AEK_PAUTH, AArch64::AEK_RCPC}))};
 inline constexpr ArchInfo ARMV8_4A  = { VersionTuple{8, 4}, AProfile, "armv8.4-a", "+v8.4a", (ARMV8_3A.DefaultExts |
                                         AArch64::ExtensionBitset({AArch64::AEK_DOTPROD}))};
 inline constexpr ArchInfo ARMV8_5A  = { VersionTuple{8, 5}, AProfile, "armv8.5-a", "+v8.5a", (ARMV8_4A.DefaultExts)};
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 2fde5b4e642c5..cbd8fe18cd181 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -133,8 +133,8 @@ template <ARM::ISAKind ISAKind> struct AssertSameExtensionFlags {
 
     return testing::AssertionFailure() << llvm::formatv(
                "CPU: {4}\n"
-               "Expected extension flags: {0} ({1:x})\n"
-               "     Got extension flags: {2} ({3:x})\n",
+               "Expected extension flags: {0} ({1})\n"
+               "     Got extension flags: {2} ({3})\n",
                FormatExtensionFlags(ExpectedFlags),
                SerializeExtensionFlags(ExpectedFlags),
                FormatExtensionFlags(GotFlags),
@@ -1260,7 +1260,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_AES,     AArch64::AEK_SHA2,  AArch64::AEK_SHA3,
                  AArch64::AEK_SM4,     AArch64::AEK_FP16,  AArch64::AEK_BF16,
                  AArch64::AEK_PROFILE, AArch64::AEK_RAND,  AArch64::AEK_FP16FML,
-                 AArch64::AEK_I8MM,    AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM,    AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.4-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "neoverse-v2", "armv9-a", "neon-fp-armv8",
@@ -1275,7 +1276,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_SVE2,        AArch64::AEK_PROFILE,
                  AArch64::AEK_FP16FML,     AArch64::AEK_I8MM,
                  AArch64::AEK_SVE2BITPERM, AArch64::AEK_RAND,
-                 AArch64::AEK_JSCVT,       AArch64::AEK_FCMA})),
+                 AArch64::AEK_JSCVT,       AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "9-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "cortex-r82", "armv8-r", "crypto-neon-fp-armv8",
@@ -1284,7 +1286,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_DOTPROD, AArch64::AEK_FP, AArch64::AEK_SIMD,
                  AArch64::AEK_FP16, AArch64::AEK_FP16FML, AArch64::AEK_RAS,
                  AArch64::AEK_RCPC, AArch64::AEK_LSE, AArch64::AEK_SB,
-                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8-R"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "cortex-x1", "armv8.2-a", "crypto-neon-fp-armv8",
@@ -1389,7 +1391,8 @@ INSTANTIATE_TEST_SUITE_P(
                 {AArch64::AEK_CRC, AArch64::AEK_AES, AArch64::AEK_SHA2,
                  AArch64::AEK_FP, AArch64::AEK_SIMD, AArch64::AEK_LSE,
                  AArch64::AEK_RAS, AArch64::AEK_RDM, AArch64::AEK_RCPC,
-                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.3-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-a13", "armv8.4-a", "crypto-neon-fp-armv8",
@@ -1399,7 +1402,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_JSCVT,
-                 AArch64::AEK_FCMA})),
+                 AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8.4-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-a14", "armv8.5-a", "crypto-neon-fp-armv8",
@@ -1409,7 +1412,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_JSCVT,
-                 AArch64::AEK_FCMA})),
+                 AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8.5-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-a15", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1419,7 +1422,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
-                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-a16", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1429,7 +1433,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
-                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-a17", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1439,7 +1444,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
-                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-m1", "armv8.5-a", "crypto-neon-fp-armv8",
@@ -1449,7 +1455,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_JSCVT,
-                 AArch64::AEK_FCMA})),
+                 AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8.5-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-m2", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1459,7 +1465,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
-                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-m3", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1469,7 +1476,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_LSE, AArch64::AEK_RAS, AArch64::AEK_RDM,
                  AArch64::AEK_RCPC, AArch64::AEK_DOTPROD, AArch64::AEK_FP16,
                  AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
-                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-s4", "armv8.3-a", "crypto-neon-fp-armv8",
@@ -1477,7 +1485,8 @@ INSTANTIATE_TEST_SUITE_P(
                 {AArch64::AEK_CRC, AArch64::AEK_AES, AArch64::AEK_SHA2,
                  AArch64::AEK_FP, AArch64::AEK_SIMD, AArch64::AEK_LSE,
                  AArch64::AEK_RAS, AArch64::AEK_RDM, AArch64::AEK_RCPC,
-                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.3-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "apple-s5", "armv8.3-a", "crypto-neon-fp-armv8",
@@ -1485,7 +1494,8 @@ INSTANTIATE_TEST_SUITE_P(
                 {AArch64::AEK_CRC, AArch64::AEK_AES, AArch64::AEK_SHA2,
                  AArch64::AEK_FP, AArch64::AEK_SIMD, AArch64::AEK_LSE,
                  AArch64::AEK_RAS, AArch64::AEK_RDM, AArch64::AEK_RCPC,
-                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_FP16, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.3-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "exynos-m3", "armv8-a", "crypto-neon-fp-armv8",
@@ -1550,7 +1560,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_SB,          AArch64::AEK_SVE2,
                  AArch64::AEK_SVE2BITPERM, AArch64::AEK_BF16,
                  AArch64::AEK_I8MM,        AArch64::AEK_JSCVT,
-                 AArch64::AEK_FCMA})),
+                 AArch64::AEK_FCMA,        AArch64::AEK_PAUTH})),
             "8.5-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "ampere1", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1561,7 +1571,7 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_SHA3, AArch64::AEK_BF16, AArch64::AEK_SHA2,
                  AArch64::AEK_AES, AArch64::AEK_I8MM, AArch64::AEK_SSBS,
                  AArch64::AEK_SB, AArch64::AEK_RAND, AArch64::AEK_JSCVT,
-                 AArch64::AEK_FCMA})),
+                 AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "ampere1a", "armv8.6-a", "crypto-neon-fp-armv8",
@@ -1572,7 +1582,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_SM4, AArch64::AEK_SHA3, AArch64::AEK_BF16,
                  AArch64::AEK_SHA2, AArch64::AEK_AES, AArch64::AEK_I8MM,
                  AArch64::AEK_SSBS, AArch64::AEK_SB, AArch64::AEK_RAND,
-                 AArch64::AEK_MTE, AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_MTE, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.6-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "neoverse-512tvb", "armv8.4-a", "crypto-neon-fp-armv8",
@@ -1584,7 +1595,8 @@ INSTANTIATE_TEST_SUITE_P(
                  AArch64::AEK_AES,     AArch64::AEK_SHA2,  AArch64::AEK_SHA3,
                  AArch64::AEK_SM4,     AArch64::AEK_FP16,  AArch64::AEK_BF16,
                  AArch64::AEK_PROFILE, AArch64::AEK_RAND,  AArch64::AEK_FP16FML,
-                 AArch64::AEK_I8MM,    AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_I8MM,    AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
+                 AArch64::AEK_PAUTH})),
             "8.4-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "thunderx2t99", "armv8.1-a", "crypto-neon-fp-armv8",
@@ -1599,7 +1611,7 @@ INSTANTIATE_TEST_SUITE_P(
                 {AArch64::AEK_CRC, AArch64::AEK_AES, AArch64::AEK_SHA2,
                  AArch64::AEK_LSE, AArch64::AEK_RDM, AArch64::AEK_FP,
                  AArch64::AEK_SIMD, AArch64::AEK_RAS, AArch64::AEK_RCPC,
-                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA})),
+                 AArch64::AEK_JSCVT, AArch64::AEK_FCMA, AArch64::AEK_PAUTH})),
             "8.3-A"),
         ARMCPUTestParams<AArch64::ExtensionBitset>(
             "thunderx", "armv8-a", "crypto-neon-fp-armv8",

>From c84cdca747dc1c0b524378e98c41865cb87991f8 Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Thu, 1 Feb 2024 08:26:21 -0800
Subject: [PATCH 28/42] [BOLT][NFC] Factor out RI::disassemblePLTInstruction
 (#80302)

---
 bolt/include/bolt/Rewrite/RewriteInstance.h |  5 ++
 bolt/lib/Rewrite/RewriteInstance.cpp        | 52 ++++++++-------------
 2 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/bolt/include/bolt/Rewrite/RewriteInstance.h b/bolt/include/bolt/Rewrite/RewriteInstance.h
index 3074ae77c7ef1..170da78846b8f 100644
--- a/bolt/include/bolt/Rewrite/RewriteInstance.h
+++ b/bolt/include/bolt/Rewrite/RewriteInstance.h
@@ -264,6 +264,11 @@ class RewriteInstance {
   void createPLTBinaryFunction(uint64_t TargetAddress, uint64_t EntryAddress,
                                uint64_t EntrySize);
 
+  /// Disassemble PLT instruction.
+  void disassemblePLTInstruction(const BinarySection &Section,
+                                 uint64_t InstrOffset, MCInst &Instruction,
+                                 uint64_t &InstrSize);
+
   /// Disassemble aarch64-specific .plt \p Section auxiliary function
   void disassemblePLTSectionAArch64(BinarySection &Section);
 
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index dee1bf125f0a7..8cd7adff1c4c7 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -1470,25 +1470,29 @@ void RewriteInstance::createPLTBinaryFunction(uint64_t TargetAddress,
   setPLTSymbol(BF, Symbol->getName());
 }
 
-void RewriteInstance::disassemblePLTSectionAArch64(BinarySection &Section) {
+void RewriteInstance::disassemblePLTInstruction(const BinarySection &Section,
+                                                uint64_t InstrOffset,
+                                                MCInst &Instruction,
+                                                uint64_t &InstrSize) {
   const uint64_t SectionAddress = Section.getAddress();
   const uint64_t SectionSize = Section.getSize();
   StringRef PLTContents = Section.getContents();
   ArrayRef<uint8_t> PLTData(
       reinterpret_cast<const uint8_t *>(PLTContents.data()), SectionSize);
 
-  auto disassembleInstruction = [&](uint64_t InstrOffset, MCInst &Instruction,
-                                    uint64_t &InstrSize) {
-    const uint64_t InstrAddr = SectionAddress + InstrOffset;
-    if (!BC->DisAsm->getInstruction(Instruction, InstrSize,
-                                    PLTData.slice(InstrOffset), InstrAddr,
-                                    nulls())) {
-      errs() << "BOLT-ERROR: unable to disassemble instruction in PLT section "
-             << Section.getName() << " at offset 0x"
-             << Twine::utohexstr(InstrOffset) << '\n';
-      exit(1);
-    }
-  };
+  const uint64_t InstrAddr = SectionAddress + InstrOffset;
+  if (!BC->DisAsm->getInstruction(Instruction, InstrSize,
+                                  PLTData.slice(InstrOffset), InstrAddr,
+                                  nulls())) {
+    errs() << "BOLT-ERROR: unable to disassemble instruction in PLT section "
+           << Section.getName() << formatv(" at offset {0:x}\n", InstrOffset);
+    exit(1);
+  }
+}
+
+void RewriteInstance::disassemblePLTSectionAArch64(BinarySection &Section) {
+  const uint64_t SectionAddress = Section.getAddress();
+  const uint64_t SectionSize = Section.getSize();
 
   uint64_t InstrOffset = 0;
   // Locate new plt entry
@@ -1500,7 +1504,7 @@ void RewriteInstance::disassemblePLTSectionAArch64(BinarySection &Section) {
     uint64_t InstrSize;
     // Loop through entry instructions
     while (InstrOffset < SectionSize) {
-      disassembleInstruction(InstrOffset, Instruction, InstrSize);
+      disassemblePLTInstruction(Section, InstrOffset, Instruction, InstrSize);
       EntrySize += InstrSize;
       if (!BC->MIB->isIndirectBranch(Instruction)) {
         Instructions.emplace_back(Instruction);
@@ -1521,7 +1525,7 @@ void RewriteInstance::disassemblePLTSectionAArch64(BinarySection &Section) {
 
     // Skip nops if any
     while (InstrOffset < SectionSize) {
-      disassembleInstruction(InstrOffset, Instruction, InstrSize);
+      disassemblePLTInstruction(Section, InstrOffset, Instruction, InstrSize);
       if (!BC->MIB->isNoop(Instruction))
         break;
 
@@ -1578,29 +1582,13 @@ void RewriteInstance::disassemblePLTSectionX86(BinarySection &Section,
                                                uint64_t EntrySize) {
   const uint64_t SectionAddress = Section.getAddress();
   const uint64_t SectionSize = Section.getSize();
-  StringRef PLTContents = Section.getContents();
-  ArrayRef<uint8_t> PLTData(
-      reinterpret_cast<const uint8_t *>(PLTContents.data()), SectionSize);
-
-  auto disassembleInstruction = [&](uint64_t InstrOffset, MCInst &Instruction,
-                                    uint64_t &InstrSize) {
-    const uint64_t InstrAddr = SectionAddress + InstrOffset;
-    if (!BC->DisAsm->getInstruction(Instruction, InstrSize,
-                                    PLTData.slice(InstrOffset), InstrAddr,
-                                    nulls())) {
-      errs() << "BOLT-ERROR: unable to disassemble instruction in PLT section "
-             << Section.getName() << " at offset 0x"
-             << Twine::utohexstr(InstrOffset) << '\n';
-      exit(1);
-    }
-  };
 
   for (uint64_t EntryOffset = 0; EntryOffset + EntrySize <= SectionSize;
        EntryOffset += EntrySize) {
     MCInst Instruction;
     uint64_t InstrSize, InstrOffset = EntryOffset;
     while (InstrOffset < EntryOffset + EntrySize) {
-      disassembleInstruction(InstrOffset, Instruction, InstrSize);
+      disassemblePLTInstruction(Section, InstrOffset, Instruction, InstrSize);
       // Check if the entry size needs adjustment.
       if (EntryOffset == 0 && BC->MIB->isTerminateBranch(Instruction) &&
           EntrySize == 8)

>From 32c80a17446360e42f9c15118d96a81f17e5de26 Mon Sep 17 00:00:00 2001
From: Emma Pilkington <emma.pilkington95 at gmail.com>
Date: Thu, 1 Feb 2024 11:26:42 -0500
Subject: [PATCH 29/42] [llvm-objdump][AMDGPU] Pass ELF ABIVersion through
 disassembler (#78907)

Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart,
but I'm not sure what a better place for it would be.
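
For readers skimming the diff, the core mapping can be sketched standalone
as below. This is only an illustration, not the code in the patch: the
names `kHsaV4`, `kHsaV5` and `codeObjectVersionFromABI` are hypothetical,
the numeric constants are assumed to mirror ELFABIVERSION_AMDGPU_HSA_V4/V5
in llvm/BinaryFormat/ELF.h, and the fallback is taken as a parameter
instead of calling getDefaultAMDHSACodeObjectVersion().

```cpp
#include <cstdint>

// Assumed values of ELF::ELFABIVERSION_AMDGPU_HSA_V4 / _V5 (hypothetical
// local copies so the sketch compiles without LLVM headers).
constexpr std::uint8_t kHsaV4 = 2;
constexpr std::uint8_t kHsaV5 = 3;

// Map the e_ident[EI_ABIVERSION] byte to a code object version, falling
// back to a caller-supplied default for any value that is not recognized.
constexpr unsigned codeObjectVersionFromABI(std::uint8_t ABIVersion,
                                            unsigned Default) {
  return ABIVersion == kHsaV4   ? 4u
         : ABIVersion == kHsaV5 ? 5u
                                : Default;
}

// The disassembler prints .amdhsa_uses_dynamic_stack only for COV5 and up.
constexpr bool printsUsesDynamicStack(unsigned CodeObjectVersion) {
  return CodeObjectVersion >= 5;
}
```

With these assumptions, an object whose ABI version byte is 3 maps to code
object version 5 and so gets the directive, while a byte of 2 maps to
version 4 and does not.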
---
 .../llvm/MC/MCDisassembler/MCDisassembler.h   |  3 +++
 llvm/include/llvm/Object/ELFObjectFile.h      |  7 +++++++
 .../Disassembler/AMDGPUDisassembler.cpp       | 10 ++++++---
 .../AMDGPU/Disassembler/AMDGPUDisassembler.h  |  3 +++
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp    | 11 ++++++++++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  5 ++++-
 .../tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s   | 21 +++++++++++++++++++
 .../tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s  | 16 ++++++++++----
 .../tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s  | 16 ++++++++++----
 .../tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s  |  8 +++++--
 .../tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s | 12 ++++++++---
 .../tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s   | 12 ++++++++---
 .../tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s   | 12 ++++++++---
 .../llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s |  1 +
 .../llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s  |  6 ++++--
 llvm/tools/llvm-objdump/llvm-objdump.cpp      |  3 +++
 16 files changed, 121 insertions(+), 25 deletions(-)
 create mode 100644 llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s

diff --git a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
index 2553a086cd53b..7dd8b0b7d3778 100644
--- a/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
+++ b/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
@@ -217,6 +217,9 @@ class MCDisassembler {
 
   const MCSubtargetInfo& getSubtargetInfo() const { return STI; }
 
+  /// ELF-specific, set the ABI version from the object header.
+  virtual void setABIVersion(unsigned Version) {}
+
   // Marked mutable because we cache it inside the disassembler, rather than
   // having to pass it around as an argument through all the autogenerated code.
   mutable raw_ostream *CommentStream = nullptr;
diff --git a/llvm/include/llvm/Object/ELFObjectFile.h b/llvm/include/llvm/Object/ELFObjectFile.h
index 7124df50b561d..c9227da65708c 100644
--- a/llvm/include/llvm/Object/ELFObjectFile.h
+++ b/llvm/include/llvm/Object/ELFObjectFile.h
@@ -103,6 +103,8 @@ class ELFObjectFileBase : public ObjectFile {
 
   virtual uint16_t getEMachine() const = 0;
 
+  virtual uint8_t getEIdentABIVersion() const = 0;
+
   std::vector<ELFPltEntry> getPltEntries() const;
 
   /// Returns a vector containing a symbol version for each dynamic symbol.
@@ -251,6 +253,7 @@ ELFObjectFileBase::symbols() const {
 template <class ELFT> class ELFObjectFile : public ELFObjectFileBase {
   uint16_t getEMachine() const override;
   uint16_t getEType() const override;
+  uint8_t getEIdentABIVersion() const override;
   uint64_t getSymbolSize(DataRefImpl Sym) const override;
 
 public:
@@ -645,6 +648,10 @@ template <class ELFT> uint16_t ELFObjectFile<ELFT>::getEType() const {
   return EF.getHeader().e_type;
 }
 
+template <class ELFT> uint8_t ELFObjectFile<ELFT>::getEIdentABIVersion() const {
+  return EF.getHeader().e_ident[ELF::EI_ABIVERSION];
+}
+
 template <class ELFT>
 uint64_t ELFObjectFile<ELFT>::getSymbolSize(DataRefImpl Sym) const {
   Expected<const Elf_Sym *> SymOrErr = getSymbol(Sym);
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 9dedd39908989..fba9eb53c8a8b 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -47,12 +47,17 @@ using DecodeStatus = llvm::MCDisassembler::DecodeStatus;
 AMDGPUDisassembler::AMDGPUDisassembler(const MCSubtargetInfo &STI,
                                        MCContext &Ctx, MCInstrInfo const *MCII)
     : MCDisassembler(STI, Ctx), MCII(MCII), MRI(*Ctx.getRegisterInfo()),
-      MAI(*Ctx.getAsmInfo()), TargetMaxInstBytes(MAI.getMaxInstLength(&STI)) {
+      MAI(*Ctx.getAsmInfo()), TargetMaxInstBytes(MAI.getMaxInstLength(&STI)),
+      CodeObjectVersion(AMDGPU::getDefaultAMDHSACodeObjectVersion()) {
   // ToDo: AMDGPUDisassembler supports only VI ISA.
   if (!STI.hasFeature(AMDGPU::FeatureGCN3Encoding) && !isGFX10Plus())
     report_fatal_error("Disassembly not yet supported for subtarget");
 }
 
+void AMDGPUDisassembler::setABIVersion(unsigned Version) {
+  CodeObjectVersion = AMDGPU::getAMDHSACodeObjectVersion(Version);
+}
+
 inline static MCDisassembler::DecodeStatus
 addOperand(MCInst &Inst, const MCOperand& Opnd) {
   Inst.addOperand(Opnd);
@@ -2202,8 +2207,7 @@ AMDGPUDisassembler::decodeKernelDescriptorDirective(
                       KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32);
     }
 
-    // FIXME: We should be looking at the ELF header ABI version for this.
-    if (AMDGPU::getDefaultAMDHSACodeObjectVersion() >= AMDGPU::AMDHSA_COV5)
+    if (CodeObjectVersion >= AMDGPU::AMDHSA_COV5)
       PRINT_DIRECTIVE(".amdhsa_uses_dynamic_stack",
                       KERNEL_CODE_PROPERTY_USES_DYNAMIC_STACK);
 
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index 233581949d712..5a89b30f6fb36 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -100,12 +100,15 @@ class AMDGPUDisassembler : public MCDisassembler {
   mutable uint64_t Literal64;
   mutable bool HasLiteral;
   mutable std::optional<bool> EnableWavefrontSize32;
+  unsigned CodeObjectVersion;
 
 public:
   AMDGPUDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
                      MCInstrInfo const *MCII);
   ~AMDGPUDisassembler() override = default;
 
+  void setABIVersion(unsigned Version) override;
+
   DecodeStatus getInstruction(MCInst &MI, uint64_t &Size,
                               ArrayRef<uint8_t> Bytes, uint64_t Address,
                               raw_ostream &CS) const override;
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 106fdb19f2789..89c066613bd91 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -175,6 +175,17 @@ unsigned getDefaultAMDHSACodeObjectVersion() {
   return DefaultAMDHSACodeObjectVersion;
 }
 
+unsigned getAMDHSACodeObjectVersion(unsigned ABIVersion) {
+  switch (ABIVersion) {
+  case ELF::ELFABIVERSION_AMDGPU_HSA_V4:
+    return 4;
+  case ELF::ELFABIVERSION_AMDGPU_HSA_V5:
+    return 5;
+  default:
+    return getDefaultAMDHSACodeObjectVersion();
+  }
+}
+
 uint8_t getELFABIVersion(const Triple &T, unsigned CodeObjectVersion) {
   if (T.getOS() != Triple::AMDHSA)
     return 0;
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 11b0bc5c81711..c0be034ff0ebd 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -50,12 +50,15 @@ bool isHsaAbi(const MCSubtargetInfo &STI);
 /// \returns Code object version from the IR module flag.
 unsigned getAMDHSACodeObjectVersion(const Module &M);
 
+/// \returns Code object version from ELF's e_ident[EI_ABIVERSION].
+unsigned getAMDHSACodeObjectVersion(unsigned ABIVersion);
+
 /// \returns The default HSA code object version. This should only be used when
 /// we lack a more accurate CodeObjectVersion value (e.g. from the IR module
 /// flag or a .amdhsa_code_object_version directive)
 unsigned getDefaultAMDHSACodeObjectVersion();
 
-/// \returns ABIVersion suitable for use in ELF's e_ident[ABIVERSION]. \param
+/// \returns ABIVersion suitable for use in ELF's e_ident[EI_ABIVERSION]. \param
 /// CodeObjectVersion is a value returned by getAMDHSACodeObjectVersion().
 uint8_t getELFABIVersion(const Triple &OS, unsigned CodeObjectVersion);
 
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s
new file mode 100644
index 0000000000000..ece36c6ad2672
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s
@@ -0,0 +1,21 @@
+; RUN: sed 's/CODE_OBJECT_VERSION/5/g' %s \
+; RUN:   | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=-xnack,+wavefrontsize32,-wavefrontsize64 -filetype=obj > %t.o
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd %t.o | FileCheck %s --check-prefixes=COV5,CHECK
+
+; RUN: sed 's/CODE_OBJECT_VERSION/4/g' %s \
+; RUN:   | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=-xnack,+wavefrontsize32,-wavefrontsize64 -filetype=obj > %t.o
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd %t.o | FileCheck %s --check-prefixes=COV4,CHECK
+
+;; Verify that .amdhsa_uses_dynamic_stack is only printed on COV5+.
+
+; CHECK: .amdhsa_kernel kernel
+; COV5: .amdhsa_uses_dynamic_stack 0
+; COV4-NOT: .amdhsa_uses_dynamic_stack
+; CHECK: .end_amdhsa_kernel
+
+.amdhsa_code_object_version CODE_OBJECT_VERSION
+
+.amdhsa_kernel kernel
+  .amdhsa_next_free_vgpr 32
+  .amdhsa_next_free_sgpr 32
+.end_amdhsa_kernel
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s
index 781729d5c4cc1..81d0d868ab918 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s
@@ -4,7 +4,8 @@
 
 ;--- 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1010 < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1010 < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -50,6 +51,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 1
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -58,7 +60,8 @@
 
 ;--- 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -104,6 +107,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -112,7 +116,8 @@
 
 ;--- 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 3.s > 3.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee 3-disasm.s | FileCheck 3.s
+; RUN: echo '.amdhsa_code_object_version 5' > 3-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee -a 3-disasm.s | FileCheck 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 3-disasm.s > 3-disasm.o
 ; RUN: cmp 3.o 3-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -158,6 +163,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -166,7 +172,8 @@
 
 ;--- 4.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 4.s > 4.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 4.o | tail -n +7 | tee 4-disasm.s | FileCheck 4.s
+; RUN: echo '.amdhsa_code_object_version 5' > 4-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 4.o | tail -n +7 | tee -a 4-disasm.s | FileCheck 4.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack,+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1010 < 4-disasm.s > 4-disasm.o
 ; RUN: cmp 4.o 4-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -212,6 +219,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s
index 019c20754f389..750809128189f 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s
@@ -4,7 +4,8 @@
 
 ;--- 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1100 < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1100 < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -51,6 +52,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 1
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -59,7 +61,8 @@
 
 ;--- 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -106,6 +109,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -114,7 +118,8 @@
 
 ;--- 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 3.s > 3.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee 3-disasm.s | FileCheck 3.s
+; RUN: echo '.amdhsa_code_object_version 5' > 3-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee -a 3-disasm.s | FileCheck 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 3-disasm.s > 3-disasm.o
 ; RUN: cmp 3.o 3-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -161,6 +166,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -169,7 +175,8 @@
 
 ;--- 4.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 4.s > 4.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 4.o | tail -n +7 | tee 4-disasm.s | FileCheck 4.s
+; RUN: echo '.amdhsa_code_object_version 5' > 4-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 4.o | tail -n +7 | tee -a 4-disasm.s | FileCheck 4.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize64,-wavefrontsize32 -filetype=obj -mcpu=gfx1100 < 4-disasm.s > 4-disasm.o
 ; RUN: cmp 4.o 4-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -216,6 +223,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
index 86af4810059ec..c644e15efc8d7 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
@@ -4,7 +4,8 @@
 
 ;--- 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -48,6 +49,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 1
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
@@ -56,7 +58,8 @@
 
 ;--- 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-wavefrontsize32,+wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-wavefrontsize32,+wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -100,6 +103,7 @@
 ; CHECK-NEXT: .amdhsa_wavefront_size32 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 32
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s
index 4978f6974fd33..d1062c8946677 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s
@@ -4,7 +4,8 @@
 
 ;--- 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -47,6 +48,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 0
   .amdhsa_next_free_sgpr 0
@@ -55,7 +57,8 @@
 
 ;--- 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -98,6 +101,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 0
@@ -106,7 +110,8 @@
 
 ;--- 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 3.s > 3.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee 3-disasm.s | FileCheck 3.s
+; RUN: echo '.amdhsa_code_object_version 5' > 3-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee -a 3-disasm.s | FileCheck 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx90a < 3-disasm.s > 3-disasm.o
 ; RUN: cmp 3.o 3-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -151,6 +156,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_kernarg_preload_length  2
 ; CHECK-NEXT: .amdhsa_user_sgpr_kernarg_preload_offset  1
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 0
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
index a40cf1d377693..021d7b415e5ed 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
@@ -5,7 +5,8 @@
 ;--- 1.s
 ;; Only set next_free_sgpr.
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -46,6 +47,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 0
   .amdhsa_next_free_sgpr 42
@@ -57,7 +59,8 @@
 ;--- 2.s
 ;; Only set other directives.
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -98,6 +101,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 0
   .amdhsa_next_free_sgpr 0
@@ -109,7 +113,8 @@
 ;--- 3.s
 ;; Set all affecting directives.
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 3.s > 3.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee 3-disasm.s | FileCheck 3.s
+; RUN: echo '.amdhsa_code_object_version 5' > 3-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee -a 3-disasm.s | FileCheck 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 3-disasm.s > 3-disasm.o
 ; RUN: cmp 3.o 3-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -150,6 +155,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 0
   .amdhsa_next_free_sgpr 35
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
index b6b9c91b14246..3c0bfd7e372b9 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
@@ -4,7 +4,8 @@
 
 ;--- 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 1.s > 1.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: echo '.amdhsa_code_object_version 5' > 1-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee -a 1-disasm.s | FileCheck 1.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 1-disasm.s > 1-disasm.o
 ; RUN: cmp 1.o 1-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -45,6 +46,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 23
   .amdhsa_next_free_sgpr 0
@@ -52,7 +54,8 @@
 
 ;--- 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 2.s > 2.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: echo '.amdhsa_code_object_version 5' > 2-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee -a 2-disasm.s | FileCheck 2.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 2-disasm.s > 2-disasm.o
 ; RUN: cmp 2.o 2-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -93,6 +96,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 14
   .amdhsa_next_free_sgpr 0
@@ -100,7 +104,8 @@
 
 ;--- 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 3.s > 3.o
-; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee 3-disasm.s | FileCheck 3.s
+; RUN: echo '.amdhsa_code_object_version 5' > 3-disasm.s
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 3.o | tail -n +7 | tee -a 3-disasm.s | FileCheck 3.s
 ; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-xnack -filetype=obj -mcpu=gfx908 < 3-disasm.s > 3-disasm.o
 ; RUN: cmp 3.o 3-disasm.o
 ; CHECK: .amdhsa_kernel kernel
@@ -141,6 +146,7 @@
 ; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
 ; CHECK-NEXT: .amdhsa_uses_dynamic_stack 0
 ; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_code_object_version 5
 .amdhsa_kernel kernel
   .amdhsa_next_free_vgpr 32
   .amdhsa_next_free_sgpr 0
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s
index 39739c957350c..f9d68e81e940b 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s
@@ -65,6 +65,7 @@
 ; OBJDUMP-NEXT:         .amdhsa_uses_dynamic_stack 0
 ; OBJDUMP-NEXT: .end_amdhsa_kernel
 
+.amdhsa_code_object_version 5
 .amdhsa_kernel my_kernel
   .amdhsa_group_segment_fixed_size 0
   .amdhsa_private_segment_fixed_size 0
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s
index 78b405097cf60..5f8e9ad5c1929 100644
--- a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s
@@ -1,8 +1,9 @@
 ;; Entirely zeroed kernel descriptor (for GFX9).
 
 ; RUN: llvm-mc %s --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=-xnack -filetype=obj -o %t1
-; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 \
-; RUN: | tail -n +7 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=-xnack -filetype=obj -o %t2
+; RUN: echo '.amdhsa_code_object_version 5' > %t2.s
+; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 | tail -n +7 >> %t2.s
+; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -mattr=-xnack -filetype=obj -o %t2 %t2.s
 ; RUN: diff %t1 %t2
 
 ; RUN: llvm-objdump -s -j .text %t1 | FileCheck --check-prefix=OBJDUMP %s
@@ -15,6 +16,7 @@
 ;; This file and kd-zeroed-raw.s produce the same output for the kernel
 ;; descriptor - a block of 64 zeroed bytes.
 
+.amdhsa_code_object_version 5
 .amdhsa_kernel my_kernel
   .amdhsa_group_segment_fixed_size 0
   .amdhsa_private_segment_fixed_size 0
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index a80f4c2d90865..7f57713e6f946 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -913,6 +913,9 @@ DisassemblerTarget::DisassemblerTarget(const Target *TheTarget, ObjectFile &Obj,
   if (!DisAsm)
     reportError(Obj.getFileName(), "no disassembler for target " + TripleName);
 
+  if (auto *ELFObj = dyn_cast<ELFObjectFileBase>(&Obj))
+    DisAsm->setABIVersion(ELFObj->getEIdentABIVersion());
+
   InstrAnalysis.reset(TheTarget->createMCInstrAnalysis(InstrInfo.get()));
 
   int AsmPrinterVariant = AsmInfo->getAssemblerDialect();

>From 1dec0bdd350a306ceac7f436fd4c7242975a7062 Mon Sep 17 00:00:00 2001
From: jeanPerier <jperier at nvidia.com>
Date: Thu, 1 Feb 2024 17:43:43 +0100
Subject: [PATCH 30/42] [flang] Fix passing NULL to OPTIONAL procedure pointers
 (#80267)

Procedure pointer lowering used `prepareUserCallActualArgument` because
it was convenient, but this helper was not meant for POINTERs when
originally written, and it did not handle passing NULL to an OPTIONAL
procedure pointer correctly.

The resulting argument should be a disassociated pointer, not an absent
pointer (Fortran 15.5.2.12 point 1.).

Move the logic for procedure pointer argument "cooking" into its own
helper to avoid triggering the logic that created an absent argument in
this case.
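
The distinction between an absent and a disassociated pointer can be
illustrated with a rough C++ analogy. This is only a sketch and not how
flang represents arguments internally: the `Proc` alias and the
`describe` and `hello` functions are hypothetical names, and a null
`Proc *` stands in for an absent OPTIONAL argument while a non-null
`Proc *` pointing at a null function pointer stands in for a
disassociated procedure pointer.

```cpp
#include <string>

// Hypothetical stand-in for a Fortran procedure pointer target.
using Proc = void (*)();

// Classify the three states an OPTIONAL procedure pointer dummy can be in.
// p == nullptr  : no actual argument was passed (absent).
// *p == nullptr : an argument was passed, but it points at nothing
//                 (disassociated), which is what NULL() should produce.
std::string describe(Proc *p) {
  if (p == nullptr)
    return "absent";
  if (*p == nullptr)
    return "disassociated";
  return "associated";
}

// Hypothetical procedure used as a pointer target.
void hello() {}
```

In this analogy, `call takes_opt_proc_ptr(NULL())` corresponds to
passing the address of a null `Proc`, while `call takes_opt_proc_ptr()`
corresponds to passing a null `Proc *`.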
---
 flang/lib/Lower/ConvertCall.cpp              | 68 ++++++++++++--------
 flang/test/Lower/HLFIR/procedure-pointer.f90 | 20 ++++++
 2 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index 1d5ebeb1b3620..bb8fd2e945f43 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -912,37 +912,16 @@ static PreparedDummyArgument preparePresentUserCallActualArgument(
   // element if this is an array in an elemental call.
   hlfir::Entity actual = preparedActual.getActual(loc, builder);
 
-  // Handle the procedure pointer actual arguments.
-  if (actual.isProcedurePointer()) {
-    // Procedure pointer actual to procedure pointer dummy.
-    if (fir::isBoxProcAddressType(dummyType))
-      return PreparedDummyArgument{actual, /*cleanups=*/{}};
+  // Handle procedure arguments (procedure pointers should go through
+  // prepareProcedurePointerActualArgument).
+  if (hlfir::isFortranProcedureValue(dummyType)) {
     // Procedure pointer actual to procedure dummy.
-    if (hlfir::isFortranProcedureValue(dummyType)) {
+    if (actual.isProcedurePointer()) {
       actual = hlfir::derefPointersAndAllocatables(loc, builder, actual);
       return PreparedDummyArgument{actual, /*cleanups=*/{}};
     }
-  }
-
-  // NULL() actual to procedure pointer dummy
-  if (Fortran::evaluate::IsNullProcedurePointer(expr) &&
-      fir::isBoxProcAddressType(dummyType)) {
-    auto boxTy{Fortran::lower::getUntypedBoxProcType(builder.getContext())};
-    auto tempBoxProc{builder.createTemporary(loc, boxTy)};
-    hlfir::Entity nullBoxProc(
-        fir::factory::createNullBoxProc(builder, loc, boxTy));
-    builder.create<fir::StoreOp>(loc, nullBoxProc, tempBoxProc);
-    return PreparedDummyArgument{tempBoxProc, /*cleanups=*/{}};
-  }
-
-  if (actual.isProcedure()) {
-    // Procedure actual to procedure pointer dummy.
-    if (fir::isBoxProcAddressType(dummyType)) {
-      auto tempBoxProc{builder.createTemporary(loc, actual.getType())};
-      builder.create<fir::StoreOp>(loc, actual, tempBoxProc);
-      return PreparedDummyArgument{tempBoxProc, /*cleanups=*/{}};
-    }
     // Procedure actual to procedure dummy.
+    assert(actual.isProcedure());
     // Do nothing if this is a procedure argument. It is already a
     // fir.boxproc/fir.tuple<fir.boxproc, len> as it should.
     if (actual.getType() != dummyType)
@@ -1219,6 +1198,34 @@ static PreparedDummyArgument prepareUserCallActualArgument(
   return result;
 }
 
+/// Prepare actual argument for a procedure pointer dummy.
+static PreparedDummyArgument prepareProcedurePointerActualArgument(
+    mlir::Location loc, fir::FirOpBuilder &builder,
+    const Fortran::lower::PreparedActualArgument &preparedActual,
+    mlir::Type dummyType,
+    const Fortran::lower::CallerInterface::PassedEntity &arg,
+    const Fortran::lower::SomeExpr &expr, CallContext &callContext) {
+
+  // NULL() actual to procedure pointer dummy
+  if (Fortran::evaluate::UnwrapExpr<Fortran::evaluate::NullPointer>(expr) &&
+      fir::isBoxProcAddressType(dummyType)) {
+    auto boxTy{Fortran::lower::getUntypedBoxProcType(builder.getContext())};
+    auto tempBoxProc{builder.createTemporary(loc, boxTy)};
+    hlfir::Entity nullBoxProc(
+        fir::factory::createNullBoxProc(builder, loc, boxTy));
+    builder.create<fir::StoreOp>(loc, nullBoxProc, tempBoxProc);
+    return PreparedDummyArgument{tempBoxProc, /*cleanups=*/{}};
+  }
+  hlfir::Entity actual = preparedActual.getActual(loc, builder);
+  if (actual.isProcedurePointer())
+    return PreparedDummyArgument{actual, /*cleanups=*/{}};
+  assert(actual.isProcedure());
+  // Procedure actual to procedure pointer dummy.
+  auto tempBoxProc{builder.createTemporary(loc, actual.getType())};
+  builder.create<fir::StoreOp>(loc, actual, tempBoxProc);
+  return PreparedDummyArgument{tempBoxProc, /*cleanups=*/{}};
+}
+
 /// Lower calls to user procedures with actual arguments that have been
 /// pre-lowered but not yet prepared according to the interface.
 /// This can be called for elemental procedures, but only with scalar
@@ -1284,7 +1291,6 @@ genUserCall(Fortran::lower::PreparedActualArguments &loweredActuals,
     case PassBy::CharBoxValueAttribute:
     case PassBy::Box:
     case PassBy::BaseAddress:
-    case PassBy::BoxProcRef:
     case PassBy::BoxChar: {
       PreparedDummyArgument preparedDummy = prepareUserCallActualArgument(
           loc, builder, *preparedActual, argTy, arg, *expr, callContext);
@@ -1292,6 +1298,14 @@ genUserCall(Fortran::lower::PreparedActualArguments &loweredActuals,
                           preparedDummy.cleanups.rend());
       caller.placeInput(arg, preparedDummy.dummy);
     } break;
+    case PassBy::BoxProcRef: {
+      PreparedDummyArgument preparedDummy =
+          prepareProcedurePointerActualArgument(loc, builder, *preparedActual,
+                                                argTy, arg, *expr, callContext);
+      callCleanUps.append(preparedDummy.cleanups.rbegin(),
+                          preparedDummy.cleanups.rend());
+      caller.placeInput(arg, preparedDummy.dummy);
+    } break;
     case PassBy::AddressAndLength:
       // PassBy::AddressAndLength is only used for character results. Results
       // are not handled here.
diff --git a/flang/test/Lower/HLFIR/procedure-pointer.f90 b/flang/test/Lower/HLFIR/procedure-pointer.f90
index ba423db150841..28965b22de971 100644
--- a/flang/test/Lower/HLFIR/procedure-pointer.f90
+++ b/flang/test/Lower/HLFIR/procedure-pointer.f90
@@ -340,6 +340,26 @@ subroutine sub12()
 ! CHECK: fir.call @_QPfoo2(%[[VAL_17]]) fastmath<contract> : (!fir.ref<!fir.boxproc<() -> ()>>) -> ()
 end 
 
+subroutine test_opt_pointer()
+  interface
+    subroutine takes_opt_proc_ptr(p)
+      procedure(), pointer, optional :: p
+    end subroutine
+  end interface
+  call takes_opt_proc_ptr(NULL())
+  call takes_opt_proc_ptr()
+end subroutine
+! CHECK-LABEL:   func.func @_QPtest_opt_pointer() {
+! CHECK:    %[[VAL_0:.*]] = fir.alloca !fir.boxproc<() -> ()>
+! CHECK:    %[[VAL_1:.*]] = fir.zero_bits () -> ()
+! CHECK:    %[[VAL_2:.*]] = fir.emboxproc %[[VAL_1]] : (() -> ()) -> !fir.boxproc<() -> ()>
+! CHECK:    fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.boxproc<() -> ()>>
+! CHECK:    fir.call @_QPtakes_opt_proc_ptr(%[[VAL_0]]) fastmath<contract> : (!fir.ref<!fir.boxproc<() -> ()>>) -> ()
+! CHECK:    %[[VAL_3:.*]] = fir.absent !fir.ref<!fir.boxproc<() -> ()>>
+! CHECK:    fir.call @_QPtakes_opt_proc_ptr(%[[VAL_3]]) fastmath<contract> : (!fir.ref<!fir.boxproc<() -> ()>>) -> ()
+
+
+
 ! CHECK-LABEL: fir.global internal @_QFsub1Ep2 : !fir.boxproc<(!fir.ref<f32>) -> f32> {
 ! CHECK: %[[VAL_0:.*]] = fir.zero_bits (!fir.ref<f32>) -> f32
 ! CHECK: %[[VAL_1:.*]] = fir.emboxproc %[[VAL_0]] : ((!fir.ref<f32>) -> f32) -> !fir.boxproc<(!fir.ref<f32>) -> f32>

>From 3c3afc8c372be7973d26816f3389dae9780ca3b5 Mon Sep 17 00:00:00 2001
From: Kevin Frei <kevinfrei at users.noreply.github.com>
Date: Thu, 1 Feb 2024 08:47:11 -0800
Subject: [PATCH 31/42] Aggregate errors from llvm-dwarfdump --verify (#79648)

The amount and format of output from `llvm-dwarfdump --verify` makes it
quite difficult to know if a change to a tool that produces or modifies
DWARF is causing new problems, or is fixing existing problems. This diff
adds a categorized summary of issues found by the DWARF verifier, on by
default, at the bottom of the error output.

The change includes a new `--error-display` option with 4 settings:

* `--error-display=quiet`: Only report whether errors occurred; no
details or summary are printed.
* `--error-display=summary`: Only display the aggregated summary of
errors with no error detail.
* `--error-display=details`: Only display the detailed error messages
with no summary (the previous behavior).
* `--error-display=full`: Display both the detailed error messages and
the aggregated summary of errors (the default).
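
As a rough illustration of the four settings, each one can be read as a
pair of switches: whether to print details and whether to print the
summary. The sketch below is hypothetical and is not the option parsing
in `llvm-dwarfdump.cpp`; the `ErrorDisplay` struct and
`parseErrorDisplay` function are names invented for this example.

```cpp
#include <optional>
#include <string_view>

// Hypothetical pair of switches derived from the setting's value.
struct ErrorDisplay {
  bool showDetails; // print each detailed error message
  bool showSummary; // print the aggregated per-category counts
};

// Map the value after "--error-display=" to the two switches.
// Returns std::nullopt for a value that is not one of the four settings.
std::optional<ErrorDisplay> parseErrorDisplay(std::string_view value) {
  if (value == "quiet")
    return ErrorDisplay{false, false};
  if (value == "summary")
    return ErrorDisplay{false, true};
  if (value == "details")
    return ErrorDisplay{true, false};
  if (value == "full")
    return ErrorDisplay{true, true};
  return std::nullopt;
}
```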

I changed a handful of tests that were failing due to new output, adding
the flag to use the old behavior for all but a couple. For those two I
added the new aggregated output to the expected output of the test.

The `OutputCategoryAggregator` is a simple class, suggested by
@clayborg, that runs the detail-dumping code only when detail is
enabled, while still collating a count for each category. Because the
lambda passed in is only conditionally executed, error handling has to
be done *outside* the lambda. I'm happy to move this somewhere else (and
change/improve it) to be more broadly useful if folks would like.
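
The pattern can be sketched in standalone C++ as follows. This is a
minimal approximation for illustration, not the code in this patch: it
assumes that reporting always counts the category and invokes the
callback only when detail is enabled, it uses `std::string_view` where
LLVM would use `StringRef`, and `CategoryAggregator` and
`countNegatives` are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Counts occurrences per category; runs the detail callback only on request.
class CategoryAggregator {
  std::map<std::string, unsigned> counts;
  bool includeDetail;

public:
  explicit CategoryAggregator(bool detail = false) : includeDetail(detail) {}

  void report(std::string_view category, const std::function<void()> &detail) {
    ++counts[std::string(category)]; // always counted
    if (includeDetail)
      detail(); // conditionally executed
  }

  void enumerate(
      const std::function<void(std::string_view, unsigned)> &fn) const {
    for (const auto &[name, n] : counts)
      fn(name, n);
  }
};

// Hypothetical check: counts negative values and logs each one as detail.
unsigned countNegatives(CategoryAggregator &agg, const std::vector<int> &values,
                        std::string &log) {
  unsigned numErrors = 0;
  for (int v : values) {
    if (v >= 0)
      continue;
    ++numErrors; // bookkeeping stays outside the lambda
    agg.report("Negative value", [&]() {
      log += "negative: " + std::to_string(v) + "\n"; // output only
    });
  }
  return numErrors;
}
```

Had `++numErrors` been placed inside the lambda, the function would
return 0 whenever detail is disabled, which is why anything that affects
the result of verification has to stay outside the callback.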

---------

Co-authored-by: Kevin Frei <freik at meta.com>
---
 llvm/include/llvm/DebugInfo/DIContext.h       |   1 +
 .../llvm/DebugInfo/DWARF/DWARFVerifier.h      |  18 +
 llvm/lib/DebugInfo/DWARF/DWARFContext.cpp     |   1 +
 llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp    | 770 +++++++++++-------
 .../test/DebugInfo/X86/skeleton-unit-verify.s |   4 +-
 llvm/test/DebugInfo/dwarfdump-accel.test      |   2 +-
 .../X86/verify_attr_file_indexes.yaml         |   2 +-
 .../verify_attr_file_indexes_no_files.yaml    |   2 +-
 .../X86/verify_file_encoding.yaml             |   6 +-
 .../X86/verify_overlapping_cu_ranges.yaml     |   2 +-
 .../X86/verify_parent_zero_length.yaml        |   2 +-
 .../llvm-dwarfdump/X86/verify_split_cu.s      |   7 +-
 llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp  |  26 +-
 13 files changed, 560 insertions(+), 283 deletions(-)

diff --git a/llvm/include/llvm/DebugInfo/DIContext.h b/llvm/include/llvm/DebugInfo/DIContext.h
index 78ac34e5f0d26..288ddf77bdfda 100644
--- a/llvm/include/llvm/DebugInfo/DIContext.h
+++ b/llvm/include/llvm/DebugInfo/DIContext.h
@@ -205,6 +205,7 @@ struct DIDumpOptions {
   bool DisplayRawContents = false;
   bool IsEH = false;
   bool DumpNonSkeleton = false;
+  bool ShowAggregateErrors = false;
   std::function<llvm::StringRef(uint64_t DwarfRegNum, bool IsEH)>
       GetNameForDWARFReg;
 
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index e56d3781e824f..ea73664b1e46c 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -30,6 +30,20 @@ class DWARFDebugAbbrev;
 class DataExtractor;
 struct DWARFSection;
 
+class OutputCategoryAggregator {
+private:
+  std::map<std::string, unsigned> Aggregation;
+  bool IncludeDetail;
+
+public:
+  OutputCategoryAggregator(bool includeDetail = false)
+      : IncludeDetail(includeDetail) {}
+  void ShowDetail(bool showDetail) { IncludeDetail = showDetail; }
+  size_t GetNumCategories() const { return Aggregation.size(); }
+  void Report(StringRef s, std::function<void()> detailCallback);
+  void EnumerateResults(std::function<void(StringRef, unsigned)> handleCounts);
+};
+
 /// A class that verifies DWARF debug information given a DWARF Context.
 class DWARFVerifier {
 public:
@@ -81,6 +95,7 @@ class DWARFVerifier {
   DWARFContext &DCtx;
   DIDumpOptions DumpOpts;
   uint32_t NumDebugLineErrors = 0;
+  OutputCategoryAggregator ErrorCategory;
   // Used to relax some checks that do not currently work portably
   bool IsObjectFile;
   bool IsMachOObject;
@@ -348,6 +363,9 @@ class DWARFVerifier {
   bool verifyDebugStrOffsets(
       StringRef SectionName, const DWARFSection &Section, StringRef StrData,
       void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
+
+  /// Emits any aggregate information collected, depending on the dump options
+  void summarize();
 };
 
 static inline bool operator<(const DWARFVerifier::DieRangeInfo &LHS,
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
index 792df53d304aa..b7297c18da7c9 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
@@ -1408,6 +1408,7 @@ bool DWARFContext::verify(raw_ostream &OS, DIDumpOptions DumpOpts) {
   if (DumpOpts.DumpType & DIDT_DebugStrOffsets)
     Success &= verifier.handleDebugStrOffsets();
   Success &= verifier.handleAccelTables();
+  verifier.summarize();
   return Success;
 }
 
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index c4c14f5e2c9d3..2124ff835c572 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -167,20 +167,48 @@ bool DWARFVerifier::verifyUnitHeader(const DWARFDataExtractor DebugInfoData,
   if (!ValidLength || !ValidVersion || !ValidAddrSize || !ValidAbbrevOffset ||
       !ValidType) {
     Success = false;
-    error() << format("Units[%d] - start offset: 0x%08" PRIx64 " \n", UnitIndex,
-                      OffsetStart);
+    bool HeaderShown = false;
+    auto ShowHeaderOnce = [&]() {
+      if (!HeaderShown) {
+        error() << format("Units[%d] - start offset: 0x%08" PRIx64 " \n",
+                          UnitIndex, OffsetStart);
+        HeaderShown = true;
+      }
+    };
     if (!ValidLength)
-      note() << "The length for this unit is too "
-                "large for the .debug_info provided.\n";
+      ErrorCategory.Report(
+          "Unit Header Length: Unit too large for .debug_info provided", [&]() {
+            ShowHeaderOnce();
+            note() << "The length for this unit is too "
+                      "large for the .debug_info provided.\n";
+          });
     if (!ValidVersion)
-      note() << "The 16 bit unit header version is not valid.\n";
+      ErrorCategory.Report(
+          "Unit Header Length: 16 bit unit header version is not valid", [&]() {
+            ShowHeaderOnce();
+            note() << "The 16 bit unit header version is not valid.\n";
+          });
     if (!ValidType)
-      note() << "The unit type encoding is not valid.\n";
+      ErrorCategory.Report(
+          "Unit Header Length: Unit type encoding is not valid", [&]() {
+            ShowHeaderOnce();
+            note() << "The unit type encoding is not valid.\n";
+          });
     if (!ValidAbbrevOffset)
-      note() << "The offset into the .debug_abbrev section is "
-                "not valid.\n";
+      ErrorCategory.Report(
+          "Unit Header Length: Offset into the .debug_abbrev section is not "
+          "valid",
+          [&]() {
+            ShowHeaderOnce();
+            note() << "The offset into the .debug_abbrev section is "
+                      "not valid.\n";
+          });
     if (!ValidAddrSize)
-      note() << "The address size is unsupported.\n";
+      ErrorCategory.Report("Unit Header Length: Address size is unsupported",
+                           [&]() {
+                             ShowHeaderOnce();
+                             note() << "The address size is unsupported.\n";
+                           });
   }
   *Offset = OffsetStart + Length + (isUnitDWARF64 ? 12 : 4);
   return Success;
@@ -198,12 +226,16 @@ bool DWARFVerifier::verifyName(const DWARFDie &Die) {
   if (OriginalFullName.empty() || OriginalFullName == ReconstructedName)
     return false;
 
-  error() << "Simplified template DW_AT_name could not be reconstituted:\n"
-          << formatv("         original: {0}\n"
-                     "    reconstituted: {1}\n",
-                     OriginalFullName, ReconstructedName);
-  dump(Die) << '\n';
-  dump(Die.getDwarfUnit()->getUnitDIE()) << '\n';
+  ErrorCategory.Report(
+      "Simplified template DW_AT_name could not be reconstituted", [&]() {
+        error()
+            << "Simplified template DW_AT_name could not be reconstituted:\n"
+            << formatv("         original: {0}\n"
+                       "    reconstituted: {1}\n",
+                       OriginalFullName, ReconstructedName);
+        dump(Die) << '\n';
+        dump(Die.getDwarfUnit()->getUnitDIE()) << '\n';
+      });
   return true;
 }
 
@@ -240,22 +272,28 @@ unsigned DWARFVerifier::verifyUnitContents(DWARFUnit &Unit,
 
   DWARFDie Die = Unit.getUnitDIE(/* ExtractUnitDIEOnly = */ false);
   if (!Die) {
-    error() << "Compilation unit without DIE.\n";
+    ErrorCategory.Report("Compilation unit missing DIE", [&]() {
+      error() << "Compilation unit without DIE.\n";
+    });
     NumUnitErrors++;
     return NumUnitErrors;
   }
 
   if (!dwarf::isUnitType(Die.getTag())) {
-    error() << "Compilation unit root DIE is not a unit DIE: "
-            << dwarf::TagString(Die.getTag()) << ".\n";
+    ErrorCategory.Report("Compilation unit root DIE is not a unit DIE", [&]() {
+      error() << "Compilation unit root DIE is not a unit DIE: "
+              << dwarf::TagString(Die.getTag()) << ".\n";
+    });
     NumUnitErrors++;
   }
 
   uint8_t UnitType = Unit.getUnitType();
   if (!DWARFUnit::isMatchingUnitTypeAndTag(UnitType, Die.getTag())) {
-    error() << "Compilation unit type (" << dwarf::UnitTypeString(UnitType)
-            << ") and root DIE (" << dwarf::TagString(Die.getTag())
-            << ") do not match.\n";
+    ErrorCategory.Report("Mismatched unit type", [&]() {
+      error() << "Compilation unit type (" << dwarf::UnitTypeString(UnitType)
+              << ") and root DIE (" << dwarf::TagString(Die.getTag())
+              << ") do not match.\n";
+    });
     NumUnitErrors++;
   }
 
@@ -263,7 +301,9 @@ unsigned DWARFVerifier::verifyUnitContents(DWARFUnit &Unit,
   //  3.1.2 Skeleton Compilation Unit Entries:
   //  "A skeleton compilation unit has no children."
   if (Die.getTag() == dwarf::DW_TAG_skeleton_unit && Die.hasChildren()) {
-    error() << "Skeleton compilation unit has children.\n";
+    ErrorCategory.Report("Skeleton CU has children", [&]() {
+      error() << "Skeleton compilation unit has children.\n";
+    });
     NumUnitErrors++;
   }
 
@@ -280,15 +320,21 @@ unsigned DWARFVerifier::verifyDebugInfoCallSite(const DWARFDie &Die) {
   DWARFDie Curr = Die.getParent();
   for (; Curr.isValid() && !Curr.isSubprogramDIE(); Curr = Die.getParent()) {
     if (Curr.getTag() == DW_TAG_inlined_subroutine) {
-      error() << "Call site entry nested within inlined subroutine:";
-      Curr.dump(OS);
+      ErrorCategory.Report(
+          "Call site nested entry within inlined subroutine", [&]() {
+            error() << "Call site entry nested within inlined subroutine:";
+            Curr.dump(OS);
+          });
       return 1;
     }
   }
 
   if (!Curr.isValid()) {
-    error() << "Call site entry not nested within a valid subprogram:";
-    Die.dump(OS);
+    ErrorCategory.Report(
+        "Call site entry not nested within valid subprogram", [&]() {
+          error() << "Call site entry not nested within a valid subprogram:";
+          Die.dump(OS);
+        });
     return 1;
   }
 
@@ -297,9 +343,13 @@ unsigned DWARFVerifier::verifyDebugInfoCallSite(const DWARFDie &Die) {
        DW_AT_call_all_tail_calls, DW_AT_GNU_all_call_sites,
        DW_AT_GNU_all_source_call_sites, DW_AT_GNU_all_tail_call_sites});
   if (!CallAttr) {
-    error() << "Subprogram with call site entry has no DW_AT_call attribute:";
-    Curr.dump(OS);
-    Die.dump(OS, /*indent*/ 1);
+    ErrorCategory.Report(
+        "Subprogram with call site entry has no DW_AT_call attribute", [&]() {
+          error()
+              << "Subprogram with call site entry has no DW_AT_call attribute:";
+          Curr.dump(OS);
+          Die.dump(OS, /*indent*/ 1);
+        });
     return 1;
   }
 
@@ -313,7 +363,9 @@ unsigned DWARFVerifier::verifyAbbrevSection(const DWARFDebugAbbrev *Abbrev) {
   Expected<const DWARFAbbreviationDeclarationSet *> AbbrDeclsOrErr =
       Abbrev->getAbbreviationDeclarationSet(0);
   if (!AbbrDeclsOrErr) {
-    error() << toString(AbbrDeclsOrErr.takeError()) << "\n";
+    std::string ErrMsg = toString(AbbrDeclsOrErr.takeError());
+    ErrorCategory.Report("Abbreviation Declaration error",
+                         [&]() { error() << ErrMsg << "\n"; });
     return 1;
   }
 
@@ -324,9 +376,12 @@ unsigned DWARFVerifier::verifyAbbrevSection(const DWARFDebugAbbrev *Abbrev) {
     for (auto Attribute : AbbrDecl.attributes()) {
       auto Result = AttributeSet.insert(Attribute.Attr);
       if (!Result.second) {
-        error() << "Abbreviation declaration contains multiple "
-                << AttributeString(Attribute.Attr) << " attributes.\n";
-        AbbrDecl.dump(OS);
+        ErrorCategory.Report(
+            "Abbreviation declartion contains multiple attributes", [&]() {
+              error() << "Abbreviation declaration contains multiple "
+                      << AttributeString(Attribute.Attr) << " attributes.\n";
+              AbbrDecl.dump(OS);
+            });
         ++NumErrors;
       }
     }
@@ -440,10 +495,15 @@ unsigned DWARFVerifier::verifyIndex(StringRef Name,
       auto &M = *Sections[Col];
       auto I = M.find(SC.getOffset());
       if (I != M.end() && I.start() < (SC.getOffset() + SC.getLength())) {
-        error() << llvm::formatv(
-            "overlapping index entries for entries {0:x16} "
-            "and {1:x16} for column {2}\n",
-            *I, Sig, toString(Index.getColumnKinds()[Col]));
+        StringRef Category = InfoColumnKind == DWARFSectionKind::DW_SECT_INFO
+                                 ? "Overlapping CU index entries"
+                                 : "Overlapping TU index entries";
+        ErrorCategory.Report(Category, [&]() {
+          error() << llvm::formatv(
+              "overlapping index entries for entries {0:x16} "
+              "and {1:x16} for column {2}\n",
+              *I, Sig, toString(Index.getColumnKinds()[Col]));
+        });
         return 1;
       }
       M.insert(SC.getOffset(), SC.getOffset() + SC.getLength() - 1, Sig);
@@ -532,8 +592,10 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
     for (const auto &Range : Ranges) {
       if (!Range.valid()) {
         ++NumErrors;
-        error() << "Invalid address range " << Range << "\n";
-        DumpDieAfterError = true;
+        ErrorCategory.Report("Invalid address range", [&]() {
+          error() << "Invalid address range " << Range << "\n";
+          DumpDieAfterError = true;
+        });
         continue;
       }
 
@@ -545,9 +607,11 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
       // address: 0 or -1.
       if (auto PrevRange = RI.insert(Range)) {
         ++NumErrors;
-        error() << "DIE has overlapping ranges in DW_AT_ranges attribute: "
-                << *PrevRange << " and " << Range << '\n';
-        DumpDieAfterError = true;
+        ErrorCategory.Report("DIE has overlapping DW_AT_ranges", [&]() {
+          error() << "DIE has overlapping ranges in DW_AT_ranges attribute: "
+                  << *PrevRange << " and " << Range << '\n';
+          DumpDieAfterError = true;
+        });
       }
     }
     if (DumpDieAfterError)
@@ -558,9 +622,11 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
   const auto IntersectingChild = ParentRI.insert(RI);
   if (IntersectingChild != ParentRI.Children.end()) {
     ++NumErrors;
-    error() << "DIEs have overlapping address ranges:";
-    dump(Die);
-    dump(IntersectingChild->Die) << '\n';
+    ErrorCategory.Report("DIEs have overlapping address ranges", [&]() {
+      error() << "DIEs have overlapping address ranges:";
+      dump(Die);
+      dump(IntersectingChild->Die) << '\n';
+    });
   }
 
   // Verify that ranges are contained within their parent.
@@ -569,9 +635,13 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
                              ParentRI.Die.getTag() == DW_TAG_subprogram);
   if (ShouldBeContained && !ParentRI.contains(RI)) {
     ++NumErrors;
-    error() << "DIE address ranges are not contained in its parent's ranges:";
-    dump(ParentRI.Die);
-    dump(Die, 2) << '\n';
+    ErrorCategory.Report(
+        "DIE address ranges are not contained by parent ranges", [&]() {
+          error()
+              << "DIE address ranges are not contained in its parent's ranges:";
+          dump(ParentRI.Die);
+          dump(Die, 2) << '\n';
+        });
   }
 
   // Recursively check children.
@@ -584,10 +654,12 @@ unsigned DWARFVerifier::verifyDieRanges(const DWARFDie &Die,
 unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
                                                  DWARFAttribute &AttrValue) {
   unsigned NumErrors = 0;
-  auto ReportError = [&](const Twine &TitleMsg) {
+  auto ReportError = [&](StringRef category, const Twine &TitleMsg) {
     ++NumErrors;
-    error() << TitleMsg << '\n';
-    dump(Die) << '\n';
+    ErrorCategory.Report(category, [&]() {
+      error() << TitleMsg << '\n';
+      dump(Die) << '\n';
+    });
   };
 
   const DWARFObject &DObj = DCtx.getDWARFObj();
@@ -604,23 +676,27 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
       if (U->isDWOUnit() && RangeSection.Data.empty())
         break;
       if (*SectionOffset >= RangeSection.Data.size())
-        ReportError(
-            "DW_AT_ranges offset is beyond " +
-            StringRef(DwarfVersion < 5 ? ".debug_ranges" : ".debug_rnglists") +
-            " bounds: " + llvm::formatv("{0:x8}", *SectionOffset));
+        ReportError("DW_AT_ranges offset out of bounds",
+                    "DW_AT_ranges offset is beyond " +
+                        StringRef(DwarfVersion < 5 ? ".debug_ranges"
+                                                   : ".debug_rnglists") +
+                        " bounds: " + llvm::formatv("{0:x8}", *SectionOffset));
       break;
     }
-    ReportError("DIE has invalid DW_AT_ranges encoding:");
+    ReportError("Invalid DW_AT_ranges encoding",
+                "DIE has invalid DW_AT_ranges encoding:");
     break;
   case DW_AT_stmt_list:
     // Make sure the offset in the DW_AT_stmt_list attribute is valid.
     if (auto SectionOffset = AttrValue.Value.getAsSectionOffset()) {
       if (*SectionOffset >= U->getLineSection().Data.size())
-        ReportError("DW_AT_stmt_list offset is beyond .debug_line bounds: " +
-                    llvm::formatv("{0:x8}", *SectionOffset));
+        ReportError("DW_AT_stmt_list offset out of bounds",
+                    "DW_AT_stmt_list offset is beyond .debug_line bounds: " +
+                        llvm::formatv("{0:x8}", *SectionOffset));
       break;
     }
-    ReportError("DIE has invalid DW_AT_stmt_list encoding:");
+    ReportError("Invalid DW_AT_stmt_list encoding",
+                "DIE has invalid DW_AT_stmt_list encoding:");
     break;
   case DW_AT_location: {
     // FIXME: It might be nice if there's a way to walk location expressions
@@ -644,14 +720,15 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
               return Op.isError();
             });
         if (Error || !Expression.verify(U))
-          ReportError("DIE contains invalid DWARF expression:");
+          ReportError("Invalid DWARF expressions",
+                      "DIE contains invalid DWARF expression:");
       }
     } else if (Error Err = handleErrors(
                    Loc.takeError(), [&](std::unique_ptr<ResolverError> E) {
                      return U->isDWOUnit() ? Error::success()
                                            : Error(std::move(E));
                    }))
-      ReportError(toString(std::move(Err)));
+      ReportError("Invalid DW_AT_location", toString(std::move(Err)));
     break;
   }
   case DW_AT_specification:
@@ -668,19 +745,21 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
       // This might be reference to a function declaration.
       if (DieTag == DW_TAG_GNU_call_site && RefTag == DW_TAG_subprogram)
         break;
-      ReportError("DIE with tag " + TagString(DieTag) + " has " +
-                  AttributeString(Attr) +
-                  " that points to DIE with "
-                  "incompatible tag " +
-                  TagString(RefTag));
+      ReportError("Incompatible DW_AT_abstract_origin tag reference",
+                  "DIE with tag " + TagString(DieTag) + " has " +
+                      AttributeString(Attr) +
+                      " that points to DIE with "
+                      "incompatible tag " +
+                      TagString(RefTag));
     }
     break;
   }
   case DW_AT_type: {
     DWARFDie TypeDie = Die.getAttributeValueAsReferencedDie(DW_AT_type);
     if (TypeDie && !isType(TypeDie.getTag())) {
-      ReportError("DIE has " + AttributeString(Attr) +
-                  " with incompatible tag " + TagString(TypeDie.getTag()));
+      ReportError("Incompatible DW_AT_type attribute tag",
+                  "DIE has " + AttributeString(Attr) +
+                      " with incompatible tag " + TagString(TypeDie.getTag()));
     }
     break;
   }
@@ -695,35 +774,43 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
           bool IsZeroIndexed = LT->Prologue.getVersion() >= 5;
           if (std::optional<uint64_t> LastFileIdx =
                   LT->getLastValidFileIndex()) {
-            ReportError("DIE has " + AttributeString(Attr) +
-                        " with an invalid file index " +
-                        llvm::formatv("{0}", *FileIdx) +
-                        " (valid values are [" + (IsZeroIndexed ? "0-" : "1-") +
-                        llvm::formatv("{0}", *LastFileIdx) + "])");
+            ReportError("Invalid file index in DW_AT_decl_file",
+                        "DIE has " + AttributeString(Attr) +
+                            " with an invalid file index " +
+                            llvm::formatv("{0}", *FileIdx) +
+                            " (valid values are [" +
+                            (IsZeroIndexed ? "0-" : "1-") +
+                            llvm::formatv("{0}", *LastFileIdx) + "])");
           } else {
-            ReportError("DIE has " + AttributeString(Attr) +
-                        " with an invalid file index " +
-                        llvm::formatv("{0}", *FileIdx) +
-                        " (the file table in the prologue is empty)");
+            ReportError("Invalid file index in DW_AT_decl_file",
+                        "DIE has " + AttributeString(Attr) +
+                            " with an invalid file index " +
+                            llvm::formatv("{0}", *FileIdx) +
+                            " (the file table in the prologue is empty)");
           }
         }
       } else {
-        ReportError("DIE has " + AttributeString(Attr) +
-                    " that references a file with index " +
-                    llvm::formatv("{0}", *FileIdx) +
-                    " and the compile unit has no line table");
+        ReportError(
+            "File index in DW_AT_decl_file reference CU with no line table",
+            "DIE has " + AttributeString(Attr) +
+                " that references a file with index " +
+                llvm::formatv("{0}", *FileIdx) +
+                " and the compile unit has no line table");
       }
     } else {
-      ReportError("DIE has " + AttributeString(Attr) +
-                  " with invalid encoding");
+      ReportError("Invalid encoding in DW_AT_decl_file",
+                  "DIE has " + AttributeString(Attr) +
+                      " with invalid encoding");
     }
     break;
   }
   case DW_AT_call_line:
   case DW_AT_decl_line: {
     if (!AttrValue.Value.getAsUnsignedConstant()) {
-      ReportError("DIE has " + AttributeString(Attr) +
-                  " with invalid encoding");
+      ReportError(
+          Attr == DW_AT_call_line ? "Invalid encoding in DW_AT_call_line"
+                                  : "Invalid encoding in DW_AT_decl_line",
+          "DIE has " + AttributeString(Attr) + " with invalid encoding");
     }
     break;
   }
@@ -754,12 +841,14 @@ unsigned DWARFVerifier::verifyDebugInfoForm(const DWARFDie &Die,
       auto CUOffset = AttrValue.Value.getRawUValue();
       if (CUOffset >= CUSize) {
         ++NumErrors;
-        error() << FormEncodingString(Form) << " CU offset "
-                << format("0x%08" PRIx64, CUOffset)
-                << " is invalid (must be less than CU size of "
-                << format("0x%08" PRIx64, CUSize) << "):\n";
-        Die.dump(OS, 0, DumpOpts);
-        dump(Die) << '\n';
+        ErrorCategory.Report("Invalid CU offset", [&]() {
+          error() << FormEncodingString(Form) << " CU offset "
+                  << format("0x%08" PRIx64, CUOffset)
+                  << " is invalid (must be less than CU size of "
+                  << format("0x%08" PRIx64, CUSize) << "):\n";
+          Die.dump(OS, 0, DumpOpts);
+          dump(Die) << '\n';
+        });
       } else {
         // Valid reference, but we will verify it points to an actual
         // DIE later.
@@ -776,9 +865,11 @@ unsigned DWARFVerifier::verifyDebugInfoForm(const DWARFDie &Die,
     if (RefVal) {
       if (*RefVal >= DieCU->getInfoSection().Data.size()) {
         ++NumErrors;
-        error() << "DW_FORM_ref_addr offset beyond .debug_info "
-                   "bounds:\n";
-        dump(Die) << '\n';
+        ErrorCategory.Report("DW_FORM_ref_addr offset out of bounds", [&]() {
+          error() << "DW_FORM_ref_addr offset beyond .debug_info "
+                     "bounds:\n";
+          dump(Die) << '\n';
+        });
       } else {
         // Valid reference, but we will verify it points to an actual
         // DIE later.
@@ -796,8 +887,11 @@ unsigned DWARFVerifier::verifyDebugInfoForm(const DWARFDie &Die,
   case DW_FORM_line_strp: {
     if (Error E = AttrValue.Value.getAsCString().takeError()) {
       ++NumErrors;
-      error() << toString(std::move(E)) << ":\n";
-      dump(Die) << '\n';
+      std::string ErrMsg = toString(std::move(E));
+      ErrorCategory.Report("Invalid DW_FORM attribute", [&]() {
+        error() << ErrMsg << ":\n";
+        dump(Die) << '\n';
+      });
     }
     break;
   }
@@ -821,11 +915,13 @@ unsigned DWARFVerifier::verifyDebugInfoReferences(
     if (GetDIEForOffset(Pair.first))
       continue;
     ++NumErrors;
-    error() << "invalid DIE reference " << format("0x%08" PRIx64, Pair.first)
-            << ". Offset is in between DIEs:\n";
-    for (auto Offset : Pair.second)
-      dump(GetDIEForOffset(Offset)) << '\n';
-    OS << "\n";
+    ErrorCategory.Report("Invalid DIE reference", [&]() {
+      error() << "invalid DIE reference " << format("0x%08" PRIx64, Pair.first)
+              << ". Offset is in between DIEs:\n";
+      for (auto Offset : Pair.second)
+        dump(GetDIEForOffset(Offset)) << '\n';
+      OS << "\n";
+    });
   }
   return NumErrors;
 }
@@ -845,9 +941,11 @@ void DWARFVerifier::verifyDebugLineStmtOffsets() {
     if (LineTableOffset < DCtx.getDWARFObj().getLineSection().Data.size()) {
       if (!LineTable) {
         ++NumDebugLineErrors;
-        error() << ".debug_line[" << format("0x%08" PRIx64, LineTableOffset)
-                << "] was not able to be parsed for CU:\n";
-        dump(Die) << '\n';
+        ErrorCategory.Report("Unparsable .debug_line entry", [&]() {
+          error() << ".debug_line[" << format("0x%08" PRIx64, LineTableOffset)
+                  << "] was not able to be parsed for CU:\n";
+          dump(Die) << '\n';
+        });
         continue;
       }
     } else {
@@ -860,12 +958,14 @@ void DWARFVerifier::verifyDebugLineStmtOffsets() {
     auto Iter = StmtListToDie.find(LineTableOffset);
     if (Iter != StmtListToDie.end()) {
       ++NumDebugLineErrors;
-      error() << "two compile unit DIEs, "
-              << format("0x%08" PRIx64, Iter->second.getOffset()) << " and "
-              << format("0x%08" PRIx64, Die.getOffset())
-              << ", have the same DW_AT_stmt_list section offset:\n";
-      dump(Iter->second);
-      dump(Die) << '\n';
+      ErrorCategory.Report("Identical DW_AT_stmt_list section offset", [&]() {
+        error() << "two compile unit DIEs, "
+                << format("0x%08" PRIx64, Iter->second.getOffset()) << " and "
+                << format("0x%08" PRIx64, Die.getOffset())
+                << ", have the same DW_AT_stmt_list section offset:\n";
+        dump(Iter->second);
+        dump(Die) << '\n';
+      });
       // Already verified this line table before, no need to do it again.
       continue;
     }
@@ -892,12 +992,16 @@ void DWARFVerifier::verifyDebugLineRows() {
       // Verify directory index.
       if (FileName.DirIdx > MaxDirIndex) {
         ++NumDebugLineErrors;
-        error() << ".debug_line["
-                << format("0x%08" PRIx64,
-                          *toSectionOffset(Die.find(DW_AT_stmt_list)))
-                << "].prologue.file_names[" << FileIndex
-                << "].dir_idx contains an invalid index: " << FileName.DirIdx
-                << "\n";
+        ErrorCategory.Report(
+            "Invalid index in .debug_line->prologue.file_names->dir_idx",
+            [&]() {
+              error() << ".debug_line["
+                      << format("0x%08" PRIx64,
+                                *toSectionOffset(Die.find(DW_AT_stmt_list)))
+                      << "].prologue.file_names[" << FileIndex
+                      << "].dir_idx contains an invalid index: "
+                      << FileName.DirIdx << "\n";
+            });
       }
 
       // Check file paths for duplicates.
@@ -910,7 +1014,7 @@ void DWARFVerifier::verifyDebugLineRows() {
       auto It = FullPathMap.find(FullPath);
       if (It == FullPathMap.end())
         FullPathMap[FullPath] = FileIndex;
-      else if (It->second != FileIndex) {
+      else if (It->second != FileIndex && DumpOpts.Verbose) {
         warn() << ".debug_line["
                << format("0x%08" PRIx64,
                          *toSectionOffset(Die.find(DW_AT_stmt_list)))
@@ -928,17 +1032,20 @@ void DWARFVerifier::verifyDebugLineRows() {
       // Verify row address.
       if (Row.Address.Address < PrevAddress) {
         ++NumDebugLineErrors;
-        error() << ".debug_line["
-                << format("0x%08" PRIx64,
-                          *toSectionOffset(Die.find(DW_AT_stmt_list)))
-                << "] row[" << RowIndex
-                << "] decreases in address from previous row:\n";
-
-        DWARFDebugLine::Row::dumpTableHeader(OS, 0);
-        if (RowIndex > 0)
-          LineTable->Rows[RowIndex - 1].dump(OS);
-        Row.dump(OS);
-        OS << '\n';
+        ErrorCategory.Report(
+            "Decreasing address between debug_line rows", [&]() {
+              error() << ".debug_line["
+                      << format("0x%08" PRIx64,
+                                *toSectionOffset(Die.find(DW_AT_stmt_list)))
+                      << "] row[" << RowIndex
+                      << "] decreases in address from previous row:\n";
+
+              DWARFDebugLine::Row::dumpTableHeader(OS, 0);
+              if (RowIndex > 0)
+                LineTable->Rows[RowIndex - 1].dump(OS);
+              Row.dump(OS);
+              OS << '\n';
+            });
       }
 
       // If the prologue contains no file names and the line table has only one
@@ -949,16 +1056,18 @@ void DWARFVerifier::verifyDebugLineRows() {
            LineTable->Rows.size() != 1) &&
           !LineTable->hasFileAtIndex(Row.File)) {
         ++NumDebugLineErrors;
-        error() << ".debug_line["
-                << format("0x%08" PRIx64,
-                          *toSectionOffset(Die.find(DW_AT_stmt_list)))
-                << "][" << RowIndex << "] has invalid file index " << Row.File
-                << " (valid values are [" << MinFileIndex << ','
-                << LineTable->Prologue.FileNames.size()
-                << (isDWARF5 ? ")" : "]") << "):\n";
-        DWARFDebugLine::Row::dumpTableHeader(OS, 0);
-        Row.dump(OS);
-        OS << '\n';
+        ErrorCategory.Report("Invalid file index in debug_line", [&]() {
+          error() << ".debug_line["
+                  << format("0x%08" PRIx64,
+                            *toSectionOffset(Die.find(DW_AT_stmt_list)))
+                  << "][" << RowIndex << "] has invalid file index " << Row.File
+                  << " (valid values are [" << MinFileIndex << ','
+                  << LineTable->Prologue.FileNames.size()
+                  << (isDWARF5 ? ")" : "]") << "):\n";
+          DWARFDebugLine::Row::dumpTableHeader(OS, 0);
+          Row.dump(OS);
+          OS << '\n';
+        });
       }
       if (Row.EndSequence)
         PrevAddress = 0;
@@ -973,6 +1082,7 @@ DWARFVerifier::DWARFVerifier(raw_ostream &S, DWARFContext &D,
                              DIDumpOptions DumpOpts)
     : OS(S), DCtx(D), DumpOpts(std::move(DumpOpts)), IsObjectFile(false),
       IsMachOObject(false) {
+  ErrorCategory.ShowDetail(DumpOpts.Verbose || !DumpOpts.ShowAggregateErrors);
   if (const auto *F = DCtx.getDWARFObj().getFile()) {
     IsObjectFile = F->isRelocatableObject();
     IsMachOObject = F->isMachO();
@@ -999,13 +1109,17 @@ unsigned DWARFVerifier::verifyAppleAccelTable(const DWARFSection *AccelSection,
 
   // Verify that the fixed part of the header is not too short.
   if (!AccelSectionData.isValidOffset(AccelTable.getSizeHdr())) {
-    error() << "Section is too small to fit a section header.\n";
+    ErrorCategory.Report("Section is too small to fit a section header", [&]() {
+      error() << "Section is too small to fit a section header.\n";
+    });
     return 1;
   }
 
   // Verify that the section is not too short.
   if (Error E = AccelTable.extract()) {
-    error() << toString(std::move(E)) << '\n';
+    std::string Msg = toString(std::move(E));
+    ErrorCategory.Report("Section is too small to fit a section header",
+                         [&]() { error() << Msg << '\n'; });
     return 1;
   }
 
@@ -1020,18 +1134,24 @@ unsigned DWARFVerifier::verifyAppleAccelTable(const DWARFSection *AccelSection,
   for (uint32_t BucketIdx = 0; BucketIdx < NumBuckets; ++BucketIdx) {
     uint32_t HashIdx = AccelSectionData.getU32(&BucketsOffset);
     if (HashIdx >= NumHashes && HashIdx != UINT32_MAX) {
-      error() << format("Bucket[%d] has invalid hash index: %u.\n", BucketIdx,
-                        HashIdx);
+      ErrorCategory.Report("Invalid hash index", [&]() {
+        error() << format("Bucket[%d] has invalid hash index: %u.\n", BucketIdx,
+                          HashIdx);
+      });
       ++NumErrors;
     }
   }
   uint32_t NumAtoms = AccelTable.getAtomsDesc().size();
   if (NumAtoms == 0) {
-    error() << "No atoms: failed to read HashData.\n";
+    ErrorCategory.Report("No atoms", [&]() {
+      error() << "No atoms: failed to read HashData.\n";
+    });
     return 1;
   }
   if (!AccelTable.validateForms()) {
-    error() << "Unsupported form: failed to read HashData.\n";
+    ErrorCategory.Report("Unsupported form", [&]() {
+      error() << "Unsupported form: failed to read HashData.\n";
+    });
     return 1;
   }
 
@@ -1042,9 +1162,11 @@ unsigned DWARFVerifier::verifyAppleAccelTable(const DWARFSection *AccelSection,
     uint64_t HashDataOffset = AccelSectionData.getU32(&DataOffset);
     if (!AccelSectionData.isValidOffsetForDataOfSize(HashDataOffset,
                                                      sizeof(uint64_t))) {
-      error() << format("Hash[%d] has invalid HashData offset: "
-                        "0x%08" PRIx64 ".\n",
-                        HashIdx, HashDataOffset);
+      ErrorCategory.Report("Invalid HashData offset", [&]() {
+        error() << format("Hash[%d] has invalid HashData offset: "
+                          "0x%08" PRIx64 ".\n",
+                          HashIdx, HashDataOffset);
+      });
       ++NumErrors;
     }
 
@@ -1068,21 +1190,25 @@ unsigned DWARFVerifier::verifyAppleAccelTable(const DWARFSection *AccelSection,
           if (!Name)
             Name = "<NULL>";
 
-          error() << format(
-              "%s Bucket[%d] Hash[%d] = 0x%08x "
-              "Str[%u] = 0x%08" PRIx64 " DIE[%d] = 0x%08" PRIx64 " "
-              "is not a valid DIE offset for \"%s\".\n",
-              SectionName, BucketIdx, HashIdx, Hash, StringCount, StrpOffset,
-              HashDataIdx, Offset, Name);
+          ErrorCategory.Report("Invalid DIE offset", [&]() {
+            error() << format(
+                "%s Bucket[%d] Hash[%d] = 0x%08x "
+                "Str[%u] = 0x%08" PRIx64 " DIE[%d] = 0x%08" PRIx64 " "
+                "is not a valid DIE offset for \"%s\".\n",
+                SectionName, BucketIdx, HashIdx, Hash, StringCount, StrpOffset,
+                HashDataIdx, Offset, Name);
+          });
 
           ++NumErrors;
           continue;
         }
         if ((Tag != dwarf::DW_TAG_null) && (Die.getTag() != Tag)) {
-          error() << "Tag " << dwarf::TagString(Tag)
-                  << " in accelerator table does not match Tag "
-                  << dwarf::TagString(Die.getTag()) << " of DIE[" << HashDataIdx
-                  << "].\n";
+          ErrorCategory.Report("Mismatched Tag in accelerator table", [&]() {
+            error() << "Tag " << dwarf::TagString(Tag)
+                    << " in accelerator table does not match Tag "
+                    << dwarf::TagString(Die.getTag()) << " of DIE["
+                    << HashDataIdx << "].\n";
+          });
           ++NumErrors;
         }
       }
@@ -1106,8 +1232,10 @@ DWARFVerifier::verifyDebugNamesCULists(const DWARFDebugNames &AccelTable) {
   unsigned NumErrors = 0;
   for (const DWARFDebugNames::NameIndex &NI : AccelTable) {
     if (NI.getCUCount() == 0) {
-      error() << formatv("Name Index @ {0:x} does not index any CU\n",
-                         NI.getUnitOffset());
+      ErrorCategory.Report("Name Index doesn't index any CU", [&]() {
+        error() << formatv("Name Index @ {0:x} does not index any CU\n",
+                           NI.getUnitOffset());
+      });
       ++NumErrors;
       continue;
     }
@@ -1116,17 +1244,22 @@ DWARFVerifier::verifyDebugNamesCULists(const DWARFDebugNames &AccelTable) {
       auto Iter = CUMap.find(Offset);
 
       if (Iter == CUMap.end()) {
-        error() << formatv(
-            "Name Index @ {0:x} references a non-existing CU @ {1:x}\n",
-            NI.getUnitOffset(), Offset);
+        ErrorCategory.Report("Name Index references non-existing CU", [&]() {
+          error() << formatv(
+              "Name Index @ {0:x} references a non-existing CU @ {1:x}\n",
+              NI.getUnitOffset(), Offset);
+        });
         ++NumErrors;
         continue;
       }
 
       if (Iter->second != NotIndexed) {
-        error() << formatv("Name Index @ {0:x} references a CU @ {1:x}, but "
-                           "this CU is already indexed by Name Index @ {2:x}\n",
-                           NI.getUnitOffset(), Offset, Iter->second);
+        ErrorCategory.Report("Duplicate Name Index", [&]() {
+          error() << formatv(
+              "Name Index @ {0:x} references a CU @ {1:x}, but "
+              "this CU is already indexed by Name Index @ {2:x}\n",
+              NI.getUnitOffset(), Offset, Iter->second);
+        });
         continue;
       }
       Iter->second = NI.getUnitOffset();
@@ -1167,9 +1300,12 @@ DWARFVerifier::verifyNameIndexBuckets(const DWARFDebugNames::NameIndex &NI,
   for (uint32_t Bucket = 0, End = NI.getBucketCount(); Bucket < End; ++Bucket) {
     uint32_t Index = NI.getBucketArrayEntry(Bucket);
     if (Index > NI.getNameCount()) {
-      error() << formatv("Bucket {0} of Name Index @ {1:x} contains invalid "
-                         "value {2}. Valid range is [0, {3}].\n",
-                         Bucket, NI.getUnitOffset(), Index, NI.getNameCount());
+      ErrorCategory.Report("Name Index Bucket contains invalid value", [&]() {
+        error() << formatv("Bucket {0} of Name Index @ {1:x} contains invalid "
+                           "value {2}. Valid range is [0, {3}].\n",
+                           Bucket, NI.getUnitOffset(), Index,
+                           NI.getNameCount());
+      });
       ++NumErrors;
       continue;
     }
@@ -1202,9 +1338,11 @@ DWARFVerifier::verifyNameIndexBuckets(const DWARFDebugNames::NameIndex &NI,
     // will not match because we have already verified that the name's hash
     // puts it into the previous bucket.)
     if (B.Index > NextUncovered) {
-      error() << formatv("Name Index @ {0:x}: Name table entries [{1}, {2}] "
-                         "are not covered by the hash table.\n",
-                         NI.getUnitOffset(), NextUncovered, B.Index - 1);
+      ErrorCategory.Report("Name table entries uncovered by hash table", [&]() {
+        error() << formatv("Name Index @ {0:x}: Name table entries [{1}, {2}] "
+                           "are not covered by the hash table.\n",
+                           NI.getUnitOffset(), NextUncovered, B.Index - 1);
+      });
       ++NumErrors;
     }
     uint32_t Idx = B.Index;
@@ -1220,11 +1358,13 @@ DWARFVerifier::verifyNameIndexBuckets(const DWARFDebugNames::NameIndex &NI,
     // bucket as empty.
     uint32_t FirstHash = NI.getHashArrayEntry(Idx);
     if (FirstHash % NI.getBucketCount() != B.Bucket) {
-      error() << formatv(
-          "Name Index @ {0:x}: Bucket {1} is not empty but points to a "
-          "mismatched hash value {2:x} (belonging to bucket {3}).\n",
-          NI.getUnitOffset(), B.Bucket, FirstHash,
-          FirstHash % NI.getBucketCount());
+      ErrorCategory.Report("Name Index points to mismatched hash value", [&]() {
+        error() << formatv(
+            "Name Index @ {0:x}: Bucket {1} is not empty but points to a "
+            "mismatched hash value {2:x} (belonging to bucket {3}).\n",
+            NI.getUnitOffset(), B.Bucket, FirstHash,
+            FirstHash % NI.getBucketCount());
+      });
       ++NumErrors;
     }
 
@@ -1238,11 +1378,14 @@ DWARFVerifier::verifyNameIndexBuckets(const DWARFDebugNames::NameIndex &NI,
 
       const char *Str = NI.getNameTableEntry(Idx).getString();
       if (caseFoldingDjbHash(Str) != Hash) {
-        error() << formatv("Name Index @ {0:x}: String ({1}) at index {2} "
-                           "hashes to {3:x}, but "
-                           "the Name Index hash is {4:x}\n",
-                           NI.getUnitOffset(), Str, Idx,
-                           caseFoldingDjbHash(Str), Hash);
+        ErrorCategory.Report(
+            "String hash doesn't match Name Index hash", [&]() {
+              error() << formatv(
+                  "Name Index @ {0:x}: String ({1}) at index {2} "
+                  "hashes to {3:x}, but "
+                  "the Name Index hash is {4:x}\n",
+                  NI.getUnitOffset(), Str, Idx, caseFoldingDjbHash(Str), Hash);
+            });
         ++NumErrors;
       }
 
@@ -1258,19 +1401,23 @@ unsigned DWARFVerifier::verifyNameIndexAttribute(
     DWARFDebugNames::AttributeEncoding AttrEnc) {
   StringRef FormName = dwarf::FormEncodingString(AttrEnc.Form);
   if (FormName.empty()) {
-    error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: {2} uses an "
-                       "unknown form: {3}.\n",
-                       NI.getUnitOffset(), Abbr.Code, AttrEnc.Index,
-                       AttrEnc.Form);
+    ErrorCategory.Report("Unknown NameIndex Abbreviation", [&]() {
+      error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: {2} uses an "
+                         "unknown form: {3}.\n",
+                         NI.getUnitOffset(), Abbr.Code, AttrEnc.Index,
+                         AttrEnc.Form);
+    });
     return 1;
   }
 
   if (AttrEnc.Index == DW_IDX_type_hash) {
     if (AttrEnc.Form != dwarf::DW_FORM_data8) {
-      error() << formatv(
-          "NameIndex @ {0:x}: Abbreviation {1:x}: DW_IDX_type_hash "
-          "uses an unexpected form {2} (should be {3}).\n",
-          NI.getUnitOffset(), Abbr.Code, AttrEnc.Form, dwarf::DW_FORM_data8);
+      ErrorCategory.Report("Unexpected NameIndex Abbreviation", [&]() {
+        error() << formatv(
+            "NameIndex @ {0:x}: Abbreviation {1:x}: DW_IDX_type_hash "
+            "uses an unexpected form {2} (should be {3}).\n",
+            NI.getUnitOffset(), Abbr.Code, AttrEnc.Form, dwarf::DW_FORM_data8);
+      });
       return 1;
     }
     return 0;
@@ -1280,10 +1427,13 @@ unsigned DWARFVerifier::verifyNameIndexAttribute(
     constexpr static auto AllowedForms = {dwarf::Form::DW_FORM_flag_present,
                                           dwarf::Form::DW_FORM_ref4};
     if (!is_contained(AllowedForms, AttrEnc.Form)) {
-      error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: DW_IDX_parent "
-                         "uses an unexpected form {2} (should be "
-                         "DW_FORM_ref4 or DW_FORM_flag_present).\n",
-                         NI.getUnitOffset(), Abbr.Code, AttrEnc.Form);
+      ErrorCategory.Report("Unexpected NameIndex Abbreviation", [&]() {
+        error() << formatv(
+            "NameIndex @ {0:x}: Abbreviation {1:x}: DW_IDX_parent "
+            "uses an unexpected form {2} (should be "
+            "DW_FORM_ref4 or DW_FORM_flag_present).\n",
+            NI.getUnitOffset(), Abbr.Code, AttrEnc.Form);
+      });
       return 1;
     }
     return 0;
@@ -1315,10 +1465,12 @@ unsigned DWARFVerifier::verifyNameIndexAttribute(
   }
 
   if (!DWARFFormValue(AttrEnc.Form).isFormClass(Iter->Class)) {
-    error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: {2} uses an "
-                       "unexpected form {3} (expected form class {4}).\n",
-                       NI.getUnitOffset(), Abbr.Code, AttrEnc.Index,
-                       AttrEnc.Form, Iter->ClassName);
+    ErrorCategory.Report("Unexpected NameIndex Abbreviation", [&]() {
+      error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: {2} uses an "
+                         "unexpected form {3} (expected form class {4}).\n",
+                         NI.getUnitOffset(), Abbr.Code, AttrEnc.Index,
+                         AttrEnc.Form, Iter->ClassName);
+    });
     return 1;
   }
   return 0;
@@ -1344,9 +1496,13 @@ DWARFVerifier::verifyNameIndexAbbrevs(const DWARFDebugNames::NameIndex &NI) {
     SmallSet<unsigned, 5> Attributes;
     for (const auto &AttrEnc : Abbrev.Attributes) {
       if (!Attributes.insert(AttrEnc.Index).second) {
-        error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x} contains "
-                           "multiple {2} attributes.\n",
-                           NI.getUnitOffset(), Abbrev.Code, AttrEnc.Index);
+        ErrorCategory.Report(
+            "NameIndex Abbreviation contains multiple attributes", [&]() {
+              error() << formatv(
+                  "NameIndex @ {0:x}: Abbreviation {1:x} contains "
+                  "multiple {2} attributes.\n",
+                  NI.getUnitOffset(), Abbrev.Code, AttrEnc.Index);
+            });
         ++NumErrors;
         continue;
       }
@@ -1354,16 +1510,20 @@ DWARFVerifier::verifyNameIndexAbbrevs(const DWARFDebugNames::NameIndex &NI) {
     }
 
     if (NI.getCUCount() > 1 && !Attributes.count(dwarf::DW_IDX_compile_unit)) {
-      error() << formatv("NameIndex @ {0:x}: Indexing multiple compile units "
-                         "and abbreviation {1:x} has no {2} attribute.\n",
-                         NI.getUnitOffset(), Abbrev.Code,
-                         dwarf::DW_IDX_compile_unit);
+      ErrorCategory.Report("Abbreviation contains no attribute", [&]() {
+        error() << formatv("NameIndex @ {0:x}: Indexing multiple compile units "
+                           "and abbreviation {1:x} has no {2} attribute.\n",
+                           NI.getUnitOffset(), Abbrev.Code,
+                           dwarf::DW_IDX_compile_unit);
+      });
       ++NumErrors;
     }
     if (!Attributes.count(dwarf::DW_IDX_die_offset)) {
-      error() << formatv(
-          "NameIndex @ {0:x}: Abbreviation {1:x} has no {2} attribute.\n",
-          NI.getUnitOffset(), Abbrev.Code, dwarf::DW_IDX_die_offset);
+      ErrorCategory.Report("NameIndex Abbreviation missing attribute", [&]() {
+        error() << formatv(
+            "NameIndex @ {0:x}: Abbreviation {1:x} has no {2} attribute.\n",
+            NI.getUnitOffset(), Abbrev.Code, dwarf::DW_IDX_die_offset);
+      });
       ++NumErrors;
     }
   }
@@ -1417,9 +1577,11 @@ unsigned DWARFVerifier::verifyNameIndexEntries(
 
   const char *CStr = NTE.getString();
   if (!CStr) {
-    error() << formatv(
-        "Name Index @ {0:x}: Unable to get string associated with name {1}.\n",
-        NI.getUnitOffset(), NTE.getIndex());
+    ErrorCategory.Report("Unable to get string associated with name", [&]() {
+      error() << formatv("Name Index @ {0:x}: Unable to get string associated "
+                         "with name {1}.\n",
+                         NI.getUnitOffset(), NTE.getIndex());
+    });
     return 1;
   }
   StringRef Str(CStr);
@@ -1433,9 +1595,11 @@ unsigned DWARFVerifier::verifyNameIndexEntries(
                                 EntryOr = NI.getEntry(&NextEntryID)) {
     uint32_t CUIndex = *EntryOr->getCUIndex();
     if (CUIndex > NI.getCUCount()) {
-      error() << formatv("Name Index @ {0:x}: Entry @ {1:x} contains an "
-                         "invalid CU index ({2}).\n",
-                         NI.getUnitOffset(), EntryID, CUIndex);
+      ErrorCategory.Report("Name Index entry contains invalid CU index", [&]() {
+        error() << formatv("Name Index @ {0:x}: Entry @ {1:x} contains an "
+                           "invalid CU index ({2}).\n",
+                           NI.getUnitOffset(), EntryID, CUIndex);
+      });
       ++NumErrors;
       continue;
     }
@@ -1443,24 +1607,32 @@ unsigned DWARFVerifier::verifyNameIndexEntries(
     uint64_t DIEOffset = CUOffset + *EntryOr->getDIEUnitOffset();
     DWARFDie DIE = DCtx.getDIEForOffset(DIEOffset);
     if (!DIE) {
-      error() << formatv("Name Index @ {0:x}: Entry @ {1:x} references a "
-                         "non-existing DIE @ {2:x}.\n",
-                         NI.getUnitOffset(), EntryID, DIEOffset);
+      ErrorCategory.Report("NameIndex references nonexistent DIE", [&]() {
+        error() << formatv("Name Index @ {0:x}: Entry @ {1:x} references a "
+                           "non-existing DIE @ {2:x}.\n",
+                           NI.getUnitOffset(), EntryID, DIEOffset);
+      });
       ++NumErrors;
       continue;
     }
     if (DIE.getDwarfUnit()->getOffset() != CUOffset) {
-      error() << formatv("Name Index @ {0:x}: Entry @ {1:x}: mismatched CU of "
-                         "DIE @ {2:x}: index - {3:x}; debug_info - {4:x}.\n",
-                         NI.getUnitOffset(), EntryID, DIEOffset, CUOffset,
-                         DIE.getDwarfUnit()->getOffset());
+      ErrorCategory.Report("Name Index contains mismatched CU of DIE", [&]() {
+        error() << formatv(
+            "Name Index @ {0:x}: Entry @ {1:x}: mismatched CU of "
+            "DIE @ {2:x}: index - {3:x}; debug_info - {4:x}.\n",
+            NI.getUnitOffset(), EntryID, DIEOffset, CUOffset,
+            DIE.getDwarfUnit()->getOffset());
+      });
       ++NumErrors;
     }
     if (DIE.getTag() != EntryOr->tag()) {
-      error() << formatv("Name Index @ {0:x}: Entry @ {1:x}: mismatched Tag of "
-                         "DIE @ {2:x}: index - {3}; debug_info - {4}.\n",
-                         NI.getUnitOffset(), EntryID, DIEOffset, EntryOr->tag(),
-                         DIE.getTag());
+      ErrorCategory.Report("Name Index contains mismatched Tag of DIE", [&]() {
+        error() << formatv(
+            "Name Index @ {0:x}: Entry @ {1:x}: mismatched Tag of "
+            "DIE @ {2:x}: index - {3}; debug_info - {4}.\n",
+            NI.getUnitOffset(), EntryID, DIEOffset, EntryOr->tag(),
+            DIE.getTag());
+      });
       ++NumErrors;
     }
 
@@ -1471,29 +1643,36 @@ unsigned DWARFVerifier::verifyNameIndexEntries(
         DIE.getTag() == DW_TAG_inlined_subroutine;
     auto EntryNames = getNames(DIE, IncludeStrippedTemplateNames);
     if (!is_contained(EntryNames, Str)) {
-      error() << formatv("Name Index @ {0:x}: Entry @ {1:x}: mismatched Name "
-                         "of DIE @ {2:x}: index - {3}; debug_info - {4}.\n",
-                         NI.getUnitOffset(), EntryID, DIEOffset, Str,
-                         make_range(EntryNames.begin(), EntryNames.end()));
+      ErrorCategory.Report("Name Index contains mismatched name of DIE", [&]() {
+        error() << formatv("Name Index @ {0:x}: Entry @ {1:x}: mismatched Name "
+                           "of DIE @ {2:x}: index - {3}; debug_info - {4}.\n",
+                           NI.getUnitOffset(), EntryID, DIEOffset, Str,
+                           make_range(EntryNames.begin(), EntryNames.end()));
+      });
       ++NumErrors;
     }
   }
-  handleAllErrors(EntryOr.takeError(),
-                  [&](const DWARFDebugNames::SentinelError &) {
-                    if (NumEntries > 0)
-                      return;
-                    error() << formatv("Name Index @ {0:x}: Name {1} ({2}) is "
-                                       "not associated with any entries.\n",
-                                       NI.getUnitOffset(), NTE.getIndex(), Str);
-                    ++NumErrors;
-                  },
-                  [&](const ErrorInfoBase &Info) {
-                    error()
-                        << formatv("Name Index @ {0:x}: Name {1} ({2}): {3}\n",
-                                   NI.getUnitOffset(), NTE.getIndex(), Str,
-                                   Info.message());
-                    ++NumErrors;
-                  });
+  handleAllErrors(
+      EntryOr.takeError(),
+      [&](const DWARFDebugNames::SentinelError &) {
+        if (NumEntries > 0)
+          return;
+        ErrorCategory.Report(
+            "NameIndex Name is not associated with any entries", [&]() {
+              error() << formatv("Name Index @ {0:x}: Name {1} ({2}) is "
+                                 "not associated with any entries.\n",
+                                 NI.getUnitOffset(), NTE.getIndex(), Str);
+            });
+        ++NumErrors;
+      },
+      [&](const ErrorInfoBase &Info) {
+        ErrorCategory.Report("Uncategorized NameIndex error", [&]() {
+          error() << formatv("Name Index @ {0:x}: Name {1} ({2}): {3}\n",
+                             NI.getUnitOffset(), NTE.getIndex(), Str,
+                             Info.message());
+        });
+        ++NumErrors;
+      });
   return NumErrors;
 }
 
@@ -1619,10 +1798,12 @@ unsigned DWARFVerifier::verifyNameIndexCompleteness(
     if (none_of(NI.equal_range(Name), [&](const DWARFDebugNames::Entry &E) {
           return E.getDIEUnitOffset() == DieUnitOffset;
         })) {
-      error() << formatv("Name Index @ {0:x}: Entry for DIE @ {1:x} ({2}) with "
-                         "name {3} missing.\n",
-                         NI.getUnitOffset(), Die.getOffset(), Die.getTag(),
-                         Name);
+      ErrorCategory.Report("Name Index DIE entry missing name", [&]() {
+        error() << formatv(
+            "Name Index @ {0:x}: Entry for DIE @ {1:x} ({2}) with "
+            "name {3} missing.\n",
+            NI.getUnitOffset(), Die.getOffset(), Die.getTag(), Name);
+      });
       ++NumErrors;
     }
   }
@@ -1641,7 +1822,9 @@ unsigned DWARFVerifier::verifyDebugNames(const DWARFSection &AccelSection,
   // This verifies that we can read individual name indices and their
   // abbreviation tables.
   if (Error E = AccelTable.extract()) {
-    error() << toString(std::move(E)) << '\n';
+    std::string Msg = toString(std::move(E));
+    ErrorCategory.Report("Accelerator Table Error",
+                         [&]() { error() << Msg << '\n'; });
     return 1;
   }
 
@@ -1741,13 +1924,17 @@ bool DWARFVerifier::verifyDebugStrOffsets(
       if (!C)
         break;
       if (C.tell() + Length > DA.getData().size()) {
-        error() << formatv(
-            "{0}: contribution {1:X}: length exceeds available space "
-            "(contribution "
-            "offset ({1:X}) + length field space ({2:X}) + length ({3:X}) == "
-            "{4:X} > section size {5:X})\n",
-            SectionName, StartOffset, C.tell() - StartOffset, Length,
-            C.tell() + Length, DA.getData().size());
+        ErrorCategory.Report(
+            "Section contribution length exceeds available space", [&]() {
+              error() << formatv(
+                  "{0}: contribution {1:X}: length exceeds available space "
+                  "(contribution "
+                  "offset ({1:X}) + length field space ({2:X}) + length "
+                  "({3:X}) == "
+                  "{4:X} > section size {5:X})\n",
+                  SectionName, StartOffset, C.tell() - StartOffset, Length,
+                  C.tell() + Length, DA.getData().size());
+            });
         Success = false;
         // Nothing more to do - no other contributions to try.
         break;
@@ -1755,8 +1942,10 @@ bool DWARFVerifier::verifyDebugStrOffsets(
       NextUnit = C.tell() + Length;
       uint8_t Version = DA.getU16(C);
       if (C && Version != 5) {
-        error() << formatv("{0}: contribution {1:X}: invalid version {2}\n",
-                           SectionName, StartOffset, Version);
+        ErrorCategory.Report("Invalid Section version", [&]() {
+          error() << formatv("{0}: contribution {1:X}: invalid version {2}\n",
+                             SectionName, StartOffset, Version);
+        });
         Success = false;
         // Can't parse the rest of this contribution, since we don't know the
         // version, but we can pick up with the next contribution.
@@ -1768,10 +1957,12 @@ bool DWARFVerifier::verifyDebugStrOffsets(
     DA.setAddressSize(OffsetByteSize);
     uint64_t Remainder = (Length - 4) % OffsetByteSize;
     if (Remainder != 0) {
-      error() << formatv(
-          "{0}: contribution {1:X}: invalid length ((length ({2:X}) "
-          "- header (0x4)) % offset size {3:X} == {4:X} != 0)\n",
-          SectionName, StartOffset, Length, OffsetByteSize, Remainder);
+      ErrorCategory.Report("Invalid section contribution length", [&]() {
+        error() << formatv(
+            "{0}: contribution {1:X}: invalid length ((length ({2:X}) "
+            "- header (0x4)) % offset size {3:X} == {4:X} != 0)\n",
+            SectionName, StartOffset, Length, OffsetByteSize, Remainder);
+      });
       Success = false;
     }
     for (uint64_t Index = 0; C && C.tell() + OffsetByteSize <= NextUnit; ++Index) {
@@ -1781,29 +1972,64 @@ bool DWARFVerifier::verifyDebugStrOffsets(
       if (StrOff == 0)
         continue;
       if (StrData.size() <= StrOff) {
-        error() << formatv(
-            "{0}: contribution {1:X}: index {2:X}: invalid string "
-            "offset *{3:X} == {4:X}, is beyond the bounds of the string section of length {5:X}\n",
-            SectionName, StartOffset, Index, OffOff, StrOff, StrData.size());
+        ErrorCategory.Report(
+            "String offset out of bounds of string section", [&]() {
+              error() << formatv(
+                  "{0}: contribution {1:X}: index {2:X}: invalid string "
+                  "offset *{3:X} == {4:X}, is beyond the bounds of the string "
+                  "section of length {5:X}\n",
+                  SectionName, StartOffset, Index, OffOff, StrOff,
+                  StrData.size());
+            });
         continue;
       }
       if (StrData[StrOff - 1] == '\0')
         continue;
-      error() << formatv("{0}: contribution {1:X}: index {2:X}: invalid string "
-                         "offset *{3:X} == {4:X}, is neither zero nor "
-                         "immediately following a null character\n",
-                         SectionName, StartOffset, Index, OffOff, StrOff);
+      ErrorCategory.Report(
+          "Section contribution contains invalid string offset", [&]() {
+            error() << formatv(
+                "{0}: contribution {1:X}: index {2:X}: invalid string "
+                "offset *{3:X} == {4:X}, is neither zero nor "
+                "immediately following a null character\n",
+                SectionName, StartOffset, Index, OffOff, StrOff);
+          });
       Success = false;
     }
   }
 
   if (Error E = C.takeError()) {
-    error() << SectionName << ": " << toString(std::move(E)) << '\n';
+    std::string Msg = toString(std::move(E));
+    ErrorCategory.Report("String offset error", [&]() {
+      error() << SectionName << ": " << Msg << '\n';
+    });
     return false;
   }
   return Success;
 }
 
+void OutputCategoryAggregator::Report(
+    StringRef s, std::function<void(void)> detailCallback) {
+  Aggregation[std::string(s)]++;
+  if (IncludeDetail)
+    detailCallback();
+}
+
+void OutputCategoryAggregator::EnumerateResults(
+    std::function<void(StringRef, unsigned)> handleCounts) {
+  for (auto &&[name, count] : Aggregation) {
+    handleCounts(name, count);
+  }
+}
+
+void DWARFVerifier::summarize() {
+  if (ErrorCategory.GetNumCategories() && DumpOpts.ShowAggregateErrors) {
+    error() << "Aggregated error counts:\n";
+    ErrorCategory.EnumerateResults([&](StringRef s, unsigned count) {
+      error() << s << " occurred " << count << " time(s).\n";
+    });
+  }
+}
+
 raw_ostream &DWARFVerifier::error() const { return WithColor::error(OS); }
 
 raw_ostream &DWARFVerifier::warn() const { return WithColor::warning(OS); }
diff --git a/llvm/test/DebugInfo/X86/skeleton-unit-verify.s b/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
index d9c7436d1c750..92a3df486da39 100644
--- a/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
+++ b/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
@@ -1,5 +1,5 @@
 # RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.o
-# RUN: not llvm-dwarfdump --verify %t.o | FileCheck %s
+# RUN: not llvm-dwarfdump --error-display=details --verify %t.o | FileCheck %s
 
 # CHECK: Verifying .debug_abbrev...
 # CHECK-NEXT: Verifying .debug_info Unit Header Chain...
@@ -51,5 +51,3 @@
         .byte   2                       # Abbrev [2]
         .byte   0
 .Lcu_end1:
-
-
diff --git a/llvm/test/DebugInfo/dwarfdump-accel.test b/llvm/test/DebugInfo/dwarfdump-accel.test
index 27720b6c9b42a..d564a8576c3c3 100644
--- a/llvm/test/DebugInfo/dwarfdump-accel.test
+++ b/llvm/test/DebugInfo/dwarfdump-accel.test
@@ -1,5 +1,5 @@
 RUN: llvm-dwarfdump -v %p/Inputs/dwarfdump-objc.x86_64.o | FileCheck %s
-RUN: not llvm-dwarfdump -verify %p/Inputs/dwarfdump-objc.x86_64.o | FileCheck %s --check-prefix=VERIFY
+RUN: not llvm-dwarfdump -error-display=details -verify %p/Inputs/dwarfdump-objc.x86_64.o | FileCheck %s --check-prefix=VERIFY
 
 Gather some DIE indexes to verify the accelerator table contents.
 CHECK: .debug_info contents
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes.yaml
index b86623dd011d4..a05b6f0cef8d0 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes.yaml
@@ -1,4 +1,4 @@
-# RUN: yaml2obj %s | not llvm-dwarfdump --verify - | FileCheck %s --implicit-check-not=error:
+# RUN: yaml2obj %s | not llvm-dwarfdump --error-display=details --verify - | FileCheck %s --implicit-check-not=error:
 
 # CHECK:      error: DIE has DW_AT_decl_file with an invalid file index 2 (valid values are [1-1]){{[[:space:]]}}
 # CHECK-NEXT: 0x0000001e: DW_TAG_subprogram
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes_no_files.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes_no_files.yaml
index 3b56dca1bb090..2ba71f521d058 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes_no_files.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_attr_file_indexes_no_files.yaml
@@ -1,4 +1,4 @@
-# RUN: yaml2obj %s | not llvm-dwarfdump --verify - | FileCheck %s --implicit-check-not=error:
+# RUN: yaml2obj %s | not llvm-dwarfdump --error-display=details --verify - | FileCheck %s --implicit-check-not=error:
 
 # CHECK:      error: DIE has DW_AT_decl_file with an invalid file index 2 (the file table in the prologue is empty){{[[:space:]]}}
 # CHECK-NEXT: 0x0000001e: DW_TAG_subprogram
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
index af55a3a7d1034..fe31436e9f6e3 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
@@ -50,7 +50,11 @@
 # CHECK-NEXT:               DW_AT_decl_line   [DW_FORM_sdata] (3)
 # CHECK-NEXT:               DW_AT_call_file   [DW_FORM_sdata] (4)
 # CHECK-NEXT:               DW_AT_call_line   [DW_FORM_sdata] (5){{[[:space:]]}}
-
+# CHECK-NEXT: Verifying dwo Units...
+# CHECK-NEXT: error: Aggregated error counts:
+# CHECK-NEXT: error: Invalid encoding in DW_AT_decl_file occurred 4 time(s).
+# CHECK-NEXT: error: Invalid file index in DW_AT_call_line occurred 1 time(s).
+# CHECK-NEXT: error: Invalid file index in DW_AT_decl_line occurred 1 time(s).
 --- !ELF
 FileHeader:
   Class:   ELFCLASS64
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml
index a40959f4d0ded..32b1c399985f1 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml
@@ -39,7 +39,7 @@
 #
 # 0x00000066:   NULL
 
-# RUN: yaml2obj %s | not llvm-dwarfdump --verify - | FileCheck %s --implicit-check-not=error:
+# RUN: yaml2obj %s | not llvm-dwarfdump --error-display=details --verify - | FileCheck %s --implicit-check-not=error:
 
 # CHECK: error: DIE has overlapping ranges in DW_AT_ranges attribute: [0x0000000000000000, 0x0000000000000020) and [0x0000000000000000, 0x0000000000000030)
 
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml
index 5188ac5a6d407..655819515f0ff 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml
@@ -30,7 +30,7 @@
 # 0x00000056:   NULL
 
 
-# RUN: yaml2obj %s | not llvm-dwarfdump --verify - | FileCheck %s --implicit-check-not=error:
+# RUN: yaml2obj %s | not llvm-dwarfdump --error-display=details --verify - | FileCheck %s --implicit-check-not=error:
 
 # CHECK: Verifying -:	file format Mach-O 64-bit x86-64
 # CHECK: Verifying .debug_abbrev...
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_split_cu.s b/llvm/test/tools/llvm-dwarfdump/X86/verify_split_cu.s
index 3941d9f1a7a57..ebc1b923f1a3b 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_split_cu.s
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_split_cu.s
@@ -8,7 +8,12 @@
 # CHECK: error: Unsupported DW_AT_location encoding: DW_FORM_data1
 # FIXME: This should read "type unit" or just "unit" to be correct for this case/in general
 # CHECK: error: DIE has DW_AT_decl_file that references a file with index 1 and the compile unit has no line table
-# CHECK: Errors detected
+# CHECK: error: Aggregated error counts:
+# CHECK: error: Compilation unit root DIE is not a unit DIE occurred 1 time(s).
+# CHECK: error: File index in DW_AT_decl_file reference CU with no line table occurred 1 time(s).
+# CHECK: error: Invalid DW_AT_location occurred 1 time(s).
+# CHECK: error: Mismatched unit type occurred 1 time(s).
+# CHECK: Errors detected.
 	.section	.debug_info.dwo,"e",@progbits
 	.long	.Ldebug_info_dwo_end1-.Ldebug_info_dwo_start1 # Length of Unit
 .Ldebug_info_dwo_start1:
diff --git a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
index 941df4eb18445..559e7a6048983 100644
--- a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
+++ b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
@@ -124,6 +124,14 @@ template <> class parser<BoolOption> final : public basic_parser<BoolOption> {
 namespace {
 using namespace cl;
 
+enum ErrorDetailLevel {
+  OnlyDetailsNoSummary,
+  NoDetailsOnlySummary,
+  NoDetailsOrSummary,
+  BothDetailsAndSummary,
+  Unspecified
+};
+
 OptionCategory DwarfDumpCategory("Specific Options");
 static list<std::string>
     InputFilenames(Positional, desc("<input object files or .dSYM bundles>"),
@@ -276,6 +284,17 @@ static cl::opt<bool>
                 cat(DwarfDumpCategory));
 static opt<bool> Verify("verify", desc("Verify the DWARF debug info."),
                         cat(DwarfDumpCategory));
+static opt<ErrorDetailLevel> ErrorDetails(
+    "error-display", init(Unspecified),
+    values(clEnumValN(NoDetailsOrSummary, "quiet",
+                      "Only display whether errors occurred."),
+           clEnumValN(NoDetailsOnlySummary, "summary",
+                      "Display only a summary of the errors found."),
+           clEnumValN(OnlyDetailsNoSummary, "details",
+                      "Display each error in detail but no summary."),
+           clEnumValN(BothDetailsAndSummary, "full",
+                      "Display each error as well as a summary. [default]")),
+    cat(DwarfDumpCategory));
 static opt<bool> Quiet("quiet", desc("Use with -verify to not emit to STDOUT."),
                        cat(DwarfDumpCategory));
 static opt<bool> DumpUUID("uuid", desc("Show the UUID for each architecture."),
@@ -326,7 +345,10 @@ static DIDumpOptions getDumpOpts(DWARFContext &C) {
   DumpOpts.RecoverableErrorHandler = C.getRecoverableErrorHandler();
   // In -verify mode, print DIEs without children in error messages.
   if (Verify) {
-    DumpOpts.Verbose = true;
+    DumpOpts.Verbose = ErrorDetails != NoDetailsOnlySummary &&
+                       ErrorDetails != NoDetailsOrSummary;
+    DumpOpts.ShowAggregateErrors = ErrorDetails != OnlyDetailsNoSummary &&
+                                   ErrorDetails != NoDetailsOrSummary;
     return DumpOpts.noImplicitRecursion();
   }
   return DumpOpts;
@@ -812,6 +834,8 @@ int main(int argc, char **argv) {
                           "-verbose is currently not supported";
     return 1;
   }
+  if (!Verify && ErrorDetails != Unspecified)
+    WithColor::warning() << "-error-display has no effect without -verify\n";
 
   std::error_code EC;
   ToolOutputFile OutputFile(OutputFilename, EC, sys::fs::OF_TextWithCRLF);
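
For readers following the patch above, the reporting pattern can be sketched in isolation. This is a minimal, self-contained approximation and not the LLVM implementation: `CategoryAggregator`, `ErrorDisplay`, `showsDetails`, `showsSummary` and `verifyExample` are hypothetical names, the error texts are made up, and the mapping from display level to behaviour follows the option descriptions ("quiet", "summary", "details", "full") rather than any particular line of the patch. An ordered `std::map` is assumed, which matches the alphabetical order of the summary lines in the updated tests.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical mirror of the four --error-display values in the patch.
enum class ErrorDisplay { Quiet, Summary, Details, Full };

// Per the option descriptions: "details" and "full" print each error.
inline bool showsDetails(ErrorDisplay L) {
  return L == ErrorDisplay::Details || L == ErrorDisplay::Full;
}

// Per the option descriptions: "summary" and "full" print aggregate counts.
inline bool showsSummary(ErrorDisplay L) {
  return L == ErrorDisplay::Summary || L == ErrorDisplay::Full;
}

// Simplified stand-in for OutputCategoryAggregator (not the LLVM class).
class CategoryAggregator {
  std::map<std::string, unsigned> Aggregation;
  bool IncludeDetail;

public:
  explicit CategoryAggregator(bool IncludeDetail)
      : IncludeDetail(IncludeDetail) {}

  // Always count the category; only run the detail callback (which does
  // the formatting work) when per-error output was requested.
  void Report(const std::string &Category,
              const std::function<void()> &DetailCallback) {
    ++Aggregation[Category];
    if (IncludeDetail)
      DetailCallback();
  }

  // Visit every category with its count, in key order.
  void EnumerateResults(
      const std::function<void(const std::string &, unsigned)> &Handle) const {
    for (const auto &[Name, Count] : Aggregation)
      Handle(Name, Count);
  }
};

// Hypothetical driver: reports three made-up errors and returns all output.
std::string verifyExample(ErrorDisplay Level) {
  std::ostringstream OS;
  CategoryAggregator Agg(showsDetails(Level));
  Agg.Report("Invalid Section version",
             [&] { OS << "error: contribution 0x0: invalid version 4\n"; });
  Agg.Report("Invalid Section version",
             [&] { OS << "error: contribution 0x20: invalid version 3\n"; });
  Agg.Report("Accelerator Table Error",
             [&] { OS << "error: truncated table\n"; });
  if (showsSummary(Level)) {
    OS << "error: Aggregated error counts:\n";
    Agg.EnumerateResults([&](const std::string &Name, unsigned Count) {
      OS << "error: " << Name << " occurred " << Count << " time(s).\n";
    });
  }
  return OS.str();
}
```

Because the detail text is produced inside a callback, a `return` written there only leaves the callback. Any early exit of the enclosing verifier function has to be placed after the `Report` call.
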

>From 9341b5c84ae36f79cd57f2dd0d5adec8b9bac622 Mon Sep 17 00:00:00 2001
From: Krystian Stasiowski <sdkrystian at gmail.com>
Date: Thu, 1 Feb 2024 11:50:50 -0500
Subject: [PATCH 32/42] [Clang][NFC] Remove TemplateArgumentList::OnStack
 (#79760)

This patch removes on-stack `TemplateArgumentList`s. They were primarily used
to pass an `ArrayRef<TemplateArgument>` to
`Sema::getTemplateInstantiationArgs`, which had a `const
TemplateArgumentList*` parameter for the innermost template argument
list. Changing this parameter to a
`std::optional<ArrayRef<TemplateArgument>>` eliminates the need for
on-stack `TemplateArgumentList`s, which in turn eliminates the need for
`TemplateArgumentList` to store a pointer to its template argument
storage (which is redundant in almost all cases, as it is an AST-allocated
type).
---
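
The parameter change described in the commit message can be illustrated with a small standalone sketch. It assumes nothing from Clang: `ArrayView` is a hypothetical stand-in for `llvm::ArrayRef`, `Arg` is a placeholder for `TemplateArgument`, and `countLevels` is a made-up analogue of a function that takes an optional innermost argument list. The point is only that an optional view can be built directly from existing storage, so callers no longer need a temporary wrapper object just to pass a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for llvm::ArrayRef<T>: a non-owning pointer + length.
template <typename T> struct ArrayView {
  const T *Data = nullptr;
  std::size_t Size = 0;

  ArrayView() = default;
  // Implicit on purpose, so containers convert at the call site.
  ArrayView(const std::vector<T> &V) : Data(V.data()), Size(V.size()) {}

  std::size_t size() const { return Size; }
};

using Arg = int; // placeholder for clang::TemplateArgument

// Hypothetical analogue of a function with an optional innermost list.
// Returns how many argument levels would be collected.
std::size_t countLevels(std::size_t OuterLevels,
                        std::optional<ArrayView<Arg>> Innermost = std::nullopt) {
  std::size_t Levels = OuterLevels;
  // An engaged optional adds one level, even if the list itself is empty;
  // std::nullopt means "no innermost list was supplied".
  if (Innermost)
    ++Levels;
  return Levels;
}
```

A call such as `countLevels(2, Args)` with a `std::vector<Arg> Args` converts in place, while `countLevels(2)` keeps the "nothing supplied" case that a null pointer used to express. An empty list and an absent list stay distinguishable.
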
 clang/docs/ReleaseNotes.rst                   |  4 ++
 clang/include/clang/AST/DeclTemplate.h        | 26 +------
 clang/include/clang/Sema/Sema.h               |  8 +--
 clang/lib/AST/DeclTemplate.cpp                |  3 +-
 clang/lib/Sema/SemaConcept.cpp                | 20 +++---
 clang/lib/Sema/SemaExprCXX.cpp                |  4 +-
 clang/lib/Sema/SemaTemplate.cpp               | 20 +++---
 clang/lib/Sema/SemaTemplateDeduction.cpp      | 70 ++++++++-----------
 clang/lib/Sema/SemaTemplateInstantiate.cpp    | 11 +--
 .../lib/Sema/SemaTemplateInstantiateDecl.cpp  | 27 +++----
 10 files changed, 81 insertions(+), 112 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index efd925e990f43..53040aa0f9074 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -51,6 +51,10 @@ AST Dumping Potentially Breaking Changes
 
 Clang Frontend Potentially Breaking Changes
 -------------------------------------------
+- Removed support for constructing on-stack ``TemplateArgumentList`` objects; interfaces should instead
+  use ``ArrayRef<TemplateArgument>`` to pass template arguments. Transitioning internal uses to
+  ``ArrayRef<TemplateArgument>`` reduces AST memory usage by 0.4% when compiling clang, and is
+  expected to show similar improvements on other workloads.
 
 Target OS macros extension
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/AST/DeclTemplate.h b/clang/include/clang/AST/DeclTemplate.h
index 832ad2de6b08a..baf71145d99dc 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -241,9 +241,6 @@ class FixedSizeTemplateParameterListStorage
 /// A template argument list.
 class TemplateArgumentList final
     : private llvm::TrailingObjects<TemplateArgumentList, TemplateArgument> {
-  /// The template argument list.
-  const TemplateArgument *Arguments;
-
   /// The number of template arguments in this template
   /// argument list.
   unsigned NumArguments;
@@ -258,30 +255,11 @@ class TemplateArgumentList final
   TemplateArgumentList(const TemplateArgumentList &) = delete;
   TemplateArgumentList &operator=(const TemplateArgumentList &) = delete;
 
-  /// Type used to indicate that the template argument list itself is a
-  /// stack object. It does not own its template arguments.
-  enum OnStackType { OnStack };
-
   /// Create a new template argument list that copies the given set of
   /// template arguments.
   static TemplateArgumentList *CreateCopy(ASTContext &Context,
                                           ArrayRef<TemplateArgument> Args);
 
-  /// Construct a new, temporary template argument list on the stack.
-  ///
-  /// The template argument list does not own the template arguments
-  /// provided.
-  explicit TemplateArgumentList(OnStackType, ArrayRef<TemplateArgument> Args)
-      : Arguments(Args.data()), NumArguments(Args.size()) {}
-
-  /// Produces a shallow copy of the given template argument list.
-  ///
-  /// This operation assumes that the input argument list outlives it.
-  /// This takes the list as a pointer to avoid looking like a copy
-  /// constructor, since this really isn't safe to use that way.
-  explicit TemplateArgumentList(const TemplateArgumentList *Other)
-      : Arguments(Other->data()), NumArguments(Other->size()) {}
-
   /// Retrieve the template argument at a given index.
   const TemplateArgument &get(unsigned Idx) const {
     assert(Idx < NumArguments && "Invalid template argument index");
@@ -301,7 +279,9 @@ class TemplateArgumentList final
   unsigned size() const { return NumArguments; }
 
   /// Retrieve a pointer to the template argument list.
-  const TemplateArgument *data() const { return Arguments; }
+  const TemplateArgument *data() const {
+    return getTrailingObjects<TemplateArgument>();
+  }
 };
 
 void *allocateDefaultArgStorageChain(const ASTContext &C);
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index b780cee86c3c3..780a2f2d8ce27 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -9329,12 +9329,12 @@ class Sema final {
 
   TemplateDeductionResult
   DeduceTemplateArguments(ClassTemplatePartialSpecializationDecl *Partial,
-                          const TemplateArgumentList &TemplateArgs,
+                          ArrayRef<TemplateArgument> TemplateArgs,
                           sema::TemplateDeductionInfo &Info);
 
   TemplateDeductionResult
   DeduceTemplateArguments(VarTemplatePartialSpecializationDecl *Partial,
-                          const TemplateArgumentList &TemplateArgs,
+                          ArrayRef<TemplateArgument> TemplateArgs,
                           sema::TemplateDeductionInfo &Info);
 
   TemplateDeductionResult SubstituteExplicitTemplateArguments(
@@ -9507,7 +9507,7 @@ class Sema final {
 
   MultiLevelTemplateArgumentList getTemplateInstantiationArgs(
       const NamedDecl *D, const DeclContext *DC = nullptr, bool Final = false,
-      const TemplateArgumentList *Innermost = nullptr,
+      std::optional<ArrayRef<TemplateArgument>> Innermost = std::nullopt,
       bool RelativeToPrimary = false, const FunctionDecl *Pattern = nullptr,
       bool ForConstraintInstantiation = false,
       bool SkipForSpecialization = false);
@@ -10537,7 +10537,7 @@ class Sema final {
                                      bool AtEndOfTU = false);
   VarTemplateSpecializationDecl *BuildVarTemplateInstantiation(
       VarTemplateDecl *VarTemplate, VarDecl *FromVar,
-      const TemplateArgumentList &TemplateArgList,
+      const TemplateArgumentList *PartialSpecArgs,
       const TemplateArgumentListInfo &TemplateArgsInfo,
       SmallVectorImpl<TemplateArgument> &Converted,
       SourceLocation PointOfInstantiation,
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index 946a34ea8830e..3c217d6a6a5ae 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -871,8 +871,7 @@ void TemplateTemplateParmDecl::setDefaultArgument(
 // TemplateArgumentList Implementation
 //===----------------------------------------------------------------------===//
 TemplateArgumentList::TemplateArgumentList(ArrayRef<TemplateArgument> Args)
-    : Arguments(getTrailingObjects<TemplateArgument>()),
-      NumArguments(Args.size()) {
+    : NumArguments(Args.size()) {
   std::uninitialized_copy(Args.begin(), Args.end(),
                           getTrailingObjects<TemplateArgument>());
 }
diff --git a/clang/lib/Sema/SemaConcept.cpp b/clang/lib/Sema/SemaConcept.cpp
index 5028879eda22f..2878e4d31ee8f 100644
--- a/clang/lib/Sema/SemaConcept.cpp
+++ b/clang/lib/Sema/SemaConcept.cpp
@@ -661,11 +661,12 @@ Sema::SetupConstraintCheckingTemplateArgumentsAndScope(
   // Collect the list of template arguments relative to the 'primary' template.
   // We need the entire list, since the constraint is completely uninstantiated
   // at this point.
-  MLTAL = getTemplateInstantiationArgs(FD, FD->getLexicalDeclContext(),
-                                       /*Final=*/false, /*Innermost=*/nullptr,
-                                       /*RelativeToPrimary=*/true,
-                                       /*Pattern=*/nullptr,
-                                       /*ForConstraintInstantiation=*/true);
+  MLTAL =
+      getTemplateInstantiationArgs(FD, FD->getLexicalDeclContext(),
+                                   /*Final=*/false, /*Innermost=*/std::nullopt,
+                                   /*RelativeToPrimary=*/true,
+                                   /*Pattern=*/nullptr,
+                                   /*ForConstraintInstantiation=*/true);
   if (SetupConstraintScope(FD, TemplateArgs, MLTAL, Scope))
     return std::nullopt;
 
@@ -740,7 +741,8 @@ static unsigned
 CalculateTemplateDepthForConstraints(Sema &S, const NamedDecl *ND,
                                      bool SkipForSpecialization = false) {
   MultiLevelTemplateArgumentList MLTAL = S.getTemplateInstantiationArgs(
-      ND, ND->getLexicalDeclContext(), /*Final=*/false, /*Innermost=*/nullptr,
+      ND, ND->getLexicalDeclContext(), /*Final=*/false,
+      /*Innermost=*/std::nullopt,
       /*RelativeToPrimary=*/true,
       /*Pattern=*/nullptr,
       /*ForConstraintInstantiation=*/true, SkipForSpecialization);
@@ -780,7 +782,7 @@ static const Expr *SubstituteConstraintExpressionWithoutSatisfaction(
     const Expr *ConstrExpr) {
   MultiLevelTemplateArgumentList MLTAL = S.getTemplateInstantiationArgs(
       DeclInfo.getDecl(), DeclInfo.getLexicalDeclContext(), /*Final=*/false,
-      /*Innermost=*/nullptr,
+      /*Innermost=*/std::nullopt,
       /*RelativeToPrimary=*/true,
       /*Pattern=*/nullptr, /*ForConstraintInstantiation=*/true,
       /*SkipForSpecialization*/ false);
@@ -1279,11 +1281,9 @@ substituteParameterMappings(Sema &S, NormalizedConstraint &N,
 
 static bool substituteParameterMappings(Sema &S, NormalizedConstraint &N,
                                         const ConceptSpecializationExpr *CSE) {
-  TemplateArgumentList TAL{TemplateArgumentList::OnStack,
-                           CSE->getTemplateArguments()};
   MultiLevelTemplateArgumentList MLTAL = S.getTemplateInstantiationArgs(
       CSE->getNamedConcept(), CSE->getNamedConcept()->getLexicalDeclContext(),
-      /*Final=*/false, &TAL,
+      /*Final=*/false, CSE->getTemplateArguments(),
       /*RelativeToPrimary=*/true,
       /*Pattern=*/nullptr,
       /*ForConstraintInstantiation=*/true);
diff --git a/clang/lib/Sema/SemaExprCXX.cpp b/clang/lib/Sema/SemaExprCXX.cpp
index 2b695fd43eac0..3a32754e5376e 100644
--- a/clang/lib/Sema/SemaExprCXX.cpp
+++ b/clang/lib/Sema/SemaExprCXX.cpp
@@ -9113,9 +9113,7 @@ Sema::BuildExprRequirement(
 
     auto *Param = cast<TemplateTypeParmDecl>(TPL->getParam(0));
 
-    TemplateArgumentList TAL(TemplateArgumentList::OnStack, Args);
-    MultiLevelTemplateArgumentList MLTAL(Param, TAL.asArray(),
-                                         /*Final=*/false);
+    MultiLevelTemplateArgumentList MLTAL(Param, Args, /*Final=*/false);
     MLTAL.addOuterRetainedLevels(TPL->getDepth());
     const TypeConstraint *TC = Param->getTypeConstraint();
     assert(TC && "Type Constraint cannot be null here");
diff --git a/clang/lib/Sema/SemaTemplate.cpp b/clang/lib/Sema/SemaTemplate.cpp
index 9e8d058041f04..5616682e909aa 100644
--- a/clang/lib/Sema/SemaTemplate.cpp
+++ b/clang/lib/Sema/SemaTemplate.cpp
@@ -4843,9 +4843,7 @@ Sema::CheckVarTemplateId(VarTemplateDecl *Template, SourceLocation TemplateLoc,
   // the set of specializations, based on the closest partial specialization
   // that it represents. That is,
   VarDecl *InstantiationPattern = Template->getTemplatedDecl();
-  TemplateArgumentList TemplateArgList(TemplateArgumentList::OnStack,
-                                       CanonicalConverted);
-  TemplateArgumentList *InstantiationArgs = &TemplateArgList;
+  const TemplateArgumentList *PartialSpecArgs = nullptr;
   bool AmbiguousPartialSpec = false;
   typedef PartialSpecMatchResult MatchResult;
   SmallVector<MatchResult, 4> Matched;
@@ -4866,7 +4864,7 @@ Sema::CheckVarTemplateId(VarTemplateDecl *Template, SourceLocation TemplateLoc,
     TemplateDeductionInfo Info(FailedCandidates.getLocation());
 
     if (TemplateDeductionResult Result =
-            DeduceTemplateArguments(Partial, TemplateArgList, Info)) {
+            DeduceTemplateArguments(Partial, CanonicalConverted, Info)) {
       // Store the failed-deduction information for use in diagnostics, later.
       // TODO: Actually use the failed-deduction info?
       FailedCandidates.addCandidate().set(
@@ -4919,7 +4917,7 @@ Sema::CheckVarTemplateId(VarTemplateDecl *Template, SourceLocation TemplateLoc,
 
     // Instantiate using the best variable template partial specialization.
     InstantiationPattern = Best->Partial;
-    InstantiationArgs = Best->Args;
+    PartialSpecArgs = Best->Args;
   } else {
     //   -- If no match is found, the instantiation is generated
     //      from the primary template.
@@ -4931,7 +4929,7 @@ Sema::CheckVarTemplateId(VarTemplateDecl *Template, SourceLocation TemplateLoc,
   // in DoMarkVarDeclReferenced().
   // FIXME: LateAttrs et al.?
   VarTemplateSpecializationDecl *Decl = BuildVarTemplateInstantiation(
-      Template, InstantiationPattern, *InstantiationArgs, TemplateArgs,
+      Template, InstantiationPattern, PartialSpecArgs, TemplateArgs,
       CanonicalConverted, TemplateNameLoc /*, LateAttrs, StartingScope*/);
   if (!Decl)
     return true;
@@ -4952,7 +4950,7 @@ Sema::CheckVarTemplateId(VarTemplateDecl *Template, SourceLocation TemplateLoc,
 
   if (VarTemplatePartialSpecializationDecl *D =
           dyn_cast<VarTemplatePartialSpecializationDecl>(InstantiationPattern))
-    Decl->setInstantiationOf(D, InstantiationArgs);
+    Decl->setInstantiationOf(D, PartialSpecArgs);
 
   checkSpecializationReachability(TemplateNameLoc, Decl);
 
@@ -6257,8 +6255,6 @@ bool Sema::CheckTemplateArgumentList(
     TemplateArgs = std::move(NewArgs);
 
   if (!PartialTemplateArgs) {
-    TemplateArgumentList StackTemplateArgs(TemplateArgumentList::OnStack,
-                                           CanonicalConverted);
     // Setup the context/ThisScope for the case where we are needing to
     // re-instantiate constraints outside of normal instantiation.
     DeclContext *NewContext = Template->getDeclContext();
@@ -6278,7 +6274,7 @@ bool Sema::CheckTemplateArgumentList(
     CXXThisScopeRAII(*this, RD, ThisQuals, RD != nullptr);
 
     MultiLevelTemplateArgumentList MLTAL = getTemplateInstantiationArgs(
-        Template, NewContext, /*Final=*/false, &StackTemplateArgs,
+        Template, NewContext, /*Final=*/false, CanonicalConverted,
         /*RelativeToPrimary=*/true,
         /*Pattern=*/nullptr,
         /*ForConceptInstantiation=*/true);
@@ -9782,8 +9778,8 @@ bool Sema::CheckFunctionTemplateSpecialization(
   // specialization, with the template arguments from the previous
   // specialization.
   // Take copies of (semantic and syntactic) template argument lists.
-  const TemplateArgumentList* TemplArgs = new (Context)
-    TemplateArgumentList(Specialization->getTemplateSpecializationArgs());
+  const TemplateArgumentList *TemplArgs = TemplateArgumentList::CreateCopy(
+      Context, Specialization->getTemplateSpecializationArgs()->asArray());
   FD->setFunctionTemplateSpecialization(
       Specialization->getPrimaryTemplate(), TemplArgs, /*InsertPos=*/nullptr,
       SpecInfo->getTemplateSpecializationKind(),
diff --git a/clang/lib/Sema/SemaTemplateDeduction.cpp b/clang/lib/Sema/SemaTemplateDeduction.cpp
index f08577febcd3e..a54ad27975890 100644
--- a/clang/lib/Sema/SemaTemplateDeduction.cpp
+++ b/clang/lib/Sema/SemaTemplateDeduction.cpp
@@ -2514,17 +2514,6 @@ DeduceTemplateArguments(Sema &S, TemplateParameterList *TemplateParams,
   return Sema::TDK_Success;
 }
 
-static Sema::TemplateDeductionResult
-DeduceTemplateArguments(Sema &S, TemplateParameterList *TemplateParams,
-                        const TemplateArgumentList &ParamList,
-                        const TemplateArgumentList &ArgList,
-                        TemplateDeductionInfo &Info,
-                        SmallVectorImpl<DeducedTemplateArgument> &Deduced) {
-  return DeduceTemplateArguments(S, TemplateParams, ParamList.asArray(),
-                                 ArgList.asArray(), Info, Deduced,
-                                 /*NumberOfArgumentsMustMatch=*/false);
-}
-
 /// Determine whether two template arguments are the same.
 static bool isSameTemplateArg(ASTContext &Context,
                               TemplateArgument X,
@@ -2945,13 +2934,14 @@ CheckDeducedArgumentConstraints(Sema &S, TemplateDeclT *Template,
   llvm::SmallVector<const Expr *, 3> AssociatedConstraints;
   Template->getAssociatedConstraints(AssociatedConstraints);
 
-  bool NeedsReplacement = DeducedArgsNeedReplacement(Template);
-  TemplateArgumentList DeducedTAL{TemplateArgumentList::OnStack,
-                                  CanonicalDeducedArgs};
+  std::optional<ArrayRef<TemplateArgument>> Innermost;
+  // If we don't need to replace the deduced template arguments,
+  // we can add them immediately as the inner-most argument list.
+  if (!DeducedArgsNeedReplacement(Template))
+    Innermost = CanonicalDeducedArgs;
 
   MultiLevelTemplateArgumentList MLTAL = S.getTemplateInstantiationArgs(
-      Template, Template->getDeclContext(), /*Final=*/false,
-      /*InnerMost=*/NeedsReplacement ? nullptr : &DeducedTAL,
+      Template, Template->getDeclContext(), /*Final=*/false, Innermost,
       /*RelativeToPrimary=*/true, /*Pattern=*/
       nullptr, /*ForConstraintInstantiation=*/true);
 
@@ -2959,7 +2949,7 @@ CheckDeducedArgumentConstraints(Sema &S, TemplateDeclT *Template,
   // template args when this is a variable template partial specialization and
   // not class-scope explicit specialization, so replace with Deduced Args
   // instead of adding to inner-most.
-  if (NeedsReplacement)
+  if (!Innermost)
     MLTAL.replaceInnermostTemplateArguments(Template, CanonicalDeducedArgs);
 
   if (S.CheckConstraintSatisfaction(Template, AssociatedConstraints, MLTAL,
@@ -2980,7 +2970,7 @@ static std::enable_if_t<IsPartialSpecialization<T>::value,
                         Sema::TemplateDeductionResult>
 FinishTemplateArgumentDeduction(
     Sema &S, T *Partial, bool IsPartialOrdering,
-    const TemplateArgumentList &TemplateArgs,
+    ArrayRef<TemplateArgument> TemplateArgs,
     SmallVectorImpl<DeducedTemplateArgument> &Deduced,
     TemplateDeductionInfo &Info) {
   // Unevaluated SFINAE context.
@@ -3073,7 +3063,7 @@ FinishTemplateArgumentDeduction(
 // FIXME: Factor out duplication with partial specialization version above.
 static Sema::TemplateDeductionResult FinishTemplateArgumentDeduction(
     Sema &S, TemplateDecl *Template, bool PartialOrdering,
-    const TemplateArgumentList &TemplateArgs,
+    ArrayRef<TemplateArgument> TemplateArgs,
     SmallVectorImpl<DeducedTemplateArgument> &Deduced,
     TemplateDeductionInfo &Info) {
   // Unevaluated SFINAE context.
@@ -3122,7 +3112,7 @@ static Sema::TemplateDeductionResult FinishTemplateArgumentDeduction(
 /// partial specialization per C++ [temp.class.spec.match].
 Sema::TemplateDeductionResult
 Sema::DeduceTemplateArguments(ClassTemplatePartialSpecializationDecl *Partial,
-                              const TemplateArgumentList &TemplateArgs,
+                              ArrayRef<TemplateArgument> TemplateArgs,
                               TemplateDeductionInfo &Info) {
   if (Partial->isInvalidDecl())
     return TDK_Invalid;
@@ -3144,11 +3134,10 @@ Sema::DeduceTemplateArguments(ClassTemplatePartialSpecializationDecl *Partial,
 
   SmallVector<DeducedTemplateArgument, 4> Deduced;
   Deduced.resize(Partial->getTemplateParameters()->size());
-  if (TemplateDeductionResult Result
-        = ::DeduceTemplateArguments(*this,
-                                    Partial->getTemplateParameters(),
-                                    Partial->getTemplateArgs(),
-                                    TemplateArgs, Info, Deduced))
+  if (TemplateDeductionResult Result = ::DeduceTemplateArguments(
+          *this, Partial->getTemplateParameters(),
+          Partial->getTemplateArgs().asArray(), TemplateArgs, Info, Deduced,
+          /*NumberOfArgumentsMustMatch=*/false))
     return Result;
 
   SmallVector<TemplateArgument, 4> DeducedArgs(Deduced.begin(), Deduced.end());
@@ -3174,7 +3163,7 @@ Sema::DeduceTemplateArguments(ClassTemplatePartialSpecializationDecl *Partial,
 /// partial specialization per C++ [temp.class.spec.match].
 Sema::TemplateDeductionResult
 Sema::DeduceTemplateArguments(VarTemplatePartialSpecializationDecl *Partial,
-                              const TemplateArgumentList &TemplateArgs,
+                              ArrayRef<TemplateArgument> TemplateArgs,
                               TemplateDeductionInfo &Info) {
   if (Partial->isInvalidDecl())
     return TDK_Invalid;
@@ -3197,8 +3186,9 @@ Sema::DeduceTemplateArguments(VarTemplatePartialSpecializationDecl *Partial,
   SmallVector<DeducedTemplateArgument, 4> Deduced;
   Deduced.resize(Partial->getTemplateParameters()->size());
   if (TemplateDeductionResult Result = ::DeduceTemplateArguments(
-          *this, Partial->getTemplateParameters(), Partial->getTemplateArgs(),
-          TemplateArgs, Info, Deduced))
+          *this, Partial->getTemplateParameters(),
+          Partial->getTemplateArgs().asArray(), TemplateArgs, Info, Deduced,
+          /*NumberOfArgumentsMustMatch=*/false))
     return Result;
 
   SmallVector<TemplateArgument, 4> DeducedArgs(Deduced.begin(), Deduced.end());
@@ -3427,15 +3417,15 @@ Sema::TemplateDeductionResult Sema::SubstituteExplicitTemplateArguments(
     // specification.
     SmallVector<QualType, 4> ExceptionStorage;
     if (getLangOpts().CPlusPlus17 &&
-        SubstExceptionSpec(Function->getLocation(), EPI.ExceptionSpec,
-                           ExceptionStorage,
-                           getTemplateInstantiationArgs(
-                               FunctionTemplate, nullptr, /*Final=*/true,
-                               /*Innermost=*/SugaredExplicitArgumentList,
-                               /*RelativeToPrimary=*/false,
-                               /*Pattern=*/nullptr,
-                               /*ForConstraintInstantiation=*/false,
-                               /*SkipForSpecialization=*/true)))
+        SubstExceptionSpec(
+            Function->getLocation(), EPI.ExceptionSpec, ExceptionStorage,
+            getTemplateInstantiationArgs(
+                FunctionTemplate, nullptr, /*Final=*/true,
+                /*Innermost=*/SugaredExplicitArgumentList->asArray(),
+                /*RelativeToPrimary=*/false,
+                /*Pattern=*/nullptr,
+                /*ForConstraintInstantiation=*/false,
+                /*SkipForSpecialization=*/true)))
       return TDK_SubstitutionFailure;
 
     *FunctionType = BuildFunctionType(ResultType, ParamTypes,
@@ -5802,10 +5792,8 @@ static bool isAtLeastAsSpecializedAs(Sema &S, QualType T1, QualType T2,
   bool AtLeastAsSpecialized;
   S.runWithSufficientStackSpace(Info.getLocation(), [&] {
     AtLeastAsSpecialized = !FinishTemplateArgumentDeduction(
-        S, P2, /*IsPartialOrdering=*/true,
-        TemplateArgumentList(TemplateArgumentList::OnStack,
-                             TST1->template_arguments()),
-        Deduced, Info);
+        S, P2, /*IsPartialOrdering=*/true, TST1->template_arguments(), Deduced,
+        Info);
   });
   return AtLeastAsSpecialized;
 }
diff --git a/clang/lib/Sema/SemaTemplateInstantiate.cpp b/clang/lib/Sema/SemaTemplateInstantiate.cpp
index e12186d7d82f8..01b78e4424fb5 100644
--- a/clang/lib/Sema/SemaTemplateInstantiate.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiate.cpp
@@ -338,7 +338,7 @@ Response HandleGenericDeclContext(const Decl *CurDecl) {
 
 MultiLevelTemplateArgumentList Sema::getTemplateInstantiationArgs(
     const NamedDecl *ND, const DeclContext *DC, bool Final,
-    const TemplateArgumentList *Innermost, bool RelativeToPrimary,
+    std::optional<ArrayRef<TemplateArgument>> Innermost, bool RelativeToPrimary,
     const FunctionDecl *Pattern, bool ForConstraintInstantiation,
     bool SkipForSpecialization) {
   assert((ND || DC) && "Can't find arguments for a decl if one isn't provided");
@@ -352,8 +352,8 @@ MultiLevelTemplateArgumentList Sema::getTemplateInstantiationArgs(
     CurDecl = Decl::castFromDeclContext(DC);
 
   if (Innermost) {
-    Result.addOuterTemplateArguments(const_cast<NamedDecl *>(ND),
-                                     Innermost->asArray(), Final);
+    Result.addOuterTemplateArguments(const_cast<NamedDecl *>(ND), *Innermost,
+                                     Final);
     // Populate placeholder template arguments for TemplateTemplateParmDecls.
     // This is essential for the case e.g.
     //
@@ -3656,7 +3656,8 @@ bool Sema::usesPartialOrExplicitSpecialization(
   for (unsigned I = 0, N = PartialSpecs.size(); I != N; ++I) {
     TemplateDeductionInfo Info(Loc);
     if (!DeduceTemplateArguments(PartialSpecs[I],
-                                 ClassTemplateSpec->getTemplateArgs(), Info))
+                                 ClassTemplateSpec->getTemplateArgs().asArray(),
+                                 Info))
       return true;
   }
 
@@ -3701,7 +3702,7 @@ getPatternForClassTemplateSpecialization(
       ClassTemplatePartialSpecializationDecl *Partial = PartialSpecs[I];
       TemplateDeductionInfo Info(FailedCandidates.getLocation());
       if (Sema::TemplateDeductionResult Result = S.DeduceTemplateArguments(
-              Partial, ClassTemplateSpec->getTemplateArgs(), Info)) {
+              Partial, ClassTemplateSpec->getTemplateArgs().asArray(), Info)) {
         // Store the failed-deduction information for use in diagnostics, later.
         // TODO: Actually use the failed-deduction info?
         FailedCandidates.addCandidate().set(
diff --git a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
index fcb27a880290b..d67b21b4449e0 100644
--- a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
@@ -4656,9 +4656,10 @@ bool Sema::InstantiateDefaultArgument(SourceLocation CallLoc, FunctionDecl *FD,
   //
   // template<typename T>
   // A<T> Foo(int a = A<T>::FooImpl());
-  MultiLevelTemplateArgumentList TemplateArgs = getTemplateInstantiationArgs(
-      FD, FD->getLexicalDeclContext(), /*Final=*/false, nullptr,
-      /*RelativeToPrimary=*/true);
+  MultiLevelTemplateArgumentList TemplateArgs =
+      getTemplateInstantiationArgs(FD, FD->getLexicalDeclContext(),
+                                   /*Final=*/false, /*Innermost=*/std::nullopt,
+                                   /*RelativeToPrimary=*/true);
 
   if (SubstDefaultArgument(CallLoc, Param, TemplateArgs, /*ForCallExpr*/ true))
     return true;
@@ -4696,9 +4697,10 @@ void Sema::InstantiateExceptionSpec(SourceLocation PointOfInstantiation,
   Sema::ContextRAII savedContext(*this, Decl);
   LocalInstantiationScope Scope(*this);
 
-  MultiLevelTemplateArgumentList TemplateArgs = getTemplateInstantiationArgs(
-      Decl, Decl->getLexicalDeclContext(), /*Final=*/false, nullptr,
-      /*RelativeToPrimary*/ true);
+  MultiLevelTemplateArgumentList TemplateArgs =
+      getTemplateInstantiationArgs(Decl, Decl->getLexicalDeclContext(),
+                                   /*Final=*/false, /*Innermost=*/std::nullopt,
+                                   /*RelativeToPrimary*/ true);
 
   // FIXME: We can't use getTemplateInstantiationPattern(false) in general
   // here, because for a non-defining friend declaration in a class template,
@@ -5140,8 +5142,8 @@ void Sema::InstantiateFunctionDefinition(SourceLocation PointOfInstantiation,
     SetDeclDefaulted(Function, PatternDecl->getLocation());
   } else {
     MultiLevelTemplateArgumentList TemplateArgs = getTemplateInstantiationArgs(
-        Function, Function->getLexicalDeclContext(), /*Final=*/false, nullptr,
-        false, PatternDecl);
+        Function, Function->getLexicalDeclContext(), /*Final=*/false,
+        /*Innermost=*/std::nullopt, false, PatternDecl);
 
     // Substitute into the qualifier; we can get a substitution failure here
     // through evil use of alias templates.
@@ -5211,7 +5213,7 @@ void Sema::InstantiateFunctionDefinition(SourceLocation PointOfInstantiation,
 
 VarTemplateSpecializationDecl *Sema::BuildVarTemplateInstantiation(
     VarTemplateDecl *VarTemplate, VarDecl *FromVar,
-    const TemplateArgumentList &TemplateArgList,
+    const TemplateArgumentList *PartialSpecArgs,
     const TemplateArgumentListInfo &TemplateArgsInfo,
     SmallVectorImpl<TemplateArgument> &Converted,
     SourceLocation PointOfInstantiation, LateInstantiatedAttrVec *LateAttrs,
@@ -5236,14 +5238,15 @@ VarTemplateSpecializationDecl *Sema::BuildVarTemplateInstantiation(
   MultiLevelTemplateArgumentList MultiLevelList;
   if (auto *PartialSpec =
           dyn_cast<VarTemplatePartialSpecializationDecl>(FromVar)) {
+    assert(PartialSpecArgs);
     IsMemberSpec = PartialSpec->isMemberSpecialization();
     MultiLevelList.addOuterTemplateArguments(
-        PartialSpec, TemplateArgList.asArray(), /*Final=*/false);
+        PartialSpec, PartialSpecArgs->asArray(), /*Final=*/false);
   } else {
     assert(VarTemplate == FromVar->getDescribedVarTemplate());
     IsMemberSpec = VarTemplate->isMemberSpecialization();
-    MultiLevelList.addOuterTemplateArguments(
-        VarTemplate, TemplateArgList.asArray(), /*Final=*/false);
+    MultiLevelList.addOuterTemplateArguments(VarTemplate, Converted,
+                                             /*Final=*/false);
   }
   if (!IsMemberSpec)
     FromVar = FromVar->getFirstDecl();

>From c7550af4e1ac480483745d42e6a6d29d9da56c6c Mon Sep 17 00:00:00 2001
From: lntue <35648136+lntue at users.noreply.github.com>
Date: Thu, 1 Feb 2024 11:57:52 -0500
Subject: [PATCH 33/42] [libc][NFC] Refactor FLAGS expansion using
 cmake_language(CALL ...). (#80156)

---
 libc/cmake/modules/LLVMLibCFlagRules.cmake    | 129 +++++++++
 libc/cmake/modules/LLVMLibCLibraryRules.cmake |  93 +-----
 libc/cmake/modules/LLVMLibCObjectRules.cmake  | 271 ++++--------------
 libc/cmake/modules/LLVMLibCTestRules.cmake    |  93 +-----
 4 files changed, 184 insertions(+), 402 deletions(-)
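
Note: the refactoring in this patch relies on `cmake_language(CALL ...)`, which invokes a command whose name is held in a variable. That is what lets a single `expand_flags_for_target` replace the per-target-type copies. A minimal sketch of the dispatch pattern follows, assuming CMake 3.18 or newer. The names `demo_create_target` and `demo_add_target` are hypothetical and are not part of the patch.

```cmake
# cmake_language(CALL ...) is available from CMake 3.18 onwards.
cmake_minimum_required(VERSION 3.18)

# Hypothetical creator function: it only reports what it was asked to build.
function(demo_create_target name)
  message(STATUS "create ${name} (args: ${ARGN})")
endfunction()

# Hypothetical dispatcher: the creator is selected by name at call time,
# mirroring the role of the CREATE_TARGET argument in the patch.
function(demo_add_target name)
  cmake_parse_arguments(
    "DEMO"
    ""              # Optional arguments
    "CREATE_TARGET" # Single-value arguments
    ""              # Multi-value arguments
    ${ARGN}
  )
  if(NOT DEMO_CREATE_TARGET)
    message(FATAL_ERROR "Missing CREATE_TARGET <function>")
  endif()
  # Forward the target name and every argument that was not consumed above.
  cmake_language(CALL ${DEMO_CREATE_TARGET} ${name} ${DEMO_UNPARSED_ARGUMENTS})
endfunction()

demo_add_target(foo CREATE_TARGET demo_create_target SRCS foo.cpp)
```

Saved as a standalone script, this can be run with `cmake -P`. The creator receives `foo` followed by the unparsed arguments, much as `create_header_library` and the other creators receive theirs in the patch.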

diff --git a/libc/cmake/modules/LLVMLibCFlagRules.cmake b/libc/cmake/modules/LLVMLibCFlagRules.cmake
index 9a48d38b02893..9bec716516f28 100644
--- a/libc/cmake/modules/LLVMLibCFlagRules.cmake
+++ b/libc/cmake/modules/LLVMLibCFlagRules.cmake
@@ -129,6 +129,135 @@ function(get_fq_dep_list_without_flag output_list flag)
   set(${output_list} ${fq_dep_no_flag_list} PARENT_SCOPE)
 endfunction(get_fq_dep_list_without_flag)
 
+# Check if a `flag` is set
+function(check_flag result flag_name)
+  list(FIND ARGN ${flag_name}_FLAG has_flag)
+  if(${has_flag} LESS 0)
+    list(FIND ARGN "${flag_name}_FLAG__ONLY" has_flag)
+  endif()
+  if(${has_flag} GREATER -1)
+    set(${result} TRUE PARENT_SCOPE)
+  else()
+    set(${result} FALSE PARENT_SCOPE)
+  endif()
+endfunction(check_flag)
+
+# Generate all flags' combinations and call the corresponding function provided
+# by `CREATE_TARGET` to create a target for each combination.
+function(expand_flags_for_target target_name flags)
+  cmake_parse_arguments(
+    "EXPAND_FLAGS"
+    "" # Optional arguments
+    "CREATE_TARGET" # Single-value arguments
+    "DEPENDS;FLAGS" # Multi-value arguments
+    ${ARGN}
+  )
+
+  list(LENGTH flags nflags)
+  if(NOT ${nflags})
+    cmake_language(CALL ${EXPAND_FLAGS_CREATE_TARGET}
+      ${target_name}
+      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
+      DEPENDS ${EXPAND_FLAGS_DEPENDS}
+      FLAGS ${EXPAND_FLAGS_FLAGS}
+    )
+    return()
+  endif()
+
+  list(GET flags 0 flag)
+  list(REMOVE_AT flags 0)
+  extract_flag_modifier(${flag} real_flag modifier)
+
+  if(NOT "${modifier}" STREQUAL "NO")
+    expand_flags_for_target(
+      ${target_name}
+      "${flags}"
+      DEPENDS ${EXPAND_FLAGS_DEPENDS}
+      FLAGS ${EXPAND_FLAGS_FLAGS}
+      CREATE_TARGET ${EXPAND_FLAGS_CREATE_TARGET}
+      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
+    )
+  endif()
+
+  if("${real_flag}" STREQUAL "" OR "${modifier}" STREQUAL "ONLY")
+    return()
+  endif()
+
+  set(NEW_FLAGS ${EXPAND_FLAGS_FLAGS})
+  list(REMOVE_ITEM NEW_FLAGS ${flag})
+  get_fq_dep_list_without_flag(NEW_DEPS ${real_flag} ${EXPAND_FLAGS_DEPENDS})
+
+  # Only target with `flag` has `.__NO_flag` target, `flag__NO` and
+  # `flag__ONLY` do not.
+  if("${modifier}" STREQUAL "")
+    set(TARGET_NAME "${target_name}.__NO_${flag}")
+  else()
+    set(TARGET_NAME "${target_name}")
+  endif()
+
+  expand_flags_for_target(
+    ${TARGET_NAME}
+    "${flags}"
+    DEPENDS ${NEW_DEPS}
+    FLAGS ${NEW_FLAGS}
+    CREATE_TARGET ${EXPAND_FLAGS_CREATE_TARGET}
+    ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
+  )
+endfunction(expand_flags_for_target)
+
+# Collect all flags from a target's dependency, and then forward to
+# `expand_flags_for_target` to generate all flags' combinations and call
+# the corresponding function provided by `CREATE_TARGET` to create a target for
+# each combination.
+function(add_target_with_flags target_name)
+  cmake_parse_arguments(
+    "ADD_TO_EXPAND"
+    "" # Optional arguments
+    "CREATE_TARGET;" # Single value arguments
+    "DEPENDS;FLAGS;ADD_FLAGS" # Multi-value arguments
+    ${ARGN}
+  )
+
+  if(NOT target_name)
+    message(FATAL_ERROR "Bad target name")
+  endif()
+
+  if(NOT ADD_TO_EXPAND_CREATE_TARGET)
+    message(FATAL_ERROR "Missing function to create targets.  Please specify "
+                        "`CREATE_TARGET <function>`")
+  endif()
+
+  get_fq_target_name(${target_name} fq_target_name)
+
+  if(ADD_TO_EXPAND_DEPENDS AND ("${SHOW_INTERMEDIATE_OBJECTS}" STREQUAL "DEPS"))
+    message(STATUS "Gathering FLAGS from dependencies for ${fq_target_name}")
+  endif()
+
+  get_fq_deps_list(fq_deps_list ${ADD_TO_EXPAND_DEPENDS})
+  get_flags_from_dep_list(deps_flag_list ${fq_deps_list})
+
+  # Appending ADD_FLAGS before flags from dependency.
+  if(ADD_TO_EXPAND_ADD_FLAGS)
+    list(APPEND ADD_TO_EXPAND_FLAGS ${ADD_TO_EXPAND_ADD_FLAGS})
+  endif()
+  list(APPEND ADD_TO_EXPAND_FLAGS ${deps_flag_list})
+  remove_duplicated_flags("${ADD_TO_EXPAND_FLAGS}" flags)
+  list(SORT flags)
+
+  if(SHOW_INTERMEDIATE_OBJECTS AND flags)
+    message(STATUS "Target ${fq_target_name} has FLAGS: ${flags}")
+  endif()
+
+  expand_flags_for_target(
+    ${fq_target_name}
+    "${flags}"
+    DEPENDS "${fq_deps_list}"
+    FLAGS "${flags}"
+    CREATE_TARGET ${ADD_TO_EXPAND_CREATE_TARGET}
+    ${ADD_TO_EXPAND_UNPARSED_ARGUMENTS}
+  )
+endfunction(add_target_with_flags)
+
 # Special flags
 set(FMA_OPT_FLAG "FMA_OPT")
 set(ROUND_OPT_FLAG "ROUND_OPT")
diff --git a/libc/cmake/modules/LLVMLibCLibraryRules.cmake b/libc/cmake/modules/LLVMLibCLibraryRules.cmake
index adb3bdeea2cb3..81c207ec23176 100644
--- a/libc/cmake/modules/LLVMLibCLibraryRules.cmake
+++ b/libc/cmake/modules/LLVMLibCLibraryRules.cmake
@@ -207,97 +207,10 @@ endfunction(create_header_library)
 #      FLAGS <list of flags>
 #    )
 
-# Internal function, used by `add_header_library`.
-function(expand_flags_for_header_library target_name flags)
-  cmake_parse_arguments(
-    "EXPAND_FLAGS"
-    "IGNORE_MARKER" # Optional arguments
-    "" # Single-value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
-    ${ARGN}
-  )
-
-  list(LENGTH flags nflags)
-  if(NOT ${nflags})
-    create_header_library(
-      ${target_name}
-      DEPENDS ${EXPAND_FLAGS_DEPENDS}
-      FLAGS ${EXPAND_FLAGS_FLAGS}
-      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
-    )
-    return()
-  endif()
-
-  list(GET flags 0 flag)
-  list(REMOVE_AT flags 0)
-  extract_flag_modifier(${flag} real_flag modifier)
-
-  if(NOT "${modifier}" STREQUAL "NO")
-    expand_flags_for_header_library(
-      ${target_name}
-      "${flags}"
-      DEPENDS ${EXPAND_FLAGS_DEPENDS} IGNORE_MARKER
-      FLAGS ${EXPAND_FLAGS_FLAGS} IGNORE_MARKER
-      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
-    )
-  endif()
-
-  if("${real_flag}" STREQUAL "" OR "${modifier}" STREQUAL "ONLY")
-    return()
-  endif()
-
-  set(NEW_FLAGS ${EXPAND_FLAGS_FLAGS})
-  list(REMOVE_ITEM NEW_FLAGS ${flag})
-  get_fq_dep_list_without_flag(NEW_DEPS ${real_flag} ${EXPAND_FLAGS_DEPENDS})
-
-  # Only target with `flag` has `.__NO_flag` target, `flag__NO` and
-  # `flag__ONLY` do not.
-  if("${modifier}" STREQUAL "")
-    set(TARGET_NAME "${target_name}.__NO_${flag}")
-  else()
-    set(TARGET_NAME "${target_name}")
-  endif()
-
-  expand_flags_for_header_library(
-    ${TARGET_NAME}
-    "${flags}"
-    DEPENDS ${NEW_DEPS} IGNORE_MARKER
-    FLAGS ${NEW_FLAGS} IGNORE_MARKER
-    ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
-  )
-endfunction(expand_flags_for_header_library)
-
 function(add_header_library target_name)
-  cmake_parse_arguments(
-    "ADD_TO_EXPAND"
-    "" # Optional arguments
-    "" # Single value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
+  add_target_with_flags(
+    ${target_name}
+    CREATE_TARGET create_header_library
     ${ARGN}
   )
-
-  get_fq_target_name(${target_name} fq_target_name)
-
-  if(ADD_TO_EXPAND_DEPENDS AND ("${SHOW_INTERMEDIATE_OBJECTS}" STREQUAL "DEPS"))
-    message(STATUS "Gathering FLAGS from dependencies for ${fq_target_name}")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ADD_TO_EXPAND_DEPENDS})
-  get_flags_from_dep_list(deps_flag_list ${fq_deps_list})
-  
-  list(APPEND ADD_TO_EXPAND_FLAGS ${deps_flag_list})
-  remove_duplicated_flags("${ADD_TO_EXPAND_FLAGS}" flags)
-  list(SORT flags)
-
-  if(SHOW_INTERMEDIATE_OBJECTS AND flags)
-    message(STATUS "Header library ${fq_target_name} has FLAGS: ${flags}")
-  endif()
-
-  expand_flags_for_header_library(
-    ${fq_target_name}
-    "${flags}"
-    DEPENDS ${fq_deps_list} IGNORE_MARKER
-    FLAGS ${flags} IGNORE_MARKER
-    ${ADD_TO_EXPAND_UNPARSED_ARGUMENTS}
-  )
 endfunction(add_header_library)
diff --git a/libc/cmake/modules/LLVMLibCObjectRules.cmake b/libc/cmake/modules/LLVMLibCObjectRules.cmake
index c5c9f7c09a144..4213fe15731f1 100644
--- a/libc/cmake/modules/LLVMLibCObjectRules.cmake
+++ b/libc/cmake/modules/LLVMLibCObjectRules.cmake
@@ -1,32 +1,46 @@
 set(OBJECT_LIBRARY_TARGET_TYPE "OBJECT_LIBRARY")
 
-function(_get_common_compile_options output_var flags)
-  list(FIND flags ${FMA_OPT_FLAG} fma)
-  if(${fma} LESS 0)
-    list(FIND flags "${FMA_OPT_FLAG}__ONLY" fma)
-  endif()
-  if((${fma} GREATER -1) AND (LIBC_TARGET_ARCHITECTURE_IS_RISCV64 OR
-                              (LIBC_CPU_FEATURES MATCHES "FMA")))
-    set(ADD_FMA_FLAG TRUE)
-  endif()
+function(_get_compile_options_from_flags output_var)
+  set(compile_options "")
 
-  list(FIND flags ${ROUND_OPT_FLAG} round)
-  if(${round} LESS 0)
-    list(FIND flags "${ROUND_OPT_FLAG}__ONLY" round)
+  if(LIBC_TARGET_ARCHITECTURE_IS_RISCV64 OR (LIBC_CPU_FEATURES MATCHES "FMA"))
+    check_flag(ADD_FMA_FLAG ${FMA_OPT_FLAG} ${flags})
   endif()
-  if((${round} GREATER -1) AND (LIBC_CPU_FEATURES MATCHES "SSE4_2"))
-    set(ADD_SSE4_2_FLAG TRUE)
+  check_flag(ADD_SSE4_2_FLAG ${ROUND_OPT_FLAG} ${flags})
+  check_flag(ADD_EXPLICIT_SIMD_OPT_FLAG ${EXPLICIT_SIMD_OPT_FLAG} ${flags})
+  
+  if(LLVM_COMPILER_IS_GCC_COMPATIBLE)
+    if(ADD_FMA_FLAG)
+      if(LIBC_TARGET_ARCHITECTURE_IS_X86)
+        list(APPEND compile_options "-mavx2")
+        list(APPEND compile_options "-mfma")
+      elseif(LIBC_TARGET_ARCHITECTURE_IS_RISCV64)
+        list(APPEND compile_options "-D__LIBC_RISCV_USE_FMA")
+      endif()
+    endif()
+    if(ADD_SSE4_2_FLAG)
+      list(APPEND compile_options "-msse4.2")
+    endif()
+    if(ADD_EXPLICIT_SIMD_OPT_FLAG)
+      list(APPEND compile_options "-D__LIBC_EXPLICIT_SIMD_OPT")
+    endif()
+  elseif(MSVC)
+    if(ADD_FMA_FLAG)
+      list(APPEND compile_options "/arch:AVX2")
+    endif()
+    if(ADD_EXPLICIT_SIMD_OPT_FLAG)
+      list(APPEND compile_options "/D__LIBC_EXPLICIT_SIMD_OPT")
+    endif()
   endif()
 
-  list(FIND flags ${EXPLICIT_SIMD_OPT_FLAG} explicit_simd)
-  if(${explicit_simd} LESS 0)
-    list(FIND flags "${EXPLICIT_SIMD_OPT_FLAG}__ONLY" explicit_simd)
-  endif()
-  if(${explicit_simd} GREATER -1)
-    set(ADD_EXPLICIT_SIMD_OPT_FLAG TRUE)
-  endif()
+  set(${output_var} ${compile_options} PARENT_SCOPE)
+endfunction(_get_compile_options_from_flags)
+
+function(_get_common_compile_options output_var flags)
+  _get_compile_options_from_flags(compile_flags ${flags})
+
+  set(compile_options ${LIBC_COMPILE_OPTIONS_DEFAULT} ${compile_flags})
 
-  set(compile_options ${LIBC_COMPILE_OPTIONS_DEFAULT})
   if(LLVM_COMPILER_IS_GCC_COMPATIBLE)
     list(APPEND compile_options "-fpie")
 
@@ -62,29 +76,9 @@ function(_get_common_compile_options output_var flags)
       list(APPEND compile_options "-Wthread-safety")
       list(APPEND compile_options "-Wglobal-constructors")
     endif()
-    if(ADD_FMA_FLAG)
-      if(LIBC_TARGET_ARCHITECTURE_IS_X86)
-        list(APPEND compile_options "-mavx2")
-        list(APPEND compile_options "-mfma")
-      elseif(LIBC_TARGET_ARCHITECTURE_IS_RISCV64)
-        list(APPEND compile_options "-D__LIBC_RISCV_USE_FMA")
-      endif()
-    endif()
-    if(ADD_SSE4_2_FLAG)
-      list(APPEND compile_options "-msse4.2")
-    endif()
-    if(ADD_EXPLICIT_SIMD_OPT_FLAG)
-      list(APPEND compile_options "-D__LIBC_EXPLICIT_SIMD_OPT")
-    endif()
   elseif(MSVC)
     list(APPEND compile_options "/EHs-c-")
     list(APPEND compile_options "/GR-")
-    if(ADD_FMA_FLAG)
-      list(APPEND compile_options "/arch:AVX2")
-    endif()
-    if(ADD_EXPLICIT_SIMD_OPT_FLAG)
-      list(APPEND compile_options "/D__LIBC_EXPLICIT_SIMD_OPT")
-    endif()
   endif()
   if (LIBC_TARGET_ARCHITECTURE_IS_GPU)
     list(APPEND compile_options "-nogpulib")
@@ -428,99 +422,11 @@ function(create_object_library fq_target_name)
   endif()
 endfunction(create_object_library)
 
-# Internal function, used by `add_object_library`.
-function(expand_flags_for_object_library target_name flags)
-  cmake_parse_arguments(
-    "EXPAND_FLAGS"
-    "IGNORE_MARKER" # Optional arguments
-    "" # Single-value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
-    ${ARGN}
-  )
-
-  list(LENGTH flags nflags)
-  if(NOT ${nflags})
-    create_object_library(
-      ${target_name}
-      DEPENDS ${EXPAND_FLAGS_DEPENDS}
-      FLAGS ${EXPAND_FLAGS_FLAGS}
-      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
-    )
-    return()
-  endif()
-
-  list(GET flags 0 flag)
-  list(REMOVE_AT flags 0)
-  extract_flag_modifier(${flag} real_flag modifier)
-
-  if(NOT "${modifier}" STREQUAL "NO")
-    expand_flags_for_object_library(
-      ${target_name}
-      "${flags}"
-      DEPENDS "${EXPAND_FLAGS_DEPENDS}" IGNORE_MARKER
-      FLAGS "${EXPAND_FLAGS_FLAGS}" IGNORE_MARKER
-      "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-    )
-  endif()
-
-  if("${real_flag}" STREQUAL "" OR "${modifier}" STREQUAL "ONLY")
-    return()
-  endif()
-
-  set(NEW_FLAGS ${EXPAND_FLAGS_FLAGS})
-  list(REMOVE_ITEM NEW_FLAGS ${flag})
-  get_fq_dep_list_without_flag(NEW_DEPS ${real_flag} ${EXPAND_FLAGS_DEPENDS})
-
-  # Only target with `flag` has `.__NO_flag` target, `flag__NO` and
-  # `flag__ONLY` do not.
-  if("${modifier}" STREQUAL "")
-    set(TARGET_NAME "${target_name}.__NO_${flag}")
-  else()
-    set(TARGET_NAME "${target_name}")
-  endif()
-
-  expand_flags_for_object_library(
-    ${TARGET_NAME}
-    "${flags}"
-    DEPENDS "${NEW_DEPS}" IGNORE_MARKER
-    FLAGS "${NEW_FLAGS}" IGNORE_MARKER
-    "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-  )
-endfunction(expand_flags_for_object_library)
-
 function(add_object_library target_name)
-  cmake_parse_arguments(
-    "ADD_TO_EXPAND"
-    "" # Optional arguments
-    "" # Single value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
-    ${ARGN}
-  )
-
-  get_fq_target_name(${target_name} fq_target_name)
-
-  if(ADD_TO_EXPAND_DEPENDS AND ("${SHOW_INTERMEDIATE_OBJECTS}" STREQUAL "DEPS"))
-    message(STATUS "Gathering FLAGS from dependencies for ${fq_target_name}")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ADD_TO_EXPAND_DEPENDS})
-  get_flags_from_dep_list(deps_flag_list ${fq_deps_list})
-
-  list(APPEND ADD_TO_EXPAND_FLAGS ${deps_flag_list})
-  remove_duplicated_flags("${ADD_TO_EXPAND_FLAGS}" flags)
-  list(SORT flags)
-
-  if(SHOW_INTERMEDIATE_OBJECTS AND flags)
-    message(STATUS "Object library ${fq_target_name} has FLAGS: ${flags}")
-  endif()
-
-  expand_flags_for_object_library(
-    ${fq_target_name}
-    "${flags}"
-    DEPENDS "${fq_deps_list}" IGNORE_MARKER
-    FLAGS "${flags}" IGNORE_MARKER
-    ${ADD_TO_EXPAND_UNPARSED_ARGUMENTS}
-  )
+  add_target_with_flags(
+    ${target_name}
+    CREATE_TARGET create_object_library
+    ${ARGN})
 endfunction(add_object_library)
 
 set(ENTRYPOINT_OBJ_TARGET_TYPE "ENTRYPOINT_OBJ")
@@ -790,103 +696,24 @@ function(create_entrypoint_object fq_target_name)
 
 endfunction(create_entrypoint_object)
 
-# Internal function, used by `add_entrypoint_object`.
-function(expand_flags_for_entrypoint_object target_name flags)
-  cmake_parse_arguments(
-    "EXPAND_FLAGS"
-    "IGNORE_MARKER" # Optional arguments
-    "" # Single-value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
-    ${ARGN}
-  )
-
-  list(LENGTH flags nflags)
-  if(NOT ${nflags})
-    create_entrypoint_object(
-      ${target_name}
-      DEPENDS ${EXPAND_FLAGS_DEPENDS}
-      FLAGS ${EXPAND_FLAGS_FLAGS}
-      ${EXPAND_FLAGS_UNPARSED_ARGUMENTS}
-    )
-    return()
-  endif()
-
-  list(GET flags 0 flag)
-  list(REMOVE_AT flags 0)
-  extract_flag_modifier(${flag} real_flag modifier)
-
-  if(NOT "${modifier}" STREQUAL "NO")
-    expand_flags_for_entrypoint_object(
-      ${target_name}
-      "${flags}"
-      DEPENDS "${EXPAND_FLAGS_DEPENDS}" IGNORE_MARKER
-      FLAGS "${EXPAND_FLAGS_FLAGS}" IGNORE_MARKER
-      "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-    )
-  endif()
-
-  if("${real_flag}" STREQUAL "" OR "${modifier}" STREQUAL "ONLY")
-    return()
-  endif()
-
-  set(NEW_FLAGS ${EXPAND_FLAGS_FLAGS})
-  list(REMOVE_ITEM NEW_FLAGS ${flag})
-  get_fq_dep_list_without_flag(NEW_DEPS ${real_flag} ${EXPAND_FLAGS_DEPENDS})
-
-  # Only target with `flag` has `.__NO_flag` target, `flag__NO` and
-  # `flag__ONLY` do not.
-  if("${modifier}" STREQUAL "")
-    set(TARGET_NAME "${target_name}.__NO_${flag}")
-  else()
-    set(TARGET_NAME "${target_name}")
-  endif()
-
-  expand_flags_for_entrypoint_object(
-    ${TARGET_NAME}
-    "${flags}"
-    DEPENDS "${NEW_DEPS}" IGNORE_MARKER
-    FLAGS "${NEW_FLAGS}" IGNORE_MARKER
-    "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-  )
-endfunction(expand_flags_for_entrypoint_object)
-
 function(add_entrypoint_object target_name)
   cmake_parse_arguments(
-    "ADD_TO_EXPAND"
+    "ADD_ENTRYPOINT_OBJ"
     "" # Optional arguments
     "NAME" # Single value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
+    "" # Multi-value arguments
     ${ARGN}
   )
 
-  get_fq_target_name(${target_name} fq_target_name)
-
-  if(ADD_TO_EXPAND_DEPENDS AND ("${SHOW_INTERMEDIATE_OBJECTS}" STREQUAL "DEPS"))
-    message(STATUS "Gathering FLAGS from dependencies for ${fq_target_name}")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ADD_TO_EXPAND_DEPENDS})
-  get_flags_from_dep_list(deps_flag_list ${fq_deps_list})
-
-  list(APPEND ADD_TO_EXPAND_FLAGS ${deps_flag_list})
-  remove_duplicated_flags("${ADD_TO_EXPAND_FLAGS}" flags)
-  list(SORT flags)
-
-  if(SHOW_INTERMEDIATE_OBJECTS AND flags)
-    message(STATUS "Entrypoint object ${fq_target_name} has FLAGS: ${flags}")
+  if(NOT ADD_ENTRYPOINT_OBJ_NAME)
+    set(ADD_ENTRYPOINT_OBJ_NAME ${target_name})
   endif()
 
-  if(NOT ADD_TO_EXPAND_NAME)
-    set(ADD_TO_EXPAND_NAME ${target_name})
-  endif()
-
-  expand_flags_for_entrypoint_object(
-    ${fq_target_name}
-    "${flags}"
-    NAME ${ADD_TO_EXPAND_NAME} IGNORE_MARKER
-    DEPENDS "${fq_deps_list}" IGNORE_MARKER
-    FLAGS "${flags}" IGNORE_MARKER
-    ${ADD_TO_EXPAND_UNPARSED_ARGUMENTS}
+  add_target_with_flags(
+    ${target_name}
+    NAME ${ADD_ENTRYPOINT_OBJ_NAME}
+    CREATE_TARGET create_entrypoint_object
+    ${ADD_ENTRYPOINT_OBJ_UNPARSED_ARGUMENTS}
   )
 endfunction(add_entrypoint_object)
 
diff --git a/libc/cmake/modules/LLVMLibCTestRules.cmake b/libc/cmake/modules/LLVMLibCTestRules.cmake
index 0d0a47b33aaeb..5b96c5e9f8c80 100644
--- a/libc/cmake/modules/LLVMLibCTestRules.cmake
+++ b/libc/cmake/modules/LLVMLibCTestRules.cmake
@@ -246,99 +246,12 @@ function(create_libc_unittest fq_target_name)
   add_dependencies(libc-unit-tests ${fq_target_name})
 endfunction(create_libc_unittest)
 
-# Internal function, used by `add_libc_unittest`.
-function(expand_flags_for_libc_unittest target_name flags)
-  cmake_parse_arguments(
-    "EXPAND_FLAGS"
-    "IGNORE_MARKER" # No Optional arguments
-    "" # No Single-value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
-    ${ARGN}
-  )
-
-  list(LENGTH flags nflags)
-  if(NOT ${nflags})
-    create_libc_unittest(
-      ${target_name}
-      DEPENDS "${EXPAND_FLAGS_DEPENDS}"
-      FLAGS "${EXPAND_FLAGS_FLAGS}"
-      "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-    )
-    return()
-  endif()
-
-  list(GET flags 0 flag)
-  list(REMOVE_AT flags 0)
-  extract_flag_modifier(${flag} real_flag modifier)
-
-  if(NOT "${modifier}" STREQUAL "NO")
-    expand_flags_for_libc_unittest(
-      ${target_name}
-      "${flags}"
-      DEPENDS "${EXPAND_FLAGS_DEPENDS}" IGNORE_MARKER
-      FLAGS "${EXPAND_FLAGS_FLAGS}" IGNORE_MARKER
-      "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-    )
-  endif()
-
-  if("${real_flag}" STREQUAL "" OR "${modifier}" STREQUAL "ONLY")
-    return()
-  endif()
-
-  set(NEW_FLAGS ${EXPAND_FLAGS_FLAGS})
-  list(REMOVE_ITEM NEW_FLAGS ${flag})
-  get_fq_dep_list_without_flag(NEW_DEPS ${real_flag} ${EXPAND_FLAGS_DEPENDS})
-
-  # Only target with `flag` has `.__NO_flag` target, `flag__NO` and
-  # `flag__ONLY` do not.
-  if("${modifier}" STREQUAL "")
-    set(TARGET_NAME "${target_name}.__NO_${flag}")
-  else()
-    set(TARGET_NAME "${target_name}")
-  endif()
-
-  expand_flags_for_libc_unittest(
-    ${TARGET_NAME}
-    "${flags}"
-    DEPENDS "${NEW_DEPS}" IGNORE_MARKER
-    FLAGS "${NEW_FLAGS}" IGNORE_MARKER
-    "${EXPAND_FLAGS_UNPARSED_ARGUMENTS}"
-  )
-endfunction(expand_flags_for_libc_unittest)
-
 function(add_libc_unittest target_name)
-  cmake_parse_arguments(
-    "ADD_TO_EXPAND"
-    "" # Optional arguments
-    "" # Single value arguments
-    "DEPENDS;FLAGS" # Multi-value arguments
+  add_target_with_flags(
+    ${target_name}
+    CREATE_TARGET create_libc_unittest
     ${ARGN}
   )
-
-  get_fq_target_name(${target_name} fq_target_name)
-
-  if(ADD_TO_EXPAND_DEPENDS AND ("${SHOW_INTERMEDIATE_OBJECTS}" STREQUAL "DEPS"))
-    message(STATUS "Gathering FLAGS from dependencies for ${fq_target_name}")
-  endif()
-
-  get_fq_deps_list(fq_deps_list ${ADD_TO_EXPAND_DEPENDS})
-  get_flags_from_dep_list(deps_flag_list ${fq_deps_list})
-  
-  list(APPEND ADD_TO_EXPAND_FLAGS ${deps_flag_list})
-  remove_duplicated_flags("${ADD_TO_EXPAND_FLAGS}" flags)
-  list(SORT flags)
-
-  if(SHOW_INTERMEDIATE_OBJECTS AND flags)
-    message(STATUS "Unit test ${fq_target_name} has FLAGS: ${flags}")
-  endif()
-
-  expand_flags_for_libc_unittest(
-    ${fq_target_name}
-    "${flags}"
-    DEPENDS ${fq_deps_list} IGNORE_MARKER
-    FLAGS ${flags} IGNORE_MARKER
-    ${ADD_TO_EXPAND_UNPARSED_ARGUMENTS}
-  )
 endfunction(add_libc_unittest)
 
 function(add_libc_exhaustive_testsuite suite_name)

>From f326d9282148d8133a1529dd5ea7ada29f12f970 Mon Sep 17 00:00:00 2001
From: Paschalis Mpeis <paschalis.mpeis at arm.com>
Date: Thu, 1 Feb 2024 17:08:30 +0000
Subject: [PATCH 34/42] [NFC] Reorder test lines in arith-fp-frem.ll (#79991)

RUN lines now appear in a more natural order:
- no veclib (neon, sve)
- neon + veclib
- sve + veclib
- sve + tailfold + veclib
---
 .../CostModel/AArch64/arith-fp-frem.ll        | 96 +++++++++----------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
index eb2db1596bef7..20e0ef7ea3428 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
@@ -2,20 +2,20 @@
 
 ; RUN: opt -mattr=+neon -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=NEON-NO-VECLIB
 
-; RUN: opt -mattr=+neon -vector-library=sleefgnuabi -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=NEON-SLEEF
-
-; RUN: opt -mattr=+neon -vector-library=ArmPL -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=NEON-ARMPL
-
 ; RUN: opt -mattr=+sve -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-NO-VECLIB
 
-; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-SLEEF
+; RUN: opt -mattr=+neon -vector-library=ArmPL -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=NEON-ARMPL
 
-; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-SLEEF-TAILFOLD
+; RUN: opt -mattr=+neon -vector-library=sleefgnuabi -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=NEON-SLEEF
 
 ; RUN: opt -mattr=+sve -vector-library=ArmPL -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-ARMPL
 
+; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -passes=loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-SLEEF
+
 ; RUN: opt -mattr=+sve -vector-library=ArmPL -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-ARMPL-TAILFOLD
 
+; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 | FileCheck %s --check-prefix=SVE-SLEEF-TAILFOLD
+
 ; REQUIRES: asserts
 
 target triple = "aarch64-unknown-linux-gnu"
@@ -25,31 +25,19 @@ define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; NEON-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
 ; NEON-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
 ;
-; NEON-SLEEF-LABEL: 'frem_f64'
-; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
-;
-; NEON-ARMPL-LABEL: 'frem_f64'
-; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
-;
 ; SVE-NO-VECLIB-LABEL: 'frem_f64'
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
-; SVE-SLEEF-LABEL: 'frem_f64'
-; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
+; NEON-ARMPL-LABEL: 'frem_f64'
+; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
 ;
-; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f64'
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
+; NEON-SLEEF-LABEL: 'frem_f64'
+; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-ARMPL-LABEL: 'frem_f64'
 ; SVE-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
@@ -57,11 +45,23 @@ define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; SVE-ARMPL:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
+; SVE-SLEEF-LABEL: 'frem_f64'
+; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
+;
 ; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f64'
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
+;
+; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f64'
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
   entry:
   br label %for.body
@@ -87,16 +87,6 @@ define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; NEON-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
 ; NEON-NO-VECLIB:  LV: Found an estimated cost of 20 for VF 4 For instruction: %res = frem float %in, %in
 ;
-; NEON-SLEEF-LABEL: 'frem_f32'
-; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
-;
-; NEON-ARMPL-LABEL: 'frem_f32'
-; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
-;
 ; SVE-NO-VECLIB-LABEL: 'frem_f32'
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
@@ -105,21 +95,15 @@ define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
-; SVE-SLEEF-LABEL: 'frem_f32'
-; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
+; NEON-ARMPL-LABEL: 'frem_f32'
+; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ;
-; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f32'
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
+; NEON-SLEEF-LABEL: 'frem_f32'
+; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-ARMPL-LABEL: 'frem_f32'
 ; SVE-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
@@ -129,6 +113,14 @@ define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; SVE-ARMPL:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
+; SVE-SLEEF-LABEL: 'frem_f32'
+; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
+;
 ; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f32'
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
@@ -136,6 +128,14 @@ define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
+;
+; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f32'
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
   entry:
   br label %for.body

>From 54065b9cd6b6a097fb770f50f218a6a93090f78c Mon Sep 17 00:00:00 2001
From: Alexey Bataev <a.bataev at outlook.com>
Date: Thu, 1 Feb 2024 08:56:36 -0800
Subject: [PATCH 35/42] [SLP][NFC]Add tests with strided loads, NFC.

---
 .../SLPVectorizer/RISCV/complex-loads.ll      | 512 ++++++++++++++++++
 .../RISCV/strided-loads-vectorized.ll         | 482 +++++++++++++++++
 .../strided-loads-with-external-indices.ll    |  39 ++
 .../strided-loads-with-external-use-ptr.ll    |  37 ++
 .../RISCV/strided-unsupported-type.ll         |  46 ++
 5 files changed, 1116 insertions(+)
 create mode 100644 llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
 create mode 100644 llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-vectorized.ll
 create mode 100644 llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-indices.ll
 create mode 100644 llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-use-ptr.ll
 create mode 100644 llvm/test/Transforms/SLPVectorizer/RISCV/strided-unsupported-type.ll

diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
new file mode 100644
index 0000000000000..ccc31193c7215
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
@@ -0,0 +1,512 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -S -mtriple riscv64-unknown-linux-gnu < %s --passes=slp-vectorizer -mattr=+v -slp-threshold=-80 | FileCheck %s
+
+define i32 @test(ptr %pix1, ptr %pix2, i64 %idx.ext, i64 %idx.ext63, ptr %add.ptr, ptr %add.ptr64) {
+; CHECK-LABEL: define i32 @test(
+; CHECK-SAME: ptr [[PIX1:%.*]], ptr [[PIX2:%.*]], i64 [[IDX_EXT:%.*]], i64 [[IDX_EXT63:%.*]], ptr [[ADD_PTR:%.*]], ptr [[ADD_PTR64:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i8>, ptr [[PIX1]], align 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i8>, ptr [[PIX2]], align 1
+; CHECK-NEXT:    [[ADD_PTR3:%.*]] = getelementptr i8, ptr [[PIX1]], i64 [[IDX_EXT]]
+; CHECK-NEXT:    [[ADD_PTR644:%.*]] = getelementptr i8, ptr [[PIX2]], i64 [[IDX_EXT63]]
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i8>, ptr [[ADD_PTR3]], align 1
+; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i8>, ptr [[ADD_PTR644]], align 1
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <8 x ptr> poison, ptr [[PIX1]], i32 1
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <8 x ptr> [[TMP4]], ptr [[ADD_PTR3]], i32 0
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <8 x ptr> [[TMP5]], <8 x ptr> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr i8, <8 x ptr> [[TMP6]], <8 x i64> <i64 4, i64 5, i64 6, i64 7, i64 4, i64 5, i64 6, i64 7>
+; CHECK-NEXT:    [[TMP8:%.*]] = call <8 x i8> @llvm.masked.gather.v8i8.v8p0(<8 x ptr> [[TMP7]], i32 1, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i8> poison)
+; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <8 x ptr> poison, ptr [[PIX2]], i32 1
+; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <8 x ptr> [[TMP9]], ptr [[ADD_PTR644]], i32 0
+; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <8 x ptr> [[TMP10]], <8 x ptr> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr i8, <8 x ptr> [[TMP11]], <8 x i64> <i64 4, i64 5, i64 6, i64 7, i64 4, i64 5, i64 6, i64 7>
+; CHECK-NEXT:    [[TMP13:%.*]] = call <8 x i8> @llvm.masked.gather.v8i8.v8p0(<8 x ptr> [[TMP12]], i32 1, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i8> poison)
+; CHECK-NEXT:    [[ADD_PTR_1:%.*]] = getelementptr i8, ptr [[ADD_PTR]], i64 [[IDX_EXT]]
+; CHECK-NEXT:    [[ADD_PTR64_1:%.*]] = getelementptr i8, ptr [[ADD_PTR64]], i64 [[IDX_EXT63]]
+; CHECK-NEXT:    [[ARRAYIDX3_2:%.*]] = getelementptr i8, ptr [[ADD_PTR_1]], i64 4
+; CHECK-NEXT:    [[ARRAYIDX5_2:%.*]] = getelementptr i8, ptr [[ADD_PTR64_1]], i64 4
+; CHECK-NEXT:    [[TMP14:%.*]] = load <4 x i8>, ptr [[ADD_PTR_1]], align 1
+; CHECK-NEXT:    [[TMP15:%.*]] = load <4 x i8>, ptr [[ADD_PTR64_1]], align 1
+; CHECK-NEXT:    [[TMP16:%.*]] = load <4 x i8>, ptr [[ARRAYIDX3_2]], align 1
+; CHECK-NEXT:    [[TMP17:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_2]], align 1
+; CHECK-NEXT:    [[ARRAYIDX3_3:%.*]] = getelementptr i8, ptr null, i64 4
+; CHECK-NEXT:    [[ARRAYIDX5_3:%.*]] = getelementptr i8, ptr null, i64 4
+; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <2 x ptr> <ptr null, ptr poison>, ptr [[ARRAYIDX3_3]], i32 1
+; CHECK-NEXT:    [[TMP19:%.*]] = call <2 x i8> @llvm.masked.gather.v2i8.v2p0(<2 x ptr> [[TMP18]], i32 1, <2 x i1> <i1 true, i1 true>, <2 x i8> poison)
+; CHECK-NEXT:    [[TMP20:%.*]] = load i8, ptr null, align 1
+; CHECK-NEXT:    [[TMP21:%.*]] = load <4 x i8>, ptr null, align 1
+; CHECK-NEXT:    [[TMP22:%.*]] = load <4 x i8>, ptr null, align 1
+; CHECK-NEXT:    [[TMP23:%.*]] = load i8, ptr null, align 1
+; CHECK-NEXT:    [[TMP24:%.*]] = load <4 x i8>, ptr [[ARRAYIDX5_3]], align 1
+; CHECK-NEXT:    [[TMP25:%.*]] = shufflevector <4 x i8> [[TMP21]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP26:%.*]] = shufflevector <4 x i8> [[TMP14]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP27:%.*]] = shufflevector <16 x i8> [[TMP25]], <16 x i8> [[TMP26]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP28:%.*]] = shufflevector <4 x i8> [[TMP2]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP29:%.*]] = shufflevector <16 x i8> [[TMP27]], <16 x i8> [[TMP28]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP30:%.*]] = shufflevector <4 x i8> [[TMP0]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP31:%.*]] = shufflevector <16 x i8> [[TMP29]], <16 x i8> [[TMP30]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
+; CHECK-NEXT:    [[TMP32:%.*]] = zext <16 x i8> [[TMP31]] to <16 x i32>
+; CHECK-NEXT:    [[TMP33:%.*]] = shufflevector <4 x i8> [[TMP22]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP34:%.*]] = shufflevector <4 x i8> [[TMP15]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP35:%.*]] = shufflevector <16 x i8> [[TMP33]], <16 x i8> [[TMP34]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP36:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP37:%.*]] = shufflevector <16 x i8> [[TMP35]], <16 x i8> [[TMP36]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP38:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP39:%.*]] = shufflevector <16 x i8> [[TMP37]], <16 x i8> [[TMP38]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
+; CHECK-NEXT:    [[TMP40:%.*]] = zext <16 x i8> [[TMP39]] to <16 x i32>
+; CHECK-NEXT:    [[TMP41:%.*]] = sub <16 x i32> [[TMP32]], [[TMP40]]
+; CHECK-NEXT:    [[TMP42:%.*]] = insertelement <16 x i8> poison, i8 [[TMP23]], i32 0
+; CHECK-NEXT:    [[TMP43:%.*]] = insertelement <16 x i8> [[TMP42]], i8 [[TMP20]], i32 1
+; CHECK-NEXT:    [[TMP44:%.*]] = shufflevector <2 x i8> [[TMP19]], <2 x i8> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP45:%.*]] = shufflevector <16 x i8> [[TMP43]], <16 x i8> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP46:%.*]] = shufflevector <4 x i8> [[TMP16]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP47:%.*]] = shufflevector <16 x i8> [[TMP45]], <16 x i8> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP48:%.*]] = shufflevector <8 x i8> [[TMP8]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP49:%.*]] = shufflevector <16 x i8> [[TMP47]], <16 x i8> [[TMP48]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+; CHECK-NEXT:    [[TMP50:%.*]] = zext <16 x i8> [[TMP49]] to <16 x i32>
+; CHECK-NEXT:    [[TMP51:%.*]] = shufflevector <16 x i32> [[TMP50]], <16 x i32> poison, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP52:%.*]] = shufflevector <4 x i8> [[TMP24]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP53:%.*]] = shufflevector <4 x i8> [[TMP17]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP54:%.*]] = shufflevector <16 x i8> [[TMP52]], <16 x i8> [[TMP53]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP55:%.*]] = shufflevector <8 x i8> [[TMP13]], <8 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP56:%.*]] = shufflevector <16 x i8> [[TMP54]], <16 x i8> [[TMP55]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+; CHECK-NEXT:    [[TMP57:%.*]] = zext <16 x i8> [[TMP56]] to <16 x i32>
+; CHECK-NEXT:    [[TMP58:%.*]] = sub <16 x i32> [[TMP51]], [[TMP57]]
+; CHECK-NEXT:    [[TMP59:%.*]] = shl <16 x i32> [[TMP58]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
+; CHECK-NEXT:    [[TMP60:%.*]] = add <16 x i32> [[TMP59]], [[TMP41]]
+; CHECK-NEXT:    [[TMP61:%.*]] = shufflevector <16 x i32> [[TMP60]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
+; CHECK-NEXT:    [[TMP62:%.*]] = add <16 x i32> [[TMP60]], [[TMP61]]
+; CHECK-NEXT:    [[TMP63:%.*]] = sub <16 x i32> [[TMP60]], [[TMP61]]
+; CHECK-NEXT:    [[TMP64:%.*]] = shufflevector <16 x i32> [[TMP62]], <16 x i32> [[TMP63]], <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 22, i32 18, i32 26, i32 30, i32 5, i32 1, i32 9, i32 13, i32 20, i32 16, i32 24, i32 28>
+; CHECK-NEXT:    [[TMP65:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> poison, <16 x i32> <i32 9, i32 8, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 1, i32 0, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP66:%.*]] = add <16 x i32> [[TMP64]], [[TMP65]]
+; CHECK-NEXT:    [[TMP67:%.*]] = sub <16 x i32> [[TMP64]], [[TMP65]]
+; CHECK-NEXT:    [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP66]], <16 x i32> [[TMP67]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+; CHECK-NEXT:    [[TMP69:%.*]] = shufflevector <16 x i32> [[TMP68]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
+; CHECK-NEXT:    [[TMP70:%.*]] = add <16 x i32> [[TMP68]], [[TMP69]]
+; CHECK-NEXT:    [[TMP71:%.*]] = sub <16 x i32> [[TMP68]], [[TMP69]]
+; CHECK-NEXT:    [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 20, i32 5, i32 6, i32 23, i32 24, i32 9, i32 10, i32 27, i32 28, i32 13, i32 14, i32 31>
+; CHECK-NEXT:    [[TMP73:%.*]] = shufflevector <16 x i32> [[TMP72]], <16 x i32> poison, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
+; CHECK-NEXT:    [[TMP74:%.*]] = add <16 x i32> [[TMP72]], [[TMP73]]
+; CHECK-NEXT:    [[TMP75:%.*]] = sub <16 x i32> [[TMP72]], [[TMP73]]
+; CHECK-NEXT:    [[TMP76:%.*]] = shufflevector <16 x i32> [[TMP74]], <16 x i32> [[TMP75]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
+; CHECK-NEXT:    [[TMP77:%.*]] = shufflevector <16 x i32> [[TMP32]], <16 x i32> [[TMP64]], <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 10, i32 27, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP78:%.*]] = lshr <16 x i32> [[TMP77]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
+; CHECK-NEXT:    [[TMP79:%.*]] = and <16 x i32> [[TMP78]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
+; CHECK-NEXT:    [[TMP80:%.*]] = mul <16 x i32> [[TMP79]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
+; CHECK-NEXT:    [[TMP81:%.*]] = add <16 x i32> [[TMP80]], [[TMP76]]
+; CHECK-NEXT:    [[TMP82:%.*]] = xor <16 x i32> [[TMP81]], [[TMP77]]
+; CHECK-NEXT:    [[TMP83:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP82]])
+; CHECK-NEXT:    ret i32 [[TMP83]]
+;
+entry:
+  %0 = load i8, ptr %pix1, align 1
+  %conv = zext i8 %0 to i32
+  %1 = load i8, ptr %pix2, align 1
+  %conv2 = zext i8 %1 to i32
+  %sub = sub i32 %conv, %conv2
+  %arrayidx3 = getelementptr i8, ptr %pix1, i64 4
+  %2 = load i8, ptr %arrayidx3, align 1
+  %conv4 = zext i8 %2 to i32
+  %arrayidx5 = getelementptr i8, ptr %pix2, i64 4
+  %3 = load i8, ptr %arrayidx5, align 1
+  %conv6 = zext i8 %3 to i32
+  %sub7 = sub i32 %conv4, %conv6
+  %shl = shl i32 %sub7, 16
+  %add = add i32 %shl, %sub
+  %arrayidx8 = getelementptr i8, ptr %pix1, i64 1
+  %4 = load i8, ptr %arrayidx8, align 1
+  %conv9 = zext i8 %4 to i32
+  %arrayidx10 = getelementptr i8, ptr %pix2, i64 1
+  %5 = load i8, ptr %arrayidx10, align 1
+  %conv11 = zext i8 %5 to i32
+  %sub12 = sub i32 %conv9, %conv11
+  %arrayidx13 = getelementptr i8, ptr %pix1, i64 5
+  %6 = load i8, ptr %arrayidx13, align 1
+  %conv14 = zext i8 %6 to i32
+  %arrayidx15 = getelementptr i8, ptr %pix2, i64 5
+  %7 = load i8, ptr %arrayidx15, align 1
+  %conv16 = zext i8 %7 to i32
+  %sub17 = sub i32 %conv14, %conv16
+  %shl18 = shl i32 %sub17, 16
+  %add19 = add i32 %shl18, %sub12
+  %arrayidx20 = getelementptr i8, ptr %pix1, i64 2
+  %8 = load i8, ptr %arrayidx20, align 1
+  %conv21 = zext i8 %8 to i32
+  %arrayidx22 = getelementptr i8, ptr %pix2, i64 2
+  %9 = load i8, ptr %arrayidx22, align 1
+  %conv23 = zext i8 %9 to i32
+  %sub24 = sub i32 %conv21, %conv23
+  %arrayidx25 = getelementptr i8, ptr %pix1, i64 6
+  %10 = load i8, ptr %arrayidx25, align 1
+  %conv26 = zext i8 %10 to i32
+  %arrayidx27 = getelementptr i8, ptr %pix2, i64 6
+  %11 = load i8, ptr %arrayidx27, align 1
+  %conv28 = zext i8 %11 to i32
+  %sub29 = sub i32 %conv26, %conv28
+  %shl30 = shl i32 %sub29, 16
+  %add31 = add i32 %shl30, %sub24
+  %arrayidx32 = getelementptr i8, ptr %pix1, i64 3
+  %12 = load i8, ptr %arrayidx32, align 1
+  %conv33 = zext i8 %12 to i32
+  %arrayidx34 = getelementptr i8, ptr %pix2, i64 3
+  %13 = load i8, ptr %arrayidx34, align 1
+  %conv35 = zext i8 %13 to i32
+  %sub36 = sub i32 %conv33, %conv35
+  %arrayidx37 = getelementptr i8, ptr %pix1, i64 7
+  %14 = load i8, ptr %arrayidx37, align 1
+  %conv38 = zext i8 %14 to i32
+  %arrayidx39 = getelementptr i8, ptr %pix2, i64 7
+  %15 = load i8, ptr %arrayidx39, align 1
+  %conv40 = zext i8 %15 to i32
+  %sub41 = sub i32 %conv38, %conv40
+  %shl42 = shl i32 %sub41, 16
+  %add43 = add i32 %shl42, %sub36
+  %add44 = add i32 %add19, %add
+  %sub45 = sub i32 %add, %add19
+  %add46 = add i32 %add43, %add31
+  %sub47 = sub i32 %add31, %add43
+  %add48 = add i32 %add46, %add44
+  %sub51 = sub i32 %add44, %add46
+  %add55 = add i32 %sub47, %sub45
+  %sub59 = sub i32 %sub45, %sub47
+  %add.ptr3 = getelementptr i8, ptr %pix1, i64 %idx.ext
+  %add.ptr644 = getelementptr i8, ptr %pix2, i64 %idx.ext63
+  %16 = load i8, ptr %add.ptr3, align 1
+  %conv.1 = zext i8 %16 to i32
+  %17 = load i8, ptr %add.ptr644, align 1
+  %conv2.1 = zext i8 %17 to i32
+  %sub.1 = sub i32 %conv.1, %conv2.1
+  %arrayidx3.1 = getelementptr i8, ptr %add.ptr3, i64 4
+  %18 = load i8, ptr %arrayidx3.1, align 1
+  %conv4.1 = zext i8 %18 to i32
+  %arrayidx5.1 = getelementptr i8, ptr %add.ptr644, i64 4
+  %19 = load i8, ptr %arrayidx5.1, align 1
+  %conv6.1 = zext i8 %19 to i32
+  %sub7.1 = sub i32 %conv4.1, %conv6.1
+  %shl.1 = shl i32 %sub7.1, 16
+  %add.1 = add i32 %shl.1, %sub.1
+  %arrayidx8.1 = getelementptr i8, ptr %add.ptr3, i64 1
+  %20 = load i8, ptr %arrayidx8.1, align 1
+  %conv9.1 = zext i8 %20 to i32
+  %arrayidx10.1 = getelementptr i8, ptr %add.ptr644, i64 1
+  %21 = load i8, ptr %arrayidx10.1, align 1
+  %conv11.1 = zext i8 %21 to i32
+  %sub12.1 = sub i32 %conv9.1, %conv11.1
+  %arrayidx13.1 = getelementptr i8, ptr %add.ptr3, i64 5
+  %22 = load i8, ptr %arrayidx13.1, align 1
+  %conv14.1 = zext i8 %22 to i32
+  %arrayidx15.1 = getelementptr i8, ptr %add.ptr644, i64 5
+  %23 = load i8, ptr %arrayidx15.1, align 1
+  %conv16.1 = zext i8 %23 to i32
+  %sub17.1 = sub i32 %conv14.1, %conv16.1
+  %shl18.1 = shl i32 %sub17.1, 16
+  %add19.1 = add i32 %shl18.1, %sub12.1
+  %arrayidx20.1 = getelementptr i8, ptr %add.ptr3, i64 2
+  %24 = load i8, ptr %arrayidx20.1, align 1
+  %conv21.1 = zext i8 %24 to i32
+  %arrayidx22.1 = getelementptr i8, ptr %add.ptr644, i64 2
+  %25 = load i8, ptr %arrayidx22.1, align 1
+  %conv23.1 = zext i8 %25 to i32
+  %sub24.1 = sub i32 %conv21.1, %conv23.1
+  %arrayidx25.1 = getelementptr i8, ptr %add.ptr3, i64 6
+  %26 = load i8, ptr %arrayidx25.1, align 1
+  %conv26.1 = zext i8 %26 to i32
+  %arrayidx27.1 = getelementptr i8, ptr %add.ptr644, i64 6
+  %27 = load i8, ptr %arrayidx27.1, align 1
+  %conv28.1 = zext i8 %27 to i32
+  %sub29.1 = sub i32 %conv26.1, %conv28.1
+  %shl30.1 = shl i32 %sub29.1, 16
+  %add31.1 = add i32 %shl30.1, %sub24.1
+  %arrayidx32.1 = getelementptr i8, ptr %add.ptr3, i64 3
+  %28 = load i8, ptr %arrayidx32.1, align 1
+  %conv33.1 = zext i8 %28 to i32
+  %arrayidx34.1 = getelementptr i8, ptr %add.ptr644, i64 3
+  %29 = load i8, ptr %arrayidx34.1, align 1
+  %conv35.1 = zext i8 %29 to i32
+  %sub36.1 = sub i32 %conv33.1, %conv35.1
+  %arrayidx37.1 = getelementptr i8, ptr %add.ptr3, i64 7
+  %30 = load i8, ptr %arrayidx37.1, align 1
+  %conv38.1 = zext i8 %30 to i32
+  %arrayidx39.1 = getelementptr i8, ptr %add.ptr644, i64 7
+  %31 = load i8, ptr %arrayidx39.1, align 1
+  %conv40.1 = zext i8 %31 to i32
+  %sub41.1 = sub i32 %conv38.1, %conv40.1
+  %shl42.1 = shl i32 %sub41.1, 16
+  %add43.1 = add i32 %shl42.1, %sub36.1
+  %add44.1 = add i32 %add19.1, %add.1
+  %sub45.1 = sub i32 %add.1, %add19.1
+  %add46.1 = add i32 %add43.1, %add31.1
+  %sub47.1 = sub i32 %add31.1, %add43.1
+  %add48.1 = add i32 %add46.1, %add44.1
+  %sub51.1 = sub i32 %add44.1, %add46.1
+  %add55.1 = add i32 %sub47.1, %sub45.1
+  %sub59.1 = sub i32 %sub45.1, %sub47.1
+  %add.ptr.1 = getelementptr i8, ptr %add.ptr, i64 %idx.ext
+  %add.ptr64.1 = getelementptr i8, ptr %add.ptr64, i64 %idx.ext63
+  %32 = load i8, ptr %add.ptr.1, align 1
+  %conv.2 = zext i8 %32 to i32
+  %33 = load i8, ptr %add.ptr64.1, align 1
+  %conv2.2 = zext i8 %33 to i32
+  %sub.2 = sub i32 %conv.2, %conv2.2
+  %arrayidx3.2 = getelementptr i8, ptr %add.ptr.1, i64 4
+  %34 = load i8, ptr %arrayidx3.2, align 1
+  %conv4.2 = zext i8 %34 to i32
+  %arrayidx5.2 = getelementptr i8, ptr %add.ptr64.1, i64 4
+  %35 = load i8, ptr %arrayidx5.2, align 1
+  %conv6.2 = zext i8 %35 to i32
+  %sub7.2 = sub i32 %conv4.2, %conv6.2
+  %shl.2 = shl i32 %sub7.2, 16
+  %add.2 = add i32 %shl.2, %sub.2
+  %arrayidx8.2 = getelementptr i8, ptr %add.ptr.1, i64 1
+  %36 = load i8, ptr %arrayidx8.2, align 1
+  %conv9.2 = zext i8 %36 to i32
+  %arrayidx10.2 = getelementptr i8, ptr %add.ptr64.1, i64 1
+  %37 = load i8, ptr %arrayidx10.2, align 1
+  %conv11.2 = zext i8 %37 to i32
+  %sub12.2 = sub i32 %conv9.2, %conv11.2
+  %arrayidx13.2 = getelementptr i8, ptr %add.ptr.1, i64 5
+  %38 = load i8, ptr %arrayidx13.2, align 1
+  %conv14.2 = zext i8 %38 to i32
+  %arrayidx15.2 = getelementptr i8, ptr %add.ptr64.1, i64 5
+  %39 = load i8, ptr %arrayidx15.2, align 1
+  %conv16.2 = zext i8 %39 to i32
+  %sub17.2 = sub i32 %conv14.2, %conv16.2
+  %shl18.2 = shl i32 %sub17.2, 16
+  %add19.2 = add i32 %shl18.2, %sub12.2
+  %arrayidx20.2 = getelementptr i8, ptr %add.ptr.1, i64 2
+  %40 = load i8, ptr %arrayidx20.2, align 1
+  %conv21.2 = zext i8 %40 to i32
+  %arrayidx22.2 = getelementptr i8, ptr %add.ptr64.1, i64 2
+  %41 = load i8, ptr %arrayidx22.2, align 1
+  %conv23.2 = zext i8 %41 to i32
+  %sub24.2 = sub i32 %conv21.2, %conv23.2
+  %arrayidx25.2 = getelementptr i8, ptr %add.ptr.1, i64 6
+  %42 = load i8, ptr %arrayidx25.2, align 1
+  %conv26.2 = zext i8 %42 to i32
+  %arrayidx27.2 = getelementptr i8, ptr %add.ptr64.1, i64 6
+  %43 = load i8, ptr %arrayidx27.2, align 1
+  %conv28.2 = zext i8 %43 to i32
+  %sub29.2 = sub i32 %conv26.2, %conv28.2
+  %shl30.2 = shl i32 %sub29.2, 16
+  %add31.2 = add i32 %shl30.2, %sub24.2
+  %arrayidx32.2 = getelementptr i8, ptr %add.ptr.1, i64 3
+  %44 = load i8, ptr %arrayidx32.2, align 1
+  %conv33.2 = zext i8 %44 to i32
+  %arrayidx34.2 = getelementptr i8, ptr %add.ptr64.1, i64 3
+  %45 = load i8, ptr %arrayidx34.2, align 1
+  %conv35.2 = zext i8 %45 to i32
+  %sub36.2 = sub i32 %conv33.2, %conv35.2
+  %arrayidx37.2 = getelementptr i8, ptr %add.ptr.1, i64 7
+  %46 = load i8, ptr %arrayidx37.2, align 1
+  %conv38.2 = zext i8 %46 to i32
+  %arrayidx39.2 = getelementptr i8, ptr %add.ptr64.1, i64 7
+  %47 = load i8, ptr %arrayidx39.2, align 1
+  %conv40.2 = zext i8 %47 to i32
+  %sub41.2 = sub i32 %conv38.2, %conv40.2
+  %shl42.2 = shl i32 %sub41.2, 16
+  %add43.2 = add i32 %shl42.2, %sub36.2
+  %add44.2 = add i32 %add19.2, %add.2
+  %sub45.2 = sub i32 %add.2, %add19.2
+  %add46.2 = add i32 %add43.2, %add31.2
+  %sub47.2 = sub i32 %add31.2, %add43.2
+  %add48.2 = add i32 %add46.2, %add44.2
+  %sub51.2 = sub i32 %add44.2, %add46.2
+  %add55.2 = add i32 %sub47.2, %sub45.2
+  %sub59.2 = sub i32 %sub45.2, %sub47.2
+  %48 = load i8, ptr null, align 1
+  %conv.3 = zext i8 %48 to i32
+  %49 = load i8, ptr null, align 1
+  %conv2.3 = zext i8 %49 to i32
+  %sub.3 = sub i32 %conv.3, %conv2.3
+  %arrayidx3.3 = getelementptr i8, ptr null, i64 4
+  %50 = load i8, ptr %arrayidx3.3, align 1
+  %conv4.3 = zext i8 %50 to i32
+  %arrayidx5.3 = getelementptr i8, ptr null, i64 4
+  %51 = load i8, ptr %arrayidx5.3, align 1
+  %conv6.3 = zext i8 %51 to i32
+  %sub7.3 = sub i32 %conv4.3, %conv6.3
+  %shl.3 = shl i32 %sub7.3, 16
+  %add.3 = add i32 %shl.3, %sub.3
+  %arrayidx8.3 = getelementptr i8, ptr null, i64 1
+  %52 = load i8, ptr %arrayidx8.3, align 1
+  %conv9.3 = zext i8 %52 to i32
+  %arrayidx10.3 = getelementptr i8, ptr null, i64 1
+  %53 = load i8, ptr %arrayidx10.3, align 1
+  %conv11.3 = zext i8 %53 to i32
+  %sub12.3 = sub i32 %conv9.3, %conv11.3
+  %54 = load i8, ptr null, align 1
+  %conv14.3 = zext i8 %54 to i32
+  %arrayidx15.3 = getelementptr i8, ptr null, i64 5
+  %55 = load i8, ptr %arrayidx15.3, align 1
+  %conv16.3 = zext i8 %55 to i32
+  %sub17.3 = sub i32 %conv14.3, %conv16.3
+  %shl18.3 = shl i32 %sub17.3, 16
+  %add19.3 = add i32 %shl18.3, %sub12.3
+  %arrayidx20.3 = getelementptr i8, ptr null, i64 2
+  %56 = load i8, ptr %arrayidx20.3, align 1
+  %conv21.3 = zext i8 %56 to i32
+  %arrayidx22.3 = getelementptr i8, ptr null, i64 2
+  %57 = load i8, ptr %arrayidx22.3, align 1
+  %conv23.3 = zext i8 %57 to i32
+  %sub24.3 = sub i32 %conv21.3, %conv23.3
+  %58 = load i8, ptr null, align 1
+  %conv26.3 = zext i8 %58 to i32
+  %arrayidx27.3 = getelementptr i8, ptr null, i64 6
+  %59 = load i8, ptr %arrayidx27.3, align 1
+  %conv28.3 = zext i8 %59 to i32
+  %sub29.3 = sub i32 %conv26.3, %conv28.3
+  %shl30.3 = shl i32 %sub29.3, 16
+  %add31.3 = add i32 %shl30.3, %sub24.3
+  %arrayidx32.3 = getelementptr i8, ptr null, i64 3
+  %60 = load i8, ptr %arrayidx32.3, align 1
+  %conv33.3 = zext i8 %60 to i32
+  %arrayidx34.3 = getelementptr i8, ptr null, i64 3
+  %61 = load i8, ptr %arrayidx34.3, align 1
+  %conv35.3 = zext i8 %61 to i32
+  %sub36.3 = sub i32 %conv33.3, %conv35.3
+  %62 = load i8, ptr null, align 1
+  %conv38.3 = zext i8 %62 to i32
+  %arrayidx39.3 = getelementptr i8, ptr null, i64 7
+  %63 = load i8, ptr %arrayidx39.3, align 1
+  %conv40.3 = zext i8 %63 to i32
+  %sub41.3 = sub i32 %conv38.3, %conv40.3
+  %shl42.3 = shl i32 %sub41.3, 16
+  %add43.3 = add i32 %shl42.3, %sub36.3
+  %add44.3 = add i32 %add19.3, %add.3
+  %sub45.3 = sub i32 %add.3, %add19.3
+  %add46.3 = add i32 %add43.3, %add31.3
+  %sub47.3 = sub i32 %add31.3, %add43.3
+  %add48.3 = add i32 %add46.3, %add44.3
+  %sub51.3 = sub i32 %add44.3, %add46.3
+  %add55.3 = add i32 %sub47.3, %sub45.3
+  %sub59.3 = sub i32 %sub45.3, %sub47.3
+  %add78 = add i32 %add48.1, %add48
+  %sub86 = sub i32 %add48, %add48.1
+  %add94 = add i32 %add48.3, %add48.2
+  %sub102 = sub i32 %add48.2, %add48.3
+  %add103 = add i32 %add94, %add78
+  %sub104 = sub i32 %add78, %add94
+  %add105 = add i32 %sub102, %sub86
+  %sub106 = sub i32 %sub86, %sub102
+  %shr.i = lshr i32 %conv.3, 15
+  %and.i = and i32 %shr.i, 65537
+  %mul.i = mul i32 %and.i, 65535
+  %add.i = add i32 %mul.i, %add103
+  %xor.i = xor i32 %add.i, %conv.3
+  %shr.i49 = lshr i32 %add46.2, 15
+  %and.i50 = and i32 %shr.i49, 65537
+  %mul.i51 = mul i32 %and.i50, 65535
+  %add.i52 = add i32 %mul.i51, %add105
+  %xor.i53 = xor i32 %add.i52, %add46.2
+  %shr.i54 = lshr i32 %add46.1, 15
+  %and.i55 = and i32 %shr.i54, 65537
+  %mul.i56 = mul i32 %and.i55, 65535
+  %add.i57 = add i32 %mul.i56, %sub104
+  %xor.i58 = xor i32 %add.i57, %add46.1
+  %shr.i59 = lshr i32 %add46, 15
+  %and.i60 = and i32 %shr.i59, 65537
+  %mul.i61 = mul i32 %and.i60, 65535
+  %add.i62 = add i32 %mul.i61, %sub106
+  %xor.i63 = xor i32 %add.i62, %add46
+  %add110 = add i32 %xor.i53, %xor.i
+  %add112 = add i32 %add110, %xor.i58
+  %add113 = add i32 %add112, %xor.i63
+  %add78.1 = add i32 %add55.1, %add55
+  %sub86.1 = sub i32 %add55, %add55.1
+  %add94.1 = add i32 %add55.3, %add55.2
+  %sub102.1 = sub i32 %add55.2, %add55.3
+  %add103.1 = add i32 %add94.1, %add78.1
+  %sub104.1 = sub i32 %add78.1, %add94.1
+  %add105.1 = add i32 %sub102.1, %sub86.1
+  %sub106.1 = sub i32 %sub86.1, %sub102.1
+  %shr.i.1 = lshr i32 %conv9.2, 15
+  %and.i.1 = and i32 %shr.i.1, 65537
+  %mul.i.1 = mul i32 %and.i.1, 65535
+  %add.i.1 = add i32 %mul.i.1, %add103.1
+  %xor.i.1 = xor i32 %add.i.1, %conv9.2
+  %shr.i49.1 = lshr i32 %conv.2, 15
+  %and.i50.1 = and i32 %shr.i49.1, 65537
+  %mul.i51.1 = mul i32 %and.i50.1, 65535
+  %add.i52.1 = add i32 %mul.i51.1, %add105.1
+  %xor.i53.1 = xor i32 %add.i52.1, %conv.2
+  %shr.i54.1 = lshr i32 %sub47.1, 15
+  %and.i55.1 = and i32 %shr.i54.1, 65537
+  %mul.i56.1 = mul i32 %and.i55.1, 65535
+  %add.i57.1 = add i32 %mul.i56.1, %sub104.1
+  %xor.i58.1 = xor i32 %add.i57.1, %sub47.1
+  %shr.i59.1 = lshr i32 %sub47, 15
+  %and.i60.1 = and i32 %shr.i59.1, 65537
+  %mul.i61.1 = mul i32 %and.i60.1, 65535
+  %add.i62.1 = add i32 %mul.i61.1, %sub106.1
+  %xor.i63.1 = xor i32 %add.i62.1, %sub47
+  %add108.1 = add i32 %xor.i53.1, %add113
+  %add110.1 = add i32 %add108.1, %xor.i.1
+  %add112.1 = add i32 %add110.1, %xor.i58.1
+  %add113.1 = add i32 %add112.1, %xor.i63.1
+  %add78.2 = add i32 %sub51.1, %sub51
+  %sub86.2 = sub i32 %sub51, %sub51.1
+  %add94.2 = add i32 %sub51.3, %sub51.2
+  %sub102.2 = sub i32 %sub51.2, %sub51.3
+  %add103.2 = add i32 %add94.2, %add78.2
+  %sub104.2 = sub i32 %add78.2, %add94.2
+  %add105.2 = add i32 %sub102.2, %sub86.2
+  %sub106.2 = sub i32 %sub86.2, %sub102.2
+  %shr.i.2 = lshr i32 %conv9.1, 15
+  %and.i.2 = and i32 %shr.i.2, 65537
+  %mul.i.2 = mul i32 %and.i.2, 65535
+  %add.i.2 = add i32 %mul.i.2, %add103.2
+  %xor.i.2 = xor i32 %add.i.2, %conv9.1
+  %shr.i49.2 = lshr i32 %conv.1, 15
+  %and.i50.2 = and i32 %shr.i49.2, 65537
+  %mul.i51.2 = mul i32 %and.i50.2, 65535
+  %add.i52.2 = add i32 %mul.i51.2, %add105.2
+  %xor.i53.2 = xor i32 %add.i52.2, %conv.1
+  %shr.i54.2 = lshr i32 %conv21.1, 15
+  %and.i55.2 = and i32 %shr.i54.2, 65537
+  %mul.i56.2 = mul i32 %and.i55.2, 65535
+  %add.i57.2 = add i32 %mul.i56.2, %sub104.2
+  %xor.i58.2 = xor i32 %add.i57.2, %conv21.1
+  %shr.i59.2 = lshr i32 %add44, 15
+  %and.i60.2 = and i32 %shr.i59.2, 65537
+  %mul.i61.2 = mul i32 %and.i60.2, 65535
+  %add.i62.2 = add i32 %mul.i61.2, %sub106.2
+  %xor.i63.2 = xor i32 %add.i62.2, %add44
+  %add108.2 = add i32 %xor.i53.2, %add113.1
+  %add110.2 = add i32 %add108.2, %xor.i.2
+  %add112.2 = add i32 %add110.2, %xor.i58.2
+  %add113.2 = add i32 %add112.2, %xor.i63.2
+  %add78.3 = add i32 %sub59.1, %sub59
+  %sub86.3 = sub i32 %sub59, %sub59.1
+  %add94.3 = add i32 %sub59.3, %sub59.2
+  %sub102.3 = sub i32 %sub59.2, %sub59.3
+  %add103.3 = add i32 %add94.3, %add78.3
+  %sub104.3 = sub i32 %add78.3, %add94.3
+  %add105.3 = add i32 %sub102.3, %sub86.3
+  %sub106.3 = sub i32 %sub86.3, %sub102.3
+  %shr.i.3 = lshr i32 %conv9, 15
+  %and.i.3 = and i32 %shr.i.3, 65537
+  %mul.i.3 = mul i32 %and.i.3, 65535
+  %add.i.3 = add i32 %mul.i.3, %add103.3
+  %xor.i.3 = xor i32 %add.i.3, %conv9
+  %shr.i49.3 = lshr i32 %conv, 15
+  %and.i50.3 = and i32 %shr.i49.3, 65537
+  %mul.i51.3 = mul i32 %and.i50.3, 65535
+  %add.i52.3 = add i32 %mul.i51.3, %add105.3
+  %xor.i53.3 = xor i32 %add.i52.3, %conv
+  %shr.i54.3 = lshr i32 %conv21, 15
+  %and.i55.3 = and i32 %shr.i54.3, 65537
+  %mul.i56.3 = mul i32 %and.i55.3, 65535
+  %add.i57.3 = add i32 %mul.i56.3, %sub104.3
+  %xor.i58.3 = xor i32 %add.i57.3, %conv21
+  %shr.i59.3 = lshr i32 %conv33, 15
+  %and.i60.3 = and i32 %shr.i59.3, 65537
+  %mul.i61.3 = mul i32 %and.i60.3, 65535
+  %add.i62.3 = add i32 %mul.i61.3, %sub106.3
+  %xor.i63.3 = xor i32 %add.i62.3, %conv33
+  %add108.3 = add i32 %xor.i53.3, %add113.2
+  %add110.3 = add i32 %add108.3, %xor.i.3
+  %add112.3 = add i32 %add110.3, %xor.i58.3
+  %add113.3 = add i32 %add112.3, %xor.i63.3
+  ret i32 %add113.3
+}
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-vectorized.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-vectorized.ll
new file mode 100644
index 0000000000000..27e8f084e553d
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-vectorized.ll
@@ -0,0 +1,482 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=slp-vectorizer -S -mtriple=riscv64-unknown-linux-gnu -mattr=+v < %s | FileCheck %s
+
+define void @test([48 x float]* %p, float* noalias %s) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 0
+; CHECK-NEXT:    [[I:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 30
+; CHECK-NEXT:    [[I1:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fsub fast float [[I1]], [[I]]
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    store float [[ADD]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 4
+; CHECK-NEXT:    [[I2:%.*]] = load float, ptr [[ARRAYIDX4]], align 4
+; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 26
+; CHECK-NEXT:    [[I3:%.*]] = load float, ptr [[ARRAYIDX6]], align 4
+; CHECK-NEXT:    [[ADD7:%.*]] = fsub fast float [[I3]], [[I2]]
+; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds float, ptr [[S]], i64 1
+; CHECK-NEXT:    store float [[ADD7]], ptr [[ARRAYIDX9]], align 4
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 8
+; CHECK-NEXT:    [[I4:%.*]] = load float, ptr [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 22
+; CHECK-NEXT:    [[I5:%.*]] = load float, ptr [[ARRAYIDX13]], align 4
+; CHECK-NEXT:    [[ADD14:%.*]] = fsub fast float [[I5]], [[I4]]
+; CHECK-NEXT:    [[ARRAYIDX16:%.*]] = getelementptr inbounds float, ptr [[S]], i64 2
+; CHECK-NEXT:    store float [[ADD14]], ptr [[ARRAYIDX16]], align 4
+; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 12
+; CHECK-NEXT:    [[I6:%.*]] = load float, ptr [[ARRAYIDX18]], align 4
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 18
+; CHECK-NEXT:    [[I7:%.*]] = load float, ptr [[ARRAYIDX20]], align 4
+; CHECK-NEXT:    [[ADD21:%.*]] = fsub fast float [[I7]], [[I6]]
+; CHECK-NEXT:    [[ARRAYIDX23:%.*]] = getelementptr inbounds float, ptr [[S]], i64 3
+; CHECK-NEXT:    store float [[ADD21]], ptr [[ARRAYIDX23]], align 4
+; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 16
+; CHECK-NEXT:    [[I8:%.*]] = load float, ptr [[ARRAYIDX25]], align 4
+; CHECK-NEXT:    [[ARRAYIDX27:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 14
+; CHECK-NEXT:    [[I9:%.*]] = load float, ptr [[ARRAYIDX27]], align 4
+; CHECK-NEXT:    [[ADD28:%.*]] = fsub fast float [[I9]], [[I8]]
+; CHECK-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds float, ptr [[S]], i64 4
+; CHECK-NEXT:    store float [[ADD28]], ptr [[ARRAYIDX30]], align 4
+; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 20
+; CHECK-NEXT:    [[I10:%.*]] = load float, ptr [[ARRAYIDX32]], align 4
+; CHECK-NEXT:    [[ARRAYIDX34:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 10
+; CHECK-NEXT:    [[I11:%.*]] = load float, ptr [[ARRAYIDX34]], align 4
+; CHECK-NEXT:    [[ADD35:%.*]] = fsub fast float [[I11]], [[I10]]
+; CHECK-NEXT:    [[ARRAYIDX37:%.*]] = getelementptr inbounds float, ptr [[S]], i64 5
+; CHECK-NEXT:    store float [[ADD35]], ptr [[ARRAYIDX37]], align 4
+; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 24
+; CHECK-NEXT:    [[I12:%.*]] = load float, ptr [[ARRAYIDX39]], align 4
+; CHECK-NEXT:    [[ARRAYIDX41:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 6
+; CHECK-NEXT:    [[I13:%.*]] = load float, ptr [[ARRAYIDX41]], align 4
+; CHECK-NEXT:    [[ADD42:%.*]] = fsub fast float [[I13]], [[I12]]
+; CHECK-NEXT:    [[ARRAYIDX44:%.*]] = getelementptr inbounds float, ptr [[S]], i64 6
+; CHECK-NEXT:    store float [[ADD42]], ptr [[ARRAYIDX44]], align 4
+; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 28
+; CHECK-NEXT:    [[I14:%.*]] = load float, ptr [[ARRAYIDX46]], align 4
+; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 2
+; CHECK-NEXT:    [[I15:%.*]] = load float, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[ADD49:%.*]] = fsub fast float [[I15]], [[I14]]
+; CHECK-NEXT:    [[ARRAYIDX51:%.*]] = getelementptr inbounds float, ptr [[S]], i64 7
+; CHECK-NEXT:    store float [[ADD49]], ptr [[ARRAYIDX51]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %arrayidx = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 0
+  %i = load float, float* %arrayidx, align 4
+  %arrayidx1 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 30
+  %i1 = load float, float* %arrayidx1, align 4
+  %add = fsub fast float %i1, %i
+  %arrayidx2 = getelementptr inbounds float, float* %s, i64 0
+  store float %add, float* %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 4
+  %i2 = load float, float* %arrayidx4, align 4
+  %arrayidx6 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 26
+  %i3 = load float, float* %arrayidx6, align 4
+  %add7 = fsub fast float %i3, %i2
+  %arrayidx9 = getelementptr inbounds float, float* %s, i64 1
+  store float %add7, float* %arrayidx9, align 4
+  %arrayidx11 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 8
+  %i4 = load float, float* %arrayidx11, align 4
+  %arrayidx13 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 22
+  %i5 = load float, float* %arrayidx13, align 4
+  %add14 = fsub fast float %i5, %i4
+  %arrayidx16 = getelementptr inbounds float, float* %s, i64 2
+  store float %add14, float* %arrayidx16, align 4
+  %arrayidx18 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 12
+  %i6 = load float, float* %arrayidx18, align 4
+  %arrayidx20 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 18
+  %i7 = load float, float* %arrayidx20, align 4
+  %add21 = fsub fast float %i7, %i6
+  %arrayidx23 = getelementptr inbounds float, float* %s, i64 3
+  store float %add21, float* %arrayidx23, align 4
+  %arrayidx25 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 16
+  %i8 = load float, float* %arrayidx25, align 4
+  %arrayidx27 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 14
+  %i9 = load float, float* %arrayidx27, align 4
+  %add28 = fsub fast float %i9, %i8
+  %arrayidx30 = getelementptr inbounds float, float* %s, i64 4
+  store float %add28, float* %arrayidx30, align 4
+  %arrayidx32 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 20
+  %i10 = load float, float* %arrayidx32, align 4
+  %arrayidx34 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 10
+  %i11 = load float, float* %arrayidx34, align 4
+  %add35 = fsub fast float %i11, %i10
+  %arrayidx37 = getelementptr inbounds float, float* %s, i64 5
+  store float %add35, float* %arrayidx37, align 4
+  %arrayidx39 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 24
+  %i12 = load float, float* %arrayidx39, align 4
+  %arrayidx41 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 6
+  %i13 = load float, float* %arrayidx41, align 4
+  %add42 = fsub fast float %i13, %i12
+  %arrayidx44 = getelementptr inbounds float, float* %s, i64 6
+  store float %add42, float* %arrayidx44, align 4
+  %arrayidx46 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 28
+  %i14 = load float, float* %arrayidx46, align 4
+  %arrayidx48 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 2
+  %i15 = load float, float* %arrayidx48, align 4
+  %add49 = fsub fast float %i15, %i14
+  %arrayidx51 = getelementptr inbounds float, float* %s, i64 7
+  store float %add49, float* %arrayidx51, align 4
+  ret void
+}
+
+define void @test1([48 x float]* %p, float* noalias %s, i32 %stride) {
+; CHECK-LABEL: @test1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[STR:%.*]] = zext i32 [[STRIDE:%.*]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 0
+; CHECK-NEXT:    [[I:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 30
+; CHECK-NEXT:    [[I1:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fsub fast float [[I1]], [[I]]
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    store float [[ADD]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[STR]]
+; CHECK-NEXT:    [[I2:%.*]] = load float, ptr [[ARRAYIDX4]], align 4
+; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 26
+; CHECK-NEXT:    [[I3:%.*]] = load float, ptr [[ARRAYIDX6]], align 4
+; CHECK-NEXT:    [[ADD7:%.*]] = fsub fast float [[I3]], [[I2]]
+; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds float, ptr [[S]], i64 1
+; CHECK-NEXT:    store float [[ADD7]], ptr [[ARRAYIDX9]], align 4
+; CHECK-NEXT:    [[ST1:%.*]] = mul i64 [[STR]], 2
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST1]]
+; CHECK-NEXT:    [[I4:%.*]] = load float, ptr [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 22
+; CHECK-NEXT:    [[I5:%.*]] = load float, ptr [[ARRAYIDX13]], align 4
+; CHECK-NEXT:    [[ADD14:%.*]] = fsub fast float [[I5]], [[I4]]
+; CHECK-NEXT:    [[ARRAYIDX16:%.*]] = getelementptr inbounds float, ptr [[S]], i64 2
+; CHECK-NEXT:    store float [[ADD14]], ptr [[ARRAYIDX16]], align 4
+; CHECK-NEXT:    [[ST2:%.*]] = mul i64 [[STR]], 3
+; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST2]]
+; CHECK-NEXT:    [[I6:%.*]] = load float, ptr [[ARRAYIDX18]], align 4
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 18
+; CHECK-NEXT:    [[I7:%.*]] = load float, ptr [[ARRAYIDX20]], align 4
+; CHECK-NEXT:    [[ADD21:%.*]] = fsub fast float [[I7]], [[I6]]
+; CHECK-NEXT:    [[ARRAYIDX23:%.*]] = getelementptr inbounds float, ptr [[S]], i64 3
+; CHECK-NEXT:    store float [[ADD21]], ptr [[ARRAYIDX23]], align 4
+; CHECK-NEXT:    [[ST3:%.*]] = mul i64 [[STR]], 4
+; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST3]]
+; CHECK-NEXT:    [[I8:%.*]] = load float, ptr [[ARRAYIDX25]], align 4
+; CHECK-NEXT:    [[ARRAYIDX27:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 14
+; CHECK-NEXT:    [[I9:%.*]] = load float, ptr [[ARRAYIDX27]], align 4
+; CHECK-NEXT:    [[ADD28:%.*]] = fsub fast float [[I9]], [[I8]]
+; CHECK-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds float, ptr [[S]], i64 4
+; CHECK-NEXT:    store float [[ADD28]], ptr [[ARRAYIDX30]], align 4
+; CHECK-NEXT:    [[ST4:%.*]] = mul i64 [[STR]], 5
+; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST4]]
+; CHECK-NEXT:    [[I10:%.*]] = load float, ptr [[ARRAYIDX32]], align 4
+; CHECK-NEXT:    [[ARRAYIDX34:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 10
+; CHECK-NEXT:    [[I11:%.*]] = load float, ptr [[ARRAYIDX34]], align 4
+; CHECK-NEXT:    [[ADD35:%.*]] = fsub fast float [[I11]], [[I10]]
+; CHECK-NEXT:    [[ARRAYIDX37:%.*]] = getelementptr inbounds float, ptr [[S]], i64 5
+; CHECK-NEXT:    store float [[ADD35]], ptr [[ARRAYIDX37]], align 4
+; CHECK-NEXT:    [[ST5:%.*]] = mul i64 [[STR]], 6
+; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST5]]
+; CHECK-NEXT:    [[I12:%.*]] = load float, ptr [[ARRAYIDX39]], align 4
+; CHECK-NEXT:    [[ARRAYIDX41:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 6
+; CHECK-NEXT:    [[I13:%.*]] = load float, ptr [[ARRAYIDX41]], align 4
+; CHECK-NEXT:    [[ADD42:%.*]] = fsub fast float [[I13]], [[I12]]
+; CHECK-NEXT:    [[ARRAYIDX44:%.*]] = getelementptr inbounds float, ptr [[S]], i64 6
+; CHECK-NEXT:    store float [[ADD42]], ptr [[ARRAYIDX44]], align 4
+; CHECK-NEXT:    [[ST6:%.*]] = mul i64 [[STR]], 7
+; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST6]]
+; CHECK-NEXT:    [[I14:%.*]] = load float, ptr [[ARRAYIDX46]], align 4
+; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 2
+; CHECK-NEXT:    [[I15:%.*]] = load float, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[ADD49:%.*]] = fsub fast float [[I15]], [[I14]]
+; CHECK-NEXT:    [[ARRAYIDX51:%.*]] = getelementptr inbounds float, ptr [[S]], i64 7
+; CHECK-NEXT:    store float [[ADD49]], ptr [[ARRAYIDX51]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %str = zext i32 %stride to i64
+  %arrayidx = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 0
+  %i = load float, float* %arrayidx, align 4
+  %arrayidx1 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 30
+  %i1 = load float, float* %arrayidx1, align 4
+  %add = fsub fast float %i1, %i
+  %arrayidx2 = getelementptr inbounds float, float* %s, i64 0
+  store float %add, float* %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %str
+  %i2 = load float, float* %arrayidx4, align 4
+  %arrayidx6 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 26
+  %i3 = load float, float* %arrayidx6, align 4
+  %add7 = fsub fast float %i3, %i2
+  %arrayidx9 = getelementptr inbounds float, float* %s, i64 1
+  store float %add7, float* %arrayidx9, align 4
+  %st1 = mul i64 %str, 2
+  %arrayidx11 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st1
+  %i4 = load float, float* %arrayidx11, align 4
+  %arrayidx13 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 22
+  %i5 = load float, float* %arrayidx13, align 4
+  %add14 = fsub fast float %i5, %i4
+  %arrayidx16 = getelementptr inbounds float, float* %s, i64 2
+  store float %add14, float* %arrayidx16, align 4
+  %st2 = mul i64 %str, 3
+  %arrayidx18 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st2
+  %i6 = load float, float* %arrayidx18, align 4
+  %arrayidx20 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 18
+  %i7 = load float, float* %arrayidx20, align 4
+  %add21 = fsub fast float %i7, %i6
+  %arrayidx23 = getelementptr inbounds float, float* %s, i64 3
+  store float %add21, float* %arrayidx23, align 4
+  %st3 = mul i64 %str, 4
+  %arrayidx25 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st3
+  %i8 = load float, float* %arrayidx25, align 4
+  %arrayidx27 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 14
+  %i9 = load float, float* %arrayidx27, align 4
+  %add28 = fsub fast float %i9, %i8
+  %arrayidx30 = getelementptr inbounds float, float* %s, i64 4
+  store float %add28, float* %arrayidx30, align 4
+  %st4 = mul i64 %str, 5
+  %arrayidx32 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st4
+  %i10 = load float, float* %arrayidx32, align 4
+  %arrayidx34 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 10
+  %i11 = load float, float* %arrayidx34, align 4
+  %add35 = fsub fast float %i11, %i10
+  %arrayidx37 = getelementptr inbounds float, float* %s, i64 5
+  store float %add35, float* %arrayidx37, align 4
+  %st5 = mul i64 %str, 6
+  %arrayidx39 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st5
+  %i12 = load float, float* %arrayidx39, align 4
+  %arrayidx41 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 6
+  %i13 = load float, float* %arrayidx41, align 4
+  %add42 = fsub fast float %i13, %i12
+  %arrayidx44 = getelementptr inbounds float, float* %s, i64 6
+  store float %add42, float* %arrayidx44, align 4
+  %st6 = mul i64 %str, 7
+  %arrayidx46 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st6
+  %i14 = load float, float* %arrayidx46, align 4
+  %arrayidx48 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 2
+  %i15 = load float, float* %arrayidx48, align 4
+  %add49 = fsub fast float %i15, %i14
+  %arrayidx51 = getelementptr inbounds float, float* %s, i64 7
+  store float %add49, float* %arrayidx51, align 4
+  ret void
+}
+
+define void @test2([48 x float]* %p, float* noalias %s, i32 %stride) {
+; CHECK-LABEL: @test2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[STR:%.*]] = zext i32 [[STRIDE:%.*]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 2
+; CHECK-NEXT:    [[I:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ST6:%.*]] = mul i64 [[STR]], 7
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST6]]
+; CHECK-NEXT:    [[I1:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fsub fast float [[I1]], [[I]]
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    store float [[ADD]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 6
+; CHECK-NEXT:    [[I2:%.*]] = load float, ptr [[ARRAYIDX4]], align 4
+; CHECK-NEXT:    [[ST5:%.*]] = mul i64 [[STR]], 6
+; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST5]]
+; CHECK-NEXT:    [[I3:%.*]] = load float, ptr [[ARRAYIDX6]], align 4
+; CHECK-NEXT:    [[ADD7:%.*]] = fsub fast float [[I3]], [[I2]]
+; CHECK-NEXT:    [[ARRAYIDX9:%.*]] = getelementptr inbounds float, ptr [[S]], i64 1
+; CHECK-NEXT:    store float [[ADD7]], ptr [[ARRAYIDX9]], align 4
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 10
+; CHECK-NEXT:    [[I4:%.*]] = load float, ptr [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[ST4:%.*]] = mul i64 [[STR]], 5
+; CHECK-NEXT:    [[ARRAYIDX13:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST4]]
+; CHECK-NEXT:    [[I5:%.*]] = load float, ptr [[ARRAYIDX13]], align 4
+; CHECK-NEXT:    [[ADD14:%.*]] = fsub fast float [[I5]], [[I4]]
+; CHECK-NEXT:    [[ARRAYIDX16:%.*]] = getelementptr inbounds float, ptr [[S]], i64 2
+; CHECK-NEXT:    store float [[ADD14]], ptr [[ARRAYIDX16]], align 4
+; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 14
+; CHECK-NEXT:    [[I6:%.*]] = load float, ptr [[ARRAYIDX18]], align 4
+; CHECK-NEXT:    [[ST3:%.*]] = mul i64 [[STR]], 4
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST3]]
+; CHECK-NEXT:    [[I7:%.*]] = load float, ptr [[ARRAYIDX20]], align 4
+; CHECK-NEXT:    [[ADD21:%.*]] = fsub fast float [[I7]], [[I6]]
+; CHECK-NEXT:    [[ARRAYIDX23:%.*]] = getelementptr inbounds float, ptr [[S]], i64 3
+; CHECK-NEXT:    store float [[ADD21]], ptr [[ARRAYIDX23]], align 4
+; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 18
+; CHECK-NEXT:    [[ST2:%.*]] = mul i64 [[STR]], 3
+; CHECK-NEXT:    [[I8:%.*]] = load float, ptr [[ARRAYIDX25]], align 4
+; CHECK-NEXT:    [[ARRAYIDX27:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST2]]
+; CHECK-NEXT:    [[I9:%.*]] = load float, ptr [[ARRAYIDX27]], align 4
+; CHECK-NEXT:    [[ADD28:%.*]] = fsub fast float [[I9]], [[I8]]
+; CHECK-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds float, ptr [[S]], i64 4
+; CHECK-NEXT:    store float [[ADD28]], ptr [[ARRAYIDX30]], align 4
+; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 22
+; CHECK-NEXT:    [[I10:%.*]] = load float, ptr [[ARRAYIDX32]], align 4
+; CHECK-NEXT:    [[ST1:%.*]] = mul i64 [[STR]], 2
+; CHECK-NEXT:    [[ARRAYIDX34:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[ST1]]
+; CHECK-NEXT:    [[I11:%.*]] = load float, ptr [[ARRAYIDX34]], align 4
+; CHECK-NEXT:    [[ADD35:%.*]] = fsub fast float [[I11]], [[I10]]
+; CHECK-NEXT:    [[ARRAYIDX37:%.*]] = getelementptr inbounds float, ptr [[S]], i64 5
+; CHECK-NEXT:    store float [[ADD35]], ptr [[ARRAYIDX37]], align 4
+; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 26
+; CHECK-NEXT:    [[I12:%.*]] = load float, ptr [[ARRAYIDX39]], align 4
+; CHECK-NEXT:    [[ARRAYIDX41:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 [[STR]]
+; CHECK-NEXT:    [[I13:%.*]] = load float, ptr [[ARRAYIDX41]], align 4
+; CHECK-NEXT:    [[ADD42:%.*]] = fsub fast float [[I13]], [[I12]]
+; CHECK-NEXT:    [[ARRAYIDX44:%.*]] = getelementptr inbounds float, ptr [[S]], i64 6
+; CHECK-NEXT:    store float [[ADD42]], ptr [[ARRAYIDX44]], align 4
+; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 30
+; CHECK-NEXT:    [[I14:%.*]] = load float, ptr [[ARRAYIDX46]], align 4
+; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 0
+; CHECK-NEXT:    [[I15:%.*]] = load float, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[ADD49:%.*]] = fsub fast float [[I15]], [[I14]]
+; CHECK-NEXT:    [[ARRAYIDX51:%.*]] = getelementptr inbounds float, ptr [[S]], i64 7
+; CHECK-NEXT:    store float [[ADD49]], ptr [[ARRAYIDX51]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %str = zext i32 %stride to i64
+  %arrayidx = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 2
+  %i = load float, float* %arrayidx, align 4
+  %st6 = mul i64 %str, 7
+  %arrayidx1 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st6
+  %i1 = load float, float* %arrayidx1, align 4
+  %add = fsub fast float %i1, %i
+  %arrayidx2 = getelementptr inbounds float, float* %s, i64 0
+  store float %add, float* %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 6
+  %i2 = load float, float* %arrayidx4, align 4
+  %st5 = mul i64 %str, 6
+  %arrayidx6 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st5
+  %i3 = load float, float* %arrayidx6, align 4
+  %add7 = fsub fast float %i3, %i2
+  %arrayidx9 = getelementptr inbounds float, float* %s, i64 1
+  store float %add7, float* %arrayidx9, align 4
+  %arrayidx11 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 10
+  %i4 = load float, float* %arrayidx11, align 4
+  %st4 = mul i64 %str, 5
+  %arrayidx13 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st4
+  %i5 = load float, float* %arrayidx13, align 4
+  %add14 = fsub fast float %i5, %i4
+  %arrayidx16 = getelementptr inbounds float, float* %s, i64 2
+  store float %add14, float* %arrayidx16, align 4
+  %arrayidx18 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 14
+  %i6 = load float, float* %arrayidx18, align 4
+  %st3 = mul i64 %str, 4
+  %arrayidx20 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st3
+  %i7 = load float, float* %arrayidx20, align 4
+  %add21 = fsub fast float %i7, %i6
+  %arrayidx23 = getelementptr inbounds float, float* %s, i64 3
+  store float %add21, float* %arrayidx23, align 4
+  %arrayidx25 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 18
+  %st2 = mul i64 %str, 3
+  %i8 = load float, float* %arrayidx25, align 4
+  %arrayidx27 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st2
+  %i9 = load float, float* %arrayidx27, align 4
+  %add28 = fsub fast float %i9, %i8
+  %arrayidx30 = getelementptr inbounds float, float* %s, i64 4
+  store float %add28, float* %arrayidx30, align 4
+  %arrayidx32 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 22
+  %i10 = load float, float* %arrayidx32, align 4
+  %st1 = mul i64 %str, 2
+  %arrayidx34 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %st1
+  %i11 = load float, float* %arrayidx34, align 4
+  %add35 = fsub fast float %i11, %i10
+  %arrayidx37 = getelementptr inbounds float, float* %s, i64 5
+  store float %add35, float* %arrayidx37, align 4
+  %arrayidx39 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 26
+  %i12 = load float, float* %arrayidx39, align 4
+  %arrayidx41 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 %str
+  %i13 = load float, float* %arrayidx41, align 4
+  %add42 = fsub fast float %i13, %i12
+  %arrayidx44 = getelementptr inbounds float, float* %s, i64 6
+  store float %add42, float* %arrayidx44, align 4
+  %arrayidx46 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 30
+  %i14 = load float, float* %arrayidx46, align 4
+  %arrayidx48 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 0
+  %i15 = load float, float* %arrayidx48, align 4
+  %add49 = fsub fast float %i15, %i14
+  %arrayidx51 = getelementptr inbounds float, float* %s, i64 7
+  store float %add49, float* %arrayidx51, align 4
+  ret void
+}
+
+define void @test3([48 x float]* %p, float* noalias %s) {
+; CHECK-LABEL: @test3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr [[P:%.*]], i64 0, i64 0
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[S:%.*]], i64 0
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 4
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 8
+; CHECK-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 12
+; CHECK-NEXT:    [[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 16
+; CHECK-NEXT:    [[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 20
+; CHECK-NEXT:    [[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 24
+; CHECK-NEXT:    [[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 28
+; CHECK-NEXT:    [[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr [[P]], i64 0, i64 23
+; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr [[ARRAYIDX]], i32 0
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <8 x ptr> [[TMP0]], ptr [[ARRAYIDX4]], i32 1
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <8 x ptr> [[TMP1]], ptr [[ARRAYIDX11]], i32 2
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr [[ARRAYIDX18]], i32 3
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <8 x ptr> [[TMP3]], ptr [[ARRAYIDX25]], i32 4
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <8 x ptr> [[TMP4]], ptr [[ARRAYIDX32]], i32 5
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <8 x ptr> [[TMP5]], ptr [[ARRAYIDX39]], i32 6
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <8 x ptr> [[TMP6]], ptr [[ARRAYIDX46]], i32 7
+; CHECK-NEXT:    [[TMP8:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP7]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> poison)
+; CHECK-NEXT:    [[TMP9:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <8 x float> [[TMP9]], <8 x float> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[TMP11:%.*]] = fsub fast <8 x float> [[TMP10]], [[TMP8]]
+; CHECK-NEXT:    store <8 x float> [[TMP11]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %arrayidx = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 0
+  %i = load float, float* %arrayidx, align 4
+  %arrayidx1 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 30
+  %i1 = load float, float* %arrayidx1, align 4
+  %add = fsub fast float %i1, %i
+  %arrayidx2 = getelementptr inbounds float, float* %s, i64 0
+  store float %add, float* %arrayidx2, align 4
+  %arrayidx4 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 4
+  %i2 = load float, float* %arrayidx4, align 4
+  %arrayidx6 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 29
+  %i3 = load float, float* %arrayidx6, align 4
+  %add7 = fsub fast float %i3, %i2
+  %arrayidx9 = getelementptr inbounds float, float* %s, i64 1
+  store float %add7, float* %arrayidx9, align 4
+  %arrayidx11 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 8
+  %i4 = load float, float* %arrayidx11, align 4
+  %arrayidx13 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 28
+  %i5 = load float, float* %arrayidx13, align 4
+  %add14 = fsub fast float %i5, %i4
+  %arrayidx16 = getelementptr inbounds float, float* %s, i64 2
+  store float %add14, float* %arrayidx16, align 4
+  %arrayidx18 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 12
+  %i6 = load float, float* %arrayidx18, align 4
+  %arrayidx20 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 27
+  %i7 = load float, float* %arrayidx20, align 4
+  %add21 = fsub fast float %i7, %i6
+  %arrayidx23 = getelementptr inbounds float, float* %s, i64 3
+  store float %add21, float* %arrayidx23, align 4
+  %arrayidx25 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 16
+  %i8 = load float, float* %arrayidx25, align 4
+  %arrayidx27 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 26
+  %i9 = load float, float* %arrayidx27, align 4
+  %add28 = fsub fast float %i9, %i8
+  %arrayidx30 = getelementptr inbounds float, float* %s, i64 4
+  store float %add28, float* %arrayidx30, align 4
+  %arrayidx32 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 20
+  %i10 = load float, float* %arrayidx32, align 4
+  %arrayidx34 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 25
+  %i11 = load float, float* %arrayidx34, align 4
+  %add35 = fsub fast float %i11, %i10
+  %arrayidx37 = getelementptr inbounds float, float* %s, i64 5
+  store float %add35, float* %arrayidx37, align 4
+  %arrayidx39 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 24
+  %i12 = load float, float* %arrayidx39, align 4
+  %arrayidx41 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 24
+  %i13 = load float, float* %arrayidx41, align 4
+  %add42 = fsub fast float %i13, %i12
+  %arrayidx44 = getelementptr inbounds float, float* %s, i64 6
+  store float %add42, float* %arrayidx44, align 4
+  %arrayidx46 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 28
+  %i14 = load float, float* %arrayidx46, align 4
+  %arrayidx48 = getelementptr inbounds [48 x float], [48 x float]* %p, i64 0, i64 23
+  %i15 = load float, float* %arrayidx48, align 4
+  %add49 = fsub fast float %i15, %i14
+  %arrayidx51 = getelementptr inbounds float, float* %s, i64 7
+  store float %add49, float* %arrayidx51, align 4
+  ret void
+}
+
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-indices.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-indices.ll
new file mode 100644
index 0000000000000..c72d6cc75d827
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-indices.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S --passes=slp-vectorizer -slp-threshold=-50 -mtriple=riscv64-unknown-linux-gnu -mattr=+v < %s | FileCheck %s
+
+%class.A = type { i32, i32 }
+
+define void @test() {
+; CHECK-LABEL: define void @test
+; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[BODY:%.*]]
+; CHECK:       body:
+; CHECK-NEXT:    [[ADD_I_I62_US:%.*]] = shl i64 0, 0
+; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <2 x i64> <i64 poison, i64 1>, i64 [[ADD_I_I62_US]], i32 0
+; CHECK-NEXT:    [[TMP1:%.*]] = or <2 x i64> zeroinitializer, [[TMP0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr [[CLASS_A:%.*]], <2 x ptr> zeroinitializer, <2 x i64> [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = call <2 x i32> @llvm.masked.gather.v2i32.v2p0(<2 x ptr> [[TMP2]], i32 4, <2 x i1> <i1 true, i1 true>, <2 x i32> poison)
+; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
+; CHECK-NEXT:    [[CMP_I_I_I_I67_US:%.*]] = icmp slt i32 [[TMP4]], [[TMP5]]
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
+; CHECK-NEXT:    [[SPEC_SELECT_I_I68_US:%.*]] = select i1 false, i64 [[TMP6]], i64 0
+; CHECK-NEXT:    br label [[BODY]]
+;
+entry:
+  br label %body
+
+body:
+  %add.i.i62.us = shl i64 0, 0
+  %mul.i.i63.us = or i64 %add.i.i62.us, 0
+  %add.ptr.i.i.i64.us = getelementptr %class.A, ptr null, i64 %mul.i.i63.us
+  %sub4.i.i65.us = or i64 0, 1
+  %add.ptr.i63.i.i66.us = getelementptr %class.A, ptr null, i64 %sub4.i.i65.us
+  %0 = load i32, ptr %add.ptr.i.i.i64.us, align 4
+  %1 = load i32, ptr %add.ptr.i63.i.i66.us, align 4
+  %cmp.i.i.i.i67.us = icmp slt i32 %0, %1
+  %spec.select.i.i68.us = select i1 false, i64 %sub4.i.i65.us, i64 0
+  br label %body
+}
+
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-use-ptr.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-use-ptr.ll
new file mode 100644
index 0000000000000..5aba9ea115a4b
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-loads-with-external-use-ptr.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S --passes=slp-vectorizer -mtriple=riscv64-unknown-linux-gnu -mattr=+v -slp-threshold=-20 < %s | FileCheck %s
+
+%S = type { i16, i16 }
+
+define i16 @test() {
+; CHECK-LABEL: define i16 @test
+; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[PPREV_058_I:%.*]] = getelementptr [[S:%.*]], ptr null, i64 -1
+; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <2 x ptr> <ptr poison, ptr null>, ptr [[PPREV_058_I]], i32 0
+; CHECK-NEXT:    br label [[WHILE_BODY_I:%.*]]
+; CHECK:       while.body.i:
+; CHECK-NEXT:    [[TMP1:%.*]] = phi i16 [ 0, [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi <2 x ptr> [ [[TMP3:%.*]], [[WHILE_BODY_I]] ], [ [[TMP0]], [[ENTRY]] ]
+; CHECK-NEXT:    [[TMP3]] = getelementptr [[S]], <2 x ptr> [[TMP2]], <2 x i64> <i64 -1, i64 -1>
+; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i16> @llvm.masked.gather.v2i16.v2p0(<2 x ptr> [[TMP3]], i32 2, <2 x i1> <i1 true, i1 true>, <2 x i16> poison)
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x i16> [[TMP4]], i32 0
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i16> [[TMP4]], i32 1
+; CHECK-NEXT:    [[CMP_I178:%.*]] = icmp ult i16 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    br label [[WHILE_BODY_I]]
+;
+entry:
+  %pPrev.058.i = getelementptr %S, ptr null, i64 -1
+  br label %while.body.i
+
+while.body.i:
+  %0 = phi i16 [ 0, %while.body.i ], [ 0, %entry ]
+  %pPrev.062.i = phi ptr [ %pPrev.0.i, %while.body.i ], [ %pPrev.058.i, %entry ]
+  %pEdge.061.i = phi ptr [ %incdec.ptr.i, %while.body.i ], [ null, %entry ]
+  %incdec.ptr.i = getelementptr %S, ptr %pEdge.061.i, i64 -1
+  %pPrev.0.i = getelementptr %S, ptr %pPrev.062.i, i64 -1
+  %1 = load i16, ptr %incdec.ptr.i, align 2
+  %2 = load i16, ptr %pPrev.0.i, align 2
+  %cmp.i178 = icmp ult i16 %1, %2
+  br label %while.body.i
+}
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/strided-unsupported-type.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-unsupported-type.ll
new file mode 100644
index 0000000000000..4fd22639d6371
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/strided-unsupported-type.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -S < %s --passes=slp-vectorizer -slp-threshold=-50 -mtriple=riscv64-unknown-linux-gnu -mattr=+v | FileCheck %s
+
+define void @loads() {
+; CHECK-LABEL: define void @loads(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x fp128>, ptr null, align 16
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp une <2 x fp128> [[TMP0]], zeroinitializer
+; CHECK-NEXT:    call void null(i32 0, ptr null, i32 0)
+; CHECK-NEXT:    [[TMP2:%.*]] = fcmp une <2 x fp128> [[TMP0]], zeroinitializer
+; CHECK-NEXT:    ret void
+;
+entry:
+  %_M_value.imagp.i266 = getelementptr { fp128, fp128 }, ptr null, i64 0, i32 1
+  %0 = load fp128, ptr null, align 16
+  %cmp.i382 = fcmp une fp128 %0, 0xL00000000000000000000000000000000
+  %1 = load fp128, ptr %_M_value.imagp.i266, align 16
+  %cmp4.i385 = fcmp une fp128 %1, 0xL00000000000000000000000000000000
+  call void null(i32 0, ptr null, i32 0)
+  %cmp.i386 = fcmp une fp128 %0, 0xL00000000000000000000000000000000
+  %cmp2.i388 = fcmp une fp128 %1, 0xL00000000000000000000000000000000
+  ret void
+}
+
+define void @stores(ptr noalias %p) {
+; CHECK-LABEL: define void @stores(
+; CHECK-SAME: ptr noalias [[P:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[_M_VALUE_IMAGP_I266:%.*]] = getelementptr { fp128, fp128 }, ptr null, i64 0, i32 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load fp128, ptr null, align 16
+; CHECK-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[_M_VALUE_IMAGP_I266]], align 16
+; CHECK-NEXT:    [[P1:%.*]] = getelementptr fp128, ptr [[P]], i64 1
+; CHECK-NEXT:    store fp128 [[TMP0]], ptr [[P1]], align 16
+; CHECK-NEXT:    store fp128 [[TMP1]], ptr [[P]], align 16
+; CHECK-NEXT:    ret void
+;
+entry:
+  %_M_value.imagp.i266 = getelementptr { fp128, fp128 }, ptr null, i64 0, i32 1
+  %0 = load fp128, ptr null, align 16
+  %1 = load fp128, ptr %_M_value.imagp.i266, align 16
+  %p1 = getelementptr fp128, ptr %p, i64 1
+  store fp128 %0, ptr %p1, align 16
+  store fp128 %1, ptr %p, align 16
+  ret void
+}

>From 776d668ad7b72c6f1cb342a4eff2bc98568c78f4 Mon Sep 17 00:00:00 2001
From: XDeme <tagawafernando at gmail.com>
Date: Thu, 1 Feb 2024 14:11:14 -0300
Subject: [PATCH 36/42] [clang-format] Handles Elaborated type specifier for
 enum in trailing return (#80085)

Fixes llvm/llvm-project#80062
---
 clang/lib/Format/UnwrappedLineParser.cpp      | 4 ++--
 clang/unittests/Format/TokenAnnotatorTest.cpp | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index 0a7f8808f29af..d4f9b3f9df524 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -1760,8 +1760,8 @@ void UnwrappedLineParser::parseStructuralElement(
       break;
     }
     case tok::kw_enum:
-      // Ignore if this is part of "template <enum ...".
-      if (Previous && Previous->is(tok::less)) {
+      // Ignore if this is part of "template <enum ..." or "... -> enum".
+      if (Previous && Previous->isOneOf(tok::less, tok::arrow)) {
         nextToken();
         break;
       }
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 6e5832858c1ec..52a00c8a1a35d 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2644,6 +2644,12 @@ TEST_F(TokenAnnotatorTest, StreamOperator) {
   EXPECT_TRUE(Tokens[5]->MustBreakBefore);
 }
 
+TEST_F(TokenAnnotatorTest, UnderstandsElaboratedTypeSpecifier) {
+  auto Tokens = annotate("auto foo() -> enum En {}");
+  ASSERT_EQ(Tokens.size(), 10u) << Tokens;
+  EXPECT_TOKEN(Tokens[7], tok::l_brace, TT_FunctionLBrace);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

>From d312931aa53d7870a980281ebc4a94051f8ea0be Mon Sep 17 00:00:00 2001
From: jeffreytan81 <jeffreytan at meta.com>
Date: Thu, 1 Feb 2024 09:11:25 -0800
Subject: [PATCH 37/42] Fix debug info size statistics for split dwarf (#80218)

The `statistics dump` command relies on `SymbolFile::GetDebugInfoSize()` to
get the total debug info size.
The current implementation misses debug info in split DWARF scenarios, which
require getting debug info from separate dwo/dwp files.
This patch fixes the issue for split DWARF by including debug info from the
dwp/dwo files.

New yaml tests are added.
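
The accounting described above can be sketched roughly as follows. This is a
minimal illustration, not LLDB code: `SplitDwarfModule` and
`total_debug_info_size` are hypothetical names, and the byte counts are the
constants used in the new test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SplitDwarfModule:
    """Hypothetical stand-in for a module built with split DWARF."""

    skeleton_size: int                 # debug info bytes in the main binary
    dwp_size: Optional[int] = None     # debug info bytes in a .dwp, if any
    # One entry per compile unit; None models a unit whose .dwo is missing.
    dwo_sizes: List[Optional[int]] = field(default_factory=list)


def total_debug_info_size(module: SplitDwarfModule) -> int:
    """Sum skeleton debug info with dwp or per-unit dwo debug info."""
    total = module.skeleton_size
    # dwp case: skeleton debug info + dwp debug info.
    if module.dwp_size is not None:
        return total + module.dwp_size
    # dwo case: skeleton debug info + every dwo that could be found.
    for dwo_size in module.dwo_sizes:
        if dwo_size is not None:
            total += dwo_size
    return total


# Sizes taken from the constants in TestDebugInfoSize.py.
module = SplitDwarfModule(skeleton_size=602, dwo_sizes=[385, 380])
print(total_debug_info_size(module))
```

With a dwp file present, the per-unit dwo sizes are ignored, mirroring the
early return in the patch.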

---------

Co-authored-by: jeffreytan81 <jeffreytan at fb.com>
---
 .../SymbolFile/DWARF/SymbolFileDWARF.cpp      |   23 +
 .../SymbolFile/DWARF/SymbolFileDWARF.h        |    2 +
 .../SymbolFile/DWARF/SymbolFileDWARFDwo.cpp   |   11 +
 .../SymbolFile/DWARF/SymbolFileDWARFDwo.h     |    2 +
 .../target/debuginfo/TestDebugInfoSize.py     |  135 ++
 .../target/debuginfo/a.out-foo.dwo.yaml       |   37 +
 .../target/debuginfo/a.out-main.dwo.yaml      |   37 +
 .../API/commands/target/debuginfo/a.out.yaml  | 1273 +++++++++++++++++
 8 files changed, 1520 insertions(+)
 create mode 100644 lldb/test/API/commands/target/debuginfo/TestDebugInfoSize.py
 create mode 100644 lldb/test/API/commands/target/debuginfo/a.out-foo.dwo.yaml
 create mode 100644 lldb/test/API/commands/target/debuginfo/a.out-main.dwo.yaml
 create mode 100644 lldb/test/API/commands/target/debuginfo/a.out.yaml

diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index fed97858c83f8..7ab75a9ce2c6b 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -2667,6 +2667,29 @@ static bool UpdateCompilerContextForSimpleTemplateNames(TypeQuery &match) {
   }
   return any_context_updated;
 }
+
+uint64_t SymbolFileDWARF::GetDebugInfoSize() {
+  DWARFDebugInfo &info = DebugInfo();
+  uint32_t num_comp_units = info.GetNumUnits();
+
+  uint64_t debug_info_size = SymbolFileCommon::GetDebugInfoSize();
+  // In dwp scenario, debug info == skeleton debug info + dwp debug info.
+  if (std::shared_ptr<SymbolFileDWARFDwo> dwp_sp = GetDwpSymbolFile())
+    return debug_info_size + dwp_sp->GetDebugInfoSize();
+
+  // In dwo scenario, debug info == skeleton debug info + all dwo debug info.
+  for (uint32_t i = 0; i < num_comp_units; i++) {
+    DWARFUnit *cu = info.GetUnitAtIndex(i);
+    if (cu == nullptr)
+      continue;
+
+    SymbolFileDWARFDwo *dwo = cu->GetDwoSymbolFile();
+    if (dwo)
+      debug_info_size += dwo->GetDebugInfoSize();
+  }
+  return debug_info_size;
+}
+
 void SymbolFileDWARF::FindTypes(const TypeQuery &query, TypeResults &results) {
 
   // Make sure we haven't already searched this SymbolFile before.
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 26a9502f90aa0..6d87530acf833 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
@@ -186,6 +186,8 @@ class SymbolFileDWARF : public SymbolFileCommon {
   GetMangledNamesForFunction(const std::string &scope_qualified_name,
                              std::vector<ConstString> &mangled_names) override;
 
+  uint64_t GetDebugInfoSize() override;
+
   void FindTypes(const lldb_private::TypeQuery &match,
                  lldb_private::TypeResults &results) override;
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
index ca698a84a9146..b52cb514fb190 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
@@ -85,6 +85,17 @@ lldb::offset_t SymbolFileDWARFDwo::GetVendorDWARFOpcodeSize(
   return GetBaseSymbolFile().GetVendorDWARFOpcodeSize(data, data_offset, op);
 }
 
+uint64_t SymbolFileDWARFDwo::GetDebugInfoSize() {
+  // Directly get debug info size from the current dwo object file's section
+  // list instead of asking SymbolFileCommon::GetDebugInfoSize(), which would
+  // incorrectly parse the owning module's sections.
+  SectionList *section_list =
+      m_objfile_sp->GetSectionList(/*update_module_section_list=*/false);
+  if (section_list)
+    return section_list->GetDebugInfoSize();
+  return 0;
+}
+
 bool SymbolFileDWARFDwo::ParseVendorDWARFOpcode(
     uint8_t op, const lldb_private::DataExtractor &opcodes,
     lldb::offset_t &offset, std::vector<lldb_private::Value> &stack) const {
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
index 9f5950e51b0c1..5c4b36328cbac 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
@@ -47,6 +47,8 @@ class SymbolFileDWARFDwo : public SymbolFileDWARF {
                                           const lldb::offset_t data_offset,
                                           const uint8_t op) const override;
 
+  uint64_t GetDebugInfoSize() override;
+
   bool ParseVendorDWARFOpcode(uint8_t op, const DataExtractor &opcodes,
                               lldb::offset_t &offset,
                               std::vector<Value> &stack) const override;
diff --git a/lldb/test/API/commands/target/debuginfo/TestDebugInfoSize.py b/lldb/test/API/commands/target/debuginfo/TestDebugInfoSize.py
new file mode 100644
index 0000000000000..a70212fb42626
--- /dev/null
+++ b/lldb/test/API/commands/target/debuginfo/TestDebugInfoSize.py
@@ -0,0 +1,135 @@
+"""
+Test SBTarget.GetStatistics() reporting for dwo files.
+"""
+
+import json
+import os
+
+from lldbsuite.test import lldbtest, lldbutil
+from lldbsuite.test.decorators import *
+from lldbsuite.test_event.build_exception import BuildError
+
+
+SKELETON_DEBUGINFO_SIZE = 602
+MAIN_DWO_DEBUGINFO_SIZE = 385
+FOO_DWO_DEBUGINFO_SIZE = 380
+
+
+class TestDebugInfoSize(lldbtest.TestBase):
+    # The binaries are generated from YAML, so debug info variants do not apply.
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def get_output_from_yaml(self):
+        exe = self.getBuildArtifact("a.out")
+        main_dwo = self.getBuildArtifact("a.out-main.dwo")
+        foo_dwo = self.getBuildArtifact("a.out-foo.dwo")
+
+        src_dir = self.getSourceDir()
+        exe_yaml_path = os.path.join(src_dir, "a.out.yaml")
+        self.yaml2obj(exe_yaml_path, exe)
+
+        main_dwo_yaml_path = os.path.join(src_dir, "a.out-main.dwo.yaml")
+        self.yaml2obj(main_dwo_yaml_path, main_dwo)
+
+        foo_dwo_yaml_path = os.path.join(src_dir, "a.out-foo.dwo.yaml")
+        self.yaml2obj(foo_dwo_yaml_path, foo_dwo)
+        return (exe, main_dwo, foo_dwo)
+
+    @add_test_categories(["dwo"])
+    def test_dwo(self):
+        (exe, main_dwo, foo_dwo) = self.get_output_from_yaml()
+
+        # Make sure dwo files exist
+        self.assertTrue(os.path.exists(main_dwo), f'Make sure "{main_dwo}" file exists')
+        self.assertTrue(os.path.exists(foo_dwo), f'Make sure "{foo_dwo}" file exists')
+
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, lldbtest.VALID_TARGET)
+
+        stats = target.GetStatistics()
+        stream = lldb.SBStream()
+        res = stats.GetAsJSON(stream)
+        debug_stats = json.loads(stream.GetData())
+        self.assertIn(
+            "totalDebugInfoByteSize",
+            debug_stats,
+            'Make sure the "totalDebugInfoByteSize" key is in target.GetStatistics()',
+        )
+        self.assertEqual(
+            debug_stats["totalDebugInfoByteSize"],
+            SKELETON_DEBUGINFO_SIZE + MAIN_DWO_DEBUGINFO_SIZE + FOO_DWO_DEBUGINFO_SIZE,
+        )
+
+    @add_test_categories(["dwo"])
+    def test_only_load_skeleton_debuginfo(self):
+        (exe, main_dwo, foo_dwo) = self.get_output_from_yaml()
+
+        # Remove both dwo files so that only the skeleton debug info is loaded
+        os.unlink(main_dwo)
+        os.unlink(foo_dwo)
+
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, lldbtest.VALID_TARGET)
+
+        stats = target.GetStatistics()
+        stream = lldb.SBStream()
+        res = stats.GetAsJSON(stream)
+        debug_stats = json.loads(stream.GetData())
+        self.assertIn(
+            "totalDebugInfoByteSize",
+            debug_stats,
+            'Make sure the "totalDebugInfoByteSize" key is in target.GetStatistics()',
+        )
+        self.assertEqual(debug_stats["totalDebugInfoByteSize"], SKELETON_DEBUGINFO_SIZE)
+
+    @add_test_categories(["dwo"])
+    def test_load_partial_dwos(self):
+        (exe, main_dwo, foo_dwo) = self.get_output_from_yaml()
+
+        # Remove only the main dwo file; the foo dwo file stays available
+        os.unlink(main_dwo)
+
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, lldbtest.VALID_TARGET)
+
+        stats = target.GetStatistics()
+        stream = lldb.SBStream()
+        res = stats.GetAsJSON(stream)
+        debug_stats = json.loads(stream.GetData())
+        self.assertIn(
+            "totalDebugInfoByteSize",
+            debug_stats,
+            'Make sure the "totalDebugInfoByteSize" key is in target.GetStatistics()',
+        )
+        self.assertEqual(
+            debug_stats["totalDebugInfoByteSize"],
+            SKELETON_DEBUGINFO_SIZE + FOO_DWO_DEBUGINFO_SIZE,
+        )
+
+    @add_test_categories(["dwo"])
+    def test_dwos_loaded_symbols_on_demand(self):
+        (exe, main_dwo, foo_dwo) = self.get_output_from_yaml()
+
+        # Make sure dwo files exist
+        self.assertTrue(os.path.exists(main_dwo), f'Make sure "{main_dwo}" file exists')
+        self.assertTrue(os.path.exists(foo_dwo), f'Make sure "{foo_dwo}" file exists')
+
+        # Load symbols on-demand
+        self.runCmd("settings set symbols.load-on-demand true")
+
+        target = self.dbg.CreateTarget(exe)
+        self.assertTrue(target, lldbtest.VALID_TARGET)
+
+        stats = target.GetStatistics()
+        stream = lldb.SBStream()
+        res = stats.GetAsJSON(stream)
+        debug_stats = json.loads(stream.GetData())
+        self.assertIn(
+            "totalDebugInfoByteSize",
+            debug_stats,
+            'Make sure the "totalDebugInfoByteSize" key is in target.GetStatistics()',
+        )
+        self.assertEqual(
+            debug_stats["totalDebugInfoByteSize"],
+            SKELETON_DEBUGINFO_SIZE + MAIN_DWO_DEBUGINFO_SIZE + FOO_DWO_DEBUGINFO_SIZE,
+        )
diff --git a/lldb/test/API/commands/target/debuginfo/a.out-foo.dwo.yaml b/lldb/test/API/commands/target/debuginfo/a.out-foo.dwo.yaml
new file mode 100644
index 0000000000000..7a59fd67bc08b
--- /dev/null
+++ b/lldb/test/API/commands/target/debuginfo/a.out-foo.dwo.yaml
@@ -0,0 +1,37 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_X86_64
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .debug_str_offsets.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         180000000500000000000000040000000800000097000000F6000000
+  - Name:            .debug_str.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE, SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         666F6F00696E740046616365626F6F6B20636C616E672076657273696F6E2031352E302E3020287373683A2F2F6769742E7669702E66616365626F6F6B2E636F6D2F646174612F6769747265706F732F6F736D6574612F65787465726E616C2F6C6C766D2D70726F6A656374203435616538646332373465366362636264343064353734353136643533343337393662653135323729002F686F6D652F6A65666672657974616E2F6C6C766D2D73616E642F65787465726E616C2F6C6C766D2D70726F6A6563742F6C6C64622F746573742F4150492F636F6D6D616E64732F7461726765742F6465627567696E666F2F666F6F2E6300612E6F75742D666F6F2E64776F00
+  - Name:            .debug_info.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         2A0000000500050800000000495EA96AE5C99FC401021D00030402000B0000000156000003290000000301050400
+  - Name:            .debug_abbrev.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         01110125251305032576250000022E00111B1206401803253A0B3B0B49133F19000003240003253E0B0B0B000000
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .debug_str_offsets.dwo
+      - Name:            .debug_str.dwo
+      - Name:            .debug_info.dwo
+      - Name:            .debug_abbrev.dwo
+...
diff --git a/lldb/test/API/commands/target/debuginfo/a.out-main.dwo.yaml b/lldb/test/API/commands/target/debuginfo/a.out-main.dwo.yaml
new file mode 100644
index 0000000000000..997158b8f918b
--- /dev/null
+++ b/lldb/test/API/commands/target/debuginfo/a.out-main.dwo.yaml
@@ -0,0 +1,37 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_X86_64
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .debug_str_offsets.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         180000000500000000000000050000000900000098000000F8000000
+  - Name:            .debug_str.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE, SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         6D61696E00696E740046616365626F6F6B20636C616E672076657273696F6E2031352E302E3020287373683A2F2F6769742E7669702E66616365626F6F6B2E636F6D2F646174612F6769747265706F732F6F736D6574612F65787465726E616C2F6C6C766D2D70726F6A656374203435616538646332373465366362636264343064353734353136643533343337393662653135323729002F686F6D652F6A65666672657974616E2F6C6C766D2D73616E642F65787465726E616C2F6C6C766D2D70726F6A6563742F6C6C64622F746573742F4150492F636F6D6D616E64732F7461726765742F6465627567696E666F2F6D61696E2E6300612E6F75742D6D61696E2E64776F00
+  - Name:            .debug_info.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         2A000000050005080000000037AA38DE48449DD701021D00030402001C0000000156000103290000000301050400
+  - Name:            .debug_abbrev.dwo
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_EXCLUDE ]
+    AddressAlign:    0x1
+    Content:         01110125251305032576250000022E00111B1206401803253A0B3B0B271949133F19000003240003253E0B0B0B000000
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .debug_str_offsets.dwo
+      - Name:            .debug_str.dwo
+      - Name:            .debug_info.dwo
+      - Name:            .debug_abbrev.dwo
+...
diff --git a/lldb/test/API/commands/target/debuginfo/a.out.yaml b/lldb/test/API/commands/target/debuginfo/a.out.yaml
new file mode 100644
index 0000000000000..95578c358f497
--- /dev/null
+++ b/lldb/test/API/commands/target/debuginfo/a.out.yaml
@@ -0,0 +1,1273 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_DYN
+  Machine:         EM_X86_64
+  Entry:           0x4F0
+ProgramHeaders:
+  - Type:            PT_PHDR
+    Flags:           [ PF_R ]
+    VAddr:           0x40
+    Align:           0x8
+    Offset:          0x40
+  - Type:            PT_INTERP
+    Flags:           [ PF_R ]
+    FirstSec:        .interp
+    LastSec:         .interp
+    VAddr:           0x238
+    Offset:          0x238
+  - Type:            PT_LOAD
+    Flags:           [ PF_X, PF_R ]
+    FirstSec:        .interp
+    LastSec:         .eh_frame
+    Align:           0x200000
+    Offset:          0x0
+  - Type:            PT_LOAD
+    Flags:           [ PF_W, PF_R ]
+    FirstSec:        .init_array
+    LastSec:         .bss
+    VAddr:           0x200DE0
+    Align:           0x200000
+    Offset:          0xDE0
+  - Type:            PT_DYNAMIC
+    Flags:           [ PF_W, PF_R ]
+    FirstSec:        .dynamic
+    LastSec:         .dynamic
+    VAddr:           0x200DF8
+    Align:           0x8
+    Offset:          0xDF8
+  - Type:            PT_NOTE
+    Flags:           [ PF_R ]
+    FirstSec:        .note.ABI-tag
+    LastSec:         .note.ABI-tag
+    VAddr:           0x254
+    Align:           0x4
+    Offset:          0x254
+  - Type:            PT_GNU_EH_FRAME
+    Flags:           [ PF_R ]
+    FirstSec:        .eh_frame_hdr
+    LastSec:         .eh_frame_hdr
+    VAddr:           0x69C
+    Align:           0x4
+    Offset:          0x69C
+  - Type:            PT_GNU_STACK
+    Flags:           [ PF_W, PF_R ]
+    Align:           0x10
+    Offset:          0x0
+  - Type:            PT_GNU_RELRO
+    Flags:           [ PF_R ]
+    FirstSec:        .init_array
+    LastSec:         .got
+    VAddr:           0x200DE0
+    Offset:          0xDE0
+Sections:
+  - Name:            .interp
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x238
+    AddressAlign:    0x1
+    Content:         2F6C696236342F6C642D6C696E75782D7838362D36342E736F2E3200
+  - Name:            .note.ABI-tag
+    Type:            SHT_NOTE
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x254
+    AddressAlign:    0x4
+    Notes:
+      - Name:            GNU
+        Desc:            '00000000030000000200000000000000'
+        Type:            NT_VERSION
+  - Name:            .gnu.hash
+    Type:            SHT_GNU_HASH
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x278
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Header:
+      SymNdx:          0x1
+      Shift2:          0x0
+    BloomFilter:     [ 0x0 ]
+    HashBuckets:     [ 0x0 ]
+    HashValues:      [  ]
+  - Name:            .dynsym
+    Type:            SHT_DYNSYM
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x298
+    Link:            .dynstr
+    AddressAlign:    0x8
+  - Name:            .dynstr
+    Type:            SHT_STRTAB
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x328
+    AddressAlign:    0x1
+  - Name:            .gnu.version
+    Type:            SHT_GNU_versym
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x3A6
+    Link:            .dynsym
+    AddressAlign:    0x2
+    Entries:         [ 0, 1, 2, 1, 1, 2 ]
+  - Name:            .gnu.version_r
+    Type:            SHT_GNU_verneed
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x3B8
+    Link:            .dynstr
+    AddressAlign:    0x8
+    Dependencies:
+      - Version:         1
+        File:            libc.so.6
+        Entries:
+          - Name:            GLIBC_2.2.5
+            Hash:            157882997
+            Flags:           0
+            Other:           2
+  - Name:            .rela.dyn
+    Type:            SHT_RELA
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x3D8
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Relocations:
+      - Offset:          0x200DE0
+        Type:            R_X86_64_RELATIVE
+        Addend:          1488
+      - Offset:          0x200DE8
+        Type:            R_X86_64_RELATIVE
+        Addend:          1424
+      - Offset:          0x200DF0
+        Type:            R_X86_64_RELATIVE
+        Addend:          2100720
+      - Offset:          0x200FD8
+        Symbol:          _ITM_deregisterTMCloneTable
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x200FE0
+        Symbol:          __libc_start_main
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x200FE8
+        Symbol:          __gmon_start__
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x200FF0
+        Symbol:          _ITM_registerTMCloneTable
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x200FF8
+        Symbol:          __cxa_finalize
+        Type:            R_X86_64_GLOB_DAT
+  - Name:            .rela.plt
+    Type:            SHT_RELA
+    Flags:           [ SHF_ALLOC, SHF_INFO_LINK ]
+    Address:         0x498
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Info:            .got.plt
+    Relocations:
+      - Offset:          0x201018
+        Symbol:          __cxa_finalize
+        Type:            R_X86_64_JUMP_SLOT
+  - Name:            .init
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x4B0
+    AddressAlign:    0x4
+    Content:         F30F1EFA4883EC08488B05290B20004885C07402FFD04883C408C3
+  - Name:            .plt
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x4D0
+    AddressAlign:    0x10
+    EntSize:         0x10
+    Content:         FF35320B2000FF25340B20000F1F4000FF25320B20006800000000E9E0FFFFFF
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x4F0
+    AddressAlign:    0x10
+    Content:         F30F1EFA31ED4989D15E4889E24883E4F050544C8D0576010000488D0DFF000000488D3DC8000000FF15C20A2000F490488D3D010B2000488D05FA0A20004839F87415488B059E0A20004885C07409FFE00F1F8000000000C30F1F8000000000488D3DD10A2000488D35CA0A20004829FE48C1FE034889F048C1E83F4801C648D1FE7414488B05750A20004885C07408FFE0660F1F440000C30F1F8000000000F30F1EFA803D890A200000752B5548833D520A2000004889E5740C488D3D3E082000E829FFFFFFE864FFFFFFC605610A2000015DC30F1F00C30F1F8000000000F30F1EFAE977FFFFFF0F1F8000000000554889E54883EC10C745FC00000000B000E80A0000004883C4105DC30F1F4000554889E5B8010000005DC30F1F440000F30F1EFA41574989D741564989F641554189FD41544C8D25B407200055488D2DB4072000534C29E54883EC08E86FFEFFFF48C1FD03741F31DB0F1F80000000004C89FA4C89F64489EF41FF14DC4883C3014839DD75EA4883C4085B5D415C415D415E415FC366662E0F1F840000000000F30F1EFAC3
+  - Name:            .fini
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x688
+    AddressAlign:    0x4
+    Content:         F30F1EFA4883EC084883C408C3
+  - Name:            .rodata
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_MERGE ]
+    Address:         0x698
+    AddressAlign:    0x4
+    EntSize:         0x4
+    Content:         '01000200'
+  - Name:            .eh_frame_hdr
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x69C
+    AddressAlign:    0x4
+    Content:         011B033B380000000600000034FEFFFF6C00000054FEFFFF5400000044FFFFFF9400000064FFFFFFB400000074FFFFFFD4000000E4FFFFFF1C010000
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x6D8
+    AddressAlign:    0x8
+    Content:         1400000000000000017A5200017810011B0C070890010000140000001C000000F8FDFFFF2F00000000440710000000002400000034000000C0FDFFFF20000000000E10460E184A0F0B770880003F1A3B2A332422000000001C0000005C000000A8FEFFFF1C00000000410E108602430D06570C07080000001C0000007C000000A8FEFFFF0B00000000410E108602430D06460C0708000000440000009C00000098FEFFFF6500000000460E108F02450E188E03450E208D04450E288C05480E308606480E388307470E406E0E38410E30410E28420E20420E18420E10420E080010000000E4000000C0FEFFFF050000000000000000000000
+  - Name:            .init_array
+    Type:            SHT_INIT_ARRAY
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x200DE0
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Offset:          0xDE0
+    Content:         D005000000000000
+  - Name:            .fini_array
+    Type:            SHT_FINI_ARRAY
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x200DE8
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         '9005000000000000'
+  - Name:            .data.rel.ro
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x200DF0
+    AddressAlign:    0x8
+    Content:         F00D200000000000
+  - Name:            .dynamic
+    Type:            SHT_DYNAMIC
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x200DF8
+    Link:            .dynstr
+    AddressAlign:    0x8
+    Entries:
+      - Tag:             DT_NEEDED
+        Value:           0x1
+      - Tag:             DT_INIT
+        Value:           0x4B0
+      - Tag:             DT_FINI
+        Value:           0x688
+      - Tag:             DT_INIT_ARRAY
+        Value:           0x200DE0
+      - Tag:             DT_INIT_ARRAYSZ
+        Value:           0x8
+      - Tag:             DT_FINI_ARRAY
+        Value:           0x200DE8
+      - Tag:             DT_FINI_ARRAYSZ
+        Value:           0x8
+      - Tag:             DT_GNU_HASH
+        Value:           0x278
+      - Tag:             DT_STRTAB
+        Value:           0x328
+      - Tag:             DT_SYMTAB
+        Value:           0x298
+      - Tag:             DT_STRSZ
+        Value:           0x7D
+      - Tag:             DT_SYMENT
+        Value:           0x18
+      - Tag:             DT_DEBUG
+        Value:           0x0
+      - Tag:             DT_PLTGOT
+        Value:           0x201000
+      - Tag:             DT_PLTRELSZ
+        Value:           0x18
+      - Tag:             DT_PLTREL
+        Value:           0x7
+      - Tag:             DT_JMPREL
+        Value:           0x498
+      - Tag:             DT_RELA
+        Value:           0x3D8
+      - Tag:             DT_RELASZ
+        Value:           0xC0
+      - Tag:             DT_RELAENT
+        Value:           0x18
+      - Tag:             DT_FLAGS_1
+        Value:           0x8000000
+      - Tag:             DT_VERNEED
+        Value:           0x3B8
+      - Tag:             DT_VERNEEDNUM
+        Value:           0x1
+      - Tag:             DT_VERSYM
+        Value:           0x3A6
+      - Tag:             DT_RELACOUNT
+        Value:           0x3
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+  - Name:            .got
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x200FD8
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         '00000000000000000000000000000000000000000000000000000000000000000000000000000000'
+  - Name:            .got.plt
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x201000
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         F80D20000000000000000000000000000000000000000000E604000000000000
+  - Name:            .data
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x201020
+    AddressAlign:    0x1
+    Content:         '00000000'
+  - Name:            .bss
+    Type:            SHT_NOBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x201024
+    AddressAlign:    0x1
+    Size:            0x4
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         4743433A2028474E552920382E352E3020323032313035313420285265642048617420382E352E302D3231290046616365626F6F6B20636C616E672076657273696F6E2031352E302E3020287373683A2F2F6769742E7669702E66616365626F6F6B2E636F6D2F646174612F6769747265706F732F6F736D6574612F65787465726E616C2F6C6C766D2D70726F6A65637420343561653864633237346536636263626434306435373435313664353334333739366265313532372900
+  - Name:            .gnu.build.attributes
+    Type:            SHT_NOTE
+    Address:         0x601028
+    AddressAlign:    0x4
+    Notes:
+      - Name:            "GA$\x013p1113"
+        Desc:            1F050000000000001F05000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\0"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0�"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\0"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0�"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\0"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0�"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\0"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0�"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\0"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0�"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            F0040000000000001F05000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            1F050000000000001F05000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            1F050000000000001F05000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            B004000000000000C604000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '88060000000000009006000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            2005000000000000D905000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            '10060000000000008506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            '10060000000000007506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_FUNC
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            '10060000000000007506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_FUNC
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            '75060000000000008506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_FUNC
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            '75060000000000008506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_FUNC
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013p1113"
+        Desc:            F004000000000000F004000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05running gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05annobin gcc 8.5.0 20210514"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x05plugin name: gcc-annobin"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*GOW\0*\x05\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x02\x03"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+stack_clash'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*cf_protection\0\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*FORTIFY\0\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+GLIBCXX_ASSERTIONS'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\a\x02"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA!\b"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA+omit_frame_pointer'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA*\x06\x12"
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            'GA!stack_realign'
+        Desc:            ''
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '85060000000000008506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '85060000000000008506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            C604000000000000CB04000000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '90060000000000009506000000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+  - Name:            .debug_info
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         24000000050004080000000037AA38DE48449DD70100000000080000000001001C00000008000000240000000500040817000000495EA96AE5C99FC4015F000000180000000001000B00000018000000
+  - Name:            .debug_abbrev
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         014A00101772171B25B442197625111B12067317000000014A00101772171B25B442197625111B12067317000000
+  - Name:            .debug_line
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         5B0000000500080037000000010101FB0E0D00010101010000000100000101011F010000000003011F020F051E010200000000CEBCB192A47C15B4C4C1CCEC9400444F0400000902E0050000000000001405190AE40512060B740206000101590000000500080037000000010101FB0E0D00010101010000000100000101011F010000000003011F020F051E016200000000E50309AC872C80EA355C846ACBBCF1660400000902000600000000000014050D0A4A060B580202000101
+  - Name:            .debug_str_offsets
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         0C0000000500000000000000020000000C000000050000000000000011000000
+  - Name:            .debug_gnu_pubnames
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         18000000020000000000280000001A000000306D61696E000000000017000000020028000000280000001A00000030666F6F0000000000
+  - Name:            .debug_gnu_pubtypes
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+    Content:         17000000020000000000280000002900000090696E74000000000017000000020028000000280000002900000090696E740000000000
+  - Name:            .debug_line_str
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         2E002F686F6D652F6A65666672657974616E2F6C6C766D2D73616E642F65787465726E616C2F6C6C766D2D70726F6A6563742F6C6C64622F746573742F4150492F636F6D6D616E64732F7461726765742F6465627567696E666F2F6D61696E2E63002F686F6D652F6A65666672657974616E2F6C6C766D2D73616E642F65787465726E616C2F6C6C766D2D70726F6A6563742F6C6C64622F746573742F4150492F636F6D6D616E64732F7461726765742F6465627567696E666F2F666F6F2E6300
+Symbols:
+  - Name:            .interp
+    Type:            STT_SECTION
+    Section:         .interp
+    Value:           0x238
+  - Name:            .note.ABI-tag
+    Type:            STT_SECTION
+    Section:         .note.ABI-tag
+    Value:           0x254
+  - Name:            .gnu.hash
+    Type:            STT_SECTION
+    Section:         .gnu.hash
+    Value:           0x278
+  - Name:            .dynsym
+    Type:            STT_SECTION
+    Section:         .dynsym
+    Value:           0x298
+  - Name:            .dynstr
+    Type:            STT_SECTION
+    Section:         .dynstr
+    Value:           0x328
+  - Name:            .gnu.version
+    Type:            STT_SECTION
+    Section:         .gnu.version
+    Value:           0x3A6
+  - Name:            .gnu.version_r
+    Type:            STT_SECTION
+    Section:         .gnu.version_r
+    Value:           0x3B8
+  - Name:            .rela.dyn
+    Type:            STT_SECTION
+    Section:         .rela.dyn
+    Value:           0x3D8
+  - Name:            .rela.plt
+    Type:            STT_SECTION
+    Section:         .rela.plt
+    Value:           0x498
+  - Name:            .init
+    Type:            STT_SECTION
+    Section:         .init
+    Value:           0x4B0
+  - Name:            .plt
+    Type:            STT_SECTION
+    Section:         .plt
+    Value:           0x4D0
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+    Value:           0x4F0
+  - Name:            .fini
+    Type:            STT_SECTION
+    Section:         .fini
+    Value:           0x688
+  - Name:            .rodata
+    Type:            STT_SECTION
+    Section:         .rodata
+    Value:           0x698
+  - Name:            .eh_frame_hdr
+    Type:            STT_SECTION
+    Section:         .eh_frame_hdr
+    Value:           0x69C
+  - Name:            .eh_frame
+    Type:            STT_SECTION
+    Section:         .eh_frame
+    Value:           0x6D8
+  - Name:            .init_array
+    Type:            STT_SECTION
+    Section:         .init_array
+    Value:           0x200DE0
+  - Name:            .fini_array
+    Type:            STT_SECTION
+    Section:         .fini_array
+    Value:           0x200DE8
+  - Name:            .data.rel.ro
+    Type:            STT_SECTION
+    Section:         .data.rel.ro
+    Value:           0x200DF0
+  - Name:            .dynamic
+    Type:            STT_SECTION
+    Section:         .dynamic
+    Value:           0x200DF8
+  - Name:            .got
+    Type:            STT_SECTION
+    Section:         .got
+    Value:           0x200FD8
+  - Name:            .got.plt
+    Type:            STT_SECTION
+    Section:         .got.plt
+    Value:           0x201000
+  - Name:            .data
+    Type:            STT_SECTION
+    Section:         .data
+    Value:           0x201020
+  - Name:            .bss
+    Type:            STT_SECTION
+    Section:         .bss
+    Value:           0x201024
+  - Name:            .comment
+    Type:            STT_SECTION
+    Section:         .comment
+  - Name:            .gnu.build.attributes
+    Type:            STT_SECTION
+    Section:         .gnu.build.attributes
+    Value:           0x601028
+  - Name:            .debug_info
+    Type:            STT_SECTION
+    Section:         .debug_info
+  - Name:            .debug_abbrev
+    Type:            STT_SECTION
+    Section:         .debug_abbrev
+  - Name:            .debug_line
+    Type:            STT_SECTION
+    Section:         .debug_line
+  - Name:            .debug_str
+    Type:            STT_SECTION
+    Section:         .debug_str
+  - Name:            .debug_addr
+    Type:            STT_SECTION
+    Section:         .debug_addr
+  - Name:            .debug_str_offsets
+    Type:            STT_SECTION
+    Section:         .debug_str_offsets
+  - Name:            .debug_gnu_pubnames
+    Type:            STT_SECTION
+    Section:         .debug_gnu_pubnames
+  - Name:            .debug_gnu_pubtypes
+    Type:            STT_SECTION
+    Section:         .debug_gnu_pubtypes
+  - Name:            .debug_line_str
+    Type:            STT_SECTION
+    Section:         .debug_line_str
+  - Name:            '/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/Scrt1.o'
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .annobin_init.c
+    Section:         .text
+    Value:           0x51F
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c_end
+    Section:         .text
+    Value:           0x51F
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c.hot
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c_end.hot
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c.unlikely
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c_end.unlikely
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c.startup
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c_end.startup
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c.exit
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_init.c_end.exit
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            elf-init.oS
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .annobin_elf_init.c
+    Section:         .text
+    Value:           0x610
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c_end
+    Section:         .text
+    Value:           0x685
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c.hot
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c_end.hot
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c.unlikely
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c_end.unlikely
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c.startup
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c_end.startup
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c.exit
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin_elf_init.c_end.exit
+    Section:         .text
+    Value:           0x4F0
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin___libc_csu_init.start
+    Section:         .text
+    Value:           0x610
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin___libc_csu_init.end
+    Section:         .text
+    Value:           0x675
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin___libc_csu_fini.start
+    Section:         .text
+    Value:           0x675
+    Other:           [ STV_HIDDEN ]
+  - Name:            .annobin___libc_csu_fini.end
+    Section:         .text
+    Value:           0x685
+    Other:           [ STV_HIDDEN ]
+  - Name:            crtstuff.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            deregister_tm_clones
+    Type:            STT_FUNC
+    Section:         .text
+    Value:           0x520
+  - Name:            register_tm_clones
+    Type:            STT_FUNC
+    Section:         .text
+    Value:           0x550
+  - Name:            __do_global_dtors_aux
+    Type:            STT_FUNC
+    Section:         .text
+    Value:           0x590
+  - Name:            completed.7303
+    Type:            STT_OBJECT
+    Section:         .bss
+    Value:           0x201024
+    Size:            0x1
+  - Name:            __do_global_dtors_aux_fini_array_entry
+    Type:            STT_OBJECT
+    Section:         .fini_array
+    Value:           0x200DE8
+  - Name:            frame_dummy
+    Type:            STT_FUNC
+    Section:         .text
+    Value:           0x5D0
+  - Name:            __frame_dummy_init_array_entry
+    Type:            STT_OBJECT
+    Section:         .init_array
+    Value:           0x200DE0
+  - Name:            main.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            foo.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            'crtstuff.c (1)'
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            __FRAME_END__
+    Type:            STT_OBJECT
+    Section:         .eh_frame
+    Value:           0x7CC
+  - Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            __init_array_end
+    Section:         .init_array
+    Value:           0x200DE8
+  - Name:            _DYNAMIC
+    Type:            STT_OBJECT
+    Section:         .dynamic
+    Value:           0x200DF8
+  - Name:            __init_array_start
+    Section:         .init_array
+    Value:           0x200DE0
+  - Name:            __GNU_EH_FRAME_HDR
+    Section:         .eh_frame_hdr
+    Value:           0x69C
+  - Name:            _GLOBAL_OFFSET_TABLE_
+    Type:            STT_OBJECT
+    Section:         .got.plt
+    Value:           0x201000
+  - Name:            _init
+    Type:            STT_FUNC
+    Section:         .init
+    Value:           0x4B0
+  - Name:            __libc_csu_fini
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x680
+    Size:            0x5
+  - Name:            _ITM_deregisterTMCloneTable
+    Binding:         STB_WEAK
+  - Name:            data_start
+    Section:         .data
+    Binding:         STB_WEAK
+    Value:           0x201020
+  - Name:            _edata
+    Section:         .data
+    Binding:         STB_GLOBAL
+    Value:           0x201024
+  - Name:            _fini
+    Type:            STT_FUNC
+    Section:         .fini
+    Binding:         STB_GLOBAL
+    Value:           0x688
+    Other:           [ STV_HIDDEN ]
+  - Name:            '__libc_start_main@@GLIBC_2.2.5'
+    Type:            STT_FUNC
+    Binding:         STB_GLOBAL
+  - Name:            __data_start
+    Section:         .data
+    Binding:         STB_GLOBAL
+    Value:           0x201020
+  - Name:            __gmon_start__
+    Binding:         STB_WEAK
+  - Name:            __dso_handle
+    Type:            STT_OBJECT
+    Section:         .data.rel.ro
+    Binding:         STB_GLOBAL
+    Value:           0x200DF0
+    Other:           [ STV_HIDDEN ]
+  - Name:            _IO_stdin_used
+    Type:            STT_OBJECT
+    Section:         .rodata
+    Binding:         STB_GLOBAL
+    Value:           0x698
+    Size:            0x4
+  - Name:            __libc_csu_init
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x610
+    Size:            0x65
+  - Name:            foo
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x600
+    Size:            0xB
+  - Name:            _end
+    Section:         .bss
+    Binding:         STB_GLOBAL
+    Value:           0x201028
+  - Name:            _start
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x4F0
+    Size:            0x2F
+  - Name:            __bss_start
+    Section:         .bss
+    Binding:         STB_GLOBAL
+    Value:           0x201024
+  - Name:            main
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Value:           0x5E0
+    Size:            0x1C
+  - Name:            __TMC_END__
+    Type:            STT_OBJECT
+    Section:         .data
+    Binding:         STB_GLOBAL
+    Value:           0x201028
+    Other:           [ STV_HIDDEN ]
+  - Name:            _ITM_registerTMCloneTable
+    Binding:         STB_WEAK
+  - Name:            '__cxa_finalize@@GLIBC_2.2.5'
+    Type:            STT_FUNC
+    Binding:         STB_WEAK
+DynamicSymbols:
+  - Name:            _ITM_deregisterTMCloneTable
+    Binding:         STB_WEAK
+  - Name:            __libc_start_main
+    Type:            STT_FUNC
+    Binding:         STB_GLOBAL
+  - Name:            __gmon_start__
+    Binding:         STB_WEAK
+  - Name:            _ITM_registerTMCloneTable
+    Binding:         STB_WEAK
+  - Name:            __cxa_finalize
+    Type:            STT_FUNC
+    Binding:         STB_WEAK
+DWARF:
+  debug_str:
+    - .
+    - a.out-main.dwo
+    - a.out-foo.dwo
+  debug_addr:
+    - Length:          0xC
+      Version:         0x5
+      AddressSize:     0x8
+      Entries:
+        - Address:         0x5E0
+    - Length:          0xC
+      Version:         0x5
+      AddressSize:     0x8
+      Entries:
+        - Address:         0x600
+...

>From 110bb35f7d533a1e457292fab50620ce391ce4a0 Mon Sep 17 00:00:00 2001
From: lntue <35648136+lntue at users.noreply.github.com>
Date: Thu, 1 Feb 2024 12:16:28 -0500
Subject: [PATCH 38/42] [libc] Update libc_errno to work correctly in both
 overlay and full build modes. (#80177)

---
 libc/include/errno.h.def                      |  8 +++
 libc/src/errno/CMakeLists.txt                 | 13 ++++
 libc/src/errno/libc_errno.cpp                 | 60 +++++++++++--------
 libc/src/errno/libc_errno.h                   | 60 ++++++++-----------
 .../integration/startup/linux/tls_test.cpp    |  4 +-
 libc/test/src/errno/errno_test.cpp            |  2 +-
 libc/test/src/stdlib/StrtolTest.h             | 12 ++--
 libc/test/src/sys/mman/linux/madvise_test.cpp |  2 +-
 libc/test/src/sys/mman/linux/mlock_test.cpp   |  2 +-
 libc/test/src/sys/mman/linux/mmap_test.cpp    |  2 +-
 .../test/src/sys/mman/linux/mprotect_test.cpp |  2 +-
 .../src/sys/mman/linux/posix_madvise_test.cpp |  2 +-
 libc/test/src/time/asctime_r_test.cpp         |  6 +-
 libc/test/src/time/asctime_test.cpp           | 12 ++--
 14 files changed, 101 insertions(+), 86 deletions(-)
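
Note on the approach (illustrative, not part of the patch): the new
libc_errno.cpp gives LIBC_NAMESPACE::Errno an operator=(int) and an
operator int(), and picks the backing storage per build mode (GPU, unit
test, full build, overlay). The sketch below shows only that proxy-object
pattern in isolation. All names in it (demo, demo_errno_storage,
demo_libc_errno, demo_fail_with) are hypothetical, and the single
thread_local variable stands in for whichever storage the real #if chain
would select.

```cpp
// Hypothetical stand-in for the per-thread storage that the patch selects
// with its #if chain (GPU / unit test / full build / overlay).
static thread_local int demo_errno_storage = 0;

namespace demo {
// Proxy object: assignment writes the storage, conversion to int reads it.
struct Errno {
  void operator=(int value) { demo_errno_storage = value; }
  operator int() { return demo_errno_storage; }
};
} // namespace demo

// A single global instance, so callers can write `demo_libc_errno = EINVAL;`
// without knowing which storage backs it.
demo::Errno demo_libc_errno;

// Example caller: records an error code through the proxy and reports failure.
int demo_fail_with(int code) {
  demo_libc_errno = code;
  return -1;
}
```

With this shape, call sites stay identical across build modes; only the
bodies of the two operators would change.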

diff --git a/libc/include/errno.h.def b/libc/include/errno.h.def
index d8f79dd47a0d1..90bd8bfecf2f1 100644
--- a/libc/include/errno.h.def
+++ b/libc/include/errno.h.def
@@ -44,7 +44,15 @@
 #endif
 
 #if !defined(__AMDGPU__) && !defined(__NVPTX__)
+
+#ifdef __cplusplus
+extern "C" {
+extern thread_local int __llvmlibc_errno;
+}
+#else
 extern _Thread_local int __llvmlibc_errno;
+#endif // __cplusplus
+
 #define errno __llvmlibc_errno
 #endif
 
diff --git a/libc/src/errno/CMakeLists.txt b/libc/src/errno/CMakeLists.txt
index e8868dc48c5e9..d9b8d9957c170 100644
--- a/libc/src/errno/CMakeLists.txt
+++ b/libc/src/errno/CMakeLists.txt
@@ -1,9 +1,22 @@
+# If we are in full build mode, we will provide the errno definition ourselves,
+# and if we are in overlay mode, we will just re-use the system's errno.
+# We are passing LIBC_FULL_BUILD flag in full build mode so that the
+# implementation of libc_errno will know if we are in full build mode or not.
+
+# TODO: Move LIBC_FULL_BUILD flag to _get_common_compile_options.
+set(full_build_flag "")
+if(LLVM_LIBC_FULL_BUILD)
+  set(full_build_flag "-DLIBC_FULL_BUILD")
+endif()
+
 add_entrypoint_object(
   errno
   SRCS
     libc_errno.cpp
   HDRS
     libc_errno.h     # Include this
+  COMPILE_OPTIONS
+    ${full_build_flag}
   DEPENDS
     libc.include.errno
     libc.src.__support.common
diff --git a/libc/src/errno/libc_errno.cpp b/libc/src/errno/libc_errno.cpp
index c8f0bffd0962e..e54bdd156d6bc 100644
--- a/libc/src/errno/libc_errno.cpp
+++ b/libc/src/errno/libc_errno.cpp
@@ -1,4 +1,4 @@
-//===-- Implementation of errno -------------------------------------------===//
+//===-- Implementation of libc_errno --------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -6,35 +6,43 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "src/__support/macros/attributes.h"
-#include "src/__support/macros/properties/architectures.h"
-
-namespace LIBC_NAMESPACE {
+#include "libc_errno.h"
 
 #ifdef LIBC_TARGET_ARCH_IS_GPU
-struct ErrnoConsumer {
-  void operator=(int) {}
-};
-#endif
+// LIBC_THREAD_LOCAL on GPU currently does nothing.  So essentially this is just
+// a global errno for gpu to use for now.
+extern "C" {
+LIBC_THREAD_LOCAL int __llvmlibc_gpu_errno;
+}
 
+void LIBC_NAMESPACE::Errno::operator=(int a) { __llvmlibc_gpu_errno = a; }
+LIBC_NAMESPACE::Errno::operator int() { return __llvmlibc_gpu_errno; }
+
+#elif !defined(LIBC_COPT_PUBLIC_PACKAGING)
+// This mode is for unit testing.  We just use our internal errno.
+LIBC_THREAD_LOCAL int __llvmlibc_internal_errno;
+
+void LIBC_NAMESPACE::Errno::operator=(int a) { __llvmlibc_internal_errno = a; }
+LIBC_NAMESPACE::Errno::operator int() { return __llvmlibc_internal_errno; }
+
+#elif defined(LIBC_FULL_BUILD)
+// This mode is for public libc archive, hermetic, and integration tests.
+// In full build mode, we provide the errno storage ourselves.
 extern "C" {
-#ifdef LIBC_COPT_PUBLIC_PACKAGING
-// TODO: Declare __llvmlibc_errno only under LIBC_COPT_PUBLIC_PACKAGING and
-// __llvmlibc_internal_errno otherwise.
-// In overlay mode, this will be an unused thread local variable as libc_errno
-// will resolve to errno from the system libc's errno.h. In full build mode
-// however, libc_errno will resolve to this thread local variable via the errno
-// macro defined in LLVM libc's public errno.h header file.
-// TODO: Use a macro to distinguish full build and overlay build which can be
-//       used to exclude __llvmlibc_errno under overlay build.
-#ifdef LIBC_TARGET_ARCH_IS_GPU
-ErrnoConsumer __llvmlibc_errno;
-#else
 LIBC_THREAD_LOCAL int __llvmlibc_errno;
-#endif // LIBC_TARGET_ARCH_IS_GPU
+}
+
+void LIBC_NAMESPACE::Errno::operator=(int a) { __llvmlibc_errno = a; }
+LIBC_NAMESPACE::Errno::operator int() { return __llvmlibc_errno; }
+
 #else
-LIBC_THREAD_LOCAL int __llvmlibc_internal_errno;
-#endif
-} // extern "C"
+// In overlay mode, we simply use the system errno.
+#include <errno.h>
+
+void LIBC_NAMESPACE::Errno::operator=(int a) { errno = a; }
+LIBC_NAMESPACE::Errno::operator int() { return errno; }
+
+#endif // LIBC_FULL_BUILD
 
-} // namespace LIBC_NAMESPACE
+// Define the global `libc_errno` instance.
+LIBC_NAMESPACE::Errno libc_errno;
diff --git a/libc/src/errno/libc_errno.h b/libc/src/errno/libc_errno.h
index fbcd1c3395cd1..632faeaaeff66 100644
--- a/libc/src/errno/libc_errno.h
+++ b/libc/src/errno/libc_errno.h
@@ -1,4 +1,4 @@
-//===-- Implementation header for errno -------------------------*- C++ -*-===//
+//===-- Implementation header for libc_errno --------------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -12,45 +12,35 @@
 #include "src/__support/macros/attributes.h"
 #include "src/__support/macros/properties/architectures.h"
 
+// TODO: https://github.com/llvm/llvm-project/issues/80172
+// Separate just the definition of errno numbers in
+// include/llvm-libc-macros/* and only include that instead of the system
+// <errno.h>.
 #include <errno.h>
 
-// If we are targeting the GPU we currently don't support 'errno'. We simply
-// consume it.
-#ifdef LIBC_TARGET_ARCH_IS_GPU
-namespace LIBC_NAMESPACE {
-struct ErrnoConsumer {
-  void operator=(int) {}
-};
-} // namespace LIBC_NAMESPACE
-#endif
-
-// All of the libc runtime and test code should use the "libc_errno" macro. They
-// should not refer to the "errno" macro directly.
-#ifdef LIBC_COPT_PUBLIC_PACKAGING
-#ifdef LIBC_TARGET_ARCH_IS_GPU
-extern "C" LIBC_NAMESPACE::ErrnoConsumer __llvmlibc_errno;
-#define libc_errno __llvmlibc_errno
-#else
-// This macro will resolve to errno from the errno.h file included above. Under
-// full build, this will be LLVM libc's errno. In overlay build, it will be
-// system libc's errno.
-#define libc_errno errno
-#endif
-#else
-namespace LIBC_NAMESPACE {
+// This header is to be consumed by internal implementations, all of which
+// should refer to `libc_errno` instead of using `errno` directly from the
+// <errno.h> header.
 
-// TODO: On the GPU build this will be mapped to a single global value. We need
-// to ensure that tests are not run with multiple threads that depend on errno
-// until we have true 'thread_local' support on the GPU.
-extern "C" LIBC_THREAD_LOCAL int __llvmlibc_internal_errno;
+// Unit and hermetic tests should:
+// - #include "src/errno/libc_errno.h"
+// - NOT #include <errno.h>
+// - Only use `libc_errno` in the code
+// - Depend on libc.src.errno.errno
 
-// TODO: After all of libc/src and libc/test are switched over to use
-// libc_errno, this header file will be "shipped" via an add_entrypoint_object
-// target. At which point libc_errno, should point to __llvmlibc_internal_errno
-// if LIBC_COPT_PUBLIC_PACKAGING is not defined.
-#define libc_errno LIBC_NAMESPACE::__llvmlibc_internal_errno
+// Integration tests should:
+// - NOT #include "src/errno/libc_errno.h"
+// - #include <errno.h>
+// - Use regular `errno` in the code
+// - Still depend on libc.src.errno.errno
 
+namespace LIBC_NAMESPACE {
+struct Errno {
+  void operator=(int);
+  operator int();
+};
 } // namespace LIBC_NAMESPACE
-#endif
+
+extern LIBC_NAMESPACE::Errno libc_errno;
 
 #endif // LLVM_LIBC_SRC_ERRNO_LIBC_ERRNO_H
diff --git a/libc/test/integration/startup/linux/tls_test.cpp b/libc/test/integration/startup/linux/tls_test.cpp
index 5f235a96006d6..cf2ff94931ba6 100644
--- a/libc/test/integration/startup/linux/tls_test.cpp
+++ b/libc/test/integration/startup/linux/tls_test.cpp
@@ -28,11 +28,11 @@ TEST_MAIN(int argc, char **argv, char **envp) {
   // set in errno. Since errno is implemented using a thread
   // local var, this helps us test setting of errno and
   // reading it back.
-  ASSERT_TRUE(libc_errno == 0);
+  ASSERT_ERRNO_SUCCESS();
   void *addr = LIBC_NAMESPACE::mmap(nullptr, 0, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
   ASSERT_TRUE(addr == MAP_FAILED);
-  ASSERT_TRUE(libc_errno == EINVAL);
+  ASSERT_ERRNO_EQ(EINVAL);
 
   return 0;
 }
diff --git a/libc/test/src/errno/errno_test.cpp b/libc/test/src/errno/errno_test.cpp
index 33185c2bcf6f5..876ebfc0ac269 100644
--- a/libc/test/src/errno/errno_test.cpp
+++ b/libc/test/src/errno/errno_test.cpp
@@ -12,5 +12,5 @@
 TEST(LlvmLibcErrnoTest, Basic) {
   int test_val = 123;
   libc_errno = test_val;
-  ASSERT_EQ(test_val, libc_errno);
+  ASSERT_ERRNO_EQ(test_val);
 }
diff --git a/libc/test/src/stdlib/StrtolTest.h b/libc/test/src/stdlib/StrtolTest.h
index 8f1723b038612..50ed4cca3950e 100644
--- a/libc/test/src/stdlib/StrtolTest.h
+++ b/libc/test/src/stdlib/StrtolTest.h
@@ -331,8 +331,7 @@ struct StrtoTest : public LIBC_NAMESPACE::testing::Test {
               ((is_signed_v<ReturnT> && sizeof(ReturnT) == 4)
                    ? T_MAX
                    : ReturnT(0xFFFFFFFF)));
-    ASSERT_EQ(libc_errno,
-              is_signed_v<ReturnT> && sizeof(ReturnT) == 4 ? ERANGE : 0);
+    ASSERT_ERRNO_EQ(is_signed_v<ReturnT> && sizeof(ReturnT) == 4 ? ERANGE : 0);
     EXPECT_EQ(str_end - max_32_bit_value, ptrdiff_t(10));
 
     const char *negative_max_32_bit_value = "-0xFFFFFFFF";
@@ -341,8 +340,7 @@ struct StrtoTest : public LIBC_NAMESPACE::testing::Test {
               ((is_signed_v<ReturnT> && sizeof(ReturnT) == 4)
                    ? T_MIN
                    : -ReturnT(0xFFFFFFFF)));
-    ASSERT_EQ(libc_errno,
-              is_signed_v<ReturnT> && sizeof(ReturnT) == 4 ? ERANGE : 0);
+    ASSERT_ERRNO_EQ(is_signed_v<ReturnT> && sizeof(ReturnT) == 4 ? ERANGE : 0);
     EXPECT_EQ(str_end - negative_max_32_bit_value, ptrdiff_t(11));
 
     // Max size for signed 32 bit numbers
@@ -368,8 +366,7 @@ struct StrtoTest : public LIBC_NAMESPACE::testing::Test {
               (is_signed_v<ReturnT> || sizeof(ReturnT) < 8
                    ? T_MAX
                    : ReturnT(0xFFFFFFFFFFFFFFFF)));
-    ASSERT_EQ(libc_errno,
-              (is_signed_v<ReturnT> || sizeof(ReturnT) < 8 ? ERANGE : 0));
+    ASSERT_ERRNO_EQ((is_signed_v<ReturnT> || sizeof(ReturnT) < 8 ? ERANGE : 0));
     EXPECT_EQ(str_end - max_64_bit_value, ptrdiff_t(18));
 
     // See the end of CleanBase10Decode for an explanation of how this large
@@ -381,8 +378,7 @@ struct StrtoTest : public LIBC_NAMESPACE::testing::Test {
         (is_signed_v<ReturnT>
              ? T_MIN
              : (sizeof(ReturnT) < 8 ? T_MAX : -ReturnT(0xFFFFFFFFFFFFFFFF))));
-    ASSERT_EQ(libc_errno,
-              (is_signed_v<ReturnT> || sizeof(ReturnT) < 8 ? ERANGE : 0));
+    ASSERT_ERRNO_EQ((is_signed_v<ReturnT> || sizeof(ReturnT) < 8 ? ERANGE : 0));
     EXPECT_EQ(str_end - negative_max_64_bit_value, ptrdiff_t(19));
 
     // Max size for signed 64 bit numbers
diff --git a/libc/test/src/sys/mman/linux/madvise_test.cpp b/libc/test/src/sys/mman/linux/madvise_test.cpp
index 83c73f5454de1..e45cf19f8913e 100644
--- a/libc/test/src/sys/mman/linux/madvise_test.cpp
+++ b/libc/test/src/sys/mman/linux/madvise_test.cpp
@@ -23,7 +23,7 @@ TEST(LlvmLibcMadviseTest, NoError) {
   libc_errno = 0;
   void *addr = LIBC_NAMESPACE::mmap(nullptr, alloc_size, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  EXPECT_EQ(0, libc_errno);
+  ASSERT_ERRNO_SUCCESS();
   EXPECT_NE(addr, MAP_FAILED);
 
   EXPECT_THAT(LIBC_NAMESPACE::madvise(addr, alloc_size, MADV_RANDOM),
diff --git a/libc/test/src/sys/mman/linux/mlock_test.cpp b/libc/test/src/sys/mman/linux/mlock_test.cpp
index a4e1682ff32bc..f1d1af1e76920 100644
--- a/libc/test/src/sys/mman/linux/mlock_test.cpp
+++ b/libc/test/src/sys/mman/linux/mlock_test.cpp
@@ -123,7 +123,7 @@ TEST(LlvmLibcMlockTest, InvalidFlag) {
   libc_errno = 0;
   void *addr = LIBC_NAMESPACE::mmap(nullptr, alloc_size, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  EXPECT_EQ(0, libc_errno);
+  ASSERT_ERRNO_SUCCESS();
   EXPECT_NE(addr, MAP_FAILED);
 
   // Invalid mlock2 flags.
diff --git a/libc/test/src/sys/mman/linux/mmap_test.cpp b/libc/test/src/sys/mman/linux/mmap_test.cpp
index 9b13b8bd8057f..b996f26db8605 100644
--- a/libc/test/src/sys/mman/linux/mmap_test.cpp
+++ b/libc/test/src/sys/mman/linux/mmap_test.cpp
@@ -22,7 +22,7 @@ TEST(LlvmLibcMMapTest, NoError) {
   libc_errno = 0;
   void *addr = LIBC_NAMESPACE::mmap(nullptr, alloc_size, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  EXPECT_EQ(0, libc_errno);
+  ASSERT_ERRNO_SUCCESS();
   EXPECT_NE(addr, MAP_FAILED);
 
   int *array = reinterpret_cast<int *>(addr);
diff --git a/libc/test/src/sys/mman/linux/mprotect_test.cpp b/libc/test/src/sys/mman/linux/mprotect_test.cpp
index 7127f77714d64..96f625984101d 100644
--- a/libc/test/src/sys/mman/linux/mprotect_test.cpp
+++ b/libc/test/src/sys/mman/linux/mprotect_test.cpp
@@ -24,7 +24,7 @@ TEST(LlvmLibcMProtectTest, NoError) {
   libc_errno = 0;
   void *addr = LIBC_NAMESPACE::mmap(nullptr, alloc_size, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  EXPECT_EQ(0, libc_errno);
+  ASSERT_ERRNO_SUCCESS();
   EXPECT_NE(addr, MAP_FAILED);
 
   int *array = reinterpret_cast<int *>(addr);
diff --git a/libc/test/src/sys/mman/linux/posix_madvise_test.cpp b/libc/test/src/sys/mman/linux/posix_madvise_test.cpp
index 59cf01ac74695..d20db69042b7a 100644
--- a/libc/test/src/sys/mman/linux/posix_madvise_test.cpp
+++ b/libc/test/src/sys/mman/linux/posix_madvise_test.cpp
@@ -23,7 +23,7 @@ TEST(LlvmLibcPosixMadviseTest, NoError) {
   libc_errno = 0;
   void *addr = LIBC_NAMESPACE::mmap(nullptr, alloc_size, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  EXPECT_EQ(0, libc_errno);
+  ASSERT_ERRNO_SUCCESS();
   EXPECT_NE(addr, MAP_FAILED);
 
   EXPECT_EQ(LIBC_NAMESPACE::posix_madvise(addr, alloc_size, POSIX_MADV_RANDOM),
diff --git a/libc/test/src/time/asctime_r_test.cpp b/libc/test/src/time/asctime_r_test.cpp
index 1abaa135350c1..f3aadbb39de4d 100644
--- a/libc/test/src/time/asctime_r_test.cpp
+++ b/libc/test/src/time/asctime_r_test.cpp
@@ -27,17 +27,17 @@ static inline char *call_asctime_r(struct tm *tm_data, int year, int month,
 TEST(LlvmLibcAsctimeR, Nullptr) {
   char *result;
   result = LIBC_NAMESPACE::asctime_r(nullptr, nullptr);
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
   ASSERT_STREQ(nullptr, result);
 
   char buffer[TimeConstants::ASCTIME_BUFFER_SIZE];
   result = LIBC_NAMESPACE::asctime_r(nullptr, buffer);
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
   ASSERT_STREQ(nullptr, result);
 
   struct tm tm_data;
   result = LIBC_NAMESPACE::asctime_r(&tm_data, nullptr);
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
   ASSERT_STREQ(nullptr, result);
 }
 
diff --git a/libc/test/src/time/asctime_test.cpp b/libc/test/src/time/asctime_test.cpp
index 4b5ceb596aa46..169a7463a3037 100644
--- a/libc/test/src/time/asctime_test.cpp
+++ b/libc/test/src/time/asctime_test.cpp
@@ -22,7 +22,7 @@ static inline char *call_asctime(struct tm *tm_data, int year, int month,
 TEST(LlvmLibcAsctime, Nullptr) {
   char *result;
   result = LIBC_NAMESPACE::asctime(nullptr);
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
   ASSERT_STREQ(nullptr, result);
 }
 
@@ -40,7 +40,7 @@ TEST(LlvmLibcAsctime, InvalidWday) {
                0,    // sec
                -1,   // wday
                0);   // yday
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
 
   // Test with wday = 7.
   call_asctime(&tm_data,
@@ -52,7 +52,7 @@ TEST(LlvmLibcAsctime, InvalidWday) {
                0,    // sec
                7,    // wday
                0);   // yday
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
 }
 
 // Months are from January to December. Test passing invalid value in month.
@@ -69,7 +69,7 @@ TEST(LlvmLibcAsctime, InvalidMonth) {
                0,    // sec
                4,    // wday
                0);   // yday
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
 
   // Test with month = 13.
   call_asctime(&tm_data,
@@ -81,7 +81,7 @@ TEST(LlvmLibcAsctime, InvalidMonth) {
                0,    // sec
                4,    // wday
                0);   // yday
-  ASSERT_EQ(EINVAL, libc_errno);
+  ASSERT_ERRNO_EQ(EINVAL);
 }
 
 TEST(LlvmLibcAsctime, ValidWeekdays) {
@@ -209,6 +209,6 @@ TEST(LlvmLibcAsctime, Max64BitYear) {
                         50,         // sec
                         2,          // wday
                         50);        // yday
-  ASSERT_EQ(EOVERFLOW, libc_errno);
+  ASSERT_ERRNO_EQ(EOVERFLOW);
   ASSERT_STREQ(nullptr, result);
 }

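The `Errno` proxy introduced by the libc_errno.h/libc_errno.cpp changes above can be illustrated in isolation. The following is only a minimal sketch of the pattern, not the libc implementation: the namespace `sketch`, the variable `internal_errno`, the instance `sketch_errno`, and the helper `checked_divide` are hypothetical names chosen for this example, and only the unit-test style backing (a thread-local int) is modelled.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for LIBC_NAMESPACE; not the real libc namespace macro.
namespace sketch {

// Backing storage, loosely analogous to __llvmlibc_internal_errno in the
// unit-test configuration of the patch.
thread_local int internal_errno = 0;

// Proxy type: assignment writes the backing variable, and conversion to int
// reads it back. Callers never touch the storage directly.
struct Errno {
  void operator=(int value) { internal_errno = value; }
  operator int() { return internal_errno; }
};

} // namespace sketch

// Single global instance, mirroring the role of `libc_errno` in the patch.
sketch::Errno sketch_errno;

// Hypothetical helper: reports failure through the proxy instead of `errno`.
int checked_divide(int a, int b, int *out) {
  if (b == 0) {
    sketch_errno = EDOM; // Goes through Errno::operator=(int).
    return -1;
  }
  *out = a / b;
  return 0;
}
```

Because every read and write goes through the two member functions, the choice of backing storage (GPU global, internal thread-local, public `__llvmlibc_errno`, or the system `errno`) can be made in a single translation unit, which appears to be what the `#ifdef` chain in libc_errno.cpp does.
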
>From e07d6b1f05f02cbf8630953426f4f31b33c84099 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Thu, 1 Feb 2024 12:21:19 -0500
Subject: [PATCH 39/42] [AMDGPU] Mark PC_ADD_REL_OFFSET rematerializable
 (#79674)

Currently, machine LICM hoists PC_ADD_REL_OFFSET out of loops, which
increases register pressure when function calls are deep in loops. This
is a main cause of SGPR spills in programs containing a large number of
function calls in loops.

This patch marks PC_ADD_REL_OFFSET as rematerializable, which eliminates
SGPR spills due to function calls in loops.
---
 llvm/lib/Target/AMDGPU/SIInstructions.td      |   1 +
 ...ne-sink-temporal-divergence-swdev407790.ll | 228 ++++++++++--------
 .../AMDGPU/tuple-allocation-failure.ll        |  48 ++--
 .../AMDGPU/unstructured-cfg-def-use-issue.ll  |  34 ++-
 4 files changed, 168 insertions(+), 143 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 788e3162fb37e..b593b7dbfe082 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -1036,6 +1036,7 @@ let isConvergent = 1 in {
   defm SI_SPILL_WWM_AV32 : SI_SPILL_VGPR <AV_32, 1>;
 }
 
+let isReMaterializable = 1, isAsCheapAsAMove = 1 in
 def SI_PC_ADD_REL_OFFSET : SPseudoInstSI <
   (outs SReg_64:$dst),
   (ins si_ga:$ptr_lo, si_ga:$ptr_hi),
diff --git a/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll b/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
index d9f6ce0b4c851..138a6a86cee98 100644
--- a/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
+++ b/llvm/test/CodeGen/AMDGPU/machine-sink-temporal-divergence-swdev407790.ll
@@ -87,10 +87,10 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    ds_write_b32 v45, v45 offset:15360
-; CHECK-NEXT:    s_getpc_b64 s[52:53]
-; CHECK-NEXT:    s_add_u32 s52, s52, _Z7barrierj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s53, s53, _Z7barrierj@rel32@hi+12
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[52:53]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z7barrierj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z7barrierj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshrrev_b32_e32 v0, 1, v43
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v1, 2, v43
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v40
@@ -111,7 +111,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_mov_b32_e32 v1, 12
 ; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_mov_b32_e32 v42, v0
-; CHECK-NEXT:    s_mov_b32 s48, exec_lo
+; CHECK-NEXT:    s_mov_b32 s42, exec_lo
 ; CHECK-NEXT:    v_cmpx_ne_u32_e32 0, v42
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_25
 ; CHECK-NEXT:  ; %bb.1: ; %.preheader5
@@ -130,7 +130,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s4
 ; CHECK-NEXT:    v_add_nc_u32_e32 v45, -1, v42
-; CHECK-NEXT:    s_mov_b32 s49, 0
+; CHECK-NEXT:    s_mov_b32 s43, 0
 ; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v45
 ; CHECK-NEXT:    s_and_b32 exec_lo, exec_lo, vcc_lo
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_25
@@ -138,47 +138,44 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v43, 10, v43
 ; CHECK-NEXT:    v_add_nc_u32_e32 v46, 0x3c05, v0
 ; CHECK-NEXT:    v_mov_b32_e32 v47, 0
-; CHECK-NEXT:    s_getpc_b64 s[42:43]
-; CHECK-NEXT:    s_add_u32 s42, s42, _Z10atomic_incPU3AS3Vj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s43, s43, _Z10atomic_incPU3AS3Vj@rel32@hi+12
-; CHECK-NEXT:    s_mov_b32 s55, 0
+; CHECK-NEXT:    s_mov_b32 s49, 0
 ; CHECK-NEXT:  .LBB0_5: ; =>This Loop Header: Depth=1
 ; CHECK-NEXT:    ; Child Loop BB0_8 Depth 2
 ; CHECK-NEXT:    ; Child Loop BB0_20 Depth 2
-; CHECK-NEXT:    v_add_nc_u32_e32 v0, s55, v44
-; CHECK-NEXT:    s_lshl_b32 s4, s55, 5
-; CHECK-NEXT:    s_add_i32 s54, s55, 1
-; CHECK-NEXT:    s_add_i32 s5, s55, 5
-; CHECK-NEXT:    v_or3_b32 v57, s4, v43, s54
+; CHECK-NEXT:    v_add_nc_u32_e32 v0, s49, v44
+; CHECK-NEXT:    s_lshl_b32 s4, s49, 5
+; CHECK-NEXT:    s_add_i32 s48, s49, 1
+; CHECK-NEXT:    s_add_i32 s5, s49, 5
+; CHECK-NEXT:    v_or3_b32 v57, s4, v43, s48
 ; CHECK-NEXT:    ds_read_u8 v0, v0
-; CHECK-NEXT:    v_mov_b32_e32 v58, s54
-; CHECK-NEXT:    s_mov_b32 s56, exec_lo
+; CHECK-NEXT:    v_mov_b32_e32 v58, s48
+; CHECK-NEXT:    s_mov_b32 s52, exec_lo
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_and_b32_e32 v56, 0xff, v0
 ; CHECK-NEXT:    v_cmpx_lt_u32_e64 s5, v42
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_17
 ; CHECK-NEXT:  ; %bb.6: ; %.preheader2
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    s_mov_b32 s57, 0
-; CHECK-NEXT:    s_mov_b32 s58, 0
+; CHECK-NEXT:    s_mov_b32 s53, 0
+; CHECK-NEXT:    s_mov_b32 s54, 0
 ; CHECK-NEXT:    s_branch .LBB0_8
 ; CHECK-NEXT:  .LBB0_7: ; in Loop: Header=BB0_8 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s59
-; CHECK-NEXT:    s_add_i32 s58, s58, 4
-; CHECK-NEXT:    s_add_i32 s4, s55, s58
-; CHECK-NEXT:    v_add_nc_u32_e32 v0, s58, v57
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
+; CHECK-NEXT:    s_add_i32 s54, s54, 4
+; CHECK-NEXT:    s_add_i32 s4, s49, s54
+; CHECK-NEXT:    v_add_nc_u32_e32 v0, s54, v57
 ; CHECK-NEXT:    s_add_i32 s5, s4, 5
 ; CHECK-NEXT:    s_add_i32 s4, s4, 1
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, s5, v42
 ; CHECK-NEXT:    v_mov_b32_e32 v58, s4
-; CHECK-NEXT:    s_or_b32 s57, vcc_lo, s57
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s57
+; CHECK-NEXT:    s_or_b32 s53, vcc_lo, s53
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s53
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_16
 ; CHECK-NEXT:  .LBB0_8: ; Parent Loop BB0_5 Depth=1
 ; CHECK-NEXT:    ; => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    v_add_nc_u32_e32 v59, s58, v46
-; CHECK-NEXT:    v_add_nc_u32_e32 v58, s58, v57
-; CHECK-NEXT:    s_mov_b32 s59, exec_lo
+; CHECK-NEXT:    v_add_nc_u32_e32 v59, s54, v46
+; CHECK-NEXT:    v_add_nc_u32_e32 v58, s54, v57
+; CHECK-NEXT:    s_mov_b32 s55, exec_lo
 ; CHECK-NEXT:    ds_read_u8 v0, v59
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmpx_eq_u16_e64 v56, v0
@@ -194,13 +191,16 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v58
 ; CHECK-NEXT:  .LBB0_10: ; in Loop: Header=BB0_8 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s59
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
 ; CHECK-NEXT:    ds_read_u8 v0, v59 offset:1
-; CHECK-NEXT:    s_mov_b32 s59, exec_lo
+; CHECK-NEXT:    s_mov_b32 s55, exec_lo
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmpx_eq_u16_e64 v56, v0
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_12
@@ -216,13 +216,16 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v60, 1, v58
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v60
 ; CHECK-NEXT:  .LBB0_12: ; in Loop: Header=BB0_8 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s59
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
 ; CHECK-NEXT:    ds_read_u8 v0, v59 offset:2
-; CHECK-NEXT:    s_mov_b32 s59, exec_lo
+; CHECK-NEXT:    s_mov_b32 s55, exec_lo
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmpx_eq_u16_e64 v56, v0
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_14
@@ -238,13 +241,16 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v60, 2, v58
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v60
 ; CHECK-NEXT:  .LBB0_14: ; in Loop: Header=BB0_8 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s59
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
 ; CHECK-NEXT:    ds_read_u8 v0, v59 offset:3
-; CHECK-NEXT:    s_mov_b32 s59, exec_lo
+; CHECK-NEXT:    s_mov_b32 s55, exec_lo
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmpx_eq_u16_e64 v56, v0
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_7
@@ -260,38 +266,41 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v58, 3, v58
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v58
 ; CHECK-NEXT:    s_branch .LBB0_7
 ; CHECK-NEXT:  .LBB0_16: ; %Flow43
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s57
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s53
 ; CHECK-NEXT:    v_mov_b32_e32 v57, v0
 ; CHECK-NEXT:  .LBB0_17: ; %Flow44
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s56
-; CHECK-NEXT:    s_mov_b32 s55, exec_lo
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s52
+; CHECK-NEXT:    s_mov_b32 s49, exec_lo
 ; CHECK-NEXT:    v_cmpx_lt_u32_e64 v58, v42
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_23
 ; CHECK-NEXT:  ; %bb.18: ; %.preheader
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    s_mov_b32 s56, 0
+; CHECK-NEXT:    s_mov_b32 s52, 0
 ; CHECK-NEXT:    s_inst_prefetch 0x1
 ; CHECK-NEXT:    s_branch .LBB0_20
 ; CHECK-NEXT:    .p2align 6
 ; CHECK-NEXT:  .LBB0_19: ; in Loop: Header=BB0_20 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s57
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s53
 ; CHECK-NEXT:    v_add_nc_u32_e32 v58, 1, v58
 ; CHECK-NEXT:    v_add_nc_u32_e32 v57, 1, v57
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, v58, v42
-; CHECK-NEXT:    s_or_b32 s56, vcc_lo, s56
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s56
+; CHECK-NEXT:    s_or_b32 s52, vcc_lo, s52
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s52
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_22
 ; CHECK-NEXT:  .LBB0_20: ; Parent Loop BB0_5 Depth=1
 ; CHECK-NEXT:    ; => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    v_add_nc_u32_e32 v0, v44, v58
-; CHECK-NEXT:    s_mov_b32 s57, exec_lo
+; CHECK-NEXT:    s_mov_b32 s53, exec_lo
 ; CHECK-NEXT:    ds_read_u8 v0, v0
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmpx_eq_u16_e64 v56, v0
@@ -307,29 +316,32 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v57
 ; CHECK-NEXT:    s_branch .LBB0_19
 ; CHECK-NEXT:  .LBB0_22: ; %Flow41
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
 ; CHECK-NEXT:    s_inst_prefetch 0x2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s56
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s52
 ; CHECK-NEXT:  .LBB0_23: ; %Flow42
 ; CHECK-NEXT:    ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s49
 ; CHECK-NEXT:  ; %bb.24: ; in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, s54, v45
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, s48, v45
 ; CHECK-NEXT:    v_cmp_lt_u32_e64 s4, 59, v47
 ; CHECK-NEXT:    v_add_nc_u32_e32 v46, 1, v46
-; CHECK-NEXT:    s_mov_b32 s55, s54
+; CHECK-NEXT:    s_mov_b32 s49, s48
 ; CHECK-NEXT:    s_or_b32 s4, vcc_lo, s4
 ; CHECK-NEXT:    s_and_b32 s4, exec_lo, s4
-; CHECK-NEXT:    s_or_b32 s49, s4, s49
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s49
+; CHECK-NEXT:    s_or_b32 s43, s4, s43
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s43
 ; CHECK-NEXT:    s_cbranch_execnz .LBB0_5
 ; CHECK-NEXT:  .LBB0_25: ; %Flow49
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s48
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s42
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v40
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 1
 ; CHECK-NEXT:    s_add_u32 s8, s34, 40
@@ -339,7 +351,10 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[52:53]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z7barrierj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z7barrierj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-NEXT:    s_mov_b32 s4, exec_lo
 ; CHECK-NEXT:    ds_read_b32 v47, v0 offset:15360
@@ -347,21 +362,12 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_cmpx_gt_u32_e64 v47, v41
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_33
 ; CHECK-NEXT:  ; %bb.26:
-; CHECK-NEXT:    s_add_u32 s52, s44, 8
-; CHECK-NEXT:    s_addc_u32 s53, s45, 0
-; CHECK-NEXT:    s_getpc_b64 s[42:43]
-; CHECK-NEXT:    s_add_u32 s42, s42, _Z10atomic_addPU3AS1Vjj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s43, s43, _Z10atomic_addPU3AS1Vjj@rel32@hi+12
-; CHECK-NEXT:    s_mov_b32 s54, 0
-; CHECK-NEXT:    s_getpc_b64 s[44:45]
-; CHECK-NEXT:    s_add_u32 s44, s44, _Z10atomic_subPU3AS1Vjj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s45, s45, _Z10atomic_subPU3AS1Vjj@rel32@hi+12
-; CHECK-NEXT:    s_getpc_b64 s[48:49]
-; CHECK-NEXT:    s_add_u32 s48, s48, _Z14get_local_sizej@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s49, s49, _Z14get_local_sizej@rel32@hi+12
+; CHECK-NEXT:    s_add_u32 s42, s44, 8
+; CHECK-NEXT:    s_addc_u32 s43, s45, 0
+; CHECK-NEXT:    s_mov_b32 s44, 0
 ; CHECK-NEXT:    s_branch .LBB0_28
 ; CHECK-NEXT:  .LBB0_27: ; in Loop: Header=BB0_28 Depth=1
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s55
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s45
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v40
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-NEXT:    s_add_u32 s8, s34, 40
@@ -371,15 +377,18 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[48:49]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z14get_local_sizej@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z14get_local_sizej@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_add_co_u32 v41, vcc_lo, v0, v41
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc_lo, v47, v41
-; CHECK-NEXT:    s_or_b32 s54, vcc_lo, s54
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s54
+; CHECK-NEXT:    s_or_b32 s44, vcc_lo, s44
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s44
 ; CHECK-NEXT:    s_cbranch_execz .LBB0_33
 ; CHECK-NEXT:  .LBB0_28: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v41
-; CHECK-NEXT:    s_mov_b32 s55, exec_lo
+; CHECK-NEXT:    s_mov_b32 s45, exec_lo
 ; CHECK-NEXT:    ds_read_b32 v0, v0
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_lshrrev_b32_e32 v63, 10, v0
@@ -388,8 +397,8 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_mul_u32_u24_e32 v1, 0x180, v63
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 5, v62
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v4, 5, v72
-; CHECK-NEXT:    v_add_co_u32 v2, s4, s52, v1
-; CHECK-NEXT:    v_add_co_ci_u32_e64 v3, null, s53, 0, s4
+; CHECK-NEXT:    v_add_co_u32 v2, s4, s42, v1
+; CHECK-NEXT:    v_add_co_ci_u32_e64 v3, null, s43, 0, s4
 ; CHECK-NEXT:    v_add_co_u32 v0, vcc_lo, v2, v0
 ; CHECK-NEXT:    v_add_co_ci_u32_e32 v1, vcc_lo, 0, v3, vcc_lo
 ; CHECK-NEXT:    v_add_co_u32 v2, vcc_lo, v2, v4
@@ -425,6 +434,9 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_addPU3AS1Vjj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_addPU3AS1Vjj@rel32@hi+12
 ; CHECK-NEXT:    v_or3_b32 v73, v2, v0, v1
 ; CHECK-NEXT:    v_lshrrev_b32_e32 v0, 1, v73
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v1, 2, v73
@@ -437,7 +449,7 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    v_mov_b32_e32 v0, v42
 ; CHECK-NEXT:    s_mov_b64 s[4:5], s[38:39]
 ; CHECK-NEXT:    v_mov_b32_e32 v1, v43
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_bfe_u32 v0, v0, v74, 4
 ; CHECK-NEXT:    s_mov_b32 s4, exec_lo
 ; CHECK-NEXT:    v_cmpx_gt_u32_e32 12, v0
@@ -482,7 +494,10 @@ define protected amdgpu_kernel void @kernel_round1(ptr addrspace(1) nocapture no
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[44:45]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_subPU3AS1Vjj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_subPU3AS1Vjj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    s_branch .LBB0_27
 ; CHECK-NEXT:  .LBB0_33:
 ; CHECK-NEXT:    s_endpgm
@@ -765,7 +780,7 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_addc_u32 s11, s11, 0
 ; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s10
 ; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s11
-; CHECK-NEXT:    s_load_dwordx2 s[46:47], s[6:7], 0x10
+; CHECK-NEXT:    s_load_dwordx2 s[44:45], s[6:7], 0x10
 ; CHECK-NEXT:    s_add_u32 s0, s0, s15
 ; CHECK-NEXT:    s_mov_b64 s[36:37], s[6:7]
 ; CHECK-NEXT:    s_addc_u32 s1, s1, 0
@@ -809,11 +824,11 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    ds_write_b32 v43, v43 offset:15360
-; CHECK-NEXT:    s_getpc_b64 s[44:45]
-; CHECK-NEXT:    s_add_u32 s44, s44, _Z7barrierj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s45, s45, _Z7barrierj@rel32@hi+12
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z7barrierj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z7barrierj@rel32@hi+12
 ; CHECK-NEXT:    v_add_nc_u32_e32 v44, 0x3c04, v46
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[44:45]
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshrrev_b32_e32 v0, 1, v42
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v1, 2, v42
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v40
@@ -824,7 +839,7 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_mov_b64 s[10:11], s[34:35]
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
-; CHECK-NEXT:    global_load_dword v0, v0, s[46:47]
+; CHECK-NEXT:    global_load_dword v0, v0, s[44:45]
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    s_getpc_b64 s[6:7]
 ; CHECK-NEXT:    s_add_u32 s6, s6, _Z3minjj@rel32@lo+4
@@ -835,25 +850,22 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_mov_b32_e32 v41, v0
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v42, 10, v42
-; CHECK-NEXT:    s_getpc_b64 s[42:43]
-; CHECK-NEXT:    s_add_u32 s42, s42, _Z10atomic_incPU3AS3Vj@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s43, s43, _Z10atomic_incPU3AS3Vj@rel32@hi+12
-; CHECK-NEXT:    s_mov_b32 s46, 0
+; CHECK-NEXT:    s_mov_b32 s42, 0
 ; CHECK-NEXT:    s_mov_b32 s4, 0
-; CHECK-NEXT:    v_add_nc_u32_e32 v45, -1, v41
 ; CHECK-NEXT:    ds_write_b8 v46, v43 offset:15364
+; CHECK-NEXT:    v_add_nc_u32_e32 v45, -1, v41
 ; CHECK-NEXT:  .LBB1_1: ; %.37
 ; CHECK-NEXT:    ; =>This Loop Header: Depth=1
 ; CHECK-NEXT:    ; Child Loop BB1_3 Depth 2
 ; CHECK-NEXT:    ; Child Loop BB1_8 Depth 2
 ; CHECK-NEXT:    v_add_nc_u32_e32 v0, s4, v44
 ; CHECK-NEXT:    s_lshl_b32 s5, s4, 5
-; CHECK-NEXT:    s_add_i32 s47, s4, 1
+; CHECK-NEXT:    s_add_i32 s43, s4, 1
 ; CHECK-NEXT:    s_add_i32 s6, s4, 5
-; CHECK-NEXT:    v_or3_b32 v47, s5, v42, s47
+; CHECK-NEXT:    v_or3_b32 v47, s5, v42, s43
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    ds_read_u8 v46, v0
-; CHECK-NEXT:    v_mov_b32_e32 v56, s47
+; CHECK-NEXT:    v_mov_b32_e32 v56, s43
 ; CHECK-NEXT:    s_mov_b32 s5, exec_lo
 ; CHECK-NEXT:    v_cmpx_lt_u32_e64 s6, v41
 ; CHECK-NEXT:    s_cbranch_execz .LBB1_5
@@ -882,23 +894,23 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:  .LBB1_5: ; %Flow4
 ; CHECK-NEXT:    ; in Loop: Header=BB1_1 Depth=1
 ; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s5
-; CHECK-NEXT:    s_mov_b32 s48, exec_lo
+; CHECK-NEXT:    s_mov_b32 s44, exec_lo
 ; CHECK-NEXT:    v_cmpx_lt_u32_e64 v56, v41
 ; CHECK-NEXT:    s_cbranch_execz .LBB1_11
 ; CHECK-NEXT:  ; %bb.6: ; %.103.preheader
 ; CHECK-NEXT:    ; in Loop: Header=BB1_1 Depth=1
-; CHECK-NEXT:    s_mov_b32 s49, 0
+; CHECK-NEXT:    s_mov_b32 s45, 0
 ; CHECK-NEXT:    s_inst_prefetch 0x1
 ; CHECK-NEXT:    s_branch .LBB1_8
 ; CHECK-NEXT:    .p2align 6
 ; CHECK-NEXT:  .LBB1_7: ; %.114
 ; CHECK-NEXT:    ; in Loop: Header=BB1_8 Depth=2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s50
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s46
 ; CHECK-NEXT:    v_add_nc_u32_e32 v56, 1, v56
 ; CHECK-NEXT:    v_add_nc_u32_e32 v47, 1, v47
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, v56, v41
-; CHECK-NEXT:    s_or_b32 s49, vcc_lo, s49
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s49
+; CHECK-NEXT:    s_or_b32 s45, vcc_lo, s45
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s45
 ; CHECK-NEXT:    s_cbranch_execz .LBB1_10
 ; CHECK-NEXT:  .LBB1_8: ; %.103
 ; CHECK-NEXT:    ; Parent Loop BB1_1 Depth=1
@@ -907,7 +919,7 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    ds_read_u8 v0, v0
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    v_cmp_eq_u16_sdwa s4, v46, v0 src0_sel:BYTE_0 src1_sel:DWORD
-; CHECK-NEXT:    s_and_saveexec_b32 s50, s4
+; CHECK-NEXT:    s_and_saveexec_b32 s46, s4
 ; CHECK-NEXT:    s_cbranch_execz .LBB1_7
 ; CHECK-NEXT:  ; %bb.9: ; %.110
 ; CHECK-NEXT:    ; in Loop: Header=BB1_8 Depth=2
@@ -921,29 +933,32 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
 ; CHECK-NEXT:    v_add_nc_u32_e32 v43, 1, v43
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[42:43]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z10atomic_incPU3AS3Vj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z10atomic_incPU3AS3Vj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; CHECK-NEXT:    ds_write_b32 v0, v47
 ; CHECK-NEXT:    s_branch .LBB1_7
 ; CHECK-NEXT:  .LBB1_10: ; %Flow
 ; CHECK-NEXT:    ; in Loop: Header=BB1_1 Depth=1
 ; CHECK-NEXT:    s_inst_prefetch 0x2
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s49
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s45
 ; CHECK-NEXT:  .LBB1_11: ; %Flow2
 ; CHECK-NEXT:    ; in Loop: Header=BB1_1 Depth=1
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s48
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s44
 ; CHECK-NEXT:  ; %bb.12: ; %.32
 ; CHECK-NEXT:    ; in Loop: Header=BB1_1 Depth=1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, s47, v45
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc_lo, s43, v45
 ; CHECK-NEXT:    v_cmp_lt_u32_e64 s4, 59, v43
 ; CHECK-NEXT:    s_or_b32 s4, vcc_lo, s4
 ; CHECK-NEXT:    s_and_b32 s4, exec_lo, s4
-; CHECK-NEXT:    s_or_b32 s46, s4, s46
-; CHECK-NEXT:    s_mov_b32 s4, s47
-; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s46
+; CHECK-NEXT:    s_or_b32 s42, s4, s42
+; CHECK-NEXT:    s_mov_b32 s4, s43
+; CHECK-NEXT:    s_andn2_b32 exec_lo, exec_lo, s42
 ; CHECK-NEXT:    s_cbranch_execnz .LBB1_1
 ; CHECK-NEXT:  ; %bb.13: ; %.119
-; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s46
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s42
 ; CHECK-NEXT:    v_mov_b32_e32 v31, v40
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 1
 ; CHECK-NEXT:    s_add_u32 s8, s36, 40
@@ -953,7 +968,10 @@ define protected amdgpu_kernel void @kernel_round1_short(ptr addrspace(1) nocapt
 ; CHECK-NEXT:    s_mov_b32 s12, s41
 ; CHECK-NEXT:    s_mov_b32 s13, s40
 ; CHECK-NEXT:    s_mov_b32 s14, s33
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[44:45]
+; CHECK-NEXT:    s_getpc_b64 s[6:7]
+; CHECK-NEXT:    s_add_u32 s6, s6, _Z7barrierj@rel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s7, s7, _Z7barrierj@rel32@hi+12
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; CHECK-NEXT:    s_endpgm
 .5:
   %.6 = tail call i64 @_Z13get_global_idj(i32 noundef 0) #4
diff --git a/llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll b/llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
index dd8ff64a4eec2..1118cc3b16463 100644
--- a/llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
+++ b/llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=1 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS1 %s
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=0 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS0 %s
 
@@ -68,13 +68,9 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    v_cmp_ne_u32_e64 s[46:47], 1, v0
 ; GLOBALNESS1-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
 ; GLOBALNESS1-NEXT:    s_cselect_b64 s[4:5], -1, 0
-; GLOBALNESS1-NEXT:    s_getpc_b64 s[6:7]
-; GLOBALNESS1-NEXT:    s_add_u32 s6, s6, wobble@gotpcrel32@lo+4
-; GLOBALNESS1-NEXT:    s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12
 ; GLOBALNESS1-NEXT:    s_xor_b64 s[4:5], s[4:5], -1
 ; GLOBALNESS1-NEXT:    v_cmp_ne_u32_e64 s[48:49], 1, v0
 ; GLOBALNESS1-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
-; GLOBALNESS1-NEXT:    s_load_dwordx2 s[72:73], s[6:7], 0x0
 ; GLOBALNESS1-NEXT:    v_cmp_ne_u32_e64 s[50:51], 1, v0
 ; GLOBALNESS1-NEXT:    v_cmp_ne_u32_e64 s[42:43], 1, v1
 ; GLOBALNESS1-NEXT:    v_cmp_ne_u32_e64 s[44:45], 1, v3
@@ -122,6 +118,10 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    buffer_store_dword v42, off, s[0:3], 0
 ; GLOBALNESS1-NEXT:    flat_load_dword v46, v[0:1]
 ; GLOBALNESS1-NEXT:    s_addc_u32 s9, s39, 0
+; GLOBALNESS1-NEXT:    s_getpc_b64 s[4:5]
+; GLOBALNESS1-NEXT:    s_add_u32 s4, s4, wobble@gotpcrel32@lo+4
+; GLOBALNESS1-NEXT:    s_addc_u32 s5, s5, wobble@gotpcrel32@hi+12
+; GLOBALNESS1-NEXT:    s_load_dwordx2 s[6:7], s[4:5], 0x0
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[10:11], s[34:35]
 ; GLOBALNESS1-NEXT:    s_mov_b32 s12, s70
@@ -129,7 +129,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    s_mov_b32 s14, s68
 ; GLOBALNESS1-NEXT:    v_mov_b32_e32 v31, v41
 ; GLOBALNESS1-NEXT:    s_waitcnt lgkmcnt(0)
-; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; GLOBALNESS1-NEXT:    s_and_b64 vcc, exec, s[44:45]
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[6:7], -1
 ; GLOBALNESS1-NEXT:    ; implicit-def: $sgpr4_sgpr5
@@ -165,7 +165,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    v_cmp_gt_i32_e64 s[60:61], 0, v0
 ; GLOBALNESS1-NEXT:    v_mov_b32_e32 v0, 0
 ; GLOBALNESS1-NEXT:    v_mov_b32_e32 v1, 0x3ff00000
-; GLOBALNESS1-NEXT:    s_and_saveexec_b64 s[76:77], s[60:61]
+; GLOBALNESS1-NEXT:    s_and_saveexec_b64 s[72:73], s[60:61]
 ; GLOBALNESS1-NEXT:    s_cbranch_execz .LBB1_26
 ; GLOBALNESS1-NEXT:  ; %bb.11: ; %bb33.i
 ; GLOBALNESS1-NEXT:    ; in Loop: Header=BB1_4 Depth=1
@@ -222,6 +222,10 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    ; in Loop: Header=BB1_16 Depth=2
 ; GLOBALNESS1-NEXT:    s_add_u32 s66, s38, 40
 ; GLOBALNESS1-NEXT:    s_addc_u32 s67, s39, 0
+; GLOBALNESS1-NEXT:    s_getpc_b64 s[4:5]
+; GLOBALNESS1-NEXT:    s_add_u32 s4, s4, wobble@gotpcrel32@lo+4
+; GLOBALNESS1-NEXT:    s_addc_u32 s5, s5, wobble@gotpcrel32@hi+12
+; GLOBALNESS1-NEXT:    s_load_dwordx2 s[76:77], s[4:5], 0x0
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[8:9], s[66:67]
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[10:11], s[34:35]
@@ -229,7 +233,8 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    s_mov_b32 s13, s69
 ; GLOBALNESS1-NEXT:    s_mov_b32 s14, s68
 ; GLOBALNESS1-NEXT:    v_mov_b32_e32 v31, v41
-; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS1-NEXT:    s_waitcnt lgkmcnt(0)
+; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[76:77]
 ; GLOBALNESS1-NEXT:    v_pk_mov_b32 v[46:47], 0, 0
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS1-NEXT:    s_mov_b64 s[8:9], s[66:67]
@@ -239,7 +244,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    s_mov_b32 s14, s68
 ; GLOBALNESS1-NEXT:    v_mov_b32_e32 v31, v41
 ; GLOBALNESS1-NEXT:    global_store_dwordx2 v[46:47], v[44:45], off
-; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS1-NEXT:    s_swappc_b64 s[30:31], s[76:77]
 ; GLOBALNESS1-NEXT:    s_and_saveexec_b64 s[4:5], s[62:63]
 ; GLOBALNESS1-NEXT:    s_cbranch_execz .LBB1_14
 ; GLOBALNESS1-NEXT:  ; %bb.23: ; %bb62.i
@@ -256,7 +261,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS1-NEXT:    v_pk_mov_b32 v[0:1], 0, 0
 ; GLOBALNESS1-NEXT:  .LBB1_26: ; %Flow24
 ; GLOBALNESS1-NEXT:    ; in Loop: Header=BB1_4 Depth=1
-; GLOBALNESS1-NEXT:    s_or_b64 exec, exec, s[76:77]
+; GLOBALNESS1-NEXT:    s_or_b64 exec, exec, s[72:73]
 ; GLOBALNESS1-NEXT:    s_and_saveexec_b64 s[4:5], s[60:61]
 ; GLOBALNESS1-NEXT:    s_cbranch_execz .LBB1_2
 ; GLOBALNESS1-NEXT:  ; %bb.27: ; %bb67.i
@@ -350,13 +355,9 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    v_cmp_ne_u32_e64 s[46:47], 1, v0
 ; GLOBALNESS0-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
 ; GLOBALNESS0-NEXT:    s_cselect_b64 s[4:5], -1, 0
-; GLOBALNESS0-NEXT:    s_getpc_b64 s[6:7]
-; GLOBALNESS0-NEXT:    s_add_u32 s6, s6, wobble@gotpcrel32@lo+4
-; GLOBALNESS0-NEXT:    s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12
 ; GLOBALNESS0-NEXT:    s_xor_b64 s[4:5], s[4:5], -1
 ; GLOBALNESS0-NEXT:    v_cmp_ne_u32_e64 s[48:49], 1, v0
 ; GLOBALNESS0-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
-; GLOBALNESS0-NEXT:    s_load_dwordx2 s[72:73], s[6:7], 0x0
 ; GLOBALNESS0-NEXT:    v_cmp_ne_u32_e64 s[50:51], 1, v0
 ; GLOBALNESS0-NEXT:    v_cmp_ne_u32_e64 s[42:43], 1, v1
 ; GLOBALNESS0-NEXT:    v_cmp_ne_u32_e64 s[44:45], 1, v3
@@ -404,6 +405,10 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    buffer_store_dword v42, off, s[0:3], 0
 ; GLOBALNESS0-NEXT:    flat_load_dword v46, v[0:1]
 ; GLOBALNESS0-NEXT:    s_addc_u32 s9, s39, 0
+; GLOBALNESS0-NEXT:    s_getpc_b64 s[4:5]
+; GLOBALNESS0-NEXT:    s_add_u32 s4, s4, wobble@gotpcrel32@lo+4
+; GLOBALNESS0-NEXT:    s_addc_u32 s5, s5, wobble@gotpcrel32@hi+12
+; GLOBALNESS0-NEXT:    s_load_dwordx2 s[6:7], s[4:5], 0x0
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[10:11], s[34:35]
 ; GLOBALNESS0-NEXT:    s_mov_b32 s12, s68
@@ -411,7 +416,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    s_mov_b32 s14, s66
 ; GLOBALNESS0-NEXT:    v_mov_b32_e32 v31, v41
 ; GLOBALNESS0-NEXT:    s_waitcnt lgkmcnt(0)
-; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[6:7]
 ; GLOBALNESS0-NEXT:    s_and_b64 vcc, exec, s[44:45]
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[6:7], -1
 ; GLOBALNESS0-NEXT:    ; implicit-def: $sgpr4_sgpr5
@@ -447,7 +452,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    v_cmp_gt_i32_e64 s[60:61], 0, v0
 ; GLOBALNESS0-NEXT:    v_mov_b32_e32 v0, 0
 ; GLOBALNESS0-NEXT:    v_mov_b32_e32 v1, 0x3ff00000
-; GLOBALNESS0-NEXT:    s_and_saveexec_b64 s[76:77], s[60:61]
+; GLOBALNESS0-NEXT:    s_and_saveexec_b64 s[72:73], s[60:61]
 ; GLOBALNESS0-NEXT:    s_cbranch_execz .LBB1_26
 ; GLOBALNESS0-NEXT:  ; %bb.11: ; %bb33.i
 ; GLOBALNESS0-NEXT:    ; in Loop: Header=BB1_4 Depth=1
@@ -504,6 +509,10 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    ; in Loop: Header=BB1_16 Depth=2
 ; GLOBALNESS0-NEXT:    s_add_u32 s70, s38, 40
 ; GLOBALNESS0-NEXT:    s_addc_u32 s71, s39, 0
+; GLOBALNESS0-NEXT:    s_getpc_b64 s[4:5]
+; GLOBALNESS0-NEXT:    s_add_u32 s4, s4, wobble@gotpcrel32@lo+4
+; GLOBALNESS0-NEXT:    s_addc_u32 s5, s5, wobble@gotpcrel32@hi+12
+; GLOBALNESS0-NEXT:    s_load_dwordx2 s[76:77], s[4:5], 0x0
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[8:9], s[70:71]
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[10:11], s[34:35]
@@ -511,7 +520,8 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    s_mov_b32 s13, s67
 ; GLOBALNESS0-NEXT:    s_mov_b32 s14, s66
 ; GLOBALNESS0-NEXT:    v_mov_b32_e32 v31, v41
-; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS0-NEXT:    s_waitcnt lgkmcnt(0)
+; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[76:77]
 ; GLOBALNESS0-NEXT:    v_pk_mov_b32 v[46:47], 0, 0
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[4:5], s[36:37]
 ; GLOBALNESS0-NEXT:    s_mov_b64 s[8:9], s[70:71]
@@ -521,7 +531,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    s_mov_b32 s14, s66
 ; GLOBALNESS0-NEXT:    v_mov_b32_e32 v31, v41
 ; GLOBALNESS0-NEXT:    global_store_dwordx2 v[46:47], v[44:45], off
-; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[72:73]
+; GLOBALNESS0-NEXT:    s_swappc_b64 s[30:31], s[76:77]
 ; GLOBALNESS0-NEXT:    s_and_saveexec_b64 s[4:5], s[62:63]
 ; GLOBALNESS0-NEXT:    s_cbranch_execz .LBB1_14
 ; GLOBALNESS0-NEXT:  ; %bb.23: ; %bb62.i
@@ -538,7 +548,7 @@ define amdgpu_kernel void @kernel(ptr addrspace(1) %arg1.global, i1 %tmp3.i.i, i
 ; GLOBALNESS0-NEXT:    v_pk_mov_b32 v[0:1], 0, 0
 ; GLOBALNESS0-NEXT:  .LBB1_26: ; %Flow24
 ; GLOBALNESS0-NEXT:    ; in Loop: Header=BB1_4 Depth=1
-; GLOBALNESS0-NEXT:    s_or_b64 exec, exec, s[76:77]
+; GLOBALNESS0-NEXT:    s_or_b64 exec, exec, s[72:73]
 ; GLOBALNESS0-NEXT:    s_and_saveexec_b64 s[4:5], s[60:61]
 ; GLOBALNESS0-NEXT:    s_cbranch_execz .LBB1_2
 ; GLOBALNESS0-NEXT:  ; %bb.27: ; %bb67.i
diff --git a/llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll b/llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
index ebbce68221a94..ed7182914cd92 100644
--- a/llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
+++ b/llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
 ; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s
 ; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s
 
@@ -259,7 +259,7 @@ define hidden void @blam() {
 ; GCN-NEXT:    s_or_saveexec_b64 s[18:19], -1
 ; GCN-NEXT:    buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
 ; GCN-NEXT:    s_mov_b64 exec, s[18:19]
-; GCN-NEXT:    v_writelane_b32 v45, s16, 28
+; GCN-NEXT:    v_writelane_b32 v45, s16, 26
 ; GCN-NEXT:    s_addk_i32 s32, 0x800
 ; GCN-NEXT:    buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
 ; GCN-NEXT:    buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
@@ -292,8 +292,6 @@ define hidden void @blam() {
 ; GCN-NEXT:    v_writelane_b32 v45, s55, 23
 ; GCN-NEXT:    v_writelane_b32 v45, s56, 24
 ; GCN-NEXT:    v_writelane_b32 v45, s57, 25
-; GCN-NEXT:    v_writelane_b32 v45, s58, 26
-; GCN-NEXT:    v_writelane_b32 v45, s59, 27
 ; GCN-NEXT:    s_mov_b64 s[34:35], s[6:7]
 ; GCN-NEXT:    v_mov_b32_e32 v40, v31
 ; GCN-NEXT:    s_mov_b32 s46, s15
@@ -306,15 +304,12 @@ define hidden void @blam() {
 ; GCN-NEXT:    v_mov_b32_e32 v0, 0
 ; GCN-NEXT:    v_mov_b32_e32 v1, 0
 ; GCN-NEXT:    v_and_b32_e32 v2, 0x3ff, v40
-; GCN-NEXT:    v_mov_b32_e32 v42, 0
 ; GCN-NEXT:    flat_load_dword v43, v[0:1]
+; GCN-NEXT:    v_mov_b32_e32 v42, 0
 ; GCN-NEXT:    s_mov_b64 s[50:51], 0
-; GCN-NEXT:    s_getpc_b64 s[52:53]
-; GCN-NEXT:    s_add_u32 s52, s52, spam@rel32@lo+4
-; GCN-NEXT:    s_addc_u32 s53, s53, spam@rel32@hi+12
 ; GCN-NEXT:    v_lshlrev_b32_e32 v41, 2, v2
 ; GCN-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_cmp_eq_f32_e64 s[54:55], 0, v43
+; GCN-NEXT:    v_cmp_eq_f32_e64 s[52:53], 0, v43
 ; GCN-NEXT:    v_cmp_neq_f32_e64 s[42:43], 0, v43
 ; GCN-NEXT:    v_mov_b32_e32 v44, 0x7fc00000
 ; GCN-NEXT:    s_branch .LBB1_2
@@ -334,15 +329,18 @@ define hidden void @blam() {
 ; GCN-NEXT:    v_cmp_lt_i32_e32 vcc, 2, v0
 ; GCN-NEXT:    s_mov_b64 s[4:5], -1
 ; GCN-NEXT:    s_and_saveexec_b64 s[8:9], vcc
-; GCN-NEXT:    s_xor_b64 s[56:57], exec, s[8:9]
+; GCN-NEXT:    s_xor_b64 s[54:55], exec, s[8:9]
 ; GCN-NEXT:    s_cbranch_execz .LBB1_12
 ; GCN-NEXT:  ; %bb.3: ; %bb6
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
 ; GCN-NEXT:    v_cmp_eq_u32_e64 s[44:45], 3, v0
-; GCN-NEXT:    s_and_saveexec_b64 s[58:59], s[44:45]
+; GCN-NEXT:    s_and_saveexec_b64 s[56:57], s[44:45]
 ; GCN-NEXT:    s_cbranch_execz .LBB1_11
 ; GCN-NEXT:  ; %bb.4: ; %bb11
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
+; GCN-NEXT:    s_getpc_b64 s[16:17]
+; GCN-NEXT:    s_add_u32 s16, s16, spam@rel32@lo+4
+; GCN-NEXT:    s_addc_u32 s17, s17, spam@rel32@hi+12
 ; GCN-NEXT:    s_mov_b64 s[4:5], s[40:41]
 ; GCN-NEXT:    s_mov_b64 s[6:7], s[34:35]
 ; GCN-NEXT:    s_mov_b64 s[8:9], s[38:39]
@@ -352,20 +350,20 @@ define hidden void @blam() {
 ; GCN-NEXT:    s_mov_b32 s14, s47
 ; GCN-NEXT:    s_mov_b32 s15, s46
 ; GCN-NEXT:    v_mov_b32_e32 v31, v40
-; GCN-NEXT:    s_swappc_b64 s[30:31], s[52:53]
+; GCN-NEXT:    s_swappc_b64 s[30:31], s[16:17]
 ; GCN-NEXT:    v_cmp_neq_f32_e32 vcc, 0, v0
 ; GCN-NEXT:    s_mov_b64 s[6:7], 0
 ; GCN-NEXT:    s_and_saveexec_b64 s[4:5], vcc
 ; GCN-NEXT:    s_cbranch_execz .LBB1_10
 ; GCN-NEXT:  ; %bb.5: ; %bb14
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
-; GCN-NEXT:    s_mov_b64 s[8:9], s[54:55]
+; GCN-NEXT:    s_mov_b64 s[8:9], s[52:53]
 ; GCN-NEXT:    s_and_saveexec_b64 s[6:7], s[42:43]
 ; GCN-NEXT:    s_cbranch_execz .LBB1_7
 ; GCN-NEXT:  ; %bb.6: ; %bb16
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
 ; GCN-NEXT:    buffer_store_dword v44, off, s[0:3], 0
-; GCN-NEXT:    s_or_b64 s[8:9], s[54:55], exec
+; GCN-NEXT:    s_or_b64 s[8:9], s[52:53], exec
 ; GCN-NEXT:  .LBB1_7: ; %Flow3
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
 ; GCN-NEXT:    s_or_b64 exec, exec, s[6:7]
@@ -390,13 +388,13 @@ define hidden void @blam() {
 ; GCN-NEXT:    s_and_b64 s[6:7], s[6:7], exec
 ; GCN-NEXT:  .LBB1_11: ; %Flow1
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
-; GCN-NEXT:    s_or_b64 exec, exec, s[58:59]
+; GCN-NEXT:    s_or_b64 exec, exec, s[56:57]
 ; GCN-NEXT:    s_orn2_b64 s[4:5], s[44:45], exec
 ; GCN-NEXT:    s_and_b64 s[6:7], s[6:7], exec
 ; GCN-NEXT:    ; implicit-def: $vgpr0
 ; GCN-NEXT:  .LBB1_12: ; %Flow
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
-; GCN-NEXT:    s_andn2_saveexec_b64 s[8:9], s[56:57]
+; GCN-NEXT:    s_andn2_saveexec_b64 s[8:9], s[54:55]
 ; GCN-NEXT:    s_cbranch_execz .LBB1_16
 ; GCN-NEXT:  ; %bb.13: ; %bb8
 ; GCN-NEXT:    ; in Loop: Header=BB1_2 Depth=1
@@ -429,8 +427,6 @@ define hidden void @blam() {
 ; GCN-NEXT:    s_branch .LBB1_1
 ; GCN-NEXT:  .LBB1_18: ; %DummyReturnBlock
 ; GCN-NEXT:    s_or_b64 exec, exec, s[50:51]
-; GCN-NEXT:    v_readlane_b32 s59, v45, 27
-; GCN-NEXT:    v_readlane_b32 s58, v45, 26
 ; GCN-NEXT:    v_readlane_b32 s57, v45, 25
 ; GCN-NEXT:    v_readlane_b32 s56, v45, 24
 ; GCN-NEXT:    v_readlane_b32 s55, v45, 23
@@ -462,7 +458,7 @@ define hidden void @blam() {
 ; GCN-NEXT:    buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
 ; GCN-NEXT:    buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
 ; GCN-NEXT:    buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
-; GCN-NEXT:    v_readlane_b32 s4, v45, 28
+; GCN-NEXT:    v_readlane_b32 s4, v45, 26
 ; GCN-NEXT:    s_or_saveexec_b64 s[6:7], -1
 ; GCN-NEXT:    buffer_load_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
 ; GCN-NEXT:    s_mov_b64 exec, s[6:7]

>From 6b2a4b398468d940163b640d7f1582f7f0171b19 Mon Sep 17 00:00:00 2001
From: lntue <35648136+lntue at users.noreply.github.com>
Date: Thu, 1 Feb 2024 12:31:35 -0500
Subject: [PATCH 40/42] [libc] Fix wrong errno number in tls_test. (#80312)

---
 libc/test/integration/startup/linux/tls_test.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libc/test/integration/startup/linux/tls_test.cpp b/libc/test/integration/startup/linux/tls_test.cpp
index cf2ff94931ba6..2a6385e195a49 100644
--- a/libc/test/integration/startup/linux/tls_test.cpp
+++ b/libc/test/integration/startup/linux/tls_test.cpp
@@ -32,7 +32,7 @@ TEST_MAIN(int argc, char **argv, char **envp) {
   void *addr = LIBC_NAMESPACE::mmap(nullptr, 0, PROT_READ,
                                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
   ASSERT_TRUE(addr == MAP_FAILED);
-  ASSERT_ERRNO_SUCCESS();
+  ASSERT_ERRNO_EQ(EINVAL);
 
   return 0;
 }

>From 03a60c73398401c5be9b0b0f483a643f49bd9113 Mon Sep 17 00:00:00 2001
From: Matin Raayai <30674652+matinraayai at users.noreply.github.com>
Date: Thu, 1 Feb 2024 12:50:44 -0500
Subject: [PATCH 41/42] Fix Passing TargetOptions by Value in TargetMachines
 for AMDGPU (#79866)

`TargetOptions` is currently passed by value in AMDGPU targets, which
makes unnecessary copies. This PR fixes this issue.
---
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 4 ++--
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h   | 4 ++--
 llvm/lib/Target/AMDGPU/R600TargetMachine.cpp   | 2 +-
 llvm/lib/Target/AMDGPU/R600TargetMachine.h     | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index b8a7a5e208021..e26b4cf820a52 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -571,7 +571,7 @@ static Reloc::Model getEffectiveRelocModel(std::optional<Reloc::Model> RM) {
 
 AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, const Triple &TT,
                                          StringRef CPU, StringRef FS,
-                                         TargetOptions Options,
+                                         const TargetOptions &Options,
                                          std::optional<Reloc::Model> RM,
                                          std::optional<CodeModel::Model> CM,
                                          CodeGenOptLevel OptLevel)
@@ -863,7 +863,7 @@ AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind(unsigned Kind) const {
 
 GCNTargetMachine::GCNTargetMachine(const Target &T, const Triple &TT,
                                    StringRef CPU, StringRef FS,
-                                   TargetOptions Options,
+                                   const TargetOptions &Options,
                                    std::optional<Reloc::Model> RM,
                                    std::optional<CodeModel::Model> CM,
                                    CodeGenOptLevel OL, bool JIT)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
index 99c9db3e654a6..ce2dd2947daf6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
@@ -39,7 +39,7 @@ class AMDGPUTargetMachine : public LLVMTargetMachine {
   static bool EnableLowerModuleLDS;
 
   AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
-                      StringRef FS, TargetOptions Options,
+                      StringRef FS, const TargetOptions &Options,
                       std::optional<Reloc::Model> RM,
                       std::optional<CodeModel::Model> CM, CodeGenOptLevel OL);
   ~AMDGPUTargetMachine() override;
@@ -78,7 +78,7 @@ class GCNTargetMachine final : public AMDGPUTargetMachine {
 
 public:
   GCNTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
-                   StringRef FS, TargetOptions Options,
+                   StringRef FS, const TargetOptions &Options,
                    std::optional<Reloc::Model> RM,
                    std::optional<CodeModel::Model> CM, CodeGenOptLevel OL,
                    bool JIT);
diff --git a/llvm/lib/Target/AMDGPU/R600TargetMachine.cpp b/llvm/lib/Target/AMDGPU/R600TargetMachine.cpp
index 6cd4fd42444dd..2461263866a96 100644
--- a/llvm/lib/Target/AMDGPU/R600TargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/R600TargetMachine.cpp
@@ -50,7 +50,7 @@ static MachineSchedRegistry R600SchedRegistry("r600",
 
 R600TargetMachine::R600TargetMachine(const Target &T, const Triple &TT,
                                      StringRef CPU, StringRef FS,
-                                     TargetOptions Options,
+                                     const TargetOptions &Options,
                                      std::optional<Reloc::Model> RM,
                                      std::optional<CodeModel::Model> CM,
                                      CodeGenOptLevel OL, bool JIT)
diff --git a/llvm/lib/Target/AMDGPU/R600TargetMachine.h b/llvm/lib/Target/AMDGPU/R600TargetMachine.h
index 3fe54c778fe15..af8dcb8488679 100644
--- a/llvm/lib/Target/AMDGPU/R600TargetMachine.h
+++ b/llvm/lib/Target/AMDGPU/R600TargetMachine.h
@@ -31,7 +31,7 @@ class R600TargetMachine final : public AMDGPUTargetMachine {
 
 public:
   R600TargetMachine(const Target &T, const Triple &TT, StringRef CPU,
-                    StringRef FS, TargetOptions Options,
+                    StringRef FS, const TargetOptions &Options,
                     std::optional<Reloc::Model> RM,
                     std::optional<CodeModel::Model> CM, CodeGenOptLevel OL,
                     bool JIT);

>From 7d2c1eb97abf162522d030f9dfe9e281758f24b3 Mon Sep 17 00:00:00 2001
From: Peter Smith <peter.smith at arm.com>
Date: Thu, 1 Feb 2024 18:10:37 +0000
Subject: [PATCH 42/42] 2023 transparency report fix typo

---
 llvm/docs/SecurityTransparencyReports.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/SecurityTransparencyReports.rst b/llvm/docs/SecurityTransparencyReports.rst
index c8e3cd45c98ef..b43a85a012f41 100644
--- a/llvm/docs/SecurityTransparencyReports.rst
+++ b/llvm/docs/SecurityTransparencyReports.rst
@@ -100,7 +100,7 @@ on an old version of xml2js with a CVE filed against it.
 https://bugs.chromium.org/p/llvm/issues/detail?id=45 reports a number of
 dependencies that have had vulnerabilities reported against them.
 
-https://bugs.chromium.org/p/llvm/issues/detail?id=46 is related to issue 43
+https://bugs.chromium.org/p/llvm/issues/detail?id=46 is related to issue 43.
 
 https://bugs.chromium.org/p/llvm/issues/detail?id=48 reports a buffer overflow
 in std::format from -fexperimental-library.