[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

Ivan R. Ivanov via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Wed Aug 21 20:48:50 PDT 2024


Björn Pettersson <bjorn.a.pettersson at ericsson.com>,Abid
 Qadeer <haqadeer at amd.com>,Sergio Afonso <safonsof at amd.com>,Tom Eccles
 <tom.eccles at arm.com>,Alex Rice <alexrice999 at hotmail.co.uk>,magic-akari
 <akari.ccino at gmail.com>,Siu Chi Chan <siuchi.chan at amd.com>,Kyungwoo Lee
 <kyulee at meta.com>,Chenguang Wang <w3cing at gmail.com>,Johannes Doerfert
 <johannes at jdoerfert.de>,Joseph Huber <huberjn at outlook.com>,David Green
 <david.green at arm.com>,Sumanth Gundapaneni <sumanth.gundapaneni at amd.com>,Jacob
 Lalonde <jalalonde at fb.com>,Louis Dionne <ldionne.2 at gmail.com>,Philip Reames
 <preames at rivosinc.com>,Harini0924
 <79345568+Harini0924 at users.noreply.github.com>,Michael Jones
 <michaelrj at google.com>,Mircea Trofin <mtrofin at google.com>,Jay Foad
 <jay.foad at amd.com>,LLVM GN Syncbot <llvmgnsyncbot at gmail.com>,Michael Kruse
 <llvm-project at meinersbur.de>,Adrian Vogelsgesang
 <avogelsgesang at salesforce.com>,Rahul Joshi <rjoshi at nvidia.com>,Krzysztof
 Parzyszek <Krzysztof.Parzyszek at amd.com>,Alexey
 Bataev <a.bataev at outlook.com>,Sander de
 Smalen <sander.desmalen at arm.com>,Jorge Gorbe Moya <jgorbe at google.com>,Slava
 Zakharin <szakharin at nvidia.com>,John Harrison <harjohn at google.com>,Jonas
 Rickert <Jonas.Rickert at amd.com>,Dmitri Gribenko <gribozavr at gmail.com>,Volodymyr
 Vasylkun <vvmposeydon at gmail.com>,Peter Klausler
 <35819229+klausler at users.noreply.github.com>,Kazu Hirata <kazu at google.com>,Adrian
 Prantl <aprantl at apple.com>,Connie Zhu
 <60797237+connieyzhu at users.noreply.github.com>,eddyz87 <eddyz87 at gmail.com>,Vitaly
 Buka <vitalybuka at google.com>,Shubham Sandeep Rastogi <srastogi22 at apple.com>,Alexander
 Shaposhnikov <ashaposhnikov at google.com>,vporpo <vporpodas at google.com>,Craig Topper
 <craig.topper at sifive.com>,"Ivan R. Ivanov" <ivanov.i.aa at m.titech.ac.jp>,Ivan
 Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/104748 at github.com>


https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/104748

>From e1912a15b6b05aab36b7bcbe617980e8d808bd80 Mon Sep 17 00:00:00 2001
From: Rahul Joshi <rjoshi at nvidia.com>
Date: Wed, 21 Aug 2024 07:10:17 -0700
Subject: [PATCH 001/116] [NFC][ADT] Format StringRefTest.cpp to fit in 80
 columns. (#105502)

---
 llvm/unittests/ADT/StringRefTest.cpp | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/llvm/unittests/ADT/StringRefTest.cpp b/llvm/unittests/ADT/StringRefTest.cpp
index b3c206a336962d..40351c99d0185c 100644
--- a/llvm/unittests/ADT/StringRefTest.cpp
+++ b/llvm/unittests/ADT/StringRefTest.cpp
@@ -939,16 +939,17 @@ struct GetDoubleStrings {
   bool AllowInexact;
   bool ShouldFail;
   double D;
-} DoubleStrings[] = {{"0", false, false, 0.0},
-                     {"0.0", false, false, 0.0},
-                     {"-0.0", false, false, -0.0},
-                     {"123.45", false, true, 123.45},
-                     {"123.45", true, false, 123.45},
-                     {"1.8e308", true, false, std::numeric_limits<double>::infinity()},
-                     {"1.8e308", false, true, std::numeric_limits<double>::infinity()},
-                     {"0x0.0000000000001P-1023", false, true, 0.0},
-                     {"0x0.0000000000001P-1023", true, false, 0.0},
-                    };
+} DoubleStrings[] = {
+    {"0", false, false, 0.0},
+    {"0.0", false, false, 0.0},
+    {"-0.0", false, false, -0.0},
+    {"123.45", false, true, 123.45},
+    {"123.45", true, false, 123.45},
+    {"1.8e308", true, false, std::numeric_limits<double>::infinity()},
+    {"1.8e308", false, true, std::numeric_limits<double>::infinity()},
+    {"0x0.0000000000001P-1023", false, true, 0.0},
+    {"0x0.0000000000001P-1023", true, false, 0.0},
+};
 
 TEST(StringRefTest, getAsDouble) {
   for (const auto &Entry : DoubleStrings) {
@@ -1117,7 +1118,8 @@ TEST(StringRefTest, StringLiteral) {
   constexpr StringRef StringRefs[] = {"Foo", "Bar"};
   EXPECT_EQ(StringRef("Foo"), StringRefs[0]);
   EXPECT_EQ(3u, (std::integral_constant<size_t, StringRefs[0].size()>::value));
-  EXPECT_EQ(false, (std::integral_constant<bool, StringRefs[0].empty()>::value));
+  EXPECT_EQ(false,
+            (std::integral_constant<bool, StringRefs[0].empty()>::value));
   EXPECT_EQ(StringRef("Bar"), StringRefs[1]);
 
   constexpr StringLiteral Strings[] = {"Foo", "Bar"};
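
Background on the reformatted `DoubleStrings` table: the rows that expect failure when `AllowInexact` is false are exactly the inputs with no exact `double` value. `123.45` has no finite binary expansion; `1.8e308` exceeds `DBL_MAX` (about 1.7977e308) and overflows to infinity; and `0x0.0000000000001P-1023` is 2^-52 × 2^-1023 = 2^-1075, exactly half of the smallest subnormal (2^-1074), which rounds to 0.0 under round-to-nearest-ties-to-even. The sketch below illustrates these rounding outcomes with the C library's `std::strtod`, assuming an IEEE-754 `double` and the default rounding mode. It is not how `StringRef::getAsDouble` is implemented, and `ParseResult`, `parseWithStrtod` and `overflowedToInfinity` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Hypothetical helper type: the parsed value plus whether strtod set ERANGE.
struct ParseResult {
  double value;
  bool rangeError;
};

// Parses the whole string with std::strtod (default rounding mode assumed).
ParseResult parseWithStrtod(const char *s) {
  errno = 0;
  char *end = nullptr;
  double v = std::strtod(s, &end);
  // This sketch only handles strings that are a complete number.
  assert(end != s && *end == '\0' && "input must be a complete number");
  return {v, errno == ERANGE};
}

// True when the text was too large for double: strtod returns HUGE_VAL
// (infinity on IEEE-754 targets) and stores ERANGE in errno.
bool overflowedToInfinity(const ParseResult &r) {
  return r.rangeError && std::isinf(r.value);
}
```

For example, `parseWithStrtod("1.8e308")` yields positive infinity with `rangeError` set, and `parseWithStrtod("0x0.0000000000001P-1023").value` is `0.0`. Note that `strtod` silently rounds `123.45` to the nearest `double`, so on its own it cannot distinguish the `AllowInexact` cases.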

>From 3c8f139fb73a8610680b184afc88fe4b1485add0 Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Wed, 21 Aug 2024 16:11:57 +0200
Subject: [PATCH 002/116] [InstCombine] Add tests for icmp of select of cmp
 (NFC)

---
 .../test/Transforms/InstCombine/select-cmp.ll | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/llvm/test/Transforms/InstCombine/select-cmp.ll b/llvm/test/Transforms/InstCombine/select-cmp.ll
index 7c1a32e7b5eb70..697010b90db584 100644
--- a/llvm/test/Transforms/InstCombine/select-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/select-cmp.ll
@@ -480,4 +480,108 @@ define i1 @test_select_inverse_nonconst4(i64 %x, i64 %y, i64 %z, i1 %cond) {
   ret i1 %sel
 }
 
+define i1 @sel_icmp_two_cmp(i1 %c, i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
+; CHECK-LABEL: @sel_icmp_two_cmp(
+; CHECK-NEXT:    [[V1:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[V2:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A3:%.*]], i32 [[A4:%.*]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[V1]], i8 [[V2]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i8 [[SEL]], 1
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v1 = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %v2 = call i8 @llvm.scmp(i32 %a3, i32 %a4)
+  %sel = select i1 %c, i8 %v1, i8 %v2
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_two_cmp_extra_use1(i1 %c, i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
+; CHECK-LABEL: @sel_icmp_two_cmp_extra_use1(
+; CHECK-NEXT:    [[V1:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[V2:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A3:%.*]], i32 [[A4:%.*]])
+; CHECK-NEXT:    call void @use.i8(i8 [[V1]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[V1]], i8 [[V2]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i8 [[SEL]], 1
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v1 = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %v2 = call i8 @llvm.scmp(i32 %a3, i32 %a4)
+  call void @use.i8(i8 %v1)
+  %sel = select i1 %c, i8 %v1, i8 %v2
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_two_cmp_extra_use2(i1 %c, i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
+; CHECK-LABEL: @sel_icmp_two_cmp_extra_use2(
+; CHECK-NEXT:    [[V1:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[V2:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A3:%.*]], i32 [[A4:%.*]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[V1]], i8 [[V2]]
+; CHECK-NEXT:    call void @use.i8(i8 [[SEL]])
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i8 [[SEL]], 1
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v1 = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %v2 = call i8 @llvm.scmp(i32 %a3, i32 %a4)
+  %sel = select i1 %c, i8 %v1, i8 %v2
+  call void @use.i8(i8 %sel)
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_two_cmp_not_const(i1 %c, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i8 %b) {
+; CHECK-LABEL: @sel_icmp_two_cmp_not_const(
+; CHECK-NEXT:    [[V1:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[V2:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A3:%.*]], i32 [[A4:%.*]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[V1]], i8 [[V2]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp sle i8 [[SEL]], [[B:%.*]]
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v1 = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %v2 = call i8 @llvm.scmp(i32 %a3, i32 %a4)
+  %sel = select i1 %c, i8 %v1, i8 %v2
+  %cmp = icmp sle i8 %sel, %b
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_cmp_and_simplify(i1 %c, i32 %a1, i32 %a2) {
+; CHECK-LABEL: @sel_icmp_cmp_and_simplify(
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ule i32 [[A1:%.*]], [[A2:%.*]]
+; CHECK-NEXT:    [[NOT_C:%.*]] = xor i1 [[C:%.*]], true
+; CHECK-NEXT:    [[CMP:%.*]] = select i1 [[NOT_C]], i1 true, i1 [[CMP1]]
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %sel = select i1 %c, i8 %v, i8 0
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_cmp_and_no_simplify(i1 %c, i32 %a1, i32 %a2, i8 %b) {
+; CHECK-LABEL: @sel_icmp_cmp_and_no_simplify(
+; CHECK-NEXT:    [[V:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[V]], i8 [[B:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i8 [[SEL]], 1
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %sel = select i1 %c, i8 %v, i8 %b
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
+define i1 @sel_icmp_cmp_and_no_simplify_comm(i1 %c, i32 %a1, i32 %a2, i8 %b) {
+; CHECK-LABEL: @sel_icmp_cmp_and_no_simplify_comm(
+; CHECK-NEXT:    [[V:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A1:%.*]], i32 [[A2:%.*]])
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[C:%.*]], i8 [[B:%.*]], i8 [[V]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i8 [[SEL]], 1
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %v = call i8 @llvm.ucmp(i32 %a1, i32 %a2)
+  %sel = select i1 %c, i8 %b, i8 %v
+  %cmp = icmp sle i8 %sel, 0
+  ret i1 %cmp
+}
+
 declare void @use(i1)
+declare void @use.i8(i8)
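
These NFC tests record how InstCombine currently handles an `icmp` of a `select` whose arms are `llvm.ucmp`/`llvm.scmp` calls. Two effects are visible in the CHECK lines. First, `icmp sle i8 %sel, 0` is printed as the equivalent `icmp slt i8 %sel, 1`. Second, in `@sel_icmp_cmp_and_simplify` the three-way compare disappears entirely, because `ucmp(a1, a2) <= 0` holds exactly when `a1 <= a2` (unsigned), and the constant arm `0` makes the comparison true whenever `%c` is false. The C++ below is a hypothetical model of those semantics (the intrinsics return -1, 0 or 1, modeled here as `int8_t` to match the `i8` result). It checks on sample values that the input and the CHECK-line output of `@sel_icmp_cmp_and_simplify` agree. It is not LLVM code, and all function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Model of llvm.ucmp: unsigned three-way compare returning -1, 0 or 1.
int8_t ucmp(uint32_t a, uint32_t b) { return a < b ? -1 : (a > b ? 1 : 0); }

// Model of llvm.scmp: signed three-way compare returning -1, 0 or 1.
int8_t scmp(int32_t a, int32_t b) { return a < b ? -1 : (a > b ? 1 : 0); }

// Input of @sel_icmp_cmp_and_simplify:
//   %v = ucmp(a1, a2); %sel = select c, %v, 0; %cmp = icmp sle %sel, 0
bool originalPattern(bool c, uint32_t a1, uint32_t a2) {
  int8_t sel = c ? ucmp(a1, a2) : int8_t{0};
  return sel <= 0;
}

// Output expected by its CHECK lines:
//   %cmp1 = icmp ule a1, a2; %not_c = xor c, true; select %not_c, true, %cmp1
bool foldedPattern(bool c, uint32_t a1, uint32_t a2) {
  bool cmp1 = a1 <= a2;
  bool notC = !c;
  return notC ? true : cmp1;
}

// Compares both forms on boundary values; asserts on any mismatch.
void verifyFoldOnSamples() {
  const uint32_t samples[] = {0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
  for (bool c : {false, true})
    for (uint32_t a1 : samples)
      for (uint32_t a2 : samples)
        assert(originalPattern(c, a1, a2) == foldedPattern(c, a1, a2));
}

// Exhaustively confirms that "x sle 0" and "x slt 1" agree for every i8 value.
bool sleZeroEqualsSltOne() {
  for (int v = -128; v <= 127; ++v) {
    int8_t x = static_cast<int8_t>(v);
    if ((x <= 0) != (x < 1))
      return false;
  }
  return true;
}
```

The `*_no_simplify` tests are the contrast case: with an unknown `%b` in place of the constant `0`, the other select arm has no known value, so only the `sle` to `slt` rewrite applies.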

>From 68e21e16d21deee0f0226b4c771ff8b4731b7370 Mon Sep 17 00:00:00 2001
From: Tomas Matheson <Tomas.Matheson at arm.com>
Date: Wed, 21 Aug 2024 15:15:49 +0100
Subject: [PATCH 003/116] [AArch64] Add support for ACTLR_EL12 system register
 (#105497)

Documentation can be found here:

https://developer.arm.com/documentation/ddi0601/2024-06/AArch64-Registers/ACTLR-EL1--Auxiliary-Control-Register--EL1-
---
 llvm/lib/Target/AArch64/AArch64SystemOperands.td             | 1 +
 llvm/test/MC/AArch64/arm64-system-encoding.s                 | 4 ++++
 llvm/test/MC/Disassembler/AArch64/basic-a64-instructions.txt | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/llvm/lib/Target/AArch64/AArch64SystemOperands.td b/llvm/lib/Target/AArch64/AArch64SystemOperands.td
index 7476ab852a923b..dd0ce1cf47a792 100644
--- a/llvm/lib/Target/AArch64/AArch64SystemOperands.td
+++ b/llvm/lib/Target/AArch64/AArch64SystemOperands.td
@@ -939,6 +939,7 @@ def : RWSysReg<"SCTLR_EL1",          0b11, 0b000, 0b0001, 0b0000, 0b000>;
 def : RWSysReg<"SCTLR_EL2",          0b11, 0b100, 0b0001, 0b0000, 0b000>;
 def : RWSysReg<"SCTLR_EL3",          0b11, 0b110, 0b0001, 0b0000, 0b000>;
 def : RWSysReg<"ACTLR_EL1",          0b11, 0b000, 0b0001, 0b0000, 0b001>;
+def : RWSysReg<"ACTLR_EL12",         0b11, 0b101, 0b0001, 0b0000, 0b001>;
 def : RWSysReg<"ACTLR_EL2",          0b11, 0b100, 0b0001, 0b0000, 0b001>;
 def : RWSysReg<"ACTLR_EL3",          0b11, 0b110, 0b0001, 0b0000, 0b001>;
 def : RWSysReg<"HCR_EL2",            0b11, 0b100, 0b0001, 0b0001, 0b000>;
diff --git a/llvm/test/MC/AArch64/arm64-system-encoding.s b/llvm/test/MC/AArch64/arm64-system-encoding.s
index c58a8f0cb841cb..d38f3ac9871fe5 100644
--- a/llvm/test/MC/AArch64/arm64-system-encoding.s
+++ b/llvm/test/MC/AArch64/arm64-system-encoding.s
@@ -59,6 +59,7 @@ foo:
 ; MSR/MRS instructions
 ;-----------------------------------------------------------------------------
   msr ACTLR_EL1, x3
+  msr ACTLR_EL12, x3
   msr ACTLR_EL2, x3
   msr ACTLR_EL3, x3
   msr AFSR0_EL1, x3
@@ -167,6 +168,7 @@ foo:
   msr  S0_0_C0_C0_0, x0
   msr  S1_2_C3_C4_5, x2
 ; CHECK: msr ACTLR_EL1, x3              ; encoding: [0x23,0x10,0x18,0xd5]
+; CHECK: msr ACTLR_EL12, x3             ; encoding: [0x23,0x10,0x1d,0xd5]
 ; CHECK: msr ACTLR_EL2, x3              ; encoding: [0x23,0x10,0x1c,0xd5]
 ; CHECK: msr ACTLR_EL3, x3              ; encoding: [0x23,0x10,0x1e,0xd5]
 ; CHECK: msr AFSR0_EL1, x3              ; encoding: [0x03,0x51,0x18,0xd5]
@@ -280,6 +282,7 @@ foo:
 ; CHECK-ERRORS: :[[@LINE-1]]:7: error: expected writable system register or pstate
 
   mrs x3, ACTLR_EL1
+  mrs x3, ACTLR_EL12
   mrs x3, ACTLR_EL2
   mrs x3, ACTLR_EL3
   mrs x3, AFSR0_EL1
@@ -501,6 +504,7 @@ foo:
   mrs x3, S3_3_c11_c1_4
 
 ; CHECK: mrs x3, ACTLR_EL1              ; encoding: [0x23,0x10,0x38,0xd5]
+; CHECK: mrs x3, ACTLR_EL12             ; encoding: [0x23,0x10,0x3d,0xd5]
 ; CHECK: mrs x3, ACTLR_EL2              ; encoding: [0x23,0x10,0x3c,0xd5]
 ; CHECK: mrs x3, ACTLR_EL3              ; encoding: [0x23,0x10,0x3e,0xd5]
 ; CHECK: mrs x3, AFSR0_EL1              ; encoding: [0x03,0x51,0x38,0xd5]
diff --git a/llvm/test/MC/Disassembler/AArch64/basic-a64-instructions.txt b/llvm/test/MC/Disassembler/AArch64/basic-a64-instructions.txt
index f46301e8c1c15b..5ffabfc692ad10 100644
--- a/llvm/test/MC/Disassembler/AArch64/basic-a64-instructions.txt
+++ b/llvm/test/MC/Disassembler/AArch64/basic-a64-instructions.txt
@@ -3245,6 +3245,7 @@
 # CHECK: msr      {{sctlr_el2|SCTLR_EL2}}, x12
 # CHECK: msr      {{sctlr_el3|SCTLR_EL3}}, x12
 # CHECK: msr      {{actlr_el1|ACTLR_EL1}}, x12
+# CHECK: msr      {{actlr_el12|ACTLR_EL12}}, x12
 # CHECK: msr      {{actlr_el2|ACTLR_EL2}}, x12
 # CHECK: msr      {{actlr_el3|ACTLR_EL3}}, x12
 # CHECK: msr      {{cpacr_el1|CPACR_EL1}}, x12
@@ -3575,6 +3576,7 @@
 # CHECK: mrs      x9, {{sctlr_el2|SCTLR_EL2}}
 # CHECK: mrs      x9, {{sctlr_el3|SCTLR_EL3}}
 # CHECK: mrs      x9, {{actlr_el1|ACTLR_EL1}}
+# CHECK: mrs      x9, {{actlr_el12|ACTLR_EL12}}
 # CHECK: mrs      x9, {{actlr_el2|ACTLR_EL2}}
 # CHECK: mrs      x9, {{actlr_el3|ACTLR_EL3}}
 # CHECK: mrs      x9, {{cpacr_el1|CPACR_EL1}}
@@ -3867,6 +3869,7 @@
 0xc 0x10 0x1c 0xd5
 0xc 0x10 0x1e 0xd5
 0x2c 0x10 0x18 0xd5
+0x2c 0x10 0x1d 0xd5
 0x2c 0x10 0x1c 0xd5
 0x2c 0x10 0x1e 0xd5
 0x4c 0x10 0x18 0xd5
@@ -4199,6 +4202,7 @@
 0x9 0x10 0x3c 0xd5
 0x9 0x10 0x3e 0xd5
 0x29 0x10 0x38 0xd5
+0x29 0x10 0x3d 0xd5
 0x29 0x10 0x3c 0xd5
 0x29 0x10 0x3e 0xd5
 0x49 0x10 0x38 0xd5
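
The new `RWSysReg` row for `ACTLR_EL12` differs from `ACTLR_EL1` only in `op1` (`0b101` instead of `0b000`). The expected bytes in the tests can be reproduced from those five fields. An A64 MSR/MRS (register) instruction places `1101010100` in bits 31-22, the read bit L at bit 21 (1 for MRS), a constant 1 at bit 20, `o0 = op0 - 2` at bit 19, `op1` at 18-16, `CRn` at 15-12, `CRm` at 11-8, `op2` at 7-5 and `Rt` at 4-0. The sketch below encodes that layout, which is a summary of the architectural encoding and not LLVM's MC code. `SysRegFields`, `encodeSysRegMove` and `toLittleEndianBytes` are hypothetical names. The sketch also produces the little-endian byte order used by the `; encoding:` comments and the disassembler input.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the RWSysReg<name, op0, op1, CRn, CRm, op2> operands.
struct SysRegFields {
  uint32_t op0, op1, crn, crm, op2;
};

// Encodes "MSR <sysreg>, X<rt>" (isRead == false) or "MRS X<rt>, <sysreg>"
// (isRead == true) as a 32-bit instruction word.
uint32_t encodeSysRegMove(const SysRegFields &r, uint32_t rt, bool isRead) {
  assert((r.op0 == 2 || r.op0 == 3) && r.op1 < 8 && r.crn < 16 &&
         r.crm < 16 && r.op2 < 8 && rt < 32);
  uint32_t insn = 0xd5100000u; // bits 31-22 = 1101010100, bit 20 = 1
  if (isRead)
    insn |= 1u << 21;          // L = 1 selects MRS
  insn |= (r.op0 - 2) << 19;   // o0
  insn |= r.op1 << 16;
  insn |= r.crn << 12;
  insn |= r.crm << 8;
  insn |= r.op2 << 5;
  insn |= rt;                  // general-purpose register number
  return insn;
}

// Byte order used by the "; encoding: [...]" CHECK comments (little-endian).
std::array<uint8_t, 4> toLittleEndianBytes(uint32_t insn) {
  return {static_cast<uint8_t>(insn), static_cast<uint8_t>(insn >> 8),
          static_cast<uint8_t>(insn >> 16), static_cast<uint8_t>(insn >> 24)};
}
```

For instance, the disassembler input `0x2c 0x10 0x1d 0xd5` is the word `0xd51d102c`: L = 0 and `Rt = 12`, which matches the `msr actlr_el12, x12` CHECK line.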

>From bccb22709324ae329e3d80cf8af9dd225799bc17 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Wed, 21 Aug 2024 23:16:52 +0900
Subject: [PATCH 004/116] Revert "[flang][NFC] Move OpenMP related passes into
 a separate directory (#104732)"

This reverts commit 87eeed1f0ebe57abffde560c25dd9829dc6038f3.
---
 flang/docs/OpenMP-declare-target.md           |  4 +-
 flang/docs/OpenMP-descriptor-management.md    |  4 +-
 flang/include/flang/Optimizer/CMakeLists.txt  |  1 -
 .../flang/Optimizer/OpenMP/CMakeLists.txt     |  4 --
 flang/include/flang/Optimizer/OpenMP/Passes.h | 30 --------------
 .../include/flang/Optimizer/OpenMP/Passes.td  | 40 -------------------
 .../flang/Optimizer/Transforms/Passes.td      | 26 ++++++++++++
 flang/include/flang/Tools/CLOptions.inc       |  7 ++--
 flang/lib/Frontend/CMakeLists.txt             |  1 -
 flang/lib/Optimizer/CMakeLists.txt            |  1 -
 flang/lib/Optimizer/OpenMP/CMakeLists.txt     | 25 ------------
 flang/lib/Optimizer/Transforms/CMakeLists.txt |  3 ++
 .../OMPFunctionFiltering.cpp}                 | 18 ++++-----
 .../OMPMapInfoFinalization.cpp}               | 21 +++++-----
 .../OMPMarkDeclareTarget.cpp}                 | 26 ++++--------
 flang/tools/bbc/CMakeLists.txt                |  1 -
 flang/tools/fir-opt/CMakeLists.txt            |  1 -
 flang/tools/fir-opt/fir-opt.cpp               |  2 -
 flang/tools/tco/CMakeLists.txt                |  1 -
 19 files changed, 63 insertions(+), 153 deletions(-)
 delete mode 100644 flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
 delete mode 100644 flang/include/flang/Optimizer/OpenMP/Passes.h
 delete mode 100644 flang/include/flang/Optimizer/OpenMP/Passes.td
 delete mode 100644 flang/lib/Optimizer/OpenMP/CMakeLists.txt
 rename flang/lib/Optimizer/{OpenMP/FunctionFiltering.cpp => Transforms/OMPFunctionFiltering.cpp} (90%)
 rename flang/lib/Optimizer/{OpenMP/MapInfoFinalization.cpp => Transforms/OMPMapInfoFinalization.cpp} (96%)
 rename flang/lib/Optimizer/{OpenMP/MarkDeclareTarget.cpp => Transforms/OMPMarkDeclareTarget.cpp} (80%)

diff --git a/flang/docs/OpenMP-declare-target.md b/flang/docs/OpenMP-declare-target.md
index 45062469007b65..d29a46807e1eaf 100644
--- a/flang/docs/OpenMP-declare-target.md
+++ b/flang/docs/OpenMP-declare-target.md
@@ -149,7 +149,7 @@ flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.
 
 There are currently two passes within Flang that are related to the processing 
 of `declare target`:
-* `MarkDeclareTarget` - This pass is in charge of marking functions captured
+* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured
 (called from) in `target` regions or other `declare target` marked functions as
 `declare target`. It does so recursively, i.e. nested calls will also be 
 implicitly marked. It currently will try to mark things as conservatively as 
@@ -157,7 +157,7 @@ possible, e.g. if captured in a `target` region it will apply `nohost`, unless
 it encounters a `host` `declare target` in which case it will apply the `any` 
 device type. Functions are handled similarly, except we utilise the parent's 
 device type where possible.
-* `FunctionFiltering` - This is executed after the `MarkDeclareTarget`
+* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget`
 pass, and its job is to conservatively remove host functions from
 the module where possible when compiling for the device. This helps make 
 sure that most incompatible code for the host is not lowered for the 
diff --git a/flang/docs/OpenMP-descriptor-management.md b/flang/docs/OpenMP-descriptor-management.md
index 66c153914f70da..d0eb01b00f9bb9 100644
--- a/flang/docs/OpenMP-descriptor-management.md
+++ b/flang/docs/OpenMP-descriptor-management.md
@@ -44,7 +44,7 @@ Currently, Flang will lower these descriptor types in the OpenMP lowering (lower
 to all other map types, generating an omp.MapInfoOp containing relevant information required for lowering
 the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. However, after 
 the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass for Fortran, 
-`MapInfoFinalizationPass` (Optimizer/OpenMP/MapInfoFinalization.cpp) will expand the 
+`OMPMapInfoFinalizationPass` (Optimizer/OMPMapInfoFinalization.cpp) will expand the 
 `omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` or `BoxAddrOp`) into multiple 
 mappings, with one extra per pointer member in the descriptor that is supported on top of the original
 descriptor map operation. These pointers members are linked to the parent descriptor by adding them to 
@@ -53,7 +53,7 @@ owning operation's (`omp.TargetOp`, `omp.TargetDataOp` etc.) map operand list an
 operation is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to canonicalize the mappings and
 simplify lowering.
 
-An example transformation by the `MapInfoFinalizationPass`:
+An example transformation by the `OMPMapInfoFinalizationPass`:
 
 ```
 
diff --git a/flang/include/flang/Optimizer/CMakeLists.txt b/flang/include/flang/Optimizer/CMakeLists.txt
index 3336ac935e1012..89e43a9ee8d621 100644
--- a/flang/include/flang/Optimizer/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/CMakeLists.txt
@@ -2,4 +2,3 @@ add_subdirectory(CodeGen)
 add_subdirectory(Dialect)
 add_subdirectory(HLFIR)
 add_subdirectory(Transforms)
-add_subdirectory(OpenMP)
diff --git a/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt b/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
deleted file mode 100644
index d59573f0f7fd91..00000000000000
--- a/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-set(LLVM_TARGET_DEFINITIONS Passes.td)
-mlir_tablegen(Passes.h.inc -gen-pass-decls -name FlangOpenMP)
-
-add_public_tablegen_target(FlangOpenMPPassesIncGen)
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h b/flang/include/flang/Optimizer/OpenMP/Passes.h
deleted file mode 100644
index 403d79667bf448..00000000000000
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ /dev/null
@@ -1,30 +0,0 @@
-//===- Passes.h - OpenMP pass entry points ----------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This header declares the flang OpenMP passes.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef FORTRAN_OPTIMIZER_OPENMP_PASSES_H
-#define FORTRAN_OPTIMIZER_OPENMP_PASSES_H
-
-#include "mlir/Dialect/Func/IR/FuncOps.h"
-#include "mlir/IR/BuiltinOps.h"
-#include "mlir/Pass/Pass.h"
-#include "mlir/Pass/PassRegistry.h"
-
-#include <memory>
-
-namespace flangomp {
-#define GEN_PASS_DECL
-#define GEN_PASS_REGISTRATION
-#include "flang/Optimizer/OpenMP/Passes.h.inc"
-
-} // namespace flangomp
-
-#endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td
deleted file mode 100644
index 395178e26a5762..00000000000000
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ /dev/null
@@ -1,40 +0,0 @@
-//===-- Passes.td - flang OpenMP pass definition -----------*- tablegen -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef FORTRAN_OPTIMIZER_OPENMP_PASSES
-#define FORTRAN_OPTIMIZER_OPENMP_PASSES
-
-include "mlir/Pass/PassBase.td"
-
-def MapInfoFinalizationPass
-    : Pass<"omp-map-info-finalization"> {
-  let summary = "expands OpenMP MapInfo operations containing descriptors";
-  let description = [{
-    Expands MapInfo operations containing descriptor types into multiple
-    MapInfo's for each pointer element in the descriptor that requires
-    explicit individual mapping by the OpenMP runtime.
-  }];
-  let dependentDialects = ["mlir::omp::OpenMPDialect"];
-}
-
-def MarkDeclareTargetPass
-    : Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
-  let summary = "Marks all functions called by an OpenMP declare target function as declare target";
-  let dependentDialects = ["mlir::omp::OpenMPDialect"];
-}
-
-def FunctionFiltering : Pass<"omp-function-filtering"> {
-  let summary = "Filters out functions intended for the host when compiling "
-                "for the target device.";
-  let dependentDialects = [
-    "mlir::func::FuncDialect",
-    "fir::FIROpsDialect"
-  ];
-}
-
-#endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index 53a1b55450972e..c703a62c03b7d9 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -340,6 +340,32 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
   let dependentDialects = [ "fir::FIROpsDialect" ];
 }
 
+def OMPMapInfoFinalizationPass
+    : Pass<"omp-map-info-finalization"> {
+  let summary = "expands OpenMP MapInfo operations containing descriptors";
+  let description = [{
+    Expands MapInfo operations containing descriptor types into multiple 
+    MapInfo's for each pointer element in the descriptor that requires 
+    explicit individual mapping by the OpenMP runtime.
+  }];
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
+def OMPMarkDeclareTargetPass
+    : Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
+  let summary = "Marks all functions called by an OpenMP declare target function as declare target";
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
+def OMPFunctionFiltering : Pass<"omp-function-filtering"> {
+  let summary = "Filters out functions intended for the host when compiling "
+                "for the target device.";
+  let dependentDialects = [
+    "mlir::func::FuncDialect",
+    "fir::FIROpsDialect"
+  ];
+}
+
 def VScaleAttr : Pass<"vscale-attr", "mlir::func::FuncOp"> {
   let summary = "Add vscale_range attribute to functions";
   let description = [{
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index 05b2f31711add2..7df50449494631 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -17,7 +17,6 @@
 #include "mlir/Transforms/Passes.h"
 #include "flang/Optimizer/CodeGen/CodeGen.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
-#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Transforms/Passes.h"
 #include "llvm/Passes/OptimizationLevel.h"
 #include "llvm/Support/CommandLine.h"
@@ -359,10 +358,10 @@ inline void createHLFIRToFIRPassPipeline(
 inline void createOpenMPFIRPassPipeline(
     mlir::PassManager &pm, bool isTargetDevice) {
   addNestedPassToAllTopLevelOperations(
-      pm, flangomp::createMapInfoFinalizationPass);
-  pm.addPass(flangomp::createMarkDeclareTargetPass());
+      pm, fir::createOMPMapInfoFinalizationPass);
+  pm.addPass(fir::createOMPMarkDeclareTargetPass());
   if (isTargetDevice)
-    pm.addPass(flangomp::createFunctionFiltering());
+    pm.addPass(fir::createOMPFunctionFiltering());
 }
 
 #if !defined(FLANG_EXCLUDE_CODEGEN)
diff --git a/flang/lib/Frontend/CMakeLists.txt b/flang/lib/Frontend/CMakeLists.txt
index ecdcc73d61ec1f..c20b9096aff496 100644
--- a/flang/lib/Frontend/CMakeLists.txt
+++ b/flang/lib/Frontend/CMakeLists.txt
@@ -38,7 +38,6 @@ add_flang_library(flangFrontend
   FIRTransforms
   HLFIRDialect
   HLFIRTransforms
-  FlangOpenMPTransforms
   MLIRTransforms
   MLIRBuiltinToLLVMIRTranslation
   MLIRLLVMToLLVMIRTranslation
diff --git a/flang/lib/Optimizer/CMakeLists.txt b/flang/lib/Optimizer/CMakeLists.txt
index dd153ac33c0fbb..4a602162ed2b77 100644
--- a/flang/lib/Optimizer/CMakeLists.txt
+++ b/flang/lib/Optimizer/CMakeLists.txt
@@ -5,4 +5,3 @@ add_subdirectory(HLFIR)
 add_subdirectory(Support)
 add_subdirectory(Transforms)
 add_subdirectory(Analysis)
-add_subdirectory(OpenMP)
diff --git a/flang/lib/Optimizer/OpenMP/CMakeLists.txt b/flang/lib/Optimizer/OpenMP/CMakeLists.txt
deleted file mode 100644
index a8984d256b8f6a..00000000000000
--- a/flang/lib/Optimizer/OpenMP/CMakeLists.txt
+++ /dev/null
@@ -1,25 +0,0 @@
-get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
-
-add_flang_library(FlangOpenMPTransforms
-  FunctionFiltering.cpp
-  MapInfoFinalization.cpp
-  MarkDeclareTarget.cpp
-
-  DEPENDS
-  FIRDialect
-  HLFIROpsIncGen
-  FlangOpenMPPassesIncGen
-
-  LINK_LIBS
-  FIRAnalysis
-  FIRBuilder
-  FIRCodeGen
-  FIRDialect
-  FIRDialectSupport
-  FIRSupport
-  FortranCommon
-  MLIRFuncDialect
-  MLIROpenMPDialect
-  HLFIRDialect
-  MLIRIR
-)
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index a6fc8e999d44da..3869633bd98e02 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -21,6 +21,9 @@ add_flang_library(FIRTransforms
   AddDebugInfo.cpp
   PolymorphicOpConversion.cpp
   LoopVersioning.cpp
+  OMPFunctionFiltering.cpp
+  OMPMapInfoFinalization.cpp
+  OMPMarkDeclareTarget.cpp
   StackReclaim.cpp
   VScaleAttr.cpp
   FunctionAttr.cpp
diff --git a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp b/flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
similarity index 90%
rename from flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
rename to flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
index bd9005d3e2df6f..0c472246c2a44c 100644
--- a/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
+++ b/flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
@@ -1,4 +1,4 @@
-//===- FunctionFiltering.cpp -------------------------------------------===//
+//===- OMPFunctionFiltering.cpp -------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -13,7 +13,7 @@
 
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
-#include "flang/Optimizer/OpenMP/Passes.h"
+#include "flang/Optimizer/Transforms/Passes.h"
 
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
@@ -21,18 +21,18 @@
 #include "mlir/IR/BuiltinOps.h"
 #include "llvm/ADT/SmallVector.h"
 
-namespace flangomp {
-#define GEN_PASS_DEF_FUNCTIONFILTERING
-#include "flang/Optimizer/OpenMP/Passes.h.inc"
-} // namespace flangomp
+namespace fir {
+#define GEN_PASS_DEF_OMPFUNCTIONFILTERING
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
 
 using namespace mlir;
 
 namespace {
-class FunctionFilteringPass
-    : public flangomp::impl::FunctionFilteringBase<FunctionFilteringPass> {
+class OMPFunctionFilteringPass
+    : public fir::impl::OMPFunctionFilteringBase<OMPFunctionFilteringPass> {
 public:
-  FunctionFilteringPass() = default;
+  OMPFunctionFilteringPass() = default;
 
   void runOnOperation() override {
     MLIRContext *context = &getContext();
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
similarity index 96%
rename from flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
rename to flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
index 6e9cd03dca8f3f..ddaa3c5f404f0b 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
@@ -1,4 +1,5 @@
-//===- MapInfoFinalization.cpp -----------------------------------------===//
+//===- OMPMapInfoFinalization.cpp
+//---------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -27,7 +28,7 @@
 #include "flang/Optimizer/Builder/FIRBuilder.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/Support/KindMapping.h"
-#include "flang/Optimizer/OpenMP/Passes.h"
+#include "flang/Optimizer/Transforms/Passes.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/BuiltinDialect.h"
@@ -40,15 +41,15 @@
 #include "llvm/Frontend/OpenMP/OMPConstants.h"
 #include <iterator>
 
-namespace flangomp {
-#define GEN_PASS_DEF_MAPINFOFINALIZATIONPASS
-#include "flang/Optimizer/OpenMP/Passes.h.inc"
-} // namespace flangomp
+namespace fir {
+#define GEN_PASS_DEF_OMPMAPINFOFINALIZATIONPASS
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
 
 namespace {
-class MapInfoFinalizationPass
-    : public flangomp::impl::MapInfoFinalizationPassBase<
-          MapInfoFinalizationPass> {
+class OMPMapInfoFinalizationPass
+    : public fir::impl::OMPMapInfoFinalizationPassBase<
+          OMPMapInfoFinalizationPass> {
 
   void genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
                                fir::FirOpBuilder &builder,
@@ -244,7 +245,7 @@ class MapInfoFinalizationPass
       // all users appropriately, making sure to only add a single member link
       // per new generation for the original originating descriptor MapInfoOp.
       assert(llvm::hasSingleElement(op->getUsers()) &&
-             "MapInfoFinalization currently only supports single users "
+             "OMPMapInfoFinalization currently only supports single users "
              "of a MapInfoOp");
 
       if (!op.getMembers().empty()) {
diff --git a/flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp b/flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
similarity index 80%
rename from flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp
rename to flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
index a7ffd5fda82b7f..4946e13b22865d 100644
--- a/flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp
+++ b/flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
@@ -1,16 +1,4 @@
-//===- MarkDeclareTarget.cpp -------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// Mark functions called from explicit target code as implicitly declare target.
-//
-//===----------------------------------------------------------------------===//
-
-#include "flang/Optimizer/OpenMP/Passes.h"
+#include "flang/Optimizer/Transforms/Passes.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
@@ -22,14 +10,14 @@
 #include "mlir/Support/LLVM.h"
 #include "llvm/ADT/SmallPtrSet.h"
 
-namespace flangomp {
-#define GEN_PASS_DEF_MARKDECLARETARGETPASS
-#include "flang/Optimizer/OpenMP/Passes.h.inc"
-} // namespace flangomp
+namespace fir {
+#define GEN_PASS_DEF_OMPMARKDECLARETARGETPASS
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
 
 namespace {
-class MarkDeclareTargetPass
-    : public flangomp::impl::MarkDeclareTargetPassBase<MarkDeclareTargetPass> {
+class OMPMarkDeclareTargetPass
+    : public fir::impl::OMPMarkDeclareTargetPassBase<OMPMarkDeclareTargetPass> {
 
   void markNestedFuncs(mlir::omp::DeclareTargetDeviceType parentDevTy,
                        mlir::omp::DeclareTargetCaptureClause parentCapClause,
diff --git a/flang/tools/bbc/CMakeLists.txt b/flang/tools/bbc/CMakeLists.txt
index 69316d4dc61de3..9410fd00566006 100644
--- a/flang/tools/bbc/CMakeLists.txt
+++ b/flang/tools/bbc/CMakeLists.txt
@@ -25,7 +25,6 @@ FIRTransforms
 FIRBuilder
 HLFIRDialect
 HLFIRTransforms
-FlangOpenMPTransforms
 ${dialect_libs}
 ${extension_libs}
 MLIRAffineToStandard
diff --git a/flang/tools/fir-opt/CMakeLists.txt b/flang/tools/fir-opt/CMakeLists.txt
index 4c6dbf7d9c8c37..43679a9d535782 100644
--- a/flang/tools/fir-opt/CMakeLists.txt
+++ b/flang/tools/fir-opt/CMakeLists.txt
@@ -19,7 +19,6 @@ target_link_libraries(fir-opt PRIVATE
   FIRCodeGen
   HLFIRDialect
   HLFIRTransforms
-  FlangOpenMPTransforms
   FIRAnalysis
   ${test_libs}
   ${dialect_libs}
diff --git a/flang/tools/fir-opt/fir-opt.cpp b/flang/tools/fir-opt/fir-opt.cpp
index f75fba27c68f08..1846c1b317848f 100644
--- a/flang/tools/fir-opt/fir-opt.cpp
+++ b/flang/tools/fir-opt/fir-opt.cpp
@@ -14,7 +14,6 @@
 #include "mlir/Tools/mlir-opt/MlirOptMain.h"
 #include "flang/Optimizer/CodeGen/CodeGen.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
-#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Support/InitFIR.h"
 #include "flang/Optimizer/Transforms/Passes.h"
 
@@ -35,7 +34,6 @@ int main(int argc, char **argv) {
   fir::registerOptCodeGenPasses();
   fir::registerOptTransformPasses();
   hlfir::registerHLFIRPasses();
-  flangomp::registerFlangOpenMPPasses();
 #ifdef FLANG_INCLUDE_TESTS
   fir::test::registerTestFIRAliasAnalysisPass();
   mlir::registerSideEffectTestPasses();
diff --git a/flang/tools/tco/CMakeLists.txt b/flang/tools/tco/CMakeLists.txt
index 698a398547c773..808219ac361f2a 100644
--- a/flang/tools/tco/CMakeLists.txt
+++ b/flang/tools/tco/CMakeLists.txt
@@ -17,7 +17,6 @@ target_link_libraries(tco PRIVATE
   FIRBuilder
   HLFIRDialect
   HLFIRTransforms
-  FlangOpenMPTransforms
   ${dialect_libs}
   ${extension_libs}
   MLIRIR

>From d6d8243dcd4ea768549904036ed31b8e59e14c73 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Wed, 21 Aug 2024 07:20:23 -0700
Subject: [PATCH 005/116] [LTO] Use DenseSet in computeLTOCacheKey (NFC)
 (#105466)

The two instances of std::set are used only for membership checking
purposes in computeLTOCacheKey.  We do not need std::set's strengths
like iterators staying valid or the ability to traverse in a sorted
order.  This patch changes them to DenseSet.

While I am at it, this patch replaces count with contains for slightly
increased readability.
---
 llvm/include/llvm/LTO/LTO.h |  4 ++--
 llvm/lib/LTO/LTO.cpp        | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/llvm/include/llvm/LTO/LTO.h b/llvm/include/llvm/LTO/LTO.h
index 0781d57feb5a64..949e80a43f0e88 100644
--- a/llvm/include/llvm/LTO/LTO.h
+++ b/llvm/include/llvm/LTO/LTO.h
@@ -68,8 +68,8 @@ std::string computeLTOCacheKey(
     const FunctionImporter::ExportSetTy &ExportList,
     const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
     const GVSummaryMapTy &DefinedGlobals,
-    const std::set<GlobalValue::GUID> &CfiFunctionDefs = {},
-    const std::set<GlobalValue::GUID> &CfiFunctionDecls = {});
+    const DenseSet<GlobalValue::GUID> &CfiFunctionDefs = {},
+    const DenseSet<GlobalValue::GUID> &CfiFunctionDecls = {});
 
 namespace lto {
 
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index f69e089edf42e7..cb3369d93754d5 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -95,8 +95,8 @@ std::string llvm::computeLTOCacheKey(
     const FunctionImporter::ExportSetTy &ExportList,
     const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
     const GVSummaryMapTy &DefinedGlobals,
-    const std::set<GlobalValue::GUID> &CfiFunctionDefs,
-    const std::set<GlobalValue::GUID> &CfiFunctionDecls) {
+    const DenseSet<GlobalValue::GUID> &CfiFunctionDefs,
+    const DenseSet<GlobalValue::GUID> &CfiFunctionDecls) {
   // Compute the unique hash for this entry.
   // This is based on the current compiler version, the module itself, the
   // export list, the hash for every single module in the import list, the
@@ -237,9 +237,9 @@ std::string llvm::computeLTOCacheKey(
   std::set<GlobalValue::GUID> UsedTypeIds;
 
   auto AddUsedCfiGlobal = [&](GlobalValue::GUID ValueGUID) {
-    if (CfiFunctionDefs.count(ValueGUID))
+    if (CfiFunctionDefs.contains(ValueGUID))
       UsedCfiDefs.insert(ValueGUID);
-    if (CfiFunctionDecls.count(ValueGUID))
+    if (CfiFunctionDecls.contains(ValueGUID))
       UsedCfiDecls.insert(ValueGUID);
   };
 
@@ -1429,8 +1429,8 @@ class InProcessThinBackend : public ThinBackendProc {
   DefaultThreadPool BackendThreadPool;
   AddStreamFn AddStream;
   FileCache Cache;
-  std::set<GlobalValue::GUID> CfiFunctionDefs;
-  std::set<GlobalValue::GUID> CfiFunctionDecls;
+  DenseSet<GlobalValue::GUID> CfiFunctionDefs;
+  DenseSet<GlobalValue::GUID> CfiFunctionDecls;
 
   std::optional<Error> Err;
   std::mutex ErrMu;

>From 5ddc79b093f2afaaf2c69d20d7d44448da04458a Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Wed, 21 Aug 2024 07:23:30 -0700
Subject: [PATCH 006/116] [LTO] Use a range-based for loop (NFC) (#105467)
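
The patch replaces an explicit iterator loop over an `equal_range`
result with a range-based `for` via `make_range`. The sketch below is a
standard-library-only analogue: `IterRange` and `makeRange` are
hypothetical stand-ins for LLVM's `iterator_range`/`make_range`, and a
`std::multimap` stands in for the summary index's type-ID map.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal analogue of llvm::iterator_range: wraps a
// [first, second) iterator pair so it can drive a range-based for.
template <typename It> struct IterRange {
  It b, e;
  It begin() const { return b; }
  It end() const { return e; }
};

// Hypothetical analogue of llvm::make_range for an iterator pair.
template <typename It> IterRange<It> makeRange(std::pair<It, It> p) {
  return {p.first, p.second};
}

// Concatenate every value stored under `key`, visiting the equal_range
// with a range-based for instead of a manual iterator loop.
std::string joinValues(const std::multimap<int, std::string> &m, int key) {
  std::string out;
  for (const auto &kv : makeRange(m.equal_range(key)))
    out += kv.second;
  return out;
}
```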

---
 llvm/lib/LTO/LTO.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index cb3369d93754d5..e5545860c329d4 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -330,8 +330,8 @@ std::string llvm::computeLTOCacheKey(
   // Include the hash for all type identifiers used by this module.
   for (GlobalValue::GUID TId : UsedTypeIds) {
     auto TidIter = Index.typeIds().equal_range(TId);
-    for (auto It = TidIter.first; It != TidIter.second; ++It)
-      AddTypeIdSummary(It->second.first, It->second.second);
+    for (const auto &I : make_range(TidIter))
+      AddTypeIdSummary(I.second.first, I.second.second);
   }
 
   AddUnsigned(UsedCfiDefs.size());

>From 70e8c982d0589b1a56faf0768b45596c2da3a510 Mon Sep 17 00:00:00 2001
From: Sjoerd Meijer <smeijer at nvidia.com>
Date: Wed, 21 Aug 2024 15:27:09 +0100
Subject: [PATCH 007/116] [AArch64] Bail out for scalable vecs in
 areExtractShuffleVectors (#105484)

The added test triggers the following assert in `areExtractShuffleVectors`
that is called from `shouldSinkOperands`:

Assertion `(!isScalable() || isZero()) && "Request for a fixed element count on a scalable object"' failed.

I don't think scalable types can be extract shuffles, so bail early if
this is the case.
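
The fix follows a common guard pattern: check for scalability before
any code path that asks for a fixed element count. The sketch below is
a simplified, hypothetical model (`VecLen`, `fixedValue`, `isHalfOf`),
not LLVM's `ElementCount` API; it only shows why the early return
prevents the assertion quoted above.

```cpp
#include <cassert>

// Hypothetical model of a vector length: `minElts` elements, scaled by an
// unknown runtime factor when `scalable` is true.
struct VecLen {
  unsigned minElts;
  bool scalable;

  // Mirrors the quoted assertion: a fixed count is only meaningful for
  // fixed-length (or zero-length) vectors.
  unsigned fixedValue() const {
    assert((!scalable || minElts == 0) &&
           "Request for a fixed element count on a scalable object");
    return minElts;
  }
};

// Returns true if `half` has exactly half the elements of `full`.
// Bails out before calling fixedValue() on scalable types.
bool isHalfOf(VecLen full, VecLen half) {
  if (full.scalable || half.scalable)
    return false;
  return full.fixedValue() == 2 * half.fixedValue();
}
```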
---
 .../Target/AArch64/AArch64ISelLowering.cpp    |  4 ++++
 .../AArch64/sink-free-instructions.ll         | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e1d265fdf0d1a8..dbe9413f05d013 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -16149,6 +16149,10 @@ static bool isSplatShuffle(Value *V) {
 /// or upper half of the vector elements.
 static bool areExtractShuffleVectors(Value *Op1, Value *Op2,
                                      bool AllowSplat = false) {
+  // Scalable types can't be extract shuffle vectors.
+  if (Op1->getType()->isScalableTy() || Op2->getType()->isScalableTy())
+    return false;
+
   auto areTypesHalfed = [](Value *FullV, Value *HalfV) {
     auto *FullTy = FullV->getType();
     auto *HalfTy = HalfV->getType();
diff --git a/llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll b/llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll
index d6629bf4b1849b..0ccfd9c20c12ef 100644
--- a/llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll
+++ b/llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll
@@ -984,3 +984,22 @@ if.else:
   ret <5 x float> %r.4
 }
 
+; This ran in an assert in `areExtractShuffleVectors`.
+define <vscale x 8 x i16> @scalable_types_cannot_be_extract_shuffle() {
+; CHECK-LABEL: @scalable_types_cannot_be_extract_shuffle(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[BROADCAST_SPLAT68:%.*]] = shufflevector <vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP0:%.*]] = zext <vscale x 8 x i8> [[BROADCAST_SPLAT68]] to <vscale x 8 x i16>
+; CHECK-NEXT:    [[BROADCAST_SPLAT70:%.*]] = shufflevector <vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP1:%.*]] = zext <vscale x 8 x i8> [[BROADCAST_SPLAT70]] to <vscale x 8 x i16>
+; CHECK-NEXT:    [[TMP2:%.*]] = sub <vscale x 8 x i16> [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+;
+entry:
+  %broadcast.splat68 = shufflevector <vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+  %0 = zext <vscale x 8 x i8> %broadcast.splat68 to <vscale x 8 x i16>
+  %broadcast.splat70 = shufflevector <vscale x 8 x i8> zeroinitializer, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+  %1 = zext <vscale x 8 x i8> %broadcast.splat70 to <vscale x 8 x i16>
+  %2 = sub <vscale x 8 x i16> %0, %1
+  ret <vscale x 8 x i16> %2
+}

>From 32c38dd85ee27fc7c2dd6a749fc1f7af4abdbea1 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 10:29:10 -0400
Subject: [PATCH 008/116] [libc++] Mark C++14 as complete and remove the status
 pages (#105514)

We already documented that libc++ is C++14 complete, but we still
maintained separate C++14 paper and issue status pages. Since that is
redundant (and I suspect the C++14 status page was missing some entries
anyway), simply remove those pages.
---
 libcxx/docs/Status/Cxx14.rst                 |  50 ------
 libcxx/docs/Status/Cxx14Issues.csv           | 157 -------------------
 libcxx/docs/Status/Cxx14Papers.csv           |  32 ----
 libcxx/docs/index.rst                        |   3 +-
 libcxx/utils/synchronize_csv_status_files.py |   2 -
 5 files changed, 1 insertion(+), 243 deletions(-)
 delete mode 100644 libcxx/docs/Status/Cxx14.rst
 delete mode 100644 libcxx/docs/Status/Cxx14Issues.csv
 delete mode 100644 libcxx/docs/Status/Cxx14Papers.csv

diff --git a/libcxx/docs/Status/Cxx14.rst b/libcxx/docs/Status/Cxx14.rst
deleted file mode 100644
index 0557bdc285d707..00000000000000
--- a/libcxx/docs/Status/Cxx14.rst
+++ /dev/null
@@ -1,50 +0,0 @@
-.. _cxx14-status:
-
-================================
-libc++ C++14 Status
-================================
-
-.. include:: ../Helpers/Styles.rst
-
-.. contents::
-   :local:
-
-
-Overview
-================================
-
-In April 2013, the C++ standard committee approved the draft for the next version of the C++ standard, initially known as "C++1y".
-
-The draft standard includes papers and issues that were voted on at the previous three meetings (Kona, Portland, and Bristol).
-
-In August 2014, this draft was approved by ISO as C++14.
-
-This page shows the status of libc++; the status of clang's support of the language features is `here <https://clang.llvm.org/cxx_status.html#cxx14>`__.
-
-The groups that have contributed papers:
-
--  CWG - Core Language Working group
--  LWG - Library working group
--  SG1 - Study group #1 (Concurrency working group)
-
-
-.. _paper-status-cxx14:
-
-Paper Status
-====================================
-
-.. csv-table::
-   :file: Cxx14Papers.csv
-   :header-rows: 1
-   :widths: auto
-
-
-.. _issues-status-cxx14:
-
-Library Working Group Issues Status
-====================================
-
-.. csv-table::
-   :file: Cxx14Issues.csv
-   :header-rows: 1
-   :widths: auto
diff --git a/libcxx/docs/Status/Cxx14Issues.csv b/libcxx/docs/Status/Cxx14Issues.csv
deleted file mode 100644
index aff88b89774e48..00000000000000
--- a/libcxx/docs/Status/Cxx14Issues.csv
+++ /dev/null
@@ -1,157 +0,0 @@
-"Issue #","Issue Name","Meeting","Status","First released version","Labels"
-"`LWG1214 <https://wg21.link/LWG1214>`__","Insufficient/inconsistent key immutability requirements for associative containers","2012-02 (Kona)","|Complete|","",""
-"`LWG2009 <https://wg21.link/LWG2009>`__","Reporting out-of-bound values on numeric string conversions","2012-02 (Kona)","|Complete|","",""
-"`LWG2010 <https://wg21.link/LWG2010>`__","``is_*``\  traits for binding operations can't be meaningfully specialized","2012-02 (Kona)","|Complete|","",""
-"`LWG2015 <https://wg21.link/LWG2015>`__","Incorrect pre-conditions for some type traits","2012-02 (Kona)","|Complete|","",""
-"`LWG2021 <https://wg21.link/LWG2021>`__","Further incorrect usages of result_of","2012-02 (Kona)","|Complete|","",""
-"`LWG2028 <https://wg21.link/LWG2028>`__","messages_base::catalog overspecified","2012-02 (Kona)","|Complete|","",""
-"`LWG2033 <https://wg21.link/LWG2033>`__","Preconditions of reserve, shrink_to_fit, and resize functions","2012-02 (Kona)","|Complete|","",""
-"`LWG2039 <https://wg21.link/LWG2039>`__","Issues with std::reverse and std::copy_if","2012-02 (Kona)","|Complete|","",""
-"`LWG2044 <https://wg21.link/LWG2044>`__","No definition of ""Stable"" for copy algorithms","2012-02 (Kona)","|Complete|","",""
-"`LWG2045 <https://wg21.link/LWG2045>`__","forward_list::merge and forward_list::splice_after with unequal allocators","2012-02 (Kona)","|Complete|","",""
-"`LWG2047 <https://wg21.link/LWG2047>`__","Incorrect ""mixed"" move-assignment semantics of unique_ptr","2012-02 (Kona)","|Complete|","",""
-"`LWG2050 <https://wg21.link/LWG2050>`__","Unordered associative containers do not use allocator_traits to define member types","2012-02 (Kona)","|Complete|","",""
-"`LWG2053 <https://wg21.link/LWG2053>`__","Errors in regex bitmask types","2012-02 (Kona)","|Complete|","",""
-"`LWG2061 <https://wg21.link/LWG2061>`__","make_move_iterator and arrays","2012-02 (Kona)","|Complete|","",""
-"`LWG2064 <https://wg21.link/LWG2064>`__","More noexcept issues in basic_string","2012-02 (Kona)","|Complete|","",""
-"`LWG2065 <https://wg21.link/LWG2065>`__","Minimal allocator interface","2012-02 (Kona)","|Complete|","",""
-"`LWG2067 <https://wg21.link/LWG2067>`__","packaged_task should have deleted copy c'tor with const parameter","2012-02 (Kona)","|Complete|","",""
-"`LWG2069 <https://wg21.link/LWG2069>`__","Inconsistent exception spec for basic_string move constructor","2012-02 (Kona)","|Complete|","",""
-"`LWG2096 <https://wg21.link/LWG2096>`__","Incorrect constraints of future::get in regard to MoveAssignable","2012-02 (Kona)","|Complete|","",""
-"`LWG2102 <https://wg21.link/LWG2102>`__","Why is std::launch an implementation-defined type?","2012-02 (Kona)","|Complete|","",""
-"","","","","",""
-"`LWG2071 <https://wg21.link/LWG2071>`__","std::valarray move-assignment","2012-10 (Portland)","|Complete|","",""
-"`LWG2074 <https://wg21.link/LWG2074>`__","Off by one error in std::reverse_copy","2012-10 (Portland)","|Complete|","",""
-"`LWG2081 <https://wg21.link/LWG2081>`__","Allocator requirements should include CopyConstructible","2012-10 (Portland)","|Complete|","",""
-"`LWG2083 <https://wg21.link/LWG2083>`__","const-qualification on weak_ptr::owner_before","2012-10 (Portland)","|Complete|","",""
-"`LWG2086 <https://wg21.link/LWG2086>`__","Overly generic type support for math functions","2012-10 (Portland)","|Complete|","",""
-"`LWG2099 <https://wg21.link/LWG2099>`__","Unnecessary constraints of va_start() usage","2012-10 (Portland)","|Complete|","",""
-"`LWG2103 <https://wg21.link/LWG2103>`__","std::allocator_traits<std::allocator<T>>::propagate_on_container_move_assignment","2012-10 (Portland)","|Complete|","",""
-"`LWG2105 <https://wg21.link/LWG2105>`__","Inconsistent requirements on ``const_iterator``'s value_type","2012-10 (Portland)","|Complete|","",""
-"`LWG2110 <https://wg21.link/LWG2110>`__","remove can't swap but note says it might","2012-10 (Portland)","|Complete|","",""
-"`LWG2123 <https://wg21.link/LWG2123>`__","merge() allocator requirements for lists versus forward lists","2012-10 (Portland)","|Complete|","",""
-"`LWG2005 <https://wg21.link/LWG2005>`__","unordered_map::insert(T&&) protection should apply to map too","2012-10 (Portland)","|Complete|","",""
-"`LWG2011 <https://wg21.link/LWG2011>`__","Unexpected output required of strings","2012-10 (Portland)","|Complete|","",""
-"`LWG2048 <https://wg21.link/LWG2048>`__","Unnecessary mem_fn overloads","2012-10 (Portland)","|Complete|","",""
-"`LWG2049 <https://wg21.link/LWG2049>`__","``is_destructible``\  is underspecified","2012-10 (Portland)","|Complete|","",""
-"`LWG2056 <https://wg21.link/LWG2056>`__","future_errc enums start with value 0 (invalid value for broken_promise)","2012-10 (Portland)","|Complete|","",""
-"`LWG2058 <https://wg21.link/LWG2058>`__","valarray and begin/end","2012-10 (Portland)","|Complete|","",""
-"","","","","",""
-"`LWG2091 <https://wg21.link/LWG2091>`__","Misplaced effect in m.try_lock_for()","2013-04 (Bristol)","|Complete|","",""
-"`LWG2092 <https://wg21.link/LWG2092>`__","Vague Wording for condition_variable_any","2013-04 (Bristol)","|Complete|","",""
-"`LWG2093 <https://wg21.link/LWG2093>`__","Throws clause of condition_variable::wait with predicate","2013-04 (Bristol)","|Complete|","",""
-"`LWG2094 <https://wg21.link/LWG2094>`__","duration conversion overflow shouldn't participate in overload resolution","2013-04 (Bristol)","|Complete|","",""
-"`LWG2122 <https://wg21.link/LWG2122>`__","merge() stability for lists versus forward lists","2013-04 (Bristol)","|Complete|","",""
-"`LWG2128 <https://wg21.link/LWG2128>`__","Absence of global functions cbegin/cend","2013-04 (Bristol)","|Complete|","",""
-"`LWG2145 <https://wg21.link/LWG2145>`__","error_category default constructor","2013-04 (Bristol)","|Complete|","",""
-"`LWG2147 <https://wg21.link/LWG2147>`__","Unclear hint type in Allocator's allocate function","2013-04 (Bristol)","|Complete|","",""
-"`LWG2148 <https://wg21.link/LWG2148>`__","Hashing enums should be supported directly by std::hash","2013-04 (Bristol)","|Complete|","",""
-"`LWG2149 <https://wg21.link/LWG2149>`__","Concerns about 20.8/5","2013-04 (Bristol)","|Complete|","",""
-"`LWG2162 <https://wg21.link/LWG2162>`__","allocator_traits::max_size missing noexcept","2013-04 (Bristol)","|Complete|","",""
-"`LWG2163 <https://wg21.link/LWG2163>`__","nth_element requires inconsistent post-conditions","2013-04 (Bristol)","|Complete|","",""
-"`LWG2169 <https://wg21.link/LWG2169>`__","Missing reset() requirements in unique_ptr specialization","2013-04 (Bristol)","|Complete|","",""
-"`LWG2172 <https://wg21.link/LWG2172>`__","Does ``atomic_compare_exchange_*``\  accept v == nullptr arguments?","2013-04 (Bristol)","|Complete|","",""
-"`LWG2080 <https://wg21.link/LWG2080>`__","Specify when once_flag becomes invalid","2013-04 (Bristol)","|Complete|","",""
-"`LWG2098 <https://wg21.link/LWG2098>`__","promise throws clauses","2013-04 (Bristol)","|Complete|","",""
-"`LWG2109 <https://wg21.link/LWG2109>`__","Incorrect requirements for hash specializations","2013-04 (Bristol)","|Complete|","",""
-"`LWG2130 <https://wg21.link/LWG2130>`__","missing ordering constraints for fences","2013-04 (Bristol)","|Complete|","",""
-"`LWG2138 <https://wg21.link/LWG2138>`__","atomic_flag::clear ordering constraints","2013-04 (Bristol)","|Complete|","",""
-"`LWG2140 <https://wg21.link/LWG2140>`__","notify_all_at_thread_exit synchronization","2013-04 (Bristol)","|Complete|","",""
-"`LWG2144 <https://wg21.link/LWG2144>`__","Missing noexcept specification in type_index","2013-04 (Bristol)","|Complete|","",""
-"`LWG2174 <https://wg21.link/LWG2174>`__","wstring_convert::converted() should be noexcept","2013-04 (Bristol)","|Complete|","",""
-"`LWG2175 <https://wg21.link/LWG2175>`__","string_convert and wbuffer_convert validity","2013-04 (Bristol)","|Complete|","",""
-"`LWG2176 <https://wg21.link/LWG2176>`__","Special members for wstring_convert and wbuffer_convert","2013-04 (Bristol)","|Complete|","",""
-"`LWG2177 <https://wg21.link/LWG2177>`__","Requirements on Copy/MoveInsertable","2013-04 (Bristol)","|Complete|","",""
-"`LWG2185 <https://wg21.link/LWG2185>`__","Missing throws clause for future/shared_future::wait_for/wait_until","2013-04 (Bristol)","|Complete|","",""
-"`LWG2187 <https://wg21.link/LWG2187>`__","vector<bool> is missing emplace and emplace_back member functions","2013-04 (Bristol)","|Complete|","",""
-"`LWG2190 <https://wg21.link/LWG2190>`__","ordering of condition variable operations, reflects Posix discussion","2013-04 (Bristol)","|Complete|","",""
-"`LWG2196 <https://wg21.link/LWG2196>`__","Specification of ``is_*[copy/move]_[constructible/assignable]``\  unclear for non-referencable types","2013-04 (Bristol)","|Complete|","",""
-"`LWG2197 <https://wg21.link/LWG2197>`__","Specification of ``is_[un]signed``\  unclear for non-arithmetic types","2013-04 (Bristol)","|Complete|","",""
-"`LWG2200 <https://wg21.link/LWG2200>`__","Data race avoidance for all containers, not only for sequences","2013-04 (Bristol)","|Complete|","",""
-"`LWG2203 <https://wg21.link/LWG2203>`__","scoped_allocator_adaptor uses wrong argument types for piecewise construction","2013-04 (Bristol)","|Complete|","",""
-"`LWG2207 <https://wg21.link/LWG2207>`__","basic_string::at should not have a Requires clause","2013-04 (Bristol)","|Complete|","",""
-"`LWG2209 <https://wg21.link/LWG2209>`__","assign() overspecified for sequence containers","2013-04 (Bristol)","|Complete|","",""
-"`LWG2210 <https://wg21.link/LWG2210>`__","Missing allocator-extended constructor for allocator-aware containers","2013-04 (Bristol)","|Complete|","",""
-"`LWG2211 <https://wg21.link/LWG2211>`__","Replace ambiguous use of ""Allocator"" in container requirements","2013-04 (Bristol)","|Complete|","",""
-"`LWG2222 <https://wg21.link/LWG2222>`__","Inconsistency in description of forward_list::splice_after single-element overload","2013-04 (Bristol)","|Complete|","",""
-"`LWG2225 <https://wg21.link/LWG2225>`__","Unrealistic header inclusion checks required","2013-04 (Bristol)","|Complete|","",""
-"`LWG2229 <https://wg21.link/LWG2229>`__","Standard code conversion facets underspecified","2013-04 (Bristol)","|Complete|","",""
-"`LWG2231 <https://wg21.link/LWG2231>`__","DR 704 removes complexity guarantee for clear()","2013-04 (Bristol)","|Complete|","",""
-"`LWG2235 <https://wg21.link/LWG2235>`__","Undefined behavior without proper requirements on basic_string constructors","2013-04 (Bristol)","|Complete|","",""
-"","","","","",""
-"`LWG2141 <https://wg21.link/LWG2141>`__","common_type trait produces reference types","2013-09 (Chicago)","|Complete|","",""
-"`LWG2246 <https://wg21.link/LWG2246>`__","unique_ptr assignment effects w.r.t. deleter","2013-09 (Chicago)","|Complete|","",""
-"`LWG2247 <https://wg21.link/LWG2247>`__","Type traits and std::nullptr_t","2013-09 (Chicago)","|Complete|","",""
-"`LWG2085 <https://wg21.link/LWG2085>`__","Wrong description of effect 1 of basic_istream::ignore","2013-09 (Chicago)","|Complete|","",""
-"`LWG2087 <https://wg21.link/LWG2087>`__","iostream_category() and noexcept","2013-09 (Chicago)","|Complete|","",""
-"`LWG2143 <https://wg21.link/LWG2143>`__","ios_base::xalloc should be thread-safe","2013-09 (Chicago)","|Complete|","",""
-"`LWG2150 <https://wg21.link/LWG2150>`__","Unclear specification of find_end","2013-09 (Chicago)","|Complete|","",""
-"`LWG2180 <https://wg21.link/LWG2180>`__","Exceptions from std::seed_seq operations","2013-09 (Chicago)","|Complete|","",""
-"`LWG2194 <https://wg21.link/LWG2194>`__","Impossible container requirements for adaptor types","2013-09 (Chicago)","|Complete|","",""
-"`LWG2013 <https://wg21.link/LWG2013>`__","Do library implementers have the freedom to add constexpr?","2013-09 (Chicago)","|Complete|","",""
-"`LWG2018 <https://wg21.link/LWG2018>`__","regex_traits::isctype Returns clause is wrong","2013-09 (Chicago)","|Complete|","",""
-"`LWG2078 <https://wg21.link/LWG2078>`__","Throw specification of async() incomplete","2013-09 (Chicago)","|Complete|","",""
-"`LWG2097 <https://wg21.link/LWG2097>`__","packaged_task constructors should be constrained","2013-09 (Chicago)","|Complete|","",""
-"`LWG2100 <https://wg21.link/LWG2100>`__","Timed waiting functions cannot timeout if launch::async policy used","2013-09 (Chicago)","|Complete|","",""
-"`LWG2120 <https://wg21.link/LWG2120>`__","What should async do if neither 'async' nor 'deferred' is set in policy?","2013-09 (Chicago)","|Complete|","",""
-"`LWG2159 <https://wg21.link/LWG2159>`__","atomic_flag initialization","2013-09 (Chicago)","|Complete|","",""
-"`LWG2275 <https://wg21.link/LWG2275>`__","Why is forward_as_tuple not constexpr?","2013-09 (Chicago)","|Complete|","",""
-"`LWG2284 <https://wg21.link/LWG2284>`__","Inconsistency in allocator_traits::max_size","2013-09 (Chicago)","|Complete|","",""
-"`LWG2298 <https://wg21.link/LWG2298>`__","``is_nothrow_constructible``\  is always false because of create<>","2013-09 (Chicago)","|Complete|","",""
-"`LWG2300 <https://wg21.link/LWG2300>`__","Redundant sections for map and multimap members should be removed","2013-09 (Chicago)","|Complete|","",""
-"`LWG2249 <https://wg21.link/LWG2249>`__","NB comment GB9: Remove gets from C++14","2013-09 (Chicago)","|Complete|","",""
-"","","","","",""
-"`LWG2135 <https://wg21.link/LWG2135>`__","Unclear requirement for exceptions thrown in condition_variable::wait()","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2291 <https://wg21.link/LWG2291>`__","std::hash is vulnerable to collision DoS attack","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2142 <https://wg21.link/LWG2142>`__","packaged_task::operator() synchronization too broad?","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2240 <https://wg21.link/LWG2240>`__","Probable misuse of term ""function scope"" in [thread.condition]","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2252 <https://wg21.link/LWG2252>`__","Strong guarantee on vector::push_back() still broken with C++11?","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2257 <https://wg21.link/LWG2257>`__","Simplify container requirements with the new algorithms","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2268 <https://wg21.link/LWG2268>`__","Setting a default argument in the declaration of a member function assign of std::basic_string","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2271 <https://wg21.link/LWG2271>`__","regex_traits::lookup_classname specification unclear","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2272 <https://wg21.link/LWG2272>`__","quoted should use char_traits::eq for character comparison","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2278 <https://wg21.link/LWG2278>`__","User-defined literals for Standard Library types","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2280 <https://wg21.link/LWG2280>`__","begin / end for arrays should be constexpr and noexcept","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2285 <https://wg21.link/LWG2285>`__","make_reverse_iterator","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2299 <https://wg21.link/LWG2299>`__","Effects of inaccessible ``key_compare::is_transparent``\  type are not clear","2014-02 (Issaquah)","|Complete|","",""
-"`LWG1450 <https://wg21.link/LWG1450>`__","Contradiction in regex_constants","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2003 <https://wg21.link/LWG2003>`__","String exception inconsistency in erase.","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2112 <https://wg21.link/LWG2112>`__","User-defined classes that cannot be derived from","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2132 <https://wg21.link/LWG2132>`__","std::function ambiguity","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2182 <https://wg21.link/LWG2182>`__","``Container::[const_]reference`` types are misleadingly specified","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2188 <https://wg21.link/LWG2188>`__","Reverse iterator does not fully support targets that overload operator&","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2193 <https://wg21.link/LWG2193>`__","Default constructors for standard library containers are explicit","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2205 <https://wg21.link/LWG2205>`__","Problematic postconditions of regex_match and regex_search","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2213 <https://wg21.link/LWG2213>`__","Return value of std::regex_replace","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2258 <https://wg21.link/LWG2258>`__","a.erase(q1, q2) unable to directly return q2","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2263 <https://wg21.link/LWG2263>`__","Comparing iterators and allocator pointers with different const-character","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2293 <https://wg21.link/LWG2293>`__","Wrong facet used by num_put::do_put","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2301 <https://wg21.link/LWG2301>`__","Why is std::tie not constexpr?","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2304 <https://wg21.link/LWG2304>`__","Complexity of count in unordered associative containers","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2306 <https://wg21.link/LWG2306>`__","match_results::reference should be value_type&, not const value_type&","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2308 <https://wg21.link/LWG2308>`__","Clarify container destructor requirements w.r.t. std::array","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2313 <https://wg21.link/LWG2313>`__","tuple_size should always derive from integral_constant<size_t, N>","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2314 <https://wg21.link/LWG2314>`__","apply() should return decltype(auto) and use decay_t before tuple_size","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2315 <https://wg21.link/LWG2315>`__","weak_ptr should be movable","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2316 <https://wg21.link/LWG2316>`__","weak_ptr::lock() should be atomic","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2317 <https://wg21.link/LWG2317>`__","The type property queries should be UnaryTypeTraits returning size_t","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2320 <https://wg21.link/LWG2320>`__","select_on_container_copy_construction() takes allocators, not containers","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2322 <https://wg21.link/LWG2322>`__","Associative(initializer_list, stuff) constructors are underspecified","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2323 <https://wg21.link/LWG2323>`__","vector::resize(n, t)'s specification should be simplified","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2324 <https://wg21.link/LWG2324>`__","Insert iterator constructors should use addressof()","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2329 <https://wg21.link/LWG2329>`__","regex_match()/regex_search() with match_results should forbid temporary strings","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2330 <https://wg21.link/LWG2330>`__","regex(""meow"", regex::icase) is technically forbidden but should be permitted","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2332 <https://wg21.link/LWG2332>`__","regex_iterator/regex_token_iterator should forbid temporary regexes","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2339 <https://wg21.link/LWG2339>`__","Wording issue in nth_element","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2341 <https://wg21.link/LWG2341>`__","Inconsistency between basic_ostream::seekp(pos) and basic_ostream::seekp(off, dir)","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2344 <https://wg21.link/LWG2344>`__","quoted()'s interaction with padding is unclear","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2346 <https://wg21.link/LWG2346>`__","integral_constant's member functions should be marked noexcept","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2350 <https://wg21.link/LWG2350>`__","min, max, and minmax should be constexpr","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2356 <https://wg21.link/LWG2356>`__","Stability of erasure in unordered associative containers","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2357 <https://wg21.link/LWG2357>`__","Remaining ""Assignable"" requirement","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2359 <https://wg21.link/LWG2359>`__","How does regex_constants::nosubs affect basic_regex::mark_count()?","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2360 <https://wg21.link/LWG2360>`__","``reverse_iterator::operator*()``\  is unimplementable","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2104 <https://wg21.link/LWG2104>`__","unique_lock move-assignment should not be noexcept","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2186 <https://wg21.link/LWG2186>`__","Incomplete action on async/launch::deferred","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2075 <https://wg21.link/LWG2075>`__","Progress guarantees, lock-free property, and scheduling assumptions","2014-02 (Issaquah)","|Complete|","",""
-"`LWG2288 <https://wg21.link/LWG2288>`__","Inconsistent requirements for shared mutexes","2014-02 (Issaquah)","|Complete|","",""
diff --git a/libcxx/docs/Status/Cxx14Papers.csv b/libcxx/docs/Status/Cxx14Papers.csv
deleted file mode 100644
index 3dc670ca0a5dc4..00000000000000
--- a/libcxx/docs/Status/Cxx14Papers.csv
+++ /dev/null
@@ -1,32 +0,0 @@
-"Paper #","Paper Name","Meeting","Status","First released version","Labels"
-"`N3346 <https://wg21.link/N3346>`__","Terminology for Container Element Requirements - Rev 1","2012-02 (Kona)","|Complete|","3.4",""
-"","","","","",""
-"`N3421 <https://wg21.link/N3421>`__","Making Operator Functors greater<>","2012-10 (Portland)","|Complete|","3.4",""
-"`N3462 <https://wg21.link/N3462>`__","std::result_of and SFINAE","2012-10 (Portland)","|Complete|","3.4",""
-"`N3469 <https://wg21.link/N3469>`__","Constexpr Library Additions: chrono, v3","2012-10 (Portland)","|Complete|","3.4",""
-"`N3470 <https://wg21.link/N3470>`__","Constexpr Library Additions: containers, v2","2012-10 (Portland)","|Complete|","3.4",""
-"`N3471 <https://wg21.link/N3471>`__","Constexpr Library Additions: utilities, v3","2012-10 (Portland)","|Complete|","3.4",""
-"`N3302 <https://wg21.link/N3302>`__","Constexpr Library Additions: complex, v2","2012-10 (Portland)","|Complete|","3.4",""
-"","","","","",""
-"`N3545 <https://wg21.link/N3545>`__","An Incremental Improvement to integral_constant","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3644 <https://wg21.link/N3644>`__","Null Forward Iterators","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3668 <https://wg21.link/N3668>`__","std::exchange()","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3658 <https://wg21.link/N3658>`__","Compile-time integer sequences","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3670 <https://wg21.link/N3670>`__","Addressing Tuples by Type","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3671 <https://wg21.link/N3671>`__","Making non-modifying sequence operations more robust","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3656 <https://wg21.link/N3656>`__","make_unique","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3654 <https://wg21.link/N3654>`__","Quoted Strings","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3642 <https://wg21.link/N3642>`__","User-defined Literals","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3655 <https://wg21.link/N3655>`__","TransformationTraits Redux (excluding part 4)","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3657 <https://wg21.link/N3657>`__","Adding heterogeneous comparison lookup to associative containers","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3672 <https://wg21.link/N3672>`__","A proposal to add a utility class to represent optional objects","2013-04 (Bristol)","*Removed from Draft Standard*","n/a",""
-"`N3669 <https://wg21.link/N3669>`__","Fixing constexpr member functions without const","2013-04 (Bristol)","|Complete|","3.4",""
-"`N3662 <https://wg21.link/N3662>`__","C++ Dynamic Arrays (dynarray)","2013-04 (Bristol)","*Removed from Draft Standard*","n/a",""
-"`N3659 <https://wg21.link/N3659>`__","Shared Locking in C++","2013-04 (Bristol)","|Complete|","3.4",""
-"","","","","",""
-"`N3779 <https://wg21.link/N3779>`__","User-defined Literals for std::complex","2013-09 (Chicago)","|Complete|","3.4",""
-"`N3789 <https://wg21.link/N3789>`__","Constexpr Library Additions: functional","2013-09 (Chicago)","|Complete|","3.4",""
-"","","","","",""
-"`N3924 <https://wg21.link/N3924>`__","Discouraging rand() in C++14","2014-02 (Issaquah)","|Complete|","3.5",""
-"`N3887 <https://wg21.link/N3887>`__","Consistent Metafunction Aliases","2014-02 (Issaquah)","|Complete|","3.5",""
-"`N3891 <https://wg21.link/N3891>`__","A proposal to rename shared_mutex to shared_timed_mutex","2014-02 (Issaquah)","|Complete|","3.5",""
diff --git a/libcxx/docs/index.rst b/libcxx/docs/index.rst
index 4bca3ccc8fa063..c3b724568bc51e 100644
--- a/libcxx/docs/index.rst
+++ b/libcxx/docs/index.rst
@@ -43,7 +43,6 @@ Getting Started with libc++
    Modules
    Hardening
    ReleaseProcedure
-   Status/Cxx14
    Status/Cxx17
    Status/Cxx20
    Status/Cxx23
@@ -173,7 +172,7 @@ C++ Dialect Support
 ===================
 
 * C++11 - Complete
-* :ref:`C++14 - Complete <cxx14-status>`
+* C++14 - Complete
 * :ref:`C++17 - In Progress <cxx17-status>`
 * :ref:`C++20 - In Progress <cxx20-status>`
 * :ref:`C++23 - In Progress <cxx23-status>`
diff --git a/libcxx/utils/synchronize_csv_status_files.py b/libcxx/utils/synchronize_csv_status_files.py
index 9228fc6ed20198..8c1e8cea0f394d 100755
--- a/libcxx/utils/synchronize_csv_status_files.py
+++ b/libcxx/utils/synchronize_csv_status_files.py
@@ -204,8 +204,6 @@ def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
     return results
 
 CSV_FILES_TO_SYNC = [
-    'Cxx14Issues.csv',
-    'Cxx14Papers.csv',
     'Cxx17Issues.csv',
     'Cxx17Papers.csv',
     'Cxx20Issues.csv',

From bf71c64839c0082e761a4f070ed92e01ced0187c Mon Sep 17 00:00:00 2001
From: Hans Wennborg <hans at chromium.org>
Date: Wed, 21 Aug 2024 16:28:25 +0200
Subject: [PATCH 009/116] Speculative fix for
 asan/TestCases/Darwin/cstring_section.c

It's been failing since https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/1812

It seems __TEXT,__cstring now comes before __TEXT,__const.
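FileCheck matches CHECK: lines in the order they are written. Once the
__const check has consumed output that now comes after __cstring, the
later __cstring check has nothing left to match, so the two checks have
to be swapped. The following is a simplified Python model of that
sequential matching. It ignores CHECK-NOT and same-line matches, and the
dump is a hypothetical, abbreviated section listing, not real tool output.

```python
import re
from typing import List


def ordered_match(output: List[str], patterns: List[str]) -> bool:
    """Simplified model of FileCheck's CHECK: ordering.

    Each pattern must match a line strictly after the line matched by the
    previous pattern. CHECK-NOT and same-line matching are not modelled.
    """
    pos = 0
    for pat in patterns:
        rx = re.compile(pat)
        while pos < len(output) and not rx.search(output[pos]):
            pos += 1
        if pos == len(output):
            return False  # pattern not found after the previous match
        pos += 1  # the next pattern starts searching on the following line
    return True


# Hypothetical dump in which __cstring now precedes __const.
DUMP = [
    "Contents of section __TEXT,__asan_cstring:",
    " 0000 48656c6c 6f2e0a00 Hello.",
    "Contents of section __TEXT,__cstring:",
    " 0010 00000000",
    "Contents of section __TEXT,__const:",
    " 0020 00000000",
]

ASAN = r"Contents of section .*__asan_cstring:"
HELLO = r"48656c6c .* Hello\."
CSTRING = r"Contents of section .*__cstring:"
CONST = r"Contents of section .*__const:"

OLD_ORDER = [ASAN, HELLO, CONST, CSTRING]  # checks before this patch
NEW_ORDER = [ASAN, HELLO, CSTRING, CONST]  # checks after this patch

if __name__ == "__main__":
    print("old order matches:", ordered_match(DUMP, OLD_ORDER))  # False
    print("new order matches:", ordered_match(DUMP, NEW_ORDER))  # True
```

Note that the __cstring pattern does not match the __asan_cstring header,
because "__asan_cstring" has only a single underscore before "cstring".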
---
 compiler-rt/test/asan/TestCases/Darwin/cstring_section.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c b/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
index d72b0ba8a8bb33..e40c4b1b8ed6ba 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
+++ b/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
@@ -6,10 +6,10 @@
 // Check that "Hello.\n" is in __asan_cstring and not in __cstring.
 // CHECK: Contents of section {{.*}}__asan_cstring:
 // CHECK: 48656c6c {{.*}} Hello.
-// CHECK: Contents of section {{.*}}__const:
-// CHECK-NOT: 48656c6c {{.*}} Hello.
 // CHECK: Contents of section {{.*}}__cstring:
 // CHECK-NOT: 48656c6c {{.*}} Hello.
+// CHECK: Contents of section {{.*}}__const:
+// CHECK-NOT: 48656c6c {{.*}} Hello.
 
 int main(int argc, char *argv[]) {
   argv[0] = "Hello.\n";

From 8d4891591fb41780c2af6e18abd590faf1f5626c Mon Sep 17 00:00:00 2001
From: Nico Weber <thakis at chromium.org>
Date: Wed, 21 Aug 2024 10:35:10 -0400
Subject: [PATCH 010/116] [gn] port 7ad7f8f7a3d4

---
 llvm/utils/gn/secondary/libcxx/include/BUILD.gn | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/utils/gn/secondary/libcxx/include/BUILD.gn b/llvm/utils/gn/secondary/libcxx/include/BUILD.gn
index cc759d2337516d..f49c964b4128fb 100644
--- a/llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+++ b/llvm/utils/gn/secondary/libcxx/include/BUILD.gn
@@ -35,6 +35,7 @@ if (current_toolchain == default_toolchain) {
       "_LIBCPP_HAS_NO_UNICODE=",
       "_LIBCPP_HAS_NO_WIDE_CHARACTERS=",
       "_LIBCPP_HAS_NO_STD_MODULES=",
+      "_LIBCPP_HAS_NO_TERMINAL=",
       "_LIBCPP_INSTRUMENTED_WITH_ASAN=",
       "_LIBCPP_ABI_DEFINES=",
       "_LIBCPP_HARDENING_MODE_DEFAULT=_LIBCPP_HARDENING_MODE_NONE",

From f0a3f8a370e3c85ee00cbc5e5d1c29e8ad3c51da Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 08:54:27 -0400
Subject: [PATCH 011/116] [libc++] Enable C++23 and C++26 issues to be
 synchronized

As a drive-by, also switch to printing dangling issues instead of
killing the script, since those can be fairly common.
---
 libcxx/utils/synchronize_csv_status_files.py | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libcxx/utils/synchronize_csv_status_files.py b/libcxx/utils/synchronize_csv_status_files.py
index 8c1e8cea0f394d..68df5756e884d6 100755
--- a/libcxx/utils/synchronize_csv_status_files.py
+++ b/libcxx/utils/synchronize_csv_status_files.py
@@ -182,7 +182,8 @@ def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
         if paper.is_implemented():
             dangling = [gh for gh in from_github if gh.paper_number == paper.paper_number and not gh.is_implemented()]
             if dangling:
-                raise RuntimeError(f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}")
+                print(f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}")
+                print("The Github issue should be closed if the work has indeed been done.")
             results.append(paper.for_printing())
         else:
             # Find any Github issues tracking this paper
@@ -208,11 +209,10 @@ def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
     'Cxx17Papers.csv',
     'Cxx20Issues.csv',
     'Cxx20Papers.csv',
-    # TODO: The Github issues are not created yet.
-    # 'Cxx23Issues.csv',
-    # 'Cxx23Papers.csv',
-    # 'Cxx2cIssues.csv',
-    # 'Cxx2cPapers.csv',
+    'Cxx23Issues.csv',
+    'Cxx23Papers.csv',
+    'Cxx2cIssues.csv',
+    'Cxx2cPapers.csv',
 ]
 
 def main():

From ddb5480e6799d0de72c2cd34c1e7f9ffd154e660 Mon Sep 17 00:00:00 2001
From: Brox Chen <guochen2 at amd.com>
Date: Wed, 21 Aug 2024 10:47:36 -0400
Subject: [PATCH 012/116] [AMDGPU][True16][MC] added VOPC realtrue/faketrue
 flag and fake16 instructions (#104739)

VOPC instructions were defined with the HasTrue16BitInst flag, while these
true16 instructions are actually implemented with the fake16 profile.
Separate them into true16 and fake16 versions by adding the
UseRealTrue16 and UseFakeTrue16 flags and fake16 instructions.

The code defaults to using fake16. This prepares for upcoming changes
in MC to support real true16 (16-bit) operands and vdst. The true16
and fake16 profiles will be modified in later patches.
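As a rough model of the gating above: each 16-bit VOPC opcode now has
three pseudo variants guarded by mutually exclusive predicates, and the
real instruction's decoder namespace gets a "_FAKE16" suffix unless its
profile sets IsRealTrue16. The Python sketch below is illustrative only.
The boolean use_real_true16 is an assumed stand-in for whatever switches
a true16-capable subtarget into real true16 mode, and "GFX11" is an
assumed namespace string.

```python
def pseudo_suffix(has_true16_insts: bool, use_real_true16: bool) -> str:
    """Which VOPC pseudo variant a 16-bit compare selects (illustrative).

    Mirrors the predicates in the patch:
      NotHasTrue16BitInsts -> NAME        (no suffix)
      UseRealTrue16Insts   -> NAME_t16
      UseFakeTrue16Insts   -> NAME_fake16
    `use_real_true16` is an assumed stand-in for the real-true16 switch.
    """
    if not has_true16_insts:
        return ""
    return "_t16" if use_real_true16 else "_fake16"


def decoder_namespace(base: str, is_real_true16: bool) -> str:
    """Model of Gen.DecoderNamespace # !if(ps32.Pfl.IsRealTrue16, "", "_FAKE16")."""
    return base + ("" if is_real_true16 else "_FAKE16")


if __name__ == "__main__":
    # Default after this patch: true16-capable targets pick the fake16 variant.
    print("V_CMP_LT_F16" + pseudo_suffix(True, False))  # V_CMP_LT_F16_fake16
    print(decoder_namespace("GFX11", False))            # GFX11_FAKE16
```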
---
 llvm/lib/Target/AMDGPU/VOPCInstructions.td    | 149 ++++++++++++-
 .../GlobalISel/inst-select-fcmp.s16.mir       | 200 ++++++++++--------
 2 files changed, 256 insertions(+), 93 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOPCInstructions.td b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
index 62ca6261c47c80..be862b44917e15 100644
--- a/llvm/lib/Target/AMDGPU/VOPCInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
@@ -87,6 +87,17 @@ class VOPC_Profile<list<SchedReadWrite> sched, ValueType vt0, ValueType vt1 = vt
 multiclass VOPC_Profile_t16<list<SchedReadWrite> sched, ValueType vt0, ValueType vt1 = vt0> {
   def NAME : VOPC_Profile<sched, vt0, vt1>;
   def _t16 : VOPC_Profile<sched, vt0, vt1> {
+    let IsTrue16 = 1;
+    let IsRealTrue16 = 1;
+    let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1DPP = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src2DPP = getVregSrcForVT<Src2VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
+    let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
+    let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
+  }
+  def _fake16: VOPC_Profile<sched, vt0, vt1> {
     let IsTrue16 = 1;
     let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
     let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
@@ -117,6 +128,17 @@ class VOPC_NoSdst_Profile<list<SchedReadWrite> sched, ValueType vt0,
 multiclass VOPC_NoSdst_Profile_t16<list<SchedReadWrite> sched, ValueType vt0, ValueType vt1 = vt0> {
   def NAME : VOPC_NoSdst_Profile<sched, vt0, vt1>;
   def _t16 : VOPC_NoSdst_Profile<sched, vt0, vt1> {
+    let IsTrue16 = 1;
+    let IsRealTrue16 = 1;
+    let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1DPP = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src2DPP = getVregSrcForVT<Src2VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
+    let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
+    let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
+  }
+  def _fake16 : VOPC_NoSdst_Profile<sched, vt0, vt1> {
     let IsTrue16 = 1;
     let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
     let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
@@ -412,9 +434,12 @@ multiclass VOPC_F16 <string opName, SDPatternOperator cond = COND_NULL,
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPC_Pseudos <opName, VOPC_I1_F16_F16, cond, revOp, 0>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let True16Predicate = UseRealTrue16Insts in {
     defm _t16 : VOPC_Pseudos <opName#"_t16", VOPC_I1_F16_F16_t16, cond, revOp#"_t16", 0>;
   }
+  let True16Predicate = UseFakeTrue16Insts in {
+    defm _fake16 : VOPC_Pseudos <opName#"_fake16", VOPC_I1_F16_F16_fake16, cond, revOp#"_fake16", 0>;
+  }
 }
 
 multiclass VOPC_F32 <string opName, SDPatternOperator cond = COND_NULL, string revOp = opName> :
@@ -428,9 +453,12 @@ multiclass VOPC_I16 <string opName, SDPatternOperator cond = COND_NULL,
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPC_Pseudos <opName, VOPC_I1_I16_I16, cond, revOp, 0>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let True16Predicate = UseRealTrue16Insts in {
     defm _t16 : VOPC_Pseudos <opName#"_t16", VOPC_I1_I16_I16_t16, cond, revOp#"_t16", 0>;
   }
+  let True16Predicate = UseFakeTrue16Insts in {
+    defm _fake16 : VOPC_Pseudos <opName#"_fake16", VOPC_I1_I16_I16_fake16, cond, revOp#"_fake16", 0>;
+  }
 }
 
 multiclass VOPC_I32 <string opName, SDPatternOperator cond = COND_NULL, string revOp = opName> :
@@ -445,9 +473,12 @@ multiclass VOPCX_F16<string opName, string revOp = opName> {
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPCX_Pseudos <opName, VOPC_I1_F16_F16, VOPC_F16_F16, COND_NULL, revOp>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let True16Predicate = UseRealTrue16Insts in {
     defm _t16 : VOPCX_Pseudos <opName#"_t16", VOPC_I1_F16_F16_t16, VOPC_F16_F16_t16, COND_NULL, revOp#"_t16">;
   }
+  let True16Predicate = UseFakeTrue16Insts in {
+    defm _fake16 : VOPCX_Pseudos <opName#"_fake16", VOPC_I1_F16_F16_fake16, VOPC_F16_F16_fake16, COND_NULL, revOp#"_fake16">;
+  }
 }
 
 multiclass VOPCX_F32 <string opName, string revOp = opName> :
@@ -460,9 +491,12 @@ multiclass VOPCX_I16<string opName, string revOp = opName> {
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPCX_Pseudos <opName, VOPC_I1_I16_I16, VOPC_I16_I16, COND_NULL, revOp>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let True16Predicate = UseRealTrue16Insts in {
     defm _t16 : VOPCX_Pseudos <opName#"_t16", VOPC_I1_I16_I16_t16, VOPC_I16_I16_t16, COND_NULL, revOp#"_t16">;
   }
+  let True16Predicate = UseFakeTrue16Insts in {
+    defm _fake16 : VOPCX_Pseudos <opName#"_fake16", VOPC_I1_I16_I16_fake16, VOPC_I16_I16_fake16, COND_NULL, revOp#"_fake16">;
+  }
 }
 
 multiclass VOPCX_I32 <string opName, string revOp = opName> :
@@ -795,6 +829,18 @@ class VOPC_Class_Profile<list<SchedReadWrite> sched, ValueType src0VT, ValueType
 multiclass VOPC_Class_Profile_t16<list<SchedReadWrite> sched> {
   def NAME : VOPC_Class_Profile<sched, f16>;
   def _t16 : VOPC_Class_Profile<sched, f16, i16> {
+    let IsTrue16 = 1;
+    let IsRealTrue16 = 1;
+    let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1RC64 = VSrc_b32;
+    let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1DPP = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src2DPP = getVregSrcForVT<Src2VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
+    let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
+    let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
+  }
+  def _fake16 : VOPC_Class_Profile<sched, f16, i16> {
     let IsTrue16 = 1;
     let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
     let Src1RC64 = VSrc_b32;
@@ -822,6 +868,18 @@ class VOPC_Class_NoSdst_Profile<list<SchedReadWrite> sched, ValueType src0VT, Va
 multiclass VOPC_Class_NoSdst_Profile_t16<list<SchedReadWrite> sched> {
   def NAME : VOPC_Class_NoSdst_Profile<sched, f16>;
   def _t16 : VOPC_Class_NoSdst_Profile<sched, f16, i16> {
+    let IsTrue16 = 1;
+    let IsRealTrue16 = 1;
+    let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1RC64 = VSrc_b32;
+    let Src0DPP = getVregSrcForVT<Src0VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src1DPP = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src2DPP = getVregSrcForVT<Src2VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
+    let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
+    let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
+    let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
+  }
+  def _fake16 : VOPC_Class_NoSdst_Profile<sched, f16, i16> {
     let IsTrue16 = 1;
     let Src1RC32 = getVregSrcForVT<Src1VT, 1/*IsTrue16*/, 1/*IsFake16*/>.ret;
     let Src1RC64 = VSrc_b32;
@@ -948,18 +1006,24 @@ multiclass VOPC_CLASS_F16 <string opName> {
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPC_Class_Pseudos <opName, VOPC_I1_F16_I16, 0>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let OtherPredicates = [UseRealTrue16Insts] in {
     defm _t16 : VOPC_Class_Pseudos <opName#"_t16", VOPC_I1_F16_I16_t16, 0>;
   }
+  let OtherPredicates = [UseFakeTrue16Insts] in {
+    defm _fake16 : VOPC_Class_Pseudos <opName#"_fake16", VOPC_I1_F16_I16_fake16, 0>;
+  }
 }
 
 multiclass VOPCX_CLASS_F16 <string opName> {
   let OtherPredicates = [Has16BitInsts], True16Predicate = NotHasTrue16BitInsts in {
     defm NAME : VOPCX_Class_Pseudos <opName, VOPC_I1_F16_I16, VOPC_F16_I16>;
   }
-  let OtherPredicates = [HasTrue16BitInsts] in {
+  let OtherPredicates = [UseRealTrue16Insts] in {
     defm _t16 : VOPCX_Class_Pseudos <opName#"_t16", VOPC_I1_F16_I16_t16, VOPC_F16_I16_t16>;
   }
+  let OtherPredicates = [UseFakeTrue16Insts] in {
+    defm _fake16 : VOPCX_Class_Pseudos <opName#"_fake16", VOPC_I1_F16_I16_fake16, VOPC_F16_I16_fake16>;
+  }
 }
 
 multiclass VOPC_CLASS_F32 <string opName> :
@@ -1401,7 +1465,7 @@ multiclass VOPC_Real_with_name<GFXGen Gen, bits<9> op, string OpName,
                                                      pseudo_mnemonic),
                               asm_name, ps64.AsmVariantName>;
 
-    let DecoderNamespace = Gen.DecoderNamespace in {
+    let DecoderNamespace = Gen.DecoderNamespace # !if(ps32.Pfl.IsRealTrue16, "", "_FAKE16") in {
       def _e32#Gen.Suffix :
         // 32 and 64 bit forms of the instruction have _e32 and _e64
         // respectively appended to their assembly mnemonic.
@@ -1530,7 +1594,7 @@ multiclass VOPCX_Real_with_name<GFXGen Gen, bits<9> op, string OpName,
                                                      pseudo_mnemonic),
                               asm_name, ps64.AsmVariantName>;
 
-    let DecoderNamespace = Gen.DecoderNamespace in {
+    let DecoderNamespace = Gen.DecoderNamespace # !if(ps32.Pfl.IsRealTrue16, "", "_FAKE16") in {
       def _e32#Gen.Suffix
           : VOPC_Real<ps32, Gen.Subtarget, asm_name>,
             VOPCe<op{7-0}> {
@@ -1623,7 +1687,25 @@ defm V_CMP_NGT_F16_t16    : VOPC_Real_t16_gfx11_gfx12<0x00b, "v_cmp_ngt_f16">;
 defm V_CMP_NLE_F16_t16    : VOPC_Real_t16_gfx11_gfx12<0x00c, "v_cmp_nle_f16">;
 defm V_CMP_NEQ_F16_t16    : VOPC_Real_t16_gfx11_gfx12<0x00d, "v_cmp_neq_f16">;
 defm V_CMP_NLT_F16_t16    : VOPC_Real_t16_gfx11_gfx12<0x00e, "v_cmp_nlt_f16">;
-defm V_CMP_T_F16_t16      : VOPC_Real_with_name_gfx11<0x00f, "V_CMP_TRU_F16_t16", "v_cmp_t_f16", "v_cmp_tru_f16">;
+defm V_CMP_T_F16_t16      : VOPC_Real_t16_gfx11<0x00f, "v_cmp_t_f16", "V_CMP_TRU_F16_t16", "v_cmp_tru_f16">;
+
+defm V_CMP_F_F16_fake16      : VOPC_Real_t16_gfx11<0x000, "v_cmp_f_f16">;
+defm V_CMP_LT_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x001, "v_cmp_lt_f16">;
+defm V_CMP_EQ_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x002, "v_cmp_eq_f16">;
+defm V_CMP_LE_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x003, "v_cmp_le_f16">;
+defm V_CMP_GT_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x004, "v_cmp_gt_f16">;
+defm V_CMP_LG_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x005, "v_cmp_lg_f16">;
+defm V_CMP_GE_F16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x006, "v_cmp_ge_f16">;
+defm V_CMP_O_F16_fake16      : VOPC_Real_t16_gfx11_gfx12<0x007, "v_cmp_o_f16">;
+defm V_CMP_U_F16_fake16      : VOPC_Real_t16_gfx11_gfx12<0x008, "v_cmp_u_f16">;
+defm V_CMP_NGE_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x009, "v_cmp_nge_f16">;
+defm V_CMP_NLG_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x00a, "v_cmp_nlg_f16">;
+defm V_CMP_NGT_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x00b, "v_cmp_ngt_f16">;
+defm V_CMP_NLE_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x00c, "v_cmp_nle_f16">;
+defm V_CMP_NEQ_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x00d, "v_cmp_neq_f16">;
+defm V_CMP_NLT_F16_fake16    : VOPC_Real_t16_gfx11_gfx12<0x00e, "v_cmp_nlt_f16">;
+defm V_CMP_T_F16_fake16      : VOPC_Real_t16_gfx11<0x00f, "v_cmp_t_f16", "V_CMP_TRU_F16_fake16", "v_cmp_tru_f16">;
+
 defm V_CMP_F_F32      : VOPC_Real_gfx11<0x010>;
 defm V_CMP_LT_F32     : VOPC_Real_gfx11_gfx12<0x011>;
 defm V_CMP_EQ_F32     : VOPC_Real_gfx11_gfx12<0x012>;
@@ -1641,6 +1723,7 @@ defm V_CMP_NEQ_F32    : VOPC_Real_gfx11_gfx12<0x01d>;
 defm V_CMP_NLT_F32    : VOPC_Real_gfx11_gfx12<0x01e>;
 defm V_CMP_T_F32      : VOPC_Real_with_name_gfx11<0x01f, "V_CMP_TRU_F32", "v_cmp_t_f32">;
 defm V_CMP_T_F64      : VOPC_Real_with_name_gfx11<0x02f, "V_CMP_TRU_F64", "v_cmp_t_f64">;
+
 defm V_CMP_LT_I16_t16     : VOPC_Real_t16_gfx11_gfx12<0x031, "v_cmp_lt_i16">;
 defm V_CMP_EQ_I16_t16     : VOPC_Real_t16_gfx11_gfx12<0x032, "v_cmp_eq_i16">;
 defm V_CMP_LE_I16_t16     : VOPC_Real_t16_gfx11_gfx12<0x033, "v_cmp_le_i16">;
@@ -1653,6 +1736,20 @@ defm V_CMP_LE_U16_t16     : VOPC_Real_t16_gfx11_gfx12<0x03b, "v_cmp_le_u16">;
 defm V_CMP_GT_U16_t16     : VOPC_Real_t16_gfx11_gfx12<0x03c, "v_cmp_gt_u16">;
 defm V_CMP_NE_U16_t16     : VOPC_Real_t16_gfx11_gfx12<0x03d, "v_cmp_ne_u16">;
 defm V_CMP_GE_U16_t16     : VOPC_Real_t16_gfx11_gfx12<0x03e, "v_cmp_ge_u16">;
+
+defm V_CMP_LT_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x031, "v_cmp_lt_i16">;
+defm V_CMP_EQ_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x032, "v_cmp_eq_i16">;
+defm V_CMP_LE_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x033, "v_cmp_le_i16">;
+defm V_CMP_GT_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x034, "v_cmp_gt_i16">;
+defm V_CMP_NE_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x035, "v_cmp_ne_i16">;
+defm V_CMP_GE_I16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x036, "v_cmp_ge_i16">;
+defm V_CMP_LT_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x039, "v_cmp_lt_u16">;
+defm V_CMP_EQ_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x03a, "v_cmp_eq_u16">;
+defm V_CMP_LE_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x03b, "v_cmp_le_u16">;
+defm V_CMP_GT_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x03c, "v_cmp_gt_u16">;
+defm V_CMP_NE_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x03d, "v_cmp_ne_u16">;
+defm V_CMP_GE_U16_fake16     : VOPC_Real_t16_gfx11_gfx12<0x03e, "v_cmp_ge_u16">;
+
 defm V_CMP_F_I32      : VOPC_Real_gfx11<0x040>;
 defm V_CMP_LT_I32     : VOPC_Real_gfx11_gfx12<0x041>;
 defm V_CMP_EQ_I32     : VOPC_Real_gfx11_gfx12<0x042>;
@@ -1688,6 +1785,7 @@ defm V_CMP_GE_U64     : VOPC_Real_gfx11_gfx12<0x05e>;
 defm V_CMP_T_U64      : VOPC_Real_gfx11<0x05f>;
 
 defm V_CMP_CLASS_F16_t16 : VOPC_Real_t16_gfx11_gfx12<0x07d, "v_cmp_class_f16">;
+defm V_CMP_CLASS_F16_fake16 : VOPC_Real_t16_gfx11_gfx12<0x07d, "v_cmp_class_f16">;
 defm V_CMP_CLASS_F32     : VOPC_Real_gfx11_gfx12<0x07e>;
 defm V_CMP_CLASS_F64     : VOPC_Real_gfx11_gfx12<0x07f>;
 
@@ -1707,6 +1805,24 @@ defm V_CMPX_NLE_F16_t16   : VOPCX_Real_t16_gfx11_gfx12<0x08c, "v_cmpx_nle_f16">;
 defm V_CMPX_NEQ_F16_t16   : VOPCX_Real_t16_gfx11_gfx12<0x08d, "v_cmpx_neq_f16">;
 defm V_CMPX_NLT_F16_t16   : VOPCX_Real_t16_gfx11_gfx12<0x08e, "v_cmpx_nlt_f16">;
 defm V_CMPX_T_F16_t16     : VOPCX_Real_with_name_gfx11<0x08f, "V_CMPX_TRU_F16_t16", "v_cmpx_t_f16", "v_cmpx_tru_f16">;
+
+defm V_CMPX_F_F16_fake16     : VOPCX_Real_t16_gfx11<0x080, "v_cmpx_f_f16">;
+defm V_CMPX_LT_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x081, "v_cmpx_lt_f16">;
+defm V_CMPX_EQ_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x082, "v_cmpx_eq_f16">;
+defm V_CMPX_LE_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x083, "v_cmpx_le_f16">;
+defm V_CMPX_GT_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x084, "v_cmpx_gt_f16">;
+defm V_CMPX_LG_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x085, "v_cmpx_lg_f16">;
+defm V_CMPX_GE_F16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x086, "v_cmpx_ge_f16">;
+defm V_CMPX_O_F16_fake16     : VOPCX_Real_t16_gfx11_gfx12<0x087, "v_cmpx_o_f16">;
+defm V_CMPX_U_F16_fake16     : VOPCX_Real_t16_gfx11_gfx12<0x088, "v_cmpx_u_f16">;
+defm V_CMPX_NGE_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x089, "v_cmpx_nge_f16">;
+defm V_CMPX_NLG_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x08a, "v_cmpx_nlg_f16">;
+defm V_CMPX_NGT_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x08b, "v_cmpx_ngt_f16">;
+defm V_CMPX_NLE_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x08c, "v_cmpx_nle_f16">;
+defm V_CMPX_NEQ_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x08d, "v_cmpx_neq_f16">;
+defm V_CMPX_NLT_F16_fake16   : VOPCX_Real_t16_gfx11_gfx12<0x08e, "v_cmpx_nlt_f16">;
+defm V_CMPX_T_F16_fake16     : VOPCX_Real_with_name_gfx11<0x08f, "V_CMPX_TRU_F16_fake16", "v_cmpx_t_f16", "v_cmpx_tru_f16">;
+
 defm V_CMPX_F_F32     : VOPCX_Real_gfx11<0x090>;
 defm V_CMPX_LT_F32    : VOPCX_Real_gfx11_gfx12<0x091>;
 defm V_CMPX_EQ_F32    : VOPCX_Real_gfx11_gfx12<0x092>;
@@ -1753,6 +1869,20 @@ defm V_CMPX_LE_U16_t16    : VOPCX_Real_t16_gfx11_gfx12<0x0bb, "v_cmpx_le_u16">;
 defm V_CMPX_GT_U16_t16    : VOPCX_Real_t16_gfx11_gfx12<0x0bc, "v_cmpx_gt_u16">;
 defm V_CMPX_NE_U16_t16    : VOPCX_Real_t16_gfx11_gfx12<0x0bd, "v_cmpx_ne_u16">;
 defm V_CMPX_GE_U16_t16    : VOPCX_Real_t16_gfx11_gfx12<0x0be, "v_cmpx_ge_u16">;
+
+defm V_CMPX_LT_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b1, "v_cmpx_lt_i16">;
+defm V_CMPX_EQ_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b2, "v_cmpx_eq_i16">;
+defm V_CMPX_LE_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b3, "v_cmpx_le_i16">;
+defm V_CMPX_GT_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b4, "v_cmpx_gt_i16">;
+defm V_CMPX_NE_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b5, "v_cmpx_ne_i16">;
+defm V_CMPX_GE_I16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b6, "v_cmpx_ge_i16">;
+defm V_CMPX_LT_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0b9, "v_cmpx_lt_u16">;
+defm V_CMPX_EQ_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0ba, "v_cmpx_eq_u16">;
+defm V_CMPX_LE_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0bb, "v_cmpx_le_u16">;
+defm V_CMPX_GT_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0bc, "v_cmpx_gt_u16">;
+defm V_CMPX_NE_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0bd, "v_cmpx_ne_u16">;
+defm V_CMPX_GE_U16_fake16    : VOPCX_Real_t16_gfx11_gfx12<0x0be, "v_cmpx_ge_u16">;
+
 defm V_CMPX_F_I32     : VOPCX_Real_gfx11<0x0c0>;
 defm V_CMPX_LT_I32    : VOPCX_Real_gfx11_gfx12<0x0c1>;
 defm V_CMPX_EQ_I32    : VOPCX_Real_gfx11_gfx12<0x0c2>;
@@ -1787,6 +1917,7 @@ defm V_CMPX_NE_U64    : VOPCX_Real_gfx11_gfx12<0x0dd>;
 defm V_CMPX_GE_U64    : VOPCX_Real_gfx11_gfx12<0x0de>;
 defm V_CMPX_T_U64     : VOPCX_Real_gfx11<0x0df>;
 defm V_CMPX_CLASS_F16_t16 : VOPCX_Real_t16_gfx11_gfx12<0x0fd, "v_cmpx_class_f16">;
+defm V_CMPX_CLASS_F16_fake16 : VOPCX_Real_t16_gfx11_gfx12<0x0fd, "v_cmpx_class_f16">;
 defm V_CMPX_CLASS_F32     : VOPCX_Real_gfx11_gfx12<0x0fe>;
 defm V_CMPX_CLASS_F64     : VOPCX_Real_gfx11_gfx12<0x0ff>;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir
index 04c3f050d165a3..5c387baf467524 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcmp.s16.mir
@@ -20,6 +20,7 @@ body: |
     ; WAVE64-NEXT: [[TRUNC1:%[0-9]+]]:vgpr(s16) = G_TRUNC [[COPY1]](s32)
     ; WAVE64-NEXT: [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(false), [[TRUNC]](s16), [[TRUNC1]]
     ; WAVE64-NEXT: S_ENDPGM 0, implicit [[FCMP]](s1)
+    ;
     ; WAVE32-LABEL: name: fcmp_false_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
@@ -29,6 +30,7 @@ body: |
     ; WAVE32-NEXT: [[TRUNC1:%[0-9]+]]:vgpr(s16) = G_TRUNC [[COPY1]](s32)
     ; WAVE32-NEXT: [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(false), [[TRUNC]](s16), [[TRUNC1]]
     ; WAVE32-NEXT: S_ENDPGM 0, implicit [[FCMP]](s1)
+    ;
     ; GFX11-LABEL: name: fcmp_false_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
@@ -59,22 +61,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_EQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_EQ_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_EQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_EQ_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_oeq_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_EQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_EQ_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_EQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_EQ_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_oeq_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_EQ_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_EQ_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_EQ_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_EQ_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -96,22 +100,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_GT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_GT_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_GT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_GT_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ogt_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_GT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_GT_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_GT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_GT_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ogt_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_GT_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_GT_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_GT_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_GT_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -133,22 +139,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_GE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_GE_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_GE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_GE_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_oge_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_GE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_GE_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_GE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_GE_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_oge_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_GE_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_GE_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_GE_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_GE_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -170,22 +178,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_LT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_LT_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_LT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_LT_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_olt_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_LT_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_LT_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_olt_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LT_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_LT_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LT_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_LT_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -207,22 +217,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_LE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_LE_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_LE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_LE_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ole_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_LE_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_LE_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ole_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LE_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_LE_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LE_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_LE_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -243,22 +255,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_LG_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_one_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_LG_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_one_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_LG_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -280,22 +294,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_LG_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ord_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_LG_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ord_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_LG_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_LG_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_LG_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -317,22 +333,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_U_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_U_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_U_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_U_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_uno_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_U_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_U_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_U_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_U_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_uno_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_U_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_U_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_U_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_U_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -354,22 +372,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NLG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NLG_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NLG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLG_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ueq_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NLG_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLG_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLG_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ueq_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLG_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NLG_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLG_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLG_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -391,22 +411,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NLE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NLE_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NLE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLE_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ugt_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NLE_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLE_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ugt_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLE_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NLE_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLE_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLE_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -428,22 +450,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NLT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NLT_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NLT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLT_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_uge_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NLT_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLT_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_uge_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NLT_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NLT_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NLT_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NLT_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -465,22 +489,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NGE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NGE_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NGE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGE_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ult_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NGE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NGE_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NGE_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGE_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ult_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NGE_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NGE_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NGE_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGE_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -502,22 +528,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NGT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NGT_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NGT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGT_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_ule_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NGT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NGT_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NGT_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGT_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_ule_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NGT_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NGT_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NGT_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NGT_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -539,22 +567,24 @@ body: |
     ; WAVE64-NEXT: {{  $}}
     ; WAVE64-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE64-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE64-NEXT: %4:sreg_64_xexec = nofpexcept V_CMP_NEQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE64-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE64-NEXT: [[V_CMP_NEQ_F16_e64_:%[0-9]+]]:sreg_64_xexec = nofpexcept V_CMP_NEQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE64-NEXT: S_ENDPGM 0, implicit [[V_CMP_NEQ_F16_e64_]]
+    ;
     ; WAVE32-LABEL: name: fcmp_une_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
     ; WAVE32-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; WAVE32-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; WAVE32-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NEQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; WAVE32-NEXT: S_ENDPGM 0, implicit %4
+    ; WAVE32-NEXT: [[V_CMP_NEQ_F16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NEQ_F16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; WAVE32-NEXT: S_ENDPGM 0, implicit [[V_CMP_NEQ_F16_e64_]]
+    ;
     ; GFX11-LABEL: name: fcmp_une_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}
     ; GFX11-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GFX11-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GFX11-NEXT: %4:sreg_32_xm0_xexec = nofpexcept V_CMP_NEQ_F16_t16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
-    ; GFX11-NEXT: S_ENDPGM 0, implicit %4
+    ; GFX11-NEXT: [[V_CMP_NEQ_F16_fake16_e64_:%[0-9]+]]:sreg_32_xm0_xexec = nofpexcept V_CMP_NEQ_F16_fake16_e64 0, [[COPY]], 0, [[COPY1]], 0, implicit $mode, implicit $exec
+    ; GFX11-NEXT: S_ENDPGM 0, implicit [[V_CMP_NEQ_F16_fake16_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s16) = G_TRUNC %0
@@ -580,6 +610,7 @@ body: |
     ; WAVE64-NEXT: [[TRUNC1:%[0-9]+]]:vgpr(s16) = G_TRUNC [[COPY1]](s32)
     ; WAVE64-NEXT: [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(true), [[TRUNC]](s16), [[TRUNC1]]
     ; WAVE64-NEXT: S_ENDPGM 0, implicit [[FCMP]](s1)
+    ;
     ; WAVE32-LABEL: name: fcmp_true_s16_vv
     ; WAVE32: liveins: $vgpr0, $vgpr1
     ; WAVE32-NEXT: {{  $}}
@@ -589,6 +620,7 @@ body: |
     ; WAVE32-NEXT: [[TRUNC1:%[0-9]+]]:vgpr(s16) = G_TRUNC [[COPY1]](s32)
     ; WAVE32-NEXT: [[FCMP:%[0-9]+]]:vcc(s1) = G_FCMP floatpred(true), [[TRUNC]](s16), [[TRUNC1]]
     ; WAVE32-NEXT: S_ENDPGM 0, implicit [[FCMP]](s1)
+    ;
     ; GFX11-LABEL: name: fcmp_true_s16_vv
     ; GFX11: liveins: $vgpr0, $vgpr1
     ; GFX11-NEXT: {{  $}}

From c9ba6d35c19022a582516e9455af3f0d79101adf Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Wed, 21 Aug 2024 07:53:47 -0700
Subject: [PATCH 013/116] [RISCV] Add coverage for fp reductions of <2^N-1 x
 FP> vectors

---
 .../RISCV/rvv/fixed-vectors-reduction-fp.ll   | 375 +++++++++++++-----
 1 file changed, 283 insertions(+), 92 deletions(-)
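
The new `<7 x ...>` tests load 7 lanes and then slide one extra scalar into lane 7 (`vslideup.vi ..., 7`) before an 8-lane `vfredusum.vs`/`vfredosum.vs`. That scalar comes from the `lui` immediate. The following sketch is not part of the patch. It decodes those immediates to show the padding value, assuming that `lui` places its 20-bit immediate in bits 31:12 and that `vmv.s.x` uses the low SEW bits of the source register. The helper names `lui_value`, `as_half` and `as_float` are hypothetical.

```python
import math
import struct


def lui_value(imm20: int) -> int:
    """Low 32 bits of rd after `lui rd, imm20` (immediate shifted into bits 31:12)."""
    return (imm20 << 12) & 0xFFFFFFFF


def as_half(bits: int) -> float:
    """Reinterpret the low 16 bits as an IEEE-754 binary16 value (SEW=16)."""
    return struct.unpack("<e", (bits & 0xFFFF).to_bytes(2, "little"))[0]


def as_float(bits: int) -> float:
    """Reinterpret 32 bits as an IEEE-754 binary32 value (SEW=32)."""
    return struct.unpack("<f", bits.to_bytes(4, "little"))[0]


half_pad = as_half(lui_value(1048568))   # from vreduce_fadd_v7f16
float_pad = as_float(lui_value(524288))  # from vreduce_fadd_v7f32 / vreduce_ord_fadd_v7f32
fmin_imm = as_float(lui_value(523264))   # from vreduce_fmin_v7f32 (truncated in this excerpt)

print(hex(lui_value(1048568)), half_pad)  # 0xffff8000 -0.0
print(hex(lui_value(524288)), float_pad)  # 0x80000000 -0.0
print(hex(lui_value(523264)), fmin_imm)   # 0x7fc00000 nan

# -0.0 is the fadd identity: a sum of -0.0 values stays -0.0 after adding
# the pad. A +0.0 pad would flip the result's sign to +0.0.
print(math.copysign(1.0, -0.0 + float_pad))  # -1.0
print(math.copysign(1.0, -0.0 + 0.0))        # 1.0
```

The fadd padding is therefore `-0.0`, not `+0.0`, for both the unordered and ordered reductions. The fmin immediate decodes to a quiet NaN. The rest of that test is cut off here, so how the NaN is used is not visible in this excerpt.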

diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
index a6763fa22822ed..e9e147861df564 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
@@ -91,6 +91,26 @@ define half @vreduce_ord_fadd_v4f16(ptr %x, half %s) {
   ret half %red
 }
 
+declare half @llvm.vector.reduce.fadd.v7f16(half, <7 x half>)
+
+define half @vreduce_fadd_v7f16(ptr %x, half %s) {
+; CHECK-LABEL: vreduce_fadd_v7f16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e16, m1, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a0)
+; CHECK-NEXT:    lui a0, 1048568
+; CHECK-NEXT:    vmv.s.x v9, a0
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vslideup.vi v8, v9, 7
+; CHECK-NEXT:    vfmv.s.f v9, fa0
+; CHECK-NEXT:    vfredusum.vs v8, v8, v9
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x half>, ptr %x
+  %red = call reassoc half @llvm.vector.reduce.fadd.v7f16(half %s, <7 x half> %v)
+  ret half %red
+}
+
 declare half @llvm.vector.reduce.fadd.v8f16(half, <8 x half>)
 
 define half @vreduce_fadd_v8f16(ptr %x, half %s) {
@@ -443,6 +463,45 @@ define float @vreduce_ord_fwadd_v4f32(ptr %x, float %s) {
   ret float %red
 }
 
+declare float @llvm.vector.reduce.fadd.v7f32(float, <7 x float>)
+
+define float @vreduce_fadd_v7f32(ptr %x, float %s) {
+; CHECK-LABEL: vreduce_fadd_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 524288
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v8, v10, 7
+; CHECK-NEXT:    vfmv.s.f v10, fa0
+; CHECK-NEXT:    vfredusum.vs v8, v8, v10
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call reassoc float @llvm.vector.reduce.fadd.v7f32(float %s, <7 x float> %v)
+  ret float %red
+}
+
+define float @vreduce_ord_fadd_v7f32(ptr %x, float %s) {
+; CHECK-LABEL: vreduce_ord_fadd_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 524288
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v8, v10, 7
+; CHECK-NEXT:    vfmv.s.f v10, fa0
+; CHECK-NEXT:    vfredosum.vs v8, v8, v10
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call float @llvm.vector.reduce.fadd.v7f32(float %s, <7 x float> %v)
+  ret float %red
+}
+
+
 declare float @llvm.vector.reduce.fadd.v8f32(float, <8 x float>)
 
 define float @vreduce_fadd_v8f32(ptr %x, float %s) {
@@ -1250,6 +1309,26 @@ define float @vreduce_fmin_v4f32_nonans_noinfs(ptr %x) {
   ret float %red
 }
 
+declare float @llvm.vector.reduce.fmin.v7f32(<7 x float>)
+
+define float @vreduce_fmin_v7f32(ptr %x) {
+; CHECK-LABEL: vreduce_fmin_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 523264
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.v.v v12, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v12, v10, 7
+; CHECK-NEXT:    vfredmin.vs v8, v12, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call float @llvm.vector.reduce.fmin.v7f32(<7 x float> %v)
+  ret float %red
+}
+
 declare float @llvm.vector.reduce.fmin.v128f32(<128 x float>)
 
 define float @vreduce_fmin_v128f32(ptr %x) {
@@ -1480,6 +1559,26 @@ define float @vreduce_fmax_v4f32_nonans_noinfs(ptr %x) {
   ret float %red
 }
 
+declare float @llvm.vector.reduce.fmax.v7f32(<7 x float>)
+
+define float @vreduce_fmax_v7f32(ptr %x) {
+; CHECK-LABEL: vreduce_fmax_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 1047552
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.v.v v12, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v12, v10, 7
+; CHECK-NEXT:    vfredmax.vs v8, v12, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call float @llvm.vector.reduce.fmax.v7f32(<7 x float> %v)
+  ret float %red
+}
+
 declare float @llvm.vector.reduce.fmax.v128f32(<128 x float>)
 
 define float @vreduce_fmax_v128f32(ptr %x) {
@@ -1602,12 +1701,12 @@ define float @vreduce_fminimum_v2f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB99_2
+; CHECK-NEXT:    beqz a0, .LBB104_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB99_2:
+; CHECK-NEXT:  .LBB104_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -1638,12 +1737,12 @@ define float @vreduce_fminimum_v4f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB101_2
+; CHECK-NEXT:    beqz a0, .LBB106_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB101_2:
+; CHECK-NEXT:  .LBB106_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -1665,6 +1764,52 @@ define float @vreduce_fminimum_v4f32_nonans(ptr %x) {
   ret float %red
 }
 
+declare float @llvm.vector.reduce.fminimum.v7f32(<7 x float>)
+
+define float @vreduce_fminimum_v7f32(ptr %x) {
+; CHECK-LABEL: vreduce_fminimum_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 522240
+; CHECK-NEXT:    vmv.s.x v12, a0
+; CHECK-NEXT:    vmv.v.v v10, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v10, v12, 7
+; CHECK-NEXT:    vmfne.vv v9, v10, v10
+; CHECK-NEXT:    vcpop.m a0, v9
+; CHECK-NEXT:    beqz a0, .LBB108_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    lui a0, 523264
+; CHECK-NEXT:    fmv.w.x fa0, a0
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB108_2:
+; CHECK-NEXT:    vfredmin.vs v8, v10, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call float @llvm.vector.reduce.fminimum.v7f32(<7 x float> %v)
+  ret float %red
+}
+
+define float @vreduce_fminimum_v7f32_nonans(ptr %x) {
+; CHECK-LABEL: vreduce_fminimum_v7f32_nonans:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 522240
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.v.v v12, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v12, v10, 7
+; CHECK-NEXT:    vfredmin.vs v8, v12, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call nnan float @llvm.vector.reduce.fminimum.v7f32(<7 x float> %v)
+  ret float %red
+}
+
 declare float @llvm.vector.reduce.fminimum.v8f32(<8 x float>)
 
 define float @vreduce_fminimum_v8f32(ptr %x) {
@@ -1674,12 +1819,12 @@ define float @vreduce_fminimum_v8f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v10, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v10
-; CHECK-NEXT:    beqz a0, .LBB103_2
+; CHECK-NEXT:    beqz a0, .LBB110_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB103_2:
+; CHECK-NEXT:  .LBB110_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -1710,12 +1855,12 @@ define float @vreduce_fminimum_v16f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v12, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v12
-; CHECK-NEXT:    beqz a0, .LBB105_2
+; CHECK-NEXT:    beqz a0, .LBB112_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB105_2:
+; CHECK-NEXT:  .LBB112_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -1747,12 +1892,12 @@ define float @vreduce_fminimum_v32f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB107_2
+; CHECK-NEXT:    beqz a0, .LBB114_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB107_2:
+; CHECK-NEXT:  .LBB114_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -1802,15 +1947,15 @@ define float @vreduce_fminimum_v64f32(ptr %x) {
 ; CHECK-NEXT:    vfmin.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB109_2
+; CHECK-NEXT:    beqz a0, .LBB116_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
-; CHECK-NEXT:    j .LBB109_3
-; CHECK-NEXT:  .LBB109_2:
+; CHECK-NEXT:    j .LBB116_3
+; CHECK-NEXT:  .LBB116_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB109_3:
+; CHECK-NEXT:  .LBB116_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    add sp, sp, a0
@@ -1924,15 +2069,15 @@ define float @vreduce_fminimum_v128f32(ptr %x) {
 ; CHECK-NEXT:    vfmin.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB111_2
+; CHECK-NEXT:    beqz a0, .LBB118_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
-; CHECK-NEXT:    j .LBB111_3
-; CHECK-NEXT:  .LBB111_2:
+; CHECK-NEXT:    j .LBB118_3
+; CHECK-NEXT:  .LBB118_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB111_3:
+; CHECK-NEXT:  .LBB118_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a1, a0
@@ -1978,12 +2123,12 @@ define double @vreduce_fminimum_v2f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB113_2
+; CHECK-NEXT:    beqz a0, .LBB120_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI113_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI113_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI120_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI120_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB113_2:
+; CHECK-NEXT:  .LBB120_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2014,12 +2159,12 @@ define double @vreduce_fminimum_v4f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v10, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v10
-; CHECK-NEXT:    beqz a0, .LBB115_2
+; CHECK-NEXT:    beqz a0, .LBB122_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI115_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI115_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI122_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI122_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB115_2:
+; CHECK-NEXT:  .LBB122_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2050,12 +2195,12 @@ define double @vreduce_fminimum_v8f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v12, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v12
-; CHECK-NEXT:    beqz a0, .LBB117_2
+; CHECK-NEXT:    beqz a0, .LBB124_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI117_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI117_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI124_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI124_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB117_2:
+; CHECK-NEXT:  .LBB124_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2086,12 +2231,12 @@ define double @vreduce_fminimum_v16f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB119_2
+; CHECK-NEXT:    beqz a0, .LBB126_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI119_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI119_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI126_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI126_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB119_2:
+; CHECK-NEXT:  .LBB126_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2139,15 +2284,15 @@ define double @vreduce_fminimum_v32f64(ptr %x) {
 ; CHECK-NEXT:    vfmin.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB121_2
+; CHECK-NEXT:    beqz a0, .LBB128_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI121_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI121_0)(a0)
-; CHECK-NEXT:    j .LBB121_3
-; CHECK-NEXT:  .LBB121_2:
+; CHECK-NEXT:    lui a0, %hi(.LCPI128_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI128_0)(a0)
+; CHECK-NEXT:    j .LBB128_3
+; CHECK-NEXT:  .LBB128_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB121_3:
+; CHECK-NEXT:  .LBB128_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    add sp, sp, a0
@@ -2259,15 +2404,15 @@ define double @vreduce_fminimum_v64f64(ptr %x) {
 ; CHECK-NEXT:    vfmin.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB123_2
+; CHECK-NEXT:    beqz a0, .LBB130_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI123_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI123_0)(a0)
-; CHECK-NEXT:    j .LBB123_3
-; CHECK-NEXT:  .LBB123_2:
+; CHECK-NEXT:    lui a0, %hi(.LCPI130_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI130_0)(a0)
+; CHECK-NEXT:    j .LBB130_3
+; CHECK-NEXT:  .LBB130_2:
 ; CHECK-NEXT:    vfredmin.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB123_3:
+; CHECK-NEXT:  .LBB130_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a1, a0
@@ -2312,12 +2457,12 @@ define float @vreduce_fmaximum_v2f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB125_2
+; CHECK-NEXT:    beqz a0, .LBB132_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB125_2:
+; CHECK-NEXT:  .LBB132_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2348,12 +2493,12 @@ define float @vreduce_fmaximum_v4f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB127_2
+; CHECK-NEXT:    beqz a0, .LBB134_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB127_2:
+; CHECK-NEXT:  .LBB134_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2375,6 +2520,52 @@ define float @vreduce_fmaximum_v4f32_nonans(ptr %x) {
   ret float %red
 }
 
+declare float @llvm.vector.reduce.fmaximum.v7f32(<7 x float>)
+
+define float @vreduce_fmaximum_v7f32(ptr %x) {
+; CHECK-LABEL: vreduce_fmaximum_v7f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 1046528
+; CHECK-NEXT:    vmv.s.x v12, a0
+; CHECK-NEXT:    vmv.v.v v10, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v10, v12, 7
+; CHECK-NEXT:    vmfne.vv v9, v10, v10
+; CHECK-NEXT:    vcpop.m a0, v9
+; CHECK-NEXT:    beqz a0, .LBB136_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    lui a0, 523264
+; CHECK-NEXT:    fmv.w.x fa0, a0
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB136_2:
+; CHECK-NEXT:    vfredmax.vs v8, v10, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call float @llvm.vector.reduce.fmaximum.v7f32(<7 x float> %v)
+  ret float %red
+}
+
+define float @vreduce_fmaximum_v7f32_nonans(ptr %x) {
+; CHECK-LABEL: vreduce_fmaximum_v7f32_nonans:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    lui a0, 1046528
+; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.v.v v12, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vslideup.vi v12, v10, 7
+; CHECK-NEXT:    vfredmax.vs v8, v12, v8
+; CHECK-NEXT:    vfmv.f.s fa0, v8
+; CHECK-NEXT:    ret
+  %v = load <7 x float>, ptr %x
+  %red = call nnan float @llvm.vector.reduce.fmaximum.v7f32(<7 x float> %v)
+  ret float %red
+}
+
 declare float @llvm.vector.reduce.fmaximum.v8f32(<8 x float>)
 
 define float @vreduce_fmaximum_v8f32(ptr %x) {
@@ -2384,12 +2575,12 @@ define float @vreduce_fmaximum_v8f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v10, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v10
-; CHECK-NEXT:    beqz a0, .LBB129_2
+; CHECK-NEXT:    beqz a0, .LBB138_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB129_2:
+; CHECK-NEXT:  .LBB138_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2420,12 +2611,12 @@ define float @vreduce_fmaximum_v16f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v12, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v12
-; CHECK-NEXT:    beqz a0, .LBB131_2
+; CHECK-NEXT:    beqz a0, .LBB140_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB131_2:
+; CHECK-NEXT:  .LBB140_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2457,12 +2648,12 @@ define float @vreduce_fmaximum_v32f32(ptr %x) {
 ; CHECK-NEXT:    vle32.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB133_2
+; CHECK-NEXT:    beqz a0, .LBB142_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB133_2:
+; CHECK-NEXT:  .LBB142_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2512,15 +2703,15 @@ define float @vreduce_fmaximum_v64f32(ptr %x) {
 ; CHECK-NEXT:    vfmax.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB135_2
+; CHECK-NEXT:    beqz a0, .LBB144_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
-; CHECK-NEXT:    j .LBB135_3
-; CHECK-NEXT:  .LBB135_2:
+; CHECK-NEXT:    j .LBB144_3
+; CHECK-NEXT:  .LBB144_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB135_3:
+; CHECK-NEXT:  .LBB144_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    add sp, sp, a0
@@ -2634,15 +2825,15 @@ define float @vreduce_fmaximum_v128f32(ptr %x) {
 ; CHECK-NEXT:    vfmax.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB137_2
+; CHECK-NEXT:    beqz a0, .LBB146_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    lui a0, 523264
 ; CHECK-NEXT:    fmv.w.x fa0, a0
-; CHECK-NEXT:    j .LBB137_3
-; CHECK-NEXT:  .LBB137_2:
+; CHECK-NEXT:    j .LBB146_3
+; CHECK-NEXT:  .LBB146_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB137_3:
+; CHECK-NEXT:  .LBB146_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a1, a0
@@ -2688,12 +2879,12 @@ define double @vreduce_fmaximum_v2f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v9, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v9
-; CHECK-NEXT:    beqz a0, .LBB139_2
+; CHECK-NEXT:    beqz a0, .LBB148_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI139_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI139_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI148_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI148_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB139_2:
+; CHECK-NEXT:  .LBB148_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2724,12 +2915,12 @@ define double @vreduce_fmaximum_v4f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v10, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v10
-; CHECK-NEXT:    beqz a0, .LBB141_2
+; CHECK-NEXT:    beqz a0, .LBB150_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI141_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI141_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI150_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI150_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB141_2:
+; CHECK-NEXT:  .LBB150_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2760,12 +2951,12 @@ define double @vreduce_fmaximum_v8f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v12, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v12
-; CHECK-NEXT:    beqz a0, .LBB143_2
+; CHECK-NEXT:    beqz a0, .LBB152_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI143_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI143_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI152_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI152_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB143_2:
+; CHECK-NEXT:  .LBB152_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2796,12 +2987,12 @@ define double @vreduce_fmaximum_v16f64(ptr %x) {
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB145_2
+; CHECK-NEXT:    beqz a0, .LBB154_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI145_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI145_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI154_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI154_0)(a0)
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB145_2:
+; CHECK-NEXT:  .LBB154_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
@@ -2849,15 +3040,15 @@ define double @vreduce_fmaximum_v32f64(ptr %x) {
 ; CHECK-NEXT:    vfmax.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB147_2
+; CHECK-NEXT:    beqz a0, .LBB156_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI147_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI147_0)(a0)
-; CHECK-NEXT:    j .LBB147_3
-; CHECK-NEXT:  .LBB147_2:
+; CHECK-NEXT:    lui a0, %hi(.LCPI156_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI156_0)(a0)
+; CHECK-NEXT:    j .LBB156_3
+; CHECK-NEXT:  .LBB156_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB147_3:
+; CHECK-NEXT:  .LBB156_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    add sp, sp, a0
@@ -2969,15 +3160,15 @@ define double @vreduce_fmaximum_v64f64(ptr %x) {
 ; CHECK-NEXT:    vfmax.vv v8, v8, v16
 ; CHECK-NEXT:    vmfne.vv v16, v8, v8
 ; CHECK-NEXT:    vcpop.m a0, v16
-; CHECK-NEXT:    beqz a0, .LBB149_2
+; CHECK-NEXT:    beqz a0, .LBB158_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lui a0, %hi(.LCPI149_0)
-; CHECK-NEXT:    fld fa0, %lo(.LCPI149_0)(a0)
-; CHECK-NEXT:    j .LBB149_3
-; CHECK-NEXT:  .LBB149_2:
+; CHECK-NEXT:    lui a0, %hi(.LCPI158_0)
+; CHECK-NEXT:    fld fa0, %lo(.LCPI158_0)(a0)
+; CHECK-NEXT:    j .LBB158_3
+; CHECK-NEXT:  .LBB158_2:
 ; CHECK-NEXT:    vfredmax.vs v8, v8, v8
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
-; CHECK-NEXT:  .LBB149_3:
+; CHECK-NEXT:  .LBB158_3:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    slli a0, a0, 3
 ; CHECK-NEXT:    mv a1, a0

>From c0d222219a8d01d3945100114256d26cfe833a1c Mon Sep 17 00:00:00 2001
From: Andy Kaylor <andrew.kaylor at intel.com>
Date: Wed, 21 Aug 2024 08:10:26 -0700
Subject: [PATCH 014/116] Fix bug with -ffp-contract=fast-honor-pragmas
 (#104857)

This fixes a problem which caused clang to assert in the Sema pragma
handling if it encountered "#pragma STDC FP_CONTRACT DEFAULT" when
compiling with the -ffp-contract=fast-honor-pragmas option.
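
A minimal sketch of the kind of input that reached the `llvm_unreachable` in Sema::ActOnPragmaFPContract, built with something like `clang -ffp-contract=fast-honor-pragmas -c repro.cpp`. The function name `fmaCandidate` is hypothetical. With the SemaAttr.cpp change, FPM_FastHonorPragmas is handled like FPM_Fast, so DEFAULT allows contraction across statements. The new test expects `fmul contract`/`fadd contract` for this case.

```cpp
#include <cassert>

// Hypothetical reproducer: a STDC FP_CONTRACT DEFAULT pragma at the start of
// a compound statement, compiled with -ffp-contract=fast-honor-pragmas.
float fmaCandidate(float a, float b, float c) {
#pragma STDC FP_CONTRACT DEFAULT
  return a * b + c;  // may be contracted into an FMA under this mode
}
```

The arithmetic itself is unaffected by the fix. Only the pragma handling in Sema changed.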

This fixes https://github.com/llvm/llvm-project/issues/104830
---
 clang/docs/ReleaseNotes.rst                   |   3 +
 clang/lib/Sema/SemaAttr.cpp                   |   3 +-
 .../ffp-contract-fast-honor-pramga-option.cpp |  37 +++++
 .../ffp-contract-fhp-pragma-override.cpp      | 151 ++++++++++++++++++
 4 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 clang/test/CodeGen/ffp-contract-fast-honor-pramga-option.cpp
 create mode 100644 clang/test/CodeGen/ffp-contract-fhp-pragma-override.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 8f98167dff31ef..5c156a9c073a9c 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -310,6 +310,9 @@ Miscellaneous Clang Crashes Fixed
 - Fixed a crash caused by long chains of ``sizeof`` and other similar operators
   that can be followed by a non-parenthesized expression. (#GH45061)
 
+- Fixed a crash when compiling ``#pragma STDC FP_CONTRACT DEFAULT`` with
+  ``-ffp-contract=fast-honor-pragmas``. (#GH104830)
+
 - Fixed a crash when function has more than 65536 parameters.
   Now a diagnostic is emitted. (#GH35741)
 
diff --git a/clang/lib/Sema/SemaAttr.cpp b/clang/lib/Sema/SemaAttr.cpp
index b0c239678d0b01..a1724820472b59 100644
--- a/clang/lib/Sema/SemaAttr.cpp
+++ b/clang/lib/Sema/SemaAttr.cpp
@@ -1269,13 +1269,12 @@ void Sema::ActOnPragmaFPContract(SourceLocation Loc,
     NewFPFeatures.setAllowFPContractWithinStatement();
     break;
   case LangOptions::FPM_Fast:
+  case LangOptions::FPM_FastHonorPragmas:
     NewFPFeatures.setAllowFPContractAcrossStatement();
     break;
   case LangOptions::FPM_Off:
     NewFPFeatures.setDisallowFPContract();
     break;
-  case LangOptions::FPM_FastHonorPragmas:
-    llvm_unreachable("Should not happen");
   }
   FpPragmaStack.Act(Loc, Sema::PSK_Set, StringRef(), NewFPFeatures);
   CurFPFeatures = NewFPFeatures.applyOverrides(getLangOpts());
diff --git a/clang/test/CodeGen/ffp-contract-fast-honor-pramga-option.cpp b/clang/test/CodeGen/ffp-contract-fast-honor-pramga-option.cpp
new file mode 100644
index 00000000000000..fef4da1edf1fc9
--- /dev/null
+++ b/clang/test/CodeGen/ffp-contract-fast-honor-pramga-option.cpp
@@ -0,0 +1,37 @@
+// RUN: %clang_cc1 -O3 -ffp-contract=fast-honor-pragmas -triple %itanium_abi_triple -emit-llvm -o - %s | FileCheck %s
+
+float fp_contract_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_1fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  return a * b + c;
+}
+
+float fp_contract_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_2fff(
+  // CHECK: fmul contract float
+  // CHECK: fsub contract float
+  return a * b - c;
+}
+
+void fp_contract_3(float *a, float b, float c) {
+  // CHECK-LABEL: fp_contract_3Pfff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  a[0] += b * c;
+}
+
+void fp_contract_4(float *a, float b, float c) {
+  // CHECK-LABEL: fp_contract_4Pfff(
+  // CHECK: fmul contract float
+  // CHECK: fsub contract float
+  a[0] -= b * c;
+}
+
+float fp_contract_5(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_5fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  float t = a * b;
+  return t + c;
+}
diff --git a/clang/test/CodeGen/ffp-contract-fhp-pragma-override.cpp b/clang/test/CodeGen/ffp-contract-fhp-pragma-override.cpp
new file mode 100644
index 00000000000000..ff35c9204c79cd
--- /dev/null
+++ b/clang/test/CodeGen/ffp-contract-fhp-pragma-override.cpp
@@ -0,0 +1,151 @@
+// RUN: %clang_cc1 -O3 -ffp-contract=fast-honor-pragmas -triple %itanium_abi_triple -emit-llvm -o - %s | FileCheck %s
+
+float fp_contract_on_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_on_1fff(
+  // CHECK: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+  #pragma STDC FP_CONTRACT ON
+  return a * b + c;
+}
+
+float fp_contract_on_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_on_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma STDC FP_CONTRACT ON
+  float t = a * b;
+  return t + c;
+}
+
+float fp_contract_off_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_off_1fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma STDC FP_CONTRACT OFF
+  return a * b + c;
+}
+
+float fp_contract_off_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_off_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma STDC FP_CONTRACT OFF
+  float t = a * b;
+  return t + c;
+}
+
+float fp_contract_default_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_default_1fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  #pragma STDC FP_CONTRACT DEFAULT
+  return a * b + c;
+}
+
+float fp_contract_default_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_default_2fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  #pragma STDC FP_CONTRACT DEFAULT
+  float t = a * b;
+  return t + c;
+}
+
+float fp_contract_clang_on_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_on_1fff(
+  // CHECK: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+  #pragma clang fp contract(on)
+  return a * b + c;
+}
+
+float fp_contract_clang_on_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_on_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma clang fp contract(on)
+  float t = a * b;
+  return t + c;
+}
+
+float fp_contract_clang_off_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_off_1fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma clang fp contract(off)
+  return a * b + c;
+}
+
+float fp_contract_clang_off_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_off_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  #pragma clang fp contract(off)
+  float t = a * b;
+  return t + c;
+}
+
+float fp_contract_clang_fast_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_fast_1fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  #pragma clang fp contract(fast)
+  return a * b + c;
+}
+
+float fp_contract_clang_fast_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_clang_fast_2fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  #pragma clang fp contract(fast)
+  float t = a * b;
+  return t + c;
+}
+
+#pragma STDC FP_CONTRACT ON
+
+float fp_contract_global_on_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_on_1fff(
+  // CHECK: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
+  return a * b + c;
+}
+
+float fp_contract_global_on_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_on_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  float t = a * b;
+  return t + c;
+}
+
+#pragma STDC FP_CONTRACT OFF
+
+float fp_contract_global_off_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_off_1fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  return a * b + c;
+}
+
+float fp_contract_global_off_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_off_2fff(
+  // CHECK: fmul float
+  // CHECK: fadd float
+  float t = a * b;
+  return t + c;
+}
+
+#pragma STDC FP_CONTRACT DEFAULT
+
+float fp_contract_global_default_1(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_default_1fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  return a * b + c;
+}
+
+float fp_contract_global_default_2(float a, float b, float c) {
+  // CHECK-LABEL: fp_contract_global_default_2fff(
+  // CHECK: fmul contract float
+  // CHECK: fadd contract float
+  float t = a * b;
+  return t + c;
+}

>From 278fc8efdf004a1959a31bb4c208df5ee733d5c8 Mon Sep 17 00:00:00 2001
From: Björn Pettersson <bjorn.a.pettersson at ericsson.com>
Date: Wed, 21 Aug 2024 17:56:27 +0200
Subject: [PATCH 015/116] [DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation
 bug in visitFREEZE (#104924)

In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that was later iterated over, applying a freeze to
those operands. However, C-level fuzz testing has discovered that the
recursive nature of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when an earlier operand is
replaced. That would then show up as
   Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
              "Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.

So we either need to make sure that the MaybePoisonOperands vector is
updated whenever the node is mutated or, as done in this patch, keep
track of the numbers of the operands that should be frozen instead of
keeping a vector of SDValues. The operands can then be refetched while
iterating over the operand numbers.
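The stale-operand problem and the operand-number fix can be illustrated with a small, self-contained C++ model. This is a hypothetical sketch, not SelectionDAG code: `ToyNode`, `replaceAllUses`, `freezeOf` and the `Cascade` map are invented for illustration, and the model ignores the deduplication that the real code performs with a SmallSet. The cascade stands in for the recursive replacement in the t181/t184 example from the code comment in visitFREEZE: replacing operand value 1 also rewrites operand value 2 to 3.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model of the visitFREEZE problem; not LLVM's API.
// A "node" is just a list of operand value ids.
struct ToyNode {
  std::vector<int> ops;
};

// When key K is replaced, the paired (old, new) replacement is applied too,
// mimicking how ReplaceAllUsesOfValueWith can recursively rewrite other
// operands of the same node.
using Cascade = std::map<int, std::pair<int, int>>;

void replaceAllUses(ToyNode &n, int from, int to, const Cascade &cascade) {
  for (int &op : n.ops)
    if (op == from)
      op = to;
  if (auto it = cascade.find(from); it != cascade.end())
    replaceAllUses(n, it->second.first, it->second.second, cascade);
}

// Toy "freeze": the frozen value gets a distinct id.
int freezeOf(int v) { return v + 1000; }

// Buggy pattern: cache the operand *values* up front, then iterate the cache.
ToyNode freezeByCachedValues(ToyNode n, const Cascade &cascade,
                             bool &sawStaleValue) {
  const std::vector<int> cached = n.ops;
  sawStaleValue = false;
  for (int v : cached) {
    bool stillOperand = false;
    for (int op : n.ops)
      stillOperand = stillOperand || op == v;
    if (!stillOperand)
      sawStaleValue = true;  // in LLVM this is where DELETED_NODE shows up
    replaceAllUses(n, v, freezeOf(v), cascade);
  }
  return n;
}

// Fixed pattern: remember operand *numbers* and refetch each operand.
ToyNode freezeByOperandNumbers(ToyNode n, const Cascade &cascade) {
  std::vector<std::size_t> opNos;
  for (std::size_t i = 0; i < n.ops.size(); ++i)
    opNos.push_back(i);
  for (std::size_t opNo : opNos) {
    assert(opNo < n.ops.size() && "operand count must not change");
    int v = n.ops[opNo];  // refetch: may differ from the original value
    replaceAllUses(n, v, freezeOf(v), cascade);
  }
  return n;
}
```

With a cascade of `{1, {2, 3}}` and operands `{1, 2}`, the cached-value loop visits value 2 after it is no longer an operand (the analogue of a DELETED_NODE) and never freezes its replacement 3, while the operand-number loop refetches slot 1 and freezes the new value.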

The problem was seen after adding SELECT_CC to the set of operations
included in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
suspect that this could also happen for other operations for which we
allow multiple maybe-poison operands.
---
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 22 ++++++++++---
 .../CodeGen/AArch64/dag-combine-freeze.ll     | 31 +++++++++++++++++++
 2 files changed, 49 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/dag-combine-freeze.ll

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 4180dcc8a720d5..c9ab7e7a66079c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15808,13 +15808,16 @@ SDValue DAGCombiner::visitFREEZE(SDNode *N) {
     }
   }
 
-  SmallSetVector<SDValue, 8> MaybePoisonOperands;
-  for (SDValue Op : N0->ops()) {
+  SmallSet<SDValue, 8> MaybePoisonOperands;
+  SmallVector<unsigned, 8> MaybePoisonOperandNumbers;
+  for (auto [OpNo, Op] : enumerate(N0->ops())) {
     if (DAG.isGuaranteedNotToBeUndefOrPoison(Op, /*PoisonOnly*/ false,
                                              /*Depth*/ 1))
       continue;
     bool HadMaybePoisonOperands = !MaybePoisonOperands.empty();
-    bool IsNewMaybePoisonOperand = MaybePoisonOperands.insert(Op);
+    bool IsNewMaybePoisonOperand = MaybePoisonOperands.insert(Op).second;
+    if (IsNewMaybePoisonOperand)
+      MaybePoisonOperandNumbers.push_back(OpNo);
     if (!HadMaybePoisonOperands)
       continue;
     if (IsNewMaybePoisonOperand && !AllowMultipleMaybePoisonOperands) {
@@ -15826,7 +15829,18 @@ SDValue DAGCombiner::visitFREEZE(SDNode *N) {
   // it could create undef or poison due to it's poison-generating flags.
   // So not finding any maybe-poison operands is fine.
 
-  for (SDValue MaybePoisonOperand : MaybePoisonOperands) {
+  for (unsigned OpNo : MaybePoisonOperandNumbers) {
+    // N0 can mutate during iteration, so make sure to refetch the maybe poison
+    // operands via the operand numbers. The typical scenario is that we have
+    // something like this
+    //   t262: i32 = freeze t181
+    //   t150: i32 = ctlz_zero_undef t262
+    //   t184: i32 = ctlz_zero_undef t181
+    //   t268: i32 = select_cc t181, Constant:i32<0>, t184, t186, setne:ch
+    // When freezing the t181 operand we get t262 back, and then the
+    // ReplaceAllUsesOfValueWith call will not only replace t181 by t262, but
+    // also recursively replace t184 by t150.
+    SDValue MaybePoisonOperand = N->getOperand(0).getOperand(OpNo);
     // Don't replace every single UNDEF everywhere with frozen UNDEF, though.
     if (MaybePoisonOperand.getOpcode() == ISD::UNDEF)
       continue;
diff --git a/llvm/test/CodeGen/AArch64/dag-combine-freeze.ll b/llvm/test/CodeGen/AArch64/dag-combine-freeze.ll
new file mode 100644
index 00000000000000..4f0c3d0ce18006
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/dag-combine-freeze.ll
@@ -0,0 +1,31 @@
+; RUN: llc -mtriple aarch64 -o /dev/null %s
+
+; This used to fail with:
+;    Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
+;               "Operand is DELETED_NODE!"' failed.
+; Just make sure we do not crash here.
+define void @test_fold_freeze_over_select_cc(i15 %a, ptr %p1, ptr %p2) {
+entry:
+  %a2 = add nsw i15 %a, 1
+  %sext = sext i15 %a2 to i32
+  %ashr = ashr i32 %sext, 31
+  %lshr = lshr i32 %ashr, 7
+  ; Setup an already frozen input to ctlz.
+  %freeze = freeze i32 %lshr
+  %ctlz = call i32 @llvm.ctlz.i32(i32 %freeze, i1 true)
+  store i32 %ctlz, ptr %p1, align 1
+  ; Here is another ctlz, which is used by a frozen select.
+  ; DAGCombiner::visitFREEZE will try to fold the freeze over a SELECT_CC,
+  ; and when dealing with the condition operand the other SELECT_CC operands
+  ; will be replaced/simplified as well. So the SELECT_CC is mutated while
+  ; freezing the "maybe poison operands". This needs to be handled by
+  ; DAGCombiner::visitFREEZE, as it can't store the list of SDValues that
+  ; should be frozen in a separate data structure that isn't updated when the
+  ; SELECT_CC is mutated.
+  %ctlz1 = call i32 @llvm.ctlz.i32(i32 %lshr, i1 true)
+  %icmp = icmp ne i32 %lshr, 0
+  %select = select i1 %icmp, i32 %ctlz1, i32 0
+  %freeze1 = freeze i32 %select
+  store i32 %freeze1, ptr %p2, align 1
+  ret void
+}

>From 6fd46089c9fbd5b22bb67ac3d6196fe70ba684c6 Mon Sep 17 00:00:00 2001
From: Abid Qadeer <haqadeer at amd.com>
Date: Wed, 21 Aug 2024 16:57:08 +0100
Subject: [PATCH 016/116] [flang][debug] Allow non default array lower bounds.
 (#104467)

As mentioned in #98877, we currently always use 1 as the lower bound for
fixed-size arrays. This PR removes this restriction. It passes the
`DeclareOp` along to the type conversion functions and uses its shift
information (if present) to get the lower bound value. This was suggested
by @jeanPerier in
https://github.com/llvm/llvm-project/pull/96746#issuecomment-2195164553
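A minimal C++ sketch of the lower-bound selection described above, assuming the shift operands have already been reduced to optional constants. `ShiftOperands`, `Subrange`, `lowerBoundFor` and `makeSubranges` are hypothetical names; the real code reads `declOp.getShift()` and calls `getIntIfConstant` on each value. The default of 1 is kept whenever there is no declare op, no shift for that dimension, or a non-constant shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the shift operands of a fircg.ext_declare:
// one entry per dimension, holding the lower bound if it is a compile-time
// constant and std::nullopt otherwise.
using ShiftOperands = std::vector<std::optional<std::int64_t>>;

// Mirrors the two fields filled in for each DISubrange.
struct Subrange {
  std::int64_t count;
  std::int64_t lowerBound;
};

// Default to 1 (the Fortran default lower bound); use the constant shift for
// this dimension when a declare op is available and provides one.
std::int64_t lowerBoundFor(const ShiftOperands *shift, std::size_t index) {
  std::int64_t lowerBound = 1;
  if (shift && shift->size() > index) {
    if (std::optional<std::int64_t> value = (*shift)[index])
      lowerBound = *value;
  }
  return lowerBound;
}

// One subrange per dimension, pairing each extent with its lower bound.
std::vector<Subrange> makeSubranges(const std::vector<std::int64_t> &extents,
                                    const ShiftOperands *shift) {
  std::vector<Subrange> subranges;
  for (std::size_t i = 0; i < extents.size(); ++i) {
    assert(extents[i] >= 0 && "fixed-size extents are non-negative");
    subranges.push_back({extents[i], lowerBoundFor(shift, i)});
  }
  return subranges;
}
```

For `real d3(-2:6, 0:5, 3:7)` from the updated test, the extents are 9, 6 and 5 and the shifts are -2, 0 and 3, which gives the three `DISubrange(count: ..., lowerBound: ...)` entries the CHECK lines expect.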

This PR also includes a small cleanup: the type conversion functions no
longer take a Location. It was initially added so that the location of
derived types could be passed, but that information can be extracted
from the typeInfo objects, so we don't need to pass it along.

This PR handles the problem for local and global variables. Derived
types may need a bit more work once support for them lands.

Fixes #98877.
---
 .../lib/Optimizer/Transforms/AddDebugInfo.cpp | 23 ++++----
 .../Transforms/DebugTypeGenerator.cpp         | 54 ++++++++++---------
 .../Optimizer/Transforms/DebugTypeGenerator.h | 26 ++++-----
 .../Integration/debug-fixed-array-type-2.f90  | 38 ++++++++-----
 .../Transforms/debug-fixed-array-type.fir     |  7 +++
 5 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/flang/lib/Optimizer/Transforms/AddDebugInfo.cpp b/flang/lib/Optimizer/Transforms/AddDebugInfo.cpp
index 3c067bf946cfc9..30fc4185575e61 100644
--- a/flang/lib/Optimizer/Transforms/AddDebugInfo.cpp
+++ b/flang/lib/Optimizer/Transforms/AddDebugInfo.cpp
@@ -65,7 +65,8 @@ class AddDebugInfoPass : public fir::impl::AddDebugInfoBase<AddDebugInfoPass> {
 
   void handleGlobalOp(fir::GlobalOp glocalOp, mlir::LLVM::DIFileAttr fileAttr,
                       mlir::LLVM::DIScopeAttr scope,
-                      mlir::SymbolTable *symbolTable);
+                      mlir::SymbolTable *symbolTable,
+                      fir::cg::XDeclareOp declOp);
   void handleFuncOp(mlir::func::FuncOp funcOp, mlir::LLVM::DIFileAttr fileAttr,
                     mlir::LLVM::DICompileUnitAttr cuAttr,
                     mlir::SymbolTable *symbolTable);
@@ -100,10 +101,9 @@ void AddDebugInfoPass::handleDeclareOp(fir::cg::XDeclareOp declOp,
 
   if (result.first != fir::NameUniquer::NameKind::VARIABLE)
     return;
-
   // If this DeclareOp actually represents a global then treat it as such.
   if (auto global = symbolTable->lookup<fir::GlobalOp>(declOp.getUniqName())) {
-    handleGlobalOp(global, fileAttr, scopeAttr, symbolTable);
+    handleGlobalOp(global, fileAttr, scopeAttr, symbolTable, declOp);
     return;
   }
 
@@ -127,7 +127,7 @@ void AddDebugInfoPass::handleDeclareOp(fir::cg::XDeclareOp declOp,
   }
 
   auto tyAttr = typeGen.convertType(fir::unwrapRefType(declOp.getType()),
-                                    fileAttr, scopeAttr, declOp.getLoc());
+                                    fileAttr, scopeAttr, declOp);
 
   auto localVarAttr = mlir::LLVM::DILocalVariableAttr::get(
       context, scopeAttr, mlir::StringAttr::get(context, result.second.name),
@@ -160,7 +160,8 @@ mlir::LLVM::DIModuleAttr AddDebugInfoPass::getOrCreateModuleAttr(
 void AddDebugInfoPass::handleGlobalOp(fir::GlobalOp globalOp,
                                       mlir::LLVM::DIFileAttr fileAttr,
                                       mlir::LLVM::DIScopeAttr scope,
-                                      mlir::SymbolTable *symbolTable) {
+                                      mlir::SymbolTable *symbolTable,
+                                      fir::cg::XDeclareOp declOp) {
   if (debugInfoIsAlreadySet(globalOp.getLoc()))
     return;
   mlir::ModuleOp module = getOperation();
@@ -200,8 +201,8 @@ void AddDebugInfoPass::handleGlobalOp(fir::GlobalOp globalOp,
     scope = getOrCreateModuleAttr(result.second.modules[0], fileAttr, scope,
                                   line - 1, !globalOp.isInitialized());
   }
-  mlir::LLVM::DITypeAttr diType = typeGen.convertType(
-      globalOp.getType(), fileAttr, scope, globalOp.getLoc());
+  mlir::LLVM::DITypeAttr diType =
+      typeGen.convertType(globalOp.getType(), fileAttr, scope, declOp);
   auto gvAttr = mlir::LLVM::DIGlobalVariableAttr::get(
       context, scope, mlir::StringAttr::get(context, result.second.name),
       mlir::StringAttr::get(context, globalOp.getName()), fileAttr, line,
@@ -246,12 +247,13 @@ void AddDebugInfoPass::handleFuncOp(mlir::func::FuncOp funcOp,
   llvm::SmallVector<mlir::LLVM::DITypeAttr> types;
   fir::DebugTypeGenerator typeGen(module);
   for (auto resTy : funcOp.getResultTypes()) {
-    auto tyAttr = typeGen.convertType(resTy, fileAttr, cuAttr, funcOp.getLoc());
+    auto tyAttr =
+        typeGen.convertType(resTy, fileAttr, cuAttr, /*declOp=*/nullptr);
     types.push_back(tyAttr);
   }
   for (auto inTy : funcOp.getArgumentTypes()) {
     auto tyAttr = typeGen.convertType(fir::unwrapRefType(inTy), fileAttr,
-                                      cuAttr, funcOp.getLoc());
+                                      cuAttr, /*declOp=*/nullptr);
     types.push_back(tyAttr);
   }
 
@@ -358,7 +360,8 @@ void AddDebugInfoPass::runOnOperation() {
   if (debugLevel == mlir::LLVM::DIEmissionKind::Full) {
     // Process 'GlobalOp' only if full debug info is requested.
     for (auto globalOp : module.getOps<fir::GlobalOp>())
-      handleGlobalOp(globalOp, fileAttr, cuAttr, &symbolTable);
+      handleGlobalOp(globalOp, fileAttr, cuAttr, &symbolTable,
+                     /*declOp=*/nullptr);
   }
 }
 
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
index db559731552df2..860c16c9a13ce9 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
@@ -83,8 +83,8 @@ static mlir::LLVM::DITypeAttr genPlaceholderType(mlir::MLIRContext *context) {
 
 mlir::LLVM::DITypeAttr DebugTypeGenerator::convertBoxedSequenceType(
     fir::SequenceType seqTy, mlir::LLVM::DIFileAttr fileAttr,
-    mlir::LLVM::DIScopeAttr scope, mlir::Location loc, bool genAllocated,
-    bool genAssociated) {
+    mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp,
+    bool genAllocated, bool genAssociated) {
 
   mlir::MLIRContext *context = module.getContext();
   // FIXME: Assumed rank arrays not supported yet
@@ -114,7 +114,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertBoxedSequenceType(
 
   llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
   mlir::LLVM::DITypeAttr elemTy =
-      convertType(seqTy.getEleTy(), fileAttr, scope, loc);
+      convertType(seqTy.getEleTy(), fileAttr, scope, declOp);
   unsigned offset = dimsOffset;
   const unsigned indexSize = dimsSize / 3;
   for ([[maybe_unused]] auto _ : seqTy.getShape()) {
@@ -156,13 +156,14 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertBoxedSequenceType(
 
 mlir::LLVM::DITypeAttr DebugTypeGenerator::convertSequenceType(
     fir::SequenceType seqTy, mlir::LLVM::DIFileAttr fileAttr,
-    mlir::LLVM::DIScopeAttr scope, mlir::Location loc) {
+    mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp) {
   mlir::MLIRContext *context = module.getContext();
 
   llvm::SmallVector<mlir::LLVM::DINodeAttr> elements;
   mlir::LLVM::DITypeAttr elemTy =
-      convertType(seqTy.getEleTy(), fileAttr, scope, loc);
+      convertType(seqTy.getEleTy(), fileAttr, scope, declOp);
 
+  unsigned index = 0;
   for (fir::SequenceType::Extent dim : seqTy.getShape()) {
     if (dim == seqTy.getUnknownExtent()) {
       // FIXME: This path is taken for assumed size arrays but also for arrays
@@ -174,20 +175,20 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertSequenceType(
       elements.push_back(subrangeTy);
     } else {
       auto intTy = mlir::IntegerType::get(context, 64);
-      // FIXME: Only supporting lower bound of 1 at the moment. The
-      // 'SequenceType' has information about the shape but not the shift. In
-      // cases where the conversion originated during the processing of
-      // 'DeclareOp', it may be possible to pass on this information. But the
-      // type conversion should ideally be based on what information present in
-      // the type class so that it works from everywhere (e.g. when it is part
-      // of a module or a derived type.)
+      int64_t shift = 1;
+      if (declOp && declOp.getShift().size() > index) {
+        if (std::optional<std::int64_t> optint =
+                getIntIfConstant(declOp.getShift()[index]))
+          shift = *optint;
+      }
       auto countAttr = mlir::IntegerAttr::get(intTy, llvm::APInt(64, dim));
-      auto lowerAttr = mlir::IntegerAttr::get(intTy, llvm::APInt(64, 1));
+      auto lowerAttr = mlir::IntegerAttr::get(intTy, llvm::APInt(64, shift));
       auto subrangeTy = mlir::LLVM::DISubrangeAttr::get(
           context, countAttr, lowerAttr, /*upperBound=*/nullptr,
           /*stride=*/nullptr);
       elements.push_back(subrangeTy);
     }
+    ++index;
   }
   // Apart from arrays, the `DICompositeTypeAttr` is used for other things like
   // structure types. Many of its fields which are not applicable to arrays
@@ -203,7 +204,8 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertSequenceType(
 
 mlir::LLVM::DITypeAttr DebugTypeGenerator::convertCharacterType(
     fir::CharacterType charTy, mlir::LLVM::DIFileAttr fileAttr,
-    mlir::LLVM::DIScopeAttr scope, mlir::Location loc, bool hasDescriptor) {
+    mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp,
+    bool hasDescriptor) {
   mlir::MLIRContext *context = module.getContext();
 
   // DWARF 5 says the following about the character encoding in 5.1.1.2.
@@ -250,21 +252,21 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertCharacterType(
 
 mlir::LLVM::DITypeAttr DebugTypeGenerator::convertPointerLikeType(
     mlir::Type elTy, mlir::LLVM::DIFileAttr fileAttr,
-    mlir::LLVM::DIScopeAttr scope, mlir::Location loc, bool genAllocated,
-    bool genAssociated) {
+    mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp,
+    bool genAllocated, bool genAssociated) {
   mlir::MLIRContext *context = module.getContext();
 
   // Arrays and character need different treatment because DWARF have special
   // constructs for them to get the location from the descriptor. Rest of
   // types are handled like pointer to underlying type.
   if (auto seqTy = mlir::dyn_cast_or_null<fir::SequenceType>(elTy))
-    return convertBoxedSequenceType(seqTy, fileAttr, scope, loc, genAllocated,
-                                    genAssociated);
+    return convertBoxedSequenceType(seqTy, fileAttr, scope, declOp,
+                                    genAllocated, genAssociated);
   if (auto charTy = mlir::dyn_cast_or_null<fir::CharacterType>(elTy))
-    return convertCharacterType(charTy, fileAttr, scope, loc,
+    return convertCharacterType(charTy, fileAttr, scope, declOp,
                                 /*hasDescriptor=*/true);
 
-  mlir::LLVM::DITypeAttr elTyAttr = convertType(elTy, fileAttr, scope, loc);
+  mlir::LLVM::DITypeAttr elTyAttr = convertType(elTy, fileAttr, scope, declOp);
 
   return mlir::LLVM::DIDerivedTypeAttr::get(
       context, llvm::dwarf::DW_TAG_pointer_type,
@@ -276,7 +278,7 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertPointerLikeType(
 mlir::LLVM::DITypeAttr
 DebugTypeGenerator::convertType(mlir::Type Ty, mlir::LLVM::DIFileAttr fileAttr,
                                 mlir::LLVM::DIScopeAttr scope,
-                                mlir::Location loc) {
+                                fir::cg::XDeclareOp declOp) {
   mlir::MLIRContext *context = module.getContext();
   if (Ty.isInteger()) {
     return genBasicType(context, mlir::StringAttr::get(context, "integer"),
@@ -306,22 +308,22 @@ DebugTypeGenerator::convertType(mlir::Type Ty, mlir::LLVM::DIFileAttr fileAttr,
     return genBasicType(context, mlir::StringAttr::get(context, "complex"),
                         bitWidth * 2, llvm::dwarf::DW_ATE_complex_float);
   } else if (auto seqTy = mlir::dyn_cast_or_null<fir::SequenceType>(Ty)) {
-    return convertSequenceType(seqTy, fileAttr, scope, loc);
+    return convertSequenceType(seqTy, fileAttr, scope, declOp);
   } else if (auto charTy = mlir::dyn_cast_or_null<fir::CharacterType>(Ty)) {
-    return convertCharacterType(charTy, fileAttr, scope, loc,
+    return convertCharacterType(charTy, fileAttr, scope, declOp,
                                 /*hasDescriptor=*/false);
   } else if (auto boxTy = mlir::dyn_cast_or_null<fir::BoxType>(Ty)) {
     auto elTy = boxTy.getElementType();
     if (auto seqTy = mlir::dyn_cast_or_null<fir::SequenceType>(elTy))
-      return convertBoxedSequenceType(seqTy, fileAttr, scope, loc, false,
+      return convertBoxedSequenceType(seqTy, fileAttr, scope, declOp, false,
                                       false);
     if (auto heapTy = mlir::dyn_cast_or_null<fir::HeapType>(elTy))
       return convertPointerLikeType(heapTy.getElementType(), fileAttr, scope,
-                                    loc, /*genAllocated=*/true,
+                                    declOp, /*genAllocated=*/true,
                                     /*genAssociated=*/false);
     if (auto ptrTy = mlir::dyn_cast_or_null<fir::PointerType>(elTy))
       return convertPointerLikeType(ptrTy.getElementType(), fileAttr, scope,
-                                    loc, /*genAllocated=*/false,
+                                    declOp, /*genAllocated=*/false,
                                     /*genAssociated=*/true);
     return genPlaceholderType(context);
   } else {
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
index ec881e8be7cadc..5ab6ca5e9f880e 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
@@ -13,6 +13,7 @@
 #ifndef FORTRAN_OPTIMIZER_TRANSFORMS_DEBUGTYPEGENERATOR_H
 #define FORTRAN_OPTIMIZER_TRANSFORMS_DEBUGTYPEGENERATOR_H
 
+#include "flang/Optimizer/CodeGen/CGOps.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/Support/FIRContext.h"
 #include "flang/Optimizer/Dialect/Support/KindMapping.h"
@@ -28,33 +29,34 @@ class DebugTypeGenerator {
   mlir::LLVM::DITypeAttr convertType(mlir::Type Ty,
                                      mlir::LLVM::DIFileAttr fileAttr,
                                      mlir::LLVM::DIScopeAttr scope,
-                                     mlir::Location loc);
+                                     fir::cg::XDeclareOp declOp);
 
 private:
   mlir::LLVM::DITypeAttr convertSequenceType(fir::SequenceType seqTy,
                                              mlir::LLVM::DIFileAttr fileAttr,
                                              mlir::LLVM::DIScopeAttr scope,
-                                             mlir::Location loc);
+                                             fir::cg::XDeclareOp declOp);
 
   /// The 'genAllocated' is true when we want to generate 'allocated' field
   /// in the DICompositeType. It is needed for the allocatable arrays.
   /// Similarly, 'genAssociated' is used with 'pointer' type to generate
   /// 'associated' field.
-  mlir::LLVM::DITypeAttr
-  convertBoxedSequenceType(fir::SequenceType seqTy,
-                           mlir::LLVM::DIFileAttr fileAttr,
-                           mlir::LLVM::DIScopeAttr scope, mlir::Location loc,
-                           bool genAllocated, bool genAssociated);
+  mlir::LLVM::DITypeAttr convertBoxedSequenceType(
+      fir::SequenceType seqTy, mlir::LLVM::DIFileAttr fileAttr,
+      mlir::LLVM::DIScopeAttr scope, fir::cg::XDeclareOp declOp,
+      bool genAllocated, bool genAssociated);
   mlir::LLVM::DITypeAttr convertCharacterType(fir::CharacterType charTy,
                                               mlir::LLVM::DIFileAttr fileAttr,
                                               mlir::LLVM::DIScopeAttr scope,
-                                              mlir::Location loc,
+                                              fir::cg::XDeclareOp declOp,
                                               bool hasDescriptor);
 
-  mlir::LLVM::DITypeAttr
-  convertPointerLikeType(mlir::Type elTy, mlir::LLVM::DIFileAttr fileAttr,
-                         mlir::LLVM::DIScopeAttr scope, mlir::Location loc,
-                         bool genAllocated, bool genAssociated);
+  mlir::LLVM::DITypeAttr convertPointerLikeType(mlir::Type elTy,
+                                                mlir::LLVM::DIFileAttr fileAttr,
+                                                mlir::LLVM::DIScopeAttr scope,
+                                                fir::cg::XDeclareOp declOp,
+                                                bool genAllocated,
+                                                bool genAssociated);
 
   mlir::ModuleOp module;
   KindMapping kindMapping;
diff --git a/flang/test/Integration/debug-fixed-array-type-2.f90 b/flang/test/Integration/debug-fixed-array-type-2.f90
index b34413458ad8d3..705c1da593c705 100644
--- a/flang/test/Integration/debug-fixed-array-type-2.f90
+++ b/flang/test/Integration/debug-fixed-array-type-2.f90
@@ -1,19 +1,22 @@
 ! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o - | FileCheck %s
 
-program mn
-
+module test
   integer d1(3)
-  integer d2(2, 5)
-  real d3(6, 8, 7)
+  integer d2(1:4, -1:3)
+  real d3(-2:6, 0:5, 3:7)
+end
+
+program mn
+  use test
 
   i8 = fn1(d1, d2, d3)
 contains
   function fn1(a1, b1, c1) result (res)
     integer a1(3)
-    integer b1(2, 5)
-    real c1(6, 8, 7)
+    integer b1(-1:0, 5:9)
+    real c1(-2:6, 0:5, 3:7)
     integer res
-    res = a1(1) + b1(1,2) + c1(3, 3, 4)
+    res = a1(1) + b1(0,6) + c1(3, 3, 4)
   end function
 
 end program
@@ -24,17 +27,26 @@ function fn1(a1, b1, c1) result (res)
 ! CHECK-DAG: ![[SUB1:.*]] = !{![[R1]]}
 ! CHECK-DAG: ![[D1TY:.*]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[INT]], elements: ![[SUB1]])
 
-! CHECK-DAG: ![[R21:.*]] = !DISubrange(count: 2, lowerBound: 1)
-! CHECK-DAG: ![[R22:.*]] = !DISubrange(count: 5, lowerBound: 1)
+! CHECK-DAG: ![[R21:.*]] = !DISubrange(count: 4, lowerBound: 1)
+! CHECK-DAG: ![[R22:.*]] = !DISubrange(count: 5, lowerBound: -1)
 ! CHECK-DAG: ![[SUB2:.*]] = !{![[R21]], ![[R22]]}
 ! CHECK-DAG: ![[D2TY:.*]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[INT]], elements: ![[SUB2]])
 
-! CHECK-DAG: ![[R31:.*]] = !DISubrange(count: 6, lowerBound: 1)
-! CHECK-DAG: ![[R32:.*]] = !DISubrange(count: 8, lowerBound: 1)
-! CHECK-DAG: ![[R33:.*]] = !DISubrange(count: 7, lowerBound: 1)
+! CHECK-DAG: ![[R31:.*]] = !DISubrange(count: 9, lowerBound: -2)
+! CHECK-DAG: ![[R32:.*]] = !DISubrange(count: 6, lowerBound: 0)
+! CHECK-DAG: ![[R33:.*]] = !DISubrange(count: 5, lowerBound: 3)
 ! CHECK-DAG: ![[SUB3:.*]] = !{![[R31]], ![[R32]], ![[R33]]}
 ! CHECK-DAG: ![[D3TY:.*]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[REAL]], elements: ![[SUB3]])
 
+! CHECK-DAG: ![[B11:.*]] = !DISubrange(count: 2, lowerBound: -1)
+! CHECK-DAG: ![[B12:.*]] = !DISubrange(count: 5, lowerBound: 5)
+! CHECK-DAG: ![[B1:.*]] = !{![[B11]], ![[B12]]}
+! CHECK-DAG: ![[B1TY:.*]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[INT]], elements: ![[B1]])
+
+! CHECK-DAG: {{.*}}!DIGlobalVariable(name: "d1"{{.*}}type: ![[D1TY]]{{.*}})
+! CHECK-DAG: {{.*}}!DIGlobalVariable(name: "d2"{{.*}}type: ![[D2TY]]{{.*}})
+! CHECK-DAG: {{.*}}!DIGlobalVariable(name: "d3"{{.*}}type: ![[D3TY]]{{.*}})
+
 ! CHECK-DAG: !DILocalVariable(name: "a1", arg: 1{{.*}}type: ![[D1TY]])
-! CHECK-DAG: !DILocalVariable(name: "b1", arg: 2{{.*}}type: ![[D2TY]])
+! CHECK-DAG: !DILocalVariable(name: "b1", arg: 2{{.*}}type: ![[B1TY]])
 ! CHECK-DAG: !DILocalVariable(name: "c1", arg: 3{{.*}}type: ![[D3TY]])
diff --git a/flang/test/Transforms/debug-fixed-array-type.fir b/flang/test/Transforms/debug-fixed-array-type.fir
index d4ed0b97020898..1a7d8115908a07 100644
--- a/flang/test/Transforms/debug-fixed-array-type.fir
+++ b/flang/test/Transforms/debug-fixed-array-type.fir
@@ -8,12 +8,16 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<>} {
     %c5 = arith.constant 5 : index
     %c2 = arith.constant 2 : index
     %c3 = arith.constant 3 : index
+    %c-2 = arith.constant -2 : index loc(#loc3)
+    %c4 = arith.constant 4 : index loc(#loc3)
     %0 = fir.alloca !fir.array<3xi32> {bindc_name = "d1", uniq_name = "_QFEd1"}
     %1 = fircg.ext_declare %0(%c3) {uniq_name = "_QFEd1"} : (!fir.ref<!fir.array<3xi32>>, index) -> !fir.ref<!fir.array<3xi32>> loc(#loc1)
     %2 = fir.address_of(@_QFEd2) : !fir.ref<!fir.array<2x5xi32>>
     %3 = fircg.ext_declare %2(%c2, %c5) {uniq_name = "_QFEd2"} : (!fir.ref<!fir.array<2x5xi32>>, index, index) -> !fir.ref<!fir.array<2x5xi32>> loc(#loc2)
     %4 = fir.address_of(@_QFEd3) : !fir.ref<!fir.array<6x8x7xf32>>
     %5 = fircg.ext_declare %4(%c6, %c8, %c7) {uniq_name = "_QFEd3"} : (!fir.ref<!fir.array<6x8x7xf32>>, index, index, index) -> !fir.ref<!fir.array<6x8x7xf32>> loc(#loc3)
+    %6 = fir.address_of(@_QFEd4) : !fir.ref<!fir.array<6x7xi32>>
+    %7 = fircg.ext_declare %6(%c6, %c7) origin %c-2, %c4 {uniq_name = "_QFEd4"} : (!fir.ref<!fir.array<6x7xi32>>, index, index, index, index) -> !fir.ref<!fir.array<6x7xi32>> loc(#loc5)
     return
   } loc(#loc4)
 }
@@ -22,6 +26,7 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<>} {
 #loc2 = loc("test.f90":6:11)
 #loc3 = loc("test.f90":7:11)
 #loc4 = loc("test.f90":2:8)
+#loc5 = loc("test.f90":8:11)
 
 
 // CHECK-DAG: #[[INT:.*]] = #llvm.di_basic_type<tag = DW_TAG_base_type, name = "integer", sizeInBits = 32, encoding = DW_ATE_signed>
@@ -29,6 +34,8 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<>} {
 // CHECK-DAG: #[[D1TY:.*]] = #llvm.di_composite_type<tag = DW_TAG_array_type{{.*}}baseType = #[[INT]], elements = #llvm.di_subrange<count = 3 : i64, lowerBound = 1 : i64>>
 // CHECK-DAG: #[[D2TY:.*]] = #llvm.di_composite_type<tag = DW_TAG_array_type{{.*}}baseType = #[[INT]], elements = #llvm.di_subrange<count = 2 : i64, lowerBound = 1 : i64>, #llvm.di_subrange<count = 5 : i64, lowerBound = 1 : i64>>
 // CHECK-DAG: #[[D3TY:.*]] = #llvm.di_composite_type<tag = DW_TAG_array_type{{.*}}baseType = #[[REAL]], elements = #llvm.di_subrange<count = 6 : i64, lowerBound = 1 : i64>, #llvm.di_subrange<count = 8 : i64, lowerBound = 1 : i64>, #llvm.di_subrange<count = 7 : i64, lowerBound = 1 : i64>>
+// CHECK-DAG: #[[D4TY:.*]] = #llvm.di_composite_type<tag = DW_TAG_array_type, baseType = #di_basic_type, elements = #llvm.di_subrange<count = 6 : i64, lowerBound = -2 : i64>, #llvm.di_subrange<count = 7 : i64, lowerBound = 4 : i64>>
 // CHECK-DAG: #llvm.di_local_variable<{{.*}}name = "d1"{{.*}}type = #[[D1TY]]>
 // CHECK-DAG: #llvm.di_local_variable<{{.*}}name = "d2"{{.*}}type = #[[D2TY]]>
 // CHECK-DAG: #llvm.di_local_variable<{{.*}}name = "d3"{{.*}}type = #[[D3TY]]>
+// CHECK-DAG: #llvm.di_local_variable<{{.*}}name = "d4"{{.*}}type = #[[D4TY]]>

>From 839275d0536f992591f4c5d81e13a26e6095dda6 Mon Sep 17 00:00:00 2001
From: Sergio Afonso <safonsof at amd.com>
Date: Wed, 21 Aug 2024 16:57:31 +0100
Subject: [PATCH 017/116] [MLIR][OpenMP] Add missing OpenMP to LLVM conversion
 patterns (#104440)

This patch adds conversion patterns to LLVM for the following OpenMP
dialect operations:
  - `omp.critical.declare`
  - `omp.cancel`
  - `omp.cancellation_point`
  - `omp.distribute`
  - `omp.teams`
  - `omp.ordered`
  - `omp.taskloop`

Also, the arbitrary ordering of operations in the template argument
lists used when configuring this pass is replaced by alphabetical
ordering.
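The patch lists each newly supported operation both in one of the `addDynamicallyLegalOp` groups of `configureOpenMPToLLVMConversionLegality` and in the pattern list. The two legality rules can be modeled roughly as follows. This is a hypothetical toy, not the MLIR API: types are plain strings, `ToyConverter` stands in for `LLVMTypeConverter`, and a region is reduced to the argument types of its blocks (approximately what `TypeConverter::isLegal(Region *)` inspects). Only the region check visible in the excerpt is modeled for region-carrying ops.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy converter: a type is legal if it is in `legalTypes`.
struct ToyConverter {
  std::set<std::string> legalTypes;

  bool isLegal(const std::string &ty) const {
    return legalTypes.count(ty) != 0;
  }
  bool isLegal(const std::vector<std::string> &tys) const {
    return std::all_of(tys.begin(), tys.end(),
                       [&](const std::string &t) { return isLegal(t); });
  }
};

// A region is modeled only by the argument types of each of its blocks.
using ToyRegion = std::vector<std::vector<std::string>>;

struct ToyOp {
  std::vector<std::string> operandTypes;
  std::vector<std::string> resultTypes;
  std::vector<ToyRegion> regions;
};

// Rule for region-less ops such as omp.cancel: legal once all operand and
// result types are legal.
bool regionLessOpIsLegal(const ToyOp &op, const ToyConverter &c) {
  return c.isLegal(op.operandTypes) && c.isLegal(op.resultTypes);
}

// Region check for ops such as omp.teams: every block argument in every
// region must already have a legal type.
bool regionsAreLegal(const ToyOp &op, const ToyConverter &c) {
  return std::all_of(
      op.regions.begin(), op.regions.end(), [&](const ToyRegion &region) {
        return std::all_of(region.begin(), region.end(),
                           [&](const std::vector<std::string> &blockArgs) {
                             return c.isLegal(blockArgs);
                           });
      });
}
```

For example, with only `i32` and `!llvm.ptr` treated as legal, an operand-free op is legal while one with a `memref<?xf32>` operand is not, and a region whose block takes an `index` argument keeps the enclosing op illegal in this model.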
---
 .../Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp  | 69 ++++++++------
 .../OpenMPToLLVM/convert-to-llvmir.mlir       | 94 +++++++++++++++++++
 2 files changed, 132 insertions(+), 31 deletions(-)

diff --git a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
index f6a6d1d7228a06..d6b4ec8584b082 100644
--- a/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
+++ b/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
@@ -223,22 +223,21 @@ void MultiRegionOpConversion<omp::PrivateClauseOp>::forwardOpAttrs(
 void mlir::configureOpenMPToLLVMConversionLegality(
     ConversionTarget &target, LLVMTypeConverter &typeConverter) {
   target.addDynamicallyLegalOp<
-      mlir::omp::AtomicReadOp, mlir::omp::AtomicWriteOp, mlir::omp::FlushOp,
-      mlir::omp::ThreadprivateOp, mlir::omp::YieldOp,
-      mlir::omp::TargetEnterDataOp, mlir::omp::TargetExitDataOp,
-      mlir::omp::TargetUpdateOp, mlir::omp::MapBoundsOp, mlir::omp::MapInfoOp>(
-      [&](Operation *op) {
-        return typeConverter.isLegal(op->getOperandTypes()) &&
-               typeConverter.isLegal(op->getResultTypes());
-      });
+      omp::AtomicReadOp, omp::AtomicWriteOp, omp::CancellationPointOp,
+      omp::CancelOp, omp::CriticalDeclareOp, omp::FlushOp, omp::MapBoundsOp,
+      omp::MapInfoOp, omp::OrderedOp, omp::TargetEnterDataOp,
+      omp::TargetExitDataOp, omp::TargetUpdateOp, omp::ThreadprivateOp,
+      omp::YieldOp>([&](Operation *op) {
+    return typeConverter.isLegal(op->getOperandTypes()) &&
+           typeConverter.isLegal(op->getResultTypes());
+  });
   target.addDynamicallyLegalOp<
-      mlir::omp::AtomicUpdateOp, mlir::omp::CriticalOp, mlir::omp::TargetOp,
-      mlir::omp::TargetDataOp, mlir::omp::LoopNestOp,
-      mlir::omp::OrderedRegionOp, mlir::omp::ParallelOp, mlir::omp::WsloopOp,
-      mlir::omp::SimdOp, mlir::omp::MasterOp, mlir::omp::SectionOp,
-      mlir::omp::SectionsOp, mlir::omp::SingleOp, mlir::omp::TaskgroupOp,
-      mlir::omp::TaskOp, mlir::omp::DeclareReductionOp,
-      mlir::omp::PrivateClauseOp>([&](Operation *op) {
+      omp::AtomicUpdateOp, omp::CriticalOp, omp::DeclareReductionOp,
+      omp::DistributeOp, omp::LoopNestOp, omp::MasterOp, omp::OrderedRegionOp,
+      omp::ParallelOp, omp::PrivateClauseOp, omp::SectionOp, omp::SectionsOp,
+      omp::SimdOp, omp::SingleOp, omp::TargetDataOp, omp::TargetOp,
+      omp::TaskgroupOp, omp::TaskloopOp, omp::TaskOp, omp::TeamsOp,
+      omp::WsloopOp>([&](Operation *op) {
     return std::all_of(op->getRegions().begin(), op->getRegions().end(),
                        [&](Region &region) {
                          return typeConverter.isLegal(&region);
@@ -260,23 +259,31 @@ void mlir::populateOpenMPToLLVMConversionPatterns(LLVMTypeConverter &converter,
       AtomicReadOpConversion, MapInfoOpConversion,
       MultiRegionOpConversion<omp::DeclareReductionOp>,
       MultiRegionOpConversion<omp::PrivateClauseOp>,
-      RegionOpConversion<omp::CriticalOp>, RegionOpConversion<omp::LoopNestOp>,
-      RegionOpConversion<omp::MasterOp>,
-      RegionOpConversion<omp::OrderedRegionOp>,
-      RegionOpConversion<omp::ParallelOp>, RegionOpConversion<omp::WsloopOp>,
-      RegionOpConversion<omp::SectionsOp>, RegionOpConversion<omp::SectionOp>,
-      RegionOpConversion<omp::SimdOp>, RegionOpConversion<omp::SingleOp>,
-      RegionOpConversion<omp::TaskgroupOp>, RegionOpConversion<omp::TaskOp>,
-      RegionOpConversion<omp::TargetDataOp>, RegionOpConversion<omp::TargetOp>,
-      RegionLessOpWithVarOperandsConversion<omp::AtomicWriteOp>,
-      RegionOpWithVarOperandsConversion<omp::AtomicUpdateOp>,
-      RegionLessOpWithVarOperandsConversion<omp::FlushOp>,
-      RegionLessOpWithVarOperandsConversion<omp::ThreadprivateOp>,
-      RegionLessOpConversion<omp::YieldOp>,
+      RegionLessOpConversion<omp::CancellationPointOp>,
+      RegionLessOpConversion<omp::CancelOp>,
+      RegionLessOpConversion<omp::CriticalDeclareOp>,
+      RegionLessOpConversion<omp::OrderedOp>,
       RegionLessOpConversion<omp::TargetEnterDataOp>,
       RegionLessOpConversion<omp::TargetExitDataOp>,
       RegionLessOpConversion<omp::TargetUpdateOp>,
-      RegionLessOpWithVarOperandsConversion<omp::MapBoundsOp>>(converter);
+      RegionLessOpConversion<omp::YieldOp>,
+      RegionLessOpWithVarOperandsConversion<omp::AtomicWriteOp>,
+      RegionLessOpWithVarOperandsConversion<omp::FlushOp>,
+      RegionLessOpWithVarOperandsConversion<omp::MapBoundsOp>,
+      RegionLessOpWithVarOperandsConversion<omp::ThreadprivateOp>,
+      RegionOpConversion<omp::AtomicCaptureOp>,
+      RegionOpConversion<omp::CriticalOp>,
+      RegionOpConversion<omp::DistributeOp>,
+      RegionOpConversion<omp::LoopNestOp>, RegionOpConversion<omp::MaskedOp>,
+      RegionOpConversion<omp::MasterOp>,
+      RegionOpConversion<omp::OrderedRegionOp>,
+      RegionOpConversion<omp::ParallelOp>, RegionOpConversion<omp::SectionOp>,
+      RegionOpConversion<omp::SectionsOp>, RegionOpConversion<omp::SimdOp>,
+      RegionOpConversion<omp::SingleOp>, RegionOpConversion<omp::TargetDataOp>,
+      RegionOpConversion<omp::TargetOp>, RegionOpConversion<omp::TaskgroupOp>,
+      RegionOpConversion<omp::TaskloopOp>, RegionOpConversion<omp::TaskOp>,
+      RegionOpConversion<omp::TeamsOp>, RegionOpConversion<omp::WsloopOp>,
+      RegionOpWithVarOperandsConversion<omp::AtomicUpdateOp>>(converter);
 }
 
 namespace {
@@ -301,8 +308,8 @@ void ConvertOpenMPToLLVMPass::runOnOperation() {
   populateOpenMPToLLVMConversionPatterns(converter, patterns);
 
   LLVMConversionTarget target(getContext());
-  target.addLegalOp<omp::TerminatorOp, omp::TaskyieldOp, omp::FlushOp,
-                    omp::BarrierOp, omp::TaskwaitOp>();
+  target.addLegalOp<omp::BarrierOp, omp::FlushOp, omp::TaskwaitOp,
+                    omp::TaskyieldOp, omp::TerminatorOp>();
   configureOpenMPToLLVMConversionLegality(target, converter);
   if (failed(applyPartialConversion(module, target, std::move(patterns))))
     signalPassFailure();
diff --git a/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir b/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
index d81487daf34f68..5afdbaa2a56af3 100644
--- a/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
+++ b/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir
@@ -18,6 +18,20 @@ func.func @critical_block_arg() {
 
 // -----
 
+// CHECK: omp.critical.declare @[[MUTEX:.*]] hint(contended, speculative)
+omp.critical.declare @mutex hint(contended, speculative)
+
+// CHECK: llvm.func @critical_declare
+func.func @critical_declare() {
+  // CHECK: omp.critical(@[[MUTEX]])
+  omp.critical(@mutex) {
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
 // CHECK-LABEL: llvm.func @master_block_arg
 func.func @master_block_arg() {
   // CHECK: omp.master
@@ -523,3 +537,83 @@ omp.private {type = firstprivate} @y.privatizer : index alloc {
   // CHECK: omp.yield(%arg0 : i64)
   omp.yield(%arg0 : index)
 }
+
+// -----
+
+// CHECK-LABEL: llvm.func @omp_cancel_cancellation_point()
+func.func @omp_cancel_cancellation_point() -> () {
+  omp.parallel {
+    // CHECK: omp.cancel cancellation_construct_type(parallel)
+    omp.cancel cancellation_construct_type(parallel)
+    // CHECK: omp.cancellation_point cancellation_construct_type(parallel)
+    omp.cancellation_point cancellation_construct_type(parallel)
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL: llvm.func @omp_distribute(
+// CHECK-SAME:  %[[ARG0:.*]]: i64)
+func.func @omp_distribute(%arg0 : index) -> () {
+  // CHECK: omp.distribute dist_schedule_static dist_schedule_chunk_size(%[[ARG0]] : i64) {
+  omp.distribute dist_schedule_static dist_schedule_chunk_size(%arg0 : index) {
+    omp.loop_nest (%iv) : index = (%arg0) to (%arg0) step (%arg0) {
+      omp.yield
+    }
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL: llvm.func @omp_teams(
+// CHECK-SAME:  %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: i64)
+func.func @omp_teams(%arg0 : memref<i32>) -> () {
+  // CHECK: omp.teams allocate(%{{.*}} : !llvm.struct<(ptr, ptr, i64)> -> %{{.*}} : !llvm.struct<(ptr, ptr, i64)>)
+  omp.teams allocate(%arg0 : memref<i32> -> %arg0 : memref<i32>) {
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL: llvm.func @omp_ordered(
+// CHECK-SAME:  %[[ARG0:.*]]: i64)
+func.func @omp_ordered(%arg0 : index) -> () {
+  omp.wsloop ordered(1) {
+    omp.loop_nest (%iv) : index = (%arg0) to (%arg0) step (%arg0) {
+      // CHECK: omp.ordered depend_vec(%[[ARG0]] : i64) {doacross_num_loops = 1 : i64}
+      omp.ordered depend_vec(%arg0 : index) {doacross_num_loops = 1 : i64}
+      omp.yield
+    }
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL: @omp_taskloop(
+// CHECK-SAME:  %[[ARG0:.*]]: i64, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: !llvm.ptr, %[[ARG3:.*]]: i64)
+func.func @omp_taskloop(%arg0: index, %arg1 : memref<i32>) {
+  // CHECK: omp.parallel {
+  omp.parallel {
+    // CHECK: omp.taskloop allocate(%{{.*}} : !llvm.struct<(ptr, ptr, i64)> -> %{{.*}} : !llvm.struct<(ptr, ptr, i64)>) {
+    omp.taskloop allocate(%arg1 : memref<i32> -> %arg1 : memref<i32>) {
+      // CHECK: omp.loop_nest (%[[IV:.*]]) : i64 = (%[[ARG0]]) to (%[[ARG0]]) step (%[[ARG0]]) {
+      omp.loop_nest (%iv) : index = (%arg0) to (%arg0) step (%arg0) {
+        // CHECK-DAG: %[[CAST_IV:.*]] = builtin.unrealized_conversion_cast %[[IV]] : i64 to index
+        // CHECK: "test.payload"(%[[CAST_IV]]) : (index) -> ()
+        "test.payload"(%iv) : (index) -> ()
+        omp.yield
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}

From 6816a137985bfa38cda20b9cd4e23c361c3bd0de Mon Sep 17 00:00:00 2001
From: Tom Eccles <tom.eccles at arm.com>
Date: Wed, 21 Aug 2024 16:58:57 +0100
Subject: [PATCH 018/116] [flang][Driver] Remove misleading test comment
 (#105528)

The test initially worked on ArmPL but this was changed during code
review and I neglected to fix this comment.

Thanks for pointing this out @banach-space
---
 flang/test/Driver/fveclib-codegen.f90 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/flang/test/Driver/fveclib-codegen.f90 b/flang/test/Driver/fveclib-codegen.f90
index 3a96c29ac70854..3720b9e597f5b5 100644
--- a/flang/test/Driver/fveclib-codegen.f90
+++ b/flang/test/Driver/fveclib-codegen.f90
@@ -1,5 +1,4 @@
 ! test that -fveclib= is passed to the backend
-! -target aarch64 so that ArmPL is available
 ! RUN: %if aarch64-registered-target %{ %flang -S -Ofast -target aarch64-unknown-linux-gnu -fveclib=LIBMVEC -o - %s | FileCheck %s %}
 ! RUN: %if x86-registered-target %{ %flang -S -Ofast -target x86_64-unknown-linux-gnu -fveclib=LIBMVEC -o - %s | FileCheck %s %}
 ! RUN: %flang -S -Ofast -fveclib=NoLibrary -o - %s | FileCheck %s --check-prefix=NOLIB

From e49068624c48f4d906707b32b31f6a1d561605be Mon Sep 17 00:00:00 2001
From: Alex Rice <alexrice999 at hotmail.co.uk>
Date: Wed, 21 Aug 2024 17:14:33 +0100
Subject: [PATCH 019/116] [mlir] [tablegen] Make `hasSummary` and
 `hasDescription` useful (#105531)

The `hasSummary` and `hasDescription` functions are currently useless as
they check if the corresponding `summary` and `description` are present.
However, these values are set to a default value of `""`, and so these
functions always return true.

This PR changes these functions to check whether the summary and description
contain anything other than whitespace, which is presumably closer to their
original intent.
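A minimal standalone sketch of the new check, assuming C++17 and using `std::string_view` in place of `llvm::StringRef`. The names `trimWhitespace` and `hasText` are hypothetical and not part of the TableGen API; the whitespace set mirrors the default argument of `StringRef::trim()`. It shows why the default `""` no longer counts as a summary or description:

```cpp
#include <string_view>

// Whitespace set matching the default of llvm::StringRef::trim().
constexpr std::string_view kWhitespace = " \t\n\v\f\r";

// Hypothetical stand-in for StringRef::trim(): drop leading and trailing
// whitespace, returning an empty view if nothing else remains.
constexpr std::string_view trimWhitespace(std::string_view s) {
  const auto first = s.find_first_not_of(kWhitespace);
  if (first == std::string_view::npos)
    return {};
  const auto last = s.find_last_not_of(kWhitespace);
  return s.substr(first, last - first + 1);
}

// Mirrors the new hasSummary()/hasDescription() logic: a field left at the
// default "" (or containing only whitespace) does not count as present.
constexpr bool hasText(std::string_view field) {
  return !trimWhitespace(field).empty();
}

static_assert(!hasText(""));       // TableGen default value
static_assert(!hasText("  \n\t")); // whitespace only
static_assert(hasText("  Adds two integers.  "));
```

Under the old presence check all three inputs would have been reported as "present", because the field always exists with at least the default value.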

@math-fehr
@zero9178
---
 mlir/lib/TableGen/Operator.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mlir/lib/TableGen/Operator.cpp b/mlir/lib/TableGen/Operator.cpp
index bd3e3b1c6b7ccf..76af82a827da13 100644
--- a/mlir/lib/TableGen/Operator.cpp
+++ b/mlir/lib/TableGen/Operator.cpp
@@ -798,14 +798,14 @@ const InferredResultType &Operator::getInferredResultType(int index) const {
 ArrayRef<SMLoc> Operator::getLoc() const { return def.getLoc(); }
 
 bool Operator::hasDescription() const {
-  return def.getValue("description") != nullptr;
+  return !getDescription().trim().empty();
 }
 
 StringRef Operator::getDescription() const {
   return def.getValueAsString("description");
 }
 
-bool Operator::hasSummary() const { return def.getValue("summary") != nullptr; }
+bool Operator::hasSummary() const { return !getSummary().trim().empty(); }
 
 StringRef Operator::getSummary() const {
   return def.getValueAsString("summary");

From 625841c3be4dbaab089c01217726a2906f3a8103 Mon Sep 17 00:00:00 2001
From: magic-akari <akari.ccino at gmail.com>
Date: Thu, 22 Aug 2024 00:22:21 +0800
Subject: [PATCH 020/116] [clang-format] Use double hyphen for multiple-letter
 flags (#100978)

- Closes: #100974
---
 clang/tools/clang-format/clang-format-diff.py    |  8 ++++----
 clang/tools/clang-format/clang-format-sublime.py |  8 ++++----
 clang/tools/clang-format/clang-format.el         | 14 +++++++-------
 clang/tools/clang-format/clang-format.py         | 16 ++++++++--------
 clang/tools/clang-format/git-clang-format        |  6 +++---
 5 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/clang/tools/clang-format/clang-format-diff.py b/clang/tools/clang-format/clang-format-diff.py
index 3a74b90e731578..9eec0f3c89de37 100755
--- a/clang/tools/clang-format/clang-format-diff.py
+++ b/clang/tools/clang-format/clang-format-diff.py
@@ -134,7 +134,7 @@ def main():
             if line_count != 0:
                 end_line += line_count - 1
             lines_by_file.setdefault(filename, []).extend(
-                ["-lines", str(start_line) + ":" + str(end_line)]
+                ["--lines", str(start_line) + ":" + str(end_line)]
             )
 
     # Reformat files containing changes in place.
@@ -146,12 +146,12 @@ def main():
         if args.i:
             command.append("-i")
         if args.sort_includes:
-            command.append("-sort-includes")
+            command.append("--sort-includes")
         command.extend(lines)
         if args.style:
-            command.extend(["-style", args.style])
+            command.extend(["--style", args.style])
         if args.fallback_style:
-            command.extend(["-fallback-style", args.fallback_style])
+            command.extend(["--fallback-style", args.fallback_style])
 
         try:
             p = subprocess.Popen(
diff --git a/clang/tools/clang-format/clang-format-sublime.py b/clang/tools/clang-format/clang-format-sublime.py
index dcd72e68e94faa..8d41da332c1889 100644
--- a/clang/tools/clang-format/clang-format-sublime.py
+++ b/clang/tools/clang-format/clang-format-sublime.py
@@ -35,18 +35,18 @@ def run(self, edit):
         regions = []
         command = [binary]
         if style:
-            command.extend(["-style", style])
+            command.extend(["--style", style])
         for region in self.view.sel():
             regions.append(region)
             region_offset = min(region.a, region.b)
             region_length = abs(region.b - region.a)
             command.extend(
                 [
-                    "-offset",
+                    "--offset",
                     str(region_offset),
-                    "-length",
+                    "--length",
                     str(region_length),
-                    "-assume-filename",
+                    "--assume-filename",
                     str(self.view.file_name()),
                 ]
             )
diff --git a/clang/tools/clang-format/clang-format.el b/clang/tools/clang-format/clang-format.el
index f43bf063c62970..f3da5415f8672b 100644
--- a/clang/tools/clang-format/clang-format.el
+++ b/clang/tools/clang-format/clang-format.el
@@ -166,19 +166,19 @@ uses the function `buffer-file-name'."
         (let ((status (apply #'call-process-region
                              nil nil clang-format-executable
                              nil `(,temp-buffer ,temp-file) nil
-                             `("-output-replacements-xml"
+                             `("--output-replacements-xml"
                                ;; Guard against a nil assume-file-name.
                                ;; If the clang-format option -assume-filename
                                ;; is given a blank string it will crash as per
                                ;; the following bug report
                                ;; https://bugs.llvm.org/show_bug.cgi?id=34667
                                ,@(and assume-file-name
-                                      (list "-assume-filename" assume-file-name))
-                               ,@(and style (list "-style" style))
-                               "-fallback-style" ,clang-format-fallback-style
-                               "-offset" ,(number-to-string file-start)
-                               "-length" ,(number-to-string (- file-end file-start))
-                               "-cursor" ,(number-to-string cursor))))
+                                      (list "--assume-filename" assume-file-name))
+                               ,@(and style (list "--style" style))
+                               "--fallback-style" ,clang-format-fallback-style
+                               "--offset" ,(number-to-string file-start)
+                               "--length" ,(number-to-string (- file-end file-start))
+                               "--cursor" ,(number-to-string cursor))))
               (stderr (with-temp-buffer
                         (unless (zerop (cadr (insert-file-contents temp-file)))
                           (insert ": "))
diff --git a/clang/tools/clang-format/clang-format.py b/clang/tools/clang-format/clang-format.py
index 28e0d14a552fd1..07eebd27f49d11 100644
--- a/clang/tools/clang-format/clang-format.py
+++ b/clang/tools/clang-format/clang-format.py
@@ -78,7 +78,7 @@ def main():
 
     # Determine range to format.
     if vim.eval('exists("l:lines")') == "1":
-        lines = ["-lines", vim.eval("l:lines")]
+        lines = ["--lines", vim.eval("l:lines")]
     elif vim.eval('exists("l:formatdiff")') == "1" and os.path.exists(
         vim.current.buffer.name
     ):
@@ -88,12 +88,12 @@ def main():
         lines = []
         for op in reversed(sequence.get_opcodes()):
             if op[0] not in ["equal", "delete"]:
-                lines += ["-lines", "%s:%s" % (op[3] + 1, op[4])]
+                lines += ["--lines", "%s:%s" % (op[3] + 1, op[4])]
         if lines == []:
             return
     else:
         lines = [
-            "-lines",
+            "--lines",
             "%s:%s" % (vim.current.range.start + 1, vim.current.range.end + 1),
         ]
 
@@ -116,15 +116,15 @@ def main():
         startupinfo.wShowWindow = subprocess.SW_HIDE
 
     # Call formatter.
-    command = [binary, "-cursor", str(cursor_byte)]
-    if lines != ["-lines", "all"]:
+    command = [binary, "--cursor", str(cursor_byte)]
+    if lines != ["--lines", "all"]:
         command += lines
     if style:
-        command.extend(["-style", style])
+        command.extend(["--style", style])
     if fallback_style:
-        command.extend(["-fallback-style", fallback_style])
+        command.extend(["--fallback-style", fallback_style])
     if vim.current.buffer.name:
-        command.extend(["-assume-filename", vim.current.buffer.name])
+        command.extend(["--assume-filename", vim.current.buffer.name])
     p = subprocess.Popen(
         command,
         stdout=subprocess.PIPE,
diff --git a/clang/tools/clang-format/git-clang-format b/clang/tools/clang-format/git-clang-format
index 714ba8a6e77d51..bacbd8de245666 100755
--- a/clang/tools/clang-format/git-clang-format
+++ b/clang/tools/clang-format/git-clang-format
@@ -510,12 +510,12 @@ def clang_format_to_blob(filename, line_ranges, revision=None,
   Returns the object ID (SHA-1) of the created blob."""
   clang_format_cmd = [binary]
   if style:
-    clang_format_cmd.extend(['-style='+style])
+    clang_format_cmd.extend(['--style='+style])
   clang_format_cmd.extend([
-      '-lines=%s:%s' % (start_line, start_line+line_count-1)
+      '--lines=%s:%s' % (start_line, start_line+line_count-1)
       for start_line, line_count in line_ranges])
   if revision is not None:
-    clang_format_cmd.extend(['-assume-filename='+filename])
+    clang_format_cmd.extend(['--assume-filename='+filename])
     git_show_cmd = ['git', 'cat-file', 'blob', '%s:%s' % (revision, filename)]
     git_show = subprocess.Popen(git_show_cmd, env=env, stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE)

From f7bbc40b0736cc417f57cd039b098b504cf6a71f Mon Sep 17 00:00:00 2001
From: Siu Chi Chan <siuchi.chan at amd.com>
Date: Wed, 7 Aug 2024 14:29:27 +0000
Subject: [PATCH 021/116] [ELF,test] Enhance hip-section-layout.s

Check both object file orders.

Change-Id: I6096c12e29e9ddb6b3053f977e4cbb24eea9b7d3
---
 lld/test/ELF/hip-section-layout.s | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lld/test/ELF/hip-section-layout.s b/lld/test/ELF/hip-section-layout.s
index c76df50919e6d0..b76141c6b41aec 100644
--- a/lld/test/ELF/hip-section-layout.s
+++ b/lld/test/ELF/hip-section-layout.s
@@ -7,8 +7,10 @@
 
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux --defsym=NON_HIP_SECTIONS=1 %s -o %t.1.o
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux --defsym=HIP_SECTIONS=1 %s -o %t.2.o
-# RUN: ld.lld %t.1.o %t.2.o -o %t.s.out
-# RUN: llvm-readobj --sections %t.s.out | FileCheck %s
+# RUN: ld.lld %t.1.o %t.2.o -o %t.1.s.out
+# RUN: llvm-readobj --sections %t.1.s.out | FileCheck %s
+# RUN: ld.lld %t.2.o %t.1.o -o %t.2.s.out
+# RUN: llvm-readobj --sections %t.2.s.out | FileCheck %s
 
 .ifdef HIP_SECTIONS
 .section .hipFatBinSegment,"aw",@progbits; .space 1

From f03b7830902225a8910d2972c39143355795efa9 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee <kyulee at meta.com>
Date: Wed, 21 Aug 2024 09:32:51 -0700
Subject: [PATCH 022/116] [CGData] Rename CodeGenDataTests to CGDataTests
 (#105463)

This addresses the comment for
https://github.com/llvm/llvm-project/pull/101461.
---
 llvm/unittests/CGData/CMakeLists.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/unittests/CGData/CMakeLists.txt b/llvm/unittests/CGData/CMakeLists.txt
index 9cedab56d3f6bc..792b323130b474 100644
--- a/llvm/unittests/CGData/CMakeLists.txt
+++ b/llvm/unittests/CGData/CMakeLists.txt
@@ -6,9 +6,9 @@ set(LLVM_LINK_COMPONENTS
   Support
   )
 
-add_llvm_unittest(CodeGenDataTests
+add_llvm_unittest(CGDataTests
   OutlinedHashTreeRecordTest.cpp
   OutlinedHashTreeTest.cpp
   )
 
-target_link_libraries(CodeGenDataTests PRIVATE LLVMTestingSupport)
+target_link_libraries(CGDataTests PRIVATE LLVMTestingSupport)

From 216d6a06524e4a8ebd6de2806c473b92d3349c4e Mon Sep 17 00:00:00 2001
From: Chenguang Wang <w3cing at gmail.com>
Date: Wed, 21 Aug 2024 09:54:57 -0700
Subject: [PATCH 023/116] [bazel] Fix mlir build broken by 681ae097. (#105552)

The cmake config creates two targets, `MLIRTensorMeshShardingExtensions`
and `MLIRTensorAllExtensions`; but for bazel, with the `Func` dialect we
only have a single `FuncExtensions`. Here I am following the `Func`
dialect convention of creating only a single `TensorExtensions`.
---
 .../llvm-project-overlay/mlir/BUILD.bazel     | 39 +++++++++----------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/utils/bazel/llvm-project-overlay/mlir/BUILD.bazel b/utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
index 57b08448ae9294..ddb08f12f04976 100644
--- a/utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
@@ -3337,25 +3337,6 @@ cc_library(
     ],
 )
 
-cc_library(
-    name = "TensorShardingInterfaceImpl",
-    srcs = ["lib/Dialect/Mesh/Interfaces/TensorShardingInterfaceImpl.cpp"],
-    hdrs = [
-        "include/mlir/Dialect/Mesh/IR/TensorShardingInterfaceImpl.h",
-    ],
-    includes = ["include"],
-    deps = [
-        ":DialectUtils",
-        ":IR",
-        ":MeshDialect",
-        ":MeshShardingInterface",
-        ":MeshShardingInterfaceIncGen",
-        ":Support",
-        ":TensorDialect",
-        "//llvm:Support",
-    ],
-)
-
 cc_library(
     name = "MeshDialect",
     srcs = ["lib/Dialect/Mesh/IR/MeshOps.cpp"],
@@ -4890,6 +4871,7 @@ cc_library(
         ":ROCDLToLLVMIRTranslation",
         ":SCFTransformOps",
         ":SparseTensorTransformOps",
+        ":TensorExtensions",
         ":TensorTransformOps",
         ":TransformDebugExtension",
         ":TransformIRDLExtension",
@@ -7600,6 +7582,7 @@ cc_library(
         "lib/Dialect/Tensor/IR/ValueBoundsOpInterfaceImpl.cpp",
     ],
     hdrs = [
+        "include/mlir/Dialect/Tensor/IR/ShardingInterfaceImpl.h",
         "include/mlir/Dialect/Tensor/IR/Tensor.h",
         "include/mlir/Dialect/Tensor/IR/ValueBoundsOpInterfaceImpl.h",
     ],
@@ -7669,6 +7652,23 @@ cc_library(
     ],
 )
 
+cc_library(
+    name = "TensorExtensions",
+    srcs = glob(["lib/Dialect/Tensor/Extensions/*.cpp"]),
+    hdrs = glob(["include/mlir/Dialect/Tensor/Extensions/*.h"]),
+    includes = ["include"],
+    deps = [
+        ":DialectUtils",
+        ":IR",
+        ":MeshDialect",
+        ":MeshShardingInterface",
+        ":MeshShardingInterfaceIncGen",
+        ":Support",
+        ":TensorDialect",
+        "//llvm:Support",
+    ],
+)
+
 cc_library(
     name = "TensorUtils",
     srcs = ["lib/Dialect/Tensor/Utils/Utils.cpp"],
@@ -9603,7 +9603,6 @@ cc_library(
         ":SparseTensorTransforms",
         ":TensorDialect",
         ":TensorInferTypeOpInterfaceImpl",
-        ":TensorShardingInterfaceImpl",
         ":TensorTilingInterfaceImpl",
         ":TensorTransformOps",
         ":TensorTransforms",

From 3b7611594f010ecd5233ab9580b2feb88837f9ef Mon Sep 17 00:00:00 2001
From: Johannes Doerfert <johannes at jdoerfert.de>
Date: Wed, 21 Aug 2024 10:01:35 -0700
Subject: [PATCH 024/116] [Offload] Improve error reporting on memory faults
 (#104254)

Since we can already track allocations, we can diagnose memory faults to
some degree. If the fault happens in a prior allocation (use after free)
or "close but outside" one, we can provide that information to the user.
Note that the fault address might be page aligned, and not all accesses
trigger a fault, especially for allocations that are backed by a
MemoryManager. Still, if people disable the MemoryManager or the
allocation is big enough, we can sometimes provide valuable feedback.
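The classification described above can be sketched as a linear scan over the tracked allocations that computes each one's byte distance to the faulting address. This is a minimal sketch, not the plugin code: `Allocation`, `Closest`, `findClosest`, `FaultKind`, and `classify` are hypothetical names, and the 512 MiB cut-off for "close" is taken from the `1L << 29` threshold used in `reportMemoryAccessError` below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record of one host-issued device allocation.
struct Allocation {
  std::uintptr_t Begin; // first byte of the allocation
  std::size_t Size;     // size in bytes
};

// Closest allocation to an address and the byte distance to it.
struct Closest {
  const Allocation *Alloc;
  std::uintptr_t Distance; // 0 if the address falls inside Alloc
};

// Linear scan over all tracked allocations; nullopt if none are tracked.
std::optional<Closest> findClosest(const std::vector<Allocation> &Allocs,
                                   std::uintptr_t Addr) {
  std::optional<Closest> Best;
  for (const Allocation &A : Allocs) {
    assert(A.Size > 0 && "sketch assumes non-empty allocations");
    const std::uintptr_t End = A.Begin + A.Size - 1; // last valid byte
    std::uintptr_t D = 0;                            // inside by default
    if (Addr < A.Begin)
      D = A.Begin - Addr; // fault lies before the allocation
    else if (Addr > End)
      D = Addr - End; // fault lies after the allocation
    if (!Best || D < Best->Distance)
      Best = Closest{&A, D};
  }
  return Best;
}

enum class FaultKind { NoAllocations, InsideAllocation, NearAllocation, Unrelated };

// Mirrors the three outcomes of the report.
FaultKind classify(const std::optional<Closest> &C) {
  if (!C)
    return FaultKind::NoAllocations; // maybe a global, stack, or shared location
  if (C->Distance == 0)
    return FaultKind::InsideAllocation; // e.g. use after free
  if (C->Distance < (std::uintptr_t{1} << 29))
    return FaultKind::NearAllocation; // "close but outside", within 512 MiB
  return FaultKind::Unrelated;
}
```

For addresses outside an allocation, the explicit `Addr < Begin` / `Addr > End` branches compute the same distance as the patch's `std::min` over two unsigned differences (where the wrong-side difference wraps to a large value), but make the intent easier to follow.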
---
 offload/plugins-nextgen/amdgpu/src/rtl.cpp    | 12 +++-
 .../common/include/ErrorReporting.h           | 67 +++++++++++++++++--
 .../common/include/PluginInterface.h          | 46 +++++++++++--
 offload/test/sanitizer/double_free.c          |  6 +-
 offload/test/sanitizer/double_free_racy.c     |  2 +-
 offload/test/sanitizer/free_wrong_ptr_kind.c  |  2 +-
 .../test/sanitizer/free_wrong_ptr_kind.cpp    |  2 +-
 offload/test/sanitizer/ptr_outside_alloc_1.c  | 40 +++++++++++
 offload/test/sanitizer/ptr_outside_alloc_2.c  | 26 +++++++
 offload/test/sanitizer/use_after_free_1.c     | 39 +++++++++++
 offload/test/sanitizer/use_after_free_2.c     | 32 +++++++++
 11 files changed, 256 insertions(+), 18 deletions(-)
 create mode 100644 offload/test/sanitizer/ptr_outside_alloc_1.c
 create mode 100644 offload/test/sanitizer/ptr_outside_alloc_2.c
 create mode 100644 offload/test/sanitizer/use_after_free_1.c
 create mode 100644 offload/test/sanitizer/use_after_free_2.c

diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index a434a0089d5f94..86df4584db0914 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -3264,8 +3264,18 @@ struct AMDGPUPluginTy final : public GenericPluginTy {
       }
       if (DeviceNode != Node)
         continue;
-
+      void *DevicePtr = (void *)Event->memory_fault.virtual_address;
+      std::string S;
+      llvm::raw_string_ostream OS(S);
+      OS << llvm::format("Memory access fault by GPU %" PRIu32
+                         " (agent 0x%" PRIx64
+                         ") at virtual address %p. Reasons: %s",
+                         Node, Event->memory_fault.agent.handle,
+                         (void *)Event->memory_fault.virtual_address,
+                         llvm::join(Reasons, ", ").c_str());
       ErrorReporter::reportKernelTraces(AMDGPUDevice, *KernelTraceInfoRecord);
+      ErrorReporter::reportMemoryAccessError(AMDGPUDevice, DevicePtr, S,
+                                             /*Abort*/ true);
     }
 
     // Abort the execution since we do not recover from this error.
diff --git a/offload/plugins-nextgen/common/include/ErrorReporting.h b/offload/plugins-nextgen/common/include/ErrorReporting.h
index e557b32c2c24f8..8478977a8f86af 100644
--- a/offload/plugins-nextgen/common/include/ErrorReporting.h
+++ b/offload/plugins-nextgen/common/include/ErrorReporting.h
@@ -157,10 +157,13 @@ class ErrorReporter {
 
     if (ATI->HostPtr)
       print(BoldLightPurple,
-            "Last allocation of size %lu for host pointer %p:\n", ATI->Size,
-            ATI->HostPtr);
+            "Last allocation of size %lu for host pointer %p -> device pointer "
+            "%p:\n",
+            ATI->Size, ATI->HostPtr, ATI->DevicePtr);
     else
-      print(BoldLightPurple, "Last allocation of size %lu:\n", ATI->Size);
+      print(BoldLightPurple,
+            "Last allocation of size %lu -> device pointer %p:\n", ATI->Size,
+            ATI->DevicePtr);
     reportStackTrace(ATI->AllocationTrace);
     if (!ATI->LastAllocationInfo)
       return;
@@ -174,10 +177,13 @@ class ErrorReporter {
             ATI->Size);
       reportStackTrace(ATI->DeallocationTrace);
       if (ATI->HostPtr)
-        print(BoldLightPurple, " #%u Prior allocation for host pointer %p:\n",
-              I, ATI->HostPtr);
+        print(
+            BoldLightPurple,
+            " #%u Prior allocation for host pointer %p -> device pointer %p:\n",
+            I, ATI->HostPtr, ATI->DevicePtr);
       else
-        print(BoldLightPurple, " #%u Prior allocation:\n", I);
+        print(BoldLightPurple, " #%u Prior allocation -> device pointer %p:\n",
+              I, ATI->DevicePtr);
       reportStackTrace(ATI->AllocationTrace);
       ++I;
     }
@@ -219,6 +225,55 @@ class ErrorReporter {
 #undef DEALLOCATION_ERROR
   }
 
+  static void reportMemoryAccessError(GenericDeviceTy &Device, void *DevicePtr,
+                                      std::string &ErrorStr, bool Abort) {
+    reportError(ErrorStr.c_str());
+
+    if (!Device.OMPX_TrackAllocationTraces) {
+      print(Yellow, "Use '%s=true' to track device allocations\n",
+            Device.OMPX_TrackAllocationTraces.getName().data());
+      if (Abort)
+        abortExecution();
+      return;
+    }
+    uintptr_t Distance = false;
+    auto *ATI =
+        Device.getClosestAllocationTraceInfoForAddr(DevicePtr, Distance);
+    if (!ATI) {
+      print(Cyan,
+            "No host-issued allocations; device pointer %p might be "
+            "a global, stack, or shared location\n",
+            DevicePtr);
+      if (Abort)
+        abortExecution();
+      return;
+    }
+    if (!Distance) {
+      print(Cyan, "Device pointer %p points into%s host-issued allocation:\n",
+            DevicePtr, ATI->DeallocationTrace.empty() ? "" : " prior");
+      reportAllocationInfo(ATI);
+      if (Abort)
+        abortExecution();
+      return;
+    }
+
+    bool IsClose = Distance < (1L << 29L /*512MB=*/);
+    print(Cyan,
+          "Device pointer %p does not point into any (current or prior) "
+          "host-issued allocation%s.\n",
+          DevicePtr,
+          IsClose ? "" : " (might be a global, stack, or shared location)");
+    if (IsClose) {
+      print(Cyan,
+            "Closest host-issued allocation (distance %" PRIuPTR
+            " byte%s; might be by page):\n",
+            Distance, Distance > 1 ? "s" : "");
+      reportAllocationInfo(ATI);
+    }
+    if (Abort)
+      abortExecution();
+  }
+
   /// Report that a kernel encountered a trap instruction.
   static void reportTrapInKernel(
       GenericDeviceTy &Device, KernelTraceInfoRecordTy &KTIR,
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 81823338fe2112..7e3e788fa52dc9 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -938,6 +938,42 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   /// been deallocated, both for error reporting purposes.
   ProtectedObj<DenseMap<void *, AllocationTraceInfoTy *>> AllocationTraces;
 
+  /// Return the allocation trace info for a device pointer, that is the
+  /// allocation into which this device pointer points to (or pointed into).
+  AllocationTraceInfoTy *getAllocationTraceInfoForAddr(void *DevicePtr) {
+    auto AllocationTraceMap = AllocationTraces.getExclusiveAccessor();
+    for (auto &It : *AllocationTraceMap) {
+      if (It.first <= DevicePtr &&
+          advanceVoidPtr(It.first, It.second->Size) > DevicePtr)
+        return It.second;
+    }
+    return nullptr;
+  }
+
+  /// Return the allocation trace info closest to a device pointer and set
+  /// \p Distance to its distance in bytes (0 if the pointer points into it).
+  AllocationTraceInfoTy *
+  getClosestAllocationTraceInfoForAddr(void *DevicePtr, uintptr_t &Distance) {
+    Distance = 0;
+    if (auto *ATI = getAllocationTraceInfoForAddr(DevicePtr)) {
+      return ATI;
+    }
+
+    AllocationTraceInfoTy *ATI = nullptr;
+    uintptr_t DevicePtrI = uintptr_t(DevicePtr);
+    auto AllocationTraceMap = AllocationTraces.getExclusiveAccessor();
+    for (auto &It : *AllocationTraceMap) {
+      uintptr_t Begin = uintptr_t(It.second->DevicePtr);
+      uintptr_t End = Begin + It.second->Size - 1;
+      uintptr_t ItDistance = std::min(Begin - DevicePtrI, DevicePtrI - End);
+      if (ATI && ItDistance > Distance)
+        continue;
+      ATI = It.second;
+      Distance = ItDistance;
+    }
+    return ATI;
+  }
+
   /// Map to record kernel have been launchedl, for error reporting purposes.
   ProtectedObj<KernelTraceInfoRecordTy> KernelLaunchTraces;
 
@@ -946,6 +982,11 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   UInt32Envar OMPX_TrackNumKernelLaunches =
       UInt32Envar("OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES", 0);
 
+  /// Environment variable to determine if stack traces for allocations and
+  /// deallocations are tracked.
+  BoolEnvar OMPX_TrackAllocationTraces =
+      BoolEnvar("OFFLOAD_TRACK_ALLOCATION_TRACES", false);
+
 private:
   /// Get and set the stack size and heap size for the device. If not used, the
   /// plugin can implement the setters as no-op and setting the output
@@ -996,11 +1037,6 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   UInt32Envar OMPX_InitialNumStreams;
   UInt32Envar OMPX_InitialNumEvents;
 
-  /// Environment variable to determine if stack traces for allocations and
-  /// deallocations are tracked.
-  BoolEnvar OMPX_TrackAllocationTraces =
-      BoolEnvar("OFFLOAD_TRACK_ALLOCATION_TRACES", false);
-
   /// Array of images loaded into the device. Images are automatically
   /// deallocated by the allocator.
   llvm::SmallVector<DeviceImageTy *> LoadedImages;
diff --git a/offload/test/sanitizer/double_free.c b/offload/test/sanitizer/double_free.c
index ca7310e34fc9d0..a3d8b06f1c7381 100644
--- a/offload/test/sanitizer/double_free.c
+++ b/offload/test/sanitizer/double_free.c
@@ -36,7 +36,7 @@ int main(void) {
 // NDEBG:  main
 // DEBUG:  main {{.*}}double_free.c:24
 //
-// CHECK: Last allocation of size 8:
+// CHECK: Last allocation of size 8 -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  omp_target_alloc
 // NDEBG:  main
@@ -49,7 +49,7 @@ int main(void) {
 // NDEBG:  main
 // DEBUG:  main {{.*}}double_free.c:22
 //
-// CHECK: #0 Prior allocation:
+// CHECK: #0 Prior allocation -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  omp_target_alloc
 // NDEBG:  main
@@ -61,7 +61,7 @@ int main(void) {
 // NDEBG:  main
 // DEBUG:  main {{.*}}double_free.c:20
 //
-// CHECK: #1 Prior allocation:
+// CHECK: #1 Prior allocation -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  omp_target_alloc
 // NDEBG:  main
diff --git a/offload/test/sanitizer/double_free_racy.c b/offload/test/sanitizer/double_free_racy.c
index 3b4f2d5c51571c..4ebd8f36efa10c 100644
--- a/offload/test/sanitizer/double_free_racy.c
+++ b/offload/test/sanitizer/double_free_racy.c
@@ -28,6 +28,6 @@ int main(void) {
 // CHECK:  dataDelete
 // CHECK:  omp_target_free
 
-// CHECK: Last allocation of size 8:
+// CHECK: Last allocation of size 8 -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  omp_target_alloc
diff --git a/offload/test/sanitizer/free_wrong_ptr_kind.c b/offload/test/sanitizer/free_wrong_ptr_kind.c
index 0c178541db1170..7c5a4ff7085024 100644
--- a/offload/test/sanitizer/free_wrong_ptr_kind.c
+++ b/offload/test/sanitizer/free_wrong_ptr_kind.c
@@ -28,7 +28,7 @@ int main(void) {
 // NDEBG: main
 // DEBUG:  main {{.*}}free_wrong_ptr_kind.c:22
 //
-// CHECK: Last allocation of size 8:
+// CHECK: Last allocation of size 8 -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  llvm_omp_target_alloc_host
 // NDEBG:  main
diff --git a/offload/test/sanitizer/free_wrong_ptr_kind.cpp b/offload/test/sanitizer/free_wrong_ptr_kind.cpp
index 87a52c5d4baf23..7ebb8c438433a9 100644
--- a/offload/test/sanitizer/free_wrong_ptr_kind.cpp
+++ b/offload/test/sanitizer/free_wrong_ptr_kind.cpp
@@ -31,7 +31,7 @@ int main(void) {
 // NDEBG: main
 // DEBUG:  main {{.*}}free_wrong_ptr_kind.cpp:25
 //
-// CHECK: Last allocation of size 8:
+// CHECK: Last allocation of size 8 -> device pointer
 // CHECK:  dataAlloc
 // CHECK:  llvm_omp_target_alloc_shared
 // NDEBG:  main
diff --git a/offload/test/sanitizer/ptr_outside_alloc_1.c b/offload/test/sanitizer/ptr_outside_alloc_1.c
new file mode 100644
index 00000000000000..38742b783e8e9b
--- /dev/null
+++ b/offload/test/sanitizer/ptr_outside_alloc_1.c
@@ -0,0 +1,40 @@
+// clang-format off
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK,NTRCE
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK,TRACE
+// clang-format on
+
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: s390x-ibm-linux-gnu
+// UNSUPPORTED: s390x-ibm-linux-gnu-LTO
+
+#include <omp.h>
+
+void *llvm_omp_target_alloc_host(size_t Size, int DeviceNum);
+void llvm_omp_target_free_host(void *Ptr, int DeviceNum);
+
+int main() {
+  int N = (1 << 30);
+  char *A = (char *)llvm_omp_target_alloc_host(N, omp_get_default_device());
+  char *P;
+#pragma omp target map(from : P)
+  {
+    P = &A[0];
+    *P = 3;
+  }
+  // clang-format off
+// CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
+// NTRCE: Use 'OFFLOAD_TRACK_ALLOCATION_TRACES=true' to track device allocations
+// TRACE: Device pointer [[PTR]] does not point into any (current or prior) host-issued allocation.
+// TRACE: Closest host-issued allocation (distance 4096 bytes; might be by page):
+// TRACE: Last allocation of size 1073741824
+// clang-format on
+#pragma omp target
+  { P[-4] = 5; }
+
+  llvm_omp_target_free_host(A, omp_get_default_device());
+}
diff --git a/offload/test/sanitizer/ptr_outside_alloc_2.c b/offload/test/sanitizer/ptr_outside_alloc_2.c
new file mode 100644
index 00000000000000..ac47c8922f09ef
--- /dev/null
+++ b/offload/test/sanitizer/ptr_outside_alloc_2.c
@@ -0,0 +1,26 @@
+// clang-format off
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK
+// clang-format on
+
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: s390x-ibm-linux-gnu
+// UNSUPPORTED: s390x-ibm-linux-gnu-LTO
+
+#include <omp.h>
+
+int main() {
+  int N = (1 << 30);
+  char *A = (char *)malloc(N);
+#pragma omp target map(A[ : N])
+  { A[N] = 3; }
+  // clang-format off
+// CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
+// CHECK: Device pointer [[PTR]] does not point into any (current or prior) host-issued allocation.
+// CHECK: Closest host-issued allocation (distance 1 byte; might be by page):
+// CHECK: Last allocation of size 1073741824
+// clang-format on
+}
diff --git a/offload/test/sanitizer/use_after_free_1.c b/offload/test/sanitizer/use_after_free_1.c
new file mode 100644
index 00000000000000..cebcdee1803475
--- /dev/null
+++ b/offload/test/sanitizer/use_after_free_1.c
@@ -0,0 +1,39 @@
+// clang-format off
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK,NTRCE
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK,TRACE
+// clang-format on
+
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: s390x-ibm-linux-gnu
+// UNSUPPORTED: s390x-ibm-linux-gnu-LTO
+
+#include <omp.h>
+
+void *llvm_omp_target_alloc_host(size_t Size, int DeviceNum);
+void llvm_omp_target_free_host(void *Ptr, int DeviceNum);
+
+int main() {
+  int N = (1 << 30);
+  char *A = (char *)llvm_omp_target_alloc_host(N, omp_get_default_device());
+  char *P;
+#pragma omp target map(from : P)
+  {
+    P = &A[N / 2];
+    *P = 3;
+  }
+  llvm_omp_target_free_host(A, omp_get_default_device());
+  // clang-format off
+// CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
+// NTRCE: Use 'OFFLOAD_TRACK_ALLOCATION_TRACES=true' to track device allocations
+// TRACE: Device pointer [[PTR]] points into prior host-issued allocation:
+// TRACE: Last deallocation:
+// TRACE: Last allocation of size 1073741824
+// clang-format on
+#pragma omp target
+  { *P = 5; }
+}
diff --git a/offload/test/sanitizer/use_after_free_2.c b/offload/test/sanitizer/use_after_free_2.c
new file mode 100644
index 00000000000000..587d04a6ff3528
--- /dev/null
+++ b/offload/test/sanitizer/use_after_free_2.c
@@ -0,0 +1,32 @@
+// clang-format off
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK
+// clang-format on
+
+// UNSUPPORTED: aarch64-unknown-linux-gnu
+// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+// UNSUPPORTED: s390x-ibm-linux-gnu
+// UNSUPPORTED: s390x-ibm-linux-gnu-LTO
+
+#include <omp.h>
+
+int main() {
+  int N = (1 << 30);
+  char *A = (char *)malloc(N);
+  char *P;
+#pragma omp target map(A[ : N]) map(from : P)
+  {
+    P = &A[N / 2];
+    *P = 3;
+  }
+  // clang-format off
+// CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
+// CHECK: Device pointer [[PTR]] points into prior host-issued allocation:
+// CHECK: Last deallocation:
+// CHECK: Last allocation of size 1073741824
+// clang-format on
+#pragma omp target
+  { *P = 5; }
+}

>From 1c9d8a62cb208afe1bc87669c7dd5d9590e615b2 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Wed, 21 Aug 2024 12:07:27 -0500
Subject: [PATCH 025/116] [libcxx] Add cache file for the GPU build (#99348)

Summary:
This patch adds CMake cache files (one for AMDGPU, one for NVPTX) for the
GPU build. Each cache sets the required default options, whether it is used
through the LLVM runtimes interface or directly. These options disable
essentially every feature the GPU cannot support.

With this patch and the follow-up patches #99259, #99243, #99287, and
#99333, we will be able to build `libc++` for the GPU with an invocation
like the following:

```
$ cmake ../llvm \
  -DRUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES=${LLVM_SRC}/../libcxx/cmake/caches/NVPTX.cmake \
  -DRUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES=${LLVM_SRC}/../libcxx/cmake/caches/AMDGPU.cmake \
  "-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx" \
  "-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx" \
  "-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda"
```

This will then install the libraries and headers into the appropriate
locations for use with `clang`.
---
 libcxx/cmake/caches/AMDGPU.cmake | 36 ++++++++++++++++++++++++++++++++
 libcxx/cmake/caches/NVPTX.cmake  | 36 ++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)
 create mode 100644 libcxx/cmake/caches/AMDGPU.cmake
 create mode 100644 libcxx/cmake/caches/NVPTX.cmake

diff --git a/libcxx/cmake/caches/AMDGPU.cmake b/libcxx/cmake/caches/AMDGPU.cmake
new file mode 100644
index 00000000000000..0cd2eebfb9c16a
--- /dev/null
+++ b/libcxx/cmake/caches/AMDGPU.cmake
@@ -0,0 +1,36 @@
+# Configuration options for libcxx.
+set(LIBCXX_ABI_VERSION 2 CACHE STRING "")
+set(LIBCXX_CXX_ABI libcxxabi CACHE STRING "")
+set(LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_FILESYSTEM OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_LOCALIZATION OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_MONOTONIC_CLOCK ON CACHE BOOL "")
+set(LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS ON CACHE BOOL "")
+set(LIBCXX_ENABLE_RANDOM_DEVICE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RTTI OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC_ABI_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC ON CACHE BOOL "")
+set(LIBCXX_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_UNICODE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_WIDE_CHARACTERS OFF CACHE BOOL "")
+set(LIBCXX_HAS_TERMINAL_AVAILABLE OFF CACHE BOOL "")
+set(LIBCXX_INSTALL_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_LIBC "llvm-libc" CACHE STRING "")
+set(LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")
+
+# Configuration options for libcxxabi.
+set(LIBCXXABI_BAREMETAL ON CACHE BOOL "")
+set(LIBCXXABI_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXXABI_USE_LLVM_UNWINDER OFF CACHE BOOL "")
+
+# Necessary compile flags for AMDGPU.
+set(LIBCXX_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;-Xclang;-mcode-object-version=none" CACHE STRING "")
+set(LIBCXXABI_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;-Xclang;-mcode-object-version=none" CACHE STRING "")
+set(CMAKE_REQUIRED_FLAGS "-nogpulib -nodefaultlibs" CACHE STRING "")
diff --git a/libcxx/cmake/caches/NVPTX.cmake b/libcxx/cmake/caches/NVPTX.cmake
new file mode 100644
index 00000000000000..47a24a349e996e
--- /dev/null
+++ b/libcxx/cmake/caches/NVPTX.cmake
@@ -0,0 +1,36 @@
+# Configuration options for libcxx.
+set(LIBCXX_ABI_VERSION 2 CACHE STRING "")
+set(LIBCXX_CXX_ABI libcxxabi CACHE STRING "")
+set(LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_FILESYSTEM OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_LOCALIZATION OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_MONOTONIC_CLOCK ON CACHE BOOL "")
+set(LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS ON CACHE BOOL "")
+set(LIBCXX_ENABLE_RANDOM_DEVICE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RTTI OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC_ABI_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC ON CACHE BOOL "")
+set(LIBCXX_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_UNICODE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_WIDE_CHARACTERS OFF CACHE BOOL "")
+set(LIBCXX_HAS_TERMINAL_AVAILABLE OFF CACHE BOOL "")
+set(LIBCXX_INSTALL_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_LIBC "llvm-libc" CACHE STRING "")
+set(LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")
+
+# Configuration options for libcxxabi.
+set(LIBCXXABI_BAREMETAL ON CACHE BOOL "")
+set(LIBCXXABI_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXXABI_USE_LLVM_UNWINDER OFF CACHE BOOL "")
+
+# Necessary compile flags for NVPTX.
+set(LIBCXX_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;--cuda-feature=+ptx63" CACHE STRING "")
+set(LIBCXXABI_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;--cuda-feature=+ptx63" CACHE STRING "")
+set(CMAKE_REQUIRED_FLAGS "-nogpulib -nodefaultlibs -flto -c" CACHE STRING "")

>From c61d565721d0cf03e2658ec65a3526dd89142e52 Mon Sep 17 00:00:00 2001
From: David Green <david.green at arm.com>
Date: Wed, 21 Aug 2024 18:10:16 +0100
Subject: [PATCH 026/116] [AArch64] Set scalar fneg to free for fnmul (#104814)

An fneg(fmul(..)) or fmul(fneg(..)) can be folded into a single fnmul on
AArch64. https://clang.godbolt.org/z/znPj34Mae

This patch therefore treats the fneg in such scalar patterns as free
(cost 0).
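The two source-level shapes behind this fold are shown below as plain C++. On AArch64, either function can be selected as a single `fnmul`, but the exact code generated depends on the compiler version and flags, so this is an illustration rather than a guarantee. The fold preserves results for non-NaN inputs under the default round-to-nearest mode because IEEE 754 multiplication determines the result sign by XOR and rounds symmetrically. The function names are illustrative only.

```cpp
// fneg(fmul(a, b)): negate the rounded product.
constexpr double negOfProduct(double A, double B) { return -(A * B); }

// fmul(fneg(a), b): negate one operand, then multiply.
constexpr double productOfNeg(double A, double B) { return (-A) * B; }
```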
---
 .../Target/AArch64/AArch64TargetTransformInfo.cpp    |  9 +++++++++
 llvm/test/Analysis/CostModel/AArch64/arith-fp.ll     | 12 ++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index a782c9c4351237..f31e1fa9ab3045 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3242,6 +3242,15 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     return LT.first;
 
   case ISD::FNEG:
+    // Scalar fmul(fneg) or fneg(fmul) can be converted to fnmul
+    if ((Ty->isFloatTy() || Ty->isDoubleTy() ||
+         (Ty->isHalfTy() && ST->hasFullFP16())) &&
+        CxtI &&
+        ((CxtI->hasOneUse() &&
+          match(*CxtI->user_begin(), m_FMul(m_Value(), m_Value()))) ||
+         match(CxtI->getOperand(0), m_FMul(m_Value(), m_Value()))))
+      return 0;
+    [[fallthrough]];
   case ISD::FADD:
   case ISD::FSUB:
     // Increase the cost for half and bfloat types if not architecturally
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index 84150765d77973..aaffd97b92b2de 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -133,7 +133,7 @@ define i32 @fneg(i32 %arg) {
 
 define i32 @fmulfneg(i32 %arg) {
 ; CHECK-LABEL: 'fmulfneg'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F16 = fneg half undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F16 = fneg half undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F16M = fmul half %F16, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F16 = fneg <2 x half> undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F16M = fmul <2 x half> %V2F16, undef
@@ -143,7 +143,7 @@ define i32 @fmulfneg(i32 %arg) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8F16M = fmul <8 x half> %V8F16, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fneg <16 x half> undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16F16M = fmul <16 x half> %V16F16, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F32 = fneg float undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F32 = fneg float undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F32M = fmul float %F32, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fneg <2 x float> undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F32M = fmul <2 x float> %V2F32, undef
@@ -151,7 +151,7 @@ define i32 @fmulfneg(i32 %arg) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4F32M = fmul <4 x float> %V4F32, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fneg <8 x float> undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8F32M = fmul <8 x float> %V8F32, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F64 = fneg double undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F64 = fneg double undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F64M = fmul double %F64, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F64 = fneg <2 x double> undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F64M = fmul <2 x double> %V2F64, undef
@@ -192,7 +192,7 @@ define i32 @fmulfneg(i32 %arg) {
 define i32 @fnegfmul(i32 %arg) {
 ; CHECK-LABEL: 'fnegfmul'
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F16M = fmul half undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F16 = fneg half %F16M
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F16 = fneg half %F16M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F16M = fmul <2 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F16 = fneg <2 x half> %V2F16M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4F16M = fmul <4 x half> undef, undef
@@ -202,7 +202,7 @@ define i32 @fnegfmul(i32 %arg) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16F16M = fmul <16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fneg <16 x half> %V16F16M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F32M = fmul float undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F32 = fneg float %F32M
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F32 = fneg float %F32M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F32M = fmul <2 x float> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fneg <2 x float> %V2F32M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4F32M = fmul <4 x float> undef, undef
@@ -210,7 +210,7 @@ define i32 @fnegfmul(i32 %arg) {
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V8F32M = fmul <8 x float> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fneg <8 x float> %V8F32M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F64M = fmul double undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %F64 = fneg double %F64M
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %F64 = fneg double %F64M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F64M = fmul <2 x double> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2F64 = fneg <2 x double> %V2F64M
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4F64M = fmul <4 x double> undef, undef
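The new early return in `getArithmeticInstrCost` can be restated as a standalone predicate. The sketch below is a simplified model built on a hypothetical `ToyFNeg` description, not LLVM's `Instruction` or `PatternMatch` API. It covers only the "is this fneg free" decision. The real check also requires a context instruction (`CxtI`), and when the check fails the code falls through to the FADD/FSUB costing via `[[fallthrough]]`.

```cpp
// Hypothetical, simplified view of an IR fneg; not LLVM's API.
enum class ToyType { Half, Float, Double, Vector };

struct ToyFNeg {
  ToyType Type;
  unsigned NumUses;    // users of the fneg's result
  bool SoleUserIsFMul; // only consulted when NumUses == 1
  bool OperandIsFMul;  // the fneg(fmul(..)) shape
};

// True when the scalar fneg would be folded into an fnmul and is costed as 0.
constexpr bool isFNegFreeForFNMul(const ToyFNeg &I, bool HasFullFP16) {
  const bool ScalarOK = I.Type == ToyType::Float || I.Type == ToyType::Double ||
                        (I.Type == ToyType::Half && HasFullFP16);
  const bool FeedsFMul = I.NumUses == 1 && I.SoleUserIsFMul; // fmul(fneg(..))
  return ScalarOK && (FeedsFMul || I.OperandIsFMul);
}
```

Vector fnegs are never discounted. This is consistent with the unchanged `<N x ...>` cost lines in the test above.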

>From e78156a0e225673e592920410c8cadc94f19aa66 Mon Sep 17 00:00:00 2001
From: Sumanth Gundapaneni <sumanth.gundapaneni at amd.com>
Date: Wed, 21 Aug 2024 12:13:56 -0500
Subject: [PATCH 027/116] Scalarize the vector inputs to llvm.lround intrinsic
 by default. (#101054)

The verifier is updated in a separate patch to allow vector types for the
llvm.lround and llvm.llround intrinsics.
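Scalarizing here means that a vector `llvm.lround` is legalized into one scalar lround per lane. As a rough C++ analogue of those semantics, and not the SelectionDAG or GlobalISel code itself, the hypothetical `scalarizedLround` below applies `std::lround` element by element. Like `llvm.lround`, `std::lround` rounds halfway cases away from zero. The choice of `long` as the result type is an assumption made for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Sketch: one vector lround expressed as N independent scalar lround calls.
template <std::size_t N>
std::array<long, N> scalarizedLround(const std::array<double, N> &In) {
  std::array<long, N> Out{};
  for (std::size_t I = 0; I < N; ++I) {
    // lround's result is unspecified for NaN, infinity, or out-of-range
    // inputs, so this sketch restricts itself to finite values.
    assert(std::isfinite(In[I]));
    Out[I] = std::lround(In[I]); // halfway cases round away from zero
  }
  return Out;
}
```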
---
 .../CodeGen/GlobalISel/LegalizerHelper.cpp    |   2 +
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp |  10 +-
 .../SelectionDAG/LegalizeFloatTypes.cpp       |   2 +
 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h |   2 +-
 .../SelectionDAG/LegalizeVectorOps.cpp        |   2 +
 .../SelectionDAG/LegalizeVectorTypes.cpp      |  16 +-
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +
 llvm/lib/CodeGen/TargetLoweringBase.cpp       |   5 +-
 llvm/test/CodeGen/AMDGPU/lround.ll            | 479 ++++++++++++++----
 9 files changed, 424 insertions(+), 96 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index bdbef20e20960d..3fece81df1f2fd 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -4921,6 +4921,8 @@ LegalizerHelper::fewerElementsVector(MachineInstr &MI, unsigned TypeIdx,
   case G_INTRINSIC_LLRINT:
   case G_INTRINSIC_ROUND:
   case G_INTRINSIC_ROUNDEVEN:
+  case G_LROUND:
+  case G_LLROUND:
   case G_INTRINSIC_TRUNC:
   case G_FCOS:
   case G_FSIN:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c9ab7e7a66079c..11935cbc309f01 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -507,7 +507,7 @@ namespace {
     SDValue visitUINT_TO_FP(SDNode *N);
     SDValue visitFP_TO_SINT(SDNode *N);
     SDValue visitFP_TO_UINT(SDNode *N);
-    SDValue visitXRINT(SDNode *N);
+    SDValue visitXROUND(SDNode *N);
     SDValue visitFP_ROUND(SDNode *N);
     SDValue visitFP_EXTEND(SDNode *N);
     SDValue visitFNEG(SDNode *N);
@@ -1929,8 +1929,10 @@ SDValue DAGCombiner::visit(SDNode *N) {
   case ISD::UINT_TO_FP:         return visitUINT_TO_FP(N);
   case ISD::FP_TO_SINT:         return visitFP_TO_SINT(N);
   case ISD::FP_TO_UINT:         return visitFP_TO_UINT(N);
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::LRINT:
-  case ISD::LLRINT:             return visitXRINT(N);
+  case ISD::LLRINT:             return visitXROUND(N);
   case ISD::FP_ROUND:           return visitFP_ROUND(N);
   case ISD::FP_EXTEND:          return visitFP_EXTEND(N);
   case ISD::FNEG:               return visitFNEG(N);
@@ -17998,15 +18000,17 @@ SDValue DAGCombiner::visitFP_TO_UINT(SDNode *N) {
   return FoldIntToFPToInt(N, DAG);
 }
 
-SDValue DAGCombiner::visitXRINT(SDNode *N) {
+SDValue DAGCombiner::visitXROUND(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
 
   // fold (lrint|llrint undef) -> undef
+  // fold (lround|llround undef) -> undef
   if (N0.isUndef())
     return DAG.getUNDEF(VT);
 
   // fold (lrint|llrint c1fp) -> c1
+  // fold (lround|llround c1fp) -> c1
   if (DAG.isConstantFPBuildVectorOrConstantFP(N0))
     return DAG.getNode(N->getOpcode(), SDLoc(N), VT, N0);
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index ad0c054d3ccd50..221dcfe145594f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -2441,6 +2441,8 @@ bool DAGTypeLegalizer::PromoteFloatOperand(SDNode *N, unsigned OpNo) {
     case ISD::FCOPYSIGN:  R = PromoteFloatOp_FCOPYSIGN(N, OpNo); break;
     case ISD::FP_TO_SINT:
     case ISD::FP_TO_UINT:
+    case ISD::LROUND:
+    case ISD::LLROUND:
     case ISD::LRINT:
     case ISD::LLRINT:     R = PromoteFloatOp_UnaryOp(N, OpNo); break;
     case ISD::FP_TO_SINT_SAT:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 27dd4ae241bd10..1088db4bdbe0b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -1052,7 +1052,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue WidenVecRes_Convert(SDNode *N);
   SDValue WidenVecRes_Convert_StrictFP(SDNode *N);
   SDValue WidenVecRes_FP_TO_XINT_SAT(SDNode *N);
-  SDValue WidenVecRes_XRINT(SDNode *N);
+  SDValue WidenVecRes_XROUND(SDNode *N);
   SDValue WidenVecRes_FCOPYSIGN(SDNode *N);
   SDValue WidenVecRes_UnarySameEltsWithScalarArg(SDNode *N);
   SDValue WidenVecRes_ExpOp(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 57843f0959ac28..3f104baed97b1a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -473,6 +473,8 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
                                               Node->getValueType(0), Scale);
     break;
   }
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::LRINT:
   case ISD::LLRINT:
   case ISD::SINT_TO_FP:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index aad0047b4839a8..8315efcb6750f9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -110,6 +110,8 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
   case ISD::LLRINT:
   case ISD::FROUND:
   case ISD::FROUNDEVEN:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::FSIN:
   case ISD::FSINH:
   case ISD::FSQRT:
@@ -752,6 +754,8 @@ bool DAGTypeLegalizer::ScalarizeVectorOperand(SDNode *N, unsigned OpNo) {
   case ISD::FP_TO_UINT:
   case ISD::SINT_TO_FP:
   case ISD::UINT_TO_FP:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::LRINT:
   case ISD::LLRINT:
     Res = ScalarizeVecOp_UnaryOp(N);
@@ -1215,6 +1219,8 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
   case ISD::VP_FROUND:
   case ISD::FROUNDEVEN:
   case ISD::VP_FROUNDEVEN:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::FSIN:
   case ISD::FSINH:
   case ISD::FSQRT: case ISD::VP_SQRT:
@@ -3270,6 +3276,8 @@ bool DAGTypeLegalizer::SplitVectorOperand(SDNode *N, unsigned OpNo) {
   case ISD::ZERO_EXTEND:
   case ISD::ANY_EXTEND:
   case ISD::FTRUNC:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::LRINT:
   case ISD::LLRINT:
     Res = SplitVecOp_UnaryOp(N);
@@ -4594,7 +4602,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
   case ISD::LLRINT:
   case ISD::VP_LRINT:
   case ISD::VP_LLRINT:
-    Res = WidenVecRes_XRINT(N);
+  case ISD::LROUND:
+  case ISD::LLROUND:
+    Res = WidenVecRes_XROUND(N);
     break;
 
   case ISD::FABS:
@@ -5231,7 +5241,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_FP_TO_XINT_SAT(SDNode *N) {
   return DAG.getNode(N->getOpcode(), dl, WidenVT, Src, N->getOperand(1));
 }
 
-SDValue DAGTypeLegalizer::WidenVecRes_XRINT(SDNode *N) {
+SDValue DAGTypeLegalizer::WidenVecRes_XROUND(SDNode *N) {
   SDLoc dl(N);
   EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
   ElementCount WidenNumElts = WidenVT.getVectorElementCount();
@@ -6480,6 +6490,8 @@ bool DAGTypeLegalizer::WidenVectorOperand(SDNode *N, unsigned OpNo) {
   case ISD::VSELECT:            Res = WidenVecOp_VSELECT(N); break;
   case ISD::FLDEXP:
   case ISD::FCOPYSIGN:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::LRINT:
   case ISD::LLRINT:
     Res = WidenVecOp_UnrollVectorOp(N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 18a3b7bce104a7..27675dce70c260 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5436,6 +5436,8 @@ bool SelectionDAG::isKnownNeverNaN(SDValue Op, bool SNaN, unsigned Depth) const
   case ISD::FCEIL:
   case ISD::FROUND:
   case ISD::FROUNDEVEN:
+  case ISD::LROUND:
+  case ISD::LLROUND:
   case ISD::FRINT:
   case ISD::LRINT:
   case ISD::LLRINT:
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 4ff8617f740c89..35d6304cf9b400 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -774,8 +774,9 @@ void TargetLoweringBase::initActions() {
       setOperationAction(
           {ISD::FCOPYSIGN, ISD::SIGN_EXTEND_INREG, ISD::ANY_EXTEND_VECTOR_INREG,
            ISD::SIGN_EXTEND_VECTOR_INREG, ISD::ZERO_EXTEND_VECTOR_INREG,
-           ISD::SPLAT_VECTOR, ISD::LRINT, ISD::LLRINT, ISD::FTAN, ISD::FACOS,
-           ISD::FASIN, ISD::FATAN, ISD::FCOSH, ISD::FSINH, ISD::FTANH},
+           ISD::SPLAT_VECTOR, ISD::LRINT, ISD::LLRINT, ISD::LROUND,
+           ISD::LLROUND, ISD::FTAN, ISD::FACOS, ISD::FASIN, ISD::FATAN,
+           ISD::FCOSH, ISD::FSINH, ISD::FTANH},
           VT, Expand);
 
       // Constrained floating-point operations default to expand.
diff --git a/llvm/test/CodeGen/AMDGPU/lround.ll b/llvm/test/CodeGen/AMDGPU/lround.ll
index d45d83026013df..072ee70b840d83 100644
--- a/llvm/test/CodeGen/AMDGPU/lround.ll
+++ b/llvm/test/CodeGen/AMDGPU/lround.ll
@@ -6,94 +6,6 @@
 ; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck -check-prefixes=GFX11-SDAG %s
 ; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1100 < %s | FileCheck -check-prefixes=GFX11-GISEL %s
 
-declare float @llvm.round.f32(float)
-declare i32 @llvm.lround.i32.f32(float)
-declare i32 @llvm.lround.i32.f64(double)
-declare i64 @llvm.lround.i64.f32(float)
-declare i64 @llvm.lround.i64.f64(double)
-declare i64 @llvm.llround.i64.f32(float)
-declare half @llvm.round.f16(half)
-declare i32 @llvm.lround.i32.f16(half %arg)
-
-define float @intrinsic_fround(float %arg) {
-; GFX9-SDAG-LABEL: intrinsic_fround:
-; GFX9-SDAG:       ; %bb.0: ; %entry
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX9-SDAG-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX9-SDAG-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v2|, 0.5
-; GFX9-SDAG-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s[4:5]
-; GFX9-SDAG-NEXT:    s_brev_b32 s4, -2
-; GFX9-SDAG-NEXT:    v_bfi_b32 v0, s4, v2, v0
-; GFX9-SDAG-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: intrinsic_fround:
-; GFX9-GISEL:       ; %bb.0: ; %entry
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX9-GISEL-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX9-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v2|, 0.5
-; GFX9-GISEL-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s[4:5]
-; GFX9-GISEL-NEXT:    v_bfrev_b32_e32 v3, 1
-; GFX9-GISEL-NEXT:    v_and_or_b32 v0, v0, v3, v2
-; GFX9-GISEL-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-SDAG-LABEL: intrinsic_fround:
-; GFX10-SDAG:       ; %bb.0: ; %entry
-; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX10-SDAG-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX10-SDAG-NEXT:    v_cmp_ge_f32_e64 s4, |v2|, 0.5
-; GFX10-SDAG-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s4
-; GFX10-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v2, v0
-; GFX10-SDAG-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-GISEL-LABEL: intrinsic_fround:
-; GFX10-GISEL:       ; %bb.0: ; %entry
-; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX10-GISEL-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX10-GISEL-NEXT:    v_cmp_ge_f32_e64 s4, |v2|, 0.5
-; GFX10-GISEL-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s4
-; GFX10-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v2
-; GFX10-GISEL-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-SDAG-LABEL: intrinsic_fround:
-; GFX11-SDAG:       ; %bb.0: ; %entry
-; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX11-SDAG-NEXT:    v_cmp_ge_f32_e64 s0, |v2|, 0.5
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s0
-; GFX11-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v2, v0
-; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-SDAG-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-GISEL-LABEL: intrinsic_fround:
-; GFX11-GISEL:       ; %bb.0: ; %entry
-; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v1, v0
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_sub_f32_e32 v2, v0, v1
-; GFX11-GISEL-NEXT:    v_cmp_ge_f32_e64 s0, |v2|, 0.5
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_cndmask_b32_e64 v2, 0, 1.0, s0
-; GFX11-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v2
-; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-GISEL-NEXT:    v_add_f32_e32 v0, v1, v0
-; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
-entry:
-  %res = tail call float @llvm.round.f32(float %arg)
-  ret float %res
-}
-
 define i32 @intrinsic_lround_i32_f32(float %arg) {
 ; GFX9-SDAG-LABEL: intrinsic_lround_i32_f32:
 ; GFX9-SDAG:       ; %bb.0: ; %entry
@@ -1034,3 +946,394 @@ entry:
   ret i32 %res
 }
 
+define <2 x i32> @intrinsic_lround_v2i32_v2f32(<2 x float> %arg) {
+; GFX9-SDAG-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX9-SDAG:       ; %bb.0: ; %entry
+; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX9-SDAG-NEXT:    v_sub_f32_e32 v3, v0, v2
+; GFX9-SDAG-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-SDAG-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-SDAG-NEXT:    s_brev_b32 s6, -2
+; GFX9-SDAG-NEXT:    v_bfi_b32 v0, s6, v3, v0
+; GFX9-SDAG-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v2, v1
+; GFX9-SDAG-NEXT:    v_sub_f32_e32 v3, v1, v2
+; GFX9-SDAG-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-SDAG-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-SDAG-NEXT:    v_bfi_b32 v1, s6, v3, v1
+; GFX9-SDAG-NEXT:    v_add_f32_e32 v1, v2, v1
+; GFX9-SDAG-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX9-SDAG-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-GISEL-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX9-GISEL:       ; %bb.0: ; %entry
+; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX9-GISEL-NEXT:    v_sub_f32_e32 v3, v0, v2
+; GFX9-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-GISEL-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-GISEL-NEXT:    v_bfrev_b32_e32 v4, 1
+; GFX9-GISEL-NEXT:    v_and_or_b32 v0, v0, v4, v3
+; GFX9-GISEL-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v2, v1
+; GFX9-GISEL-NEXT:    v_sub_f32_e32 v3, v1, v2
+; GFX9-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-GISEL-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-GISEL-NEXT:    v_and_or_b32 v1, v1, v4, v3
+; GFX9-GISEL-NEXT:    v_add_f32_e32 v1, v2, v1
+; GFX9-GISEL-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX9-GISEL-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-SDAG-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX10-SDAG:       ; %bb.0: ; %entry
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX10-SDAG-NEXT:    v_sub_f32_e32 v4, v0, v2
+; GFX10-SDAG-NEXT:    v_sub_f32_e32 v5, v1, v3
+; GFX10-SDAG-NEXT:    v_cmp_ge_f32_e64 s4, |v4|, 0.5
+; GFX10-SDAG-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s4
+; GFX10-SDAG-NEXT:    v_cmp_ge_f32_e64 s4, |v5|, 0.5
+; GFX10-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v4, v0
+; GFX10-SDAG-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s4
+; GFX10-SDAG-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX10-SDAG-NEXT:    v_bfi_b32 v1, 0x7fffffff, v5, v1
+; GFX10-SDAG-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX10-SDAG-NEXT:    v_add_f32_e32 v1, v3, v1
+; GFX10-SDAG-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX10-GISEL:       ; %bb.0: ; %entry
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX10-GISEL-NEXT:    v_sub_f32_e32 v4, v0, v2
+; GFX10-GISEL-NEXT:    v_sub_f32_e32 v5, v1, v3
+; GFX10-GISEL-NEXT:    v_cmp_ge_f32_e64 s4, |v4|, 0.5
+; GFX10-GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s4
+; GFX10-GISEL-NEXT:    v_cmp_ge_f32_e64 s4, |v5|, 0.5
+; GFX10-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v4
+; GFX10-GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s4
+; GFX10-GISEL-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX10-GISEL-NEXT:    v_and_or_b32 v1, 0x80000000, v1, v5
+; GFX10-GISEL-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX10-GISEL-NEXT:    v_add_f32_e32 v1, v3, v1
+; GFX10-GISEL-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX11-SDAG:       ; %bb.0: ; %entry
+; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_dual_sub_f32 v4, v0, v2 :: v_dual_sub_f32 v5, v1, v3
+; GFX11-SDAG-NEXT:    v_cmp_ge_f32_e64 s0, |v4|, 0.5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s0
+; GFX11-SDAG-NEXT:    v_cmp_ge_f32_e64 s0, |v5|, 0.5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v4, v0
+; GFX11-SDAG-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s0
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_bfi_b32 v1, 0x7fffffff, v5, v1
+; GFX11-SDAG-NEXT:    v_dual_add_f32 v0, v2, v0 :: v_dual_add_f32 v1, v3, v1
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX11-SDAG-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: intrinsic_lround_v2i32_v2f32:
+; GFX11-GISEL:       ; %bb.0: ; %entry
+; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_dual_sub_f32 v4, v0, v2 :: v_dual_sub_f32 v5, v1, v3
+; GFX11-GISEL-NEXT:    v_cmp_ge_f32_e64 s0, |v4|, 0.5
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s0
+; GFX11-GISEL-NEXT:    v_cmp_ge_f32_e64 s0, |v5|, 0.5
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v4
+; GFX11-GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s0
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_and_or_b32 v1, 0x80000000, v1, v5
+; GFX11-GISEL-NEXT:    v_dual_add_f32 v0, v2, v0 :: v_dual_add_f32 v1, v3, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_cvt_i32_f32_e32 v0, v0
+; GFX11-GISEL-NEXT:    v_cvt_i32_f32_e32 v1, v1
+; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %res = tail call <2 x i32> @llvm.lround.v2i32.v2f32(<2 x float> %arg)
+  ret <2 x i32> %res
+}
+
+define <2 x i64> @intrinsic_lround_v2i64_v2f32(<2 x float> %arg) {
+; GFX9-SDAG-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX9-SDAG:       ; %bb.0: ; %entry
+; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX9-SDAG-NEXT:    v_sub_f32_e32 v3, v0, v2
+; GFX9-SDAG-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-SDAG-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-SDAG-NEXT:    s_brev_b32 s6, -2
+; GFX9-SDAG-NEXT:    v_bfi_b32 v0, s6, v3, v0
+; GFX9-SDAG-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v0, v0
+; GFX9-SDAG-NEXT:    s_mov_b32 s7, 0x2f800000
+; GFX9-SDAG-NEXT:    v_mul_f32_e64 v2, |v0|, s7
+; GFX9-SDAG-NEXT:    v_floor_f32_e32 v2, v2
+; GFX9-SDAG-NEXT:    s_mov_b32 s8, 0xcf800000
+; GFX9-SDAG-NEXT:    v_fma_f32 v3, v2, s8, |v0|
+; GFX9-SDAG-NEXT:    v_cvt_u32_f32_e32 v3, v3
+; GFX9-SDAG-NEXT:    v_ashrrev_i32_e32 v4, 31, v0
+; GFX9-SDAG-NEXT:    v_cvt_u32_f32_e32 v2, v2
+; GFX9-SDAG-NEXT:    v_xor_b32_e32 v0, v3, v4
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX9-SDAG-NEXT:    v_sub_f32_e32 v5, v1, v3
+; GFX9-SDAG-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v5|, 0.5
+; GFX9-SDAG-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s[4:5]
+; GFX9-SDAG-NEXT:    v_bfi_b32 v1, s6, v5, v1
+; GFX9-SDAG-NEXT:    v_add_f32_e32 v1, v3, v1
+; GFX9-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX9-SDAG-NEXT:    v_mul_f32_e64 v1, |v3|, s7
+; GFX9-SDAG-NEXT:    v_floor_f32_e32 v1, v1
+; GFX9-SDAG-NEXT:    v_fma_f32 v5, v1, s8, |v3|
+; GFX9-SDAG-NEXT:    v_cvt_u32_f32_e32 v5, v5
+; GFX9-SDAG-NEXT:    v_cvt_u32_f32_e32 v6, v1
+; GFX9-SDAG-NEXT:    v_xor_b32_e32 v2, v2, v4
+; GFX9-SDAG-NEXT:    v_sub_co_u32_e32 v0, vcc, v0, v4
+; GFX9-SDAG-NEXT:    v_ashrrev_i32_e32 v3, 31, v3
+; GFX9-SDAG-NEXT:    v_subb_co_u32_e32 v1, vcc, v2, v4, vcc
+; GFX9-SDAG-NEXT:    v_xor_b32_e32 v2, v5, v3
+; GFX9-SDAG-NEXT:    v_xor_b32_e32 v4, v6, v3
+; GFX9-SDAG-NEXT:    v_sub_co_u32_e32 v2, vcc, v2, v3
+; GFX9-SDAG-NEXT:    v_subb_co_u32_e32 v3, vcc, v4, v3, vcc
+; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX9-GISEL-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX9-GISEL:       ; %bb.0: ; %entry
+; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX9-GISEL-NEXT:    v_sub_f32_e32 v3, v0, v2
+; GFX9-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v3|, 0.5
+; GFX9-GISEL-NEXT:    v_cndmask_b32_e64 v3, 0, 1.0, s[4:5]
+; GFX9-GISEL-NEXT:    v_bfrev_b32_e32 v4, 1
+; GFX9-GISEL-NEXT:    v_and_or_b32 v0, v0, v4, v3
+; GFX9-GISEL-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX9-GISEL-NEXT:    v_mov_b32_e32 v3, 0x2f800000
+; GFX9-GISEL-NEXT:    v_mul_f32_e64 v5, |v2|, v3
+; GFX9-GISEL-NEXT:    v_floor_f32_e32 v5, v5
+; GFX9-GISEL-NEXT:    v_mov_b32_e32 v6, 0xcf800000
+; GFX9-GISEL-NEXT:    v_fma_f32 v2, v5, v6, |v2|
+; GFX9-GISEL-NEXT:    v_cvt_u32_f32_e32 v2, v2
+; GFX9-GISEL-NEXT:    v_cvt_u32_f32_e32 v5, v5
+; GFX9-GISEL-NEXT:    v_ashrrev_i32_e32 v7, 31, v0
+; GFX9-GISEL-NEXT:    v_xor_b32_e32 v0, v2, v7
+; GFX9-GISEL-NEXT:    v_xor_b32_e32 v2, v5, v7
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v5, v1
+; GFX9-GISEL-NEXT:    v_sub_f32_e32 v8, v1, v5
+; GFX9-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], |v8|, 0.5
+; GFX9-GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, 1.0, s[4:5]
+; GFX9-GISEL-NEXT:    v_and_or_b32 v1, v1, v4, v8
+; GFX9-GISEL-NEXT:    v_add_f32_e32 v4, v5, v1
+; GFX9-GISEL-NEXT:    v_trunc_f32_e32 v1, v4
+; GFX9-GISEL-NEXT:    v_mul_f32_e64 v3, |v1|, v3
+; GFX9-GISEL-NEXT:    v_floor_f32_e32 v3, v3
+; GFX9-GISEL-NEXT:    v_fma_f32 v1, v3, v6, |v1|
+; GFX9-GISEL-NEXT:    v_cvt_u32_f32_e32 v5, v1
+; GFX9-GISEL-NEXT:    v_cvt_u32_f32_e32 v3, v3
+; GFX9-GISEL-NEXT:    v_sub_co_u32_e32 v0, vcc, v0, v7
+; GFX9-GISEL-NEXT:    v_ashrrev_i32_e32 v4, 31, v4
+; GFX9-GISEL-NEXT:    v_subb_co_u32_e32 v1, vcc, v2, v7, vcc
+; GFX9-GISEL-NEXT:    v_xor_b32_e32 v2, v5, v4
+; GFX9-GISEL-NEXT:    v_xor_b32_e32 v3, v3, v4
+; GFX9-GISEL-NEXT:    v_sub_co_u32_e32 v2, vcc, v2, v4
+; GFX9-GISEL-NEXT:    v_subb_co_u32_e32 v3, vcc, v3, v4, vcc
+; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-SDAG-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX10-SDAG:       ; %bb.0: ; %entry
+; GFX10-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX10-SDAG-NEXT:    v_sub_f32_e32 v4, v0, v2
+; GFX10-SDAG-NEXT:    v_sub_f32_e32 v5, v1, v3
+; GFX10-SDAG-NEXT:    v_cmp_ge_f32_e64 s4, |v4|, 0.5
+; GFX10-SDAG-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s4
+; GFX10-SDAG-NEXT:    v_cmp_ge_f32_e64 s4, |v5|, 0.5
+; GFX10-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v4, v0
+; GFX10-SDAG-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s4
+; GFX10-SDAG-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX10-SDAG-NEXT:    v_bfi_b32 v1, 0x7fffffff, v5, v1
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v0, v0
+; GFX10-SDAG-NEXT:    v_add_f32_e32 v1, v3, v1
+; GFX10-SDAG-NEXT:    v_mul_f32_e64 v2, 0x2f800000, |v0|
+; GFX10-SDAG-NEXT:    v_trunc_f32_e32 v1, v1
+; GFX10-SDAG-NEXT:    v_ashrrev_i32_e32 v5, 31, v0
+; GFX10-SDAG-NEXT:    v_floor_f32_e32 v2, v2
+; GFX10-SDAG-NEXT:    v_mul_f32_e64 v3, 0x2f800000, |v1|
+; GFX10-SDAG-NEXT:    v_ashrrev_i32_e32 v6, 31, v1
+; GFX10-SDAG-NEXT:    v_fma_f32 v4, 0xcf800000, v2, |v0|
+; GFX10-SDAG-NEXT:    v_floor_f32_e32 v3, v3
+; GFX10-SDAG-NEXT:    v_cvt_u32_f32_e32 v2, v2
+; GFX10-SDAG-NEXT:    v_fma_f32 v0, 0xcf800000, v3, |v1|
+; GFX10-SDAG-NEXT:    v_cvt_u32_f32_e32 v1, v4
+; GFX10-SDAG-NEXT:    v_cvt_u32_f32_e32 v3, v3
+; GFX10-SDAG-NEXT:    v_xor_b32_e32 v2, v2, v5
+; GFX10-SDAG-NEXT:    v_cvt_u32_f32_e32 v0, v0
+; GFX10-SDAG-NEXT:    v_xor_b32_e32 v1, v1, v5
+; GFX10-SDAG-NEXT:    v_xor_b32_e32 v3, v3, v6
+; GFX10-SDAG-NEXT:    v_xor_b32_e32 v4, v0, v6
+; GFX10-SDAG-NEXT:    v_sub_co_u32 v0, vcc_lo, v1, v5
+; GFX10-SDAG-NEXT:    v_sub_co_ci_u32_e32 v1, vcc_lo, v2, v5, vcc_lo
+; GFX10-SDAG-NEXT:    v_sub_co_u32 v2, vcc_lo, v4, v6
+; GFX10-SDAG-NEXT:    v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v6, vcc_lo
+; GFX10-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX10-GISEL:       ; %bb.0: ; %entry
+; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX10-GISEL-NEXT:    v_sub_f32_e32 v4, v0, v2
+; GFX10-GISEL-NEXT:    v_sub_f32_e32 v5, v1, v3
+; GFX10-GISEL-NEXT:    v_cmp_ge_f32_e64 s4, |v4|, 0.5
+; GFX10-GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s4
+; GFX10-GISEL-NEXT:    v_cmp_ge_f32_e64 s4, |v5|, 0.5
+; GFX10-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v4
+; GFX10-GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s4
+; GFX10-GISEL-NEXT:    v_add_f32_e32 v0, v2, v0
+; GFX10-GISEL-NEXT:    v_and_or_b32 v1, 0x80000000, v1, v5
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX10-GISEL-NEXT:    v_add_f32_e32 v1, v3, v1
+; GFX10-GISEL-NEXT:    v_ashrrev_i32_e32 v6, 31, v0
+; GFX10-GISEL-NEXT:    v_mul_f32_e64 v4, 0x2f800000, |v2|
+; GFX10-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX10-GISEL-NEXT:    v_floor_f32_e32 v4, v4
+; GFX10-GISEL-NEXT:    v_mul_f32_e64 v5, 0x2f800000, |v3|
+; GFX10-GISEL-NEXT:    v_fma_f32 v2, 0xcf800000, v4, |v2|
+; GFX10-GISEL-NEXT:    v_floor_f32_e32 v5, v5
+; GFX10-GISEL-NEXT:    v_fma_f32 v0, 0xcf800000, v5, |v3|
+; GFX10-GISEL-NEXT:    v_ashrrev_i32_e32 v3, 31, v1
+; GFX10-GISEL-NEXT:    v_cvt_u32_f32_e32 v1, v2
+; GFX10-GISEL-NEXT:    v_cvt_u32_f32_e32 v2, v4
+; GFX10-GISEL-NEXT:    v_cvt_u32_f32_e32 v4, v5
+; GFX10-GISEL-NEXT:    v_cvt_u32_f32_e32 v0, v0
+; GFX10-GISEL-NEXT:    v_xor_b32_e32 v1, v1, v6
+; GFX10-GISEL-NEXT:    v_xor_b32_e32 v2, v2, v6
+; GFX10-GISEL-NEXT:    v_xor_b32_e32 v4, v4, v3
+; GFX10-GISEL-NEXT:    v_xor_b32_e32 v5, v0, v3
+; GFX10-GISEL-NEXT:    v_sub_co_u32 v0, vcc_lo, v1, v6
+; GFX10-GISEL-NEXT:    v_sub_co_ci_u32_e32 v1, vcc_lo, v2, v6, vcc_lo
+; GFX10-GISEL-NEXT:    v_sub_co_u32 v2, vcc_lo, v5, v3
+; GFX10-GISEL-NEXT:    v_sub_co_ci_u32_e32 v3, vcc_lo, v4, v3, vcc_lo
+; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX11-SDAG:       ; %bb.0: ; %entry
+; GFX11-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_dual_sub_f32 v4, v0, v2 :: v_dual_sub_f32 v5, v1, v3
+; GFX11-SDAG-NEXT:    v_cmp_ge_f32_e64 s0, |v4|, 0.5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s0
+; GFX11-SDAG-NEXT:    v_cmp_ge_f32_e64 s0, |v5|, 0.5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-NEXT:    v_bfi_b32 v0, 0x7fffffff, v4, v0
+; GFX11-SDAG-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s0
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-SDAG-NEXT:    v_bfi_b32 v1, 0x7fffffff, v5, v1
+; GFX11-SDAG-NEXT:    v_dual_add_f32 v0, v2, v0 :: v_dual_add_f32 v1, v3, v1
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v0, v0
+; GFX11-SDAG-NEXT:    v_trunc_f32_e32 v1, v1
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_mul_f32_e64 v2, 0x2f800000, |v0|
+; GFX11-SDAG-NEXT:    v_ashrrev_i32_e32 v5, 31, v0
+; GFX11-SDAG-NEXT:    v_mul_f32_e64 v3, 0x2f800000, |v1|
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_floor_f32_e32 v2, v2
+; GFX11-SDAG-NEXT:    v_ashrrev_i32_e32 v6, 31, v1
+; GFX11-SDAG-NEXT:    v_floor_f32_e32 v3, v3
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_fma_f32 v4, 0xcf800000, v2, |v0|
+; GFX11-SDAG-NEXT:    v_cvt_u32_f32_e32 v2, v2
+; GFX11-SDAG-NEXT:    v_fma_f32 v0, 0xcf800000, v3, |v1|
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-SDAG-NEXT:    v_cvt_u32_f32_e32 v1, v4
+; GFX11-SDAG-NEXT:    v_cvt_u32_f32_e32 v3, v3
+; GFX11-SDAG-NEXT:    v_xor_b32_e32 v2, v2, v5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-SDAG-NEXT:    v_cvt_u32_f32_e32 v0, v0
+; GFX11-SDAG-NEXT:    v_xor_b32_e32 v1, v1, v5
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_xor_b32_e32 v3, v3, v6
+; GFX11-SDAG-NEXT:    v_xor_b32_e32 v4, v0, v6
+; GFX11-SDAG-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-SDAG-NEXT:    v_sub_co_u32 v0, vcc_lo, v1, v5
+; GFX11-SDAG-NEXT:    v_sub_co_ci_u32_e32 v1, vcc_lo, v2, v5, vcc_lo
+; GFX11-SDAG-NEXT:    v_sub_co_u32 v2, vcc_lo, v4, v6
+; GFX11-SDAG-NEXT:    v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v6, vcc_lo
+; GFX11-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: intrinsic_lround_v2i64_v2f32:
+; GFX11-GISEL:       ; %bb.0: ; %entry
+; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_dual_sub_f32 v4, v0, v2 :: v_dual_sub_f32 v5, v1, v3
+; GFX11-GISEL-NEXT:    v_cmp_ge_f32_e64 s0, |v4|, 0.5
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, 1.0, s0
+; GFX11-GISEL-NEXT:    v_cmp_ge_f32_e64 s0, |v5|, 0.5
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_and_or_b32 v0, 0x80000000, v0, v4
+; GFX11-GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, 1.0, s0
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-GISEL-NEXT:    v_and_or_b32 v1, 0x80000000, v1, v5
+; GFX11-GISEL-NEXT:    v_dual_add_f32 v0, v2, v0 :: v_dual_add_f32 v1, v3, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v2, v0
+; GFX11-GISEL-NEXT:    v_ashrrev_i32_e32 v6, 31, v0
+; GFX11-GISEL-NEXT:    v_trunc_f32_e32 v3, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_mul_f32_e64 v4, 0x2f800000, |v2|
+; GFX11-GISEL-NEXT:    v_mul_f32_e64 v5, 0x2f800000, |v3|
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_floor_f32_e32 v4, v4
+; GFX11-GISEL-NEXT:    v_floor_f32_e32 v5, v5
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-GISEL-NEXT:    v_fma_f32 v2, 0xcf800000, v4, |v2|
+; GFX11-GISEL-NEXT:    v_fma_f32 v0, 0xcf800000, v5, |v3|
+; GFX11-GISEL-NEXT:    v_ashrrev_i32_e32 v3, 31, v1
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_3) | instid1(VALU_DEP_4)
+; GFX11-GISEL-NEXT:    v_cvt_u32_f32_e32 v1, v2
+; GFX11-GISEL-NEXT:    v_cvt_u32_f32_e32 v2, v4
+; GFX11-GISEL-NEXT:    v_cvt_u32_f32_e32 v4, v5
+; GFX11-GISEL-NEXT:    v_cvt_u32_f32_e32 v0, v0
+; GFX11-GISEL-NEXT:    v_xor_b32_e32 v1, v1, v6
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-GISEL-NEXT:    v_xor_b32_e32 v2, v2, v6
+; GFX11-GISEL-NEXT:    v_xor_b32_e32 v4, v4, v3
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
+; GFX11-GISEL-NEXT:    v_xor_b32_e32 v5, v0, v3
+; GFX11-GISEL-NEXT:    v_sub_co_u32 v0, vcc_lo, v1, v6
+; GFX11-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-GISEL-NEXT:    v_sub_co_ci_u32_e32 v1, vcc_lo, v2, v6, vcc_lo
+; GFX11-GISEL-NEXT:    v_sub_co_u32 v2, vcc_lo, v5, v3
+; GFX11-GISEL-NEXT:    v_sub_co_ci_u32_e32 v3, vcc_lo, v4, v3, vcc_lo
+; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %res = tail call <2 x i64> @llvm.lround.v2i64.v2f32(<2 x float> %arg)
+  ret <2 x i64> %res
+}
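
For reference, every f32 check sequence above follows one mechanical pattern. The sketch below is a hedged C++ model of what those sequences appear to compute. It assumes `llvm.lround` rounds halfway cases away from zero, and it reads `0x2f800000` as 2^-32 and `0xcf800000` as -2^32 from their IEEE-754 bit patterns. The function names are hypothetical; this is not the AMDGPU lowering code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers modelling the checked instruction sequences.
constexpr float kTwoPow32 = 4294967296.0f;          // -1 * 0xcf800000
constexpr float kTwoPowNeg32 = 1.0f / kTwoPow32;    // 0x2f800000

// Round half away from zero: v_trunc, v_sub, v_cmp_ge |diff|, 0.5,
// v_cndmask 0 / 1.0, copy the sign of x (v_bfi or v_and_or), v_add.
float round_half_away(float x) {
  float t = std::trunc(x);
  float frac = x - t;  // exact for floats
  float bump = (std::fabs(frac) >= 0.5f) ? 1.0f : 0.0f;
  return t + std::copysign(bump, x);
}

// i32 result: round, then v_cvt_i32_f32.
int32_t lround_i32(float x) {
  return static_cast<int32_t>(round_half_away(x));
}

// i64 result from f32: split |t| into 32-bit halves using float math,
// convert each half with v_cvt_u32_f32, then apply the sign.
int64_t lround_i64(float x) {
  float t = std::trunc(round_half_away(x));
  float mag = std::fabs(t);
  float hi_f = std::floor(mag * kTwoPowNeg32);   // v_mul + v_floor
  float lo_f = std::fma(hi_f, -kTwoPow32, mag);  // v_fma: mag - hi * 2^32
  uint64_t bits =
      (static_cast<uint64_t>(static_cast<uint32_t>(hi_f)) << 32) |
      static_cast<uint32_t>(lo_f);
  // All-ones for negative t, zero otherwise (v_ashrrev_i32 31).
  uint64_t sign = std::signbit(t) ? ~0ull : 0ull;
  // (bits ^ sign) - sign negates when sign is all-ones (xor + sub/subb).
  return static_cast<int64_t>((bits ^ sign) - sign);
}
```

This explains why each v2i64 element needs two `v_cvt_u32_f32` conversions and a `v_sub_co`/`v_subb_co` (or `v_sub_co_ci`) pair, while the v2i32 case needs only one `v_cvt_i32_f32` per element.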

From 6cb14599ade843be3171fa7e4dd5f3601a3bb0de Mon Sep 17 00:00:00 2001
From: Jacob Lalonde <jalalonde at fb.com>
Date: Wed, 21 Aug 2024 10:25:23 -0700
Subject: [PATCH 028/116] [LLDB][Minidump] Fix
 ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps
 are missing. (#101086)

This PR is in response to a bug my coworker @mbucko discovered: on
macOS, minidumps were being created whose 64b memory regions were
readable but were not listed in `SBProcess.GetMemoryRegionList()`.
This went unnoticed in #95312 because all of the Linux tests include
/proc/pid maps. For macOS-generated dumps (or any dump without access
to /proc/pid), we failed to map memory regions correctly because there
were two independent methods for 32b and 64b mapping.

In this PR I fixed this minor bug by merging the two methods. Adding
test coverage required additions to `obj2yaml` and `yaml2obj`, which
make up the bulk of this patch.
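
A simplified model of the merged approach: one pass collects regions from whichever memory lists are present, so a missing 32b list no longer hides the 64b ranges. The types `Descriptor` and `Region` and the functions `BuildRegions` and `FindRegion` are hypothetical stand-ins, not LLDB's `MinidumpParser` API, and error logging is omitted.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for MemoryDescriptor / MemoryDescriptor_64
// and lldb's MemoryRegionInfo.
struct Descriptor {
  uint64_t start;  // StartOfMemoryRange
  uint64_t size;   // DataSize
};

struct Region {
  uint64_t base;
  uint64_t byte_size;
  bool readable = true;
  bool mapped = true;
};

// Single pass over both lists; either list may be absent.
std::vector<Region>
BuildRegions(const std::optional<std::vector<Descriptor>> &list32,
             const std::optional<std::vector<Descriptor>> &list64) {
  std::vector<Region> regions;
  auto append = [&regions](const std::vector<Descriptor> &list) {
    for (const Descriptor &d : list) {
      if (d.size == 0)
        continue;  // empty ranges are skipped
      regions.push_back(Region{d.start, d.size});
    }
  };
  if (list32)
    append(*list32);
  if (list64)
    append(*list64);
  return regions;
}

// Half-open containment check: start <= addr < start + size.
std::optional<Region> FindRegion(const std::vector<Region> &regions,
                                 uint64_t addr) {
  for (const Region &r : regions)
    if (r.base <= addr && addr < r.base + r.byte_size)
      return r;
  return std::nullopt;
}
```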

Lastly, there are some optional changes, such as the addition of the
`Memory64ListHeader` type, which makes writing and reading the header
section of the Memory64List easier.
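
For context, the Memory64List stream begins with a header holding `NumberOfMemoryRanges` and `BaseRVA`, followed by the range descriptors. The bytes of all ranges are stored back to back starting at `BaseRVA`, which is why the removed parser code advanced `base_rva` by each range's size. The sketch below models that layout with hypothetical plain-struct names and assumes a little-endian host; the header type in the patch is filled from `ulittle64_t` values instead.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical mirror of the Memory64List header (little-endian host).
struct Memory64ListHeaderSketch {
  uint64_t NumberOfMemoryRanges;
  uint64_t BaseRVA;  // file offset of the first range's bytes
};

struct Descriptor64Sketch {
  uint64_t StartOfMemoryRange;
  uint64_t DataSize;
};

struct RangeLocation {
  uint64_t address;      // virtual address of the range
  uint64_t file_offset;  // offset of the range's bytes in the file
  uint64_t size;
};

// Decodes header and descriptors, computing where each range's data
// lives. Returns nullopt on truncated or out-of-bounds input.
std::optional<std::vector<RangeLocation>>
ParseMemory64List(const std::vector<uint8_t> &stream, uint64_t file_size) {
  Memory64ListHeaderSketch header;
  if (stream.size() < sizeof(header))
    return std::nullopt;
  std::memcpy(&header, stream.data(), sizeof(header));

  // Bound the descriptor count without overflowing a multiplication.
  const uint64_t max_descs =
      (stream.size() - sizeof(header)) / sizeof(Descriptor64Sketch);
  if (header.NumberOfMemoryRanges > max_descs)
    return std::nullopt;

  std::vector<RangeLocation> out;
  uint64_t offset = header.BaseRVA;
  for (uint64_t i = 0; i < header.NumberOfMemoryRanges; ++i) {
    Descriptor64Sketch desc;
    std::memcpy(&desc, stream.data() + sizeof(header) + i * sizeof(desc),
                sizeof(desc));
    // Each range's bytes must lie inside the file.
    if (offset > file_size || desc.DataSize > file_size - offset)
      return std::nullopt;
    out.push_back({desc.StartOfMemoryRange, offset, desc.DataSize});
    offset += desc.DataSize;  // ranges are packed back to back
  }
  return out;
}
```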
---
 .../Minidump/MinidumpFileBuilder.cpp          |  13 +-
 .../Process/minidump/MinidumpParser.cpp       | 113 +++++++-----------
 .../Plugins/Process/minidump/MinidumpParser.h |   4 +
 .../Process/minidump/MinidumpTypes.cpp        |  20 ----
 .../Plugins/Process/minidump/MinidumpTypes.h  |   3 -
 .../minidump-new/TestMiniDumpNew.py           |  19 +++
 .../minidump-new/linux-x86_64_mem64.yaml      |  43 +++++++
 7 files changed, 119 insertions(+), 96 deletions(-)
 create mode 100644 lldb/test/API/functionalities/postmortem/minidump-new/linux-x86_64_mem64.yaml

diff --git a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
index c0cc3af638a777..4b862d8d8e99b8 100644
--- a/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
+++ b/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp
@@ -1014,15 +1014,17 @@ MinidumpFileBuilder::AddMemoryList_32(Process::CoreFileMemoryRanges &ranges) {
   // With a size of the number of ranges as a 32 bit num
   // And then the size of all the ranges
   error = AddDirectory(StreamType::MemoryList,
-                       sizeof(llvm::support::ulittle32_t) +
+                       sizeof(llvm::minidump::MemoryListHeader) +
                            descriptors.size() *
                                sizeof(llvm::minidump::MemoryDescriptor));
   if (error.Fail())
     return error;
 
+  llvm::minidump::MemoryListHeader list_header;
   llvm::support::ulittle32_t memory_ranges_num =
       static_cast<llvm::support::ulittle32_t>(descriptors.size());
-  m_data.AppendData(&memory_ranges_num, sizeof(llvm::support::ulittle32_t));
+  list_header.NumberOfMemoryRanges = memory_ranges_num;
+  m_data.AppendData(&list_header, sizeof(llvm::minidump::MemoryListHeader));
   // For 32b we can get away with writing off the descriptors after the data.
   // This means no cleanup loop needed.
   m_data.AppendData(descriptors.data(),
@@ -1044,9 +1046,10 @@ MinidumpFileBuilder::AddMemoryList_64(Process::CoreFileMemoryRanges &ranges) {
   if (error.Fail())
     return error;
 
+  llvm::minidump::Memory64ListHeader list_header;
   llvm::support::ulittle64_t memory_ranges_num =
       static_cast<llvm::support::ulittle64_t>(ranges.size());
-  m_data.AppendData(&memory_ranges_num, sizeof(llvm::support::ulittle64_t));
+  list_header.NumberOfMemoryRanges = memory_ranges_num;
   // Capture the starting offset for all the descriptors so we can clean them up
   // if needed.
   offset_t starting_offset =
@@ -1058,8 +1061,8 @@ MinidumpFileBuilder::AddMemoryList_64(Process::CoreFileMemoryRanges &ranges) {
       (ranges.size() * sizeof(llvm::minidump::MemoryDescriptor_64));
   llvm::support::ulittle64_t memory_ranges_base_rva =
       static_cast<llvm::support::ulittle64_t>(base_rva);
-  m_data.AppendData(&memory_ranges_base_rva,
-                    sizeof(llvm::support::ulittle64_t));
+  list_header.BaseRVA = memory_ranges_base_rva;
+  m_data.AppendData(&list_header, sizeof(llvm::minidump::Memory64ListHeader));
 
   bool cleanup_required = false;
   std::vector<MemoryDescriptor_64> descriptors;
diff --git a/lldb/source/Plugins/Process/minidump/MinidumpParser.cpp b/lldb/source/Plugins/Process/minidump/MinidumpParser.cpp
index be9fae938e2276..c099c28a620ecf 100644
--- a/lldb/source/Plugins/Process/minidump/MinidumpParser.cpp
+++ b/lldb/source/Plugins/Process/minidump/MinidumpParser.cpp
@@ -429,7 +429,6 @@ const minidump::ExceptionStream *MinidumpParser::GetExceptionStream() {
 
 std::optional<minidump::Range>
 MinidumpParser::FindMemoryRange(lldb::addr_t addr) {
-  llvm::ArrayRef<uint8_t> data64 = GetStream(StreamType::Memory64List);
   Log *log = GetLog(LLDBLog::Modules);
 
   auto ExpectedMemory = GetMinidumpFile().getMemoryList();
@@ -457,33 +456,17 @@ MinidumpParser::FindMemoryRange(lldb::addr_t addr) {
     }
   }
 
-  // Some Minidumps have a Memory64ListStream that captures all the heap memory
-  // (full-memory Minidumps).  We can't exactly use the same loop as above,
-  // because the Minidump uses slightly different data structures to describe
-  // those
-
-  if (!data64.empty()) {
-    llvm::ArrayRef<MinidumpMemoryDescriptor64> memory64_list;
-    uint64_t base_rva;
-    std::tie(memory64_list, base_rva) =
-        MinidumpMemoryDescriptor64::ParseMemory64List(data64);
-
-    if (memory64_list.empty())
-      return std::nullopt;
-
-    for (const auto &memory_desc64 : memory64_list) {
-      const lldb::addr_t range_start = memory_desc64.start_of_memory_range;
-      const size_t range_size = memory_desc64.data_size;
-
-      if (base_rva + range_size > GetData().size())
-        return std::nullopt;
-
-      if (range_start <= addr && addr < range_start + range_size) {
-        return minidump::Range(range_start,
-                               GetData().slice(base_rva, range_size));
+  if (!GetStream(StreamType::Memory64List).empty()) {
+    llvm::Error err = llvm::Error::success();
+    for (const auto &memory_desc :  GetMinidumpFile().getMemory64List(err)) {
+      if (memory_desc.first.StartOfMemoryRange <= addr 
+          && addr < memory_desc.first.StartOfMemoryRange + memory_desc.first.DataSize) {
+        return minidump::Range(memory_desc.first.StartOfMemoryRange, memory_desc.second);
       }
-      base_rva += range_size;
     }
+
+    if (err)
+      LLDB_LOG_ERROR(log, std::move(err), "Failed to read memory64 list: {0}");
   }
 
   return std::nullopt;
@@ -512,6 +495,11 @@ llvm::ArrayRef<uint8_t> MinidumpParser::GetMemory(lldb::addr_t addr,
   return range->range_ref.slice(offset, overlap);
 }
 
+llvm::iterator_range<FallibleMemory64Iterator> MinidumpParser::GetMemory64Iterator(llvm::Error &err) {
+  llvm::ErrorAsOutParameter ErrAsOutParam(&err);
+  return m_file->getMemory64List(err);
+}
+
 static bool
 CreateRegionsCacheFromMemoryInfoList(MinidumpParser &parser,
                                      std::vector<MemoryRegionInfo> &regions) {
@@ -553,53 +541,44 @@ static bool
 CreateRegionsCacheFromMemoryList(MinidumpParser &parser,
                                  std::vector<MemoryRegionInfo> &regions) {
   Log *log = GetLog(LLDBLog::Modules);
+  // Cache the expected memory32 into an optional
+  // because it is possible to just have a memory64 list
   auto ExpectedMemory = parser.GetMinidumpFile().getMemoryList();
   if (!ExpectedMemory) {
     LLDB_LOG_ERROR(log, ExpectedMemory.takeError(),
                    "Failed to read memory list: {0}");
-    return false;
-  }
-  regions.reserve(ExpectedMemory->size());
-  for (const MemoryDescriptor &memory_desc : *ExpectedMemory) {
-    if (memory_desc.Memory.DataSize == 0)
-      continue;
-    MemoryRegionInfo region;
-    region.GetRange().SetRangeBase(memory_desc.StartOfMemoryRange);
-    region.GetRange().SetByteSize(memory_desc.Memory.DataSize);
-    region.SetReadable(MemoryRegionInfo::eYes);
-    region.SetMapped(MemoryRegionInfo::eYes);
-    regions.push_back(region);
+  } else {
+    for (const MemoryDescriptor &memory_desc : *ExpectedMemory) {
+      if (memory_desc.Memory.DataSize == 0)
+        continue;
+      MemoryRegionInfo region;
+      region.GetRange().SetRangeBase(memory_desc.StartOfMemoryRange);
+      region.GetRange().SetByteSize(memory_desc.Memory.DataSize);
+      region.SetReadable(MemoryRegionInfo::eYes);
+      region.SetMapped(MemoryRegionInfo::eYes);
+      regions.push_back(region);
+    }
   }
-  regions.shrink_to_fit();
-  return !regions.empty();
-}
-
-static bool
-CreateRegionsCacheFromMemory64List(MinidumpParser &parser,
-                                   std::vector<MemoryRegionInfo> &regions) {
-  llvm::ArrayRef<uint8_t> data =
-      parser.GetStream(StreamType::Memory64List);
-  if (data.empty())
-    return false;
-  llvm::ArrayRef<MinidumpMemoryDescriptor64> memory64_list;
-  uint64_t base_rva;
-  std::tie(memory64_list, base_rva) =
-      MinidumpMemoryDescriptor64::ParseMemory64List(data);
 
-  if (memory64_list.empty())
-    return false;
+  if (!parser.GetStream(StreamType::Memory64List).empty()) {
+    llvm::Error err = llvm::Error::success();
+    for (const auto &memory_desc : parser.GetMemory64Iterator(err)) {
+      if (memory_desc.first.DataSize == 0)
+        continue;
+      MemoryRegionInfo region;
+      region.GetRange().SetRangeBase(memory_desc.first.StartOfMemoryRange);
+      region.GetRange().SetByteSize(memory_desc.first.DataSize);
+      region.SetReadable(MemoryRegionInfo::eYes);
+      region.SetMapped(MemoryRegionInfo::eYes);
+      regions.push_back(region);
+    }
 
-  regions.reserve(memory64_list.size());
-  for (const auto &memory_desc : memory64_list) {
-    if (memory_desc.data_size == 0)
-      continue;
-    MemoryRegionInfo region;
-    region.GetRange().SetRangeBase(memory_desc.start_of_memory_range);
-    region.GetRange().SetByteSize(memory_desc.data_size);
-    region.SetReadable(MemoryRegionInfo::eYes);
-    region.SetMapped(MemoryRegionInfo::eYes);
-    regions.push_back(region);
+    if (err) {
+      LLDB_LOG_ERROR(log, std::move(err), "Failed to read memory64 list: {0}");
+      return false;
+    }
   }
+
   regions.shrink_to_fit();
   return !regions.empty();
 }
@@ -620,9 +599,7 @@ std::pair<MemoryRegionInfos, bool> MinidumpParser::BuildMemoryRegions() {
     return return_sorted(true);
   if (CreateRegionsCacheFromMemoryInfoList(*this, result))
     return return_sorted(true);
-  if (CreateRegionsCacheFromMemoryList(*this, result))
-    return return_sorted(false);
-  CreateRegionsCacheFromMemory64List(*this, result);
+  CreateRegionsCacheFromMemoryList(*this, result);
   return return_sorted(false);
 }
 
diff --git a/lldb/source/Plugins/Process/minidump/MinidumpParser.h b/lldb/source/Plugins/Process/minidump/MinidumpParser.h
index 050ba086f46f54..222c0ef47fb853 100644
--- a/lldb/source/Plugins/Process/minidump/MinidumpParser.h
+++ b/lldb/source/Plugins/Process/minidump/MinidumpParser.h
@@ -47,6 +47,8 @@ struct Range {
   }
 };
 
+using FallibleMemory64Iterator = llvm::object::MinidumpFile::FallibleMemory64Iterator;
+
 class MinidumpParser {
 public:
   static llvm::Expected<MinidumpParser>
@@ -92,6 +94,8 @@ class MinidumpParser {
   /// complete (includes all regions mapped into the process memory).
   std::pair<MemoryRegionInfos, bool> BuildMemoryRegions();
 
+  llvm::iterator_range<FallibleMemory64Iterator> GetMemory64Iterator(llvm::Error &err);
+
   static llvm::StringRef GetStreamTypeAsString(StreamType stream_type);
 
   llvm::object::MinidumpFile &GetMinidumpFile() { return *m_file; }
diff --git a/lldb/source/Plugins/Process/minidump/MinidumpTypes.cpp b/lldb/source/Plugins/Process/minidump/MinidumpTypes.cpp
index 5b919828428fae..45dd2272aef041 100644
--- a/lldb/source/Plugins/Process/minidump/MinidumpTypes.cpp
+++ b/lldb/source/Plugins/Process/minidump/MinidumpTypes.cpp
@@ -57,23 +57,3 @@ LinuxProcStatus::Parse(llvm::ArrayRef<uint8_t> &data) {
 }
 
 lldb::pid_t LinuxProcStatus::GetPid() const { return pid; }
-
-std::pair<llvm::ArrayRef<MinidumpMemoryDescriptor64>, uint64_t>
-MinidumpMemoryDescriptor64::ParseMemory64List(llvm::ArrayRef<uint8_t> &data) {
-  const llvm::support::ulittle64_t *mem_ranges_count;
-  Status error = consumeObject(data, mem_ranges_count);
-  if (error.Fail() ||
-      *mem_ranges_count * sizeof(MinidumpMemoryDescriptor64) > data.size())
-    return {};
-
-  const llvm::support::ulittle64_t *base_rva;
-  error = consumeObject(data, base_rva);
-  if (error.Fail())
-    return {};
-
-  return std::make_pair(
-      llvm::ArrayRef(
-          reinterpret_cast<const MinidumpMemoryDescriptor64 *>(data.data()),
-          *mem_ranges_count),
-      *base_rva);
-}
diff --git a/lldb/source/Plugins/Process/minidump/MinidumpTypes.h b/lldb/source/Plugins/Process/minidump/MinidumpTypes.h
index fe99abf9e24ed9..9a9f1cc1578336 100644
--- a/lldb/source/Plugins/Process/minidump/MinidumpTypes.h
+++ b/lldb/source/Plugins/Process/minidump/MinidumpTypes.h
@@ -62,9 +62,6 @@ Status consumeObject(llvm::ArrayRef<uint8_t> &Buffer, const T *&Object) {
 struct MinidumpMemoryDescriptor64 {
   llvm::support::ulittle64_t start_of_memory_range;
   llvm::support::ulittle64_t data_size;
-
-  static std::pair<llvm::ArrayRef<MinidumpMemoryDescriptor64>, uint64_t>
-  ParseMemory64List(llvm::ArrayRef<uint8_t> &data);
 };
 static_assert(sizeof(MinidumpMemoryDescriptor64) == 16,
               "sizeof MinidumpMemoryDescriptor64 is not correct!");
diff --git a/lldb/test/API/functionalities/postmortem/minidump-new/TestMiniDumpNew.py b/lldb/test/API/functionalities/postmortem/minidump-new/TestMiniDumpNew.py
index 91fd2439492b54..2de3e36b507341 100644
--- a/lldb/test/API/functionalities/postmortem/minidump-new/TestMiniDumpNew.py
+++ b/lldb/test/API/functionalities/postmortem/minidump-new/TestMiniDumpNew.py
@@ -491,3 +491,22 @@ def test_minidump_sysroot(self):
         spec_dir_norm = os.path.normcase(module.GetFileSpec().GetDirectory())
         exe_dir_norm = os.path.normcase(exe_dir)
         self.assertEqual(spec_dir_norm, exe_dir_norm)
+
+    def test_minidump_memory64list(self):
+        """Test that lldb can read from the memory64list in a minidump."""
+        self.process_from_yaml("linux-x86_64_mem64.yaml")
+
+        region_count = 3
+        region_info_list = self.process.GetMemoryRegions()
+        self.assertEqual(region_info_list.GetSize(), region_count)
+
+        region = lldb.SBMemoryRegionInfo()
+        self.assertTrue(region_info_list.GetMemoryRegionAtIndex(0, region))
+        self.assertEqual(region.GetRegionBase(), 0x7FFF12A84030)
+        self.assertEqual(region.GetRegionEnd(), 0x7FFF12A84030 + 0x2FD0)
+        self.assertTrue(region_info_list.GetMemoryRegionAtIndex(1, region))
+        self.assertEqual(region.GetRegionBase(), 0x00007fff12a87000)
+        self.assertEqual(region.GetRegionEnd(), 0x00007fff12a87000 + 0x00000018)
+        self.assertTrue(region_info_list.GetMemoryRegionAtIndex(2, region))
+        self.assertEqual(region.GetRegionBase(), 0x00007fff12a87018)
+        self.assertEqual(region.GetRegionEnd(), 0x00007fff12a87018 + 0x00000400)
diff --git a/lldb/test/API/functionalities/postmortem/minidump-new/linux-x86_64_mem64.yaml b/lldb/test/API/functionalities/postmortem/minidump-new/linux-x86_64_mem64.yaml
new file mode 100644
index 00000000000000..df3c6477ae50a0
--- /dev/null
+++ b/lldb/test/API/functionalities/postmortem/minidump-new/linux-x86_64_mem64.yaml
@@ -0,0 +1,43 @@
+--- !minidump
+Streams:
+  - Type:            SystemInfo
+    Processor Arch:  AMD64
+    Processor Level: 6
+    Processor Revision: 15876
+    Number of Processors: 40
+    Platform ID:     Linux
+    CSD Version:     'Linux 3.13.0-91-generic'
+    CPU:
+      Vendor ID:       GenuineIntel
+      Version Info:    0x00000000
+      Feature Info:    0x00000000
+  - Type:            LinuxProcStatus
+    Text:             |
+      Name:	linux-x86_64
+      State:	t (tracing stop)
+      Tgid:	29917
+      Ngid:	0
+      Pid:	29917
+      PPid:	29370
+      TracerPid:	29918
+      Uid:	1001	1001	1001	1001
+      Gid:	1001	1001	1001	1001
+  - Type:            ThreadList
+    Threads:
+      - Thread Id:       0x2896BB
+        Context:         0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000700100000000000FFFFFFFF0000FFFFFFFFFFFFFFFFFFFF0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000B040A812FF7F00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000050D0A75BBA7F00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+        Stack:
+          Start of Memory Range: 0x0
+          Content:         ''
+  - Type:            Memory64List
+    Memory Ranges:
+      - Start of Memory Range: 0x7FFF12A84030
+        Data Size:       0x2FD0
+        Content:         ''
+      - Start of Memory Range: 0x00007fff12a87000
+        Data Size:       0x00000018
+        Content:         ''
+      - Start of Memory Range: 0x00007fff12a87018
+        Data Size:       0x00000400
+        Content:         ''
+...

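The patch above drops lldb's hand-rolled `ParseMemory64List` in favour of `MinidumpFile::getMemory64List`, which yields ranges through a fallible iterator and reports failure through an `llvm::Error` out-parameter that is checked after the loop. Below is a minimal Python sketch of that shape, not LLVM's implementation. The layout is inferred from the removed code (a u64 range count, a u64 base RVA, then 16-byte descriptors, with each range's data following the previous one, as in the removed `base_rva += range_size`). `Memory64ListReader`, `find_range` and `build_regions` are hypothetical names.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Assumed layout (inferred from the removed ParseMemory64List): a little-endian
# u64 range count, a u64 base RVA, then `count` descriptors of two u64 fields
# (start_of_memory_range, data_size). All range data is packed back to back
# in the file starting at the base RVA.
HEADER = struct.Struct("<QQ")
DESCRIPTOR = struct.Struct("<QQ")


class Memory64ListReader:
    """Hypothetical fallible iterator: instead of raising, it stops early and
    records the problem in `self.error`, which callers check after the loop
    (the same shape as passing an llvm::Error out-parameter)."""

    def __init__(self, stream: bytes, file_data: bytes) -> None:
        self.stream = stream        # raw Memory64List stream bytes
        self.file_data = file_data  # the whole minidump file
        self.error: Optional[str] = None

    def __iter__(self) -> Iterator[Tuple[int, bytes]]:
        self.error = None
        if len(self.stream) < HEADER.size:
            self.error = "stream too small for header"
            return
        count, base_rva = HEADER.unpack_from(self.stream, 0)
        if count * DESCRIPTOR.size > len(self.stream) - HEADER.size:
            self.error = "descriptor table exceeds stream size"
            return
        offset = base_rva
        for i in range(count):
            start, size = DESCRIPTOR.unpack_from(
                self.stream, HEADER.size + i * DESCRIPTOR.size)
            if offset + size > len(self.file_data):
                self.error = f"data for range {i} is out of bounds"
                return
            yield start, self.file_data[offset:offset + size]
            offset += size  # the next range's data follows immediately


def find_range(reader: Memory64ListReader,
               addr: int) -> Optional[Tuple[int, bytes]]:
    """Return (start, data) of the range containing `addr`, or None."""
    for start, data in reader:
        if start <= addr < start + len(data):
            return start, data
    if reader.error is not None:
        print(f"Failed to read memory64 list: {reader.error}")
    return None


def build_regions(memory32: Optional[List[Tuple[int, int]]],
                  reader: Optional[Memory64ListReader]
                  ) -> Tuple[List[Tuple[int, int]], bool]:
    """Collect (start, size) regions from both lists, skipping empty ones.
    `memory32` is None when that stream could not be read, which is no
    longer fatal; `reader` is None when there is no Memory64List stream."""
    regions: List[Tuple[int, int]] = []
    ok = True
    for start, size in memory32 or []:
        if size:
            regions.append((start, size))
    if reader is not None:
        for start, data in reader:
            if data:
                regions.append((start, len(data)))
        if reader.error is not None:
            ok = False  # the C++ code logs the error and returns false here
    return sorted(regions), ok and bool(regions)
```

The key design point mirrored here is that a partially read list still contributes the regions seen before the failure, and the caller learns about the failure only by inspecting the error once iteration ends.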
From ec866638ff36b4a01b38a3ab8ef604596cb37178 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 12:56:41 -0400
Subject: [PATCH 029/116] [libc++][NFC] A few mechanical adjustments to
 capitalization in status files

Make sure that we consistently use `Nothing To Do`, and that we use the
RST tags properly (e.g. '|Complete|' instead of 'Complete').
---
 libcxx/docs/Status/Cxx17Issues.csv |  2 +-
 libcxx/docs/Status/Cxx17Papers.csv |  8 ++++----
 libcxx/docs/Status/Cxx20Issues.csv |  2 +-
 libcxx/docs/Status/Cxx23Issues.csv | 20 ++++++++++----------
 libcxx/docs/Status/Cxx23Papers.csv | 16 ++++++++--------
 5 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/libcxx/docs/Status/Cxx17Issues.csv b/libcxx/docs/Status/Cxx17Issues.csv
index 35e42e5ec2d7ba..2e469dc0bfddec 100644
--- a/libcxx/docs/Status/Cxx17Issues.csv
+++ b/libcxx/docs/Status/Cxx17Issues.csv
@@ -160,7 +160,7 @@
 "`LWG2685 <https://wg21.link/LWG2685>`__","shared_ptr deleters must not throw on move construction","2016-06 (Oulu)","|Complete|","",""
 "`LWG2687 <https://wg21.link/LWG2687>`__","{inclusive,exclusive}_scan misspecified","2016-06 (Oulu)","","",""
 "`LWG2688 <https://wg21.link/LWG2688>`__","clamp misses preconditions and has extraneous condition on result","2016-06 (Oulu)","|Complete|","",""
-"`LWG2689 <https://wg21.link/LWG2689>`__","Parallel versions of std::copy and std::move shouldn't be in order","2016-06 (Oulu)","|Nothing to do|","",""
+"`LWG2689 <https://wg21.link/LWG2689>`__","Parallel versions of std::copy and std::move shouldn't be in order","2016-06 (Oulu)","|Nothing To Do|","",""
 "`LWG2698 <https://wg21.link/LWG2698>`__","Effect of assign() on iterators/pointers/references","2016-06 (Oulu)","|Complete|","",""
 "`LWG2704 <https://wg21.link/LWG2704>`__","recursive_directory_iterator's members should require '``*this`` is dereferenceable'","2016-06 (Oulu)","|Complete|","",""
 "`LWG2706 <https://wg21.link/LWG2706>`__","Error reporting for recursive_directory_iterator::pop() is under-specified","2016-06 (Oulu)","|Complete|","",""
diff --git a/libcxx/docs/Status/Cxx17Papers.csv b/libcxx/docs/Status/Cxx17Papers.csv
index 614cc4ca73f63e..c2f0cb4be96822 100644
--- a/libcxx/docs/Status/Cxx17Papers.csv
+++ b/libcxx/docs/Status/Cxx17Papers.csv
@@ -98,16 +98,16 @@
 "`P0452R1 <https://wg21.link/P0452R1>`__","Unifying <numeric> Parallel Algorithms","2017-02 (Kona)","|Partial| [#note-P0452]_","",""
 "`P0467R2 <https://wg21.link/P0467R2>`__","Iterator Concerns for Parallel Algorithms","2017-02 (Kona)","|Partial|","",""
 "`P0492R2 <https://wg21.link/P0492R2>`__","Proposed Resolution of C++17 National Body Comments for Filesystems","2017-02 (Kona)","|Complete|","7.0",""
-"`P0518R1 <https://wg21.link/P0518R1>`__","Allowing copies as arguments to function objects given to parallel algorithms in response to CH11","2017-02 (Kona)","|Nothing to do|","",""
-"`P0523R1 <https://wg21.link/P0523R1>`__","Wording for CH 10: Complexity of parallel algorithms","2017-02 (Kona)","|Nothing to do|","",""
+"`P0518R1 <https://wg21.link/P0518R1>`__","Allowing copies as arguments to function objects given to parallel algorithms in response to CH11","2017-02 (Kona)","|Nothing To Do|","",""
+"`P0523R1 <https://wg21.link/P0523R1>`__","Wording for CH 10: Complexity of parallel algorithms","2017-02 (Kona)","|Nothing To Do|","",""
 "`P0548R1 <https://wg21.link/P0548R1>`__","common_type and duration","2017-02 (Kona)","|Complete|","5.0",""
 "`P0558R1 <https://wg21.link/P0558R1>`__","Resolving atomic<T> named base class inconsistencies","2017-02 (Kona)","|Complete|","",""
-"`P0574R1 <https://wg21.link/P0574R1>`__","Algorithm Complexity Constraints and Parallel Overloads","2017-02 (Kona)","|Nothing to do|","",""
+"`P0574R1 <https://wg21.link/P0574R1>`__","Algorithm Complexity Constraints and Parallel Overloads","2017-02 (Kona)","|Nothing To Do|","",""
 "`P0599R1 <https://wg21.link/P0599R1>`__","noexcept for hash functions","2017-02 (Kona)","|Complete|","5.0",""
 "`P0604R0 <https://wg21.link/P0604R0>`__","Resolving GB 55, US 84, US 85, US 86","2017-02 (Kona)","|Complete|","",""
 "`P0607R0 <https://wg21.link/P0607R0>`__","Inline Variables for the Standard Library","2017-02 (Kona)","|In Progress| [#note-P0607]_","6.0",""
 "`P0618R0 <https://wg21.link/P0618R0>`__","Deprecating <codecvt>","2017-02 (Kona)","|Complete|","15.0",""
-"`P0623R0 <https://wg21.link/P0623R0>`__","Final C++17 Parallel Algorithms Fixes","2017-02 (Kona)","|Nothing to do|","",""
+"`P0623R0 <https://wg21.link/P0623R0>`__","Final C++17 Parallel Algorithms Fixes","2017-02 (Kona)","|Nothing To Do|","",""
 "","","","","",""
 "`P0682R1 <https://wg21.link/P0682R1>`__","Repairing elementary string conversions","2017-07 (Toronto)","","",""
 "`P0739R0 <https://wg21.link/P0739R0>`__","Some improvements to class template argument deduction integration into the standard library","2017-07 (Toronto)","|Complete|","5.0",""
diff --git a/libcxx/docs/Status/Cxx20Issues.csv b/libcxx/docs/Status/Cxx20Issues.csv
index 1d441de31f107b..bdc2b637efc348 100644
--- a/libcxx/docs/Status/Cxx20Issues.csv
+++ b/libcxx/docs/Status/Cxx20Issues.csv
@@ -193,7 +193,7 @@
 "`LWG2859 <https://wg21.link/LWG2859>`__","Definition of *reachable* in [ptr.launder] misses pointer arithmetic from pointer-interconvertible object","2020-02 (Prague)","","",""
 "`LWG3018 <https://wg21.link/LWG3018>`__","``shared_ptr``\  of function type","2020-02 (Prague)","|Nothing To Do|","",""
 "`LWG3050 <https://wg21.link/LWG3050>`__","Conversion specification problem in ``chrono::duration``\  constructor","2020-02 (Prague)","|Complete|","19.0","|chrono|"
-"`LWG3141 <https://wg21.link/LWG3141>`__","``CopyConstructible``\  doesn't preserve source values","2020-02 (Prague)","|Nothing to do|","",""
+"`LWG3141 <https://wg21.link/LWG3141>`__","``CopyConstructible``\  doesn't preserve source values","2020-02 (Prague)","|Nothing To Do|","",""
 "`LWG3150 <https://wg21.link/LWG3150>`__","``UniformRandomBitGenerator``\  should validate ``min``\  and ``max``\ ","2020-02 (Prague)","|Complete|","13.0","|ranges|"
 "`LWG3175 <https://wg21.link/LWG3175>`__","The ``CommonReference``\  requirement of concept ``SwappableWith``\  is not satisfied in the example","2020-02 (Prague)","|Complete|","13.0",""
 "`LWG3194 <https://wg21.link/LWG3194>`__","``ConvertibleTo``\  prose does not match code","2020-02 (Prague)","|Complete|","13.0",""
diff --git a/libcxx/docs/Status/Cxx23Issues.csv b/libcxx/docs/Status/Cxx23Issues.csv
index 0484c650e3c36c..16471406f41588 100644
--- a/libcxx/docs/Status/Cxx23Issues.csv
+++ b/libcxx/docs/Status/Cxx23Issues.csv
@@ -142,18 +142,18 @@
 "`LWG3525 <https://wg21.link/LWG3525>`__","``uses_allocator_construction_args`` fails to handle types convertible to ``pair``","2022-02 (Virtual)","","",""
 "`LWG3598 <https://wg21.link/LWG3598>`__","``system_category().default_error_condition(0)`` is underspecified","2022-02 (Virtual)","","",""
 "`LWG3601 <https://wg21.link/LWG3601>`__","common_iterator's postfix-proxy needs ``indirectly_readable`` ","2022-02 (Virtual)","","","|ranges|"
-"`LWG3607 <https://wg21.link/LWG3607>`__","``contiguous_iterator`` should not be allowed to have custom ``iter_move`` and ``iter_swap`` behavior","2022-02 (Virtual)","|Nothing to do|","","|ranges|"
+"`LWG3607 <https://wg21.link/LWG3607>`__","``contiguous_iterator`` should not be allowed to have custom ``iter_move`` and ``iter_swap`` behavior","2022-02 (Virtual)","|Nothing To Do|","","|ranges|"
 "`LWG3610 <https://wg21.link/LWG3610>`__","``iota_view::size`` sometimes rejects integer-class types","2022-02 (Virtual)","","","|ranges|"
 "`LWG3612 <https://wg21.link/LWG3612>`__","Inconsistent pointer alignment in ``std::format`` ","2022-02 (Virtual)","|Complete|","14.0","|format|"
 "`LWG3616 <https://wg21.link/LWG3616>`__","LWG 3498 seems to miss the non-member ``swap`` for ``basic_syncbuf`` ","2022-02 (Virtual)","|Complete|","18.0",""
 "`LWG3618 <https://wg21.link/LWG3618>`__","Unnecessary ``iter_move`` for ``transform_view::iterator`` ","2022-02 (Virtual)","|Complete|","19.0","|ranges|"
-"`LWG3619 <https://wg21.link/LWG3619>`__","Specification of ``vformat_to`` contains ill-formed ``formatted_size`` calls","2022-02 (Virtual)","|Nothing to do|","","|format|"
+"`LWG3619 <https://wg21.link/LWG3619>`__","Specification of ``vformat_to`` contains ill-formed ``formatted_size`` calls","2022-02 (Virtual)","|Nothing To Do|","","|format|"
 "`LWG3621 <https://wg21.link/LWG3621>`__","Remove feature-test macro ``__cpp_lib_monadic_optional`` ","2022-02 (Virtual)","|Complete|","15.0",""
-"`LWG3632 <https://wg21.link/LWG3632>`__","``unique_ptr`` ""Mandates: This constructor is not selected by class template argument deduction""","2022-02 (Virtual)","|Nothing to do|","",""
+"`LWG3632 <https://wg21.link/LWG3632>`__","``unique_ptr`` ""Mandates: This constructor is not selected by class template argument deduction""","2022-02 (Virtual)","|Nothing To Do|","",""
 "`LWG3643 <https://wg21.link/LWG3643>`__","Missing ``constexpr`` in ``std::counted_iterator`` ","2022-02 (Virtual)","|Complete|","19.0","|ranges|"
 "`LWG3648 <https://wg21.link/LWG3648>`__","``format`` should not print ``bool`` with ``'c'`` ","2022-02 (Virtual)","|Complete|","15.0","|format|"
 "`LWG3649 <https://wg21.link/LWG3649>`__","[fund.ts.v2] Reinstate and bump ``__cpp_lib_experimental_memory_resource`` feature test macro","2022-02 (Virtual)","","",""
-"`LWG3650 <https://wg21.link/LWG3650>`__","Are ``std::basic_string`` 's ``iterator`` and ``const_iterator`` constexpr iterators?","2022-02 (Virtual)","|Nothing to do|","",""
+"`LWG3650 <https://wg21.link/LWG3650>`__","Are ``std::basic_string`` 's ``iterator`` and ``const_iterator`` constexpr iterators?","2022-02 (Virtual)","|Nothing To Do|","",""
 "`LWG3654 <https://wg21.link/LWG3654>`__","``basic_format_context::arg(size_t)`` should be ``noexcept`` ","2022-02 (Virtual)","|Complete|","15.0","|format|"
 "`LWG3657 <https://wg21.link/LWG3657>`__","``std::hash<std::filesystem::path>`` is not enabled","2022-02 (Virtual)","|Complete|","17.0",""
 "`LWG3660 <https://wg21.link/LWG3660>`__","``iterator_traits<common_iterator>::pointer`` should conform to §[iterator.traits]","2022-02 (Virtual)","|Complete|","14.0","|ranges|"
@@ -188,7 +188,7 @@
 "","","","","",""
 "`LWG3028 <https://wg21.link/LWG3028>`__","Container requirements tables should distinguish ``const`` and non-``const`` variables","2022-11 (Kona)","","",""
 "`LWG3118 <https://wg21.link/LWG3118>`__","``fpos`` equality comparison unspecified","2022-11 (Kona)","","",""
-"`LWG3177 <https://wg21.link/LWG3177>`__","Limit permission to specialize variable templates to program-defined types","2022-11 (Kona)","|Nothing to do|","",""
+"`LWG3177 <https://wg21.link/LWG3177>`__","Limit permission to specialize variable templates to program-defined types","2022-11 (Kona)","|Nothing To Do|","",""
 "`LWG3515 <https://wg21.link/LWG3515>`__","§[stacktrace.basic.nonmem]: ``operator<<`` should be less templatized","2022-11 (Kona)","","",""
 "`LWG3545 <https://wg21.link/LWG3545>`__","``std::pointer_traits`` should be SFINAE-friendly","2022-11 (Kona)","|Complete|","18.0",""
 "`LWG3569 <https://wg21.link/LWG3569>`__","``join_view`` fails to support ranges of ranges with non-default_initializable iterators","2022-11 (Kona)","","","|ranges|"
@@ -200,7 +200,7 @@
 "`LWG3646 <https://wg21.link/LWG3646>`__","``std::ranges::view_interface::size`` returns a signed type","2022-11 (Kona)","|Complete|","16.0","|ranges|"
 "`LWG3677 <https://wg21.link/LWG3677>`__","Is a cv-qualified ``pair`` specially handled in uses-allocator construction?","2022-11 (Kona)","|Complete|","18.0",""
 "`LWG3717 <https://wg21.link/LWG3717>`__","``common_view::end`` should improve ``random_access_range`` case","2022-11 (Kona)","","","|ranges|"
-"`LWG3732 <https://wg21.link/LWG3732>`__","``prepend_range`` and ``append_range`` can't be amortized constant time","2022-11 (Kona)","|Nothing to do|","","|ranges|"
+"`LWG3732 <https://wg21.link/LWG3732>`__","``prepend_range`` and ``append_range`` can't be amortized constant time","2022-11 (Kona)","|Nothing To Do|","","|ranges|"
 "`LWG3736 <https://wg21.link/LWG3736>`__","``move_iterator`` missing ``disable_sized_sentinel_for`` specialization","2022-11 (Kona)","|Complete|","19.0","|ranges|"
 "`LWG3737 <https://wg21.link/LWG3737>`__","``take_view::sentinel`` should provide ``operator-``","2022-11 (Kona)","","","|ranges|"
 "`LWG3738 <https://wg21.link/LWG3738>`__","Missing preconditions for ``take_view`` constructor","2022-11 (Kona)","|Complete|","16.0","|ranges|"
@@ -211,7 +211,7 @@
 "`LWG3750 <https://wg21.link/LWG3750>`__","Too many papers bump ``__cpp_lib_format``","2022-11 (Kona)","|Partial| [#note-LWG3750]_","","|format|"
 "`LWG3751 <https://wg21.link/LWG3751>`__","Missing feature macro for ``flat_set``","2022-11 (Kona)","","","|flat_containers|"
 "`LWG3753 <https://wg21.link/LWG3753>`__","Clarify entity vs. freestanding entity","2022-11 (Kona)","","",""
-"`LWG3754 <https://wg21.link/LWG3754>`__","Class template expected synopsis contains declarations that do not match the detailed description","2022-11 (Kona)","|Nothing to do|","",""
+"`LWG3754 <https://wg21.link/LWG3754>`__","Class template expected synopsis contains declarations that do not match the detailed description","2022-11 (Kona)","|Nothing To Do|","",""
 "`LWG3755 <https://wg21.link/LWG3755>`__","``tuple-for-each`` can call ``user-defined`` ``operator,``","2022-11 (Kona)","|Complete|","17.0",""
 "`LWG3757 <https://wg21.link/LWG3757>`__","What's the effect of ``std::forward_like<void>(x)``?","2022-11 (Kona)","","",""
 "`LWG3759 <https://wg21.link/LWG3759>`__","``ranges::rotate_copy`` should use ``std::move``","2022-11 (Kona)","|Complete|","15.0","|ranges|"
@@ -226,7 +226,7 @@
 "`LWG3774 <https://wg21.link/LWG3774>`__","``<flat_set>`` should include ``<compare>``","2022-11 (Kona)","","","|flat_containers|"
 "`LWG3775 <https://wg21.link/LWG3775>`__","Broken dependencies in the ``Cpp17Allocator`` requirements","2022-11 (Kona)","","",""
 "`LWG3778 <https://wg21.link/LWG3778>`__","``vector<bool>`` missing exception specifications","2022-11 (Kona)","|Complete|","3.7",""
-"`LWG3781 <https://wg21.link/LWG3781>`__","The exposition-only alias templates ``cont-key-type`` and ``cont-mapped-type`` should be removed","2022-11 (Kona)","|Nothing to do|","",""
+"`LWG3781 <https://wg21.link/LWG3781>`__","The exposition-only alias templates ``cont-key-type`` and ``cont-mapped-type`` should be removed","2022-11 (Kona)","|Nothing To Do|","",""
 "`LWG3782 <https://wg21.link/LWG3782>`__","Should ``<math.h>`` declare ``::lerp``?","2022-11 (Kona)","|Complete|","17.0",""
 "`LWG3784 <https://wg21.link/LWG3784>`__","std.compat should not provide ``::byte`` and its friends","2022-11 (Kona)","|Complete|","19.0",""
 "`LWG3785 <https://wg21.link/LWG3785>`__","``ranges::to`` is over-constrained on the destination type being a range","2022-11 (Kona)","","","|ranges|"
@@ -239,10 +239,10 @@
 "`LWG3814 <https://wg21.link/LWG3814>`__","Add freestanding items requested by NB comments","2022-11 (Kona)","","",""
 "`LWG3816 <https://wg21.link/LWG3816>`__","``flat_map`` and ``flat_multimap`` should impose sequence container requirements","2022-11 (Kona)","","","|flat_containers|"
 "`LWG3817 <https://wg21.link/LWG3817>`__","Missing preconditions on ``forward_list`` modifiers","2022-11 (Kona)","","",""
-"`LWG3818 <https://wg21.link/LWG3818>`__","Exposition-only concepts are not described in library intro","2022-11 (Kona)","|Nothing to do|","",""
+"`LWG3818 <https://wg21.link/LWG3818>`__","Exposition-only concepts are not described in library intro","2022-11 (Kona)","|Nothing To Do|","",""
 "`LWG3822 <https://wg21.link/LWG3822>`__","Avoiding normalization in ``filesystem::weakly_canonical``","2022-11 (Kona)","","",""
 "`LWG3823 <https://wg21.link/LWG3823>`__","Unnecessary precondition for ``is_aggregate``","2022-11 (Kona)","|Nothing To Do|","",""
-"`LWG3824 <https://wg21.link/LWG3824>`__","Number of ``bind`` placeholders is underspecified","2022-11 (Kona)","|Nothing to do|","",""
+"`LWG3824 <https://wg21.link/LWG3824>`__","Number of ``bind`` placeholders is underspecified","2022-11 (Kona)","|Nothing To Do|","",""
 "`LWG3826 <https://wg21.link/LWG3826>`__","Redundant specification [for overload of yield_value]","2022-11 (Kona)","|Nothing To Do|","",""
 "","","","","",""
 "`LWG2195 <https://wg21.link/LWG2195>`__","Missing constructors for ``match_results``","2023-02 (Issaquah)","","",""
diff --git a/libcxx/docs/Status/Cxx23Papers.csv b/libcxx/docs/Status/Cxx23Papers.csv
index 9389f031b1842c..8e1544acb2ce0e 100644
--- a/libcxx/docs/Status/Cxx23Papers.csv
+++ b/libcxx/docs/Status/Cxx23Papers.csv
@@ -6,9 +6,9 @@
 "","","","","",""
 "`P1682R3 <https://wg21.link/P1682R3>`__","std::to_underlying for enumerations","2021-02 (Virtual)","|Complete|","13.0",""
 "`P2017R1 <https://wg21.link/P2017R1>`__","Conditionally borrowed ranges","2021-02 (Virtual)","|Complete|","16.0","|ranges|"
-"`P2160R1 <https://wg21.link/P2160R1>`__","Locks lock lockables","2021-02 (Virtual)","Nothing to do","",""
+"`P2160R1 <https://wg21.link/P2160R1>`__","Locks lock lockables","2021-02 (Virtual)","|Nothing To Do|","",""
 "`P2162R2 <https://wg21.link/P2162R2>`__","Inheriting from std::variant","2021-02 (Virtual)","|Complete|","13.0",""
-"`P2212R2 <https://wg21.link/P2212R2>`__","Relax Requirements for time_point::clock","2021-02 (Virtual)","Nothing to do","",""
+"`P2212R2 <https://wg21.link/P2212R2>`__","Relax Requirements for time_point::clock","2021-02 (Virtual)","|Nothing To Do|","",""
 "`P2259R1 <https://wg21.link/P2259R1>`__","Repairing input range adaptors and counted_iterator","2021-02 (Virtual)","","","|ranges|"
 "","","","","",""
 "`P0401R6 <https://wg21.link/P0401R6>`__","Providing size feedback in the Allocator interface","2021-06 (Virtual)","|Complete|","15.0",""
@@ -29,17 +29,17 @@
 "`P1072R10 <https://wg21.link/P1072R10>`__","``basic_string::resize_and_overwrite``","2021-10 (Virtual)","|Complete|","14.0",""
 "`P1147R1 <https://wg21.link/P1147R1>`__","Printing ``volatile`` Pointers","2021-10 (Virtual)","|Complete|","14.0",""
 "`P1272R4 <https://wg21.link/P1272R4>`__","Byteswapping for fun&&nuf","2021-10 (Virtual)","|Complete|","14.0",""
-"`P1675R2 <https://wg21.link/P1675R2>`__","``rethrow_exception`` must be allowed to copy","2021-10 (Virtual)","Nothing to do","",""
+"`P1675R2 <https://wg21.link/P1675R2>`__","``rethrow_exception`` must be allowed to copy","2021-10 (Virtual)","|Nothing To Do|","",""
 "`P2077R3 <https://wg21.link/P2077R3>`__","Heterogeneous erasure overloads for associative containers","2021-10 (Virtual)","","",""
 "`P2251R1 <https://wg21.link/P2251R1>`__","Require ``span`` & ``basic_string_view`` to be Trivially Copyable","2021-10 (Virtual)","|Complete|","14.0",""
 "`P2301R1 <https://wg21.link/P2301R1>`__","Add a ``pmr`` alias for ``std::stacktrace``","2021-10 (Virtual)","","",""
 "`P2321R2 <https://wg21.link/P2321R2>`__","``zip``","2021-10 (Virtual)","|In Progress|","","|ranges|"
-"`P2340R1 <https://wg21.link/P2340R1>`__","Clarifying the status of the 'C headers'","2021-10 (Virtual)","Nothing to do","",""
+"`P2340R1 <https://wg21.link/P2340R1>`__","Clarifying the status of the 'C headers'","2021-10 (Virtual)","|Nothing To Do|","",""
 "`P2393R1 <https://wg21.link/P2393R1>`__","Cleaning up ``integer``-class types","2021-10 (Virtual)","","",""
 "`P2401R0 <https://wg21.link/P2401R0>`__","Add a conditional ``noexcept`` specification to ``std::exchange``","2021-10 (Virtual)","|Complete|","14.0",""
 "","","","","",""
 "`P0323R12 <https://wg21.link/P0323R12>`__","``std::expected``","2022-02 (Virtual)","|Complete|","16.0",""
-"`P0533R9 <https://wg21.link/P0533R9>`__","``constexpr`` for ``<cmath>`` and ``<cstdlib>``","2022-02 (Virtual)","|In progress| [#note-P0533R9]_","",""
+"`P0533R9 <https://wg21.link/P0533R9>`__","``constexpr`` for ``<cmath>`` and ``<cstdlib>``","2022-02 (Virtual)","|In Progress| [#note-P0533R9]_","",""
 "`P0627R6 <https://wg21.link/P0627R6>`__","Function to mark unreachable code","2022-02 (Virtual)","|Complete|","15.0",""
 "`P1206R7 <https://wg21.link/P1206R7>`__","``ranges::to``: A function to convert any range to a container","2022-02 (Virtual)","|Complete|","17.0","|ranges|"
 "`P1413R3 <https://wg21.link/P1413R3>`__","Deprecate ``std::aligned_storage`` and ``std::aligned_union``","2022-02 (Virtual)","|Complete| [#note-P1413R3]_","",""
@@ -74,7 +74,7 @@
 "`P2438R2 <https://wg21.link/P2438R2>`__","``std::string::substr() &&``","2022-07 (Virtual)","|Complete|","16.0",""
 "`P2445R1 <https://wg21.link/P2445R1>`__","``forward_like``","2022-07 (Virtual)","|Complete|","16.0",""
 "`P2446R2 <https://wg21.link/P2446R2>`__","``views::as_rvalue``","2022-07 (Virtual)","|Complete|","16.0","|ranges|"
-"`P2460R2 <https://wg21.link/P2460R2>`__","Relax requirements on ``wchar_t`` to match existing practices","2022-07 (Virtual)","Nothing to do","",""
+"`P2460R2 <https://wg21.link/P2460R2>`__","Relax requirements on ``wchar_t`` to match existing practices","2022-07 (Virtual)","|Nothing To Do|","",""
 "`P2465R3 <https://wg21.link/P2465R3>`__","Standard Library Modules ``std`` and ``std.compat``","2022-07 (Virtual)","|Complete|","19.0",""
 "`P2467R1 <https://wg21.link/P2467R1>`__","Support exclusive mode for ``fstreams``","2022-07 (Virtual)","|Complete|","18.0",""
 "`P2474R2 <https://wg21.link/P2474R2>`__","``views::repeat``","2022-07 (Virtual)","|Complete|","17.0","|ranges|"
@@ -101,7 +101,7 @@
 "`P2505R5 <https://wg21.link/P2505R5>`__","Monadic Functions for ``std::expected``","2022-11 (Kona)","|Complete|","17.0",""
 "`P2539R4 <https://wg21.link/P2539R4>`__","Should the output of ``std::print`` to a terminal be synchronized with the underlying stream?","2022-11 (Kona)","|Complete|","18.0","|format|"
 "`P2602R2 <https://wg21.link/P2602R2>`__","Poison Pills are Too Toxic","2022-11 (Kona)","|Complete|","19.0","|ranges|"
-"`P2708R1 <https://wg21.link/P2708R1>`__","No Further Fundamentals TSes","2022-11 (Kona)","|Nothing to do|","",""
+"`P2708R1 <https://wg21.link/P2708R1>`__","No Further Fundamentals TSes","2022-11 (Kona)","|Nothing To Do|","",""
 "","","","","",""
 "`P0290R4 <https://wg21.link/P0290R4>`__","``apply()`` for ``synchronized_value<T>``","2023-02 (Issaquah)","","","|concurrency TS|"
 "`P2770R0 <https://wg21.link/P2770R0>`__","Stashing stashing ``iterators`` for proper flattening","2023-02 (Issaquah)","|Partial| [#note-P2770R0]_","","|ranges|"
@@ -120,4 +120,4 @@
 "`P2614R2 <https://wg21.link/P2614R2>`__","Deprecate ``numeric_limits::has_denorm``","2023-02 (Issaquah)","|Complete|","18.0",""
 "`P2588R3 <https://wg21.link/P2588R3>`__","``barrier``’s phase completion guarantees","2023-02 (Issaquah)","","",""
 "`P2763R1 <https://wg21.link/P2763R1>`__","``layout_stride`` static extents default constructor fix","2023-02 (Issaquah)","","",""
-"`P2736R2 <https://wg21.link/P2736R2>`__","Referencing The Unicode Standard","2023-02 (Issaquah)","Complete","19.0","|format|"
+"`P2736R2 <https://wg21.link/P2736R2>`__","Referencing The Unicode Standard","2023-02 (Issaquah)","|Complete|","19.0","|format|"

From 7a28192ce1c1d9d0398348eabc46c94eadb317d8 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 13:11:46 -0400
Subject: [PATCH 030/116] [libc++] Standardize how we track removed and
 superseded papers

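Instead of having various free-form status entries like 'Superseded by
XXX', we use '|Nothing To Do|' and add a note explaining what happened
to the paper or issue (for example, that it was pulled out of the draft
at a later meeting, or resolved, superseded, or reverted by another
paper or issue).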
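The convention can be illustrated with a small C++ sketch. It is hypothetical and not a script from this patch; the helper names `note_name` and `normalize_status` are invented for illustration. It maps the legacy free-form prefixes that appear in the diff (`Resolved by`, `Superseded by`, `Reverted by`, `Now`, `*Removed in`) to `|Nothing To Do|` plus a footnote reference. For papers, the footnote name drops the revision, so `P0181R1` becomes `note-P0181`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: derive the footnote name used on the status pages,
// e.g. "P0181R1" -> "note-P0181" and "LWG2587" -> "note-LWG2587".
std::string note_name(const std::string& id) {
  static const std::regex paper_with_revision(R"(^(P\d+)R\d+$)");
  std::smatch m;
  if (std::regex_match(id, m, paper_with_revision))
    return "note-" + m[1].str();  // drop the "R<n>" revision suffix
  return "note-" + id;            // LWG issues keep their full number
}

// Hypothetical helper: rewrite a legacy free-form status into the
// standardized "|Nothing To Do| [#note-...]_" form. Any other status
// (e.g. "|Complete|") is returned unchanged.
std::string normalize_status(const std::string& id, const std::string& status) {
  static const char* legacy_prefixes[] = {"Resolved by", "Superseded by",
                                          "Reverted by", "Now ", "*Removed in"};
  for (const char* prefix : legacy_prefixes) {
    if (status.rfind(prefix, 0) == 0)  // status starts with this prefix
      return "|Nothing To Do| [#" + note_name(id) + "]_";
  }
  return status;
}
```

Entries that do not follow one of these prefixes, such as P0156R0 (previously marked `|Complete|` with a note), still need a manual update, and the note text itself is written separately in the `.rst` file.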
---
 libcxx/docs/Status/Cxx17.rst       |  7 ++++++-
 libcxx/docs/Status/Cxx17Issues.csv |  6 +++---
 libcxx/docs/Status/Cxx17Papers.csv |  6 +++---
 libcxx/docs/Status/Cxx20.rst       |  9 +++++++++
 libcxx/docs/Status/Cxx20Issues.csv | 10 +++++-----
 libcxx/docs/Status/Cxx20Papers.csv |  8 ++++----
 libcxx/docs/Status/Cxx23.rst       |  3 +++
 libcxx/docs/Status/Cxx23Issues.csv |  6 +++---
 8 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/libcxx/docs/Status/Cxx17.rst b/libcxx/docs/Status/Cxx17.rst
index 9263bd7c4af0c8..c1073c0b411b06 100644
--- a/libcxx/docs/Status/Cxx17.rst
+++ b/libcxx/docs/Status/Cxx17.rst
@@ -45,7 +45,12 @@ Paper Status
    .. [#note-P0607] P0607: The parts of P0607 that are not done are the ``<regex>`` bits.
    .. [#note-P0154] P0154: The required macros are only implemented as of clang 19.
    .. [#note-P0452] P0452: The changes to ``std::transform_inclusive_scan`` and ``std::transform_exclusive_scan`` have not yet been implemented.
-   .. [#note-P0156] P0156: This paper was reverted in Kona.
+   .. [#note-P0156] P0156: That paper was pulled out of the draft at the 2017-01 meeting in Kona.
+   .. [#note-P0181] P0181: That paper was pulled out of the draft at the 2017-01 meeting in Kona.
+   .. [#note-P0067] P0067: That paper was resolved by `P0067R5 <https://wg21.link/P0067R5>`__.
+   .. [#note-LWG2587] LWG2587: That LWG issue was resolved by `LWG2567 <https://wg21.link/LWG2567>`__.
+   .. [#note-LWG2588] LWG2588: That LWG issue was resolved by `LWG2568 <https://wg21.link/LWG2568>`__.
+   .. [#note-LWG2955] LWG2955: That LWG issue was resolved by `P0682R1 <https://wg21.link/P0682R1>`__.
 
 .. _issues-status-cxx17:
 
diff --git a/libcxx/docs/Status/Cxx17Issues.csv b/libcxx/docs/Status/Cxx17Issues.csv
index 2e469dc0bfddec..902a3717e5a388 100644
--- a/libcxx/docs/Status/Cxx17Issues.csv
+++ b/libcxx/docs/Status/Cxx17Issues.csv
@@ -211,8 +211,8 @@
 "`LWG2570 <https://wg21.link/LWG2570>`__","[fund.ts.v2] conjunction and disjunction requirements are too strict","2016-11 (Issaquah)","","",""
 "`LWG2578 <https://wg21.link/LWG2578>`__","Iterator requirements should reference iterator traits","2016-11 (Issaquah)","|Complete|","",""
 "`LWG2584 <https://wg21.link/LWG2584>`__","<regex> ECMAScript IdentityEscape is ambiguous","2016-11 (Issaquah)","","",""
-"`LWG2587 <https://wg21.link/LWG2587>`__","""Convertible to bool"" requirement in conjunction and disjunction","2016-11 (Issaquah)","Resolved by `LWG2567 <https://wg21.link/LWG2567>`__","",""
-"`LWG2588 <https://wg21.link/LWG2588>`__","[fund.ts.v2] ""Convertible to bool"" requirement in conjunction and disjunction","2016-11 (Issaquah)","Resolved by `LWG2568 <https://wg21.link/LWG2568>`__","",""
+"`LWG2587 <https://wg21.link/LWG2587>`__","""Convertible to bool"" requirement in conjunction and disjunction","2016-11 (Issaquah)","|Nothing To Do| [#note-LWG2587]_","",""
+"`LWG2588 <https://wg21.link/LWG2588>`__","[fund.ts.v2] ""Convertible to bool"" requirement in conjunction and disjunction","2016-11 (Issaquah)","|Nothing To Do| [#note-LWG2588]_","",""
 "`LWG2589 <https://wg21.link/LWG2589>`__","match_results can't satisfy the requirements of a container","2016-11 (Issaquah)","|Complete|","",""
 "`LWG2591 <https://wg21.link/LWG2591>`__","std::function's member template target() should not lead to undefined behaviour","2016-11 (Issaquah)","|Complete|","",""
 "`LWG2598 <https://wg21.link/LWG2598>`__","addressof works on temporaries","2016-11 (Issaquah)","|Complete|","",""
@@ -310,5 +310,5 @@
 "`LWG2934 <https://wg21.link/LWG2934>`__","optional<const T> doesn't compare with T","2017-02 (Kona)","|Complete|","",""
 "","","","","",""
 "`LWG2901 <https://wg21.link/LWG2901>`__","Variants cannot properly support allocators","2017-07 (Toronto)","|Complete|","",""
-"`LWG2955 <https://wg21.link/LWG2955>`__","``to_chars / from_chars``\  depend on ``std::string``\ ","2017-07 (Toronto)","Resolved by `P0682R1 <https://wg21.link/P0682R1>`__","",""
+"`LWG2955 <https://wg21.link/LWG2955>`__","``to_chars / from_chars``\  depend on ``std::string``\ ","2017-07 (Toronto)","|Nothing To Do| [#note-LWG2955]_","",""
 "`LWG2956 <https://wg21.link/LWG2956>`__","``filesystem::canonical()``\  still defined in terms of ``absolute(p, base)``\ ","2017-07 (Toronto)","|Complete|","",""
diff --git a/libcxx/docs/Status/Cxx17Papers.csv b/libcxx/docs/Status/Cxx17Papers.csv
index c2f0cb4be96822..0aeb15f18b76bb 100644
--- a/libcxx/docs/Status/Cxx17Papers.csv
+++ b/libcxx/docs/Status/Cxx17Papers.csv
@@ -21,7 +21,7 @@
 "`P0006R0 <https://wg21.link/P0006R0>`__","Adopt Type Traits Variable Templates for C++17.","2015-10 (Kona)","|Complete|","3.8",""
 "`P0092R1 <https://wg21.link/P0092R1>`__","Polishing <chrono>","2015-10 (Kona)","|Complete|","3.8",""
 "`P0007R1 <https://wg21.link/P0007R1>`__","Constant View: A proposal for a ``std::as_const``\  helper function template.","2015-10 (Kona)","|Complete|","3.8",""
-"`P0156R0 <https://wg21.link/P0156R0>`__","Variadic lock_guard(rev 3).","2015-10 (Kona)","|Complete| [#note-P0156]_","3.9",""
+"`P0156R0 <https://wg21.link/P0156R0>`__","Variadic lock_guard(rev 3).","2015-10 (Kona)","|Nothing To Do| [#note-P0156]_","",""
 "`P0074R0 <https://wg21.link/P0074R0>`__","Making ``std::owner_less``\  more flexible","2015-10 (Kona)","|Complete|","3.8",""
 "`P0013R1 <https://wg21.link/P0013R1>`__","Logical type traits rev 2","2015-10 (Kona)","|Complete|","3.8",""
 "","","","","",""
@@ -44,7 +44,7 @@
 "`P0032R3 <https://wg21.link/P0032R3>`__","Homogeneous interface for variant, any and optional","2016-06 (Oulu)","|Complete|","4.0",""
 "`P0040R3 <https://wg21.link/P0040R3>`__","Extending memory management tools","2016-06 (Oulu)","|Complete|","4.0",""
 "`P0063R3 <https://wg21.link/P0063R3>`__","C++17 should refer to C11 instead of C99","2016-06 (Oulu)","|Complete|","7.0",""
-"`P0067R3 <https://wg21.link/P0067R3>`__","Elementary string conversions","2016-06 (Oulu)","Now `P0067R5 <https://wg21.link/P0067R5>`__","n/a",""
+"`P0067R3 <https://wg21.link/P0067R3>`__","Elementary string conversions","2016-06 (Oulu)","|Nothing To Do| [#note-P0067]_","n/a",""
 "`P0083R3 <https://wg21.link/P0083R3>`__","Splicing Maps and Sets","2016-06 (Oulu)","|Complete|","8.0",""
 "`P0084R2 <https://wg21.link/P0084R2>`__","Emplace Return Type","2016-06 (Oulu)","|Complete|","4.0",""
 "`P0088R3 <https://wg21.link/P0088R3>`__","Variant: a type-safe union for C++17","2016-06 (Oulu)","|Complete|","4.0",""
@@ -53,7 +53,7 @@
 "`P0174R2 <https://wg21.link/P0174R2>`__","Deprecating Vestigial Library Parts in C++17","2016-06 (Oulu)","|Complete|","15.0",""
 "`P0175R1 <https://wg21.link/P0175R1>`__","Synopses for the C library","2016-06 (Oulu)","","",""
 "`P0180R2 <https://wg21.link/P0180R2>`__","Reserve a New Library Namespace for Future Standardization","2016-06 (Oulu)","|Nothing To Do|","n/a",""
-"`P0181R1 <https://wg21.link/P0181R1>`__","Ordered by Default","2016-06 (Oulu)","*Removed in Kona*","n/a",""
+"`P0181R1 <https://wg21.link/P0181R1>`__","Ordered by Default","2016-06 (Oulu)","|Nothing To Do| [#note-P0181]_","n/a",""
 "`P0209R2 <https://wg21.link/P0209R2>`__","make_from_tuple: apply for construction","2016-06 (Oulu)","|Complete|","3.9",""
 "`P0219R1 <https://wg21.link/P0219R1>`__","Relative Paths for Filesystem","2016-06 (Oulu)","|Complete|","7.0",""
 "`P0254R2 <https://wg21.link/P0254R2>`__","Integrating std::string_view and std::string","2016-06 (Oulu)","|Complete|","4.0",""
diff --git a/libcxx/docs/Status/Cxx20.rst b/libcxx/docs/Status/Cxx20.rst
index 5331af92dea594..f5b35d7ccc39e7 100644
--- a/libcxx/docs/Status/Cxx20.rst
+++ b/libcxx/docs/Status/Cxx20.rst
@@ -49,6 +49,15 @@ Paper Status
    .. [#note-P0883.2] P0883: ``ATOMIC_FLAG_INIT`` was marked deprecated in version 14.0, but was undeprecated with the implementation of LWG3659 in version 15.0.
    .. [#note-P0660] P0660: The paper is implemented but the features are experimental and can be enabled via ``-fexperimental-library``.
    .. [#note-P1614] P1614: ``std::strong_order(long double, long double)`` is partly implemented.
+   .. [#note-P0542] P0542: That paper was pulled out of the draft at the 2019-07 meeting in Cologne.
+   .. [#note-P0788] P0788: That paper was pulled out of the draft at the 2019-07 meeting in Cologne.
+   .. [#note-P0920] P0920: That paper was reverted by `P1661 <https://wg21.link/P1661>`__.
+   .. [#note-P1424] P1424: That paper was superseded by `P1902 <https://wg21.link/P1902>`__.
+   .. [#note-LWG2070] LWG2070: That LWG issue was resolved by `P0674R1 <https://wg21.link/P0674R1>`__.
+   .. [#note-LWG2499] LWG2499: That LWG issue was resolved by `P0487R1 <https://wg21.link/P0487R1>`__.
+   .. [#note-LWG2797] LWG2797: That LWG issue was resolved by `P1285R0 <https://wg21.link/P1285R0>`__.
+   .. [#note-LWG3022] LWG3022: That LWG issue was resolved by `P1285R0 <https://wg21.link/P1285R0>`__.
+   .. [#note-LWG3134] LWG3134: That LWG issue was resolved by `P1210R0 <https://wg21.link/P1210R0>`__.
    .. [#note-P0355] P0355: The implementation status is:
 
       * ``Calendars`` mostly done in Clang 7
diff --git a/libcxx/docs/Status/Cxx20Issues.csv b/libcxx/docs/Status/Cxx20Issues.csv
index bdc2b637efc348..d72a3682420620 100644
--- a/libcxx/docs/Status/Cxx20Issues.csv
+++ b/libcxx/docs/Status/Cxx20Issues.csv
@@ -1,5 +1,5 @@
 "Issue #","Issue Name","Meeting","Status","First released version","Labels"
-"`LWG2070 <https://wg21.link/LWG2070>`__","``allocate_shared``\  should use ``allocator_traits<A>::construct``\ ","2017-07 (Toronto)","Resolved by `P0674R1 <https://wg21.link/P0674R1>`__","",""
+"`LWG2070 <https://wg21.link/LWG2070>`__","``allocate_shared``\  should use ``allocator_traits<A>::construct``\ ","2017-07 (Toronto)","|Nothing To Do| [#note-LWG2070]_","",""
 "`LWG2444 <https://wg21.link/LWG2444>`__","Inconsistent complexity for ``std::sort_heap``\ ","2017-07 (Toronto)","|Nothing To Do|","",""
 "`LWG2593 <https://wg21.link/LWG2593>`__","Moved-from state of Allocators","2017-07 (Toronto)","","",""
 "`LWG2597 <https://wg21.link/LWG2597>`__","``std::log``\  misspecified for complex numbers","2017-07 (Toronto)","","",""
@@ -94,17 +94,17 @@
 "`LWG2183 <https://wg21.link/LWG2183>`__","Muddled allocator requirements for ``match_results``\  constructors","2018-11 (San Diego)","|Complete|","",""
 "`LWG2184 <https://wg21.link/LWG2184>`__","Muddled allocator requirements for ``match_results``\  assignments","2018-11 (San Diego)","|Complete|","",""
 "`LWG2412 <https://wg21.link/LWG2412>`__","``promise::set_value()``\  and ``promise::get_future()``\  should not race","2018-11 (San Diego)","","",""
-"`LWG2499 <https://wg21.link/LWG2499>`__","``operator>>(basic_istream&, CharT*)``\  makes it hard to avoid buffer overflows","2018-11 (San Diego)","Resolved by `P0487R1 <https://wg21.link/P0487R1>`__","",""
+"`LWG2499 <https://wg21.link/LWG2499>`__","``operator>>(basic_istream&, CharT*)``\  makes it hard to avoid buffer overflows","2018-11 (San Diego)","|Nothing To Do| [#note-LWG2499]_","",""
 "`LWG2682 <https://wg21.link/LWG2682>`__","``filesystem::copy()``\  won't create a symlink to a directory","2018-11 (San Diego)","|Nothing To Do|","",""
 "`LWG2697 <https://wg21.link/LWG2697>`__","[concurr.ts] Behavior of ``future/shared_future``\  unwrapping constructor when given an invalid ``future``\ ","2018-11 (San Diego)","","",""
-"`LWG2797 <https://wg21.link/LWG2797>`__","Trait precondition violations","2018-11 (San Diego)","Resolved by `P1285R0 <https://wg21.link/P1285R0>`__","",""
+"`LWG2797 <https://wg21.link/LWG2797>`__","Trait precondition violations","2018-11 (San Diego)","|Nothing To Do| [#note-LWG2797]_","",""
 "`LWG2936 <https://wg21.link/LWG2936>`__","Path comparison is defined in terms of the generic format","2018-11 (San Diego)","|Complete|","",""
 "`LWG2943 <https://wg21.link/LWG2943>`__","Problematic specification of the wide version of ``basic_filebuf::open``\ ","2018-11 (San Diego)","|Nothing To Do|","",""
 "`LWG2960 <https://wg21.link/LWG2960>`__","[fund.ts.v3] ``nonesuch``\  is insufficiently useless","2018-11 (San Diego)","|Complete|","",""
 "`LWG2995 <https://wg21.link/LWG2995>`__","``basic_stringbuf``\  default constructor forbids it from using SSO capacity","2018-11 (San Diego)","|Complete|","20.0",""
 "`LWG2996 <https://wg21.link/LWG2996>`__","Missing rvalue overloads for ``shared_ptr``\  operations","2018-11 (San Diego)","|Complete|","17.0",""
 "`LWG3008 <https://wg21.link/LWG3008>`__","``make_shared``\  (sub)object destruction semantics are not specified","2018-11 (San Diego)","|Complete|","16.0",""
-"`LWG3022 <https://wg21.link/LWG3022>`__","``is_convertible<derived*, base*>``\  may lead to ODR","2018-11 (San Diego)","Resolved by `P1285R0 <https://wg21.link/P1285R0>`__","",""
+"`LWG3022 <https://wg21.link/LWG3022>`__","``is_convertible<derived*, base*>``\  may lead to ODR","2018-11 (San Diego)","|Nothing To Do| [#note-LWG3022]_","",""
 "`LWG3025 <https://wg21.link/LWG3025>`__","Map-like container deduction guides should use ``pair<Key, T>``\ , not ``pair<const Key, T>``\ ","2018-11 (San Diego)","|Complete|","",""
 "`LWG3031 <https://wg21.link/LWG3031>`__","Algorithms and predicates with non-const reference arguments","2018-11 (San Diego)","","",""
 "`LWG3037 <https://wg21.link/LWG3037>`__","``polymorphic_allocator``\  and incomplete types","2018-11 (San Diego)","|Complete|","16.0",""
@@ -120,7 +120,7 @@
 "`LWG3130 <https://wg21.link/LWG3130>`__","|sect|\ [input.output] needs many ``addressof``\ ","2018-11 (San Diego)","|Complete|","20.0",""
 "`LWG3131 <https://wg21.link/LWG3131>`__","``addressof``\  all the things","2018-11 (San Diego)","","",""
 "`LWG3132 <https://wg21.link/LWG3132>`__","Library needs to ban macros named ``expects``\  or ``ensures``\ ","2018-11 (San Diego)","|Nothing To Do|","",""
-"`LWG3134 <https://wg21.link/LWG3134>`__","[fund.ts.v3] LFTSv3 contains extraneous [meta] variable templates that should have been deleted by P09961","2018-11 (San Diego)","Resolved by `P1210R0 <https://wg21.link/P1210R0>`__","",""
+"`LWG3134 <https://wg21.link/LWG3134>`__","[fund.ts.v3] LFTSv3 contains extraneous [meta] variable templates that should have been deleted by P09961","2018-11 (San Diego)","|Nothing To Do| [#note-LWG3134]_","",""
 "`LWG3137 <https://wg21.link/LWG3137>`__","Header for ``__cpp_lib_to_chars``\ ","2018-11 (San Diego)","|Complete|","",""
 "`LWG3140 <https://wg21.link/LWG3140>`__","``COMMON_REF``\  is unimplementable as specified","2018-11 (San Diego)","|Nothing To Do|","",""
 "`LWG3145 <https://wg21.link/LWG3145>`__","``file_clock``\  breaks ABI for C++17 implementations","2018-11 (San Diego)","|Complete|","",""
diff --git a/libcxx/docs/Status/Cxx20Papers.csv b/libcxx/docs/Status/Cxx20Papers.csv
index 40442f3b6fa50f..8aeff47830ece2 100644
--- a/libcxx/docs/Status/Cxx20Papers.csv
+++ b/libcxx/docs/Status/Cxx20Papers.csv
@@ -32,7 +32,7 @@
 "`P0475R1 <https://wg21.link/P0475R1>`__","LWG 2511: guaranteed copy elision for piecewise construction","2018-06 (Rapperswil)","|Complete|","",""
 "`P0476R2 <https://wg21.link/P0476R2>`__","Bit-casting object representations","2018-06 (Rapperswil)","|Complete|","14.0",""
 "`P0528R3 <https://wg21.link/P0528R3>`__","The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange","2018-06 (Rapperswil)","","",""
-"`P0542R5 <https://wg21.link/P0542R5>`__","Support for contract based programming in C++","2018-06 (Rapperswil)","*Removed in Cologne*","n/a",""
+"`P0542R5 <https://wg21.link/P0542R5>`__","Support for contract based programming in C++","2018-06 (Rapperswil)","|Nothing To Do| [#note-P0542]_","n/a",""
 "`P0556R3 <https://wg21.link/P0556R3>`__","Integral power-of-2 operations","2018-06 (Rapperswil)","|Complete|","9.0",""
 "`P0619R4 <https://wg21.link/P0619R4>`__","Reviewing Deprecated Facilities of C++17 for C++20","2018-06 (Rapperswil)","|Partial| [#note-P0619]_","",""
 "`P0646R1 <https://wg21.link/P0646R1>`__","Improving the Return Value of Erase-Like Algorithms","2018-06 (Rapperswil)","|Complete|","10.0",""
@@ -40,7 +40,7 @@
 "`P0758R1 <https://wg21.link/P0758R1>`__","Implicit conversion traits and utility functions","2018-06 (Rapperswil)","|Complete|","",""
 "`P0759R1 <https://wg21.link/P0759R1>`__","fpos Requirements","2018-06 (Rapperswil)","|Complete|","11.0",""
 "`P0769R2 <https://wg21.link/P0769R2>`__","Add shift to <algorithm>","2018-06 (Rapperswil)","|Complete|","12.0",""
-"`P0788R3 <https://wg21.link/P0788R3>`__","Standard Library Specification in a Concepts and Contracts World","2018-06 (Rapperswil)","*Removed in Cologne*","n/a",""
+"`P0788R3 <https://wg21.link/P0788R3>`__","Standard Library Specification in a Concepts and Contracts World","2018-06 (Rapperswil)","|Nothing To Do| [#note-P0788]_","n/a",""
 "`P0879R0 <https://wg21.link/P0879R0>`__","Constexpr for swap and swap related functions Also resolves LWG issue 2800.","2018-06 (Rapperswil)","|Complete|","13.0",""
 "`P0887R1 <https://wg21.link/P0887R1>`__","The identity metafunction","2018-06 (Rapperswil)","|Complete|","8.0",""
 "`P0892R2 <https://wg21.link/P0892R2>`__","explicit(bool)","2018-06 (Rapperswil)","","",""
@@ -85,7 +85,7 @@
 "`P0340R3 <https://wg21.link/P0340R3>`__","Making std::underlying_type SFINAE-friendly","2019-02 (Kona)","|Complete|","9.0",""
 "`P0738R2 <https://wg21.link/P0738R2>`__","I Stream, You Stream, We All Stream for istream_iterator","2019-02 (Kona)","","",""
 "`P0811R3 <https://wg21.link/P0811R3>`__","Well-behaved interpolation for numbers and pointers","2019-02 (Kona)","|Complete|","9.0",""
-"`P0920R2 <https://wg21.link/P0920R2>`__","Precalculated hash values in lookup","2019-02 (Kona)","Reverted by `P1661 <https://wg21.link/P1661>`__","",""
+"`P0920R2 <https://wg21.link/P0920R2>`__","Precalculated hash values in lookup","2019-02 (Kona)","|Nothing To Do| [#note-P0920]_","",""
 "`P1001R2 <https://wg21.link/P1001R2>`__","Target Vectorization Policies from Parallelism V2 TS to C++20","2019-02 (Kona)","|Complete|","17.0",""
 "`P1024R3 <https://wg21.link/P1024R3>`__","Usability Enhancements for std::span","2019-02 (Kona)","|Complete|","9.0",""
 "`P1164R1 <https://wg21.link/P1164R1>`__","Make create_directory() Intuitive","2019-02 (Kona)","|Complete|","12.0",""
@@ -117,7 +117,7 @@
 "`P1355R2 <https://wg21.link/P1355R2>`__","Exposing a narrow contract for ceil2","2019-07 (Cologne)","|Complete|","9.0",""
 "`P1361R2 <https://wg21.link/P1361R2>`__","Integration of chrono with text formatting","2019-07 (Cologne)","|Partial|","",""
 "`P1423R3 <https://wg21.link/P1423R3>`__","char8_t backward compatibility remediation","2019-07 (Cologne)","|Complete|","15.0",""
-"`P1424R1 <https://wg21.link/P1424R1>`__","'constexpr' feature macro concerns","2019-07 (Cologne)","Superseded by `P1902 <https://wg21.link/P1902>`__","",""
+"`P1424R1 <https://wg21.link/P1424R1>`__","'constexpr' feature macro concerns","2019-07 (Cologne)","|Nothing To Do| [#note-P1424]_","",""
 "`P1466R3 <https://wg21.link/P1466R3>`__","Miscellaneous minor fixes for chrono","2019-07 (Cologne)","","",""
 "`P1474R1 <https://wg21.link/P1474R1>`__","Helpful pointers for ContiguousIterator","2019-07 (Cologne)","|Complete|","15.0","|ranges|"
 "`P1502R1 <https://wg21.link/P1502R1>`__","Standard library header units for C++20","2019-07 (Cologne)","","",""
diff --git a/libcxx/docs/Status/Cxx23.rst b/libcxx/docs/Status/Cxx23.rst
index 23d30c8128d71e..b3918149a735f1 100644
--- a/libcxx/docs/Status/Cxx23.rst
+++ b/libcxx/docs/Status/Cxx23.rst
@@ -46,6 +46,9 @@ Paper Status
    .. [#note-P2520R0] P2520R0: Libc++ implemented this paper as a DR in C++20 as well.
    .. [#note-P2711R1] P2711R1: ``join_with_view`` hasn't been done yet since this type isn't implemented yet.
    .. [#note-P2770R0] P2770R0: ``join_with_view`` hasn't been done yet since this type isn't implemented yet.
+   .. [#note-LWG3494] LWG3494: That LWG issue was superseded by `P2017R1 <https://wg21.link/P2017R1>`__.
+   .. [#note-LWG3481] LWG3481: That LWG issue was superseded by `P2415R2 <https://wg21.link/P2415R2>`__.
+   .. [#note-LWG3265] LWG3265: That LWG issue was resolved by `LWG3435 <https://wg21.link/LWG3435>`__.
    .. [#note-P2693R1] P2693R1: The formatter for ``std::thread::id`` is implemented.
       The formatter for ``stacktrace`` is not implemented, since ``stacktrace`` is
       not implemented yet.
diff --git a/libcxx/docs/Status/Cxx23Issues.csv b/libcxx/docs/Status/Cxx23Issues.csv
index 16471406f41588..a0a9ccdca48c3c 100644
--- a/libcxx/docs/Status/Cxx23Issues.csv
+++ b/libcxx/docs/Status/Cxx23Issues.csv
@@ -5,7 +5,7 @@
 "`LWG3195 <https://wg21.link/LWG3195>`__","What is the stored pointer value of an empty ``weak_ptr``?","2020-11 (Virtual)","|Nothing To Do|","",""
 "`LWG3211 <https://wg21.link/LWG3211>`__","``std::tuple<>`` should be trivially constructible","2020-11 (Virtual)","|Complete|","9.0",""
 "`LWG3236 <https://wg21.link/LWG3236>`__","Random access iterator requirements lack limiting relational operators domain to comparing those from the same range","2020-11 (Virtual)","|Nothing To Do|","",""
-"`LWG3265 <https://wg21.link/LWG3265>`__","``move_iterator``'s conversions are more broken after P1207","2020-11 (Virtual)","Resolved by `LWG3435 <https://wg21.link/LWG3435>`__","",""
+"`LWG3265 <https://wg21.link/LWG3265>`__","``move_iterator``'s conversions are more broken after P1207","2020-11 (Virtual)","|Nothing To Do| [#note-LWG3265]_","",""
 "`LWG3435 <https://wg21.link/LWG3435>`__","``three_way_comparable_with<reverse_iterator<int*>, reverse_iterator<const int*>>``","2020-11 (Virtual)","|Complete|","13.0",""
 "`LWG3432 <https://wg21.link/LWG3432>`__","Missing requirement for ``comparison_category``","2020-11 (Virtual)","|Complete|","16.0","|spaceship|"
 "`LWG3447 <https://wg21.link/LWG3447>`__","Deduction guides for ``take_view`` and ``drop_view`` have different constraints","2020-11 (Virtual)","|Complete|","14.0","|ranges|"
@@ -54,7 +54,7 @@
 "`LWG3433 <https://wg21.link/LWG3433>`__","``subrange::advance(n)`` has UB when ``n < 0``","2021-02 (Virtual)","|Complete|","14.0","|ranges|"
 "`LWG3490 <https://wg21.link/LWG3490>`__","``ranges::drop_while_view::begin()`` is missing a precondition","2021-02 (Virtual)","|Nothing To Do|","","|ranges|"
 "`LWG3492 <https://wg21.link/LWG3492>`__","Minimal improvements to ``elements_view::iterator``","2021-02 (Virtual)","|Complete|","16.0","|ranges|"
-"`LWG3494 <https://wg21.link/LWG3494>`__","Allow ranges to be conditionally borrowed","2021-02 (Virtual)","Superseded by `P2017R1 <https://wg21.link/P2017R1>`__","","|ranges|"
+"`LWG3494 <https://wg21.link/LWG3494>`__","Allow ranges to be conditionally borrowed","2021-02 (Virtual)","|Nothing To Do| [#note-LWG3494]_","","|ranges|"
 "`LWG3495 <https://wg21.link/LWG3495>`__","``constexpr launder`` makes pointers to inactive members of unions usable","2021-02 (Virtual)","|Nothing To Do|","",""
 "`LWG3500 <https://wg21.link/LWG3500>`__","``join_view::iterator::operator->()`` is bogus","2021-02 (Virtual)","|Complete|","14.0","|ranges|"
 "`LWG3502 <https://wg21.link/LWG3502>`__","``elements_view`` should not be allowed to return dangling reference","2021-02 (Virtual)","|Complete|","16.0","|ranges|"
@@ -66,7 +66,7 @@
 "`LWG3410 <https://wg21.link/LWG3410>`__","``lexicographical_compare_three_way`` is overspecified","2021-06 (Virtual)","|Complete|","17.0","|spaceship|"
 "`LWG3430 <https://wg21.link/LWG3430>`__","``std::fstream`` & co. should be constructible from string_view","2021-06 (Virtual)","|Complete|","19.0",""
 "`LWG3462 <https://wg21.link/LWG3462>`__","§[formatter.requirements]: Formatter requirements forbid use of ``fc.arg()``","2021-06 (Virtual)","|Nothing To Do|","","|format|"
-"`LWG3481 <https://wg21.link/LWG3481>`__","``viewable_range`` mishandles lvalue move-only views","2021-06 (Virtual)","Superseded by `P2415R2 <https://wg21.link/P2415R2>`__","","|ranges|"
+"`LWG3481 <https://wg21.link/LWG3481>`__","``viewable_range`` mishandles lvalue move-only views","2021-06 (Virtual)","|Nothing To Do| [#note-LWG3481]_","","|ranges|"
 "`LWG3506 <https://wg21.link/LWG3506>`__","Missing allocator-extended constructors for ``priority_queue``","2021-06 (Virtual)","|Complete|","14.0",""
 "`LWG3517 <https://wg21.link/LWG3517>`__","``join_view::iterator``'s ``iter_swap`` is underconstrained","2021-06 (Virtual)","|Complete|","14.0","|ranges|"
 "`LWG3518 <https://wg21.link/LWG3518>`__","Exception requirements on char trait operations unclear","2021-06 (Virtual)","|Nothing To Do|","",""

From ae48affd25ac8e211a5bc1c72ef208615fc7eb7d Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Wed, 21 Aug 2024 10:46:21 -0700
Subject: [PATCH 031/116] [RISCV] Minor style fixes in
 lowerVectorMaskVecReduction [nfc]

Reuse an existing routine (`ISD::getVecReduceBaseOpcode`) to avoid
duplication, and reduce variable scopes.
---
 llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 66ea6423097ab2..670dee2edb1dfb 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9409,10 +9409,7 @@ SDValue RISCVTargetLowering::lowerVectorMaskVecReduction(SDValue Op,
         getDefaultVLOps(VecVT, ContainerVT, DL, DAG, Subtarget);
   }
 
-  unsigned BaseOpc;
   ISD::CondCode CC;
-  SDValue Zero = DAG.getConstant(0, DL, XLenVT);
-
   switch (Op.getOpcode()) {
   default:
     llvm_unreachable("Unhandled reduction");
@@ -9423,7 +9420,6 @@ SDValue RISCVTargetLowering::lowerVectorMaskVecReduction(SDValue Op,
     Vec = DAG.getNode(RISCVISD::VMXOR_VL, DL, ContainerVT, Vec, TrueMask, VL);
     Vec = DAG.getNode(RISCVISD::VCPOP_VL, DL, XLenVT, Vec, Mask, VL);
     CC = ISD::SETEQ;
-    BaseOpc = ISD::AND;
     break;
   }
   case ISD::VECREDUCE_OR:
@@ -9431,7 +9427,6 @@ SDValue RISCVTargetLowering::lowerVectorMaskVecReduction(SDValue Op,
     // vcpop x != 0
     Vec = DAG.getNode(RISCVISD::VCPOP_VL, DL, XLenVT, Vec, Mask, VL);
     CC = ISD::SETNE;
-    BaseOpc = ISD::OR;
     break;
   case ISD::VECREDUCE_XOR:
   case ISD::VP_REDUCE_XOR: {
@@ -9440,11 +9435,11 @@ SDValue RISCVTargetLowering::lowerVectorMaskVecReduction(SDValue Op,
     Vec = DAG.getNode(RISCVISD::VCPOP_VL, DL, XLenVT, Vec, Mask, VL);
     Vec = DAG.getNode(ISD::AND, DL, XLenVT, Vec, One);
     CC = ISD::SETNE;
-    BaseOpc = ISD::XOR;
     break;
   }
   }
 
+  SDValue Zero = DAG.getConstant(0, DL, XLenVT);
   SDValue SetCC = DAG.getSetCC(DL, XLenVT, Vec, Zero, CC);
   SetCC = DAG.getNode(ISD::TRUNCATE, DL, Op.getValueType(), SetCC);
 
@@ -9457,6 +9452,7 @@ SDValue RISCVTargetLowering::lowerVectorMaskVecReduction(SDValue Op,
   // 0 for an inactive vector, and so we've already received the neutral value:
   // AND gives us (0 == 0) -> 1 and OR/XOR give us (0 != 0) -> 0. Therefore we
   // can simply include the start value.
+  unsigned BaseOpc = ISD::getVecReduceBaseOpcode(Op.getOpcode());
   return DAG.getNode(BaseOpc, DL, Op.getValueType(), SetCC, Op.getOperand(0));
 }
 
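The lowering above collapses each mask reduction into a single population count compared with zero: AND becomes `vcpop(~x) == 0` (the inversion is the `VMXOR_VL` with an all-true mask), OR becomes `vcpop(x) != 0`, and XOR becomes `(vcpop(x) & 1) != 0`. The VP forms then fold in the start value with the base opcode that `ISD::getVecReduceBaseOpcode` now supplies. The plain C++ model below is not LLVM code; `MaskReduce`, `popcount_active`, `vp_reduce` and the other names are hypothetical. It lets you check that equivalence against a lane-by-lane reduction, including the case with no active lanes, where the popcount is 0 and the comparison already yields the neutral value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three mask reduction kinds.
enum class MaskReduce { And, Or, Xor };

// Software model of a masked vcpop: count set bits in active lanes only.
std::size_t popcount_active(const std::vector<bool>& v,
                            const std::vector<bool>& active) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    if (active[i] && v[i]) ++n;
  return n;
}

// The lowering strategy: one population count compared against zero.
bool reduce_via_popcount(MaskReduce kind, const std::vector<bool>& x,
                         const std::vector<bool>& active) {
  switch (kind) {
  case MaskReduce::And: {
    // XOR with an all-true mask inverts x; AND holds iff no active lane is 0.
    std::vector<bool> inverted(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) inverted[i] = !x[i];
    return popcount_active(inverted, active) == 0;
  }
  case MaskReduce::Or:
    return popcount_active(x, active) != 0;        // any active lane set
  case MaskReduce::Xor:
    return (popcount_active(x, active) & 1) != 0;  // odd number of set lanes
  }
  return false;
}

// VP form: combine the popcount result with the start value using the base
// operation (AND, OR or XOR) for the reduction kind.
bool vp_reduce(MaskReduce kind, bool start, const std::vector<bool>& x,
               const std::vector<bool>& active) {
  bool r = reduce_via_popcount(kind, x, active);
  switch (kind) {
  case MaskReduce::And: return start && r;
  case MaskReduce::Or:  return start || r;
  case MaskReduce::Xor: return start != r;
  }
  return start;
}

// Reference semantics: fold the active lanes one at a time into start.
bool vp_reduce_reference(MaskReduce kind, bool start, const std::vector<bool>& x,
                         const std::vector<bool>& active) {
  bool acc = start;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (!active[i]) continue;
    switch (kind) {
    case MaskReduce::And: acc = acc && x[i]; break;
    case MaskReduce::Or:  acc = acc || x[i]; break;
    case MaskReduce::Xor: acc = acc != x[i]; break;
    }
  }
  return acc;
}
```

Inverting the input for AND is what lets all three cases end in the same comparison against zero, which is why the patch can build the shared `Zero` constant once, after the switch.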

From c975dc1da03d684604ddf787b07b63fb8e903648 Mon Sep 17 00:00:00 2001
From: Harini0924 <79345568+Harini0924 at users.noreply.github.com>
Date: Wed, 21 Aug 2024 10:48:32 -0700
Subject: [PATCH 032/116] [clang] [test] Use lit Syntax for Environment
 Variables in Clang subproject (#102647)

This patch updates the clang tests by replacing shell command
substitutions with lit-compatible syntax for setting and referencing
environment variables. Specifically, shell-style variable assignments
(e.g., `DEFAULT_TRIPLE=` and `EXPECTED_RESOURCE_DIR=`) have been
replaced with `env`, `%{env:...}` and `%{readfile:...}` to meet the
requirements of lit's internal shell. These changes ensure that
environment variables are properly set and accessed within the lit
environment.

When running the clang tests with the lit internal shell
(`LIT_USE_INTERNAL_SHELL=1 ninja check-clang`), a common failure is:
```
FAIL: Clang :: Driver/program-path-priority.c (19 of 20640)
******************** TEST 'Clang :: Driver/program-path-priority.c' FAILED ********************
Exit Code: 127

Command Output (stdout):
--
# RUN: at line 90
DEFAULT_TRIPLE=`/usr/local/google/home/harinidonthula/llvm-project/build/tools/clang/test/Driver/Output/program-path-priority.c.tmp/clang --version | grep "Target:" | cut -d ' ' -f2`
# executed command: 'DEFAULT_TRIPLE=`/usr/local/google/home/harinidonthula/llvm-project/build/tools/clang/test/Driver/Output/program-path-priority.c.tmp/clang' --version
# .---command stderr------------
# | 'DEFAULT_TRIPLE=`/usr/local/google/home/harinidonthula/llvm-project/build/tools/clang/test/Driver/Output/program-path-priority.c.tmp/clang': command not found
# `-----------------------------
# error: command failed with exit status: 127
```
To fix this issue, the patch replaces the shell command substitutions
with lit's own environment-variable handling, ensuring compatibility
with the lit internal shell. This applies to both the `DEFAULT_TRIPLE`
and `EXPECTED_RESOURCE_DIR` variables and allows the tests to pass
under the lit internal shell.
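In `pr61006.cppm`, the backtick substitution is replaced by writing the output of `%clang -print-resource-dir` to `%t/resource-dir.txt` and reading it back with `%{readfile:...}`. A POSIX shell strips trailing newlines from captured command output, so a file-based substitution needs equivalent handling for the path to be usable inline. The C++ sketch below illustrates the idea under that assumption; `expand_readfile` and `read_trimmed` are hypothetical names, not lit's implementation (lit itself is written in Python and may differ in details such as newline handling).

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical: read a whole file and drop trailing newlines, mirroring how
// shell command substitution treats the output of `cmd`.
std::string read_trimmed(const std::string& path) {
  std::ifstream in(path);
  std::ostringstream ss;
  ss << in.rdbuf();
  std::string s = ss.str();
  while (!s.empty() && (s.back() == '\n' || s.back() == '\r')) s.pop_back();
  return s;
}

// Hypothetical: expand every %{readfile:PATH} occurrence in a RUN line.
std::string expand_readfile(std::string line) {
  const std::string open = "%{readfile:";
  std::size_t pos = 0;
  while ((pos = line.find(open, pos)) != std::string::npos) {
    std::size_t close = line.find('}', pos + open.size());
    if (close == std::string::npos) break;  // malformed: leave the rest alone
    std::string path = line.substr(pos + open.size(), close - pos - open.size());
    std::string value = read_trimmed(path);
    line.replace(pos, close + 1 - pos, value);
    pos += value.size();  // continue scanning after the inserted text
  }
  return line;
}
```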
The patch also prefixes the `PWD` assignment with `env` in commands
such as the following, so that the variable is set correctly within
the lit internal shell:
```
// RUN: %if system-linux %{ env PWD=/proc/self/cwd %clang -### -c --coverage %s -o foo/bar.o 2>&1 | FileCheck --check-prefix=PWD %s %}
```
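As the error log above shows, without a real shell the `DEFAULT_TRIPLE=...` word was itself taken as the command to run, which failed with exit status 127. A bare `NAME=value cmd` prefix is shell syntax; with `env`, the assignment becomes an ordinary argument that the `env` utility consumes before running the command. The C++ sketch below models only that parsing step; `EnvCommand`, `is_assignment` and `parse_env_command` are hypothetical names, not lit's implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result of parsing one RUN-line command.
struct EnvCommand {
  std::map<std::string, std::string> vars;  // environment assignments to apply
  std::vector<std::string> argv;            // program and its arguments
};

// True if `word` has the form NAME=value with a shell-style NAME.
bool is_assignment(const std::string& word) {
  std::size_t eq = word.find('=');
  if (eq == std::string::npos || eq == 0) return false;
  for (std::size_t i = 0; i < eq; ++i) {
    char c = word[i];
    bool ok = c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
              (i > 0 && c >= '0' && c <= '9');
    if (!ok) return false;
  }
  return true;
}

// Without a shell, words are taken literally: argv[0] is the first word, so
// "PWD=/proc/self/cwd clang ..." tries to run a program named "PWD=...".
// With a leading "env", NAME=value words are consumed as assignments (as the
// env utility does) and the first remaining word is the program to run.
EnvCommand parse_env_command(const std::vector<std::string>& words) {
  EnvCommand cmd;
  std::size_t i = 0;
  if (!words.empty() && words[0] == "env") {
    for (i = 1; i < words.size() && is_assignment(words[i]); ++i) {
      std::size_t eq = words[i].find('=');
      cmd.vars[words[i].substr(0, eq)] = words[i].substr(eq + 1);
    }
  }
  cmd.argv.assign(words.begin() + static_cast<std::ptrdiff_t>(i), words.end());
  return cmd;
}
```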
Fixes: #102395
[Link to RFC](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
---
 clang/test/ClangScanDeps/pr61006.cppm     | 10 +++++-----
 clang/test/Driver/coverage.c              |  4 ++--
 clang/test/Driver/program-path-priority.c |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/clang/test/ClangScanDeps/pr61006.cppm b/clang/test/ClangScanDeps/pr61006.cppm
index f75edd38c81ba9..9ce6edaf2010e1 100644
--- a/clang/test/ClangScanDeps/pr61006.cppm
+++ b/clang/test/ClangScanDeps/pr61006.cppm
@@ -6,13 +6,13 @@
 // RUN: mkdir -p %t
 // RUN: split-file %s %t
 //
-// RUN: EXPECTED_RESOURCE_DIR=`%clang -print-resource-dir` && \
+// RUN: %clang -print-resource-dir > %t/resource-dir.txt && \
 // RUN: ln -s %clang++ %t/clang++ && \
-// RUN: sed "s|EXPECTED_RESOURCE_DIR|$EXPECTED_RESOURCE_DIR|g; s|DIR|%/t|g" %t/P1689.json.in > %t/P1689.json && \
-// RUN: clang-scan-deps -compilation-database %t/P1689.json -format=p1689 | FileCheck %t/a.cpp -DPREFIX=%/t && \
-// RUN: clang-scan-deps -format=p1689 \
+// RUN: sed "s|EXPECTED_RESOURCE_DIR|%{readfile:%t/resource-dir.txt}|g; s|DIR|%/t|g" %t/P1689.json.in > %t/P1689.json && \
+// RUN: env EXPECTED_RESOURCE_DIR=%{readfile:%t/resource-dir.txt} clang-scan-deps -compilation-database %t/P1689.json -format=p1689 | FileCheck %t/a.cpp -DPREFIX=%/t && \
+// RUN: env EXPECTED_RESOURCE_DIR=%{readfile:%t/resource-dir.txt} clang-scan-deps -format=p1689 \
 // RUN:   -- %t/clang++ -std=c++20 -c -fprebuilt-module-path=%t %t/a.cpp -o %t/a.o \
-// RUN:      -resource-dir $EXPECTED_RESOURCE_DIR | FileCheck %t/a.cpp -DPREFIX=%/t
+// RUN:      -resource-dir %{env:EXPECTED_RESOURCE_DIR} | FileCheck %t/a.cpp -DPREFIX=%/t
 
 //--- P1689.json.in
 [
diff --git a/clang/test/Driver/coverage.c b/clang/test/Driver/coverage.c
index e5ed064aab457c..ab791ada2d351a 100644
--- a/clang/test/Driver/coverage.c
+++ b/clang/test/Driver/coverage.c
@@ -18,7 +18,7 @@
 // GCNO-LOCATION-REL: "-coverage-notes-file={{.*}}{{/|\\\\}}foo/bar.gcno"
 
 /// GCC allows PWD to change the paths.
-// RUN: %if system-linux %{ PWD=/proc/self/cwd %clang -### -c --coverage %s -o foo/bar.o 2>&1 | FileCheck --check-prefix=PWD %s %}
+// RUN: %if system-linux %{ env PWD=/proc/self/cwd %clang -### -c --coverage %s -o foo/bar.o 2>&1 | FileCheck --check-prefix=PWD %s %}
 // PWD: "-coverage-notes-file=/proc/self/cwd/foo/bar.gcno" "-coverage-data-file=/proc/self/cwd/foo/bar.gcda"
 
 /// Don't warn -Wunused-command-line-argument.
@@ -50,6 +50,6 @@
 // LINK2: -cc1{{.*}} "-coverage-notes-file={{.*}}{{/|\\\\}}f/gb.gcno" "-coverage-data-file={{.*}}{{/|\\\\}}f/gb.gcda"
 
 /// GCC allows PWD to change the paths.
-// RUN: %if system-linux %{ PWD=/proc/self/cwd %clang -### --coverage d/a.c d/b.c -o e/x -fprofile-dir=f 2>&1 | FileCheck %s --check-prefix=LINK3 %}
+// RUN: %if system-linux %{ env PWD=/proc/self/cwd %clang -### --coverage d/a.c d/b.c -o e/x -fprofile-dir=f 2>&1 | FileCheck %s --check-prefix=LINK3 %}
 // LINK3: -cc1{{.*}} "-coverage-notes-file=/proc/self/cwd/e/x-a.gcno" "-coverage-data-file=f/proc/self/cwd/e/x-a.gcda"
 // LINK3: -cc1{{.*}} "-coverage-notes-file=/proc/self/cwd/e/x-b.gcno" "-coverage-data-file=f/proc/self/cwd/e/x-b.gcda"
diff --git a/clang/test/Driver/program-path-priority.c b/clang/test/Driver/program-path-priority.c
index c940c4ced94420..358a06d7c6d1b5 100644
--- a/clang/test/Driver/program-path-priority.c
+++ b/clang/test/Driver/program-path-priority.c
@@ -87,8 +87,8 @@
 
 /// <default-triple>-gcc has lowest priority so <triple>-gcc
 /// on PATH beats default triple in program path
-// RUN: DEFAULT_TRIPLE=`%t/clang --version | grep "Target:" | cut -d ' ' -f2`
-// RUN: touch %t/$DEFAULT_TRIPLE-gcc && chmod +x %t/$DEFAULT_TRIPLE-gcc
+// RUN: %t/clang --version | grep "Target:" | cut -d ' ' -f2 > %t/default-triple.txt
+// RUN: env DEFAULT_TRIPLE=%{readfile:%t/default-triple.txt} touch %t/%{env:DEFAULT_TRIPLE}-gcc && chmod +x %t/%{env:DEFAULT_TRIPLE}-gcc
 // RUN: touch %t/%target_triple-gcc && chmod +x %t/%target_triple-gcc
 // RUN: env "PATH=%t/env/" %t/clang -### -target notreal-none-elf %s 2>&1 | \
 // RUN:   FileCheck --check-prefix=DEFAULT_TRIPLE_GCC %s

From b89fef8f67974ebcd4114fa75ac2e53fd687870c Mon Sep 17 00:00:00 2001
From: Michael Jones <michaelrj at google.com>
Date: Wed, 21 Aug 2024 10:50:39 -0700
Subject: [PATCH 033/116] [libc][docs] Update docs to reflect new headergen
 (#102381)

Since new headergen is now the default for building LLVM-libc, the docs
need to be updated to reflect that. While I was editing those docs, I
took a quick pass at updating other out-of-date pages.
---
 libc/docs/build_and_test.rst                 |   7 --
 libc/docs/contributing.rst                   |  14 +--
 libc/docs/dev/api_test.rst                   |  25 ----
 libc/docs/dev/ground_truth_specification.rst |  11 --
 libc/docs/dev/header_generation.rst          |   2 +
 libc/docs/dev/index.rst                      |   3 -
 libc/docs/dev/mechanics_of_public_api.rst    |  29 -----
 libc/docs/dev/source_tree_layout.rst         |  24 ++--
 libc/docs/full_cross_build.rst               | 115 +++----------------
 libc/docs/full_host_build.rst                |  87 ++++++++++++--
 libc/docs/fullbuild_mode.rst                 |   5 +
 libc/docs/gpu/building.rst                   |   6 +-
 libc/docs/index.rst                          |  17 +--
 libc/docs/overlay_mode.rst                   |  36 +++---
 libc/docs/porting.rst                        |  15 ---
 15 files changed, 154 insertions(+), 242 deletions(-)
 delete mode 100644 libc/docs/dev/api_test.rst
 delete mode 100644 libc/docs/dev/ground_truth_specification.rst
 delete mode 100644 libc/docs/dev/mechanics_of_public_api.rst

diff --git a/libc/docs/build_and_test.rst b/libc/docs/build_and_test.rst
index 22b09b07d9612d..ccd8b5bbee4759 100644
--- a/libc/docs/build_and_test.rst
+++ b/libc/docs/build_and_test.rst
@@ -38,13 +38,6 @@ The libc can be built and tested in two different modes:
 
         $> ninja libc-integration-tests
 
-   #. API verification test - See :ref:`api_test` for more information about
-      the API test. It can be run by the command:
-
-      .. code-block:: sh
-
-        $> ninja libc-api-test
-
 Building with VSCode
 ====================
 
diff --git a/libc/docs/contributing.rst b/libc/docs/contributing.rst
index bd7d9d79be57d7..a674290cf6dc03 100644
--- a/libc/docs/contributing.rst
+++ b/libc/docs/contributing.rst
@@ -4,7 +4,7 @@
 Contributing to the libc Project
 ================================
 
-LLVM's libc is being developed as part of the LLVM project so contributions
+LLVM-libc is being developed as part of the LLVM project so contributions
 to the libc project should also follow the general LLVM
 `contribution guidelines <https://llvm.org/docs/Contributing.html>`_. Below is
 a list of open projects that one can start with:
@@ -31,24 +31,12 @@ a list of open projects that one can start with:
    directory. So, a simple but mechanical project would be to move the parts
    following the old styles to the new style.
 
-#. **Integrating with the rest of the LLVM project** - There are two parts to
-   this project:
-
-   #. One is about adding CMake facilities to optionally link the libc's overlay
-      static archive (see :ref:`overlay_mode`) with other LLVM tools/executables.
-   #. The other is about putting plumbing in place to release the overlay static
-      archive (see :ref:`overlay_mode`) as part of the LLVM binary releases.
-
 #. **Implement Linux syscall wrappers** - A large portion of the POSIX API can
    be implemented as syscall wrappers on Linux. A good number have already been
    implemented but many more are yet to be implemented. So, a project of medium
    complexity would be to implement syscall wrappers which have not yet been
    implemented.
 
-#. **Add a better random number generator** - The current random number
-   generator has a very small range. This has to be improved or switched over
-   to a fast random number generator with a large range.
-
 #. **Update the clang-tidy lint rules and use them in the build and/or CI** -
    Currently, the :ref:`clang_tidy_checks` have gone stale and are mostly unused
    by the developers and on the CI builders. This project is about updating
diff --git a/libc/docs/dev/api_test.rst b/libc/docs/dev/api_test.rst
deleted file mode 100644
index 3191a32b7e3fb1..00000000000000
--- a/libc/docs/dev/api_test.rst
+++ /dev/null
@@ -1,25 +0,0 @@
-.. _api_test:
-
-========
-API Test
-========
-
-.. warning::
-  This page is severely out of date. Much of the information it contains may be
-  incorrect. Please only remove this warning once the page has been updated.
-
-The implementation of libc-project is unique because our public C header files
-are generated using information from ground truth captured in TableGen files.
-Unit tests only exercise the internal C++ implementations and don't ensure the
-headers were generated by the build system and that the generated header files
-contain the expected declarations and definitions. A simple solution is to have
-contributors write an integration test for each individual function as a C
-program; however, this would place a large burden on contributors and duplicates
-some effort from the unit tests.
-
-Instead we automate the generation of what we call as an API test. This API test
-ensures that public facing symbols are visible, that the header files are
-generated as expected, and that each libc function has the correct function
-prototype as specified by the standards. The API test cmake rules are located in
-``test/src/CMakeLists.txt``. The source file for the API test is generated in
-``<build directory>/projects/libc/test/src/public_api_test.cpp``
diff --git a/libc/docs/dev/ground_truth_specification.rst b/libc/docs/dev/ground_truth_specification.rst
deleted file mode 100644
index f2540b6f78e715..00000000000000
--- a/libc/docs/dev/ground_truth_specification.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-The ground truth of standards
-=============================
-
-Like any modern libc, LLVM libc also supports a wide number of standards and
-extensions. To avoid developing headers, wrappers and sources in a disjointed
-fashion, LLVM libc employs ground truth files. These files live under the
-``spec`` directory and list ground truth corresponding the ISO C standard, the
-POSIX extension standard, etc. For example, the path to the ground truth file
-for the ISO C standard is ``spec/stdc.td``. Tools like the header generator
-(described in the header generation document), docs generator, etc. use the
-ground truth files to generate headers, docs etc.
diff --git a/libc/docs/dev/header_generation.rst b/libc/docs/dev/header_generation.rst
index 735db2d291ff16..ec4206217ca777 100644
--- a/libc/docs/dev/header_generation.rst
+++ b/libc/docs/dev/header_generation.rst
@@ -1,3 +1,5 @@
+.. _header_generation:
+
 Generating Public and Internal headers
 ======================================
 
diff --git a/libc/docs/dev/index.rst b/libc/docs/dev/index.rst
index 87712afcae2ac6..c16121feb3a45d 100644
--- a/libc/docs/dev/index.rst
+++ b/libc/docs/dev/index.rst
@@ -15,10 +15,7 @@ Navigate to the links below for information on the respective topics:
    config_options
    clang_tidy_checks
    fuzzing
-   ground_truth_specification
    header_generation
    implementation_standard
    undefined_behavior
    printf_behavior
-   api_test
-   mechanics_of_public_api
diff --git a/libc/docs/dev/mechanics_of_public_api.rst b/libc/docs/dev/mechanics_of_public_api.rst
deleted file mode 100644
index 257ab3d71bc17a..00000000000000
--- a/libc/docs/dev/mechanics_of_public_api.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-The mechanics of the ``public_api`` command
-===========================================
-
-The build system, in combination with the header generation mechanism,
-facilitates the fine grained ability to pick and choose the public API one wants
-to expose on their platform. The public header files are always generated from
-the corresponding ``.h.def`` files. A header generation command ``%%public_api``
-is listed in these files. In the generated header file, the header generator
-replaces this command with the public API relevant for the target platform.
-
-Under the hood
---------------
-
-When the header generator sees the ``%%public_api`` command, it looks up the
-API config file for the platform in the path ``config/<platform>/api.td``.
-The API config file lists two kinds of items:
-
-1. The list of standards from which the public entities available on the platform
-   are derived from.
-2. For each header file exposed on the platform, the list of public members
-   provided in that header file.
-
-Note that, the header generator only learns the names of the public entities
-from the header config file (the 2nd item from above.) The exact manner in which
-the entities are to be declared is got from the standards (the 1st item from
-above.)
-
-See the ground truth document for more information on how the standards are
-formally listed in LLVM libc using LLVM table-gen files.
diff --git a/libc/docs/dev/source_tree_layout.rst b/libc/docs/dev/source_tree_layout.rst
index 0bcedc96a133c3..8b423a1712cc81 100644
--- a/libc/docs/dev/source_tree_layout.rst
+++ b/libc/docs/dev/source_tree_layout.rst
@@ -14,9 +14,10 @@ directories::
         - docs
         - examples
         - fuzzing
+        - hdr
         - include
         - lib
-        - spec
+        - newhdrgen
         - src
         - startup
         - test
@@ -62,6 +63,14 @@ The directory structure within this directory mirrors the directory structure
 of the top-level ``libc`` directory itself. For more details, see
 :doc:`fuzzing`.
 
+The ``hdr`` directory
+---------------------
+
+This directory contains proxy headers which are included from the files in the
+src directory. These proxy headers either include our internal type or macro
+definitions, or the system's type or macro definitions, depending on if we are
+in fullbuild or overlay mode.
+
 The ``include`` directory
 -------------------------
 
@@ -80,13 +89,14 @@ The ``lib`` directory
 This directory contains a ``CMakeLists.txt`` file listing the targets for the
 public libraries ``libc.a``, ``libm.a`` etc.
 
-The ``spec`` directory
-----------------------
+The ``newhdrgen`` directory
+---------------------------
 
-This directory contains the specifications for the types, macros, and entrypoint
-functions. These definitions come from the various standards and extensions
-LLVM-libc supports, and they are used along with the ``*.h.def`` files and the
-config files to generate the headers for fullbuild mode.
+This directory contains the sources and specifications for the types, macros
+and entrypoint functions. These definitions are organized in the ``yaml``
+subdirectory and match the organization of the ``*.h.def`` files. This folder
+also contains the python sources for new headergen, which is what generates the
+headers.
 
 The ``src`` directory
 ---------------------
diff --git a/libc/docs/full_cross_build.rst b/libc/docs/full_cross_build.rst
index 100e17a977e764..5f57169d228ef7 100644
--- a/libc/docs/full_cross_build.rst
+++ b/libc/docs/full_cross_build.rst
@@ -8,35 +8,33 @@ Full Cross Build
    :depth: 1
    :local:
 
+.. note:: 
+   Fullbuild requires running headergen, which is a python program that depends on
+   pyyaml. The minimum versions are listed on the :ref:`header_generation`
+   page, as well as additional information.
+
 In this document, we will present recipes to cross build the full libc. When we
 say *cross build* a full libc, we mean that we will build the full libc for a
 target system which is not the same as the system on which the libc is being
 built. For example, you could be building for a bare metal aarch64 *target* on a
 Linux x86_64 *host*.
 
-There are three main recipes to cross build the full libc. Each one serves a
+There are two main recipes to cross build the full libc. Each one serves a
 different use case. Below is a short description of these recipes to help users
 pick the recipe that best suites their needs and contexts.
 
 * **Standalone cross build** - Using this recipe one can build the libc using a
   compiler of their choice. One should use this recipe if their compiler can
   build for the host as well as the target.
-* **Runtimes cross build** - In this recipe, one will have to first build the
-  libc build tools for the host separately and then use those build tools to
-  build the libc. Users can use the compiler of their choice to build the
-  libc build tools as well as the libc. One should use this recipe if they
-  have to use a host compiler to build the build tools for the host and then
-  use a target compiler (which is different from the host compiler) to build
-  the libc.
 * **Bootstrap cross build** - In this recipe, one will build the ``clang``
   compiler and the libc build tools for the host first, and then use them to
-  build the libc for the target. Unlike with the runtimes build recipe, the
-  user does not have explicitly build ``clang`` and other libc build tools.
+  build the libc for the target. Unlike with the standalone build recipe, the
+  user does not have to explicitly build ``clang`` and other build tools.
   They get built automatically before building the libc. One should use this
   recipe if they intend use the built ``clang`` and the libc as part of their
   toolchain for the target.
 
-The following sections present the three recipes in detail.
+The following sections present the two recipes in detail.
 
 Standalone cross build
 ======================
@@ -61,9 +59,9 @@ Below is the CMake command to configure the standalone crossbuild of the libc.
   $> cd build
   $> C_COMPILER=<C compiler> # For example "clang"
   $> CXX_COMPILER=<C++ compiler> # For example "clang++"
-  $> cmake ../llvm  \
+  $> cmake ../runtimes  \
      -G Ninja \
-     -DLLVM_ENABLE_PROJECTS=libc  \
+     -DLLVM_ENABLE_RUNTIMES=libc  \
      -DCMAKE_C_COMPILER=$C_COMPILER \
      -DCMAKE_CXX_COMPILER=$CXX_COMPILER \
      -DLLVM_LIBC_FULL_BUILD=ON \
@@ -72,8 +70,8 @@ Below is the CMake command to configure the standalone crossbuild of the libc.
 
 We will go over the special options passed to the ``cmake`` command above.
 
-* **Enabled Projects** - Since we want to build the libc project, we list
-  ``libc`` as the enabled project.
+* **Enabled Runtimes** - Since we want to build LLVM-libc, we list
+  ``libc`` as the enabled runtime.
 * **The full build option** - Since we want to build the full libc, we pass
   ``-DLLVM_LIBC_FULL_BUILD=ON``.
 * **The target triple** - This is the target triple of the target for which
@@ -94,88 +92,6 @@ The above ``ninja`` command will build the libc static archives ``libc.a`` and
 ``libm.a`` for the target specified with ``-DLIBC_TARGET_TRIPLE`` in the CMake
 configure step.
 
-.. _runtimes_cross_build:
-
-Runtimes cross build
-====================
-
-The *runtimes cross build* is very similar to the standalone crossbuild but the
-user will have to first build the libc build tools for the host separately. One
-should use this recipe if they want to use a different host and target compiler.
-Note that the libc build tools MUST be in sync with the libc. That is, the
-libc build tools and the libc, both should be built from the same source
-revision. At the time of this writing, there is only one libc build tool that
-has to be built separately. It is done as follows:
-
-.. code-block:: sh
-
-  $> cd llvm-project  # The llvm-project checkout
-  $> mkdir build-libc-tools # A different build directory for the build tools
-  $> cd build-libc-tools
-  $> HOST_C_COMPILER=<C compiler for the host> # For example "clang"
-  $> HOST_CXX_COMPILER=<C++ compiler for the host> # For example "clang++"
-  $> cmake ../llvm  \
-     -G Ninja \
-     -DLLVM_ENABLE_PROJECTS=libc  \
-     -DCMAKE_C_COMPILER=$HOST_C_COMPILER \
-     -DCMAKE_CXX_COMPILER=$HOST_CXX_COMPILER  \
-     -DLLVM_LIBC_FULL_BUILD=ON \
-     -DCMAKE_BUILD_TYPE=Debug # User can choose to use "Release" build type
-  $> ninja libc-hdrgen
-
-The above commands should build a binary named ``libc-hdrgen``. Copy this binary
-to a directory of your choice.
-
-CMake configure step
---------------------
-
-After copying the ``libc-hdrgen`` binary to say ``/path/to/libc-hdrgen``,
-configure the libc build using the following command:
-
-.. code-block:: sh
-
-  $> cd llvm-project  # The llvm-project checkout
-  $> mkdir build
-  $> cd build
-  $> TARGET_C_COMPILER=<C compiler for the target>
-  $> TARGET_CXX_COMPILER=<C++ compiler for the target>
-  $> HDRGEN=</path/to/libc-hdrgen>
-  $> TARGET_TRIPLE=<Your target triple>
-  $> cmake ../runtimes  \
-     -G Ninja \
-     -DLLVM_ENABLE_RUNTIMES=libc  \
-     -DCMAKE_C_COMPILER=$TARGET_C_COMPILER \
-     -DCMAKE_CXX_COMPILER=$TARGET_CXX_COMPILER \
-     -DLLVM_LIBC_FULL_BUILD=ON \
-     -DLIBC_HDRGEN_EXE=$HDRGEN \
-     -DLIBC_TARGET_TRIPLE=$TARGET_TRIPLE \
-     -DCMAKE_BUILD_TYPE=Debug # User can choose to use "Release" build type
-
-Note the differences in the above cmake command versus the one used in the
-CMake configure step of the standalone build recipe:
-
-* Instead of listing ``libc`` in ``LLVM_ENABLED_PROJECTS``, we list it in
-  ``LLVM_ENABLED_RUNTIMES``.
-* Instead of using ``llvm-project/llvm`` as the root CMake source directory,
-  we use ``llvm-project/runtimes`` as the root CMake source directory.
-* The path to the ``libc-hdrgen`` binary built earlier is specified with
-  ``-DLIBC_HDRGEN_EXE=/path/to/libc-hdrgen``.
-
-Build step
-----------
-
-The build step in the runtimes build recipe is exactly the same as that of
-the standalone build recipe:
-
-.. code-block:: sh
-
-    $> ninja libc libm
-
-As with the standalone build recipe, the above ninja command will build the
-libc static archives for the target specified with ``-DLIBC_TARGET_TRIPLE`` in
-the CMake configure step.
-
-
 Bootstrap cross build
 =====================
 
@@ -203,8 +119,7 @@ CMake configure step
      -DLLVM_RUNTIME_TARGETS=$TARGET_TRIPLE \
      -DCMAKE_BUILD_TYPE=Debug
 
-Note how the above cmake command differs from the one used in the other two
-recipes:
+Note how the above cmake command differs from the one used in the other recipe:
 
 * ``clang`` is listed in ``-DLLVM_ENABLE_PROJECTS`` and ``libc`` is
   listed in ``-DLLVM_ENABLE_RUNTIMES``.
@@ -214,7 +129,7 @@ recipes:
 Build step
 ----------
 
-The build step is similar to the other two recipes:
+The build step is similar to the other recipe:
 
 .. code-block:: sh
 
diff --git a/libc/docs/full_host_build.rst b/libc/docs/full_host_build.rst
index 4fb3072590f322..f687c2fdab213e 100644
--- a/libc/docs/full_host_build.rst
+++ b/libc/docs/full_host_build.rst
@@ -8,17 +8,90 @@ Full Host Build
    :depth: 1
    :local:
 
+.. note:: 
+   Fullbuild requires running headergen, which is a python program that depends on
+   pyyaml. The minimum versions are listed on the :ref:`header_generation`
+   page, as well as additional information.
+
 In this document, we will present a recipe to build the full libc for the host.
 When we say *build the libc for the host*, the goal is to build the libc for
-the same system on which the libc is being built. Also, we will take this
-opportunity to demonstrate how one can set up a *sysroot* (see the documentation
+the same system on which the libc is being built. First, we will explain how to
+build for developing LLVM-libc, then we will explain how to build LLVM-libc as
+part of a complete toolchain.
+
+Configure the build for development
+===================================
+
+
+Below is the list of commands for a simple recipe to build LLVM-libc for
+development. In this we've set the Ninja generator, set the build type to
+"Debug", and enabled the Scudo allocator. This build also enables generating the
+documentation and verbose cmake logging, which are useful development features.
+
+.. note::
+   If your build fails with an error saying the compiler can't find
+   ``<asm/unistd.h>`` or similar then you're probably missing the symlink from
+   ``/usr/include/asm`` to ``/usr/include/<HOST TRIPLE>/asm``. Installing the
+   ``gcc-multilib`` package creates this symlink, or you can do it manually with
+   this command:
+   ``sudo ln -s /usr/include/<HOST TRIPLE>/asm /usr/include/asm``
+   (your host triple will probably be similar to ``x86_64-linux-gnu``)
+
+.. code-block:: sh
+
+   $> cd llvm-project  # The llvm-project checkout
+   $> mkdir build
+   $> cd build
+   $> cmake ../runtimes \
+      -G Ninja \
+      -DCMAKE_C_COMPILER=clang \
+      -DCMAKE_CXX_COMPILER=clang++ \
+      -DLLVM_ENABLE_RUNTIMES="libc;compiler-rt" \
+      -DLLVM_LIBC_FULL_BUILD=ON \
+      -DCMAKE_BUILD_TYPE=Debug \
+      -DLLVM_LIBC_INCLUDE_SCUDO=ON \
+      -DCOMPILER_RT_BUILD_SCUDO_STANDALONE_WITH_LLVM_LIBC=ON \
+      -DCOMPILER_RT_BUILD_GWP_ASAN=OFF                       \
+      -DCOMPILER_RT_SCUDO_STANDALONE_BUILD_SHARED=OFF        \
+      -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+      -DLLVM_ENABLE_SPHINX=ON -DLIBC_INCLUDE_DOCS=ON \
+      -DLIBC_CMAKE_VERBOSE_LOGGING=ON
+
+Build and test
+==============
+
+After configuring the build with the above ``cmake`` command, one can build and
+test libc with the following command:
+
+.. code-block:: sh
+
+   $> ninja libc libm check-libc
+
+To build the docs, run this command:
+
+
+.. code-block:: sh
+
+   $> ninja docs-libc-html
+
+To run a specific test, use the following:
+
+.. code-block:: sh
+
+   $> ninja libc.test.src.<HEADER>.<FUNCTION>_test.__unit__
+   $> ninja libc.test.src.ctype.isalpha_test.__unit__ # EXAMPLE
+
+Configure the complete toolchain build
+======================================
+
+For a complete toolchain we recommend creating a *sysroot* (see the documentation
 of the ``--sysroot`` option here:
 `<https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html>`_) which includes
 not only the components of LLVM's libc, but also a full LLVM only toolchain
 consisting of the `clang <https://clang.llvm.org/>`_ compiler, the
 `lld <https://lld.llvm.org/>`_ linker and the
-`compiler-rt <https://compiler-rt.llvm.org/>`_ runtime libraries. LLVM's libc is
-not yet complete enough to allow using and linking a C++ application against
+`compiler-rt <https://compiler-rt.llvm.org/>`_ runtime libraries. LLVM-libc is
+not quite complete enough to allow using and linking a C++ application against
 a C++ standard library (like libc++). Hence, we do not include
 `libc++ <https://libcxx.llvm.org/>`_ in the sysroot.
 
@@ -26,9 +99,6 @@ a C++ standard library (like libc++). Hence, we do not include
    `libc++ <https://libcxx.llvm.org/>`_, libcxx-abi and libunwind in the
    LLVM only toolchain and use them to build and link C++ applications.
 
-Configure the full libc build
-===============================
-
 Below is the list of commands for a simple recipe to build and install the
 libc components along with other components of an LLVM only toolchain.  In this
 we've set the Ninja generator, enabled a full compiler suite, set the build
@@ -43,6 +113,7 @@ to use the freshly built lld and compiler-rt.
    this command:
    ``sudo ln -s /usr/include/<TARGET TRIPLE>/asm /usr/include/asm``
 
+.. TODO: Move from projects to runtimes for libc, compiler-rt
 .. code-block:: sh
 
    $> cd llvm-project  # The llvm-project checkout
@@ -51,7 +122,7 @@ to use the freshly built lld and compiler-rt.
    $> SYSROOT=/path/to/sysroot # Remember to set this!
    $> cmake ../llvm  \
       -G Ninja  \
-      -DLLVM_ENABLE_PROJECTS="clang;libc;lld;compiler-rt"   \
+      -DLLVM_ENABLE_PROJECTS="clang;lld;libc;compiler-rt"   \
       -DCMAKE_BUILD_TYPE=Debug  \
       -DCMAKE_C_COMPILER=clang \
       -DCMAKE_CXX_COMPILER=clang++ \
diff --git a/libc/docs/fullbuild_mode.rst b/libc/docs/fullbuild_mode.rst
index b1151017fbc794..d5c62172dac8e7 100644
--- a/libc/docs/fullbuild_mode.rst
+++ b/libc/docs/fullbuild_mode.rst
@@ -4,6 +4,11 @@
 Fullbuild Mode
 ==============
 
+.. note:: 
+   Fullbuild requires running headergen, which is a python program that depends on
+   pyyaml. The minimum versions are listed on the :ref:`header_generation`
+   page, as well as additional information.
+
 The *fullbuild* mode of LLVM's libc is the mode in which it is to be used as
 the only libc (as opposed to the :ref:`overlay_mode` in which it is used along
 with the system libc.) In order to use it as the only libc, one will have to
diff --git a/libc/docs/gpu/building.rst b/libc/docs/gpu/building.rst
index 60498e348395a3..37dccdab6dc340 100644
--- a/libc/docs/gpu/building.rst
+++ b/libc/docs/gpu/building.rst
@@ -63,9 +63,13 @@ targeting the default host environment as well.
 Runtimes cross build
 --------------------
 
+.. note::
+  These instructions need to be updated for new headergen. They may be
+  inaccurate.
+
 For users wanting more direct control over the build process, the build steps
 can be done manually instead. This build closely follows the instructions in the
-:ref:`main documentation<runtimes_cross_build>` but is specialized for the GPU
+:ref:`main documentation<full_cross_build>` but is specialized for the GPU
 build. We follow the same steps to first build the libc tools and a suitable
 compiler. These tools must all be up-to-date with the libc source.
 
diff --git a/libc/docs/index.rst b/libc/docs/index.rst
index 5b96987e0aada0..d089a800ab90ab 100644
--- a/libc/docs/index.rst
+++ b/libc/docs/index.rst
@@ -2,14 +2,16 @@
 The LLVM C Library
 ==================
 
-.. warning::
-  The libc is not complete.  If you need a fully functioning C library right
-  now, you should continue to use your standard system libraries.
+.. note::
+  LLVM-libc is not fully complete right now. Some programs may fail to build due
+  to missing functions (especially C++ ones). If you would like to help us
+  finish LLVM-libc, check out "Contributing to the libc project" in the sidebar
+  or ask on Discord.
 
 Introduction
 ============
 
-The libc aspires to a unique place in the software ecosystem.  The goals are:
+LLVM-libc aspires to a unique place in the software ecosystem.  The goals are:
 
 - Fully compliant with current C standards (C17 and upcoming C2x) and POSIX.
 - Easily decomposed and embedded: Supplement or replace system C library
@@ -32,8 +34,9 @@ The libc aspires to a unique place in the software ecosystem.  The goals are:
 Platform Support
 ================
 
-Most development is currently targeting x86_64 and aarch64 on Linux.  Several
-functions in the libc have been tested on Windows.  The Fuchsia platform is
+Most development is currently targeting Linux on x86_64, aarch64, arm, and
+RISC-V. Embedded/baremetal targets are supported on arm and RISC-V, and Windows
+and macOS have limited support (may be broken).  The Fuchsia platform is
 slowly replacing functions from its bundled libc with functions from this
 project.
 
@@ -41,7 +44,7 @@ ABI Compatibility
 =================
 
 The libc is written to be ABI independent.  Interfaces are generated using
-LLVM's tablegen, so supporting arbitrary ABIs is possible.  In it's initial
+headergen, so supporting arbitrary ABIs is possible.  In its initial
 stages there is no ABI stability in any form.
 
 .. toctree::
diff --git a/libc/docs/overlay_mode.rst b/libc/docs/overlay_mode.rst
index 37368ffc1fea15..ca04c4c7674a3e 100644
--- a/libc/docs/overlay_mode.rst
+++ b/libc/docs/overlay_mode.rst
@@ -28,18 +28,18 @@ Also, if users choose to mix more than one libc with the system libc, then
 the name ``libllvmlibc.a`` makes it absolutely clear that it is the static
 archive of LLVM's libc.
 
-Building the static archive with libc as a normal LLVM project
---------------------------------------------------------------
+Building LLVM-libc as a standalone runtime
+------------------------------------------
 
-We can treat the ``libc`` project as any other normal LLVM project and perform
-the CMake configure step as follows:
+We can treat the ``libc`` project like any other normal LLVM runtime library by
+building it with the following cmake command:
 
 .. code-block:: sh
 
   $> cd llvm-project  # The llvm-project checkout
   $> mkdir build
   $> cd build
-  $> cmake ../llvm -G Ninja -DLLVM_ENABLE_RUNTIMES="libc"  \
+  $> cmake ../runtimes -G Ninja -DLLVM_ENABLE_RUNTIMES="libc"  \
      -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_BUILD_TYPE=<Debug|Release>                    \  # Select build type
      -DCMAKE_INSTALL_PREFIX=<Your prefix of choice>           # Optional
@@ -50,24 +50,29 @@ Next, build the libc:
 
   $> ninja libc
 
+Then, run the tests:
+
+.. code-block:: sh
+
+  $> ninja check-libc
+
 The build step will build the static archive the in the directory
 ``build/projects/libc/lib``. Notice that the above CMake configure step also
-specified an install prefix. This is optional, but if one uses it, then they
-can follow up the build step with an install step:
+specified an install prefix. This is optional, but if it is used, the following
+command will install the static archive to the install path:
 
 .. code-block:: sh
 
-  $> ninja install-llvmlibc
+  $> ninja install-libc
 
 Building the static archive as part of the bootstrap build
 ----------------------------------------------------------
 
 The bootstrap build is a build mode in which runtime components like libc++,
 libcxx-abi, libc etc. are built using the ToT clang. The idea is that this build
-produces an in-sync toolchain of compiler + runtime libraries. Such a synchrony
-is not essential for the libc but can one still build the overlay static archive
-as part of the bootstrap build if one wants to. The first step is to configure
-appropriately:
+produces an in-sync toolchain of compiler + runtime libraries. This ensures that
+LLVM-libc has access to the latest clang features, which should provide the best
+performance possible.
 
 .. code-block:: sh
 
@@ -77,14 +82,13 @@ appropriately:
      -DCMAKE_BUILD_TYPE=<Debug|Release>                    \  # Select build type
      -DCMAKE_INSTALL_PREFIX=<Your prefix of choice>           # Optional
 
-The build and install steps are similar to the those used when configured
-as a normal project. Note that the build step takes much longer this time
-as ``clang`` will be built before building ``libllvmlibc.a``.
+The build and install steps are the same as above, but the build step will take
+much longer since ``clang`` will be built before building ``libllvmlibc.a``.
 
 .. code-block:: sh
 
   $> ninja libc
-  $> ninja install-llvmlibc
+  $> ninja check-libc
 
 Using the overlay static archive
 ================================
diff --git a/libc/docs/porting.rst b/libc/docs/porting.rst
index ef7a2ff5cc8758..a4df4e8cf0719d 100644
--- a/libc/docs/porting.rst
+++ b/libc/docs/porting.rst
@@ -43,21 +43,6 @@ have their own config directory.
    config directory for Fuchsia as the bring up is being done in the Fuchsia
    source tree.
 
-The api.td file
----------------
-
-If the :ref:`fullbuild_mode` is to be supported on the new operating system,
-then a file named ``api.td`` should be added in its config directory. It is
-written in the
-`LLVM tablegen language <https://llvm.org/docs/TableGen/ProgRef.html>`_.
-It lists all the relevant macros and type definitions we want in the
-public libc header files. See the existing Linux
-`api.td <https://github.com/llvm/llvm-project/blob/main/libc/config/linux/api.td>`_
-file as an example to prepare the ``api.td`` file for the new operating system.
-
-.. note:: In future, LLVM tablegen will be replaced with a different DSL to list
-   config information.
-
 Architecture Subdirectory
 =========================
 

From 22d3fb182c9199ac3d51e5577c6647508a7a37f0 Mon Sep 17 00:00:00 2001
From: Mircea Trofin <mtrofin at google.com>
Date: Wed, 21 Aug 2024 10:52:10 -0700
Subject: [PATCH 034/116] [ctx_prof] Profile flatterner (#104539)

Eventually we'll need to flatten the profile (at the end of all IPO) and lower to "vanilla" `MD_prof`. This is the first part of that.

Issue #89287
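For readers new to the idea, here is a minimal sketch of what "flattening" means in this patch. It assumes simplified, hypothetical types (`Context`, `FlatProfile`, `flatten`) rather than LLVM's `PGOCtxProfContext` and `CtxProfFlatProfile`. The sketch walks every context tree in pre-order and sums counters index by index across all contexts that share a GUID. This mirrors what the new `PGOContextualProfile::flatten()` does, but it is not that implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a contextual profile node (not LLVM's
// PGOCtxProfContext): a function GUID, its counters, and, per callsite index,
// the contexts of the callees observed at that callsite.
struct Context {
  uint64_t Guid = 0;
  std::vector<uint64_t> Counters;
  std::vector<std::vector<Context>> Callsites;
};

// Flat profile: function GUID -> counters summed over all of its contexts.
using FlatProfile = std::unordered_map<uint64_t, std::vector<uint64_t>>;

// Pre-order walk: fold this context into Flat, then recurse into callees.
static void accumulate(const Context &Ctx, FlatProfile &Flat) {
  auto [It, Inserted] = Flat.try_emplace(Ctx.Guid, Ctx.Counters);
  if (!Inserted) {
    // All contexts of one function are expected to have the same number of
    // counters, so summing at the same index is well defined.
    assert(It->second.size() == Ctx.Counters.size());
    for (std::size_t I = 0; I < Ctx.Counters.size(); ++I)
      It->second[I] += Ctx.Counters[I];
  }
  for (const std::vector<Context> &Targets : Ctx.Callsites)
    for (const Context &Callee : Targets)
      accumulate(Callee, Flat);
}

// Flattens every root context tree into a single per-function profile.
FlatProfile flatten(const std::vector<Context> &Roots) {
  FlatProfile Flat;
  for (const Context &Root : Roots)
    accumulate(Root, Flat);
  return Flat;
}
```

Fed the contexts from the updated `full-cycle.ll` test, f2's two contexts (`{10, 7}` under f1 and `{1, 2}` under f3) combine into `{11, 9}`, which matches that test's expected flat profile. As in the patch, the result is a hash map, so no particular iteration order should be assumed.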
---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  | 10 +++
 llvm/lib/Analysis/CtxProfAnalysis.cpp         | 40 ++++++++++++
 .../Analysis/CtxProfAnalysis/full-cycle.ll    | 65 ++++++++++++++++++-
 llvm/test/Analysis/CtxProfAnalysis/load.ll    |  5 ++
 4 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 43587d953fc4ca..23abcbe2c6e9d2 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -9,6 +9,8 @@
 #ifndef LLVM_ANALYSIS_CTXPROFANALYSIS_H
 #define LLVM_ANALYSIS_CTXPROFANALYSIS_H
 
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/PassManager.h"
@@ -18,6 +20,12 @@ namespace llvm {
 
 class CtxProfAnalysis;
 
+// Setting initial capacity to 1 because all contexts must have at least 1
+// counter, and then, because all contexts belonging to a function have the same
+// size, there'll be at most one other heap allocation.
+using CtxProfFlatProfile =
+    DenseMap<GlobalValue::GUID, SmallVector<uint64_t, 1>>;
+
 /// The instrumented contextual profile, produced by the CtxProfAnalysis.
 class PGOContextualProfile {
   friend class CtxProfAnalysis;
@@ -65,6 +73,8 @@ class PGOContextualProfile {
     return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  const CtxProfFlatProfile flatten() const;
+
   bool invalidate(Module &, const PreservedAnalyses &PA,
                   ModuleAnalysisManager::Invalidator &) {
     // Check whether the analysis has been explicitly invalidated. Otherwise,
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index 51663196b13070..ceebb2cf06d235 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -184,6 +184,14 @@ PreservedAnalyses CtxProfAnalysisPrinterPass::run(Module &M,
   OS << "\nCurrent Profile:\n";
   OS << formatv("{0:2}", JSONed);
   OS << "\n";
+  OS << "\nFlat Profile:\n";
+  auto Flat = C.flatten();
+  for (const auto &[Guid, Counters] : Flat) {
+    OS << Guid << " : ";
+    for (auto V : Counters)
+      OS << V << " ";
+    OS << "\n";
+  }
   return PreservedAnalyses::all();
 }
 
@@ -193,3 +201,35 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) {
       return IPC;
   return nullptr;
 }
+
+static void
+preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles,
+              function_ref<void(const PGOCtxProfContext &)> Visitor) {
+  std::function<void(const PGOCtxProfContext &)> Traverser =
+      [&](const auto &Ctx) {
+        Visitor(Ctx);
+        for (const auto &[_, SubCtxSet] : Ctx.callsites())
+          for (const auto &[__, Subctx] : SubCtxSet)
+            Traverser(Subctx);
+      };
+  for (const auto &[_, P] : Profiles)
+    Traverser(P);
+}
+
+const CtxProfFlatProfile PGOContextualProfile::flatten() const {
+  assert(Profiles.has_value());
+  CtxProfFlatProfile Flat;
+  preorderVisit(*Profiles, [&](const PGOCtxProfContext &Ctx) {
+    auto [It, Ins] = Flat.insert({Ctx.guid(), {}});
+    if (Ins) {
+      llvm::append_range(It->second, Ctx.counters());
+      return;
+    }
+    assert(It->second.size() == Ctx.counters().size() &&
+           "All contexts corresponding to a function should have the exact "
+           "same number of counters.");
+    for (size_t I = 0, E = It->second.size(); I < E; ++I)
+      It->second[I] += Ctx.counters()[I];
+  });
+  return Flat;
+}
diff --git a/llvm/test/Analysis/CtxProfAnalysis/full-cycle.ll b/llvm/test/Analysis/CtxProfAnalysis/full-cycle.ll
index 0cdf82bd96efcb..06ba8b3542f7d5 100644
--- a/llvm/test/Analysis/CtxProfAnalysis/full-cycle.ll
+++ b/llvm/test/Analysis/CtxProfAnalysis/full-cycle.ll
@@ -4,6 +4,9 @@
 ; RUN: split-file %s %t
 ;
 ; Test that the GUID metadata survives through thinlink.
+; Also test that the flattener works correctly. f2 is called in 2 places, with
+; different counter values, and we expect the resulting flat profile to be the
+; index-wise sum of those counters.
 ;
 ; RUN: llvm-ctxprof-util fromJSON --input=%t/profile.json --output=%t/profile.ctxprofdata
 ;
@@ -17,7 +20,9 @@
 ; RUN: llvm-lto2 run %t/m1.bc %t/m2.bc -o %t/ -thinlto-distributed-indexes \
 ; RUN:  -use-ctx-profile=%t/profile.ctxprofdata \
 ; RUN:  -r %t/m1.bc,f1,plx \
+; RUN:  -r %t/m1.bc,f3,plx \
 ; RUN:  -r %t/m2.bc,f1 \
+; RUN:  -r %t/m2.bc,f3 \
 ; RUN:  -r %t/m2.bc,entrypoint,plx
 ; RUN: opt --passes='function-import,require<ctx-prof-analysis>,print<ctx-prof-analysis>' \
 ; RUN:  -summary-file=%t/m2.bc.thinlto.bc -use-ctx-profile=%t/profile.ctxprofdata %t/m2.bc \
@@ -38,6 +43,11 @@ define void @f1() #0 {
   ret void
 }
 
+define void @f3() #0 {
+  call void @f2()
+  ret void
+}
+
 attributes #0 = { noinline }
 !0 = !{ i64 3087265239403591524 }
 
@@ -48,9 +58,11 @@ target triple = "x86_64-pc-linux-gnu"
 source_filename = "random_path/m2.cc"
 
 declare void @f1()
+declare void @f3()
 
 define void @entrypoint() {
   call void @f1()
+  call void @f3()
   ret void
 }
 ;--- profile.json
@@ -63,7 +75,8 @@ define void @entrypoint() {
             [
               {
                 "Counters": [
-                  10
+                  10,
+                  7
                 ],
                 "Guid": 3087265239403591524
               }
@@ -74,6 +87,25 @@ define void @entrypoint() {
           ],
           "Guid": 2072045998141807037
         }
+      ],
+      [
+        {
+          "Callsites": [
+            [
+              {
+                "Counters": [
+                  1,
+                  2
+                ],
+                "Guid": 3087265239403591524
+              }
+            ]
+          ],
+          "Counters": [
+            2
+          ],
+          "Guid": 4197650231481825559
+        }
       ]
     ],
     "Counters": [
@@ -84,8 +116,9 @@ define void @entrypoint() {
 ]
 ;--- expected.txt
 Function Info:
-10507721908651011566 : entrypoint. MaxCounterID: 1. MaxCallsiteID: 1
+10507721908651011566 : entrypoint. MaxCounterID: 1. MaxCallsiteID: 2
 3087265239403591524 : f2.llvm.0. MaxCounterID: 1. MaxCallsiteID: 0
+4197650231481825559 : f3. MaxCounterID: 1. MaxCallsiteID: 1
 2072045998141807037 : f1. MaxCounterID: 1. MaxCallsiteID: 1
 
 Current Profile:
@@ -98,7 +131,8 @@ Current Profile:
             [
               {
                 "Counters": [
-                  10
+                  10,
+                  7
                 ],
                 "Guid": 3087265239403591524
               }
@@ -109,6 +143,25 @@ Current Profile:
           ],
           "Guid": 2072045998141807037
         }
+      ],
+      [
+        {
+          "Callsites": [
+            [
+              {
+                "Counters": [
+                  1,
+                  2
+                ],
+                "Guid": 3087265239403591524
+              }
+            ]
+          ],
+          "Counters": [
+            2
+          ],
+          "Guid": 4197650231481825559
+        }
       ]
     ],
     "Counters": [
@@ -117,3 +170,9 @@ Current Profile:
     "Guid": 10507721908651011566
   }
 ]
+
+Flat Profile:
+10507721908651011566 : 1 
+3087265239403591524 : 11 9 
+4197650231481825559 : 2 
+2072045998141807037 : 7 
diff --git a/llvm/test/Analysis/CtxProfAnalysis/load.ll b/llvm/test/Analysis/CtxProfAnalysis/load.ll
index 69806e334aaec9..fa09474f433151 100644
--- a/llvm/test/Analysis/CtxProfAnalysis/load.ll
+++ b/llvm/test/Analysis/CtxProfAnalysis/load.ll
@@ -86,6 +86,11 @@ Current Profile:
     "Guid": 12074870348631550642
   }
 ]
+
+Flat Profile:
+728453322856651412 : 6 7 
+12074870348631550642 : 5 
+11872291593386833696 : 1 
 ;--- example.ll
 declare void @bar()
 

From a6bae5cb37919bb0b855dd468d4982340a5740d2 Mon Sep 17 00:00:00 2001
From: Jay Foad <jay.foad at amd.com>
Date: Wed, 21 Aug 2024 19:11:02 +0100
Subject: [PATCH 035/116] [AMDGPU] Split GCNSubtarget into its own file. NFC.
 (#105525)

---
 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp    | 761 -----------------
 llvm/lib/Target/AMDGPU/CMakeLists.txt         |   1 +
 llvm/lib/Target/AMDGPU/GCNSubtarget.cpp       | 797 ++++++++++++++++++
 .../AMDGPU/sramecc-subtarget-feature-any.ll   |   6 +-
 .../sramecc-subtarget-feature-disabled.ll     |   6 +-
 .../sramecc-subtarget-feature-enabled.ll      |   6 +-
 .../AMDGPU/xnack-subtarget-feature-any.ll     |  14 +-
 .../xnack-subtarget-feature-disabled.ll       |  14 +-
 .../AMDGPU/xnack-subtarget-feature-enabled.ll |  14 +-
 9 files changed, 828 insertions(+), 791 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/GCNSubtarget.cpp

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index 2e1bdf46924783..67d8715d3f1c26 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -17,7 +17,6 @@
 #include "AMDGPULegalizerInfo.h"
 #include "AMDGPURegisterBankInfo.h"
 #include "AMDGPUTargetMachine.h"
-#include "GCNSubtarget.h"
 #include "R600Subtarget.h"
 #include "SIMachineFunctionInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
@@ -36,308 +35,12 @@ using namespace llvm;
 
 #define DEBUG_TYPE "amdgpu-subtarget"
 
-#define GET_SUBTARGETINFO_TARGET_DESC
-#define GET_SUBTARGETINFO_CTOR
-#define AMDGPUSubtarget GCNSubtarget
-#include "AMDGPUGenSubtargetInfo.inc"
-#undef AMDGPUSubtarget
-
-static cl::opt<bool> EnablePowerSched(
-  "amdgpu-enable-power-sched",
-  cl::desc("Enable scheduling to minimize mAI power bursts"),
-  cl::init(false));
-
-static cl::opt<bool> EnableVGPRIndexMode(
-  "amdgpu-vgpr-index-mode",
-  cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),
-  cl::init(false));
-
-static cl::opt<bool> UseAA("amdgpu-use-aa-in-codegen",
-                           cl::desc("Enable the use of AA during codegen."),
-                           cl::init(true));
-
-static cl::opt<unsigned> NSAThreshold("amdgpu-nsa-threshold",
-                                      cl::desc("Number of addresses from which to enable MIMG NSA."),
-                                      cl::init(3), cl::Hidden);
-
-GCNSubtarget::~GCNSubtarget() = default;
-
-GCNSubtarget &
-GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
-                                              StringRef GPU, StringRef FS) {
-  // Determine default and user-specified characteristics
-  //
-  // We want to be able to turn these off, but making this a subtarget feature
-  // for SI has the unhelpful behavior that it unsets everything else if you
-  // disable it.
-  //
-  // Similarly we want enable-prt-strict-null to be on by default and not to
-  // unset everything else if it is disabled
-
-  SmallString<256> FullFS("+promote-alloca,+load-store-opt,+enable-ds128,");
-
-  // Turn on features that HSA ABI requires. Also turn on FlatForGlobal by default
-  if (isAmdHsaOS())
-    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,";
-
-  FullFS += "+enable-prt-strict-null,"; // This is overridden by a disable in FS
-
-  // Disable mutually exclusive bits.
-  if (FS.contains_insensitive("+wavefrontsize")) {
-    if (!FS.contains_insensitive("wavefrontsize16"))
-      FullFS += "-wavefrontsize16,";
-    if (!FS.contains_insensitive("wavefrontsize32"))
-      FullFS += "-wavefrontsize32,";
-    if (!FS.contains_insensitive("wavefrontsize64"))
-      FullFS += "-wavefrontsize64,";
-  }
-
-  FullFS += FS;
-
-  ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
-
-  // Implement the "generic" processors, which acts as the default when no
-  // generation features are enabled (e.g for -mcpu=''). HSA OS defaults to
-  // the first amdgcn target that supports flat addressing. Other OSes defaults
-  // to the first amdgcn target.
-  if (Gen == AMDGPUSubtarget::INVALID) {
-     Gen = TT.getOS() == Triple::AMDHSA ? AMDGPUSubtarget::SEA_ISLANDS
-                                        : AMDGPUSubtarget::SOUTHERN_ISLANDS;
-  }
-
-  if (!hasFeature(AMDGPU::FeatureWavefrontSize32) &&
-      !hasFeature(AMDGPU::FeatureWavefrontSize64)) {
-    // If there is no default wave size it must be a generation before gfx10,
-    // these have FeatureWavefrontSize64 in their definition already. For gfx10+
-    // set wave32 as a default.
-    ToggleFeature(AMDGPU::FeatureWavefrontSize32);
-  }
-
-  // We don't support FP64 for EG/NI atm.
-  assert(!hasFP64() || (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS));
-
-  // Targets must either support 64-bit offsets for MUBUF instructions, and/or
-  // support flat operations, otherwise they cannot access a 64-bit global
-  // address space
-  assert(hasAddr64() || hasFlat());
-  // Unless +-flat-for-global is specified, turn on FlatForGlobal for targets
-  // that do not support ADDR64 variants of MUBUF instructions. Such targets
-  // cannot use a 64 bit offset with a MUBUF instruction to access the global
-  // address space
-  if (!hasAddr64() && !FS.contains("flat-for-global") && !FlatForGlobal) {
-    ToggleFeature(AMDGPU::FeatureFlatForGlobal);
-    FlatForGlobal = true;
-  }
-  // Unless +-flat-for-global is specified, use MUBUF instructions for global
-  // address space access if flat operations are not available.
-  if (!hasFlat() && !FS.contains("flat-for-global") && FlatForGlobal) {
-    ToggleFeature(AMDGPU::FeatureFlatForGlobal);
-    FlatForGlobal = false;
-  }
-
-  // Set defaults if needed.
-  if (MaxPrivateElementSize == 0)
-    MaxPrivateElementSize = 4;
-
-  if (LDSBankCount == 0)
-    LDSBankCount = 32;
-
-  if (TT.getArch() == Triple::amdgcn) {
-    if (LocalMemorySize == 0)
-      LocalMemorySize = 32768;
-
-    // Do something sensible for unspecified target.
-    if (!HasMovrel && !HasVGPRIndexMode)
-      HasMovrel = true;
-  }
-
-  AddressableLocalMemorySize = LocalMemorySize;
-
-  if (AMDGPU::isGFX10Plus(*this) &&
-      !getFeatureBits().test(AMDGPU::FeatureCuMode))
-    LocalMemorySize *= 2;
-
-  // Don't crash on invalid devices.
-  if (WavefrontSizeLog2 == 0)
-    WavefrontSizeLog2 = 5;
-
-  HasFminFmaxLegacy = getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;
-  HasSMulHi = getGeneration() >= AMDGPUSubtarget::GFX9;
-
-  TargetID.setTargetIDFromFeaturesString(FS);
-
-  LLVM_DEBUG(dbgs() << "xnack setting for subtarget: "
-                    << TargetID.getXnackSetting() << '\n');
-  LLVM_DEBUG(dbgs() << "sramecc setting for subtarget: "
-                    << TargetID.getSramEccSetting() << '\n');
-
-  return *this;
-}
-
-void GCNSubtarget::checkSubtargetFeatures(const Function &F) const {
-  LLVMContext &Ctx = F.getContext();
-  if (hasFeature(AMDGPU::FeatureWavefrontSize32) ==
-      hasFeature(AMDGPU::FeatureWavefrontSize64)) {
-    Ctx.diagnose(DiagnosticInfoUnsupported(
-        F, "must specify exactly one of wavefrontsize32 and wavefrontsize64"));
-  }
-}
-
 AMDGPUSubtarget::AMDGPUSubtarget(Triple TT) : TargetTriple(std::move(TT)) {}
 
 bool AMDGPUSubtarget::useRealTrue16Insts() const {
   return hasTrue16BitInsts() && EnableRealTrue16Insts;
 }
 
-GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
-                           const GCNTargetMachine &TM)
-    : // clang-format off
-    AMDGPUGenSubtargetInfo(TT, GPU, /*TuneCPU*/ GPU, FS),
-    AMDGPUSubtarget(TT),
-    TargetTriple(TT),
-    TargetID(*this),
-    InstrItins(getInstrItineraryForCPU(GPU)),
-    InstrInfo(initializeSubtargetDependencies(TT, GPU, FS)),
-    TLInfo(TM, *this),
-    FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0) {
-  // clang-format on
-  MaxWavesPerEU = AMDGPU::IsaInfo::getMaxWavesPerEU(this);
-  EUsPerCU = AMDGPU::IsaInfo::getEUsPerCU(this);
-  CallLoweringInfo = std::make_unique<AMDGPUCallLowering>(*getTargetLowering());
-  InlineAsmLoweringInfo =
-      std::make_unique<InlineAsmLowering>(getTargetLowering());
-  Legalizer = std::make_unique<AMDGPULegalizerInfo>(*this, TM);
-  RegBankInfo = std::make_unique<AMDGPURegisterBankInfo>(*this);
-  InstSelector =
-      std::make_unique<AMDGPUInstructionSelector>(*this, *RegBankInfo, TM);
-}
-
-unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
-  if (getGeneration() < GFX10)
-    return 1;
-
-  switch (Opcode) {
-  case AMDGPU::V_LSHLREV_B64_e64:
-  case AMDGPU::V_LSHLREV_B64_gfx10:
-  case AMDGPU::V_LSHLREV_B64_e64_gfx11:
-  case AMDGPU::V_LSHLREV_B64_e32_gfx12:
-  case AMDGPU::V_LSHLREV_B64_e64_gfx12:
-  case AMDGPU::V_LSHL_B64_e64:
-  case AMDGPU::V_LSHRREV_B64_e64:
-  case AMDGPU::V_LSHRREV_B64_gfx10:
-  case AMDGPU::V_LSHRREV_B64_e64_gfx11:
-  case AMDGPU::V_LSHRREV_B64_e64_gfx12:
-  case AMDGPU::V_LSHR_B64_e64:
-  case AMDGPU::V_ASHRREV_I64_e64:
-  case AMDGPU::V_ASHRREV_I64_gfx10:
-  case AMDGPU::V_ASHRREV_I64_e64_gfx11:
-  case AMDGPU::V_ASHRREV_I64_e64_gfx12:
-  case AMDGPU::V_ASHR_I64_e64:
-    return 1;
-  }
-
-  return 2;
-}
-
-/// This list was mostly derived from experimentation.
-bool GCNSubtarget::zeroesHigh16BitsOfDest(unsigned Opcode) const {
-  switch (Opcode) {
-  case AMDGPU::V_CVT_F16_F32_e32:
-  case AMDGPU::V_CVT_F16_F32_e64:
-  case AMDGPU::V_CVT_F16_U16_e32:
-  case AMDGPU::V_CVT_F16_U16_e64:
-  case AMDGPU::V_CVT_F16_I16_e32:
-  case AMDGPU::V_CVT_F16_I16_e64:
-  case AMDGPU::V_RCP_F16_e64:
-  case AMDGPU::V_RCP_F16_e32:
-  case AMDGPU::V_RSQ_F16_e64:
-  case AMDGPU::V_RSQ_F16_e32:
-  case AMDGPU::V_SQRT_F16_e64:
-  case AMDGPU::V_SQRT_F16_e32:
-  case AMDGPU::V_LOG_F16_e64:
-  case AMDGPU::V_LOG_F16_e32:
-  case AMDGPU::V_EXP_F16_e64:
-  case AMDGPU::V_EXP_F16_e32:
-  case AMDGPU::V_SIN_F16_e64:
-  case AMDGPU::V_SIN_F16_e32:
-  case AMDGPU::V_COS_F16_e64:
-  case AMDGPU::V_COS_F16_e32:
-  case AMDGPU::V_FLOOR_F16_e64:
-  case AMDGPU::V_FLOOR_F16_e32:
-  case AMDGPU::V_CEIL_F16_e64:
-  case AMDGPU::V_CEIL_F16_e32:
-  case AMDGPU::V_TRUNC_F16_e64:
-  case AMDGPU::V_TRUNC_F16_e32:
-  case AMDGPU::V_RNDNE_F16_e64:
-  case AMDGPU::V_RNDNE_F16_e32:
-  case AMDGPU::V_FRACT_F16_e64:
-  case AMDGPU::V_FRACT_F16_e32:
-  case AMDGPU::V_FREXP_MANT_F16_e64:
-  case AMDGPU::V_FREXP_MANT_F16_e32:
-  case AMDGPU::V_FREXP_EXP_I16_F16_e64:
-  case AMDGPU::V_FREXP_EXP_I16_F16_e32:
-  case AMDGPU::V_LDEXP_F16_e64:
-  case AMDGPU::V_LDEXP_F16_e32:
-  case AMDGPU::V_LSHLREV_B16_e64:
-  case AMDGPU::V_LSHLREV_B16_e32:
-  case AMDGPU::V_LSHRREV_B16_e64:
-  case AMDGPU::V_LSHRREV_B16_e32:
-  case AMDGPU::V_ASHRREV_I16_e64:
-  case AMDGPU::V_ASHRREV_I16_e32:
-  case AMDGPU::V_ADD_U16_e64:
-  case AMDGPU::V_ADD_U16_e32:
-  case AMDGPU::V_SUB_U16_e64:
-  case AMDGPU::V_SUB_U16_e32:
-  case AMDGPU::V_SUBREV_U16_e64:
-  case AMDGPU::V_SUBREV_U16_e32:
-  case AMDGPU::V_MUL_LO_U16_e64:
-  case AMDGPU::V_MUL_LO_U16_e32:
-  case AMDGPU::V_ADD_F16_e64:
-  case AMDGPU::V_ADD_F16_e32:
-  case AMDGPU::V_SUB_F16_e64:
-  case AMDGPU::V_SUB_F16_e32:
-  case AMDGPU::V_SUBREV_F16_e64:
-  case AMDGPU::V_SUBREV_F16_e32:
-  case AMDGPU::V_MUL_F16_e64:
-  case AMDGPU::V_MUL_F16_e32:
-  case AMDGPU::V_MAX_F16_e64:
-  case AMDGPU::V_MAX_F16_e32:
-  case AMDGPU::V_MIN_F16_e64:
-  case AMDGPU::V_MIN_F16_e32:
-  case AMDGPU::V_MAX_U16_e64:
-  case AMDGPU::V_MAX_U16_e32:
-  case AMDGPU::V_MIN_U16_e64:
-  case AMDGPU::V_MIN_U16_e32:
-  case AMDGPU::V_MAX_I16_e64:
-  case AMDGPU::V_MAX_I16_e32:
-  case AMDGPU::V_MIN_I16_e64:
-  case AMDGPU::V_MIN_I16_e32:
-  case AMDGPU::V_MAD_F16_e64:
-  case AMDGPU::V_MAD_U16_e64:
-  case AMDGPU::V_MAD_I16_e64:
-  case AMDGPU::V_FMA_F16_e64:
-  case AMDGPU::V_DIV_FIXUP_F16_e64:
-    // On gfx10, all 16-bit instructions preserve the high bits.
-    return getGeneration() <= AMDGPUSubtarget::GFX9;
-  case AMDGPU::V_MADAK_F16:
-  case AMDGPU::V_MADMK_F16:
-  case AMDGPU::V_MAC_F16_e64:
-  case AMDGPU::V_MAC_F16_e32:
-  case AMDGPU::V_FMAMK_F16:
-  case AMDGPU::V_FMAAK_F16:
-  case AMDGPU::V_FMAC_F16_e64:
-  case AMDGPU::V_FMAC_F16_e32:
-    // In gfx9, the preferred handling of the unused high 16-bits changed. Most
-    // instructions maintain the legacy behavior of 0ing. Some instructions
-    // changed to preserving the high bits.
-    return getGeneration() == AMDGPUSubtarget::VOLCANIC_ISLANDS;
-  case AMDGPU::V_MAD_MIXLO_F16:
-  case AMDGPU::V_MAD_MIXHI_F16:
-  default:
-    return false;
-  }
-}
-
 // Returns the maximum per-workgroup LDS allocation size (in bytes) that still
 // allows the given function to achieve an occupancy of NWaves waves per
 // SIMD / EU, taking into account only the function's *maximum* workgroup size.
@@ -650,391 +353,6 @@ AMDGPUDwarfFlavour AMDGPUSubtarget::getAMDGPUDwarfFlavour() const {
                                   : AMDGPUDwarfFlavour::Wave64;
 }
 
-void GCNSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
-                                      unsigned NumRegionInstrs) const {
-  // Track register pressure so the scheduler can try to decrease
-  // pressure once register usage is above the threshold defined by
-  // SIRegisterInfo::getRegPressureSetLimit()
-  Policy.ShouldTrackPressure = true;
-
-  // Enabling both top down and bottom up scheduling seems to give us less
-  // register spills than just using one of these approaches on its own.
-  Policy.OnlyTopDown = false;
-  Policy.OnlyBottomUp = false;
-
-  // Enabling ShouldTrackLaneMasks crashes the SI Machine Scheduler.
-  if (!enableSIScheduler())
-    Policy.ShouldTrackLaneMasks = true;
-}
-
-void GCNSubtarget::mirFileLoaded(MachineFunction &MF) const {
-  if (isWave32()) {
-    // Fix implicit $vcc operands after MIParser has verified that they match
-    // the instruction definitions.
-    for (auto &MBB : MF) {
-      for (auto &MI : MBB)
-        InstrInfo.fixImplicitOperands(MI);
-    }
-  }
-}
-
-bool GCNSubtarget::hasMadF16() const {
-  return InstrInfo.pseudoToMCOpcode(AMDGPU::V_MAD_F16_e64) != -1;
-}
-
-bool GCNSubtarget::useVGPRIndexMode() const {
-  return !hasMovrel() || (EnableVGPRIndexMode && hasVGPRIndexMode());
-}
-
-bool GCNSubtarget::useAA() const { return UseAA; }
-
-unsigned GCNSubtarget::getOccupancyWithNumSGPRs(unsigned SGPRs) const {
-  return AMDGPU::IsaInfo::getOccupancyWithNumSGPRs(SGPRs, getMaxWavesPerEU(),
-                                                   getGeneration());
-}
-
-unsigned GCNSubtarget::getOccupancyWithNumVGPRs(unsigned NumVGPRs) const {
-  return AMDGPU::IsaInfo::getNumWavesPerEUWithNumVGPRs(this, NumVGPRs);
-}
-
-unsigned
-GCNSubtarget::getBaseReservedNumSGPRs(const bool HasFlatScratch) const {
-  if (getGeneration() >= AMDGPUSubtarget::GFX10)
-    return 2; // VCC. FLAT_SCRATCH and XNACK are no longer in SGPRs.
-
-  if (HasFlatScratch || HasArchitectedFlatScratch) {
-    if (getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
-      return 6; // FLAT_SCRATCH, XNACK, VCC (in that order).
-    if (getGeneration() == AMDGPUSubtarget::SEA_ISLANDS)
-      return 4; // FLAT_SCRATCH, VCC (in that order).
-  }
-
-  if (isXNACKEnabled())
-    return 4; // XNACK, VCC (in that order).
-  return 2; // VCC.
-}
-
-unsigned GCNSubtarget::getReservedNumSGPRs(const MachineFunction &MF) const {
-  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
-  return getBaseReservedNumSGPRs(MFI.getUserSGPRInfo().hasFlatScratchInit());
-}
-
-unsigned GCNSubtarget::getReservedNumSGPRs(const Function &F) const {
-  // In principle we do not need to reserve SGPR pair used for flat_scratch if
-  // we know flat instructions do not access the stack anywhere in the
-  // program. For now assume it's needed if we have flat instructions.
-  const bool KernelUsesFlatScratch = hasFlatAddressSpace();
-  return getBaseReservedNumSGPRs(KernelUsesFlatScratch);
-}
-
-unsigned GCNSubtarget::computeOccupancy(const Function &F, unsigned LDSSize,
-                                        unsigned NumSGPRs,
-                                        unsigned NumVGPRs) const {
-  unsigned Occupancy =
-    std::min(getMaxWavesPerEU(),
-             getOccupancyWithLocalMemSize(LDSSize, F));
-  if (NumSGPRs)
-    Occupancy = std::min(Occupancy, getOccupancyWithNumSGPRs(NumSGPRs));
-  if (NumVGPRs)
-    Occupancy = std::min(Occupancy, getOccupancyWithNumVGPRs(NumVGPRs));
-  return Occupancy;
-}
-
-unsigned GCNSubtarget::getBaseMaxNumSGPRs(
-    const Function &F, std::pair<unsigned, unsigned> WavesPerEU,
-    unsigned PreloadedSGPRs, unsigned ReservedNumSGPRs) const {
-  // Compute maximum number of SGPRs function can use using default/requested
-  // minimum number of waves per execution unit.
-  unsigned MaxNumSGPRs = getMaxNumSGPRs(WavesPerEU.first, false);
-  unsigned MaxAddressableNumSGPRs = getMaxNumSGPRs(WavesPerEU.first, true);
-
-  // Check if maximum number of SGPRs was explicitly requested using
-  // "amdgpu-num-sgpr" attribute.
-  if (F.hasFnAttribute("amdgpu-num-sgpr")) {
-    unsigned Requested =
-        F.getFnAttributeAsParsedInteger("amdgpu-num-sgpr", MaxNumSGPRs);
-
-    // Make sure requested value does not violate subtarget's specifications.
-    if (Requested && (Requested <= ReservedNumSGPRs))
-      Requested = 0;
-
-    // If more SGPRs are required to support the input user/system SGPRs,
-    // increase to accommodate them.
-    //
-    // FIXME: This really ends up using the requested number of SGPRs + number
-    // of reserved special registers in total. Theoretically you could re-use
-    // the last input registers for these special registers, but this would
-    // require a lot of complexity to deal with the weird aliasing.
-    unsigned InputNumSGPRs = PreloadedSGPRs;
-    if (Requested && Requested < InputNumSGPRs)
-      Requested = InputNumSGPRs;
-
-    // Make sure requested value is compatible with values implied by
-    // default/requested minimum/maximum number of waves per execution unit.
-    if (Requested && Requested > getMaxNumSGPRs(WavesPerEU.first, false))
-      Requested = 0;
-    if (WavesPerEU.second &&
-        Requested && Requested < getMinNumSGPRs(WavesPerEU.second))
-      Requested = 0;
-
-    if (Requested)
-      MaxNumSGPRs = Requested;
-  }
-
-  if (hasSGPRInitBug())
-    MaxNumSGPRs = AMDGPU::IsaInfo::FIXED_NUM_SGPRS_FOR_INIT_BUG;
-
-  return std::min(MaxNumSGPRs - ReservedNumSGPRs, MaxAddressableNumSGPRs);
-}
-
-unsigned GCNSubtarget::getMaxNumSGPRs(const MachineFunction &MF) const {
-  const Function &F = MF.getFunction();
-  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
-  return getBaseMaxNumSGPRs(F, MFI.getWavesPerEU(), MFI.getNumPreloadedSGPRs(),
-                            getReservedNumSGPRs(MF));
-}
-
-static unsigned getMaxNumPreloadedSGPRs() {
-  using USI = GCNUserSGPRUsageInfo;
-  // Max number of user SGPRs
-  const unsigned MaxUserSGPRs =
-      USI::getNumUserSGPRForField(USI::PrivateSegmentBufferID) +
-      USI::getNumUserSGPRForField(USI::DispatchPtrID) +
-      USI::getNumUserSGPRForField(USI::QueuePtrID) +
-      USI::getNumUserSGPRForField(USI::KernargSegmentPtrID) +
-      USI::getNumUserSGPRForField(USI::DispatchIdID) +
-      USI::getNumUserSGPRForField(USI::FlatScratchInitID) +
-      USI::getNumUserSGPRForField(USI::ImplicitBufferPtrID);
-
-  // Max number of system SGPRs
-  const unsigned MaxSystemSGPRs = 1 + // WorkGroupIDX
-                                  1 + // WorkGroupIDY
-                                  1 + // WorkGroupIDZ
-                                  1 + // WorkGroupInfo
-                                  1;  // private segment wave byte offset
-
-  // Max number of synthetic SGPRs
-  const unsigned SyntheticSGPRs = 1; // LDSKernelId
-
-  return MaxUserSGPRs + MaxSystemSGPRs + SyntheticSGPRs;
-}
-
-unsigned GCNSubtarget::getMaxNumSGPRs(const Function &F) const {
-  return getBaseMaxNumSGPRs(F, getWavesPerEU(F), getMaxNumPreloadedSGPRs(),
-                            getReservedNumSGPRs(F));
-}
-
-unsigned GCNSubtarget::getBaseMaxNumVGPRs(
-    const Function &F, std::pair<unsigned, unsigned> WavesPerEU) const {
-  // Compute maximum number of VGPRs function can use using default/requested
-  // minimum number of waves per execution unit.
-  unsigned MaxNumVGPRs = getMaxNumVGPRs(WavesPerEU.first);
-
-  // Check if maximum number of VGPRs was explicitly requested using
-  // "amdgpu-num-vgpr" attribute.
-  if (F.hasFnAttribute("amdgpu-num-vgpr")) {
-    unsigned Requested =
-        F.getFnAttributeAsParsedInteger("amdgpu-num-vgpr", MaxNumVGPRs);
-
-    if (hasGFX90AInsts())
-      Requested *= 2;
-
-    // Make sure requested value is compatible with values implied by
-    // default/requested minimum/maximum number of waves per execution unit.
-    if (Requested && Requested > getMaxNumVGPRs(WavesPerEU.first))
-      Requested = 0;
-    if (WavesPerEU.second &&
-        Requested && Requested < getMinNumVGPRs(WavesPerEU.second))
-      Requested = 0;
-
-    if (Requested)
-      MaxNumVGPRs = Requested;
-  }
-
-  return MaxNumVGPRs;
-}
-
-unsigned GCNSubtarget::getMaxNumVGPRs(const Function &F) const {
-  return getBaseMaxNumVGPRs(F, getWavesPerEU(F));
-}
-
-unsigned GCNSubtarget::getMaxNumVGPRs(const MachineFunction &MF) const {
-  const Function &F = MF.getFunction();
-  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
-  return getBaseMaxNumVGPRs(F, MFI.getWavesPerEU());
-}
-
-void GCNSubtarget::adjustSchedDependency(
-    SUnit *Def, int DefOpIdx, SUnit *Use, int UseOpIdx, SDep &Dep,
-    const TargetSchedModel *SchedModel) const {
-  if (Dep.getKind() != SDep::Kind::Data || !Dep.getReg() ||
-      !Def->isInstr() || !Use->isInstr())
-    return;
-
-  MachineInstr *DefI = Def->getInstr();
-  MachineInstr *UseI = Use->getInstr();
-
-  if (DefI->isBundle()) {
-    const SIRegisterInfo *TRI = getRegisterInfo();
-    auto Reg = Dep.getReg();
-    MachineBasicBlock::const_instr_iterator I(DefI->getIterator());
-    MachineBasicBlock::const_instr_iterator E(DefI->getParent()->instr_end());
-    unsigned Lat = 0;
-    for (++I; I != E && I->isBundledWithPred(); ++I) {
-      if (I->modifiesRegister(Reg, TRI))
-        Lat = InstrInfo.getInstrLatency(getInstrItineraryData(), *I);
-      else if (Lat)
-        --Lat;
-    }
-    Dep.setLatency(Lat);
-  } else if (UseI->isBundle()) {
-    const SIRegisterInfo *TRI = getRegisterInfo();
-    auto Reg = Dep.getReg();
-    MachineBasicBlock::const_instr_iterator I(UseI->getIterator());
-    MachineBasicBlock::const_instr_iterator E(UseI->getParent()->instr_end());
-    unsigned Lat = InstrInfo.getInstrLatency(getInstrItineraryData(), *DefI);
-    for (++I; I != E && I->isBundledWithPred() && Lat; ++I) {
-      if (I->readsRegister(Reg, TRI))
-        break;
-      --Lat;
-    }
-    Dep.setLatency(Lat);
-  } else if (Dep.getLatency() == 0 && Dep.getReg() == AMDGPU::VCC_LO) {
-    // Work around the fact that SIInstrInfo::fixImplicitOperands modifies
-    // implicit operands which come from the MCInstrDesc, which can fool
-    // ScheduleDAGInstrs::addPhysRegDataDeps into treating them as implicit
-    // pseudo operands.
-    Dep.setLatency(InstrInfo.getSchedModel().computeOperandLatency(
-        DefI, DefOpIdx, UseI, UseOpIdx));
-  }
-}
-
-namespace {
-struct FillMFMAShadowMutation : ScheduleDAGMutation {
-  const SIInstrInfo *TII;
-
-  ScheduleDAGMI *DAG;
-
-  FillMFMAShadowMutation(const SIInstrInfo *tii) : TII(tii) {}
-
-  bool isSALU(const SUnit *SU) const {
-    const MachineInstr *MI = SU->getInstr();
-    return MI && TII->isSALU(*MI) && !MI->isTerminator();
-  }
-
-  bool isVALU(const SUnit *SU) const {
-    const MachineInstr *MI = SU->getInstr();
-    return MI && TII->isVALU(*MI);
-  }
-
-  // Link as many SALU instructions in chain as possible. Return the size
-  // of the chain. Links up to MaxChain instructions.
-  unsigned linkSALUChain(SUnit *From, SUnit *To, unsigned MaxChain,
-                         SmallPtrSetImpl<SUnit *> &Visited) const {
-    SmallVector<SUnit *, 8> Worklist({To});
-    unsigned Linked = 0;
-
-    while (!Worklist.empty() && MaxChain-- > 0) {
-      SUnit *SU = Worklist.pop_back_val();
-      if (!Visited.insert(SU).second)
-        continue;
-
-      LLVM_DEBUG(dbgs() << "Inserting edge from\n" ; DAG->dumpNode(*From);
-                 dbgs() << "to\n"; DAG->dumpNode(*SU); dbgs() << '\n');
-
-      if (SU != From && From != &DAG->ExitSU && DAG->canAddEdge(SU, From))
-        if (DAG->addEdge(SU, SDep(From, SDep::Artificial)))
-          ++Linked;
-
-      for (SDep &SI : From->Succs) {
-        SUnit *SUv = SI.getSUnit();
-        if (SUv != From && SU != &DAG->ExitSU && isVALU(SUv) &&
-            DAG->canAddEdge(SUv, SU))
-          DAG->addEdge(SUv, SDep(SU, SDep::Artificial));
-      }
-
-      for (SDep &SI : SU->Succs) {
-        SUnit *Succ = SI.getSUnit();
-        if (Succ != SU && isSALU(Succ))
-          Worklist.push_back(Succ);
-      }
-    }
-
-    return Linked;
-  }
-
-  void apply(ScheduleDAGInstrs *DAGInstrs) override {
-    const GCNSubtarget &ST = DAGInstrs->MF.getSubtarget<GCNSubtarget>();
-    if (!ST.hasMAIInsts())
-      return;
-    DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);
-    const TargetSchedModel *TSchedModel = DAGInstrs->getSchedModel();
-    if (!TSchedModel || DAG->SUnits.empty())
-      return;
-
-    // Scan for MFMA long latency instructions and try to add a dependency
-    // of available SALU instructions to give them a chance to fill MFMA
-    // shadow. That is desirable to fill MFMA shadow with SALU instructions
-    // rather than VALU to prevent power consumption bursts and throttle.
-    auto LastSALU = DAG->SUnits.begin();
-    auto E = DAG->SUnits.end();
-    SmallPtrSet<SUnit*, 32> Visited;
-    for (SUnit &SU : DAG->SUnits) {
-      MachineInstr &MAI = *SU.getInstr();
-      if (!TII->isMAI(MAI) ||
-           MAI.getOpcode() == AMDGPU::V_ACCVGPR_WRITE_B32_e64 ||
-           MAI.getOpcode() == AMDGPU::V_ACCVGPR_READ_B32_e64)
-        continue;
-
-      unsigned Lat = TSchedModel->computeInstrLatency(&MAI) - 1;
-
-      LLVM_DEBUG(dbgs() << "Found MFMA: "; DAG->dumpNode(SU);
-                 dbgs() << "Need " << Lat
-                        << " instructions to cover latency.\n");
-
-      // Find up to Lat independent scalar instructions as early as
-      // possible such that they can be scheduled after this MFMA.
-      for ( ; Lat && LastSALU != E; ++LastSALU) {
-        if (Visited.count(&*LastSALU))
-          continue;
-
-        if (&SU == &DAG->ExitSU || &SU == &*LastSALU || !isSALU(&*LastSALU) ||
-            !DAG->canAddEdge(&*LastSALU, &SU))
-          continue;
-
-        Lat -= linkSALUChain(&SU, &*LastSALU, Lat, Visited);
-      }
-    }
-  }
-};
-} // namespace
-
-void GCNSubtarget::getPostRAMutations(
-    std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
-  Mutations.push_back(std::make_unique<FillMFMAShadowMutation>(&InstrInfo));
-}
-
-std::unique_ptr<ScheduleDAGMutation>
-GCNSubtarget::createFillMFMAShadowMutation(const TargetInstrInfo *TII) const {
-  return EnablePowerSched ? std::make_unique<FillMFMAShadowMutation>(&InstrInfo)
-                          : nullptr;
-}
-
-unsigned GCNSubtarget::getNSAThreshold(const MachineFunction &MF) const {
-  if (getGeneration() >= AMDGPUSubtarget::GFX12)
-    return 0; // Not MIMG encoding.
-
-  if (NSAThreshold.getNumOccurrences() > 0)
-    return std::max(NSAThreshold.getValue(), 2u);
-
-  int Value = MF.getFunction().getFnAttributeAsParsedInteger(
-      "amdgpu-nsa-threshold", -1);
-  if (Value > 0)
-    return std::max(Value, 2);
-
-  return 3;
-}
-
 const AMDGPUSubtarget &AMDGPUSubtarget::get(const MachineFunction &MF) {
   if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)
     return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<GCNSubtarget>());
@@ -1048,85 +366,6 @@ const AMDGPUSubtarget &AMDGPUSubtarget::get(const TargetMachine &TM, const Funct
       TM.getSubtarget<R600Subtarget>(F));
 }
 
-GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
-                                           const GCNSubtarget &ST)
-    : ST(ST) {
-  const CallingConv::ID CC = F.getCallingConv();
-  const bool IsKernel =
-      CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL;
-  // FIXME: Should have analysis or something rather than attribute to detect
-  // calls.
-  const bool HasCalls = F.hasFnAttribute("amdgpu-calls");
-  // FIXME: This attribute is a hack, we just need an analysis on the function
-  // to look for allocas.
-  const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
-
-  if (IsKernel && (!F.arg_empty() || ST.getImplicitArgNumBytes(F) != 0))
-    KernargSegmentPtr = true;
-
-  bool IsAmdHsaOrMesa = ST.isAmdHsaOrMesa(F);
-  if (IsAmdHsaOrMesa && !ST.enableFlatScratch())
-    PrivateSegmentBuffer = true;
-  else if (ST.isMesaGfxShader(F))
-    ImplicitBufferPtr = true;
-
-  if (!AMDGPU::isGraphics(CC)) {
-    if (!F.hasFnAttribute("amdgpu-no-dispatch-ptr"))
-      DispatchPtr = true;
-
-    // FIXME: Can this always be disabled with < COv5?
-    if (!F.hasFnAttribute("amdgpu-no-queue-ptr"))
-      QueuePtr = true;
-
-    if (!F.hasFnAttribute("amdgpu-no-dispatch-id"))
-      DispatchID = true;
-  }
-
-  // TODO: This could be refined a lot. The attribute is a poor way of
-  // detecting calls or stack objects that may require it before argument
-  // lowering.
-  if (ST.hasFlatAddressSpace() && AMDGPU::isEntryFunctionCC(CC) &&
-      (IsAmdHsaOrMesa || ST.enableFlatScratch()) &&
-      (HasCalls || HasStackObjects || ST.enableFlatScratch()) &&
-      !ST.flatScratchIsArchitected()) {
-    FlatScratchInit = true;
-  }
-
-  if (hasImplicitBufferPtr())
-    NumUsedUserSGPRs += getNumUserSGPRForField(ImplicitBufferPtrID);
-
-  if (hasPrivateSegmentBuffer())
-    NumUsedUserSGPRs += getNumUserSGPRForField(PrivateSegmentBufferID);
-
-  if (hasDispatchPtr())
-    NumUsedUserSGPRs += getNumUserSGPRForField(DispatchPtrID);
-
-  if (hasQueuePtr())
-    NumUsedUserSGPRs += getNumUserSGPRForField(QueuePtrID);
-
-  if (hasKernargSegmentPtr())
-    NumUsedUserSGPRs += getNumUserSGPRForField(KernargSegmentPtrID);
-
-  if (hasDispatchID())
-    NumUsedUserSGPRs += getNumUserSGPRForField(DispatchIdID);
-
-  if (hasFlatScratchInit())
-    NumUsedUserSGPRs += getNumUserSGPRForField(FlatScratchInitID);
-
-  if (hasPrivateSegmentSize())
-    NumUsedUserSGPRs += getNumUserSGPRForField(PrivateSegmentSizeID);
-}
-
-void GCNUserSGPRUsageInfo::allocKernargPreloadSGPRs(unsigned NumSGPRs) {
-  assert(NumKernargPreloadSGPRs + NumSGPRs <= AMDGPU::getMaxNumUserSGPRs(ST));
-  NumKernargPreloadSGPRs += NumSGPRs;
-  NumUsedUserSGPRs += NumSGPRs;
-}
-
-unsigned GCNUserSGPRUsageInfo::getNumFreeUserSGPRs() {
-  return AMDGPU::getMaxNumUserSGPRs(ST) - NumUsedUserSGPRs;
-}
-
 SmallVector<unsigned>
 AMDGPUSubtarget::getMaxNumWorkGroups(const Function &F) const {
   return AMDGPU::getIntegerVecAttribute(F, "amdgpu-max-num-workgroups", 3);
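The GCNUserSGPRUsageInfo code removed above keeps a running tally of user SGPRs: each enabled field adds its register cost, preloaded kernel arguments are added on top, and the free count is the hardware limit minus the tally. getMaxNumPreloadedSGPRs() applies the same per-field costs to a worst case. The sketch below models that bookkeeping in plain C++. It is not the LLVM API. The enum, the class, and the per-field costs are assumptions for illustration: 4 SGPRs for the 128-bit private segment buffer descriptor, 1 for the 32-bit private segment size, and 2 for every 64-bit pointer or ID. The user-SGPR limit is passed in rather than queried from AMDGPU::getMaxNumUserSGPRs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the field IDs used by GCNUserSGPRUsageInfo.
enum UserSGPRField : std::size_t {
  ImplicitBufferPtr,
  PrivateSegmentBuffer,
  DispatchPtr,
  QueuePtr,
  KernargSegmentPtr,
  DispatchId,
  FlatScratchInit,
  PrivateSegmentSize,
  NumUserSGPRFields
};

// Assumed per-field cost in SGPRs (see lead-in); not taken from this patch.
constexpr unsigned numUserSGPRsForField(UserSGPRField F) {
  switch (F) {
  case PrivateSegmentBuffer:
    return 4;
  case PrivateSegmentSize:
    return 1;
  default:
    return 2;
  }
}

// Worst-case preloaded SGPRs, summing the same fields as
// getMaxNumPreloadedSGPRs() in the diff.
constexpr unsigned maxNumPreloadedSGPRs() {
  const unsigned MaxUserSGPRs =
      numUserSGPRsForField(PrivateSegmentBuffer) +
      numUserSGPRsForField(DispatchPtr) + numUserSGPRsForField(QueuePtr) +
      numUserSGPRsForField(KernargSegmentPtr) +
      numUserSGPRsForField(DispatchId) +
      numUserSGPRsForField(FlatScratchInit) +
      numUserSGPRsForField(ImplicitBufferPtr);
  const unsigned MaxSystemSGPRs = 5; // WorkGroupIDX/Y/Z, WorkGroupInfo, wave offset
  const unsigned SyntheticSGPRs = 1; // LDSKernelId
  return MaxUserSGPRs + MaxSystemSGPRs + SyntheticSGPRs;
}

// Running tally of user SGPRs, modelled loosely on GCNUserSGPRUsageInfo.
class UserSGPRBudget {
public:
  explicit UserSGPRBudget(unsigned MaxUserSGPRs) : MaxUserSGPRs(MaxUserSGPRs) {}

  // Mark a field as required. Each field is counted at most once.
  void enable(UserSGPRField F) {
    if (!Enabled[F]) {
      Enabled[F] = true;
      NumUsed += numUserSGPRsForField(F);
    }
  }

  // Reserve SGPRs for preloaded kernel arguments, like
  // allocKernargPreloadSGPRs (same assertion on the preload count).
  void allocKernargPreload(unsigned N) {
    assert(NumKernargPreload + N <= MaxUserSGPRs);
    NumKernargPreload += N;
    NumUsed += N;
  }

  unsigned numUsed() const { return NumUsed; }
  unsigned numFree() const { return MaxUserSGPRs - NumUsed; }

private:
  std::array<bool, NumUserSGPRFields> Enabled{};
  unsigned MaxUserSGPRs;
  unsigned NumUsed = 0;
  unsigned NumKernargPreload = 0;
};
```

With these assumed costs, the worst case is 16 user + 5 system + 1 synthetic = 22 SGPRs. A kernel that only needs the private segment buffer, the dispatch pointer and the kernarg pointer uses 8 user SGPRs. Preloading two argument SGPRs leaves 6 free out of a limit of 16.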
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index 85a59e01230237..18a8e917fbb71f 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -119,6 +119,7 @@ add_llvm_target(AMDGPUCodeGen
   GCNRegPressure.cpp
   GCNRewritePartialRegUses.cpp
   GCNSchedStrategy.cpp
+  GCNSubtarget.cpp
   GCNVOPDUtils.cpp
   R600AsmPrinter.cpp
   R600ClauseMergePass.cpp
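GCNSubtarget::getBaseMaxNumSGPRs in the new file validates an "amdgpu-num-sgpr" request in a fixed order:

1. Ignore a request that does not exceed the reserved SGPRs.
2. Grow the request to cover the preloaded SGPRs.
3. Drop a request that conflicts with the waves-per-EU bounds.
4. Apply the SGPR init-bug override.
5. Subtract the reserved SGPRs and cap the result at the addressable limit.

The sketch below reproduces that order with the target queries replaced by plain inputs. The SGPRLimits struct and clampRequestedSGPRs are hypothetical names, not LLVM types. Requested == 0 stands for "no attribute", and a non-zero InitBugFixed models hasSGPRInitBug().

```cpp
#include <algorithm>
#include <cassert>

// Limits a real implementation derives from the waves-per-EU pair; here
// they are plain inputs (hypothetical struct, not an LLVM type).
struct SGPRLimits {
  unsigned MaxForMinWaves; // getMaxNumSGPRs(WavesPerEU.first, false)
  unsigned MaxAddressable; // getMaxNumSGPRs(WavesPerEU.first, true)
  unsigned MinForMaxWaves; // getMinNumSGPRs(WavesPerEU.second), 0 if unset
};

unsigned clampRequestedSGPRs(const SGPRLimits &L, unsigned Requested,
                             unsigned PreloadedSGPRs, unsigned ReservedSGPRs,
                             unsigned InitBugFixed) {
  unsigned MaxNumSGPRs = L.MaxForMinWaves;

  // A request that cannot even cover the reserved registers is ignored.
  if (Requested && Requested <= ReservedSGPRs)
    Requested = 0;
  // Grow the request so the preloaded user/system SGPRs always fit.
  if (Requested && Requested < PreloadedSGPRs)
    Requested = PreloadedSGPRs;
  // Drop requests that conflict with the waves-per-EU bounds.
  if (Requested && Requested > L.MaxForMinWaves)
    Requested = 0;
  if (Requested && Requested < L.MinForMaxWaves)
    Requested = 0;

  if (Requested)
    MaxNumSGPRs = Requested;
  if (InitBugFixed)
    MaxNumSGPRs = InitBugFixed;

  // Reserved SGPRs (VCC, FLAT_SCRATCH, XNACK) come out of the budget.
  return std::min(MaxNumSGPRs - ReservedSGPRs, L.MaxAddressable);
}
```

The limits used to exercise this sketch are illustrative numbers only: 102 SGPRs implied by the minimum waves, 104 addressable, 22 preloaded and 6 reserved. Under those inputs:

- A request of 40 yields 34.
- A request of 10 is raised to 22 and yields 16.
- Requests of 4 or 200 fall back to the default and yield 96.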
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
new file mode 100644
index 00000000000000..b3872a6374261b
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -0,0 +1,797 @@
+//===-- GCNSubtarget.cpp - GCN Subtarget Information ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Implements the GCN specific subclass of TargetSubtarget.
+//
+//===----------------------------------------------------------------------===//
+
+#include "GCNSubtarget.h"
+#include "AMDGPUCallLowering.h"
+#include "AMDGPUInstructionSelector.h"
+#include "AMDGPULegalizerInfo.h"
+#include "AMDGPURegisterBankInfo.h"
+#include "AMDGPUTargetMachine.h"
+#include "R600Subtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/CodeGen/GlobalISel/InlineAsmLowering.h"
+#include "llvm/CodeGen/MachineScheduler.h"
+#include "llvm/CodeGen/TargetFrameLowering.h"
+#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/IntrinsicsR600.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/MC/MCSubtargetInfo.h"
+#include <algorithm>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "gcn-subtarget"
+
+#define GET_SUBTARGETINFO_TARGET_DESC
+#define GET_SUBTARGETINFO_CTOR
+#define AMDGPUSubtarget GCNSubtarget
+#include "AMDGPUGenSubtargetInfo.inc"
+#undef AMDGPUSubtarget
+
+static cl::opt<bool>
+    EnablePowerSched("amdgpu-enable-power-sched",
+                     cl::desc("Enable scheduling to minimize mAI power bursts"),
+                     cl::init(false));
+
+static cl::opt<bool> EnableVGPRIndexMode(
+    "amdgpu-vgpr-index-mode",
+    cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),
+    cl::init(false));
+
+static cl::opt<bool> UseAA("amdgpu-use-aa-in-codegen",
+                           cl::desc("Enable the use of AA during codegen."),
+                           cl::init(true));
+
+static cl::opt<unsigned>
+    NSAThreshold("amdgpu-nsa-threshold",
+                 cl::desc("Number of addresses from which to enable MIMG NSA."),
+                 cl::init(3), cl::Hidden);
+
+GCNSubtarget::~GCNSubtarget() = default;
+
+GCNSubtarget &GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
+                                                            StringRef GPU,
+                                                            StringRef FS) {
+  // Determine default and user-specified characteristics
+  //
+  // We want to be able to turn these off, but making this a subtarget feature
+  // for SI has the unhelpful behavior that it unsets everything else if you
+  // disable it.
+  //
+  // Similarly, we want enable-prt-strict-null to be on by default and not to
+  // unset everything else if it is disabled.
+
+  SmallString<256> FullFS("+promote-alloca,+load-store-opt,+enable-ds128,");
+
+  // Turn on features that the HSA ABI requires. Also turn on FlatForGlobal by
+  // default.
+  if (isAmdHsaOS())
+    FullFS += "+flat-for-global,+unaligned-access-mode,+trap-handler,";
+
+  FullFS += "+enable-prt-strict-null,"; // This is overridden by a disable in FS
+
+  // Disable mutually exclusive bits.
+  if (FS.contains_insensitive("+wavefrontsize")) {
+    if (!FS.contains_insensitive("wavefrontsize16"))
+      FullFS += "-wavefrontsize16,";
+    if (!FS.contains_insensitive("wavefrontsize32"))
+      FullFS += "-wavefrontsize32,";
+    if (!FS.contains_insensitive("wavefrontsize64"))
+      FullFS += "-wavefrontsize64,";
+  }
+
+  FullFS += FS;
+
+  ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
+
+  // Implement the "generic" processors, which act as the default when no
+  // generation features are enabled (e.g. for -mcpu=''). HSA OS defaults to
+  // the first amdgcn target that supports flat addressing. Other OSes default
+  // to the first amdgcn target.
+  if (Gen == AMDGPUSubtarget::INVALID) {
+    Gen = TT.getOS() == Triple::AMDHSA ? AMDGPUSubtarget::SEA_ISLANDS
+                                       : AMDGPUSubtarget::SOUTHERN_ISLANDS;
+  }
+
+  if (!hasFeature(AMDGPU::FeatureWavefrontSize32) &&
+      !hasFeature(AMDGPU::FeatureWavefrontSize64)) {
+    // If there is no default wave size it must be a generation before gfx10,
+    // these have FeatureWavefrontSize64 in their definition already. For gfx10+
+    // set wave32 as a default.
+    ToggleFeature(AMDGPU::FeatureWavefrontSize32);
+  }
+
+  // We don't support FP64 for EG/NI atm.
+  assert(!hasFP64() || (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS));
+
+  // Targets must support 64-bit offsets for MUBUF instructions, flat
+  // operations, or both; otherwise they cannot access a 64-bit global
+  // address space.
+  assert(hasAddr64() || hasFlat());
+  // Unless +-flat-for-global is specified, turn on FlatForGlobal for targets
+  // that do not support ADDR64 variants of MUBUF instructions. Such targets
+  // cannot use a 64 bit offset with a MUBUF instruction to access the global
+  // address space
+  if (!hasAddr64() && !FS.contains("flat-for-global") && !FlatForGlobal) {
+    ToggleFeature(AMDGPU::FeatureFlatForGlobal);
+    FlatForGlobal = true;
+  }
+  // Unless +-flat-for-global is specified, use MUBUF instructions for global
+  // address space access if flat operations are not available.
+  if (!hasFlat() && !FS.contains("flat-for-global") && FlatForGlobal) {
+    ToggleFeature(AMDGPU::FeatureFlatForGlobal);
+    FlatForGlobal = false;
+  }
+
+  // Set defaults if needed.
+  if (MaxPrivateElementSize == 0)
+    MaxPrivateElementSize = 4;
+
+  if (LDSBankCount == 0)
+    LDSBankCount = 32;
+
+  if (TT.getArch() == Triple::amdgcn) {
+    if (LocalMemorySize == 0)
+      LocalMemorySize = 32768;
+
+    // Do something sensible for unspecified target.
+    if (!HasMovrel && !HasVGPRIndexMode)
+      HasMovrel = true;
+  }
+
+  AddressableLocalMemorySize = LocalMemorySize;
+
+  if (AMDGPU::isGFX10Plus(*this) &&
+      !getFeatureBits().test(AMDGPU::FeatureCuMode))
+    LocalMemorySize *= 2;
+
+  // Don't crash on invalid devices.
+  if (WavefrontSizeLog2 == 0)
+    WavefrontSizeLog2 = 5;
+
+  HasFminFmaxLegacy = getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;
+  HasSMulHi = getGeneration() >= AMDGPUSubtarget::GFX9;
+
+  TargetID.setTargetIDFromFeaturesString(FS);
+
+  LLVM_DEBUG(dbgs() << "xnack setting for subtarget: "
+                    << TargetID.getXnackSetting() << '\n');
+  LLVM_DEBUG(dbgs() << "sramecc setting for subtarget: "
+                    << TargetID.getSramEccSetting() << '\n');
+
+  return *this;
+}
+
+void GCNSubtarget::checkSubtargetFeatures(const Function &F) const {
+  LLVMContext &Ctx = F.getContext();
+  if (hasFeature(AMDGPU::FeatureWavefrontSize32) ==
+      hasFeature(AMDGPU::FeatureWavefrontSize64)) {
+    Ctx.diagnose(DiagnosticInfoUnsupported(
+        F, "must specify exactly one of wavefrontsize32 and wavefrontsize64"));
+  }
+}
+
+GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
+                           const GCNTargetMachine &TM)
+    : // clang-format off
+    AMDGPUGenSubtargetInfo(TT, GPU, /*TuneCPU*/ GPU, FS),
+    AMDGPUSubtarget(TT),
+    TargetTriple(TT),
+    TargetID(*this),
+    InstrItins(getInstrItineraryForCPU(GPU)),
+    InstrInfo(initializeSubtargetDependencies(TT, GPU, FS)),
+    TLInfo(TM, *this),
+    FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0) {
+  // clang-format on
+  MaxWavesPerEU = AMDGPU::IsaInfo::getMaxWavesPerEU(this);
+  EUsPerCU = AMDGPU::IsaInfo::getEUsPerCU(this);
+  CallLoweringInfo = std::make_unique<AMDGPUCallLowering>(*getTargetLowering());
+  InlineAsmLoweringInfo =
+      std::make_unique<InlineAsmLowering>(getTargetLowering());
+  Legalizer = std::make_unique<AMDGPULegalizerInfo>(*this, TM);
+  RegBankInfo = std::make_unique<AMDGPURegisterBankInfo>(*this);
+  InstSelector =
+      std::make_unique<AMDGPUInstructionSelector>(*this, *RegBankInfo, TM);
+}
+
+unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
+  if (getGeneration() < GFX10)
+    return 1;
+
+  switch (Opcode) {
+  case AMDGPU::V_LSHLREV_B64_e64:
+  case AMDGPU::V_LSHLREV_B64_gfx10:
+  case AMDGPU::V_LSHLREV_B64_e64_gfx11:
+  case AMDGPU::V_LSHLREV_B64_e32_gfx12:
+  case AMDGPU::V_LSHLREV_B64_e64_gfx12:
+  case AMDGPU::V_LSHL_B64_e64:
+  case AMDGPU::V_LSHRREV_B64_e64:
+  case AMDGPU::V_LSHRREV_B64_gfx10:
+  case AMDGPU::V_LSHRREV_B64_e64_gfx11:
+  case AMDGPU::V_LSHRREV_B64_e64_gfx12:
+  case AMDGPU::V_LSHR_B64_e64:
+  case AMDGPU::V_ASHRREV_I64_e64:
+  case AMDGPU::V_ASHRREV_I64_gfx10:
+  case AMDGPU::V_ASHRREV_I64_e64_gfx11:
+  case AMDGPU::V_ASHRREV_I64_e64_gfx12:
+  case AMDGPU::V_ASHR_I64_e64:
+    return 1;
+  }
+
+  return 2;
+}
+
+/// This list was mostly derived from experimentation.
+bool GCNSubtarget::zeroesHigh16BitsOfDest(unsigned Opcode) const {
+  switch (Opcode) {
+  case AMDGPU::V_CVT_F16_F32_e32:
+  case AMDGPU::V_CVT_F16_F32_e64:
+  case AMDGPU::V_CVT_F16_U16_e32:
+  case AMDGPU::V_CVT_F16_U16_e64:
+  case AMDGPU::V_CVT_F16_I16_e32:
+  case AMDGPU::V_CVT_F16_I16_e64:
+  case AMDGPU::V_RCP_F16_e64:
+  case AMDGPU::V_RCP_F16_e32:
+  case AMDGPU::V_RSQ_F16_e64:
+  case AMDGPU::V_RSQ_F16_e32:
+  case AMDGPU::V_SQRT_F16_e64:
+  case AMDGPU::V_SQRT_F16_e32:
+  case AMDGPU::V_LOG_F16_e64:
+  case AMDGPU::V_LOG_F16_e32:
+  case AMDGPU::V_EXP_F16_e64:
+  case AMDGPU::V_EXP_F16_e32:
+  case AMDGPU::V_SIN_F16_e64:
+  case AMDGPU::V_SIN_F16_e32:
+  case AMDGPU::V_COS_F16_e64:
+  case AMDGPU::V_COS_F16_e32:
+  case AMDGPU::V_FLOOR_F16_e64:
+  case AMDGPU::V_FLOOR_F16_e32:
+  case AMDGPU::V_CEIL_F16_e64:
+  case AMDGPU::V_CEIL_F16_e32:
+  case AMDGPU::V_TRUNC_F16_e64:
+  case AMDGPU::V_TRUNC_F16_e32:
+  case AMDGPU::V_RNDNE_F16_e64:
+  case AMDGPU::V_RNDNE_F16_e32:
+  case AMDGPU::V_FRACT_F16_e64:
+  case AMDGPU::V_FRACT_F16_e32:
+  case AMDGPU::V_FREXP_MANT_F16_e64:
+  case AMDGPU::V_FREXP_MANT_F16_e32:
+  case AMDGPU::V_FREXP_EXP_I16_F16_e64:
+  case AMDGPU::V_FREXP_EXP_I16_F16_e32:
+  case AMDGPU::V_LDEXP_F16_e64:
+  case AMDGPU::V_LDEXP_F16_e32:
+  case AMDGPU::V_LSHLREV_B16_e64:
+  case AMDGPU::V_LSHLREV_B16_e32:
+  case AMDGPU::V_LSHRREV_B16_e64:
+  case AMDGPU::V_LSHRREV_B16_e32:
+  case AMDGPU::V_ASHRREV_I16_e64:
+  case AMDGPU::V_ASHRREV_I16_e32:
+  case AMDGPU::V_ADD_U16_e64:
+  case AMDGPU::V_ADD_U16_e32:
+  case AMDGPU::V_SUB_U16_e64:
+  case AMDGPU::V_SUB_U16_e32:
+  case AMDGPU::V_SUBREV_U16_e64:
+  case AMDGPU::V_SUBREV_U16_e32:
+  case AMDGPU::V_MUL_LO_U16_e64:
+  case AMDGPU::V_MUL_LO_U16_e32:
+  case AMDGPU::V_ADD_F16_e64:
+  case AMDGPU::V_ADD_F16_e32:
+  case AMDGPU::V_SUB_F16_e64:
+  case AMDGPU::V_SUB_F16_e32:
+  case AMDGPU::V_SUBREV_F16_e64:
+  case AMDGPU::V_SUBREV_F16_e32:
+  case AMDGPU::V_MUL_F16_e64:
+  case AMDGPU::V_MUL_F16_e32:
+  case AMDGPU::V_MAX_F16_e64:
+  case AMDGPU::V_MAX_F16_e32:
+  case AMDGPU::V_MIN_F16_e64:
+  case AMDGPU::V_MIN_F16_e32:
+  case AMDGPU::V_MAX_U16_e64:
+  case AMDGPU::V_MAX_U16_e32:
+  case AMDGPU::V_MIN_U16_e64:
+  case AMDGPU::V_MIN_U16_e32:
+  case AMDGPU::V_MAX_I16_e64:
+  case AMDGPU::V_MAX_I16_e32:
+  case AMDGPU::V_MIN_I16_e64:
+  case AMDGPU::V_MIN_I16_e32:
+  case AMDGPU::V_MAD_F16_e64:
+  case AMDGPU::V_MAD_U16_e64:
+  case AMDGPU::V_MAD_I16_e64:
+  case AMDGPU::V_FMA_F16_e64:
+  case AMDGPU::V_DIV_FIXUP_F16_e64:
+    // On gfx10, all 16-bit instructions preserve the high bits.
+    return getGeneration() <= AMDGPUSubtarget::GFX9;
+  case AMDGPU::V_MADAK_F16:
+  case AMDGPU::V_MADMK_F16:
+  case AMDGPU::V_MAC_F16_e64:
+  case AMDGPU::V_MAC_F16_e32:
+  case AMDGPU::V_FMAMK_F16:
+  case AMDGPU::V_FMAAK_F16:
+  case AMDGPU::V_FMAC_F16_e64:
+  case AMDGPU::V_FMAC_F16_e32:
+    // In gfx9, the preferred handling of the unused high 16 bits changed. Most
+    // instructions maintain the legacy behavior of zeroing them. Some
+    // instructions changed to preserving the high bits.
+    return getGeneration() == AMDGPUSubtarget::VOLCANIC_ISLANDS;
+  case AMDGPU::V_MAD_MIXLO_F16:
+  case AMDGPU::V_MAD_MIXHI_F16:
+  default:
+    return false;
+  }
+}
+
+void GCNSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
+                                       unsigned NumRegionInstrs) const {
+  // Track register pressure so the scheduler can try to decrease
+  // pressure once register usage is above the threshold defined by
+  // SIRegisterInfo::getRegPressureSetLimit()
+  Policy.ShouldTrackPressure = true;
+
+  // Enabling both top-down and bottom-up scheduling seems to give us fewer
+  // register spills than using either approach on its own.
+  Policy.OnlyTopDown = false;
+  Policy.OnlyBottomUp = false;
+
+  // Enabling ShouldTrackLaneMasks crashes the SI Machine Scheduler.
+  if (!enableSIScheduler())
+    Policy.ShouldTrackLaneMasks = true;
+}
+
+void GCNSubtarget::mirFileLoaded(MachineFunction &MF) const {
+  if (isWave32()) {
+    // Fix implicit $vcc operands after MIParser has verified that they match
+    // the instruction definitions.
+    for (auto &MBB : MF) {
+      for (auto &MI : MBB)
+        InstrInfo.fixImplicitOperands(MI);
+    }
+  }
+}
+
+bool GCNSubtarget::hasMadF16() const {
+  return InstrInfo.pseudoToMCOpcode(AMDGPU::V_MAD_F16_e64) != -1;
+}
+
+bool GCNSubtarget::useVGPRIndexMode() const {
+  return !hasMovrel() || (EnableVGPRIndexMode && hasVGPRIndexMode());
+}
+
+bool GCNSubtarget::useAA() const { return UseAA; }
+
+unsigned GCNSubtarget::getOccupancyWithNumSGPRs(unsigned SGPRs) const {
+  return AMDGPU::IsaInfo::getOccupancyWithNumSGPRs(SGPRs, getMaxWavesPerEU(),
+                                                   getGeneration());
+}
+
+unsigned GCNSubtarget::getOccupancyWithNumVGPRs(unsigned NumVGPRs) const {
+  return AMDGPU::IsaInfo::getNumWavesPerEUWithNumVGPRs(this, NumVGPRs);
+}
+
+unsigned
+GCNSubtarget::getBaseReservedNumSGPRs(const bool HasFlatScratch) const {
+  if (getGeneration() >= AMDGPUSubtarget::GFX10)
+    return 2; // VCC. FLAT_SCRATCH and XNACK are no longer in SGPRs.
+
+  if (HasFlatScratch || HasArchitectedFlatScratch) {
+    if (getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+      return 6; // FLAT_SCRATCH, XNACK, VCC (in that order).
+    if (getGeneration() == AMDGPUSubtarget::SEA_ISLANDS)
+      return 4; // FLAT_SCRATCH, VCC (in that order).
+  }
+
+  if (isXNACKEnabled())
+    return 4; // XNACK, VCC (in that order).
+  return 2;   // VCC.
+}
+
+unsigned GCNSubtarget::getReservedNumSGPRs(const MachineFunction &MF) const {
+  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
+  return getBaseReservedNumSGPRs(MFI.getUserSGPRInfo().hasFlatScratchInit());
+}
+
+unsigned GCNSubtarget::getReservedNumSGPRs(const Function &F) const {
+  // In principle we do not need to reserve SGPR pair used for flat_scratch if
+  // we know flat instructions do not access the stack anywhere in the
+  // program. For now assume it's needed if we have flat instructions.
+  const bool KernelUsesFlatScratch = hasFlatAddressSpace();
+  return getBaseReservedNumSGPRs(KernelUsesFlatScratch);
+}
+
+unsigned GCNSubtarget::computeOccupancy(const Function &F, unsigned LDSSize,
+                                        unsigned NumSGPRs,
+                                        unsigned NumVGPRs) const {
+  unsigned Occupancy =
+      std::min(getMaxWavesPerEU(), getOccupancyWithLocalMemSize(LDSSize, F));
+  if (NumSGPRs)
+    Occupancy = std::min(Occupancy, getOccupancyWithNumSGPRs(NumSGPRs));
+  if (NumVGPRs)
+    Occupancy = std::min(Occupancy, getOccupancyWithNumVGPRs(NumVGPRs));
+  return Occupancy;
+}
+
+unsigned GCNSubtarget::getBaseMaxNumSGPRs(
+    const Function &F, std::pair<unsigned, unsigned> WavesPerEU,
+    unsigned PreloadedSGPRs, unsigned ReservedNumSGPRs) const {
+  // Compute the maximum number of SGPRs the function can use, based on the
+  // default/requested minimum number of waves per execution unit.
+  unsigned MaxNumSGPRs = getMaxNumSGPRs(WavesPerEU.first, false);
+  unsigned MaxAddressableNumSGPRs = getMaxNumSGPRs(WavesPerEU.first, true);
+
+  // Check if maximum number of SGPRs was explicitly requested using
+  // "amdgpu-num-sgpr" attribute.
+  if (F.hasFnAttribute("amdgpu-num-sgpr")) {
+    unsigned Requested =
+        F.getFnAttributeAsParsedInteger("amdgpu-num-sgpr", MaxNumSGPRs);
+
+    // Make sure requested value does not violate subtarget's specifications.
+    if (Requested && (Requested <= ReservedNumSGPRs))
+      Requested = 0;
+
+    // If more SGPRs are required to support the input user/system SGPRs,
+    // increase to accommodate them.
+    //
+    // FIXME: This really ends up using the requested number of SGPRs + number
+    // of reserved special registers in total. Theoretically you could re-use
+    // the last input registers for these special registers, but this would
+    // require a lot of complexity to deal with the weird aliasing.
+    unsigned InputNumSGPRs = PreloadedSGPRs;
+    if (Requested && Requested < InputNumSGPRs)
+      Requested = InputNumSGPRs;
+
+    // Make sure requested value is compatible with values implied by
+    // default/requested minimum/maximum number of waves per execution unit.
+    if (Requested && Requested > getMaxNumSGPRs(WavesPerEU.first, false))
+      Requested = 0;
+    if (WavesPerEU.second && Requested &&
+        Requested < getMinNumSGPRs(WavesPerEU.second))
+      Requested = 0;
+
+    if (Requested)
+      MaxNumSGPRs = Requested;
+  }
+
+  if (hasSGPRInitBug())
+    MaxNumSGPRs = AMDGPU::IsaInfo::FIXED_NUM_SGPRS_FOR_INIT_BUG;
+
+  return std::min(MaxNumSGPRs - ReservedNumSGPRs, MaxAddressableNumSGPRs);
+}
+
+unsigned GCNSubtarget::getMaxNumSGPRs(const MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
+  return getBaseMaxNumSGPRs(F, MFI.getWavesPerEU(), MFI.getNumPreloadedSGPRs(),
+                            getReservedNumSGPRs(MF));
+}
+
+static unsigned getMaxNumPreloadedSGPRs() {
+  using USI = GCNUserSGPRUsageInfo;
+  // Max number of user SGPRs
+  const unsigned MaxUserSGPRs =
+      USI::getNumUserSGPRForField(USI::PrivateSegmentBufferID) +
+      USI::getNumUserSGPRForField(USI::DispatchPtrID) +
+      USI::getNumUserSGPRForField(USI::QueuePtrID) +
+      USI::getNumUserSGPRForField(USI::KernargSegmentPtrID) +
+      USI::getNumUserSGPRForField(USI::DispatchIdID) +
+      USI::getNumUserSGPRForField(USI::FlatScratchInitID) +
+      USI::getNumUserSGPRForField(USI::ImplicitBufferPtrID);
+
+  // Max number of system SGPRs
+  const unsigned MaxSystemSGPRs = 1 + // WorkGroupIDX
+                                  1 + // WorkGroupIDY
+                                  1 + // WorkGroupIDZ
+                                  1 + // WorkGroupInfo
+                                  1;  // private segment wave byte offset
+
+  // Max number of synthetic SGPRs
+  const unsigned SyntheticSGPRs = 1; // LDSKernelId
+
+  return MaxUserSGPRs + MaxSystemSGPRs + SyntheticSGPRs;
+}
+
+unsigned GCNSubtarget::getMaxNumSGPRs(const Function &F) const {
+  return getBaseMaxNumSGPRs(F, getWavesPerEU(F), getMaxNumPreloadedSGPRs(),
+                            getReservedNumSGPRs(F));
+}
+
+unsigned GCNSubtarget::getBaseMaxNumVGPRs(
+    const Function &F, std::pair<unsigned, unsigned> WavesPerEU) const {
+  // Compute the maximum number of VGPRs the function can use, based on the
+  // default/requested minimum number of waves per execution unit.
+  unsigned MaxNumVGPRs = getMaxNumVGPRs(WavesPerEU.first);
+
+  // Check if maximum number of VGPRs was explicitly requested using
+  // "amdgpu-num-vgpr" attribute.
+  if (F.hasFnAttribute("amdgpu-num-vgpr")) {
+    unsigned Requested =
+        F.getFnAttributeAsParsedInteger("amdgpu-num-vgpr", MaxNumVGPRs);
+
+    if (hasGFX90AInsts())
+      Requested *= 2;
+
+    // Make sure requested value is compatible with values implied by
+    // default/requested minimum/maximum number of waves per execution unit.
+    if (Requested && Requested > getMaxNumVGPRs(WavesPerEU.first))
+      Requested = 0;
+    if (WavesPerEU.second && Requested &&
+        Requested < getMinNumVGPRs(WavesPerEU.second))
+      Requested = 0;
+
+    if (Requested)
+      MaxNumVGPRs = Requested;
+  }
+
+  return MaxNumVGPRs;
+}
+
+unsigned GCNSubtarget::getMaxNumVGPRs(const Function &F) const {
+  return getBaseMaxNumVGPRs(F, getWavesPerEU(F));
+}
+
+unsigned GCNSubtarget::getMaxNumVGPRs(const MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
+  return getBaseMaxNumVGPRs(F, MFI.getWavesPerEU());
+}
+
+void GCNSubtarget::adjustSchedDependency(
+    SUnit *Def, int DefOpIdx, SUnit *Use, int UseOpIdx, SDep &Dep,
+    const TargetSchedModel *SchedModel) const {
+  if (Dep.getKind() != SDep::Kind::Data || !Dep.getReg() || !Def->isInstr() ||
+      !Use->isInstr())
+    return;
+
+  MachineInstr *DefI = Def->getInstr();
+  MachineInstr *UseI = Use->getInstr();
+
+  if (DefI->isBundle()) {
+    const SIRegisterInfo *TRI = getRegisterInfo();
+    auto Reg = Dep.getReg();
+    MachineBasicBlock::const_instr_iterator I(DefI->getIterator());
+    MachineBasicBlock::const_instr_iterator E(DefI->getParent()->instr_end());
+    unsigned Lat = 0;
+    for (++I; I != E && I->isBundledWithPred(); ++I) {
+      if (I->modifiesRegister(Reg, TRI))
+        Lat = InstrInfo.getInstrLatency(getInstrItineraryData(), *I);
+      else if (Lat)
+        --Lat;
+    }
+    Dep.setLatency(Lat);
+  } else if (UseI->isBundle()) {
+    const SIRegisterInfo *TRI = getRegisterInfo();
+    auto Reg = Dep.getReg();
+    MachineBasicBlock::const_instr_iterator I(UseI->getIterator());
+    MachineBasicBlock::const_instr_iterator E(UseI->getParent()->instr_end());
+    unsigned Lat = InstrInfo.getInstrLatency(getInstrItineraryData(), *DefI);
+    for (++I; I != E && I->isBundledWithPred() && Lat; ++I) {
+      if (I->readsRegister(Reg, TRI))
+        break;
+      --Lat;
+    }
+    Dep.setLatency(Lat);
+  } else if (Dep.getLatency() == 0 && Dep.getReg() == AMDGPU::VCC_LO) {
+    // Work around the fact that SIInstrInfo::fixImplicitOperands modifies
+    // implicit operands which come from the MCInstrDesc, which can fool
+    // ScheduleDAGInstrs::addPhysRegDataDeps into treating them as implicit
+    // pseudo operands.
+    Dep.setLatency(InstrInfo.getSchedModel().computeOperandLatency(
+        DefI, DefOpIdx, UseI, UseOpIdx));
+  }
+}
+
+namespace {
+struct FillMFMAShadowMutation : ScheduleDAGMutation {
+  const SIInstrInfo *TII;
+
+  ScheduleDAGMI *DAG;
+
+  FillMFMAShadowMutation(const SIInstrInfo *tii) : TII(tii) {}
+
+  bool isSALU(const SUnit *SU) const {
+    const MachineInstr *MI = SU->getInstr();
+    return MI && TII->isSALU(*MI) && !MI->isTerminator();
+  }
+
+  bool isVALU(const SUnit *SU) const {
+    const MachineInstr *MI = SU->getInstr();
+    return MI && TII->isVALU(*MI);
+  }
+
+  // Link as many SALU instructions in chain as possible. Return the size
+  // of the chain. Links up to MaxChain instructions.
+  unsigned linkSALUChain(SUnit *From, SUnit *To, unsigned MaxChain,
+                         SmallPtrSetImpl<SUnit *> &Visited) const {
+    SmallVector<SUnit *, 8> Worklist({To});
+    unsigned Linked = 0;
+
+    while (!Worklist.empty() && MaxChain-- > 0) {
+      SUnit *SU = Worklist.pop_back_val();
+      if (!Visited.insert(SU).second)
+        continue;
+
+      LLVM_DEBUG(dbgs() << "Inserting edge from\n"; DAG->dumpNode(*From);
+                 dbgs() << "to\n"; DAG->dumpNode(*SU); dbgs() << '\n');
+
+      if (SU != From && From != &DAG->ExitSU && DAG->canAddEdge(SU, From))
+        if (DAG->addEdge(SU, SDep(From, SDep::Artificial)))
+          ++Linked;
+
+      for (SDep &SI : From->Succs) {
+        SUnit *SUv = SI.getSUnit();
+        if (SUv != From && SU != &DAG->ExitSU && isVALU(SUv) &&
+            DAG->canAddEdge(SUv, SU))
+          DAG->addEdge(SUv, SDep(SU, SDep::Artificial));
+      }
+
+      for (SDep &SI : SU->Succs) {
+        SUnit *Succ = SI.getSUnit();
+        if (Succ != SU && isSALU(Succ))
+          Worklist.push_back(Succ);
+      }
+    }
+
+    return Linked;
+  }
+
+  void apply(ScheduleDAGInstrs *DAGInstrs) override {
+    const GCNSubtarget &ST = DAGInstrs->MF.getSubtarget<GCNSubtarget>();
+    if (!ST.hasMAIInsts())
+      return;
+    DAG = static_cast<ScheduleDAGMI *>(DAGInstrs);
+    const TargetSchedModel *TSchedModel = DAGInstrs->getSchedModel();
+    if (!TSchedModel || DAG->SUnits.empty())
+      return;
+
+    // Scan for long-latency MFMA instructions and try to make available SALU
+    // instructions depend on them, giving the SALUs a chance to fill the MFMA
+    // shadow. Filling the shadow with SALU rather than VALU instructions is
+    // preferable because it avoids power-consumption bursts and throttling.
+    auto LastSALU = DAG->SUnits.begin();
+    auto E = DAG->SUnits.end();
+    SmallPtrSet<SUnit *, 32> Visited;
+    for (SUnit &SU : DAG->SUnits) {
+      MachineInstr &MAI = *SU.getInstr();
+      if (!TII->isMAI(MAI) ||
+          MAI.getOpcode() == AMDGPU::V_ACCVGPR_WRITE_B32_e64 ||
+          MAI.getOpcode() == AMDGPU::V_ACCVGPR_READ_B32_e64)
+        continue;
+
+      unsigned Lat = TSchedModel->computeInstrLatency(&MAI) - 1;
+
+      LLVM_DEBUG(dbgs() << "Found MFMA: "; DAG->dumpNode(SU);
+                 dbgs() << "Need " << Lat
+                        << " instructions to cover latency.\n");
+
+      // Find up to Lat independent scalar instructions as early as
+      // possible such that they can be scheduled after this MFMA.
+      for (; Lat && LastSALU != E; ++LastSALU) {
+        if (Visited.count(&*LastSALU))
+          continue;
+
+        if (&SU == &DAG->ExitSU || &SU == &*LastSALU || !isSALU(&*LastSALU) ||
+            !DAG->canAddEdge(&*LastSALU, &SU))
+          continue;
+
+        Lat -= linkSALUChain(&SU, &*LastSALU, Lat, Visited);
+      }
+    }
+  }
+};
+} // namespace
+
+void GCNSubtarget::getPostRAMutations(
+    std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
+  Mutations.push_back(std::make_unique<FillMFMAShadowMutation>(&InstrInfo));
+}
+
+std::unique_ptr<ScheduleDAGMutation>
+GCNSubtarget::createFillMFMAShadowMutation(const TargetInstrInfo *TII) const {
+  return EnablePowerSched ? std::make_unique<FillMFMAShadowMutation>(&InstrInfo)
+                          : nullptr;
+}
+
+unsigned GCNSubtarget::getNSAThreshold(const MachineFunction &MF) const {
+  if (getGeneration() >= AMDGPUSubtarget::GFX12)
+    return 0; // Not MIMG encoding.
+
+  if (NSAThreshold.getNumOccurrences() > 0)
+    return std::max(NSAThreshold.getValue(), 2u);
+
+  int Value = MF.getFunction().getFnAttributeAsParsedInteger(
+      "amdgpu-nsa-threshold", -1);
+  if (Value > 0)
+    return std::max(Value, 2);
+
+  return 3;
+}
+
+GCNUserSGPRUsageInfo::GCNUserSGPRUsageInfo(const Function &F,
+                                           const GCNSubtarget &ST)
+    : ST(ST) {
+  const CallingConv::ID CC = F.getCallingConv();
+  const bool IsKernel =
+      CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL;
+  // FIXME: Should have analysis or something rather than attribute to detect
+  // calls.
+  const bool HasCalls = F.hasFnAttribute("amdgpu-calls");
+  // FIXME: This attribute is a hack, we just need an analysis on the function
+  // to look for allocas.
+  const bool HasStackObjects = F.hasFnAttribute("amdgpu-stack-objects");
+
+  if (IsKernel && (!F.arg_empty() || ST.getImplicitArgNumBytes(F) != 0))
+    KernargSegmentPtr = true;
+
+  bool IsAmdHsaOrMesa = ST.isAmdHsaOrMesa(F);
+  if (IsAmdHsaOrMesa && !ST.enableFlatScratch())
+    PrivateSegmentBuffer = true;
+  else if (ST.isMesaGfxShader(F))
+    ImplicitBufferPtr = true;
+
+  if (!AMDGPU::isGraphics(CC)) {
+    if (!F.hasFnAttribute("amdgpu-no-dispatch-ptr"))
+      DispatchPtr = true;
+
+    // FIXME: Can this always be disabled with < COv5?
+    if (!F.hasFnAttribute("amdgpu-no-queue-ptr"))
+      QueuePtr = true;
+
+    if (!F.hasFnAttribute("amdgpu-no-dispatch-id"))
+      DispatchID = true;
+  }
+
+  // TODO: This could be refined a lot. The attribute is a poor way of
+  // detecting calls or stack objects that may require it before argument
+  // lowering.
+  if (ST.hasFlatAddressSpace() && AMDGPU::isEntryFunctionCC(CC) &&
+      (IsAmdHsaOrMesa || ST.enableFlatScratch()) &&
+      (HasCalls || HasStackObjects || ST.enableFlatScratch()) &&
+      !ST.flatScratchIsArchitected()) {
+    FlatScratchInit = true;
+  }
+
+  if (hasImplicitBufferPtr())
+    NumUsedUserSGPRs += getNumUserSGPRForField(ImplicitBufferPtrID);
+
+  if (hasPrivateSegmentBuffer())
+    NumUsedUserSGPRs += getNumUserSGPRForField(PrivateSegmentBufferID);
+
+  if (hasDispatchPtr())
+    NumUsedUserSGPRs += getNumUserSGPRForField(DispatchPtrID);
+
+  if (hasQueuePtr())
+    NumUsedUserSGPRs += getNumUserSGPRForField(QueuePtrID);
+
+  if (hasKernargSegmentPtr())
+    NumUsedUserSGPRs += getNumUserSGPRForField(KernargSegmentPtrID);
+
+  if (hasDispatchID())
+    NumUsedUserSGPRs += getNumUserSGPRForField(DispatchIdID);
+
+  if (hasFlatScratchInit())
+    NumUsedUserSGPRs += getNumUserSGPRForField(FlatScratchInitID);
+
+  if (hasPrivateSegmentSize())
+    NumUsedUserSGPRs += getNumUserSGPRForField(PrivateSegmentSizeID);
+}
+
+void GCNUserSGPRUsageInfo::allocKernargPreloadSGPRs(unsigned NumSGPRs) {
+  assert(NumKernargPreloadSGPRs + NumSGPRs <= AMDGPU::getMaxNumUserSGPRs(ST));
+  NumKernargPreloadSGPRs += NumSGPRs;
+  NumUsedUserSGPRs += NumSGPRs;
+}
+
+unsigned GCNUserSGPRUsageInfo::getNumFreeUserSGPRs() {
+  return AMDGPU::getMaxNumUserSGPRs(ST) - NumUsedUserSGPRs;
+}
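A minimal standalone sketch of the precedence that GCNSubtarget::getNSAThreshold above applies. GFX12 and later return 0 because they do not use the MIMG encoding. Otherwise an explicit command-line NSAThreshold wins, then a positive "amdgpu-nsa-threshold" function attribute, and failing both the default is 3. Explicit values are clamped to at least 2. The function nsaThreshold and its parameters are hypothetical stand-ins for the subtarget queries, not LLVM API.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical inputs mirroring what getNSAThreshold reads:
//   IsGFX12OrLater ~ getGeneration() >= AMDGPUSubtarget::GFX12
//   CommandLine    ~ NSAThreshold, present only if given on the command line
//   FnAttr         ~ "amdgpu-nsa-threshold" as an integer, -1 when absent
unsigned nsaThreshold(bool IsGFX12OrLater, std::optional<unsigned> CommandLine,
                      int FnAttr) {
  if (IsGFX12OrLater)
    return 0; // No MIMG encoding, so no NSA threshold.
  if (CommandLine)
    return std::max(*CommandLine, 2u); // Command line wins, clamped to >= 2.
  if (FnAttr > 0)
    return static_cast<unsigned>(std::max(FnAttr, 2)); // Then the attribute.
  return 3; // Default threshold.
}
```

For example, a command-line value of 1 yields 2, and a function without the attribute gets 3.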
diff --git a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-any.ll b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-any.ll
index 331518c0c9d339..a3fed314fed243 100644
--- a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-any.ll
+++ b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-any.ll
@@ -1,6 +1,6 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
 
 ; REQUIRES: asserts
 
diff --git a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-disabled.ll b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-disabled.ll
index 1e4e9f3e13fe2b..65b289bcd29d9a 100644
--- a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-disabled.ll
+++ b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-disabled.ll
@@ -1,6 +1,6 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
 
 ; REQUIRES: asserts
 
diff --git a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-enabled.ll b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-enabled.ll
index 713b276ddedb3c..bd665eb432f481 100644
--- a/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-enabled.ll
+++ b/llvm/test/CodeGen/AMDGPU/sramecc-subtarget-feature-enabled.ll
@@ -1,6 +1,6 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx908 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
 
 ; REQUIRES: asserts
 
diff --git a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-any.ll b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-any.ll
index b7da3b77c96371..5aaf81d0e10e2e 100644
--- a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-any.ll
+++ b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-any.ll
@@ -1,10 +1,10 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx902 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx902 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ANY %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=NOT-SUPPORTED %s
 
 ; REQUIRES: asserts
 
diff --git a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-disabled.ll b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-disabled.ll
index 23baeabc6a1bb4..4ced763abc2ac3 100644
--- a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-disabled.ll
+++ b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-disabled.ll
@@ -1,10 +1,10 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=amdgpu-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=gcn-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=OFF %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
 
 ; REQUIRES: asserts
 
diff --git a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-enabled.ll b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-enabled.ll
index a52c842afb291f..20354f6828f9c9 100644
--- a/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-enabled.ll
+++ b/llvm/test/CodeGen/AMDGPU/xnack-subtarget-feature-enabled.ll
@@ -1,10 +1,10 @@
-; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=amdgpu-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=amdgpu-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
-; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=amdgpu-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx600 -debug-only=gcn-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -debug-only=gcn-subtarget -o /dev/null %s 2>&1 | FileCheck --check-prefix=WARN %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx801 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=ON %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -debug-only=gcn-subtarget -o - %s 2>&1 | FileCheck --check-prefix=WARN %s
 
 ; REQUIRES: asserts
 

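FillMFMAShadowMutation in GCNSubtarget.cpp walks the scheduling region. For each MFMA other than V_ACCVGPR_READ/WRITE, it tries to attach up to (latency - 1) SALU instructions so they get scheduled after the MFMA. It uses a single forward-only cursor (LastSALU) that is shared across all MFMAs. The sketch below is a deliberately simplified model of that greedy scan under stated assumptions. Instructions are a flat vector with no dependences, so canAddEdge is treated as always succeeding, and SALU chains are not followed. The Kind/Inst types and fillShadows are hypothetical and are not LLVM code.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { SALU, VALU, MFMA };

struct Inst {
  Kind K;
  unsigned Latency; // Only meaningful for MFMA.
};

// For each MFMA index, returns the SALU indices picked to fill its shadow.
std::vector<std::vector<std::size_t>>
fillShadows(const std::vector<Inst> &Insts) {
  std::vector<std::vector<std::size_t>> Fillers(Insts.size());
  std::size_t Cursor = 0; // Plays the role of LastSALU; only moves forward.
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    if (Insts[I].K != Kind::MFMA)
      continue;
    // LLVM uses computeInstrLatency(MFMA) - 1; guard against 0 here.
    unsigned Need = Insts[I].Latency > 0 ? Insts[I].Latency - 1 : 0;
    for (; Need && Cursor < Insts.size(); ++Cursor) {
      if (Cursor == I || Insts[Cursor].K != Kind::SALU)
        continue;
      Fillers[I].push_back(Cursor);
      --Need;
    }
  }
  return Fillers;
}
```

Because the cursor never moves backwards, the outer scan stays linear in the region size. The trade-off is that SALUs skipped for one MFMA are never reconsidered for a later one, which matches the LastSALU loop above.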
>From 47e0212f00f707a4bb92714afe9c748116887d62 Mon Sep 17 00:00:00 2001
From: LLVM GN Syncbot <llvmgnsyncbot at gmail.com>
Date: Wed, 21 Aug 2024 18:11:13 +0000
Subject: [PATCH 036/116] [gn build] Port a6bae5cb3791

---
 llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn
index edd5be27900cc9..006e1ed700b821 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn
@@ -206,6 +206,7 @@ static_library("LLVMAMDGPUCodeGen") {
     "GCNRegPressure.cpp",
     "GCNRewritePartialRegUses.cpp",
     "GCNSchedStrategy.cpp",
+    "GCNSubtarget.cpp",
     "GCNVOPDUtils.cpp",
     "R600AsmPrinter.cpp",
     "R600ClauseMergePass.cpp",

>From c09fdac0b577ca0bfef141765d0a9ae1b6040893 Mon Sep 17 00:00:00 2001
From: Michael Kruse <llvm-project at meinersbur.de>
Date: Wed, 21 Aug 2024 20:21:04 +0200
Subject: [PATCH 037/116] [Docs] Update Loop Optimization WG call.

The WebEx link will soon become invalid; we are switching to Google
Meet. This also changes the cadence from every two weeks to monthly.
---
 llvm/docs/GettingInvolved.rst          |   2 +-
 llvm/docs/_static/LoopOptWG_invite.ics | 126 ++++++++++++++++---------
 2 files changed, 81 insertions(+), 47 deletions(-)

diff --git a/llvm/docs/GettingInvolved.rst b/llvm/docs/GettingInvolved.rst
index 646f1d09dfab0b..32d3a83738a8eb 100644
--- a/llvm/docs/GettingInvolved.rst
+++ b/llvm/docs/GettingInvolved.rst
@@ -150,7 +150,7 @@ what to add to your calendar invite.
      - Calendar link
      - Minutes/docs link
    * - Loop Optimization Working Group
-     - Every 2 weeks on Wednesday
+     - Every first Wednesday of the month
      - `ics <./_static/LoopOptWG_invite.ics>`__
      - `Minutes/docs <https://docs.google.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit>`__
    * - RISC-V
diff --git a/llvm/docs/_static/LoopOptWG_invite.ics b/llvm/docs/_static/LoopOptWG_invite.ics
index 3ec76e577ab746..65597d90a9c852 100644
--- a/llvm/docs/_static/LoopOptWG_invite.ics
+++ b/llvm/docs/_static/LoopOptWG_invite.ics
@@ -1,46 +1,80 @@
-BEGIN:VCALENDAR
-PRODID:-//Microsoft Corporation//Outlook 10.0 MIMEDIR//EN
-VERSION:2.0
-METHOD:PUBLISH
-BEGIN:VTIMEZONE
-TZID:America/New_York
-LAST-MODIFIED:20201011T015911Z
-TZURL:http://tzurl.org/zoneinfo-outlook/America/New_York
-X-LIC-LOCATION:America/New_York
-BEGIN:DAYLIGHT
-TZNAME:EDT
-TZOFFSETFROM:-0500
-TZOFFSETTO:-0400
-DTSTART:19700308T020000
-RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
-END:DAYLIGHT
-BEGIN:STANDARD
-TZNAME:EST
-TZOFFSETFROM:-0400
-TZOFFSETTO:-0500
-DTSTART:19701101T020000
-RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
-END:STANDARD
-END:VTIMEZONE
-BEGIN:VEVENT
-DTSTAMP:20210908T145817Z
-ORGANIZER;CN="Bardia Mahjour":MAILTO:bmahjour at ca.ibm.com
-DTSTART;TZID=America/New_York:20210908T110000
-DTEND;TZID=America/New_York:20210908T120000
-LOCATION:https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6
-TRANSP:OPAQUE
-SEQUENCE:1631113097
-UID:862486b7-998c-41f8-b7ca-0906ac06f113
-DESCRIPTION:\n\n\n\n\nJOIN WEBEX MEETING\nhttps://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6\nMeeting number (access code): 145 067 2790\n\nMeeting password: PQduM8RxN52 (77386879 from phones and video systems)\n\n\n\nTAP TO JOIN FROM A MOBILE DEVICE (ATTENDEES ONLY)\n1-844-531-0958,,1450672790#77386879# tel:1-844-531-0958,,*01*1450672790%2377386879%23*01* United States Toll Free\n+1-669-234-1178,,1450672790#77386879# tel:%2B1-669-234-1178,,*01*1450672790%2377386879%23*01* United States Toll\nSome mobile devices may ask attendees to enter a numeric password.\n\n\nJOIN BY PHONE\n1-844-531-0958 United States Toll Free\n1-669-234-1178 United States Toll\n\nGlobal call-in numbers\nhttps://ibm.webex.com/ibm/globalcallin.php?MTID=mb6e6082af7e7e7fe3948dbe9ab0025cf\n\nToll-free calling restrictions\nhttps://ibm.webex.com/ibm/customer_tollfree_restrictions.pdf\n\n\nJOIN FROM A VIDEO SYSTEM OR APPLICATION\nDial sip:1450672790 at ibm.webex.com\nYou can also dial 173.243.2.68 and enter your meeting number.\n\n\nJoin using Microsoft Lync or Microsoft Skype for Business\nDial sip:1450672790.ibm at lync.webex.com\n\n\n\n\n\nCan't join the meeting?\nhttps://collaborationhelp.cisco.com/article/WBX000029055\n\n\nIMPORTANT NOTICE: Please note that this Webex service allows audio and other information sent during the session to be recorded, which may be discoverable in a legal matter. By joining this session, you automatically consent to such recordings. If you do not consent to being recorded, discuss your concerns with the host or do not join the session.\n
-X-ALT-DESC;FMTTYPE=text/html:<style type="text/css">\ntable {\n	border-collapse: separate; width =100%;	border: 0;	border-spacing: 0;}\n\ntr {\n	line-height: 18px;}\n\na, td {\n	font-size: 14px;	font-family: Arial;	color: #333;	word-wrap: break-word;	word-break: normal;	padding: 0;}\n\n.title {\n	font-size: 28px;}\n\n.image {\n	width: auto;	max-width: auto;}\n\n.footer {\n	width: 604px;}\n\n.main {\n\n}@media screen and (max-device-width: 800px) {\n	.title {\n		font-size: 22px !important;	}\n	.image {\n		width: auto !important;		max-width: 100% !important;	}\n	.footer {\n		width: 100% !important;		max-width: 604px !important\n	}\n	.main {\n		width: 100% !important;		max-width: 604px !important\n	}\n}\n</style>\n\n<table bgcolor="#FFFFFF" style="padding: 0; margin: 0; border: 0; width: 100%;" align="left">\n	<tr style="height: 28px"><td> </td></tr>\n	<tr>\n		<td align="left" style="padding: 0 20px; margin: 0">\n			<!--<table bgcolor="#FFFFFF" style="border: 0px; width: 100%; padding-left: 50px; padding-right: 50px;" align="left" class="main">\n				<tr>\n					<td align="center" valign="top" > 					</td>\n				</tr>\n			</table>-->\n\n\n\n\n\n			<table>\n				<tr>\n					<td>\n						<FONT SIZE="4" COLOR="#666666" FACE="arial">When it's time, join the Webex meeting here.</FONT>\n					</td>\n				</tr>\n		</table>\n        <table>\n        	<tr style="line-height: 20px;"><td style="height:20px"> </td></tr>\n			<tr>\n				<td style="width:auto!important; ">\n					<table border="0" cellpadding="0" cellspacing="0" style="width:auto;width:auto!important;background-color:#00823B; border:0px solid #00823B; border-radius:25px; min-width:160px!important;">\n						<tr>\n							<td align="center" style="padding:10px 36px;"><a href="https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6" style="color:#FFFFFF; font-size:20px; text-decoration:none;">Join meeting</a></td>\n						</tr>\n					</table>\n				</td>\n			</tr>\n		</table>\n		<table>\n			<tr style="line-height: 
20px;"><td style="height:20px"> </td></tr>\n			<tr>\n				<td>\n						<FONT SIZE="3" COLOR="#666666" FACE="arial">More ways to join:</FONT>\n				</td>\n        	</tr>\n            <tr style="line-height: 10px;"><td style="height: 10px;"> </td></tr>\n        	<tr>\n				<td>\n						<FONT SIZE="3" COLOR="#666666" FACE="arial">Join from the meeting link</FONT>\n				</td>\n        	</tr>\n        	<tr>\n				<td>\n						<FONT SIZE="2" COLOR="#666666" FACE="arial"><a href='https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6' style='color:#005E7D;  text-decoration:none; font-family: Arial;font-size: 14px;line-height: 24px;'>https://ibm.webex.com/ibm/j.php?MTID=m450e0c4009445e16df43ff82ea58f7a6</a></FONT>\n				</td>\n        	</tr>\n        	<tr style="line-height: 20px;"><td style="height:20px"> </td></tr>\n			<tr>\n				<td>\n						<FONT SIZE="3" COLOR="#666666" FACE="arial">Join by meeting number</FONT>\n				</td>\n        	</tr>\n			<tr>\n				<td>\n					<FONT SIZE="2" COLOR="#666666" FACE="arial">Meeting number (access code): 145 067 2790</FONT>\n				</td>\n			</tr>\n		</table>\n		<table><tr><td><FONT SIZE="2" COLOR="#666666" FACE="arial">Meeting password:</FONT></td><td><FONT SIZE="2"  COLOR="#666666" FACE="arial">PQduM8RxN52 (77386879 from phones and video systems)</FONT></td></tr></table>\n\n <FONT size="2" COLOR="#FF0000" style="font-family: Arial;"></FONT>\n\n  <BR><FONT SIZE="4" FACE="ARIAL"><FONT SIZE="3" COLOR="#666666" FACE="arial">Tap to join from a mobile device (attendees only)</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial"><a href='tel:1-844-531-0958,,*01*1450672790%2377386879%23*01*' style='color:#005E7D;  text-decoration:none; font-family: Arial;font-size: 14px;line-height: 24px;'>1-844-531-0958,,1450672790#77386879#</a> United States Toll Free</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial"><a href='tel:%2B1-669-234-1178,,*01*1450672790%2377386879%23*01*' style='color:#005E7D;  text-decoration:none; font-family: 
Arial;font-size: 14px;line-height: 24px;'>+1-669-234-1178,,1450672790#77386879#</a> United States Toll</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial">Some mobile devices may ask attendees to enter a numeric password.</FONT>  <BR><BR><FONT SIZE="4" FACE="ARIAL"><FONT SIZE="3" COLOR="#666666" FACE="arial">Join by phone</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial">1-844-531-0958 United States Toll Free</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial">1-669-234-1178 United States Toll</FONT>   <BR><FONT SIZE="2" COLOR="#666666" FACE="arial"><a href="https://ibm.webex.com/ibm/globalcallin.php?MTID=mb6e6082af7e7e7fe3948dbe9ab0025cf" style="text-decoration:none;font-size:14px;color:#005E7D">Global call-in numbers</a>  |  <a href="https://ibm.webex.com/ibm/customer_tollfree_restrictions.pdf" style="text-decoration:none;font-size:14px;color:#005E7D">Toll-free calling restrictions</a></FONT>  <BR><BR><BR>\n\n<table><tr style="line-height: 20px;"><td style="height:20px"> </td></tr></table>\n\n<FONT SIZE="4" FACE="ARIAL"><FONT SIZE="3" COLOR="#666666" FACE="arial">Join from a video system or application</FONT><BR><FONT SIZE="2" COLOR="#666666" FACE="arial">Dial</FONT> <a href="sip:1450672790 at ibm.webex.com"><FONT SIZE="2" COLOR="#005E7D" FACE="arial">1450672790 at ibm.webex.com</FONT></a>  <BR><FONT SIZE="2" COLOR="#666666" FACE="arial">You can also dial 173.243.2.68 and enter your meeting number.</FONT>   <BR></FONT>  <BR>\n\n<table><tr style="line-height: 20px;"><td style="height:20px"> </td></tr></table><table cellpadding="0" cellspacing="0"><tr><td  style="color: #000000; font-family: Arial;font-size: 12px; font-weight: bold; line-height: 24px;"><b>Join using Microsoft Lync or Microsoft Skype for Business</b></td></tr><tr style="margin:0px"><td style="color: #333333; font-family: Arial; font-size: 14px; line-height: 24px;">Dial <a href=" sip:1450672790.ibm at lync.webex.com"   style="text-decoration:none;color:#005E7D">1450672790.ibm at 
lync.webex.com</a></td></tr></table>\n\n	<table><tr style="line-height: 20px"><td style="height:20px"> </td></tr></table>\n	\n\n			<table style="width: 100%;" align="left" class="main">\n                <tr style="height: 20px"><td> </td></tr>\n				<tr>\n					<td style="height: 24px; color: #000000; font-family:Arial; font-size: 14px; line-height: 24px;">Need help? Go to <a href="https://help.webex.com" style="color:#005E7D; text-decoration:none;">https://help.webex.com</a>\n					</td>\n				</tr>\n                <tr style="height: 44px"><td> </td></tr>\n			</table>\n		</td>\n	</tr>\n</table>\n
-SUMMARY:Loop Opt WG
-PRIORITY:5
-CLASS:PUBLIC
-RRULE:FREQ=WEEKLY;WKST=SU;INTERVAL=2;BYDAY=WE
-BEGIN:VALARM
-TRIGGER:-PT5M
-ACTION:DISPLAY
-DESCRIPTION:Reminder
-END:VALARM
-END:VEVENT
-END:VCALENDAR
+BEGIN:VCALENDAR
+PRODID:-//Google Inc//Google Calendar 70.9054//EN
+VERSION:2.0
+CALSCALE:GREGORIAN
+METHOD:PUBLISH
+X-WR-CALNAME:LLVM Loop Optimization Discussion
+X-WR-TIMEZONE:Europe/Berlin
+BEGIN:VTIMEZONE
+TZID:America/New_York
+X-LIC-LOCATION:America/New_York
+BEGIN:DAYLIGHT
+TZOFFSETFROM:-0500
+TZOFFSETTO:-0400
+TZNAME:EDT
+DTSTART:19700308T020000
+RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
+END:DAYLIGHT
+BEGIN:STANDARD
+TZOFFSETFROM:-0400
+TZOFFSETTO:-0500
+TZNAME:EST
+DTSTART:19701101T020000
+RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
+END:STANDARD
+END:VTIMEZONE
+BEGIN:VEVENT
+DTSTART;TZID=America/New_York:20240904T110000
+DTEND;TZID=America/New_York:20240904T120000
+RRULE:FREQ=MONTHLY;BYDAY=1WE
+DTSTAMP:20240821T160951Z
+UID:58h3f0kd3aooohmeii0johh23c at google.com
+X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg
+CREATED:20240821T151507Z
+DESCRIPTION:LLVM Loop Optimization Discussion<br>Video call link: <a href="
+ https://meet.google.com/fmz-gspu-odg" target="_blank">https://meet.google.c
+ om/fmz-gspu-odg</a><br>Agenda/Minutes/Discussion: <a href="https://docs.goo
+ gle.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sh
+ aring" class="pastedDriveLink-0">https://docs.google.com/document/d/1sdzoyB
+ 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing</a>\n\n-::~:~::~:~:~
+ :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\
+ nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE)
+  +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm
+ z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp
+ ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n
+ -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
+ :~:~::~:~::-
+LAST-MODIFIED:20240821T160941Z
+SEQUENCE:0
+STATUS:CONFIRMED
+SUMMARY:LLVM Loop Optimization Discussion
+TRANSP:OPAQUE
+END:VEVENT
+BEGIN:VEVENT
+DTSTART;TZID=America/New_York:20240904T110000
+DTEND;TZID=America/New_York:20240904T120000
+DTSTAMP:20240821T160951Z
+UID:58h3f0kd3aooohmeii0johh23c at google.com
+X-GOOGLE-CONFERENCE:https://meet.google.com/fmz-gspu-odg
+RECURRENCE-ID;TZID=America/New_York:20240904T110000
+CREATED:20240821T151507Z
+DESCRIPTION:LLVM Loop Optimization Discussion<br>Video call link: <a href="
+ https://meet.google.com/fmz-gspu-odg" target="_blank">https://meet.google.c
+ om/fmz-gspu-odg</a><br>Agenda/Minutes/Discussion: <a href="https://docs.goo
+ gle.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sh
+ aring" class="pastedDriveLink-0">https://docs.google.com/document/d/1sdzoyB
+ 11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit?usp=sharing</a>\n\n-::~:~::~:~:~
+ :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\
+ nJoin with Google Meet: https://meet.google.com/fmz-gspu-odg\nOr dial: (DE)
+  +49 40 8081617343 PIN: 948106286#\nMore phone numbers: https://tel.meet/fm
+ z-gspu-odg?pin=6273693382184&hs=7\n\nLearn more about Meet at: https://supp
+ ort.google.com/a/users/answer/9282720\n\nPlease do not edit this section.\n
+ -::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
+ :~:~::~:~::-
+LAST-MODIFIED:20240821T160941Z
+SEQUENCE:0
+STATUS:CONFIRMED
+SUMMARY:LLVM Loop Optimization Discussion
+TRANSP:OPAQUE
+END:VEVENT
+END:VCALENDAR
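The recurrence rule changes from FREQ=WEEKLY;WKST=SU;INTERVAL=2;BYDAY=WE to FREQ=MONTHLY;BYDAY=1WE. In iCalendar (RFC 5545), the new rule selects the first Wednesday of every month, and DTSTART 20240904 is the first Wednesday of September 2024. The sketch below computes that day for a given month using Sakamoto's day-of-week method. The function names are hypothetical, and the sketch ignores time zones because the event is pinned to 11:00 America/New_York local time.

```cpp
// Day of week for a Gregorian date: 0 = Sunday ... 6 = Saturday
// (Sakamoto's method).
int dayOfWeek(int Year, int Month, int Day) {
  static const int Offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
  if (Month < 3)
    Year -= 1; // January and February count as months of the previous year.
  return (Year + Year / 4 - Year / 100 + Year / 400 + Offsets[Month - 1] +
          Day) % 7;
}

// Day of the month selected by BYDAY=1WE under FREQ=MONTHLY.
int firstWednesday(int Year, int Month) {
  const int Wednesday = 3;
  return 1 + (Wednesday - dayOfWeek(Year, Month, 1) + 7) % 7;
}
```

For September 2024, the 1st is a Sunday, so the first occurrence falls on the 4th, matching DTSTART.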

From 6257a98b258a3f17b78af31bf43009a559c5dd1d Mon Sep 17 00:00:00 2001
From: Adrian Vogelsgesang <avogelsgesang at salesforce.com>
Date: Wed, 21 Aug 2024 20:30:10 +0200
Subject: [PATCH 038/116] [lldb-dap] Implement `StepGranularity` for "next" and
 "step-in" (#105464)

VS Code requests the `instruction` stepping granularity if the assembly
view is currently focused. Implementing `StepGranularity` therefore lets
us properly single-step through assembly code.
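The dispatch added to `request_next` and `request_stepIn` below can be modeled in a few lines. This is a minimal sketch that assumes a plain dict stands in for the request's JSON `arguments` object. The names `has_instruction_granularity` and `step_action` are hypothetical. They only mirror the C++ `hasInstructionGranularity` helper and the `SBThread` calls chosen in each branch.

```python
from typing import Any, Mapping


def has_instruction_granularity(request_args: Mapping[str, Any]) -> bool:
    """Return True only if the optional "granularity" argument is "instruction".

    A missing or non-string granularity falls back to "statement", matching
    the C++ helper, which returns false when getString() yields no value.
    """
    value = request_args.get("granularity")
    return isinstance(value, str) and value == "instruction"


def step_action(command: str, request_args: Mapping[str, Any]) -> str:
    """Name the SBThread call a "next" or "stepIn" request maps to (hypothetical helper)."""
    if command not in ("next", "stepIn"):
        raise ValueError(f"unsupported stepping command: {command}")
    if has_instruction_granularity(request_args):
        # One machine instruction: "next" steps over calls, "stepIn" steps into them.
        step_over = command == "next"
        return f"StepInstruction(step_over={step_over})"
    # Default (statement) granularity keeps the previous behavior.
    return "StepOver()" if command == "next" else "StepInto()"


# A "next" request sent while VS Code's disassembly view has focus.
args = {"threadId": 1, "granularity": "instruction"}
print(step_action("next", args))  # StepInstruction(step_over=True)
```

Because the granularity argument is optional, requests from clients that never send it keep their statement-level behavior.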
---
 .../test/tools/lldb-dap/dap_server.py         | 12 ++++---
 .../test/tools/lldb-dap/lldbdap_testcase.py   | 12 ++++---
 .../API/tools/lldb-dap/step/TestDAP_step.py   | 13 ++++++++
 lldb/tools/lldb-dap/lldb-dap.cpp              | 33 +++++++++++++++++--
 4 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index a324af57b61df3..874383a13e2bb6 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -816,17 +816,21 @@ def request_launch(
             self.wait_for_event(filter=["process", "initialized"])
         return response
 
-    def request_next(self, threadId):
+    def request_next(self, threadId, granularity="statement"):
         if self.exit_status is not None:
             raise ValueError("request_continue called after process exited")
-        args_dict = {"threadId": threadId}
+        args_dict = {"threadId": threadId, "granularity": granularity}
         command_dict = {"command": "next", "type": "request", "arguments": args_dict}
         return self.send_recv(command_dict)
 
-    def request_stepIn(self, threadId, targetId):
+    def request_stepIn(self, threadId, targetId, granularity="statement"):
         if self.exit_status is not None:
             raise ValueError("request_stepIn called after process exited")
-        args_dict = {"threadId": threadId, "targetId": targetId}
+        args_dict = {
+            "threadId": threadId,
+            "targetId": targetId,
+            "granularity": granularity,
+        }
         command_dict = {"command": "stepIn", "type": "request", "arguments": args_dict}
         return self.send_recv(command_dict)
 
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index a312a88ebd7e58..27545816f20707 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -222,14 +222,18 @@ def set_global(self, name, value, id=None):
         """Set a top level global variable only."""
         return self.dap_server.request_setVariable(2, name, str(value), id=id)
 
-    def stepIn(self, threadId=None, targetId=None, waitForStop=True):
-        self.dap_server.request_stepIn(threadId=threadId, targetId=targetId)
+    def stepIn(
+        self, threadId=None, targetId=None, waitForStop=True, granularity="statement"
+    ):
+        self.dap_server.request_stepIn(
+            threadId=threadId, targetId=targetId, granularity=granularity
+        )
         if waitForStop:
             return self.dap_server.wait_for_stopped()
         return None
 
-    def stepOver(self, threadId=None, waitForStop=True):
-        self.dap_server.request_next(threadId=threadId)
+    def stepOver(self, threadId=None, waitForStop=True, granularity="statement"):
+        self.dap_server.request_next(threadId=threadId, granularity=granularity)
         if waitForStop:
             return self.dap_server.wait_for_stopped()
         return None
diff --git a/lldb/test/API/tools/lldb-dap/step/TestDAP_step.py b/lldb/test/API/tools/lldb-dap/step/TestDAP_step.py
index 8a1bb76340be73..42a39e3c8c080b 100644
--- a/lldb/test/API/tools/lldb-dap/step/TestDAP_step.py
+++ b/lldb/test/API/tools/lldb-dap/step/TestDAP_step.py
@@ -68,5 +68,18 @@ def test_step(self):
                     self.assertEqual(x4, x3, "verify step over variable")
                     self.assertGreater(line4, line3, "verify step over line")
                     self.assertEqual(src1, src4, "verify step over source")
+
+                    # Step a single assembly instruction.
+                    # Unfortunately, there is no portable way to verify the correct
+                    # stepping behavior here, because the generated assembly code
+                    # depends highly on the compiler, its version, the operating
+                    # system, and many more factors.
+                    self.stepOver(
+                        threadId=tid, waitForStop=True, granularity="instruction"
+                    )
+                    self.stepIn(
+                        threadId=tid, waitForStop=True, granularity="instruction"
+                    )
+
                     # only step one thread that is at the breakpoint and stop
                     break
diff --git a/lldb/tools/lldb-dap/lldb-dap.cpp b/lldb/tools/lldb-dap/lldb-dap.cpp
index f50a6c17310739..b534a48660a5f8 100644
--- a/lldb/tools/lldb-dap/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/lldb-dap.cpp
@@ -1677,6 +1677,9 @@ void request_initialize(const llvm::json::Object &request) {
   body.try_emplace("supportsCompletionsRequest", true);
   // The debug adapter supports the disassembly request.
   body.try_emplace("supportsDisassembleRequest", true);
+  // The debug adapter supports stepping granularities (argument `granularity`)
+  // for the stepping requests.
+  body.try_emplace("supportsSteppingGranularity", true);
 
   llvm::json::Array completion_characters;
   completion_characters.emplace_back(".");
@@ -1985,6 +1988,14 @@ void request_launch(const llvm::json::Object &request) {
   g_dap.SendJSON(CreateEventObject("initialized"));
 }
 
+// Check if the step-granularity is `instruction`
+static bool hasInstructionGranularity(const llvm::json::Object &requestArgs) {
+  if (std::optional<llvm::StringRef> value =
+          requestArgs.getString("granularity"))
+    return value == "instruction";
+  return false;
+}
+
 // "NextRequest": {
 //   "allOf": [ { "$ref": "#/definitions/Request" }, {
 //     "type": "object",
@@ -2012,6 +2023,11 @@ void request_launch(const llvm::json::Object &request) {
 //     "threadId": {
 //       "type": "integer",
 //       "description": "Execute 'next' for this thread."
+//     },
+//     "granularity": {
+//       "$ref": "#/definitions/SteppingGranularity",
+//       "description": "Stepping granularity. If no granularity is specified, a
+//                       granularity of `statement` is assumed."
 //     }
 //   },
 //   "required": [ "threadId" ]
@@ -2032,7 +2048,11 @@ void request_next(const llvm::json::Object &request) {
     // Remember the thread ID that caused the resume so we can set the
     // "threadCausedFocus" boolean value in the "stopped" events.
     g_dap.focus_tid = thread.GetThreadID();
-    thread.StepOver();
+    if (hasInstructionGranularity(*arguments)) {
+      thread.StepInstruction(/*step_over=*/true);
+    } else {
+      thread.StepOver();
+    }
   } else {
     response["success"] = llvm::json::Value(false);
   }
@@ -3193,6 +3213,11 @@ void request_stackTrace(const llvm::json::Object &request) {
 //     "targetId": {
 //       "type": "integer",
 //       "description": "Optional id of the target to step into."
+//     },
+//     "granularity": {
+//       "$ref": "#/definitions/SteppingGranularity",
+//       "description": "Stepping granularity. If no granularity is specified, a
+//                       granularity of `statement` is assumed."
 //     }
 //   },
 //   "required": [ "threadId" ]
@@ -3223,7 +3248,11 @@ void request_stepIn(const llvm::json::Object &request) {
     // Remember the thread ID that caused the resume so we can set the
     // "threadCausedFocus" boolean value in the "stopped" events.
     g_dap.focus_tid = thread.GetThreadID();
-    thread.StepInto(step_in_target.c_str(), run_mode);
+    if (hasInstructionGranularity(*arguments)) {
+      thread.StepInstruction(/*step_over=*/false);
+    } else {
+      thread.StepInto(step_in_target.c_str(), run_mode);
+    }
   } else {
     response["success"] = llvm::json::Value(false);
   }

From 8b4d4bee2a45f637fb4dcda49b592374e93a6480 Mon Sep 17 00:00:00 2001
From: Rahul Joshi <rjoshi at nvidia.com>
Date: Wed, 21 Aug 2024 11:38:13 -0700
Subject: [PATCH 039/116] [NFC][ADT] Remove << operators from StringRefTest
 (#105500)

- Remove ostream << operators for StringRef and StringRef pair from
  StringRefTest. Both of these are natively supported by the googletest
  framework.
---
 llvm/unittests/ADT/StringRefTest.cpp | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/llvm/unittests/ADT/StringRefTest.cpp b/llvm/unittests/ADT/StringRefTest.cpp
index 40351c99d0185c..a0529b03ae8c22 100644
--- a/llvm/unittests/ADT/StringRefTest.cpp
+++ b/llvm/unittests/ADT/StringRefTest.cpp
@@ -16,21 +16,6 @@
 #include "gtest/gtest.h"
 using namespace llvm;
 
-namespace llvm {
-
-std::ostream &operator<<(std::ostream &OS, const StringRef &S) {
-  OS << S.str();
-  return OS;
-}
-
-std::ostream &operator<<(std::ostream &OS,
-                         const std::pair<StringRef, StringRef> &P) {
-  OS << "(" << P.first << ", " << P.second << ")";
-  return OS;
-}
-
-}
-
 // Check that we can't accidentally assign a temporary std::string to a
 // StringRef. (Unfortunately we can't make use of the same thing with
 // constructors.)

From 89c556cfda4de346774c9fe547da6af9121dfa97 Mon Sep 17 00:00:00 2001
From: Krzysztof Parzyszek <Krzysztof.Parzyszek at amd.com>
Date: Wed, 21 Aug 2024 13:48:50 -0500
Subject: [PATCH 040/116] [flang][OpenMP] Follow-up to build-breakage fix
 (#102028)

Adjust the handling of a few of the new clauses.
---
 flang/lib/Lower/OpenMP/Clauses.cpp | 30 ++++++------------------------
 flang/lib/Lower/OpenMP/Clauses.h   |  2 +-
 2 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index 75054204bb19db..efac7757ca5855 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -218,9 +218,9 @@ MAKE_EMPTY_CLASS(Full, Full);
 MAKE_EMPTY_CLASS(Inbranch, Inbranch);
 MAKE_EMPTY_CLASS(Mergeable, Mergeable);
 MAKE_EMPTY_CLASS(Nogroup, Nogroup);
-// MAKE_EMPTY_CLASS(NoOpenmp, );         // missing-in-parser
-// MAKE_EMPTY_CLASS(NoOpenmpRoutines, ); // missing-in-parser
-// MAKE_EMPTY_CLASS(NoParallelism, );    // missing-in-parser
+MAKE_EMPTY_CLASS(NoOpenmp, NoOpenmp);
+MAKE_EMPTY_CLASS(NoOpenmpRoutines, NoOpenmpRoutines);
+MAKE_EMPTY_CLASS(NoParallelism, NoParallelism);
 MAKE_EMPTY_CLASS(Notinbranch, Notinbranch);
 MAKE_EMPTY_CLASS(Nowait, Nowait);
 MAKE_EMPTY_CLASS(OmpxAttribute, OmpxAttribute);
@@ -321,7 +321,6 @@ ReductionOperator makeReductionOperator(const parser::OmpReductionOperator &inp,
 // --------------------------------------------------------------------
 // Actual clauses. Each T (where tomp::T exists in ClauseT) has its "make".
 
-// Absent: missing-in-parser
 Absent make(const parser::OmpClause::Absent &inp,
             semantics::SemanticsContext &semaCtx) {
   llvm_unreachable("Unimplemented: absent");
@@ -450,7 +449,6 @@ Collapse make(const parser::OmpClause::Collapse &inp,
 
 // Compare: empty
 
-// Contains: missing-in-parser
 Contains make(const parser::OmpClause::Contains &inp,
               semantics::SemanticsContext &semaCtx) {
   llvm_unreachable("Unimplemented: contains");
@@ -714,7 +712,6 @@ Hint make(const parser::OmpClause::Hint &inp,
   return Hint{/*HintExpr=*/makeExpr(inp.v, semaCtx)};
 }
 
-// Holds: missing-in-parser
 Holds make(const parser::OmpClause::Holds &inp,
            semantics::SemanticsContext &semaCtx) {
   llvm_unreachable("Unimplemented: holds");
@@ -897,24 +894,9 @@ Nontemporal make(const parser::OmpClause::Nontemporal &inp,
   return Nontemporal{/*List=*/makeList(inp.v, makeObjectFn(semaCtx))};
 }
 
-// NoOpenmp: missing-in-parser
-NoOpenmp make(const parser::OmpClause::NoOpenmp &inp,
-              semantics::SemanticsContext &semaCtx) {
-  llvm_unreachable("Unimplemented: no_openmp");
-}
-
-// NoOpenmpRoutines: missing-in-parser
-NoOpenmpRoutines make(const parser::OmpClause::NoOpenmpRoutines &inp,
-                      semantics::SemanticsContext &semaCtx) {
-  llvm_unreachable("Unimplemented: no_openmp_routines");
-}
-
-// NoParallelism: missing-in-parser
-NoParallelism make(const parser::OmpClause::NoParallelism &inp,
-                   semantics::SemanticsContext &semaCtx) {
-  llvm_unreachable("Unimplemented: no_parallelism");
-}
-
+// NoOpenmp: empty
+// NoOpenmpRoutines: empty
+// NoParallelism: empty
 // Notinbranch: empty
 
 Novariants make(const parser::OmpClause::Novariants &inp,
diff --git a/flang/lib/Lower/OpenMP/Clauses.h b/flang/lib/Lower/OpenMP/Clauses.h
index c7874935d8605a..51bf0eab0f8d07 100644
--- a/flang/lib/Lower/OpenMP/Clauses.h
+++ b/flang/lib/Lower/OpenMP/Clauses.h
@@ -175,8 +175,8 @@ using At = tomp::clause::AtT<TypeTy, IdTy, ExprTy>;
 using Bind = tomp::clause::BindT<TypeTy, IdTy, ExprTy>;
 using Capture = tomp::clause::CaptureT<TypeTy, IdTy, ExprTy>;
 using Collapse = tomp::clause::CollapseT<TypeTy, IdTy, ExprTy>;
-using Contains = tomp::clause::ContainsT<TypeTy, IdTy, ExprTy>;
 using Compare = tomp::clause::CompareT<TypeTy, IdTy, ExprTy>;
+using Contains = tomp::clause::ContainsT<TypeTy, IdTy, ExprTy>;
 using Copyin = tomp::clause::CopyinT<TypeTy, IdTy, ExprTy>;
 using Copyprivate = tomp::clause::CopyprivateT<TypeTy, IdTy, ExprTy>;
 using Defaultmap = tomp::clause::DefaultmapT<TypeTy, IdTy, ExprTy>;

From 6ec3130a38e6982a61e7fa74bd5223c95c0bb918 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee <kyulee at meta.com>
Date: Wed, 21 Aug 2024 12:21:43 -0700
Subject: [PATCH 041/116] [CGData] Fix tests for sed without using options
 (#105546)

This fixes a build issue for AIX --
https://github.com/llvm/llvm-project/pull/101461.
---
 llvm/test/tools/llvm-cgdata/merge-archive.test | 8 ++++----
 llvm/test/tools/llvm-cgdata/merge-concat.test  | 6 +++---
 llvm/test/tools/llvm-cgdata/merge-double.test  | 8 ++++----
 llvm/test/tools/llvm-cgdata/merge-single.test  | 4 ++--
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/llvm/test/tools/llvm-cgdata/merge-archive.test b/llvm/test/tools/llvm-cgdata/merge-archive.test
index d70ac7c3c938d8..03eb9106b54562 100644
--- a/llvm/test/tools/llvm-cgdata/merge-archive.test
+++ b/llvm/test/tools/llvm-cgdata/merge-archive.test
@@ -8,13 +8,13 @@ RUN: split-file %s %t
 # Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
 RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-1-bytes.txt
-RUN: sed -ie "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-1.ll
+RUN: sed "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-1-template.ll > %t/merge-1.ll
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-1.ll -o %t/merge-1.o
 
 # Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
 RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-2-bytes.txt
-RUN: sed -ie "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-2.ll
+RUN: sed "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-2-template.ll > %t/merge-2.ll
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-2.ll -o %t/merge-2.o
 
 # Make an archive from two object files
@@ -66,7 +66,7 @@ TREE-NEXT: ...
   SuccessorIds:    [  ]
 ...
 
-;--- merge-1.ll
+;--- merge-1-template.ll
 @.data = private unnamed_addr constant [72 x i8] c"<RAW_1_BYTES>", section "__DATA,__llvm_outline"
 
 
@@ -86,5 +86,5 @@ TREE-NEXT: ...
   SuccessorIds:    [  ]
 ...
 
-;--- merge-2.ll
+;--- merge-2-template.ll
 @.data = private unnamed_addr constant [72 x i8] c"<RAW_2_BYTES>", section "__DATA,__llvm_outline"
diff --git a/llvm/test/tools/llvm-cgdata/merge-concat.test b/llvm/test/tools/llvm-cgdata/merge-concat.test
index cc39c673cf9a5e..ac0e7a6e29e878 100644
--- a/llvm/test/tools/llvm-cgdata/merge-concat.test
+++ b/llvm/test/tools/llvm-cgdata/merge-concat.test
@@ -9,10 +9,10 @@ RUN: split-file %s %t
 # Concatenate them in merge-concat.ll
 RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-1-bytes.txt
-RUN: sed -ie "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-concat.ll
+RUN: sed "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-concat-template.ll > %t/merge-concat-template-2.ll
 RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-2-bytes.txt
-RUN: sed -ie "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-concat.ll
+RUN: sed "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-concat-template-2.ll > %t/merge-concat.ll
 
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-concat.ll -o %t/merge-concat.o
 RUN: llvm-cgdata --merge %t/merge-concat.o -o %t/merge-concat.cgdata
@@ -76,7 +76,7 @@ TREE-NEXT: ...
   SuccessorIds:    [  ]
 ...
 
-;--- merge-concat.ll
+;--- merge-concat-template.ll
 
 ; In an linked executable (as opposed to an object file), cgdata in __llvm_outline might be concatenated. Although this is not a typical workflow, we simply support this case to parse cgdata that is concatenated. In other words, the following two trees are encoded back-to-back in a binary format.
 @.data1 = private unnamed_addr constant [72 x i8] c"<RAW_1_BYTES>", section "__DATA,__llvm_outline"
diff --git a/llvm/test/tools/llvm-cgdata/merge-double.test b/llvm/test/tools/llvm-cgdata/merge-double.test
index 950a88c66f7bb4..1ae8064291019e 100644
--- a/llvm/test/tools/llvm-cgdata/merge-double.test
+++ b/llvm/test/tools/llvm-cgdata/merge-double.test
@@ -8,13 +8,13 @@ RUN: split-file %s %t
 # Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
 RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-1-bytes.txt
-RUN: sed -ie "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-1.ll
+RUN: sed "s/<RAW_1_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-1-template.ll > %t/merge-1.ll
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-1.ll -o %t/merge-1.o
 
 # Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
 RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-2-bytes.txt
-RUN: sed -ie "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-2.ll
+RUN: sed "s/<RAW_2_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-2-template.ll > %t/merge-2.ll
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-2.ll -o %t/merge-2.o
 
 # Merge two object files into the codegen data file.
@@ -64,7 +64,7 @@ TREE-NEXT: ...
   SuccessorIds:    [  ]
 ...
 
-;--- merge-1.ll
+;--- merge-1-template.ll
 @.data = private unnamed_addr constant [72 x i8] c"<RAW_1_BYTES>", section "__DATA,__llvm_outline"
 
 ;--- raw-2.cgtext
@@ -83,5 +83,5 @@ TREE-NEXT: ...
   SuccessorIds:    [  ]
 ...
 
-;--- merge-2.ll
+;--- merge-2-template.ll
 @.data = private unnamed_addr constant [72 x i8] c"<RAW_2_BYTES>", section "__DATA,__llvm_outline"
diff --git a/llvm/test/tools/llvm-cgdata/merge-single.test b/llvm/test/tools/llvm-cgdata/merge-single.test
index 783c7b979f541e..47e3cb3f4f50fb 100644
--- a/llvm/test/tools/llvm-cgdata/merge-single.test
+++ b/llvm/test/tools/llvm-cgdata/merge-single.test
@@ -15,7 +15,7 @@ RUN: llvm-cgdata --show %t/merge-empty.cgdata | count 0
 RUN: llvm-cgdata --convert --format binary %t/raw-single.cgtext -o %t/raw-single.cgdata
 RUN: od -t x1 -j 24 -An %t/raw-single.cgdata | tr -d '\n\r\t' | sed 's/[ ]*$//' | sed 's/[ ][ ]*/\\\\/g' > %t/raw-single-bytes.txt
 
-RUN: sed -ie "s/<RAW_1_BYTES>/$(cat %t/raw-single-bytes.txt)/g" %t/merge-single.ll
+RUN: sed "s/<RAW_1_BYTES>/$(cat %t/raw-single-bytes.txt)/g" %t/merge-single-template.ll > %t/merge-single.ll
 RUN: llc -filetype=obj -mtriple arm64-apple-darwin %t/merge-single.ll -o %t/merge-single.o
 
 # Merge an object file having cgdata (__llvm_outline)
@@ -45,5 +45,5 @@ CHECK-NEXT:  Depth: 2
   SuccessorIds:    [  ]
 ...
 
-;--- merge-single.ll
+;--- merge-single-template.ll
 @.data = private unnamed_addr constant [72 x i8] c"<RAW_1_BYTES>", section "__DATA,__llvm_outline"

From e31252bf54dedadfe78b36d07ea6084156faa38a Mon Sep 17 00:00:00 2001
From: Alexey Bataev <a.bataev at outlook.com>
Date: Wed, 21 Aug 2024 11:47:00 -0700
Subject: [PATCH 042/116] [SLP]Fix PR105120: fix the order of phi nodes
 vectorization.

The operands of the phi nodes should be vectorized in the same order in
which they were created. Otherwise the compiler may crash when trying
to build dependencies for gather/buildvector nodes that contain
non-schedulable instructions.

Fixes https://github.com/llvm/llvm-project/issues/105120
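The `CreateOperandNodes` lambda in the diff below amounts to a two-pass ordering. The following Python model is a simplified sketch. It assumes each operand bundle can be summarized by a list of per-lane opcode names. The name `operand_build_order` is hypothetical, and the "every lane is a phi" test only approximates `getSameOpcode(...)` returning a non-alternate PHI state.

```python
from typing import List, Sequence


def operand_build_order(operands: Sequence[Sequence[str]]) -> List[int]:
    """Order in which operand bundles are recursed into (simplified model).

    Each bundle holds one opcode name per vectorized lane. Non-empty bundles
    that are not all-PHI are built immediately, in their original order.
    All-PHI bundles are postponed and built afterwards, also in original
    order. Empty bundles are skipped entirely.
    """
    first_pass: List[int] = []
    postponed_phis: List[int] = []
    for i, bundle in enumerate(operands):
        if not bundle:
            continue  # nothing to build for this operand
        if all(op == "phi" for op in bundle):
            postponed_phis.append(i)  # PHI operands are built last
        else:
            first_pass.append(i)
    return first_pass + postponed_phis


# Operands 0 and 2 are PHI bundles; operand 1 is an add bundle.
print(operand_build_order([["phi", "phi"], ["add", "add"], ["phi", "phi"]]))  # [1, 0, 2]
```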
---
 .../Transforms/Vectorize/SLPVectorizer.cpp    | 24 ++++++++++--
 .../X86/phi-nodes-as-operand-reorder.ll       | 38 +++++++++++++++++++
 2 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/Transforms/SLPVectorizer/X86/phi-nodes-as-operand-reorder.ll

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index dee6d688b1b903..848e0de20e7b6c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7227,6 +7227,22 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
 
   unsigned ShuffleOrOp = S.isAltShuffle() ?
                 (unsigned) Instruction::ShuffleVector : S.getOpcode();
+  auto CreateOperandNodes = [&](TreeEntry *TE, const auto &Operands) {
+    // Postpone PHI nodes creation
+    SmallVector<unsigned> PHIOps;
+    for (unsigned I : seq<unsigned>(Operands.size())) {
+      ArrayRef<Value *> Op = Operands[I];
+      if (Op.empty())
+        continue;
+      InstructionsState S = getSameOpcode(Op, *TLI);
+      if (S.getOpcode() != Instruction::PHI || S.isAltShuffle())
+        buildTree_rec(Op, Depth + 1, {TE, I});
+      else
+        PHIOps.push_back(I);
+    }
+    for (unsigned I : PHIOps)
+      buildTree_rec(Operands[I], Depth + 1, {TE, I});
+  };
   switch (ShuffleOrOp) {
     case Instruction::PHI: {
       auto *PH = cast<PHINode>(VL0);
@@ -7238,10 +7254,12 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
       // Keeps the reordered operands to avoid code duplication.
       PHIHandler Handler(*DT, PH, VL);
       Handler.buildOperands();
-      for (unsigned I : seq<unsigned>(0, PH->getNumOperands()))
+      for (unsigned I : seq<unsigned>(PH->getNumOperands()))
         TE->setOperand(I, Handler.getOperands(I));
-      for (unsigned I : seq<unsigned>(0, PH->getNumOperands()))
-        buildTree_rec(Handler.getOperands(I), Depth + 1, {TE, I});
+      SmallVector<ArrayRef<Value *>> Operands(PH->getNumOperands());
+      for (unsigned I : seq<unsigned>(PH->getNumOperands()))
+        Operands[I] = Handler.getOperands(I);
+      CreateOperandNodes(TE, Operands);
       return;
     }
     case Instruction::ExtractValue:
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/phi-nodes-as-operand-reorder.ll b/llvm/test/Transforms/SLPVectorizer/X86/phi-nodes-as-operand-reorder.ll
new file mode 100644
index 00000000000000..51ce970bf06bc8
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/X86/phi-nodes-as-operand-reorder.ll
@@ -0,0 +1,38 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-99999 < %s | FileCheck %s
+
+define void @test() {
+; CHECK-LABEL: define void @test() {
+; CHECK-NEXT:  [[BB:.*]]:
+; CHECK-NEXT:    br label %[[BB1:.*]]
+; CHECK:       [[BB1]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = phi <2 x i32> [ zeroinitializer, %[[BB]] ], [ [[TMP3:%.*]], %[[BB3:.*]] ]
+; CHECK-NEXT:    br i1 false, label %[[BB6:.*]], label %[[BB3]]
+; CHECK:       [[BB3]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> <i32 0, i32 poison>, <2 x i32> <i32 2, i32 1>
+; CHECK-NEXT:    [[TMP2:%.*]] = add <2 x i32> zeroinitializer, [[TMP1]]
+; CHECK-NEXT:    [[TMP3]] = add <2 x i32> zeroinitializer, [[TMP1]]
+; CHECK-NEXT:    br i1 false, label %[[BB6]], label %[[BB1]]
+; CHECK:       [[BB6]]:
+; CHECK-NEXT:    [[TMP4:%.*]] = phi <2 x i32> [ [[TMP0]], %[[BB1]] ], [ [[TMP2]], %[[BB3]] ]
+; CHECK-NEXT:    ret void
+;
+bb:
+  br label %bb1
+
+bb1:
+  %phi = phi i32 [ 0, %bb ], [ %add5, %bb3 ]
+  %phi2 = phi i32 [ 0, %bb ], [ %add, %bb3 ]
+  br i1 false, label %bb6, label %bb3
+
+bb3:
+  %add = add i32 0, 0
+  %add4 = add i32 0, 0
+  %add5 = add i32 %phi, 0
+  br i1 false, label %bb6, label %bb1
+
+bb6:
+  %phi7 = phi i32 [ %phi2, %bb1 ], [ %add4, %bb3 ]
+  %phi8 = phi i32 [ %phi, %bb1 ], [ %add5, %bb3 ]
+  ret void
+}

From b765fdd997be9ff0afb6de87077cd53d5f3d349c Mon Sep 17 00:00:00 2001
From: Alexey Bataev <a.bataev at outlook.com>
Date: Wed, 21 Aug 2024 15:23:47 -0400
Subject: [PATCH 043/116] [SLP]Try to keep scalars, used in phi nodes, if phi
 nodes from same block are vectorized.

Before vectorizing the PHI nodes, the compiler sorts them by the opcodes
of their operands. If a scalar is replaced by an extractelement during
vectorization, this breaks the sorting and prevents some further
vectorization attempts. This patch improves the situation by doing extra
analysis of the scalars and keeping a scalar if it is used in another
(external) PHI node in the same block.
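The `IsProfitablePHIUser` condition added to `getTreeCost` below combines several checks. The following is a simplified Python model of that predicate, offered as a sketch under these assumptions:

- Costs are plain integers.
- `TCC_BASIC = 1` mirrors `TTI::TCC_Basic`.
- `root_block` stands for the parent block of the root PHI bundle's main instruction.
- `uses_limit` stands in for the `UsesLimit` constant in SLPVectorizer.cpp; the value used in the example is illustrative only.

The `ScalarUser` record and the function name are hypothetical. In the real code, this check only runs when the scalar can be used as a scalar at all. When the check holds, `KeepScalar` is forced to true; otherwise the older heuristic applies.

```python
from dataclasses import dataclass
from typing import Sequence

TCC_BASIC = 1  # assumed to mirror TargetTransformInfo::TCC_Basic


@dataclass
class ScalarUser:
    """Hypothetical summary of one user of the scalar being considered."""
    is_phi: bool      # the user is a PHI node
    block: str        # name of the user's basic block
    vectorized: bool  # the user already has a tree entry (getTreeEntry(U) != nullptr)


def is_profitable_phi_user(*, scalar_cost: int, extra_cost: int, root_is_phi: bool,
                           root_block: str, root_lanes: int,
                           users: Sequence[ScalarUser], num_uses: int,
                           uses_limit: int, externally_used_scalars: int) -> bool:
    """Simplified model of the IsProfitablePHIUser condition."""
    keep_scalar = scalar_cost <= extra_cost
    # Keeping the scalar is already no more expensive, or it costs at most one
    # basic instruction more and the root PHI bundle has more than two lanes.
    cost_ok = keep_scalar or (scalar_cost - extra_cost <= TCC_BASIC and root_lanes > 2)
    # Every user must be a PHI in the root PHIs' block or already vectorized.
    users_ok = all((u.is_phi and u.block == root_block) or u.vectorized for u in users)
    return (cost_ok and root_is_phi and num_uses < uses_limit and users_ok
            and externally_used_scalars <= 2)


phi_in_root_block = ScalarUser(is_phi=True, block="for.body", vectorized=False)
print(is_profitable_phi_user(scalar_cost=1, extra_cost=1, root_is_phi=True,
                             root_block="for.body", root_lanes=2,
                             users=[phi_in_root_block], num_uses=1,
                             uses_limit=64, externally_used_scalars=1))  # True
```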

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/103923
---
 .../Transforms/Vectorize/SLPVectorizer.cpp    | 27 +++++++++-
 llvm/test/Transforms/SLPVectorizer/X86/phi.ll | 53 +++++++++----------
 2 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 848e0de20e7b6c..8f70a43465b8ac 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -10930,8 +10930,31 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
       if (CanBeUsedAsScalar) {
         InstructionCost ScalarCost = TTI->getInstructionCost(Inst, CostKind);
         bool KeepScalar = ScalarCost <= ExtraCost;
-        if (KeepScalar && ScalarCost != TTI::TCC_Free &&
-            ExtraCost - ScalarCost <= TTI::TCC_Basic) {
+        // Try to keep original scalar if the user is the phi node from the same
+        // block as the root phis, currently vectorized. It allows to keep
+        // better ordering info of PHIs, being vectorized currently.
+        bool IsProfitablePHIUser =
+            (KeepScalar || (ScalarCost - ExtraCost <= TTI::TCC_Basic &&
+                            VectorizableTree.front()->Scalars.size() > 2)) &&
+            VectorizableTree.front()->getOpcode() == Instruction::PHI &&
+            !Inst->hasNUsesOrMore(UsesLimit) &&
+            none_of(Inst->users(),
+                    [&](User *U) {
+                      auto *PHIUser = dyn_cast<PHINode>(U);
+                      return (!PHIUser ||
+                              PHIUser->getParent() !=
+                                  cast<Instruction>(
+                                      VectorizableTree.front()->getMainOp())
+                                      ->getParent()) &&
+                             !getTreeEntry(U);
+                    }) &&
+            count_if(Entry->Scalars, [&](Value *V) {
+              return ValueToExtUses->contains(V);
+            }) <= 2;
+        if (IsProfitablePHIUser) {
+          KeepScalar = true;
+        } else if (KeepScalar && ScalarCost != TTI::TCC_Free &&
+                   ExtraCost - ScalarCost <= TTI::TCC_Basic) {
           unsigned ScalarUsesCount = count_if(Entry->Scalars, [&](Value *V) {
             return ValueToExtUses->contains(V);
           });
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/phi.ll b/llvm/test/Transforms/SLPVectorizer/X86/phi.ll
index 495a503311ab9e..96151e0bd6c418 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/phi.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/phi.ll
@@ -136,42 +136,41 @@ for.end:                                          ; preds = %for.body
 define float @foo3(ptr nocapture readonly %A) #0 {
 ; CHECK-LABEL: @foo3(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[A:%.*]], align 4
-; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 1
+; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1
+; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 4
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <2 x i32> <i32 poison, i32 0>
-; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP0]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[TMP4:%.*]] = phi <4 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP13:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[TMP5:%.*]] = phi <2 x float> [ [[TMP3]], [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
-; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[TMP6]], 7.000000e+00
+; CHECK-NEXT:    [[R_052:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = phi <4 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = phi <2 x float> [ [[TMP0]], [[ENTRY]] ], [ [[TMP8:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
+; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[TMP5]], 7.000000e+00
 ; CHECK-NEXT:    [[ADD6]] = fadd float [[R_052]], [[MUL]]
-; CHECK-NEXT:    [[TMP7:%.*]] = add nsw i64 [[INDVARS_IV]], 2
-; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP7]]
-; CHECK-NEXT:    [[TMP8:%.*]] = load float, ptr [[ARRAYIDX14]], align 4
+; CHECK-NEXT:    [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 2
+; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP6]]
+; CHECK-NEXT:    [[TMP7:%.*]] = load float, ptr [[ARRAYIDX14]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
 ; CHECK-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
-; CHECK-NEXT:    [[TMP9]] = load <2 x float>, ptr [[ARRAYIDX19]], align 4
-; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP9]], <4 x i32> <i32 1, i32 poison, i32 2, i32 3>
-; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> [[TMP10]], float [[TMP8]], i32 1
-; CHECK-NEXT:    [[TMP12:%.*]] = fmul <4 x float> [[TMP11]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
-; CHECK-NEXT:    [[TMP13]] = fadd <4 x float> [[TMP4]], [[TMP12]]
-; CHECK-NEXT:    [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
-; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP14]], 121
+; CHECK-NEXT:    [[TMP8]] = load <2 x float>, ptr [[ARRAYIDX19]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP8]], <4 x i32> <i32 1, i32 poison, i32 2, i32 3>
+; CHECK-NEXT:    [[TMP10:%.*]] = insertelement <4 x float> [[TMP9]], float [[TMP7]], i32 1
+; CHECK-NEXT:    [[TMP11:%.*]] = fmul <4 x float> [[TMP10]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
+; CHECK-NEXT:    [[TMP12]] = fadd <4 x float> [[TMP3]], [[TMP11]]
+; CHECK-NEXT:    [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP13]], 121
 ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
 ; CHECK:       for.end:
-; CHECK-NEXT:    [[TMP15:%.*]] = extractelement <4 x float> [[TMP13]], i32 0
-; CHECK-NEXT:    [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP15]]
-; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <4 x float> [[TMP13]], i32 1
-; CHECK-NEXT:    [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP16]]
-; CHECK-NEXT:    [[TMP17:%.*]] = extractelement <4 x float> [[TMP13]], i32 2
-; CHECK-NEXT:    [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP17]]
-; CHECK-NEXT:    [[TMP18:%.*]] = extractelement <4 x float> [[TMP13]], i32 3
-; CHECK-NEXT:    [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP18]]
+; CHECK-NEXT:    [[TMP14:%.*]] = extractelement <4 x float> [[TMP12]], i32 0
+; CHECK-NEXT:    [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP14]]
+; CHECK-NEXT:    [[TMP15:%.*]] = extractelement <4 x float> [[TMP12]], i32 1
+; CHECK-NEXT:    [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP15]]
+; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <4 x float> [[TMP12]], i32 2
+; CHECK-NEXT:    [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP16]]
+; CHECK-NEXT:    [[TMP17:%.*]] = extractelement <4 x float> [[TMP12]], i32 3
+; CHECK-NEXT:    [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP17]]
 ; CHECK-NEXT:    ret float [[ADD31]]
 ;
 entry:

From 4b35624ce0ac5b487d39880e75b5d85f4d49eec0 Mon Sep 17 00:00:00 2001
From: Sander de Smalen <sander.desmalen at arm.com>
Date: Wed, 21 Aug 2024 20:34:03 +0100
Subject: [PATCH 044/116] [AArch64] Add SVE lowering of fixed-length UABD/SABD
 (#104991)

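Each new test widens two `i8` vectors (`sext` or `zext`), subtracts, applies `llvm.abs`, and truncates back to `i8`. The CHECK lines expect this to become a single `sabd`/`uabd`. Per lane, that pattern is the signed or unsigned absolute difference (`ISD::ABDS`/`ISD::ABDU`). The following C++ sketch models one lane under that reading. The function names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of one i8 lane of trunc(abs(sext(a) - sext(b))).
// Widening to int first means the subtraction cannot overflow.
uint8_t sabd_lane(int8_t a, int8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);  // sext + sub
  return static_cast<uint8_t>(std::abs(wide));           // abs + trunc
}

// Hypothetical model of one i8 lane of trunc(abs(zext(a) - zext(b))).
uint8_t uabd_lane(uint8_t a, uint8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);  // zext + sub
  return static_cast<uint8_t>(std::abs(wide));           // abs + trunc
}

// Reference absolute difference, max(a, b) - min(a, b), kept to 8 bits.
uint8_t abd_ref(int a, int b) {
  return static_cast<uint8_t>(a > b ? a - b : b - a);
}
```

Because the operands are widened before the subtraction, the difference stays within [-255, 255] and never reaches the minimum value of the wide type. The `i1 true` flag on `llvm.abs` (minimum value is poison) therefore does not affect these tests.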
---
 .../Target/AArch64/AArch64ISelLowering.cpp    |   2 +
 .../AArch64/sve-fixed-length-int-abd.ll       | 183 +++++++++++
 ...sve-streaming-mode-fixed-length-int-abd.ll | 292 ++++++++++++++++++
 3 files changed, 477 insertions(+)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-int-abd.ll
 create mode 100644 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-abd.ll

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index dbe9413f05d013..e98b430e62389b 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2055,6 +2055,8 @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
   bool PreferSVE = !PreferNEON && Subtarget->isSVEAvailable();
 
   // Lower fixed length vector operations to scalable equivalents.
+  setOperationAction(ISD::ABDS, VT, Default);
+  setOperationAction(ISD::ABDU, VT, Default);
   setOperationAction(ISD::ABS, VT, Default);
   setOperationAction(ISD::ADD, VT, Default);
   setOperationAction(ISD::AND, VT, Default);
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-int-abd.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-int-abd.ll
new file mode 100644
index 00000000000000..08a974fa2d9f40
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-int-abd.ll
@@ -0,0 +1,183 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -aarch64-sve-vector-bits-min=256  < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256
+; RUN: llc -aarch64-sve-vector-bits-min=512  < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
+; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Don't use SVE for 128-bit vectors.
+define void @sabd_v16i8_v16i16(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v16i8_v16i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    ldr q1, [x1]
+; CHECK-NEXT:    sabd v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <16 x i8>, ptr %a
+  %b.ld = load <16 x i8>, ptr %b
+  %a.sext = sext <16 x i8> %a.ld to <16 x i16>
+  %b.sext = sext <16 x i8> %b.ld to <16 x i16>
+  %sub = sub <16 x i16> %a.sext, %b.sext
+  %abs = call <16 x i16> @llvm.abs.v16i16(<16 x i16> %sub, i1 true)
+  %trunc = trunc <16 x i16> %abs to <16 x i8>
+  store <16 x i8> %trunc, ptr %a
+  ret void
+}
+
+; Don't use SVE for 128-bit vectors.
+define void @sabd_v16i8_v16i32(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v16i8_v16i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    ldr q1, [x1]
+; CHECK-NEXT:    sabd v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <16 x i8>, ptr %a
+  %b.ld = load <16 x i8>, ptr %b
+  %a.sext = sext <16 x i8> %a.ld to <16 x i32>
+  %b.sext = sext <16 x i8> %b.ld to <16 x i32>
+  %sub = sub <16 x i32> %a.sext, %b.sext
+  %abs = call <16 x i32> @llvm.abs.v16i32(<16 x i32> %sub, i1 true)
+  %trunc = trunc <16 x i32> %abs to <16 x i8>
+  store <16 x i8> %trunc, ptr %a
+  ret void
+}
+
+; Don't use SVE for 128-bit vectors.
+define void @sabd_v16i8_v16i64(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v16i8_v16i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    ldr q1, [x1]
+; CHECK-NEXT:    sabd v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <16 x i8>, ptr %a
+  %b.ld = load <16 x i8>, ptr %b
+  %a.sext = sext <16 x i8> %a.ld to <16 x i64>
+  %b.sext = sext <16 x i8> %b.ld to <16 x i64>
+  %sub = sub <16 x i64> %a.sext, %b.sext
+  %abs = call <16 x i64> @llvm.abs.v16i64(<16 x i64> %sub, i1 true)
+  %trunc = trunc <16 x i64> %abs to <16 x i8>
+  store <16 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @sabd_v32i8_v32i16(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v32i8_v32i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl32
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; CHECK-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    st1b { z0.b }, p0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <32 x i8>, ptr %a
+  %b.ld = load <32 x i8>, ptr %b
+  %a.sext = sext <32 x i8> %a.ld to <32 x i16>
+  %b.sext = sext <32 x i8> %b.ld to <32 x i16>
+  %sub = sub <32 x i16> %a.sext, %b.sext
+  %abs = call <32 x i16> @llvm.abs.v32i16(<32 x i16> %sub, i1 true)
+  %trunc = trunc <32 x i16> %abs to <32 x i8>
+  store <32 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @uabd_v32i8_v32i16(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: uabd_v32i8_v32i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl32
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; CHECK-NEXT:    uabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    st1b { z0.b }, p0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <32 x i8>, ptr %a
+  %b.ld = load <32 x i8>, ptr %b
+  %a.zext = zext <32 x i8> %a.ld to <32 x i16>
+  %b.zext = zext <32 x i8> %b.ld to <32 x i16>
+  %sub = sub <32 x i16> %a.zext, %b.zext
+  %abs = call <32 x i16> @llvm.abs.v32i16(<32 x i16> %sub, i1 true)
+  %trunc = trunc <32 x i16> %abs to <32 x i8>
+  store <32 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @sabd_v32i8_v32i32(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v32i8_v32i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl32
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; CHECK-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    st1b { z0.b }, p0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <32 x i8>, ptr %a
+  %b.ld = load <32 x i8>, ptr %b
+  %a.sext = sext <32 x i8> %a.ld to <32 x i32>
+  %b.sext = sext <32 x i8> %b.ld to <32 x i32>
+  %sub = sub <32 x i32> %a.sext, %b.sext
+  %abs = call <32 x i32> @llvm.abs.v32i32(<32 x i32> %sub, i1 true)
+  %trunc = trunc <32 x i32> %abs to <32 x i8>
+  store <32 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @sabd_v32i8_v32i64(ptr %a, ptr %b) #0 {
+; CHECK-LABEL: sabd_v32i8_v32i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl32
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; CHECK-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; CHECK-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    st1b { z0.b }, p0, [x0]
+; CHECK-NEXT:    ret
+  %a.ld = load <32 x i8>, ptr %a
+  %b.ld = load <32 x i8>, ptr %b
+  %a.sext = sext <32 x i8> %a.ld to <32 x i64>
+  %b.sext = sext <32 x i8> %b.ld to <32 x i64>
+  %sub = sub <32 x i64> %a.sext, %b.sext
+  %abs = call <32 x i64> @llvm.abs.v32i64(<32 x i64> %sub, i1 true)
+  %trunc = trunc <32 x i64> %abs to <32 x i8>
+  store <32 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @sabd_v64i8_v64i64(ptr %a, ptr %b) #0 {
+; VBITS_GE_256-LABEL: sabd_v64i8_v64i64:
+; VBITS_GE_256:       // %bb.0:
+; VBITS_GE_256-NEXT:    ptrue p0.b, vl32
+; VBITS_GE_256-NEXT:    mov w8, #32 // =0x20
+; VBITS_GE_256-NEXT:    ld1b { z0.b }, p0/z, [x0, x8]
+; VBITS_GE_256-NEXT:    ld1b { z1.b }, p0/z, [x1, x8]
+; VBITS_GE_256-NEXT:    ld1b { z2.b }, p0/z, [x0]
+; VBITS_GE_256-NEXT:    ld1b { z3.b }, p0/z, [x1]
+; VBITS_GE_256-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; VBITS_GE_256-NEXT:    movprfx z1, z2
+; VBITS_GE_256-NEXT:    sabd z1.b, p0/m, z1.b, z3.b
+; VBITS_GE_256-NEXT:    st1b { z0.b }, p0, [x0, x8]
+; VBITS_GE_256-NEXT:    st1b { z1.b }, p0, [x0]
+; VBITS_GE_256-NEXT:    ret
+;
+; VBITS_GE_512-LABEL: sabd_v64i8_v64i64:
+; VBITS_GE_512:       // %bb.0:
+; VBITS_GE_512-NEXT:    ptrue p0.b, vl64
+; VBITS_GE_512-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; VBITS_GE_512-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; VBITS_GE_512-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; VBITS_GE_512-NEXT:    st1b { z0.b }, p0, [x0]
+; VBITS_GE_512-NEXT:    ret
+  %a.ld = load <64 x i8>, ptr %a
+  %b.ld = load <64 x i8>, ptr %b
+  %a.sext = sext <64 x i8> %a.ld to <64 x i64>
+  %b.sext = sext <64 x i8> %b.ld to <64 x i64>
+  %sub = sub <64 x i64> %a.sext, %b.sext
+  %abs = call <64 x i64> @llvm.abs.v64i64(<64 x i64> %sub, i1 true)
+  %trunc = trunc <64 x i64> %abs to <64 x i8>
+  store <64 x i8> %trunc, ptr %a
+  ret void
+}
+
+attributes #0 = { "target-features"="+neon,+sve" }
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-abd.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-abd.ll
new file mode 100644
index 00000000000000..2dd64bc7df189a
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-abd.ll
@@ -0,0 +1,292 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mattr=+sve -force-streaming-compatible  < %s | FileCheck %s
+; RUN: llc -mattr=+sme -force-streaming  < %s | FileCheck %s
+; RUN: llc -force-streaming-compatible < %s | FileCheck %s --check-prefix=NONEON-NOSVE
+
+target triple = "aarch64-unknown-linux-gnu"
+
+define void @uabd_v16i8_v16i16(ptr %a, ptr %b) {
+; CHECK-LABEL: uabd_v16i8_v16i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl16
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    ldr q1, [x1]
+; CHECK-NEXT:    uabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
+;
+; NONEON-NOSVE-LABEL: uabd_v16i8_v16i16:
+; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    ldr q0, [x1]
+; NONEON-NOSVE-NEXT:    ldr q1, [x0]
+; NONEON-NOSVE-NEXT:    stp q1, q0, [sp, #-48]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 48
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #31]
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #15]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #14]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #47]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #30]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #13]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #46]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #29]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #12]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #45]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #28]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #11]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #44]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #27]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #10]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #43]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #26]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #9]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #42]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #25]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #41]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #24]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #7]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #40]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #23]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #6]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #39]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #22]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #5]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #38]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #21]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #4]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #37]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #20]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #3]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #36]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #19]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #2]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #35]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #18]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp, #1]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #34]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #17]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrb w9, [sp]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #33]
+; NONEON-NOSVE-NEXT:    ldrb w8, [sp, #16]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, hi
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #32]
+; NONEON-NOSVE-NEXT:    str q0, [x0]
+; NONEON-NOSVE-NEXT:    add sp, sp, #48
+; NONEON-NOSVE-NEXT:    ret
+  %a.ld = load <16 x i8>, ptr %a
+  %b.ld = load <16 x i8>, ptr %b
+  %a.zext = zext <16 x i8> %a.ld to <16 x i16>
+  %b.zext = zext <16 x i8> %b.ld to <16 x i16>
+  %sub = sub <16 x i16> %a.zext, %b.zext
+  %abs = call <16 x i16> @llvm.abs.v16i16(<16 x i16> %sub, i1 true)
+  %trunc = trunc <16 x i16> %abs to <16 x i8>
+  store <16 x i8> %trunc, ptr %a
+  ret void
+}
+
+define void @sabd_v16i8_v16i16(ptr %a, ptr %b) {
+; CHECK-LABEL: sabd_v16i8_v16i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b, vl16
+; CHECK-NEXT:    ldr q0, [x0]
+; CHECK-NEXT:    ldr q1, [x1]
+; CHECK-NEXT:    sabd z0.b, p0/m, z0.b, z1.b
+; CHECK-NEXT:    str q0, [x0]
+; CHECK-NEXT:    ret
+;
+; NONEON-NOSVE-LABEL: sabd_v16i8_v16i16:
+; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    ldr q0, [x1]
+; NONEON-NOSVE-NEXT:    ldr q1, [x0]
+; NONEON-NOSVE-NEXT:    stp q1, q0, [sp, #-48]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 48
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #31]
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #15]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #14]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #47]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #30]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #13]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #46]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #29]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #12]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #45]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #28]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #11]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #44]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #27]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #10]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #43]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #26]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #9]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #42]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #25]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #41]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #24]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #7]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #40]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #23]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #6]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #39]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #22]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #5]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #38]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #21]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #4]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #37]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #20]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #3]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #36]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #19]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #2]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #35]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #18]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp, #1]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #34]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #17]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    ldrsb w9, [sp]
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #33]
+; NONEON-NOSVE-NEXT:    ldrsb w8, [sp, #16]
+; NONEON-NOSVE-NEXT:    subs w8, w9, w8
+; NONEON-NOSVE-NEXT:    csetm w9, gt
+; NONEON-NOSVE-NEXT:    eor w8, w8, w9
+; NONEON-NOSVE-NEXT:    sub w8, w9, w8
+; NONEON-NOSVE-NEXT:    strb w8, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #32]
+; NONEON-NOSVE-NEXT:    str q0, [x0]
+; NONEON-NOSVE-NEXT:    add sp, sp, #48
+; NONEON-NOSVE-NEXT:    ret
+  %a.ld = load <16 x i8>, ptr %a
+  %b.ld = load <16 x i8>, ptr %b
+  %a.sext = sext <16 x i8> %a.ld to <16 x i16>
+  %b.sext = sext <16 x i8> %b.ld to <16 x i16>
+  %sub = sub <16 x i16> %a.sext, %b.sext
+  %abs = call <16 x i16> @llvm.abs.v16i16(<16 x i16> %sub, i1 true)
+  %trunc = trunc <16 x i16> %abs to <16 x i8>
+  store <16 x i8> %trunc, ptr %a
+  ret void
+}

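Under the `NONEON-NOSVE` prefix (neither NEON nor SVE available), each byte lane is expanded into `subs`, `csetm`, `eor`, `sub`. This is a branchless absolute difference in which the `csetm` mask is all ones when the first operand is greater: `hi` for `uabd`, `gt` for `sabd`. The following C++ sketch shows how the per-lane sequence reads, as inferred from the checked assembly. The function names are hypothetical, and 32-bit integers stand in for the `w` registers.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned lane (ldrb, csetm hi): ldrb zero-extends each byte into a
// 32-bit register, so the arithmetic is modelled on uint32_t.
uint8_t uabd_nosimd(uint8_t a, uint8_t b) {
  uint32_t diff = uint32_t{a} - uint32_t{b};  // subs w8, w9, w8 (may wrap)
  uint32_t mask = a > b ? 0xFFFFFFFFu : 0u;   // csetm w9, hi
  uint32_t res = mask - (diff ^ mask);        // eor w8, w8, w9; sub w8, w9, w8
  return static_cast<uint8_t>(res);           // strb keeps the low byte
}

// Signed lane (ldrsb, csetm gt): same sequence, but the bytes are
// sign-extended and the comparison is signed.
uint8_t sabd_nosimd(int8_t a, int8_t b) {
  int32_t diff = int32_t{a} - int32_t{b};  // subs w8, w9, w8
  int32_t mask = a > b ? -1 : 0;           // csetm w9, gt
  int32_t res = mask - (diff ^ mask);      // eor + sub
  return static_cast<uint8_t>(res);        // strb keeps the low byte
}
```

When the mask is all ones, `mask - (diff ^ mask)` equals `diff`. When the mask is zero, the result is `-diff`. In both cases the stored low byte is `|a - b|`.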
From 716594da176b4cbc956e7c7ab90988db6f907686 Mon Sep 17 00:00:00 2001
From: Jorge Gorbe Moya <jgorbe at google.com>
Date: Wed, 21 Aug 2024 12:42:44 -0700
Subject: [PATCH 045/116] [SandboxIR] Add ShuffleVectorInst (#104891)

This is missing tracking for `setShuffleMask`. I'll add it in a follow-up.
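`ShuffleVectorInst` wraps LLVM's `shufflevector`, and the mask semantics of that instruction drive helpers such as `changesLength()`. Element `i` of the result is element `Mask[i]` of the concatenation of `V1` and `V2`. A mask value of `-1` (`PoisonMaskElem`) produces a poison lane. The following plain-C++ model illustrates those semantics, using `std::optional` to stand in for poison. The names are hypothetical, and this is not the SandboxIR or LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// std::nullopt stands in for a poison lane in this model.
using Lane = std::optional<float>;

// Mirrors the value of llvm::PoisonMaskElem (-1).
constexpr int kPoisonMaskElem = -1;

// Result[i] = concat(V1, V2)[Mask[i]], or poison for kPoisonMaskElem.
// Assumes V1 and V2 have the same length N and every other mask entry
// is in [0, 2N).
std::vector<Lane> shuffleModel(const std::vector<float> &V1,
                               const std::vector<float> &V2,
                               const std::vector<int> &Mask) {
  assert(V1.size() == V2.size() && "operands must have the same type");
  const int N = static_cast<int>(V1.size());
  std::vector<Lane> Result;
  Result.reserve(Mask.size());
  for (int M : Mask) {
    if (M == kPoisonMaskElem)
      Result.push_back(std::nullopt);
    else if (M < N)
      Result.push_back(V1[M]);
    else
      Result.push_back(V2[M - N]);
  }
  return Result;
}

// Simplified counterpart of changesLength(): the result has one lane per
// mask element, so the length changes whenever that differs from the
// source length.
bool changesLengthModel(std::size_t MaskSize, std::size_t NumSrcElts) {
  return MaskSize != NumSrcElts;
}
```

For example, the SLP test earlier in this series uses the mask `<1, poison, 2, 3>` on two `<2 x float>` operands. Given `{a0, a1}` and `{b0, b1}`, it yields `{a1, poison, b0, b1}`, which is twice as long as its sources, so `changesLengthModel(4, 2)` is true.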
---
 llvm/include/llvm/SandboxIR/SandboxIR.h       | 503 +++++++++++++++++-
 .../llvm/SandboxIR/SandboxIRValues.def        |  77 +--
 llvm/lib/SandboxIR/SandboxIR.cpp              |  74 +++
 llvm/unittests/SandboxIR/SandboxIRTest.cpp    | 417 +++++++++++++++
 4 files changed, 1008 insertions(+), 63 deletions(-)

diff --git a/llvm/include/llvm/SandboxIR/SandboxIR.h b/llvm/include/llvm/SandboxIR/SandboxIR.h
index ca71566091bf82..01ef8013ea42a0 100644
--- a/llvm/include/llvm/SandboxIR/SandboxIR.h
+++ b/llvm/include/llvm/SandboxIR/SandboxIR.h
@@ -114,6 +114,7 @@ class Instruction;
 class SelectInst;
 class ExtractElementInst;
 class InsertElementInst;
+class ShuffleVectorInst;
 class BranchInst;
 class UnaryInstruction;
 class LoadInst;
@@ -240,31 +241,32 @@ class Value {
   /// order.
   llvm::Value *Val = nullptr;
 
-  friend class Context;            // For getting `Val`.
-  friend class User;               // For getting `Val`.
-  friend class Use;                // For getting `Val`.
-  friend class SelectInst;         // For getting `Val`.
-  friend class ExtractElementInst; // For getting `Val`.
-  friend class InsertElementInst;  // For getting `Val`.
-  friend class BranchInst;         // For getting `Val`.
-  friend class LoadInst;           // For getting `Val`.
-  friend class StoreInst;          // For getting `Val`.
-  friend class ReturnInst;         // For getting `Val`.
-  friend class CallBase;           // For getting `Val`.
-  friend class CallInst;           // For getting `Val`.
-  friend class InvokeInst;         // For getting `Val`.
-  friend class CallBrInst;         // For getting `Val`.
-  friend class GetElementPtrInst;  // For getting `Val`.
-  friend class CatchSwitchInst;    // For getting `Val`.
-  friend class SwitchInst;         // For getting `Val`.
-  friend class UnaryOperator;      // For getting `Val`.
-  friend class BinaryOperator;     // For getting `Val`.
-  friend class AtomicRMWInst;      // For getting `Val`.
-  friend class AtomicCmpXchgInst;  // For getting `Val`.
-  friend class AllocaInst;         // For getting `Val`.
-  friend class CastInst;           // For getting `Val`.
-  friend class PHINode;            // For getting `Val`.
-  friend class UnreachableInst;    // For getting `Val`.
+  friend class Context;               // For getting `Val`.
+  friend class User;                  // For getting `Val`.
+  friend class Use;                   // For getting `Val`.
+  friend class SelectInst;            // For getting `Val`.
+  friend class ExtractElementInst;    // For getting `Val`.
+  friend class InsertElementInst;     // For getting `Val`.
+  friend class ShuffleVectorInst;     // For getting `Val`.
+  friend class BranchInst;            // For getting `Val`.
+  friend class LoadInst;              // For getting `Val`.
+  friend class StoreInst;             // For getting `Val`.
+  friend class ReturnInst;            // For getting `Val`.
+  friend class CallBase;              // For getting `Val`.
+  friend class CallInst;              // For getting `Val`.
+  friend class InvokeInst;            // For getting `Val`.
+  friend class CallBrInst;            // For getting `Val`.
+  friend class GetElementPtrInst;     // For getting `Val`.
+  friend class CatchSwitchInst;       // For getting `Val`.
+  friend class SwitchInst;            // For getting `Val`.
+  friend class UnaryOperator;         // For getting `Val`.
+  friend class BinaryOperator;        // For getting `Val`.
+  friend class AtomicRMWInst;         // For getting `Val`.
+  friend class AtomicCmpXchgInst;     // For getting `Val`.
+  friend class AllocaInst;            // For getting `Val`.
+  friend class CastInst;              // For getting `Val`.
+  friend class PHINode;               // For getting `Val`.
+  friend class UnreachableInst;       // For getting `Val`.
   friend class CatchSwitchAddHandler; // For `Val`.
 
   /// All values point to the context.
@@ -669,6 +671,7 @@ class Instruction : public sandboxir::User {
   friend class SelectInst;         // For getTopmostLLVMInstruction().
   friend class ExtractElementInst; // For getTopmostLLVMInstruction().
   friend class InsertElementInst;  // For getTopmostLLVMInstruction().
+  friend class ShuffleVectorInst;  // For getTopmostLLVMInstruction().
   friend class BranchInst;         // For getTopmostLLVMInstruction().
   friend class LoadInst;           // For getTopmostLLVMInstruction().
   friend class StoreInst;          // For getTopmostLLVMInstruction().
@@ -949,6 +952,454 @@ class ExtractElementInst final
   }
 };
 
+class ShuffleVectorInst final
+    : public SingleLLVMInstructionImpl<llvm::ShuffleVectorInst> {
+  /// Use Context::createShuffleVectorInst() instead.
+  ShuffleVectorInst(llvm::Instruction *I, Context &Ctx)
+      : SingleLLVMInstructionImpl(ClassID::ShuffleVector, Opcode::ShuffleVector,
+                                  I, Ctx) {}
+  friend class Context; // For accessing the constructor in create*()
+
+public:
+  static Value *create(Value *V1, Value *V2, Value *Mask,
+                       Instruction *InsertBefore, Context &Ctx,
+                       const Twine &Name = "");
+  static Value *create(Value *V1, Value *V2, Value *Mask,
+                       BasicBlock *InsertAtEnd, Context &Ctx,
+                       const Twine &Name = "");
+  static Value *create(Value *V1, Value *V2, ArrayRef<int> Mask,
+                       Instruction *InsertBefore, Context &Ctx,
+                       const Twine &Name = "");
+  static Value *create(Value *V1, Value *V2, ArrayRef<int> Mask,
+                       BasicBlock *InsertAtEnd, Context &Ctx,
+                       const Twine &Name = "");
+  static bool classof(const Value *From) {
+    return From->getSubclassID() == ClassID::ShuffleVector;
+  }
+
+  /// Swap the operands and adjust the mask to preserve the semantics of the
+  /// instruction.
+  void commute() { cast<llvm::ShuffleVectorInst>(Val)->commute(); }
+
+  /// Return true if a shufflevector instruction can be formed with the
+  /// specified operands.
+  static bool isValidOperands(const Value *V1, const Value *V2,
+                              const Value *Mask) {
+    return llvm::ShuffleVectorInst::isValidOperands(V1->Val, V2->Val,
+                                                    Mask->Val);
+  }
+  static bool isValidOperands(const Value *V1, const Value *V2,
+                              ArrayRef<int> Mask) {
+    return llvm::ShuffleVectorInst::isValidOperands(V1->Val, V2->Val, Mask);
+  }
+
+  /// Overload to return most specific vector type.
+  VectorType *getType() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->getType();
+  }
+
+  /// Return the shuffle mask value of this instruction for the given element
+  /// index. Return PoisonMaskElem if the element is undef.
+  int getMaskValue(unsigned Elt) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->getMaskValue(Elt);
+  }
+
+  /// Convert the input shuffle mask operand to a vector of integers. Undefined
+  /// elements of the mask are returned as PoisonMaskElem.
+  static void getShuffleMask(const Constant *Mask,
+                             SmallVectorImpl<int> &Result) {
+    llvm::ShuffleVectorInst::getShuffleMask(cast<llvm::Constant>(Mask->Val),
+                                            Result);
+  }
+
+  /// Return the mask for this instruction as a vector of integers. Undefined
+  /// elements of the mask are returned as PoisonMaskElem.
+  void getShuffleMask(SmallVectorImpl<int> &Result) const {
+    cast<llvm::ShuffleVectorInst>(Val)->getShuffleMask(Result);
+  }
+
+  /// Return the mask for this instruction, for use in bitcode.
+  Constant *getShuffleMaskForBitcode() const;
+
+  static Constant *convertShuffleMaskForBitcode(ArrayRef<int> Mask,
+                                                Type *ResultTy, Context &Ctx);
+
+  void setShuffleMask(ArrayRef<int> Mask) {
+    cast<llvm::ShuffleVectorInst>(Val)->setShuffleMask(Mask);
+  }
+
+  ArrayRef<int> getShuffleMask() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->getShuffleMask();
+  }
+
+  /// Return true if this shuffle returns a vector with a different number of
+  /// elements than its source vectors.
+  /// Examples: shufflevector <4 x n> A, <4 x n> B, <1,2,3>
+  ///           shufflevector <4 x n> A, <4 x n> B, <1,2,3,4,5>
+  bool changesLength() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->changesLength();
+  }
+
+  /// Return true if this shuffle returns a vector with a greater number of
+  /// elements than its source vectors.
+  /// Example: shufflevector <2 x n> A, <2 x n> B, <1,2,3>
+  bool increasesLength() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->increasesLength();
+  }
+
+  /// Return true if this shuffle mask chooses elements from exactly one source
+  /// vector.
+  /// Example: <7,5,undef,7>
+  /// This assumes that vector operands (of length \p NumSrcElts) are the same
+  /// length as the mask.
+  static bool isSingleSourceMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isSingleSourceMask(Mask, NumSrcElts);
+  }
+  static bool isSingleSourceMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isSingleSourceMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if this shuffle chooses elements from exactly one source
+  /// vector without changing the length of that vector.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <3,0,undef,3>
+  bool isSingleSource() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isSingleSource();
+  }
+
+  /// Return true if this shuffle mask chooses elements from exactly one source
+  /// vector without lane crossings. A shuffle using this mask is not
+  /// necessarily a no-op because it may change the number of elements from its
+  /// input vectors or it may provide demanded bits knowledge via undef lanes.
+  /// Example: <undef,undef,2,3>
+  static bool isIdentityMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isIdentityMask(Mask, NumSrcElts);
+  }
+  static bool isIdentityMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isIdentityMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if this shuffle chooses elements from exactly one source
+  /// vector without lane crossings and does not change the number of elements
+  /// from its input vectors.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <4,undef,6,undef>
+  bool isIdentity() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isIdentity();
+  }
+
+  /// Return true if this shuffle lengthens exactly one source vector with
+  /// undefs in the high elements.
+  bool isIdentityWithPadding() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isIdentityWithPadding();
+  }
+
+  /// Return true if this shuffle extracts the first N elements of exactly one
+  /// source vector.
+  bool isIdentityWithExtract() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isIdentityWithExtract();
+  }
+
+  /// Return true if this shuffle concatenates its 2 source vectors. This
+  /// returns false if either input is undefined. In that case, the shuffle is
+  /// better classified as an identity with padding operation.
+  bool isConcat() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isConcat();
+  }
+
+  /// Return true if this shuffle mask chooses elements from its source vectors
+  /// without lane crossings. A shuffle using this mask would be
+  /// equivalent to a vector select with a constant condition operand.
+  /// Example: <4,1,6,undef>
+  /// This returns false if the mask does not choose from both input vectors.
+  /// In that case, the shuffle is better classified as an identity shuffle.
+  /// This assumes that vector operands are the same length as the mask
+  /// (a length-changing shuffle can never be equivalent to a vector select).
+  static bool isSelectMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isSelectMask(Mask, NumSrcElts);
+  }
+  static bool isSelectMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isSelectMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if this shuffle chooses elements from its source vectors
+  /// without lane crossings and all operands have the same number of elements.
+  /// In other words, this shuffle is equivalent to a vector select with a
+  /// constant condition operand.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <undef,1,6,3>
+  /// This returns false if the mask does not choose from both input vectors.
+  /// In that case, the shuffle is better classified as an identity shuffle.
+  bool isSelect() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isSelect();
+  }
+
+  /// Return true if this shuffle mask swaps the order of elements from exactly
+  /// one source vector.
+  /// Example: <7,6,undef,4>
+  /// This assumes that vector operands (of length \p NumSrcElts) are the same
+  /// length as the mask.
+  static bool isReverseMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isReverseMask(Mask, NumSrcElts);
+  }
+  static bool isReverseMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isReverseMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if this shuffle swaps the order of elements from exactly
+  /// one source vector.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <3,undef,1,undef>
+  bool isReverse() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isReverse();
+  }
+
+  /// Return true if this shuffle mask chooses all elements with the same value
+  /// as the first element of exactly one source vector.
+  /// Example: <4,undef,undef,4>
+  /// This assumes that vector operands (of length \p NumSrcElts) are the same
+  /// length as the mask.
+  static bool isZeroEltSplatMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isZeroEltSplatMask(Mask, NumSrcElts);
+  }
+  static bool isZeroEltSplatMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isZeroEltSplatMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if all elements of this shuffle are the same value as the
+  /// first element of exactly one source vector without changing the length
+  /// of that vector.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <undef,0,undef,0>
+  bool isZeroEltSplat() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isZeroEltSplat();
+  }
+
+  /// Return true if this shuffle mask is a transpose mask.
+  /// Transpose vector masks transpose a 2xn matrix. They read corresponding
+  /// even- or odd-numbered vector elements from two n-dimensional source
+  /// vectors and write each result into consecutive elements of an
+  /// n-dimensional destination vector. Two shuffles are necessary to complete
+  /// the transpose, one for the even elements and another for the odd elements.
+  /// This description closely follows how the TRN1 and TRN2 AArch64
+  /// instructions operate.
+  ///
+  /// For example, a simple 2x2 matrix can be transposed with:
+  ///
+  ///   ; Original matrix
+  ///   m0 = < a, b >
+  ///   m1 = < c, d >
+  ///
+  ///   ; Transposed matrix
+  ///   t0 = < a, c > = shufflevector m0, m1, < 0, 2 >
+  ///   t1 = < b, d > = shufflevector m0, m1, < 1, 3 >
+  ///
+  /// For matrices having more than two columns, the resulting nx2 transposed
+  /// matrix is stored in two result vectors such that one vector contains
+  /// interleaved elements from all the even-numbered rows and the other vector
+  /// contains interleaved elements from all the odd-numbered rows. For example,
+  /// a 2x4 matrix can be transposed with:
+  ///
+  ///   ; Original matrix
+  ///   m0 = < a, b, c, d >
+  ///   m1 = < e, f, g, h >
+  ///
+  ///   ; Transposed matrix
+  ///   t0 = < a, e, c, g > = shufflevector m0, m1, < 0, 4, 2, 6 >
+  ///   t1 = < b, f, d, h > = shufflevector m0, m1, < 1, 5, 3, 7 >
+  static bool isTransposeMask(ArrayRef<int> Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isTransposeMask(Mask, NumSrcElts);
+  }
+  static bool isTransposeMask(const Constant *Mask, int NumSrcElts) {
+    return llvm::ShuffleVectorInst::isTransposeMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts);
+  }
+
+  /// Return true if this shuffle transposes the elements of its inputs without
+  /// changing the length of the vectors. This operation may also be known as a
+  /// merge or interleave. See the description for isTransposeMask() for the
+  /// exact specification.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <0,4,2,6>
+  bool isTranspose() const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isTranspose();
+  }
+
+  /// Return true if this shuffle mask is a splice mask, which concatenates the
+  /// two inputs together and then extracts an original width vector starting
+  /// from the splice index.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>
+  /// This assumes that vector operands (of length \p NumSrcElts) are the same
+  /// length as the mask.
+  static bool isSpliceMask(ArrayRef<int> Mask, int NumSrcElts, int &Index) {
+    return llvm::ShuffleVectorInst::isSpliceMask(Mask, NumSrcElts, Index);
+  }
+  static bool isSpliceMask(const Constant *Mask, int NumSrcElts, int &Index) {
+    return llvm::ShuffleVectorInst::isSpliceMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts, Index);
+  }
+
+  /// Return true if this shuffle splices two inputs without changing the length
+  /// of the vectors. This operation concatenates the two inputs together and
+  /// then extracts an original width vector starting from the splice index.
+  /// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>
+  bool isSplice(int &Index) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isSplice(Index);
+  }
+
+  /// Return true if this shuffle mask is an extract subvector mask.
+  /// A valid extract subvector mask returns a smaller vector from a single
+  /// source operand. The base extraction index is returned as well.
+  static bool isExtractSubvectorMask(ArrayRef<int> Mask, int NumSrcElts,
+                                     int &Index) {
+    return llvm::ShuffleVectorInst::isExtractSubvectorMask(Mask, NumSrcElts,
+                                                           Index);
+  }
+  static bool isExtractSubvectorMask(const Constant *Mask, int NumSrcElts,
+                                     int &Index) {
+    return llvm::ShuffleVectorInst::isExtractSubvectorMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts, Index);
+  }
+
+  /// Return true if this shuffle mask is an extract subvector mask.
+  bool isExtractSubvectorMask(int &Index) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isExtractSubvectorMask(Index);
+  }
+
+  /// Return true if this shuffle mask is an insert subvector mask.
+  /// A valid insert subvector mask inserts the lowest elements of a second
+  /// source operand into an in-place first source operand.
+  /// Both the subvector width and the insertion index are returned.
+  static bool isInsertSubvectorMask(ArrayRef<int> Mask, int NumSrcElts,
+                                    int &NumSubElts, int &Index) {
+    return llvm::ShuffleVectorInst::isInsertSubvectorMask(Mask, NumSrcElts,
+                                                          NumSubElts, Index);
+  }
+  static bool isInsertSubvectorMask(const Constant *Mask, int NumSrcElts,
+                                    int &NumSubElts, int &Index) {
+    return llvm::ShuffleVectorInst::isInsertSubvectorMask(
+        cast<llvm::Constant>(Mask->Val), NumSrcElts, NumSubElts, Index);
+  }
+
+  /// Return true if this shuffle mask is an insert subvector mask.
+  bool isInsertSubvectorMask(int &NumSubElts, int &Index) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isInsertSubvectorMask(NumSubElts,
+                                                                     Index);
+  }
+
+  /// Return true if this shuffle mask replicates each of the \p VF elements
+  /// in a vector \p ReplicationFactor times.
+  /// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is:
+  ///   <0,0,0,1,1,1,2,2,2,3,3,3>
+  static bool isReplicationMask(ArrayRef<int> Mask, int &ReplicationFactor,
+                                int &VF) {
+    return llvm::ShuffleVectorInst::isReplicationMask(Mask, ReplicationFactor,
+                                                      VF);
+  }
+  static bool isReplicationMask(const Constant *Mask, int &ReplicationFactor,
+                                int &VF) {
+    return llvm::ShuffleVectorInst::isReplicationMask(
+        cast<llvm::Constant>(Mask->Val), ReplicationFactor, VF);
+  }
+
+  /// Return true if this shuffle mask is a replication mask.
+  bool isReplicationMask(int &ReplicationFactor, int &VF) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isReplicationMask(
+        ReplicationFactor, VF);
+  }
+
+  /// Return true if this shuffle mask represents a "clustered" mask of size VF,
+  /// i.e. each index between [0..VF) is used exactly once in each submask of
+  /// size VF.
+  /// For example, the mask for \p VF=4 is:
+  /// 0, 1, 2, 3, 3, 2, 0, 1 - "clustered", because each submask of size 4
+  /// (0,1,2,3 and 3,2,0,1) uses indices [0..VF) exactly one time.
+  /// 0, 1, 2, 3, 3, 3, 1, 0 - not "clustered", because
+  ///                          element 3 is used twice in the second submask
+  ///                          (3,3,1,0) and index 2 is not used at all.
+  static bool isOneUseSingleSourceMask(ArrayRef<int> Mask, int VF) {
+    return llvm::ShuffleVectorInst::isOneUseSingleSourceMask(Mask, VF);
+  }
+
+  /// Return true if this shuffle mask is a one-use-single-source ("clustered")
+  /// mask.
+  bool isOneUseSingleSourceMask(int VF) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isOneUseSingleSourceMask(VF);
+  }
+
+  /// Change values in a shuffle permute mask assuming the two vector operands
+  /// of length InVecNumElts have swapped position.
+  static void commuteShuffleMask(MutableArrayRef<int> Mask,
+                                 unsigned InVecNumElts) {
+    llvm::ShuffleVectorInst::commuteShuffleMask(Mask, InVecNumElts);
+  }
+
+  /// Return true if this shuffle interleaves its two input vectors together.
+  bool isInterleave(unsigned Factor) const {
+    return cast<llvm::ShuffleVectorInst>(Val)->isInterleave(Factor);
+  }
+
+  /// Return true if the mask interleaves one or more input vectors together.
+  ///
+  /// I.e. <0, LaneLen, ... , LaneLen*(Factor - 1), 1, LaneLen + 1, ...>
+  /// E.g. For a Factor of 2 (LaneLen=4):
+  ///   <0, 4, 1, 5, 2, 6, 3, 7>
+  /// E.g. For a Factor of 3 (LaneLen=4):
+  ///   <4, 0, 9, 5, 1, 10, 6, 2, 11, 7, 3, 12>
+  /// E.g. For a Factor of 4 (LaneLen=2):
+  ///   <0, 2, 6, 4, 1, 3, 7, 5>
+  ///
+  /// NumInputElts is the total number of elements in the input vectors.
+  ///
+  /// StartIndexes are the first indexes of each vector being interleaved,
+  /// substituting any indexes that were undef.
+  /// E.g. <4, -1, 2, 5, 1, 3> (Factor=3): StartIndexes=<4, 0, 2>
+  ///
+  /// Note that this does not check if the input vectors are consecutive:
+  /// It will return true for masks such as
+  /// <0, 4, 6, 1, 5, 7> (Factor=3, LaneLen=2)
+  static bool isInterleaveMask(ArrayRef<int> Mask, unsigned Factor,
+                               unsigned NumInputElts,
+                               SmallVectorImpl<unsigned> &StartIndexes) {
+    return llvm::ShuffleVectorInst::isInterleaveMask(Mask, Factor, NumInputElts,
+                                                     StartIndexes);
+  }
+  static bool isInterleaveMask(ArrayRef<int> Mask, unsigned Factor,
+                               unsigned NumInputElts) {
+    return llvm::ShuffleVectorInst::isInterleaveMask(Mask, Factor,
+                                                     NumInputElts);
+  }
+
+  /// Check if the mask is a DE-interleave mask of the given factor
+  /// \p Factor like:
+  ///     <Index, Index+Factor, ..., Index+(NumElts-1)*Factor>
+  static bool isDeInterleaveMaskOfFactor(ArrayRef<int> Mask, unsigned Factor,
+                                         unsigned &Index) {
+    return llvm::ShuffleVectorInst::isDeInterleaveMaskOfFactor(Mask, Factor,
+                                                               Index);
+  }
+  static bool isDeInterleaveMaskOfFactor(ArrayRef<int> Mask, unsigned Factor) {
+    return llvm::ShuffleVectorInst::isDeInterleaveMaskOfFactor(Mask, Factor);
+  }
+
+  /// Checks if the shuffle is a bit rotation of the first operand across
+  /// multiple subelements, e.g.:
+  ///
+  /// shuffle <8 x i8> %a, <8 x i8> poison, <8 x i32> <1, 0, 3, 2, 5, 4, 7, 6>
+  ///
+  /// could be expressed as
+  ///
+  /// rotl <4 x i16> %a, 8
+  ///
+  /// If it can be expressed as a rotation, returns the number of subelements to
+  /// group by in NumSubElts and the number of bits to rotate left in RotateAmt.
+  static bool isBitRotateMask(ArrayRef<int> Mask, unsigned EltSizeInBits,
+                              unsigned MinSubElts, unsigned MaxSubElts,
+                              unsigned &NumSubElts, unsigned &RotateAmt) {
+    return llvm::ShuffleVectorInst::isBitRotateMask(
+        Mask, EltSizeInBits, MinSubElts, MaxSubElts, NumSubElts, RotateAmt);
+  }
+};
+
 class BranchInst : public SingleLLVMInstructionImpl<llvm::BranchInst> {
   /// Use Context::createBranchInst(). Don't call the constructor directly.
   BranchInst(llvm::BranchInst *BI, Context &Ctx)
@@ -2280,6 +2731,8 @@ class Context {
   friend InsertElementInst; // For createInsertElementInst()
   ExtractElementInst *createExtractElementInst(llvm::ExtractElementInst *EEI);
   friend ExtractElementInst; // For createExtractElementInst()
+  ShuffleVectorInst *createShuffleVectorInst(llvm::ShuffleVectorInst *SVI);
+  friend ShuffleVectorInst; // For createShuffleVectorInst()
   BranchInst *createBranchInst(llvm::BranchInst *I);
   friend BranchInst; // For createBranchInst()
   LoadInst *createLoadInst(llvm::LoadInst *LI);
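The static mask predicates added to the header above (isSingleSourceMask, isIdentityMask, isSelectMask, isReverseMask, commuteShuffleMask) forward to llvm::ShuffleVectorInst. The sketch below makes their doc-comment examples concrete and has no LLVM dependency. It assumes masks are std::vector<int>, with -1 standing in for an undefined (poison) lane. The function names mirror LLVM's, but the bodies are simplified re-implementations written for illustration, not LLVM's code. They may disagree with LLVM on degenerate inputs such as all-undef or single-element masks.

```cpp
#include <cassert>
#include <vector>

// Illustrative, simplified mirrors of a few ShuffleVectorInst mask predicates.
// -1 plays the role of an undefined (poison) mask lane.
using Mask = std::vector<int>;

// True if every defined lane reads from the same operand. An all-undef mask
// reads from neither operand and is rejected.
bool isSingleSourceMask(const Mask &M, int NumSrcElts) {
  if (static_cast<int>(M.size()) != NumSrcElts)
    return false;
  bool UsesLHS = false, UsesRHS = false;
  for (int Elt : M) {
    if (Elt == -1)
      continue;
    assert(Elt >= 0 && Elt < 2 * NumSrcElts && "mask index out of range");
    UsesLHS |= Elt < NumSrcElts;
    UsesRHS |= Elt >= NumSrcElts;
  }
  return UsesLHS != UsesRHS; // exactly one operand is referenced
}

// Single source and no lane crossing: lane I reads element I of its operand.
bool isIdentityMask(const Mask &M, int NumSrcElts) {
  if (!isSingleSourceMask(M, NumSrcElts))
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (M[I] != -1 && M[I] != I && M[I] != I + NumSrcElts)
      return false;
  return true;
}

// Single source, and lane I reads element (NumSrcElts - 1 - I).
bool isReverseMask(const Mask &M, int NumSrcElts) {
  if (!isSingleSourceMask(M, NumSrcElts))
    return false;
  for (int I = 0; I < NumSrcElts; ++I) {
    int Mirror = NumSrcElts - 1 - I;
    if (M[I] != -1 && M[I] != Mirror && M[I] != Mirror + NumSrcElts)
      return false;
  }
  return true;
}

// No lane crossing, but lanes come from *both* operands (a constant select).
bool isSelectMask(const Mask &M, int NumSrcElts) {
  if (static_cast<int>(M.size()) != NumSrcElts)
    return false;
  bool UsesLHS = false, UsesRHS = false;
  for (int I = 0; I < NumSrcElts; ++I) {
    if (M[I] == -1)
      continue;
    if (M[I] == I)
      UsesLHS = true;
    else if (M[I] == I + NumSrcElts)
      UsesRHS = true;
    else
      return false; // lane crossing
  }
  return UsesLHS && UsesRHS;
}

// Rewrite indices as if the two operands had been swapped (the same idea as
// commuteShuffleMask): [0, N) <-> [N, 2N); undef lanes are left alone.
void commuteMask(Mask &M, int NumSrcElts) {
  for (int &Elt : M)
    if (Elt != -1)
      Elt = Elt < NumSrcElts ? Elt + NumSrcElts : Elt - NumSrcElts;
}
```

With these definitions, isSelectMask({4, 1, 6, -1}, 4) and isReverseMask({7, 6, -1, 4}, 4) are true and isIdentityMask({1, 0}, 2) is false, which matches the examples in the comments. Commuting {0, 2} with two elements per operand gives {2, 0}, which is what the unit test expects after commute().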
diff --git a/llvm/include/llvm/SandboxIR/SandboxIRValues.def b/llvm/include/llvm/SandboxIR/SandboxIRValues.def
index 402b6f3324a222..56720f564a7cae 100644
--- a/llvm/include/llvm/SandboxIR/SandboxIRValues.def
+++ b/llvm/include/llvm/SandboxIR/SandboxIRValues.def
@@ -33,46 +33,47 @@ DEF_USER(ConstantInt, ConstantInt)
 #define OPCODES(...)
 #endif
 // clang-format off
-//       ClassID,        Opcode(s),         Class
-DEF_INSTR(Opaque,        OP(Opaque),        OpaqueInst)
+//        ClassID,        Opcode(s),          Class
+DEF_INSTR(Opaque,         OP(Opaque),         OpaqueInst)
 DEF_INSTR(ExtractElement, OP(ExtractElement), ExtractElementInst)
-DEF_INSTR(InsertElement, OP(InsertElement), InsertElementInst)
-DEF_INSTR(Select,        OP(Select),        SelectInst)
-DEF_INSTR(Br,            OP(Br),            BranchInst)
-DEF_INSTR(Load,          OP(Load),          LoadInst)
-DEF_INSTR(Store,         OP(Store),         StoreInst)
-DEF_INSTR(Ret,           OP(Ret),           ReturnInst)
-DEF_INSTR(Call,          OP(Call),          CallInst)
-DEF_INSTR(Invoke,        OP(Invoke),        InvokeInst)
-DEF_INSTR(CallBr,        OP(CallBr),        CallBrInst)
-DEF_INSTR(GetElementPtr, OP(GetElementPtr), GetElementPtrInst)
-DEF_INSTR(CatchSwitch,   OP(CatchSwitch),   CatchSwitchInst)
-DEF_INSTR(Switch,        OP(Switch),        SwitchInst)
-DEF_INSTR(UnOp,          OPCODES( \
-                         OP(FNeg) \
-                         ),                 UnaryOperator)
+DEF_INSTR(InsertElement,  OP(InsertElement),  InsertElementInst)
+DEF_INSTR(ShuffleVector,  OP(ShuffleVector),  ShuffleVectorInst)
+DEF_INSTR(Select,         OP(Select),         SelectInst)
+DEF_INSTR(Br,             OP(Br),             BranchInst)
+DEF_INSTR(Load,           OP(Load),           LoadInst)
+DEF_INSTR(Store,          OP(Store),          StoreInst)
+DEF_INSTR(Ret,            OP(Ret),            ReturnInst)
+DEF_INSTR(Call,           OP(Call),           CallInst)
+DEF_INSTR(Invoke,         OP(Invoke),         InvokeInst)
+DEF_INSTR(CallBr,         OP(CallBr),         CallBrInst)
+DEF_INSTR(GetElementPtr,  OP(GetElementPtr),  GetElementPtrInst)
+DEF_INSTR(CatchSwitch,    OP(CatchSwitch),    CatchSwitchInst)
+DEF_INSTR(Switch,         OP(Switch),         SwitchInst)
+DEF_INSTR(UnOp,           OPCODES( \
+                          OP(FNeg) \
+                          ),                  UnaryOperator)
 DEF_INSTR(BinaryOperator, OPCODES(\
-                         OP(Add)  \
-                         OP(FAdd) \
-                         OP(Sub)  \
-                         OP(FSub) \
-                         OP(Mul)  \
-                         OP(FMul) \
-                         OP(UDiv) \
-                         OP(SDiv) \
-                         OP(FDiv) \
-                         OP(URem) \
-                         OP(SRem) \
-                         OP(FRem) \
-                         OP(Shl)  \
-                         OP(LShr) \
-                         OP(AShr) \
-                         OP(And)  \
-                         OP(Or)   \
-                         OP(Xor)  \
-                         ),                 BinaryOperator)
-DEF_INSTR(AtomicRMW,     OP(AtomicRMW),     AtomicRMWInst)
-DEF_INSTR(AtomicCmpXchg, OP(AtomicCmpXchg), AtomicCmpXchgInst)
+                          OP(Add)  \
+                          OP(FAdd) \
+                          OP(Sub)  \
+                          OP(FSub) \
+                          OP(Mul)  \
+                          OP(FMul) \
+                          OP(UDiv) \
+                          OP(SDiv) \
+                          OP(FDiv) \
+                          OP(URem) \
+                          OP(SRem) \
+                          OP(FRem) \
+                          OP(Shl)  \
+                          OP(LShr) \
+                          OP(AShr) \
+                          OP(And)  \
+                          OP(Or)   \
+                          OP(Xor)  \
+                          ),                  BinaryOperator)
+DEF_INSTR(AtomicRMW,      OP(AtomicRMW),      AtomicRMWInst)
+DEF_INSTR(AtomicCmpXchg,  OP(AtomicCmpXchg),  AtomicCmpXchgInst)
 DEF_INSTR(Alloca,         OP(Alloca),         AllocaInst)
 DEF_INSTR(Cast,   OPCODES(\
                           OP(ZExt)          \
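Each DEF_INSTR line in SandboxIRValues.def is an X-macro entry. A consumer defines DEF_INSTR (and OP/OPCODES) and then #includes the file, so the one new ShuffleVector line is enough to give ShuffleVectorInst an entry wherever the list is expanded, for example in an enum of class IDs. The sketch below imitates that pattern in a single self-contained file. SANDBOX_INSTR_LIST, AS_ENUM, AS_ONE, AS_NAME and getClassName are hypothetical names for illustration. Unlike the real .def, the list is passed the per-entry macro as a parameter and carries no opcodes.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical instruction list in the spirit of SandboxIRValues.def.
// The real file is #included repeatedly with DEF_INSTR redefined; here the
// list macro takes the per-entry macro X as an argument instead.
#define SANDBOX_INSTR_LIST(X)                                                  \
  X(Opaque, OpaqueInst)                                                        \
  X(ExtractElement, ExtractElementInst)                                        \
  X(InsertElement, InsertElementInst)                                          \
  X(ShuffleVector, ShuffleVectorInst)                                          \
  X(Select, SelectInst)

// Expansion 1: one enumerator per entry, in list order.
enum class ClassID : unsigned {
#define AS_ENUM(NAME, CLASS) NAME,
  SANDBOX_INSTR_LIST(AS_ENUM)
#undef AS_ENUM
};

// Expansion 2: the number of entries, derived from the same list.
#define AS_ONE(NAME, CLASS) +1
constexpr unsigned NumClassIDs = 0 SANDBOX_INSTR_LIST(AS_ONE);
#undef AS_ONE

// Expansion 3: a parallel table of class names indexed by ClassID.
inline std::string_view getClassName(ClassID ID) {
  static constexpr std::string_view Names[] = {
#define AS_NAME(NAME, CLASS) #CLASS,
      SANDBOX_INSTR_LIST(AS_NAME)
#undef AS_NAME
  };
  // Both tables come from one list, so they cannot drift apart.
  static_assert(sizeof(Names) / sizeof(Names[0]) == NumClassIDs);
  assert(static_cast<unsigned>(ID) < NumClassIDs && "invalid ClassID");
  return Names[static_cast<unsigned>(ID)];
}
```

The design point is that the enum, the count and the name table are all derived from one list. Adding an entry (as this patch does for ShuffleVector) updates every expansion at once.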
diff --git a/llvm/lib/SandboxIR/SandboxIR.cpp b/llvm/lib/SandboxIR/SandboxIR.cpp
index 5b170cee20c940..a62c879b91e8b9 100644
--- a/llvm/lib/SandboxIR/SandboxIR.cpp
+++ b/llvm/lib/SandboxIR/SandboxIR.cpp
@@ -1818,6 +1818,67 @@ Value *ExtractElementInst::create(Value *Vec, Value *Idx,
   return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
 }
 
+Value *ShuffleVectorInst::create(Value *V1, Value *V2, Value *Mask,
+                                 Instruction *InsertBefore, Context &Ctx,
+                                 const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  Builder.SetInsertPoint(InsertBefore->getTopmostLLVMInstruction());
+  llvm::Value *NewV =
+      Builder.CreateShuffleVector(V1->Val, V2->Val, Mask->Val, Name);
+  if (auto *NewShuffle = dyn_cast<llvm::ShuffleVectorInst>(NewV))
+    return Ctx.createShuffleVectorInst(NewShuffle);
+  assert(isa<llvm::Constant>(NewV) && "Expected constant");
+  return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
+}
+
+Value *ShuffleVectorInst::create(Value *V1, Value *V2, Value *Mask,
+                                 BasicBlock *InsertAtEnd, Context &Ctx,
+                                 const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  Builder.SetInsertPoint(cast<llvm::BasicBlock>(InsertAtEnd->Val));
+  llvm::Value *NewV =
+      Builder.CreateShuffleVector(V1->Val, V2->Val, Mask->Val, Name);
+  if (auto *NewShuffle = dyn_cast<llvm::ShuffleVectorInst>(NewV))
+    return Ctx.createShuffleVectorInst(NewShuffle);
+  assert(isa<llvm::Constant>(NewV) && "Expected constant");
+  return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
+}
+
+Value *ShuffleVectorInst::create(Value *V1, Value *V2, ArrayRef<int> Mask,
+                                 Instruction *InsertBefore, Context &Ctx,
+                                 const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  Builder.SetInsertPoint(InsertBefore->getTopmostLLVMInstruction());
+  llvm::Value *NewV = Builder.CreateShuffleVector(V1->Val, V2->Val, Mask, Name);
+  if (auto *NewShuffle = dyn_cast<llvm::ShuffleVectorInst>(NewV))
+    return Ctx.createShuffleVectorInst(NewShuffle);
+  assert(isa<llvm::Constant>(NewV) && "Expected constant");
+  return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
+}
+
+Value *ShuffleVectorInst::create(Value *V1, Value *V2, ArrayRef<int> Mask,
+                                 BasicBlock *InsertAtEnd, Context &Ctx,
+                                 const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  Builder.SetInsertPoint(cast<llvm::BasicBlock>(InsertAtEnd->Val));
+  llvm::Value *NewV = Builder.CreateShuffleVector(V1->Val, V2->Val, Mask, Name);
+  if (auto *NewShuffle = dyn_cast<llvm::ShuffleVectorInst>(NewV))
+    return Ctx.createShuffleVectorInst(NewShuffle);
+  assert(isa<llvm::Constant>(NewV) && "Expected constant");
+  return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
+}
+
+Constant *ShuffleVectorInst::getShuffleMaskForBitcode() const {
+  return Ctx.getOrCreateConstant(
+      cast<llvm::ShuffleVectorInst>(Val)->getShuffleMaskForBitcode());
+}
+
+Constant *ShuffleVectorInst::convertShuffleMaskForBitcode(
+    llvm::ArrayRef<int> Mask, llvm::Type *ResultTy, Context &Ctx) {
+  return Ctx.getOrCreateConstant(
+      llvm::ShuffleVectorInst::convertShuffleMaskForBitcode(Mask, ResultTy));
+}
+
 #ifndef NDEBUG
 void Constant::dumpOS(raw_ostream &OS) const {
   dumpCommonPrefix(OS);
@@ -1957,6 +2018,12 @@ Value *Context::getOrCreateValueInternal(llvm::Value *LLVMV, llvm::User *U) {
         new InsertElementInst(LLVMIns, *this));
     return It->second.get();
   }
+  case llvm::Instruction::ShuffleVector: {
+    auto *LLVMIns = cast<llvm::ShuffleVectorInst>(LLVMV);
+    It->second = std::unique_ptr<ShuffleVectorInst>(
+        new ShuffleVectorInst(LLVMIns, *this));
+    return It->second.get();
+  }
   case llvm::Instruction::Br: {
     auto *LLVMBr = cast<llvm::BranchInst>(LLVMV);
     It->second = std::unique_ptr<BranchInst>(new BranchInst(LLVMBr, *this));
@@ -2121,6 +2188,13 @@ Context::createInsertElementInst(llvm::InsertElementInst *IEI) {
   return cast<InsertElementInst>(registerValue(std::move(NewPtr)));
 }
 
+ShuffleVectorInst *
+Context::createShuffleVectorInst(llvm::ShuffleVectorInst *SVI) {
+  auto NewPtr =
+      std::unique_ptr<ShuffleVectorInst>(new ShuffleVectorInst(SVI, *this));
+  return cast<ShuffleVectorInst>(registerValue(std::move(NewPtr)));
+}
+
 BranchInst *Context::createBranchInst(llvm::BranchInst *BI) {
   auto NewPtr = std::unique_ptr<BranchInst>(new BranchInst(BI, *this));
   return cast<BranchInst>(registerValue(std::move(NewPtr)));
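The four ShuffleVectorInst::create() overloads return a plain Value* rather than a ShuffleVectorInst* because they go through IRBuilder::CreateShuffleVector, which can constant-fold when both vector operands are constants. In that case the result is wrapped with getOrCreateConstant instead of createShuffleVectorInst, and the unit test checks this path through ShouldBeConstant. The sketch below is a rough model of what such a fold computes. It uses plain ints instead of llvm::Constant and std::nullopt for poison lanes. foldShuffle is a hypothetical helper, not an LLVM or IRBuilder API.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of folding `shufflevector V1, V2, Mask` when V1 and V2
// are constant vectors. std::nullopt represents a poison result lane.
std::vector<std::optional<int>>
foldShuffle(const std::vector<int> &V1, const std::vector<int> &V2,
            const std::vector<int> &Mask) {
  assert(V1.size() == V2.size() && "operands must have the same type");
  const int N = static_cast<int>(V1.size());
  std::vector<std::optional<int>> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    if (Idx == -1) {
      Result.push_back(std::nullopt); // undefined mask lane -> poison lane
      continue;
    }
    assert(Idx >= 0 && Idx < 2 * N && "mask index out of range");
    // Indices [0, N) select from V1; indices [N, 2N) select from V2.
    Result.push_back(Idx < N ? V1[Idx] : V2[Idx - N]);
  }
  return Result;
}
```

Applied to the constant `<2 x i8> <i8 0, i8 1>` from the unit test, shuffled with itself under mask {0, 3}, this model yields {0, 1}. The result length follows the mask, not the operands, which is why length-changing masks are possible.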
diff --git a/llvm/unittests/SandboxIR/SandboxIRTest.cpp b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
index 712865fd07cd7b..94d8ac27be3bc8 100644
--- a/llvm/unittests/SandboxIR/SandboxIRTest.cpp
+++ b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
@@ -15,6 +15,7 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Support/SourceMgr.h"
+#include "gmock/gmock-matchers.h"
 #include "gtest/gtest.h"
 
 using namespace llvm;
@@ -739,6 +740,422 @@ define void @foo(i8 %v0, i8 %v1, <2 x i8> %vec) {
       llvm::InsertElementInst::isValidOperands(LLVMArg0, LLVMArgVec, LLVMZero));
 }
 
+TEST_F(SandboxIRTest, ShuffleVectorInst) {
+  parseIR(C, R"IR(
+define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
+  %shuf = shufflevector <2 x i8> %v1, <2 x i8> %v2, <2 x i32> <i32 0, i32 2>
+  %extr = extractelement <2 x i8> <i8 0, i8 1>, i32 0
+  ret void
+}
+)IR");
+  Function &LLVMF = *M->getFunction("foo");
+  sandboxir::Context Ctx(C);
+  auto &F = *Ctx.createFunction(&LLVMF);
+  auto *ArgV1 = F.getArg(0);
+  auto *ArgV2 = F.getArg(1);
+  auto *BB = &*F.begin();
+  auto It = BB->begin();
+  auto *SVI = cast<sandboxir::ShuffleVectorInst>(&*It++);
+  auto *EEI = cast<sandboxir::ExtractElementInst>(&*It++);
+  auto *Ret = &*It++;
+
+  EXPECT_EQ(SVI->getOpcode(), sandboxir::Instruction::Opcode::ShuffleVector);
+  EXPECT_EQ(SVI->getOperand(0), ArgV1);
+  EXPECT_EQ(SVI->getOperand(1), ArgV2);
+
+  // In order to test all the methods we need masks of different lengths, so we
+  // can't simply reuse one of the instructions created above. This helper
+  // creates a new `shufflevector %v1, %v2, <mask>` with the given mask indices.
+  auto CreateShuffleWithMask = [&](auto &&...Indices) {
+    SmallVector<int, 4> Mask = {Indices...};
+    return cast<sandboxir::ShuffleVectorInst>(
+        sandboxir::ShuffleVectorInst::create(ArgV1, ArgV2, Mask, Ret, Ctx));
+  };
+
+  // create (InsertBefore)
+  auto *NewI1 =
+      cast<sandboxir::ShuffleVectorInst>(sandboxir::ShuffleVectorInst::create(
+          ArgV1, ArgV2, ArrayRef<int>({0, 2, 1, 3}), Ret, Ctx,
+          "NewShuffleBeforeRet"));
+  EXPECT_EQ(NewI1->getOperand(0), ArgV1);
+  EXPECT_EQ(NewI1->getOperand(1), ArgV2);
+  EXPECT_EQ(NewI1->getNextNode(), Ret);
+#ifndef NDEBUG
+  EXPECT_EQ(NewI1->getName(), "NewShuffleBeforeRet");
+#endif
+
+  // create (InsertAtEnd)
+  auto *NewI2 =
+      cast<sandboxir::ShuffleVectorInst>(sandboxir::ShuffleVectorInst::create(
+          ArgV1, ArgV2, ArrayRef<int>({0, 1}), BB, Ctx, "NewShuffleAtEndOfBB"));
+  EXPECT_EQ(NewI2->getPrevNode(), Ret);
+
+  // Test the path that creates a folded constant. We're currently using an
+  // extractelement instruction with a constant operand in the textual IR above
+  // to obtain a constant vector to work with.
+  // TODO: Refactor this once sandboxir::ConstantVector lands.
+  auto *ShouldBeConstant = sandboxir::ShuffleVectorInst::create(
+      EEI->getOperand(0), EEI->getOperand(0), ArrayRef<int>({0, 3}), BB, Ctx);
+  EXPECT_TRUE(isa<sandboxir::Constant>(ShouldBeConstant));
+
+  // isValidOperands
+  auto *LLVMArgV1 = LLVMF.getArg(0);
+  auto *LLVMArgV2 = LLVMF.getArg(1);
+  ArrayRef<int> Mask({1, 2});
+  EXPECT_EQ(
+      sandboxir::ShuffleVectorInst::isValidOperands(ArgV1, ArgV2, Mask),
+      llvm::ShuffleVectorInst::isValidOperands(LLVMArgV1, LLVMArgV2, Mask));
+  EXPECT_EQ(sandboxir::ShuffleVectorInst::isValidOperands(ArgV1, ArgV1, ArgV1),
+            llvm::ShuffleVectorInst::isValidOperands(LLVMArgV1, LLVMArgV1,
+                                                     LLVMArgV1));
+
+  // commute
+  {
+    auto *I = CreateShuffleWithMask(0, 2);
+    I->commute();
+    EXPECT_EQ(I->getOperand(0), ArgV2);
+    EXPECT_EQ(I->getOperand(1), ArgV1);
+    EXPECT_THAT(I->getShuffleMask(),
+                testing::ContainerEq(ArrayRef<int>({2, 0})));
+  }
+
+  // getType
+  EXPECT_EQ(SVI->getType(), ArgV1->getType());
+
+  // getMaskValue
+  EXPECT_EQ(SVI->getMaskValue(0), 0);
+  EXPECT_EQ(SVI->getMaskValue(1), 2);
+
+  // getShuffleMask / getShuffleMaskForBitcode
+  {
+    EXPECT_THAT(SVI->getShuffleMask(),
+                testing::ContainerEq(ArrayRef<int>({0, 2})));
+
+    SmallVector<int, 2> Result;
+    SVI->getShuffleMask(Result);
+    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({0, 2})));
+
+    Result.clear();
+    sandboxir::ShuffleVectorInst::getShuffleMask(
+        SVI->getShuffleMaskForBitcode(), Result);
+    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({0, 2})));
+  }
+
+  // convertShuffleMaskForBitcode
+  {
+    auto *C = sandboxir::ShuffleVectorInst::convertShuffleMaskForBitcode(
+        ArrayRef<int>({2, 3}), ArgV1->getType(), Ctx);
+    SmallVector<int, 2> Result;
+    sandboxir::ShuffleVectorInst::getShuffleMask(C, Result);
+    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({2, 3})));
+  }
+
+  // setShuffleMask
+  {
+    auto *I = CreateShuffleWithMask(0, 1);
+    I->setShuffleMask(ArrayRef<int>({2, 3}));
+    EXPECT_THAT(I->getShuffleMask(),
+                testing::ContainerEq(ArrayRef<int>({2, 3})));
+  }
+
+  // The following functions check different mask properties. Note that most
+  // of these come in three different flavors: a method that checks the mask
+  // in the current instruction and two static member functions that check
+  // a mask given as an ArrayRef<int> or Constant*, so there's quite a bit of
+  // repetition in order to check all of them.
+
+  // changesLength / increasesLength
+  {
+    auto *I = CreateShuffleWithMask(1);
+    EXPECT_TRUE(I->changesLength());
+    EXPECT_FALSE(I->increasesLength());
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 1);
+    EXPECT_FALSE(I->changesLength());
+    EXPECT_FALSE(I->increasesLength());
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 1, 1);
+    EXPECT_TRUE(I->changesLength());
+    EXPECT_TRUE(I->increasesLength());
+  }
+
+  // isSingleSource / isSingleSourceMask
+  {
+    auto *I = CreateShuffleWithMask(0, 1);
+    EXPECT_TRUE(I->isSingleSource());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isSingleSourceMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isSingleSourceMask(
+        I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(0, 2);
+    EXPECT_FALSE(I->isSingleSource());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isSingleSourceMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isSingleSourceMask(
+        I->getShuffleMask(), 2));
+  }
+
+  // isIdentity / isIdentityMask
+  {
+    auto *I = CreateShuffleWithMask(0, 1);
+    EXPECT_TRUE(I->isIdentity());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isIdentityMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(
+        sandboxir::ShuffleVectorInst::isIdentityMask(I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 0);
+    EXPECT_FALSE(I->isIdentity());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isIdentityMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(
+        sandboxir::ShuffleVectorInst::isIdentityMask(I->getShuffleMask(), 2));
+  }
+
+  // isIdentityWithPadding
+  EXPECT_TRUE(CreateShuffleWithMask(0, 1, -1, -1)->isIdentityWithPadding());
+  EXPECT_FALSE(CreateShuffleWithMask(0, 1)->isIdentityWithPadding());
+
+  // isIdentityWithExtract
+  EXPECT_TRUE(CreateShuffleWithMask(0)->isIdentityWithExtract());
+  EXPECT_FALSE(CreateShuffleWithMask(0, 1)->isIdentityWithExtract());
+  EXPECT_FALSE(CreateShuffleWithMask(0, 1, 2)->isIdentityWithExtract());
+  EXPECT_FALSE(CreateShuffleWithMask(1)->isIdentityWithExtract());
+
+  // isConcat
+  EXPECT_TRUE(CreateShuffleWithMask(0, 1, 2, 3)->isConcat());
+  EXPECT_FALSE(CreateShuffleWithMask(0, 3)->isConcat());
+
+  // isSelect / isSelectMask
+  {
+    auto *I = CreateShuffleWithMask(0, 3);
+    EXPECT_TRUE(I->isSelect());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isSelectMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(
+        sandboxir::ShuffleVectorInst::isSelectMask(I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(0, 2);
+    EXPECT_FALSE(I->isSelect());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isSelectMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(
+        sandboxir::ShuffleVectorInst::isSelectMask(I->getShuffleMask(), 2));
+  }
+
+  // isReverse / isReverseMask
+  {
+    auto *I = CreateShuffleWithMask(1, 0);
+    EXPECT_TRUE(I->isReverse());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isReverseMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(
+        sandboxir::ShuffleVectorInst::isReverseMask(I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 2);
+    EXPECT_FALSE(I->isReverse());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isReverseMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(
+        sandboxir::ShuffleVectorInst::isReverseMask(I->getShuffleMask(), 2));
+  }
+
+  // isZeroEltSplat / isZeroEltSplatMask
+  {
+    auto *I = CreateShuffleWithMask(0, 0);
+    EXPECT_TRUE(I->isZeroEltSplat());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isZeroEltSplatMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isZeroEltSplatMask(
+        I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 1);
+    EXPECT_FALSE(I->isZeroEltSplat());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isZeroEltSplatMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isZeroEltSplatMask(
+        I->getShuffleMask(), 2));
+  }
+
+  // isTranspose / isTransposeMask
+  {
+    auto *I = CreateShuffleWithMask(0, 2);
+    EXPECT_TRUE(I->isTranspose());
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isTransposeMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_TRUE(
+        sandboxir::ShuffleVectorInst::isTransposeMask(I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 1);
+    EXPECT_FALSE(I->isTranspose());
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isTransposeMask(
+        I->getShuffleMaskForBitcode(), 2));
+    EXPECT_FALSE(
+        sandboxir::ShuffleVectorInst::isTransposeMask(I->getShuffleMask(), 2));
+  }
+
+  // isSplice / isSpliceMask
+  {
+    auto *I = CreateShuffleWithMask(1, 2);
+    int Index;
+    EXPECT_TRUE(I->isSplice(Index));
+    EXPECT_EQ(Index, 1);
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isSpliceMask(
+        I->getShuffleMaskForBitcode(), 2, Index));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isSpliceMask(I->getShuffleMask(),
+                                                           2, Index));
+  }
+  {
+    auto *I = CreateShuffleWithMask(2, 1);
+    int Index;
+    EXPECT_FALSE(I->isSplice(Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isSpliceMask(
+        I->getShuffleMaskForBitcode(), 2, Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isSpliceMask(I->getShuffleMask(),
+                                                            2, Index));
+  }
+
+  // isExtractSubvectorMask
+  {
+    auto *I = CreateShuffleWithMask(1);
+    int Index;
+    EXPECT_TRUE(I->isExtractSubvectorMask(Index));
+    EXPECT_EQ(Index, 1);
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isExtractSubvectorMask(
+        I->getShuffleMaskForBitcode(), 2, Index));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isExtractSubvectorMask(
+        I->getShuffleMask(), 2, Index));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 2);
+    int Index;
+    EXPECT_FALSE(I->isExtractSubvectorMask(Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isExtractSubvectorMask(
+        I->getShuffleMaskForBitcode(), 2, Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isExtractSubvectorMask(
+        I->getShuffleMask(), 2, Index));
+  }
+
+  // isInsertSubvectorMask
+  {
+    auto *I = CreateShuffleWithMask(0, 2);
+    int NumSubElts, Index;
+    EXPECT_TRUE(I->isInsertSubvectorMask(NumSubElts, Index));
+    EXPECT_EQ(Index, 1);
+    EXPECT_EQ(NumSubElts, 1);
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isInsertSubvectorMask(
+        I->getShuffleMaskForBitcode(), 2, NumSubElts, Index));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isInsertSubvectorMask(
+        I->getShuffleMask(), 2, NumSubElts, Index));
+  }
+  {
+    auto *I = CreateShuffleWithMask(0, 1);
+    int NumSubElts, Index;
+    EXPECT_FALSE(I->isInsertSubvectorMask(NumSubElts, Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isInsertSubvectorMask(
+        I->getShuffleMaskForBitcode(), 2, NumSubElts, Index));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isInsertSubvectorMask(
+        I->getShuffleMask(), 2, NumSubElts, Index));
+  }
+
+  // isReplicationMask
+  {
+    auto *I = CreateShuffleWithMask(0, 0, 0, 1, 1, 1);
+    int ReplicationFactor, VF;
+    EXPECT_TRUE(I->isReplicationMask(ReplicationFactor, VF));
+    EXPECT_EQ(ReplicationFactor, 3);
+    EXPECT_EQ(VF, 2);
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isReplicationMask(
+        I->getShuffleMaskForBitcode(), ReplicationFactor, VF));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isReplicationMask(
+        I->getShuffleMask(), ReplicationFactor, VF));
+  }
+  {
+    auto *I = CreateShuffleWithMask(1, 2);
+    int ReplicationFactor, VF;
+    EXPECT_FALSE(I->isReplicationMask(ReplicationFactor, VF));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isReplicationMask(
+        I->getShuffleMaskForBitcode(), ReplicationFactor, VF));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isReplicationMask(
+        I->getShuffleMask(), ReplicationFactor, VF));
+  }
+
+  // isOneUseSingleSourceMask
+  {
+    auto *I = CreateShuffleWithMask(0, 1, 1, 0);
+    EXPECT_TRUE(I->isOneUseSingleSourceMask(2));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isOneUseSingleSourceMask(
+        I->getShuffleMask(), 2));
+  }
+  {
+    auto *I = CreateShuffleWithMask(0, 1, 0, 0);
+    EXPECT_FALSE(I->isOneUseSingleSourceMask(2));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isOneUseSingleSourceMask(
+        I->getShuffleMask(), 2));
+  }
+
+  // commuteShuffleMask
+  {
+    SmallVector<int, 4> M = {0, 2, 1, 3};
+    ShuffleVectorInst::commuteShuffleMask(M, 2);
+    EXPECT_THAT(M, testing::ContainerEq(ArrayRef<int>({2, 0, 3, 1})));
+  }
+
+  // isInterleave / isInterleaveMask
+  {
+    auto *I = CreateShuffleWithMask(0, 2, 1, 3);
+    EXPECT_TRUE(I->isInterleave(2));
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isInterleaveMask(
+        I->getShuffleMask(), 2, 4));
+    SmallVector<unsigned, 4> StartIndexes;
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isInterleaveMask(
+        I->getShuffleMask(), 2, 4, StartIndexes));
+    EXPECT_THAT(StartIndexes, testing::ContainerEq(ArrayRef<unsigned>({0, 2})));
+  }
+  {
+    auto *I = CreateShuffleWithMask(0, 3, 1, 2);
+    EXPECT_FALSE(I->isInterleave(2));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isInterleaveMask(
+        I->getShuffleMask(), 2, 4));
+  }
+
+  // isDeInterleaveMaskOfFactor
+  {
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isDeInterleaveMaskOfFactor(
+        ArrayRef<int>({0, 2}), 2));
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isDeInterleaveMaskOfFactor(
+        ArrayRef<int>({0, 1}), 2));
+
+    unsigned Index;
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isDeInterleaveMaskOfFactor(
+        ArrayRef<int>({1, 3}), 2, Index));
+    EXPECT_EQ(Index, 1u);
+  }
+
+  // isBitRotateMask
+  {
+    unsigned NumSubElts, RotateAmt;
+    EXPECT_TRUE(sandboxir::ShuffleVectorInst::isBitRotateMask(
+        ArrayRef<int>({1, 0, 3, 2, 5, 4, 7, 6}), 8, 2, 2, NumSubElts,
+        RotateAmt));
+    EXPECT_EQ(NumSubElts, 2u);
+    EXPECT_EQ(RotateAmt, 8u);
+
+    EXPECT_FALSE(sandboxir::ShuffleVectorInst::isBitRotateMask(
+        ArrayRef<int>({0, 7, 1, 6, 2, 5, 3, 4}), 8, 2, 2, NumSubElts,
+        RotateAmt));
+  }
+}
+
 TEST_F(SandboxIRTest, BranchInst) {
   parseIR(C, R"IR(
 define void @foo(i1 %cond0, i1 %cond2) {

>From b03b170dd39799b4fb25ffe70b81d0cf0c7d7346 Mon Sep 17 00:00:00 2001
From: Rahul Joshi <rjoshi at nvidia.com>
Date: Wed, 21 Aug 2024 13:28:28 -0700
Subject: [PATCH 046/116] [ADT] Add `isPunct` to StringExtras (#105461)

- Add `isPunct` to StringExtras.h.
- Add unit test for `isPunct` to StringExtrasTest.
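
For illustration only, the character set can be cross-checked with a short
Python sketch. `is_punct` is a hypothetical stand-in for `llvm::isPunct`,
not an LLVM API. It uses the same 32-character table and shows that the
table matches both Python's `string.punctuation` and the set of printable,
non-space ASCII characters that are neither letters nor digits.

```python
import string

# Same 32 characters as the StringLiteral used by llvm::isPunct.
PUNCTUATIONS = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""


def is_punct(c: str) -> bool:
    """Hypothetical mirror of llvm::isPunct: a plain table lookup,
    so the answer never depends on the C locale."""
    return len(c) == 1 and c in PUNCTUATIONS


# The table is exactly Python's ASCII punctuation set...
assert set(PUNCTUATIONS) == set(string.punctuation)

# ...which is every printable, non-space ASCII character that is not
# alphanumeric.
printable_non_alnum = {chr(i) for i in range(0x21, 0x7F) if not chr(i).isalnum()}
assert set(PUNCTUATIONS) == printable_non_alnum

print(is_punct("-"), is_punct("a"), is_punct("0"))  # True False False
```

Because the lookup is table driven, it matches the "locale-independent"
wording of the doc comment: no locale setting can change the result.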
---
 llvm/include/llvm/ADT/StringExtras.h    | 11 +++++++++++
 llvm/unittests/ADT/StringExtrasTest.cpp | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/llvm/include/llvm/ADT/StringExtras.h b/llvm/include/llvm/ADT/StringExtras.h
index 20e6ad1f68f996..1317d521d4c191 100644
--- a/llvm/include/llvm/ADT/StringExtras.h
+++ b/llvm/include/llvm/ADT/StringExtras.h
@@ -140,6 +140,17 @@ inline bool isPrint(char C) {
   return (0x20 <= UC) && (UC <= 0x7E);
 }
 
+/// Checks whether character \p C is a punctuation character.
+///
+/// Locale-independent version of the C standard library ispunct. The list of
+/// punctuation characters can be found in the documentation of std::ispunct:
+/// https://en.cppreference.com/w/cpp/string/byte/ispunct.
+inline bool isPunct(char C) {
+  static constexpr StringLiteral Punctuations =
+      R"(!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~)";
+  return Punctuations.contains(C);
+}
+
 /// Checks whether character \p C is whitespace in the "C" locale.
 ///
 /// Locale-independent version of the C standard library isspace.
diff --git a/llvm/unittests/ADT/StringExtrasTest.cpp b/llvm/unittests/ADT/StringExtrasTest.cpp
index 1fb1fea6577911..51f7c3948a3146 100644
--- a/llvm/unittests/ADT/StringExtrasTest.cpp
+++ b/llvm/unittests/ADT/StringExtrasTest.cpp
@@ -59,6 +59,18 @@ TEST(StringExtrasTest, isUpper) {
   EXPECT_FALSE(isUpper('\?'));
 }
 
+TEST(StringExtrasTest, isPunct) {
+  EXPECT_FALSE(isPunct('a'));
+  EXPECT_FALSE(isPunct('b'));
+  EXPECT_FALSE(isPunct('z'));
+  EXPECT_TRUE(isPunct('-'));
+  EXPECT_TRUE(isPunct(';'));
+  EXPECT_TRUE(isPunct('@'));
+  EXPECT_FALSE(isPunct('0'));
+  EXPECT_FALSE(isPunct('1'));
+  EXPECT_FALSE(isPunct('x'));
+}
+
 template <class ContainerT> void testJoin() {
   ContainerT Items;
   EXPECT_EQ("", join(Items.begin(), Items.end(), " <sep> "));

>From 84fa7b438e1fba0c88b21784e716926017b9fe49 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 12:54:05 -0400
Subject: [PATCH 047/116] [libc++] Improve the granularity of status tracking
 from Github issues

This enhances the Github/CSV synchronization script so that it
understands some of the idioms we use in the CSV status files, such as
|Nothing To Do|, |Partial| and |In Progress|.
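
A rough Python sketch of the synchronization rule, assuming simplified
string statuses instead of the script's PaperStatus class (`RANK`,
`parse_csv_status` and `sync_status` are hypothetical names). CSV cells are
matched by prefix so trailing notes survive, |Complete| and |Nothing To Do|
rank equally, and a CSV row is replaced only when the Github issue is
strictly further along. The real script additionally prints a warning when
the two statuses differ in any other way.

```python
# Relative progress of each status; DONE and NOTHING_TO_DO tie.
RANK = {"TODO": 0, "IN_PROGRESS": 1, "PARTIAL": 2, "DONE": 3, "NOTHING_TO_DO": 3}

# CSV markers; a cell only has to *start* with one, so notes may follow.
CSV_MARKERS = {
    "IN_PROGRESS": "|In Progress|",
    "PARTIAL": "|Partial|",
    "DONE": "|Complete|",
    "NOTHING_TO_DO": "|Nothing To Do|",
}


def parse_csv_status(entry: str) -> str:
    """Map a CSV status cell to a status name (an empty cell means TODO)."""
    if entry == "":
        return "TODO"
    for status, marker in CSV_MARKERS.items():
        if entry.startswith(marker):
            return status
    raise RuntimeError(f"Unexpected CSV entry for status: {entry}")


def sync_status(csv_entry: str, github_status: str) -> str:
    """Return the CSV cell to keep after comparing with the Github status."""
    if RANK[parse_csv_status(csv_entry)] < RANK[github_status]:
        # Github is strictly ahead (so never TODO): take its status.
        return CSV_MARKERS[github_status]
    # Otherwise keep the original cell, including any trailing notes.
    return csv_entry


print(sync_status("", "DONE"))                         # |Complete|
print(sync_status("|Partial| [#123]", "IN_PROGRESS"))  # |Partial| [#123]
print(sync_status("|Nothing To Do|", "DONE"))          # |Nothing To Do|
```

Ranking the statuses, rather than comparing them for equality, is what lets
the CSV only ever move forward and never regress to a less advanced state.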
---
 libcxx/utils/synchronize_csv_status_files.py | 173 ++++++++++++++-----
 1 file changed, 129 insertions(+), 44 deletions(-)

diff --git a/libcxx/utils/synchronize_csv_status_files.py b/libcxx/utils/synchronize_csv_status_files.py
index 68df5756e884d6..5ff718e5a8f916 100755
--- a/libcxx/utils/synchronize_csv_status_files.py
+++ b/libcxx/utils/synchronize_csv_status_files.py
@@ -19,6 +19,101 @@
 # Number of the 'Libc++ Standards Conformance' project on Github
 LIBCXX_CONFORMANCE_PROJECT = '31'
 
+class PaperStatus:
+    TODO = 1
+    IN_PROGRESS = 2
+    PARTIAL = 3
+    DONE = 4
+    NOTHING_TO_DO = 5
+
+    _status: int
+
+    _original: Optional[str]
+    """
+    Optional string from which the paper status was created. This is used to carry additional
+    information from CSV rows, like any notes associated with the status.
+    """
+
+    def __init__(self, status: int, original: Optional[str] = None):
+        self._status = status
+        self._original = original
+
+    def __eq__(self, other) -> bool:
+        return self._status == other._status
+
+    def __lt__(self, other) -> bool:
+        relative_order = {
+            PaperStatus.TODO: 0,
+            PaperStatus.IN_PROGRESS: 1,
+            PaperStatus.PARTIAL: 2,
+            PaperStatus.DONE: 3,
+            PaperStatus.NOTHING_TO_DO: 3,
+        }
+        return relative_order[self._status] < relative_order[other._status]
+
+    @staticmethod
+    def from_csv_entry(entry: str):
+        """
+        Parse a paper status out of a CSV row entry. Entries can look like:
+        - '' (an empty string, which means the paper is not done yet)
+        - '|In Progress|'
+        - '|Partial|'
+        - '|Complete|'
+        - '|Nothing To Do|'
+
+        Note that since we sometimes add additional notes after the status, we only check that the entry
+        starts with the above patterns.
+        """
+        if entry == '':
+            return PaperStatus(PaperStatus.TODO, entry)
+        elif entry.startswith('|In Progress|'):
+            return PaperStatus(PaperStatus.IN_PROGRESS, entry)
+        elif entry.startswith('|Partial|'):
+            return PaperStatus(PaperStatus.PARTIAL, entry)
+        elif entry.startswith('|Complete|'):
+            return PaperStatus(PaperStatus.DONE, entry)
+        elif entry.startswith('|Nothing To Do|'):
+            return PaperStatus(PaperStatus.NOTHING_TO_DO, entry)
+        else:
+            raise RuntimeError(f'Unexpected CSV entry for status: {entry}')
+
+    @staticmethod
+    def from_github_issue(issue: Dict):
+        """
+        Parse a paper status out of a Github issue obtained from querying a Github project.
+        """
+        if 'status' not in issue:
+            return PaperStatus(PaperStatus.TODO)
+        elif issue['status'] == 'Todo':
+            return PaperStatus(PaperStatus.TODO)
+        elif issue['status'] == 'In Progress':
+            return PaperStatus(PaperStatus.IN_PROGRESS)
+        elif issue['status'] == 'Partial':
+            return PaperStatus(PaperStatus.PARTIAL)
+        elif issue['status'] == 'Done':
+            return PaperStatus(PaperStatus.DONE)
+        elif issue['status'] == 'Nothing To Do':
+            return PaperStatus(PaperStatus.NOTHING_TO_DO)
+        else:
+            raise RuntimeError(f"Received unrecognizable Github issue status: {issue['status']}")
+
+    def to_csv_entry(self) -> str:
+        """
+        Return the issue state formatted for a CSV entry. The status is formatted as '|Complete|',
+        '|In Progress|', etc.
+        """
+        mapping = {
+            PaperStatus.TODO: '',
+            PaperStatus.IN_PROGRESS: '|In Progress|',
+            PaperStatus.PARTIAL: '|Partial|',
+            PaperStatus.DONE: '|Complete|',
+            PaperStatus.NOTHING_TO_DO: '|Nothing To Do|',
+        }
+        return self._original if self._original is not None else mapping[self._status]
+
+    def is_done(self) -> bool:
+        return self._status == PaperStatus.DONE or self._status == PaperStatus.NOTHING_TO_DO
+
 class PaperInfo:
     paper_number: str
     """
@@ -30,15 +125,14 @@ class PaperInfo:
     Plain text string representing the name of the paper.
     """
 
-    meeting: Optional[str]
+    status: PaperStatus
     """
-    Plain text string representing the meeting at which the paper/issue was voted.
+    Status of the paper/issue. This can be todo, in progress, partial, done, or nothing to do.
     """
 
-    status: Optional[str]
+    meeting: Optional[str]
     """
-    Status of the paper/issue. This must be '|Complete|', '|Nothing To Do|', '|In Progress|',
-    '|Partial|' or 'Resolved by <something>'.
+    Plain text string representing the meeting at which the paper/issue was voted.
     """
 
     first_released_version: Optional[str]
@@ -59,15 +153,15 @@ class PaperInfo:
     """
 
     def __init__(self, paper_number: str, paper_name: str,
+                       status: PaperStatus,
                        meeting: Optional[str] = None,
-                       status: Optional[str] = None,
                        first_released_version: Optional[str] = None,
                        labels: Optional[List[str]] = None,
                        original: Optional[object] = None):
         self.paper_number = paper_number
         self.paper_name = paper_name
-        self.meeting = meeting
         self.status = status
+        self.meeting = meeting
         self.first_released_version = first_released_version
         self.labels = labels
         self.original = original
@@ -77,7 +171,7 @@ def for_printing(self) -> Tuple[str, str, str, str, str, str]:
             f'`{self.paper_number} <https://wg21.link/{self.paper_number}>`__',
             self.paper_name,
             self.meeting if self.meeting is not None else '',
-            self.status if self.status is not None else '',
+            self.status.to_csv_entry(),
             self.first_released_version if self.first_released_version is not None else '',
             ' '.join(f'|{label}|' for label in self.labels) if self.labels is not None else '',
         )
@@ -85,13 +179,6 @@ def for_printing(self) -> Tuple[str, str, str, str, str, str]:
     def __repr__(self) -> str:
         return repr(self.original) if self.original is not None else repr(self.for_printing())
 
-    def is_implemented(self) -> bool:
-        if self.status is None:
-            return False
-        if re.search(r'(in progress|partial)', self.status.lower()):
-            return False
-        return True
-
     @staticmethod
     def from_csv_row(row: Tuple[str, str, str, str, str, str]):# -> PaperInfo:
         """
@@ -105,8 +192,8 @@ def from_csv_row(row: Tuple[str, str, str, str, str, str]):# -> PaperInfo:
         return PaperInfo(
             paper_number=match.group(1),
             paper_name=row[1],
+            status=PaperStatus.from_csv_entry(row[3]),
             meeting=row[2] or None,
-            status=row[3] or None,
             first_released_version=row[4] or None,
             labels=[l.strip('|') for l in row[5].split(' ') if l] or None,
             original=row,
@@ -123,12 +210,6 @@ def from_github_issue(issue: Dict):# -> PaperInfo:
             raise RuntimeError(f"Issue doesn't have a title that we know how to parse: {issue}")
         paper = match.group(1)
 
-        # Figure out the status of the paper according to the Github project information.
-        #
-        # Sadly, we can't make a finer-grained distiction about *how* the issue
-        # was closed (such as Nothing To Do or similar).
-        status = '|Complete|' if 'status' in issue and issue['status'] == 'Done' else None
-
         # Handle labels
         valid_labels = ('format', 'ranges', 'spaceship', 'flat_containers', 'concurrency TS', 'DR')
         labels = [label for label in issue['labels'] if label in valid_labels]
@@ -136,8 +217,8 @@ def from_github_issue(issue: Dict):# -> PaperInfo:
         return PaperInfo(
             paper_number=paper,
             paper_name=issue['title'],
+            status=PaperStatus.from_github_issue(issue),
             meeting=issue.get('meeting Voted', None),
-            status=status,
             first_released_version=None, # TODO
             labels=labels if labels else None,
             original=issue,
@@ -177,30 +258,34 @@ def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
 
         paper = PaperInfo.from_csv_row(row)
 
-        # If the row is already implemented, basically keep it unchanged but also validate that we're not
-        # out-of-sync with any still-open Github issue tracking the same paper.
-        if paper.is_implemented():
-            dangling = [gh for gh in from_github if gh.paper_number == paper.paper_number and not gh.is_implemented()]
-            if dangling:
-                print(f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}")
-                print("The Github issue should be closed if the work has indeed been done.")
-            results.append(paper.for_printing())
-        else:
-            # Find any Github issues tracking this paper
-            tracking = [gh for gh in from_github if paper.paper_number == gh.paper_number]
+        # Find any Github issues tracking this paper. Each row must have one and exactly one Github
+        # issue tracking it, which we validate below.
+        tracking = [gh for gh in from_github if paper.paper_number == gh.paper_number]
 
-            # If there is no tracking issue for that row in the CSV, this is an error since we're
-            # missing a Github issue.
-            if not tracking:
-                raise RuntimeError(f"Can't find any Github issue for CSV row which isn't marked as done yet: {row}")
+        # If there is no tracking issue for that row in the CSV, this is an error since we're
+        # missing a Github issue.
+        if len(tracking) == 0:
+            print(f"Can't find any Github issue for CSV row: {row}")
+            results.append(row)
+            continue
 
-            # If there's more than one tracking issue, something is weird too.
-            if len(tracking) > 1:
-                raise RuntimeError(f"Found a row with more than one tracking issue: {row}\ntracked by: {tracking}")
+        # If there's more than one tracking issue, something is weird too.
+        if len(tracking) > 1:
+            print(f"Found a row with more than one tracking issue: {row}\ntracked by: {tracking}")
+            results.append(row)
+            continue
 
-            # If the issue is closed, synchronize the row based on the Github issue. Otherwise, use the
-            # existing CSV row as-is.
-            results.append(tracking[0].for_printing() if tracking[0].is_implemented() else row)
+        gh = tracking[0]
+
+        # If the CSV row has a status that is "less advanced" than the Github issue, simply update the CSV
+        # row with the newer status. Otherwise, report an error if they have a different status because
+        # something must be wrong.
+        if paper.status < gh.status:
+            results.append(gh.for_printing())
+            continue
+        elif paper.status != gh.status:
+            print(f"We found a CSV row and a Github issue with different statuses:\nrow: {row}\nGithub issue: {gh}")
+        results.append(row)
 
     return results
 

>From cfd4c1805ead139f84a4465719c49cca53f07f27 Mon Sep 17 00:00:00 2001
From: Slava Zakharin <szakharin at nvidia.com>
Date: Wed, 21 Aug 2024 13:37:03 -0700
Subject: [PATCH 048/116] [RFC][flang] Replace special symbols in uniqued
 global names. (#104859)

This change addresses more issues like the one resolved in #71338.
Some targets (e.g. NVPTX) do not accept global names containing
`.`. In particular, the global variables created to represent
the runtime information of derived types use `.` in their names.
A derived type's descriptor object may be used in the device code,
e.g. to initialize a descriptor of a variable of this type.
Thus, the runtime type info objects may need to be compiled
for the device.

Moreover, at least the derived types' descriptor objects
may need to be registered for the host-device association
(think of `omp declare target`), so that the addendum pointer
of a descriptor that points to a derived type's descriptor
can be properly mapped to the device.
Registration requires knowing the name of the global variable
in the device image so that the proper host code can be created.
It is therefore better to give these globals the same names
on the host and on the device.

The CompilerGeneratedNamesConversion pass renames all uniqued,
non-externally-facing globals so that special symbols (currently `.`)
are replaced with `X`. The pass is meant to be run for both the host
and the device.
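
A simplified Python sketch of the renaming, for illustration only.
`convert_global_name` is hypothetical: it treats every `_Q`-prefixed name
as uniqued and skips the externally-facing check that the real pass
performs through fir::NameUniquer. The example name `_QMmE.dt.t` assumes
the usual flang spelling of the type descriptor global for a type `t`
declared in module `m`.

```python
def replace_special_symbols(name: str) -> str:
    """Replace the special symbols (currently only '.') with 'X'."""
    return name.replace(".", "X")


def convert_global_name(name: str) -> str:
    """Hypothetical, simplified version of the renaming decision."""
    if not name.startswith("_Q"):
        # Not a uniqued name (e.g. an external symbol): leave it alone.
        return name
    return replace_special_symbols(name)


print(convert_global_name("_QMmE.dt.t"))  # _QMmEXdtXt
print(convert_global_name("_QMmE.n.t"))   # _QMmEXnXt
print(convert_global_name("foo_"))        # foo_
```

Since user entity names inside uniqued names are always lowercase, an
uppercase `X` cannot originate from user code, so a converted name should
not collide with a user symbol.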

An option is added to the FIR-to-LLVM conversion pass to indicate
whether the new pass has already been run. This setting affects
how codegen computes the names of the descriptors of FIR derived
types.
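
The effect of the new option (`type-descriptors-renamed-for-assembly`,
`typeDescriptorsRenamedForAssembly` in FIRToLLVMPassOptions) can be
sketched as follows. The helpers are hypothetical Python stand-ins for
fir::NameUniquer::getTypeDescriptorName() and
getTypeDescriptorAssemblyName(). The `_QM<module>E.dt.<type>` spelling is
an assumption that ignores kind parameters and non-module scopes.

```python
TYPE_DESCRIPTOR_SEPARATOR = ".dt."


def type_descriptor_name(module: str, type_name: str) -> str:
    """Hypothetical stand-in for getTypeDescriptorName() (module scope only)."""
    return f"_QM{module}E{TYPE_DESCRIPTOR_SEPARATOR}{type_name}"


def type_descriptor_assembly_name(module: str, type_name: str) -> str:
    """Hypothetical stand-in for getTypeDescriptorAssemblyName()."""
    return type_descriptor_name(module, type_name).replace(".", "X")


def descriptor_symbol(module: str, type_name: str, renamed_for_assembly: bool) -> str:
    """Name codegen must reference for a derived type's descriptor global."""
    if renamed_for_assembly:
        # CompilerGeneratedNamesConversion already renamed the global.
        return type_descriptor_assembly_name(module, type_name)
    return type_descriptor_name(module, type_name)


print(descriptor_symbol("m", "t", renamed_for_assembly=False))  # _QMmE.dt.t
print(descriptor_symbol("m", "t", renamed_for_assembly=True))   # _QMmEXdtXt
```

If the flag did not match what actually ran, codegen would reference a
global name that does not exist in the module, which is why the pass must
be told whether the renaming has happened.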

fir::NameUniquer now allows `X` to be part of a name, because
name deconstruction may be applied to mangled names after the
CompilerGeneratedNamesConversion pass has run.
---
 .../flang/Optimizer/CodeGen/CGPasses.td       |  6 +-
 .../include/flang/Optimizer/CodeGen/CodeGen.h | 10 +++
 .../flang/Optimizer/Support/InternalNames.h   | 25 ++++--
 .../flang/Optimizer/Transforms/Passes.h       |  1 +
 .../flang/Optimizer/Transforms/Passes.td      | 18 +++++
 flang/include/flang/Tools/CLOptions.inc       | 10 +++
 flang/lib/Optimizer/CodeGen/CodeGen.cpp       | 13 ++-
 flang/lib/Optimizer/Support/InternalNames.cpp | 31 +++++--
 flang/lib/Optimizer/Transforms/CMakeLists.txt |  1 +
 .../Transforms/CompilerGeneratedNames.cpp     | 80 +++++++++++++++++++
 flang/lib/Semantics/runtime-type-info.cpp     | 69 ++++++++++------
 .../test/Driver/mlir-debug-pass-pipeline.f90  |  1 +
 flang/test/Driver/mlir-pass-pipeline.f90      |  1 +
 flang/test/Fir/basic-program.fir              |  1 +
 flang/test/Fir/convert-to-llvm.fir            |  2 +-
 flang/test/Fir/convert-type-desc-to-llvm.fir  | 29 +++++++
 flang/test/Fir/polymorphic.fir                |  4 +-
 flang/test/Fir/type-descriptor.fir            |  4 +-
 flang/test/Lower/allocatable-polymorphic.f90  | 14 ++--
 flang/test/Lower/dense-array-any-rank.f90     |  6 +-
 20 files changed, 270 insertions(+), 56 deletions(-)
 create mode 100644 flang/lib/Optimizer/Transforms/CompilerGeneratedNames.cpp
 create mode 100644 flang/test/Fir/convert-type-desc-to-llvm.fir

diff --git a/flang/include/flang/Optimizer/CodeGen/CGPasses.td b/flang/include/flang/Optimizer/CodeGen/CGPasses.td
index 989e3943882a19..e9e303df09eeba 100644
--- a/flang/include/flang/Optimizer/CodeGen/CGPasses.td
+++ b/flang/include/flang/Optimizer/CodeGen/CGPasses.td
@@ -36,7 +36,11 @@ def FIRToLLVMLowering : Pass<"fir-to-llvm-ir", "mlir::ModuleOp"> {
     Option<"forcedTargetFeatures", "target-features", "std::string",
            /*default=*/"", "Override module's target features.">,
     Option<"applyTBAA", "apply-tbaa", "bool", /*default=*/"false",
-           "Attach TBAA tags to memory accessing operations.">
+           "Attach TBAA tags to memory accessing operations.">,
+    Option<"typeDescriptorsRenamedForAssembly",
+           "type-descriptors-renamed-for-assembly", "bool", /*default=*/"false",
+           "Global variables created to describe derived types "
+           "have been renamed to avoid special symbols in their names.">
   ];
 }
 
diff --git a/flang/include/flang/Optimizer/CodeGen/CodeGen.h b/flang/include/flang/Optimizer/CodeGen/CodeGen.h
index 06961819bb19c8..390f00e1ac77c2 100644
--- a/flang/include/flang/Optimizer/CodeGen/CodeGen.h
+++ b/flang/include/flang/Optimizer/CodeGen/CodeGen.h
@@ -44,6 +44,16 @@ struct FIRToLLVMPassOptions {
 
   // Force the usage of a unified tbaa tree in TBAABuilder.
   bool forceUnifiedTBAATree = false;
+
+  // If set to true, then the global variables created
+  // for the derived types have been renamed to avoid usage
+  // of special symbols that may not be supported by all targets.
+  // The renaming is done by the CompilerGeneratedNamesConversion pass.
+  // If it is true, the FIR-to-LLVM pass has to use
+  // fir::NameUniquer::getTypeDescriptorAssemblyName() to obtain
+  // the name of the global variable corresponding to a derived
+  // type's descriptor.
+  bool typeDescriptorsRenamedForAssembly = false;
 };
 
 /// Convert FIR to the LLVM IR dialect with default options.
diff --git a/flang/include/flang/Optimizer/Support/InternalNames.h b/flang/include/flang/Optimizer/Support/InternalNames.h
index 9e13b4a7668b7a..67ab36cf8da7ff 100644
--- a/flang/include/flang/Optimizer/Support/InternalNames.h
+++ b/flang/include/flang/Optimizer/Support/InternalNames.h
@@ -14,13 +14,23 @@
 #include <cstdint>
 #include <optional>
 
-static constexpr llvm::StringRef typeDescriptorSeparator = ".dt.";
-static constexpr llvm::StringRef componentInitSeparator = ".di.";
-static constexpr llvm::StringRef bindingTableSeparator = ".v.";
-static constexpr llvm::StringRef boxprocSuffix = "UnboxProc";
-
 namespace fir {
 
+static constexpr llvm::StringRef kNameSeparator = ".";
+static constexpr llvm::StringRef kBoundsSeparator = ".b.";
+static constexpr llvm::StringRef kComponentSeparator = ".c.";
+static constexpr llvm::StringRef kComponentInitSeparator = ".di.";
+static constexpr llvm::StringRef kDataPtrInitSeparator = ".dp.";
+static constexpr llvm::StringRef kTypeDescriptorSeparator = ".dt.";
+static constexpr llvm::StringRef kKindParameterSeparator = ".kp.";
+static constexpr llvm::StringRef kLenKindSeparator = ".lpk.";
+static constexpr llvm::StringRef kLenParameterSeparator = ".lv.";
+static constexpr llvm::StringRef kNameStringSeparator = ".n.";
+static constexpr llvm::StringRef kProcPtrSeparator = ".p.";
+static constexpr llvm::StringRef kSpecialBindingSeparator = ".s.";
+static constexpr llvm::StringRef kBindingTableSeparator = ".v.";
+static constexpr llvm::StringRef boxprocSuffix = "UnboxProc";
+
 /// Internal name mangling of identifiers
 ///
 /// In order to generate symbolically referencable artifacts in a ModuleOp,
@@ -150,6 +160,9 @@ struct NameUniquer {
   /// not a valid mangled derived type name.
   static std::string getTypeDescriptorName(llvm::StringRef mangledTypeName);
 
+  static std::string
+  getTypeDescriptorAssemblyName(llvm::StringRef mangledTypeName);
+
   /// Given a mangled derived type name, get the name of the related binding
   /// table object. Returns an empty string if \p mangledTypeName is not a valid
   /// mangled derived type name.
@@ -169,6 +182,8 @@ struct NameUniquer {
   static llvm::StringRef
   dropTypeConversionMarkers(llvm::StringRef mangledTypeName);
 
+  static std::string replaceSpecialSymbols(const std::string &name);
+
 private:
   static std::string intAsString(std::int64_t i);
   static std::string doKind(std::int64_t kind);
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h
index 96b0e9714b95af..6f98e3a25ec125 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.h
+++ b/flang/include/flang/Optimizer/Transforms/Passes.h
@@ -59,6 +59,7 @@ namespace fir {
 #define GEN_PASS_DECL_VSCALEATTR
 #define GEN_PASS_DECL_FUNCTIONATTR
 #define GEN_PASS_DECL_CONSTANTARGUMENTGLOBALISATIONOPT
+#define GEN_PASS_DECL_COMPILERGENERATEDNAMESCONVERSION
 
 #include "flang/Optimizer/Transforms/Passes.h.inc"
 
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index c703a62c03b7d9..a0211384667ed1 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -170,6 +170,24 @@ def ExternalNameConversion : Pass<"external-name-interop", "mlir::ModuleOp"> {
   ];
 }
 
+def CompilerGeneratedNamesConversion : Pass<"compiler-generated-names",
+    "mlir::ModuleOp"> {
+  let summary = "Convert names of compiler generated globals";
+  let description = [{
+    Transforms names of compiler generated globals to avoid
+    characters that might be unsupported by some target toolchains.
+    Special symbols (currently only '.') are replaced with the 'X' character.
+    This is only done for uniqued names that are not externally facing.
+    Uniqued names always use the '_Q' prefix, and user entity names
+    are always lowercase, so using 'X' instead of the special symbols
+    guarantees that the converted names will not conflict with the user
+    space. This pass does not affect externally facing names,
+    because the compiler is not expected to generate
+    externally facing names on its own, and such names cannot contain
+    special symbols.
+  }];
+}
+
 def MemRefDataFlowOpt : Pass<"fir-memref-dataflow-opt", "::mlir::func::FuncOp"> {
   let summary =
     "Perform store/load forwarding and potentially removing dead stores.";
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index 7df50449494631..57b90017d052e4 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -93,6 +93,8 @@ DisableOption(ExternalNameConversion, "external-name-interop",
     "convert names with external convention");
 EnableOption(ConstantArgumentGlobalisation, "constant-argument-globalisation",
     "the local constant argument to global constant conversion");
+DisableOption(CompilerGeneratedNamesConversion, "compiler-generated-names",
+    "replace special symbols in compiler generated names");
 
 using PassConstructor = std::unique_ptr<mlir::Pass>();
 
@@ -222,6 +224,8 @@ inline void addFIRToLLVMPass(
   options.ignoreMissingTypeDescriptors = ignoreMissingTypeDescriptors;
   options.applyTBAA = config.AliasAnalysis;
   options.forceUnifiedTBAATree = useOldAliasTags;
+  options.typeDescriptorsRenamedForAssembly =
+      !disableCompilerGeneratedNamesConversion;
   addPassConditionally(pm, disableFirToLlvmIr,
       [&]() { return fir::createFIRToLLVMPass(options); });
   // The dialect conversion framework may leave dead unrealized_conversion_cast
@@ -248,6 +252,11 @@ inline void addExternalNameConversionPass(
       [&]() { return fir::createExternalNameConversion({appendUnderscore}); });
 }
 
+inline void addCompilerGeneratedNamesConversionPass(mlir::PassManager &pm) {
+  addPassConditionally(pm, disableCompilerGeneratedNamesConversion,
+      [&]() { return fir::createCompilerGeneratedNamesConversion(); });
+}
+
 // Use inliner extension point callback to register the default inliner pass.
 inline void registerDefaultInlinerPass(MLIRToLLVMPassPipelineConfig &config) {
   config.registerFIRInlinerCallback(
@@ -379,6 +388,7 @@ inline void createDefaultFIRCodeGenPassPipeline(mlir::PassManager &pm,
   fir::addCodeGenRewritePass(
       pm, (config.DebugInfo != llvm::codegenoptions::NoDebugInfo));
   fir::addTargetRewritePass(pm);
+  fir::addCompilerGeneratedNamesConversionPass(pm);
   fir::addExternalNameConversionPass(pm, config.Underscoring);
   fir::createDebugPasses(pm, config.DebugInfo, config.OptLevel, inputFilename);
 
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index 1713cf98a8b961..e419b261252995 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -1201,7 +1201,9 @@ struct EmboxCommonConversion : public fir::FIROpConversion<OP> {
                                 mlir::Location loc,
                                 fir::RecordType recType) const {
     std::string name =
-        fir::NameUniquer::getTypeDescriptorName(recType.getName());
+        this->options.typeDescriptorsRenamedForAssembly
+            ? fir::NameUniquer::getTypeDescriptorAssemblyName(recType.getName())
+            : fir::NameUniquer::getTypeDescriptorName(recType.getName());
     mlir::Type llvmPtrTy = ::getLlvmPtrType(mod.getContext());
     if (auto global = mod.template lookupSymbol<fir::GlobalOp>(name)) {
       return rewriter.create<mlir::LLVM::AddressOfOp>(loc, llvmPtrTy,
@@ -2704,7 +2706,10 @@ struct TypeDescOpConversion : public fir::FIROpConversion<fir::TypeDescOp> {
     auto recordType = mlir::dyn_cast<fir::RecordType>(inTy);
     auto module = typeDescOp.getOperation()->getParentOfType<mlir::ModuleOp>();
     std::string typeDescName =
-        fir::NameUniquer::getTypeDescriptorName(recordType.getName());
+        this->options.typeDescriptorsRenamedForAssembly
+            ? fir::NameUniquer::getTypeDescriptorAssemblyName(
+                  recordType.getName())
+            : fir::NameUniquer::getTypeDescriptorName(recordType.getName());
     auto llvmPtrTy = ::getLlvmPtrType(typeDescOp.getContext());
     if (auto global = module.lookupSymbol<mlir::LLVM::GlobalOp>(typeDescName)) {
       rewriter.replaceOpWithNewOp<mlir::LLVM::AddressOfOp>(
@@ -3653,6 +3658,10 @@ class FIRToLLVMLowering
     if (!forcedTargetFeatures.empty())
       fir::setTargetFeatures(mod, forcedTargetFeatures);
 
+    if (typeDescriptorsRenamedForAssembly)
+      options.typeDescriptorsRenamedForAssembly =
+          typeDescriptorsRenamedForAssembly;
+
     // Run dynamic pass pipeline for converting Math dialect
     // operations into other dialects (llvm, func, etc.).
     // Some conversions of Math operations cannot be done
diff --git a/flang/lib/Optimizer/Support/InternalNames.cpp b/flang/lib/Optimizer/Support/InternalNames.cpp
index b2e2cd38f48e60..58a5da5de79720 100644
--- a/flang/lib/Optimizer/Support/InternalNames.cpp
+++ b/flang/lib/Optimizer/Support/InternalNames.cpp
@@ -16,6 +16,7 @@
 #include "mlir/IR/Diagnostics.h"
 #include "llvm/Support/CommandLine.h"
 #include <optional>
+#include <regex>
 
 static llvm::cl::opt<std::string> mainEntryName(
     "main-entry-name",
@@ -59,7 +60,11 @@ convertToStringRef(const std::optional<std::string> &from) {
 
 static std::string readName(llvm::StringRef uniq, std::size_t &i,
                             std::size_t init, std::size_t end) {
-  for (i = init; i < end && (uniq[i] < 'A' || uniq[i] > 'Z'); ++i) {
+  // Treat 'X' as part of the name rather than as a mangling tag: it can
+  // appear after CompilerGeneratedNamesConversionPass replaces special
+  // symbols in mangled names.
+  for (i = init; i < end && (uniq[i] < 'A' || uniq[i] > 'Z' || uniq[i] == 'X');
+       ++i) {
     // do nothing
   }
   return uniq.substr(init, i - init).str();
@@ -348,7 +353,7 @@ mangleTypeDescriptorKinds(llvm::ArrayRef<std::int64_t> kinds) {
     return "";
   std::string result;
   for (std::int64_t kind : kinds)
-    result += "." + std::to_string(kind);
+    result += (fir::kNameSeparator + std::to_string(kind)).str();
   return result;
 }
 
@@ -373,12 +378,18 @@ static std::string getDerivedTypeObjectName(llvm::StringRef mangledTypeName,
 
 std::string
 fir::NameUniquer::getTypeDescriptorName(llvm::StringRef mangledTypeName) {
-  return getDerivedTypeObjectName(mangledTypeName, typeDescriptorSeparator);
+  return getDerivedTypeObjectName(mangledTypeName,
+                                  fir::kTypeDescriptorSeparator);
+}
+
+std::string fir::NameUniquer::getTypeDescriptorAssemblyName(
+    llvm::StringRef mangledTypeName) {
+  return replaceSpecialSymbols(getTypeDescriptorName(mangledTypeName));
 }
 
 std::string fir::NameUniquer::getTypeDescriptorBindingTableName(
     llvm::StringRef mangledTypeName) {
-  return getDerivedTypeObjectName(mangledTypeName, bindingTableSeparator);
+  return getDerivedTypeObjectName(mangledTypeName, fir::kBindingTableSeparator);
 }
 
 std::string
@@ -386,13 +397,17 @@ fir::NameUniquer::getComponentInitName(llvm::StringRef mangledTypeName,
                                        llvm::StringRef componentName) {
 
   std::string prefix =
-      getDerivedTypeObjectName(mangledTypeName, componentInitSeparator);
-  return prefix + "." + componentName.str();
+      getDerivedTypeObjectName(mangledTypeName, fir::kComponentInitSeparator);
+  return (prefix + fir::kNameSeparator + componentName).str();
 }
 
 llvm::StringRef
 fir::NameUniquer::dropTypeConversionMarkers(llvm::StringRef mangledTypeName) {
-  if (mangledTypeName.ends_with(boxprocSuffix))
-    return mangledTypeName.drop_back(boxprocSuffix.size());
+  if (mangledTypeName.ends_with(fir::boxprocSuffix))
+    return mangledTypeName.drop_back(fir::boxprocSuffix.size());
   return mangledTypeName;
 }
+
+std::string fir::NameUniquer::replaceSpecialSymbols(const std::string &name) {
+  return std::regex_replace(name, std::regex{"\\."}, "X");
+}
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index 3869633bd98e02..bf0a8d14d95df6 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -6,6 +6,7 @@ add_flang_library(FIRTransforms
   AnnotateConstant.cpp
   AssumedRankOpConversion.cpp
   CharacterConversion.cpp
+  CompilerGeneratedNames.cpp
   ConstantArgumentGlobalisation.cpp
   ControlFlowConverter.cpp
   CufOpConversion.cpp
diff --git a/flang/lib/Optimizer/Transforms/CompilerGeneratedNames.cpp b/flang/lib/Optimizer/Transforms/CompilerGeneratedNames.cpp
new file mode 100644
index 00000000000000..7f2cc41275e593
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/CompilerGeneratedNames.cpp
@@ -0,0 +1,80 @@
+//=== CompilerGeneratedNames.cpp - convert special symbols in global names ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/FIRDialect.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIROpsSupport.h"
+#include "flang/Optimizer/Support/InternalNames.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/IR/Attributes.h"
+#include "mlir/IR/SymbolTable.h"
+#include "mlir/Pass/Pass.h"
+
+namespace fir {
+#define GEN_PASS_DEF_COMPILERGENERATEDNAMESCONVERSION
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+using namespace mlir;
+
+namespace {
+
+class CompilerGeneratedNamesConversionPass
+    : public fir::impl::CompilerGeneratedNamesConversionBase<
+          CompilerGeneratedNamesConversionPass> {
+public:
+  using CompilerGeneratedNamesConversionBase<
+      CompilerGeneratedNamesConversionPass>::
+      CompilerGeneratedNamesConversionBase;
+
+  mlir::ModuleOp getModule() { return getOperation(); }
+  void runOnOperation() override;
+};
+} // namespace
+
+void CompilerGeneratedNamesConversionPass::runOnOperation() {
+  auto op = getOperation();
+  auto *context = &getContext();
+
+  llvm::DenseMap<mlir::StringAttr, mlir::FlatSymbolRefAttr> remappings;
+  for (auto &funcOrGlobal : op->getRegion(0).front()) {
+    if (llvm::isa<mlir::func::FuncOp>(funcOrGlobal) ||
+        llvm::isa<fir::GlobalOp>(funcOrGlobal)) {
+      auto symName = funcOrGlobal.getAttrOfType<mlir::StringAttr>(
+          mlir::SymbolTable::getSymbolAttrName());
+      auto deconstructedName = fir::NameUniquer::deconstruct(symName);
+      if (deconstructedName.first != fir::NameUniquer::NameKind::NOT_UNIQUED &&
+          !fir::NameUniquer::isExternalFacingUniquedName(deconstructedName)) {
+        std::string newName =
+            fir::NameUniquer::replaceSpecialSymbols(symName.getValue().str());
+        if (newName != symName) {
+          auto newAttr = mlir::StringAttr::get(context, newName);
+          mlir::SymbolTable::setSymbolName(&funcOrGlobal, newAttr);
+          auto newSymRef = mlir::FlatSymbolRefAttr::get(newAttr);
+          remappings.try_emplace(symName, newSymRef);
+        }
+      }
+    }
+  }
+
+  if (remappings.empty())
+    return;
+
+  // Update all uses of the functions and globals that have been renamed.
+  op.walk([&remappings](mlir::Operation *nestedOp) {
+    llvm::SmallVector<std::pair<mlir::StringAttr, mlir::SymbolRefAttr>> updates;
+    for (const mlir::NamedAttribute &attr : nestedOp->getAttrDictionary())
+      if (auto symRef = llvm::dyn_cast<mlir::SymbolRefAttr>(attr.getValue()))
+        if (auto remap = remappings.find(symRef.getRootReference());
+            remap != remappings.end())
+          updates.emplace_back(std::pair<mlir::StringAttr, mlir::SymbolRefAttr>{
+              attr.getName(), mlir::SymbolRefAttr(remap->second)});
+    for (auto update : updates)
+      nestedOp->setAttr(update.first, update.second);
+  });
+}
diff --git a/flang/lib/Semantics/runtime-type-info.cpp b/flang/lib/Semantics/runtime-type-info.cpp
index 66909241966735..9f3eb5fbe11a15 100644
--- a/flang/lib/Semantics/runtime-type-info.cpp
+++ b/flang/lib/Semantics/runtime-type-info.cpp
@@ -12,6 +12,7 @@
 #include "flang/Evaluate/fold.h"
 #include "flang/Evaluate/tools.h"
 #include "flang/Evaluate/type.h"
+#include "flang/Optimizer/Support/InternalNames.h"
 #include "flang/Semantics/scope.h"
 #include "flang/Semantics/tools.h"
 #include <functional>
@@ -377,9 +378,12 @@ static std::optional<std::string> GetSuffixIfTypeKindParameters(
           if (pv->GetExplicit()) {
             if (auto instantiatedValue{evaluate::ToInt64(*pv->GetExplicit())}) {
               if (suffix.has_value()) {
-                *suffix += "."s + std::to_string(*instantiatedValue);
+                *suffix +=
+                    (fir::kNameSeparator + llvm::Twine(*instantiatedValue))
+                        .str();
               } else {
-                suffix = "."s + std::to_string(*instantiatedValue);
+                suffix = (fir::kNameSeparator + llvm::Twine(*instantiatedValue))
+                             .str();
               }
             }
           }
@@ -448,7 +452,7 @@ const Symbol *RuntimeTableBuilder::DescribeType(Scope &dtScope) {
   } else if (isPDTDefinitionWithKindParameters) {
     return nullptr;
   }
-  std::string dtDescName{".dt."s + distinctName};
+  std::string dtDescName{(fir::kTypeDescriptorSeparator + distinctName).str()};
   Scope *dtSymbolScope{const_cast<Scope *>(dtSymbol->scope())};
   Scope &scope{
       GetContainingNonDerivedScope(dtSymbolScope ? *dtSymbolScope : dtScope)};
@@ -518,11 +522,13 @@ const Symbol *RuntimeTableBuilder::DescribeType(Scope &dtScope) {
     }
   }
   AddValue(dtValues, derivedTypeSchema_, "kindparameter"s,
-      SaveNumericPointerTarget<Int8>(
-          scope, SaveObjectName(".kp."s + distinctName), std::move(kinds)));
+      SaveNumericPointerTarget<Int8>(scope,
+          SaveObjectName((fir::kKindParameterSeparator + distinctName).str()),
+          std::move(kinds)));
   AddValue(dtValues, derivedTypeSchema_, "lenparameterkind"s,
-      SaveNumericPointerTarget<Int1>(
-          scope, SaveObjectName(".lpk."s + distinctName), std::move(lenKinds)));
+      SaveNumericPointerTarget<Int1>(scope,
+          SaveObjectName((fir::kLenKindSeparator + distinctName).str()),
+          std::move(lenKinds)));
   // Traverse the components of the derived type
   if (!isPDTDefinitionWithKindParameters) {
     std::vector<const Symbol *> dataComponentSymbols;
@@ -570,13 +576,15 @@ const Symbol *RuntimeTableBuilder::DescribeType(Scope &dtScope) {
               dtScope, distinctName, parameters));
     }
     AddValue(dtValues, derivedTypeSchema_, "component"s,
-        SaveDerivedPointerTarget(scope, SaveObjectName(".c."s + distinctName),
+        SaveDerivedPointerTarget(scope,
+            SaveObjectName((fir::kComponentSeparator + distinctName).str()),
             std::move(dataComponents),
             evaluate::ConstantSubscripts{
                 static_cast<evaluate::ConstantSubscript>(
                     dataComponents.size())}));
     AddValue(dtValues, derivedTypeSchema_, "procptr"s,
-        SaveDerivedPointerTarget(scope, SaveObjectName(".p."s + distinctName),
+        SaveDerivedPointerTarget(scope,
+            SaveObjectName((fir::kProcPtrSeparator + distinctName).str()),
             std::move(procPtrComponents),
             evaluate::ConstantSubscripts{
                 static_cast<evaluate::ConstantSubscript>(
@@ -587,7 +595,9 @@ const Symbol *RuntimeTableBuilder::DescribeType(Scope &dtScope) {
       std::vector<evaluate::StructureConstructor> bindings{
           DescribeBindings(dtScope, scope)};
       AddValue(dtValues, derivedTypeSchema_, bindingDescCompName,
-          SaveDerivedPointerTarget(scope, SaveObjectName(".v."s + distinctName),
+          SaveDerivedPointerTarget(scope,
+              SaveObjectName(
+                  (fir::kBindingTableSeparator + distinctName).str()),
               std::move(bindings),
               evaluate::ConstantSubscripts{
                   static_cast<evaluate::ConstantSubscript>(bindings.size())}));
@@ -623,7 +633,9 @@ const Symbol *RuntimeTableBuilder::DescribeType(Scope &dtScope) {
         sortedSpecials.emplace_back(std::move(pair.second));
       }
       AddValue(dtValues, derivedTypeSchema_, "special"s,
-          SaveDerivedPointerTarget(scope, SaveObjectName(".s."s + distinctName),
+          SaveDerivedPointerTarget(scope,
+              SaveObjectName(
+                  (fir::kSpecialBindingSeparator + distinctName).str()),
               std::move(sortedSpecials),
               evaluate::ConstantSubscripts{
                   static_cast<evaluate::ConstantSubscript>(specials.size())}));
@@ -730,10 +742,12 @@ SomeExpr RuntimeTableBuilder::SaveNameAsPointerTarget(
   using evaluate::Ascii;
   using AsciiExpr = evaluate::Expr<Ascii>;
   object.set_init(evaluate::AsGenericExpr(AsciiExpr{name}));
-  Symbol &symbol{*scope
-                      .try_emplace(SaveObjectName(".n."s + name),
-                          Attrs{Attr::TARGET, Attr::SAVE}, std::move(object))
-                      .first->second};
+  Symbol &symbol{
+      *scope
+           .try_emplace(
+               SaveObjectName((fir::kNameStringSeparator + name).str()),
+               Attrs{Attr::TARGET, Attr::SAVE}, std::move(object))
+           .first->second};
   SetReadOnlyCompilerCreatedFlags(symbol);
   return evaluate::AsGenericExpr(
       AsciiExpr{evaluate::Designator<Ascii>{symbol}});
@@ -821,8 +835,9 @@ evaluate::StructureConstructor RuntimeTableBuilder::DescribeComponent(
   if (!lenParams.empty()) {
     AddValue(values, componentSchema_, "lenvalue"s,
         SaveDerivedPointerTarget(scope,
-            SaveObjectName(
-                ".lv."s + distinctName + "."s + symbol.name().ToString()),
+            SaveObjectName((fir::kLenParameterSeparator + distinctName +
+                fir::kNameSeparator + symbol.name().ToString())
+                               .str()),
             std::move(lenParams),
             evaluate::ConstantSubscripts{
                 static_cast<evaluate::ConstantSubscript>(lenParams.size())}));
@@ -845,8 +860,9 @@ evaluate::StructureConstructor RuntimeTableBuilder::DescribeComponent(
     }
     AddValue(values, componentSchema_, "bounds"s,
         SaveDerivedPointerTarget(scope,
-            SaveObjectName(
-                ".b."s + distinctName + "."s + symbol.name().ToString()),
+            SaveObjectName((fir::kBoundsSeparator + distinctName +
+                fir::kNameSeparator + symbol.name().ToString())
+                               .str()),
             std::move(bounds), evaluate::ConstantSubscripts{2, rank}));
   } else {
     AddValue(
@@ -868,8 +884,9 @@ evaluate::StructureConstructor RuntimeTableBuilder::DescribeComponent(
     if (hasDataInit) {
       AddValue(values, componentSchema_, "initialization"s,
           SaveObjectInit(scope,
-              SaveObjectName(
-                  ".di."s + distinctName + "."s + symbol.name().ToString()),
+              SaveObjectName((fir::kComponentInitSeparator + distinctName +
+                  fir::kNameSeparator + symbol.name().ToString())
+                                 .str()),
               object));
     }
   }
@@ -918,8 +935,9 @@ bool RuntimeTableBuilder::InitializeDataPointer(
     const ObjectEntityDetails &object, Scope &scope, Scope &dtScope,
     const std::string &distinctName) {
   if (object.init().has_value()) {
-    SourceName ptrDtName{SaveObjectName(
-        ".dp."s + distinctName + "."s + symbol.name().ToString())};
+    SourceName ptrDtName{SaveObjectName((fir::kDataPtrInitSeparator +
+        distinctName + fir::kNameSeparator + symbol.name().ToString())
+                                            .str())};
     Symbol &ptrDtSym{
         *scope.try_emplace(ptrDtName, Attrs{}, UnknownDetails{}).first->second};
     SetReadOnlyCompilerCreatedFlags(ptrDtSym);
@@ -952,8 +970,9 @@ bool RuntimeTableBuilder::InitializeDataPointer(
         Structure(ptrDtDeclType, std::move(ptrInitValues))));
     AddValue(values, componentSchema_, "initialization"s,
         SaveObjectInit(scope,
-            SaveObjectName(
-                ".di."s + distinctName + "."s + symbol.name().ToString()),
+            SaveObjectName((fir::kComponentInitSeparator + distinctName +
+                fir::kNameSeparator + symbol.name().ToString())
+                               .str()),
             ptrInitObj));
     return true;
   } else {
diff --git a/flang/test/Driver/mlir-debug-pass-pipeline.f90 b/flang/test/Driver/mlir-debug-pass-pipeline.f90
index 6e9846fa422e55..a6316ee7c83123 100644
--- a/flang/test/Driver/mlir-debug-pass-pipeline.f90
+++ b/flang/test/Driver/mlir-debug-pass-pipeline.f90
@@ -109,6 +109,7 @@
 ! ALL-NEXT: CodeGenRewrite
 ! ALL-NEXT:   (S) 0 num-dce'd - Number of operations eliminated
 ! ALL-NEXT: TargetRewrite
+! ALL-NEXT: CompilerGeneratedNamesConversion
 ! ALL-NEXT: ExternalNameConversion
 ! DEBUG-NEXT: AddDebugInfo
 ! NO-DEBUG-NOT: AddDebugInfo
diff --git a/flang/test/Driver/mlir-pass-pipeline.f90 b/flang/test/Driver/mlir-pass-pipeline.f90
index db4551e93fe64c..2f35f928e99cfc 100644
--- a/flang/test/Driver/mlir-pass-pipeline.f90
+++ b/flang/test/Driver/mlir-pass-pipeline.f90
@@ -118,6 +118,7 @@
 ! ALL-NEXT: CodeGenRewrite
 ! ALL-NEXT:   (S) 0 num-dce'd - Number of operations eliminated
 ! ALL-NEXT: TargetRewrite
+! ALL-NEXT: CompilerGeneratedNamesConversion
 ! ALL-NEXT: ExternalNameConversion
 ! ALL-NEXT: FIRToLLVMLowering
 ! ALL-NOT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index dda4f32872fef5..bca454c13ff9cc 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -118,6 +118,7 @@ func.func @_QQmain() {
 // PASSES-NEXT: CodeGenRewrite
 // PASSES-NEXT:   (S) 0 num-dce'd - Number of operations eliminated
 // PASSES-NEXT: TargetRewrite
+// PASSES-NEXT: CompilerGeneratedNamesConversion
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/convert-to-llvm.fir b/flang/test/Fir/convert-to-llvm.fir
index 194a11456f2569..a4e8170af036c9 100644
--- a/flang/test/Fir/convert-to-llvm.fir
+++ b/flang/test/Fir/convert-to-llvm.fir
@@ -1,7 +1,7 @@
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=i386-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
-// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=powerpc64le-unknown-linux-gn" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
+// RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=powerpc64le-unknown-linux-gnu" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=x86_64-pc-win32" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT,GENERIC
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=aarch64-apple-darwin" %s | FileCheck %s --check-prefixes=CHECK,CHECK-NO-COMDAT,GENERIC 
 // RUN: fir-opt --split-input-file --fir-to-llvm-ir="target=amdgcn-amd-amdhsa, datalayout=e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-P0" %s | FileCheck -check-prefixes=CHECK,AMDGPU %s
diff --git a/flang/test/Fir/convert-type-desc-to-llvm.fir b/flang/test/Fir/convert-type-desc-to-llvm.fir
new file mode 100644
index 00000000000000..251c95d9c84216
--- /dev/null
+++ b/flang/test/Fir/convert-type-desc-to-llvm.fir
@@ -0,0 +1,29 @@
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=x86_64-unknown-linux-gnu type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=aarch64-unknown-linux-gnu type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=i386-unknown-linux-gnu type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=powerpc64le-unknown-linux-gnu type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=x86_64-pc-win32 type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=aarch64-apple-darwin type-descriptors-renamed-for-assembly=true" %s | FileCheck %s --check-prefixes=CHECK,CHECK-NO-COMDAT
+// RUN: fir-opt --split-input-file --compiler-generated-names --fir-to-llvm-ir="target=amdgcn-amd-amdhsa type-descriptors-renamed-for-assembly=1 datalayout=e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-P0" %s | FileCheck -check-prefixes=CHECK %s
+
+// Check descriptor for a derived type. Check that the type descriptor global
+// is renamed by --compiler-generated-names ('.' becomes 'X') and that the box
+// created by fir.embox points at the renamed global.
+
+fir.global linkonce @_QMtest_dinitE.dt.tseq constant : i8
+
+func.func @embox1(%arg0: !fir.ref<!fir.type<_QMtest_dinitTtseq{i:i32}>>) {
+  %0 = fir.embox %arg0() : (!fir.ref<!fir.type<_QMtest_dinitTtseq{i:i32}>>) -> !fir.box<!fir.type<_QMtest_dinitTtseq{i:i32}>>
+  return
+}
+
+// CHECK-COMDAT: llvm.mlir.global linkonce constant @_QMtest_dinitEXdtXtseq() comdat(@__llvm_comdat::@_QMtest_dinitEXdtXtseq) {addr_space = 0 : i32} : i8
+// CHECK-NO-COMDAT: llvm.mlir.global linkonce constant @_QMtest_dinitEXdtXtseq() {addr_space = 0 : i32} : i8
+// CHECK-LABEL: llvm.func @embox1
+// CHECK:         %[[TYPE_CODE:.*]] = llvm.mlir.constant(42 : i32) : i32
+// CHECK:         %[[VERSION:.*]] = llvm.mlir.constant(20240719 : i32) : i32
+// CHECK:         %{{.*}} = llvm.insertvalue %[[VERSION]], %{{.*}}[2] : !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, ptr, array<1 x i64>)> 
+// CHECK:         %[[TYPE_CODE_I8:.*]] = llvm.trunc %[[TYPE_CODE]] : i32 to i8
+// CHECK:         %{{.*}} = llvm.insertvalue %[[TYPE_CODE_I8]], %{{.*}}[4] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
+// CHECK:         %[[TDESC:.*]] = llvm.mlir.addressof @_QMtest_dinitEXdtXtseq : !llvm.ptr
+// CHECK:         %{{.*}} = llvm.insertvalue %[[TDESC]], %{{.*}}[7] : !llvm.struct<(ptr, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, i{{.*}}, ptr, array<1 x i{{.*}}>)>
diff --git a/flang/test/Fir/polymorphic.fir b/flang/test/Fir/polymorphic.fir
index a6b166367a4a1b..40204314e8df79 100644
--- a/flang/test/Fir/polymorphic.fir
+++ b/flang/test/Fir/polymorphic.fir
@@ -157,7 +157,7 @@ func.func @_QQmain() {
 // CHECK-LABEL: define void @_QQmain(){{.*}}{
 // CHECK: %[[CLASS_NONE:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }
 // CHECK: %[[DESC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-// CHECK: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr @_QMmod1Ea, i64 ptrtoint (ptr getelementptr (%_QMmod1TtK2, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 1, i8 1, ptr @_QMmod1E.dt.t.2, [1 x i64] zeroinitializer }, ptr %[[CLASS_NONE]], align 8
+// CHECK: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr @_QMmod1Ea, i64 ptrtoint (ptr getelementptr (%_QMmod1TtK2, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 1, i8 1, ptr @_QMmod1EXdtXtX2, [1 x i64] zeroinitializer }, ptr %[[CLASS_NONE]], align 8
 // CHECK: %[[LOAD:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %[[CLASS_NONE]]
 // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } %[[LOAD]], ptr %[[DESC]]
 // CHECK: call void @_QMmod1Psub1(ptr %[[DESC]])
@@ -197,4 +197,4 @@ func.func @_QQembox_input_type(%arg0 : !fir.ref<!fir.type<_QMmod1Tp2{v:!fir.arra
 }
 
 // CHECK-LABEL: define void @_QQembox_input_type
-// CHECK: %{{.*}} = insertvalue { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr undef, i64 ptrtoint (ptr getelementptr (%_QMmod1Tp2, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 0, i8 1, ptr @_QMmod1E.dt.p2, [1 x i64] zeroinitializer }, ptr %{{.*}}, 0
+// CHECK: %{{.*}} = insertvalue { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr undef, i64 ptrtoint (ptr getelementptr (%_QMmod1Tp2, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 0, i8 1, ptr @_QMmod1EXdtXp2, [1 x i64] zeroinitializer }, ptr %{{.*}}, 0
diff --git a/flang/test/Fir/type-descriptor.fir b/flang/test/Fir/type-descriptor.fir
index f0ebd8ddeee1e9..3b58a2f68251a7 100644
--- a/flang/test/Fir/type-descriptor.fir
+++ b/flang/test/Fir/type-descriptor.fir
@@ -14,7 +14,7 @@ fir.global internal @_QFfooEx : !fir.box<!fir.heap<!sometype>> {
 }
 // CHECK: @_QFfooEx = internal global { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }
 // CHECK-SAME: { ptr null, i64 ptrtoint (ptr getelementptr (%_QFfooTsometype, ptr null, i32 1) to i64),
-// CHECK-SAME: i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QFfooE.dt.sometype, [1 x i64] zeroinitializer }
+// CHECK-SAME: i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QFfooEXdtXsometype, [1 x i64] zeroinitializer }
 
 !some_pdt_type = !fir.type<_QFfooTsome_pdt_typeK42K43{num:i32,values:!fir.box<!fir.ptr<!fir.array<?x?xf32>>>}>
 fir.global internal @_QFfooE.dt.some_pdt_type.42.43 constant : i8
@@ -26,4 +26,4 @@ fir.global internal @_QFfooEx2 : !fir.box<!fir.heap<!some_pdt_type>> {
 }
 // CHECK: @_QFfooEx2 = internal global { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }
 // CHECK-SAME: { ptr null, i64 ptrtoint (ptr getelementptr (%_QFfooTsome_pdt_typeK42K43, ptr null, i32 1) to i64),
-// CHECK-SAME: i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QFfooE.dt.some_pdt_type.42.43, [1 x i64] zeroinitializer }
+// CHECK-SAME: i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QFfooEXdtXsome_pdt_typeX42X43, [1 x i64] zeroinitializer }
diff --git a/flang/test/Lower/allocatable-polymorphic.f90 b/flang/test/Lower/allocatable-polymorphic.f90
index 8fe06450d6119e..e23e38ffb4b013 100644
--- a/flang/test/Lower/allocatable-polymorphic.f90
+++ b/flang/test/Lower/allocatable-polymorphic.f90
@@ -591,16 +591,16 @@ program test_alloc
 
 ! LLVM-LABEL: define void @_QMpolyPtest_allocatable()
 
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyE.dt.p1, i32 0, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyEXdtXp1, i32 0, i32 0)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %{{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyE.dt.p1, i32 0, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyEXdtXp1, i32 0, i32 0)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %{{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyE.dt.p2, i32 0, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyEXdtXp2, i32 0, i32 0)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %{{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyE.dt.p1, i32 1, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyEXdtXp1, i32 1, i32 0)
 ! LLVM: %{{.*}} = call {} @_FortranAAllocatableSetBounds(ptr %{{.*}}, i32 0, i64 1, i64 10)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %{{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyE.dt.p2, i32 1, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %{{.*}}, ptr @_QMpolyEXdtXp2, i32 1, i32 0)
 ! LLVM: %{{.*}} = call {} @_FortranAAllocatableSetBounds(ptr %{{.*}}, i32 0, i64 1, i64 20)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %{{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
 ! LLVM-COUNT-2:  call void %{{.*}}()
@@ -685,9 +685,9 @@ program test_alloc
 ! allocatable.
 
 ! LLVM-LABEL: define void @_QMpolyPtest_deallocate()
-! LLVM: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr null, i64 ptrtoint (ptr getelementptr (%_QMpolyTp1, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QMpolyE.dt.p1, [1 x i64] zeroinitializer }, ptr %[[ALLOCA1:[0-9]*]]
+! LLVM: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } { ptr null, i64 ptrtoint (ptr getelementptr (%_QMpolyTp1, ptr null, i32 1) to i64), i32 20240719, i8 0, i8 42, i8 2, i8 1, ptr @_QMpolyEXdtXp1, [1 x i64] zeroinitializer }, ptr %[[ALLOCA1:[0-9]*]]
 ! LLVM: %[[LOAD:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %[[ALLOCA1]]
 ! LLVM: store { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] } %[[LOAD]], ptr %[[ALLOCA2:[0-9]*]]
-! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %[[ALLOCA2]], ptr @_QMpolyE.dt.p1, i32 0, i32 0)
+! LLVM: %{{.*}} = call {} @_FortranAAllocatableInitDerivedForAllocate(ptr %[[ALLOCA2]], ptr @_QMpolyEXdtXp1, i32 0, i32 0)
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableAllocate(ptr %[[ALLOCA2]], i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
 ! LLVM: %{{.*}} = call i32 @_FortranAAllocatableDeallocatePolymorphic(ptr %[[ALLOCA2]], ptr {{.*}}, i1 false, ptr null, ptr @_QQclX{{.*}}, i32 {{.*}})
diff --git a/flang/test/Lower/dense-array-any-rank.f90 b/flang/test/Lower/dense-array-any-rank.f90
index 437fdec2da10ec..129adf41de07ff 100644
--- a/flang/test/Lower/dense-array-any-rank.f90
+++ b/flang/test/Lower/dense-array-any-rank.f90
@@ -14,12 +14,12 @@ subroutine test()
 
 ! a1 array constructor
 ! CHECK-FIR: fir.global internal @_QQro.10xi4.{{.*}}(dense<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]> : tensor<10xi32>) constant : !fir.array<10xi32>
-! CHECK-LLVMIR: @_QQro.10xi4.0 = internal constant [10 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10]
+! CHECK-LLVMIR: @_QQroX10xi4X0 = internal constant [10 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10]
 
 ! a2 array constructor
 ! CHECK-FIR: fir.global internal @_QQro.3x4xi4.{{.*}}(dense<{{\[\[11, 12, 13], \[21, 22, 23], \[31, 32, 33], \[41, 42, 43]]}}> : tensor<4x3xi32>) constant : !fir.array<3x4xi32>
-! CHECK-LLVMIR: @_QQro.3x4xi4.1 = internal constant [4 x [3 x i32]] {{\[\[3 x i32] \[i32 11, i32 12, i32 13], \[3 x i32] \[i32 21, i32 22, i32 23], \[3 x i32] \[i32 31, i32 32, i32 33], \[3 x i32] \[i32 41, i32 42, i32 43]]}}
+! CHECK-LLVMIR: @_QQroX3x4xi4X1 = internal constant [4 x [3 x i32]] {{\[\[3 x i32] \[i32 11, i32 12, i32 13], \[3 x i32] \[i32 21, i32 22, i32 23], \[3 x i32] \[i32 31, i32 32, i32 33], \[3 x i32] \[i32 41, i32 42, i32 43]]}}
 
 ! a3 array constructor
 ! CHECK-FIR: fir.global internal @_QQro.2x3x4xi4.{{.*}}(dense<{{\[\[\[111, 112], \[121, 122], \[131, 132]], \[\[211, 212], \[221, 222], \[231, 232]], \[\[311, 312], \[321, 322], \[331, 332]], \[\[411, 412], \[421, 422], \[431, 432]]]}}> : tensor<4x3x2xi32>) constant : !fir.array<2x3x4xi32>
-! CHECK-LLVMIR: @_QQro.2x3x4xi4.2 = internal constant [4 x [3 x [2 x i32]]] {{\[\[3 x \[2 x i32]] \[\[2 x i32] \[i32 111, i32 112], \[2 x i32] \[i32 121, i32 122], \[2 x i32] \[i32 131, i32 132]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 211, i32 212], \[2 x i32] \[i32 221, i32 222], \[2 x i32] \[i32 231, i32 232]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 311, i32 312], \[2 x i32] \[i32 321, i32 322], \[2 x i32] \[i32 331, i32 332]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 411, i32 412], \[2 x i32] \[i32 421, i32 422], \[2 x i32] \[i32 431, i32 432]]]}}
+! CHECK-LLVMIR: @_QQroX2x3x4xi4X2 = internal constant [4 x [3 x [2 x i32]]] {{\[\[3 x \[2 x i32]] \[\[2 x i32] \[i32 111, i32 112], \[2 x i32] \[i32 121, i32 122], \[2 x i32] \[i32 131, i32 132]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 211, i32 212], \[2 x i32] \[i32 221, i32 222], \[2 x i32] \[i32 231, i32 232]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 311, i32 312], \[2 x i32] \[i32 321, i32 322], \[2 x i32] \[i32 331, i32 332]], \[3 x \[2 x i32]] \[\[2 x i32] \[i32 411, i32 412], \[2 x i32] \[i32 421, i32 422], \[2 x i32] \[i32 431, i32 432]]]}}

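Every updated CHECK line in the flang tests above follows one pattern: each `.` in a compiler-generated global name (derived-type descriptors such as `.dt.`, read-only array constants such as `_QQro.`) appears as `X` in the emitted LLVM IR. Meanwhile, the FIR-level `fir.global` names in `dense-array-any-rank.f90` still contain dots. The sketch below reproduces only that observed mapping. `ReplaceDotsWithX` is a hypothetical helper, and the conversion flang actually performs is not part of this excerpt.

```cpp
#include <algorithm>
#include <string>

// Hypothetical illustration only: reproduce the renaming visible in the
// updated CHECK lines by replacing every '.' in a symbol name with 'X'.
// This is not flang's implementation.
std::string ReplaceDotsWithX(std::string name) {
  std::replace(name.begin(), name.end(), '.', 'X');
  return name;
}
```

For example, `_QFfooE.dt.some_pdt_type.42.43` maps to `_QFfooEXdtXsome_pdt_typeX42X43`, which matches the new CHECK-SAME line in `type-descriptor.fir`.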
From 30ca06c4d0d06f67f10a9e19d4333acc2074811b Mon Sep 17 00:00:00 2001
From: John Harrison <harjohn at google.com>
Date: Wed, 21 Aug 2024 13:48:29 -0700
Subject: [PATCH 049/116] [lldb-dap] When sending a DAP Output Event break each
 message into separate lines. (#105456)

Previously, when lldb (or the debuggee process) produced output such as
`"hello\nworld\n"`, the whole message was sent as a single Output event.
As a result, VS Code treated it as one message when displaying and
filtering output in the Debug Console.

With these changes, each line is sent as its own event, including its
trailing newline if present, so VS Code represents each line of output
from lldb-dap as an individual output message. The read buffers in
OutputRedirector.cpp and SendStdOutStdErr now share a single
OutputBufferSize constant (4096 bytes).

Resolves #105444
---
 .../test/tools/lldb-dap/lldbdap_testcase.py   |  5 +++
 lldb/test/API/tools/lldb-dap/output/Makefile  |  3 ++
 .../tools/lldb-dap/output/TestDAP_output.py   | 31 +++++++++++++++++++
 lldb/test/API/tools/lldb-dap/output/main.c    | 12 +++++++
 lldb/tools/lldb-dap/DAP.cpp                   | 22 +++++++++----
 lldb/tools/lldb-dap/DAP.h                     |  4 +++
 lldb/tools/lldb-dap/OutputRedirector.cpp      |  3 +-
 lldb/tools/lldb-dap/lldb-dap.cpp              |  2 +-
 8 files changed, 74 insertions(+), 8 deletions(-)
 create mode 100644 lldb/test/API/tools/lldb-dap/output/Makefile
 create mode 100644 lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
 create mode 100644 lldb/test/API/tools/lldb-dap/output/main.c

diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index 27545816f20707..86eba355da83db 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -202,6 +202,11 @@ def collect_console(self, timeout_secs, pattern=None):
             "console", timeout_secs=timeout_secs, pattern=pattern
         )
 
+    def collect_stdout(self, timeout_secs, pattern=None):
+        return self.dap_server.collect_output(
+            "stdout", timeout_secs=timeout_secs, pattern=pattern
+        )
+
     def get_local_as_int(self, name, threadId=None):
         value = self.dap_server.get_local_variable_value(name, threadId=threadId)
         # 'value' may have the variable value and summary.
diff --git a/lldb/test/API/tools/lldb-dap/output/Makefile b/lldb/test/API/tools/lldb-dap/output/Makefile
new file mode 100644
index 00000000000000..10495940055b63
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/output/Makefile
@@ -0,0 +1,3 @@
+C_SOURCES := main.c
+
+include Makefile.rules
diff --git a/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
new file mode 100644
index 00000000000000..0d40ce993dc31c
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
@@ -0,0 +1,31 @@
+"""
+Test lldb-dap output events
+"""
+
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+import lldbdap_testcase
+
+
+class TestDAP_output(lldbdap_testcase.DAPTestCaseBase):
+    def test_output(self):
+        program = self.getBuildArtifact("a.out")
+        self.build_and_launch(program)
+        source = "main.c"
+        lines = [line_number(source, "// breakpoint 1")]
+        breakpoint_ids = self.set_source_breakpoints(source, lines)
+        self.continue_to_breakpoints(breakpoint_ids)
+
+        # Ensure partial messages are still sent.
+        output = self.collect_stdout(timeout_secs=1.0, pattern="abcdef")
+        self.assertTrue(output and len(output) > 0, "expect program output")
+
+        self.continue_to_exit()
+
+        output += self.get_stdout(timeout=lldbdap_testcase.DAPTestCaseBase.timeoutval)
+        self.assertTrue(output and len(output) > 0, "expect program output")
+        self.assertIn(
+            "abcdefghi\r\nhello world\r\n",
+            output,
+            'full output not found in: ' + output,
+        )
diff --git a/lldb/test/API/tools/lldb-dap/output/main.c b/lldb/test/API/tools/lldb-dap/output/main.c
new file mode 100644
index 00000000000000..0cfcf604aa68f7
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/output/main.c
@@ -0,0 +1,12 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+int main() {
+  // Ensure multiple partial lines are detected and sent.
+  printf("abc");
+  printf("def");
+  printf("ghi\n");
+  printf("hello world\n"); // breakpoint 1
+  return 0;
+}
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index c3c70e9d739846..1fd560f21904ab 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -294,8 +294,6 @@ void DAP::SendOutput(OutputType o, const llvm::StringRef output) {
   if (output.empty())
     return;
 
-  llvm::json::Object event(CreateEventObject("output"));
-  llvm::json::Object body;
   const char *category = nullptr;
   switch (o) {
   case OutputType::Console:
@@ -311,10 +309,22 @@ void DAP::SendOutput(OutputType o, const llvm::StringRef output) {
     category = "telemetry";
     break;
   }
-  body.try_emplace("category", category);
-  EmplaceSafeString(body, "output", output.str());
-  event.try_emplace("body", std::move(body));
-  SendJSON(llvm::json::Value(std::move(event)));
+
+  // Send each line of output as an individual event, including the newline if
+  // present.
+  ::size_t idx = 0;
+  do {
+    ::size_t end = output.find('\n', idx);
+    if (end == llvm::StringRef::npos)
+      end = output.size() - 1;
+    llvm::json::Object event(CreateEventObject("output"));
+    llvm::json::Object body;
+    body.try_emplace("category", category);
+    EmplaceSafeString(body, "output", output.slice(idx, end + 1).str());
+    event.try_emplace("body", std::move(body));
+    SendJSON(llvm::json::Value(std::move(event)));
+    idx = end + 1;
+  } while (idx < output.size());
 }
 
 // interface ProgressStartEvent extends Event {
diff --git a/lldb/tools/lldb-dap/DAP.h b/lldb/tools/lldb-dap/DAP.h
index 7828272aa15a7d..27ea6c7ff8423f 100644
--- a/lldb/tools/lldb-dap/DAP.h
+++ b/lldb/tools/lldb-dap/DAP.h
@@ -68,8 +68,12 @@ namespace lldb_dap {
 
 typedef llvm::DenseMap<uint32_t, SourceBreakpoint> SourceBreakpointMap;
 typedef llvm::StringMap<FunctionBreakpoint> FunctionBreakpointMap;
+
 enum class OutputType { Console, Stdout, Stderr, Telemetry };
 
+/// Buffer size for handling output events.
+constexpr uint64_t OutputBufferSize = (1u << 12);
+
 enum DAPBroadcasterBits {
   eBroadcastBitStopEventThread = 1u << 0,
   eBroadcastBitStopProgressThread = 1u << 1
diff --git a/lldb/tools/lldb-dap/OutputRedirector.cpp b/lldb/tools/lldb-dap/OutputRedirector.cpp
index 4e6907ce6c7806..2c2f49569869b4 100644
--- a/lldb/tools/lldb-dap/OutputRedirector.cpp
+++ b/lldb/tools/lldb-dap/OutputRedirector.cpp
@@ -13,6 +13,7 @@
 #include <unistd.h>
 #endif
 
+#include "DAP.h"
 #include "OutputRedirector.h"
 #include "llvm/ADT/StringRef.h"
 
@@ -42,7 +43,7 @@ Error RedirectFd(int fd, std::function<void(llvm::StringRef)> callback) {
 
   int read_fd = new_fd[0];
   std::thread t([read_fd, callback]() {
-    char buffer[4096];
+    char buffer[OutputBufferSize];
     while (true) {
       ssize_t bytes_count = read(read_fd, &buffer, sizeof(buffer));
       if (bytes_count == 0)
diff --git a/lldb/tools/lldb-dap/lldb-dap.cpp b/lldb/tools/lldb-dap/lldb-dap.cpp
index b534a48660a5f8..7b83767d1afeab 100644
--- a/lldb/tools/lldb-dap/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/lldb-dap.cpp
@@ -399,7 +399,7 @@ void SendProcessEvent(LaunchMethod launch_method) {
 // Grab any STDOUT and STDERR from the process and send it up to VS Code
 // via an "output" event to the "stdout" and "stderr" categories.
 void SendStdOutStdErr(lldb::SBProcess &process) {
-  char buffer[1024];
+  char buffer[OutputBufferSize];
   size_t count;
   while ((count = process.GetSTDOUT(buffer, sizeof(buffer))) > 0)
     g_dap.SendOutput(OutputType::Stdout, llvm::StringRef(buffer, count));

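The per-line splitting that `DAP::SendOutput` now performs can be sketched outside lldb-dap as below. `SplitOutputLines` is a hypothetical helper, `std::string_view` stands in for `llvm::StringRef`, and each returned chunk corresponds to one "output" event body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper mirroring the loop added to DAP::SendOutput: split
// `output` into chunks that each end with '\n', except possibly the last
// one, which holds any trailing partial line.
std::vector<std::string> SplitOutputLines(std::string_view output) {
  std::vector<std::string> chunks;
  if (output.empty())
    return chunks; // SendOutput also returns early on empty output.

  [[maybe_unused]] std::size_t total = 0;
  std::size_t idx = 0;
  do {
    std::size_t end = output.find('\n', idx);
    if (end == std::string_view::npos)
      end = output.size() - 1; // No newline left: take the remainder.
    // Keep the newline itself (if any) in the chunk, like slice(idx, end + 1).
    chunks.emplace_back(output.substr(idx, end + 1 - idx));
    total += chunks.back().size();
    idx = end + 1;
  } while (idx < output.size());

  // Every input byte lands in exactly one chunk.
  assert(total == output.size());
  return chunks;
}
```

A partial line such as `"abc"` is returned as-is rather than held back until a newline arrives, which is the behavior the new `TestDAP_output.py` checks with its "Ensure partial messages are still sent" step.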
From 46c94bed5af48f3785c3370a9297ea29d7918cd5 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Wed, 21 Aug 2024 16:49:41 -0400
Subject: [PATCH 050/116] [libc++] Mark LWG3404 as implemented

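LWG3404 was implemented along with `subrange` in libc++ 13.0. The version
listed for the related LWG3281 is corrected from 15.0 to 13.0 as well.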

Closes #104282
---
 libcxx/docs/Status/Cxx20Issues.csv | 2 +-
 libcxx/docs/Status/Cxx23Issues.csv | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libcxx/docs/Status/Cxx20Issues.csv b/libcxx/docs/Status/Cxx20Issues.csv
index d72a3682420620..d6fdc813b1f0de 100644
--- a/libcxx/docs/Status/Cxx20Issues.csv
+++ b/libcxx/docs/Status/Cxx20Issues.csv
@@ -218,7 +218,7 @@
 "`LWG3269 <https://wg21.link/LWG3269>`__","Parse manipulators do not specify the result of the extraction from stream","2020-02 (Prague)","","","|chrono|"
 "`LWG3270 <https://wg21.link/LWG3270>`__","Parsing and formatting ``%j``\  with ``duration``\ s","2020-02 (Prague)","|Partial|","","|chrono| |format|"
 "`LWG3280 <https://wg21.link/LWG3280>`__","View converting constructors can cause constraint recursion and are unneeded","2020-02 (Prague)","|Complete|","15.0","|ranges|"
-"`LWG3281 <https://wg21.link/LWG3281>`__","Conversion from ``*pair-like*``\  types to ``subrange``\  is a silent semantic promotion","2020-02 (Prague)","|Complete|","15.0","|ranges|"
+"`LWG3281 <https://wg21.link/LWG3281>`__","Conversion from ``*pair-like*``\  types to ``subrange``\  is a silent semantic promotion","2020-02 (Prague)","|Complete|","13.0","|ranges|"
 "`LWG3282 <https://wg21.link/LWG3282>`__","``subrange``\  converting constructor should disallow derived to base conversions","2020-02 (Prague)","|Complete|","15.0","|ranges|"
 "`LWG3284 <https://wg21.link/LWG3284>`__","``random_access_iterator``\  semantic constraints accidentally promote difference type using unary negate","2020-02 (Prague)","|Nothing To Do|","","|ranges|"
 "`LWG3285 <https://wg21.link/LWG3285>`__","The type of a customization point object shall satisfy ``semiregular``\ ","2020-02 (Prague)","|Nothing To Do|","","|ranges|"
diff --git a/libcxx/docs/Status/Cxx23Issues.csv b/libcxx/docs/Status/Cxx23Issues.csv
index a0a9ccdca48c3c..8cb0a46b4dd25e 100644
--- a/libcxx/docs/Status/Cxx23Issues.csv
+++ b/libcxx/docs/Status/Cxx23Issues.csv
@@ -20,7 +20,7 @@
 "`LWG3171 <https://wg21.link/LWG3171>`__","LWG2989 breaks ``directory_entry`` stream insertion","2020-11 (Virtual)","|Complete|","14.0",""
 "`LWG3306 <https://wg21.link/LWG3306>`__","``ranges::advance`` violates its preconditions","2020-11 (Virtual)","|Complete|","14.0","|ranges|"
 "`LWG3403 <https://wg21.link/LWG3403>`__","Domain of ``ranges::ssize(E)`` doesn't ``match ranges::size(E)``","2020-11 (Virtual)","","","|ranges|"
-"`LWG3404 <https://wg21.link/LWG3404>`__","Finish removing subrange's conversions from pair-like","2020-11 (Virtual)","","","|ranges|"
+"`LWG3404 <https://wg21.link/LWG3404>`__","Finish removing subrange's conversions from pair-like","2020-11 (Virtual)","|Complete|","13.0","|ranges|"
 "`LWG3405 <https://wg21.link/LWG3405>`__","``common_view``'s converting constructor is bad, too","2020-11 (Virtual)","|Complete|","14.0","|ranges|"
 "`LWG3406 <https://wg21.link/LWG3406>`__","``elements_view::begin()`` and ``elements_view::end()`` have incompatible constraints","2020-11 (Virtual)","|Complete|","16.0","|ranges|"
 "`LWG3419 <https://wg21.link/LWG3419>`__","[algorithms.requirements]/15 doesn't reserve as many rights as it intends to","2020-11 (Virtual)","|Nothing To Do|","",""

From ab86fc74c04ff508f909b7b6131df1551dd833fc Mon Sep 17 00:00:00 2001
From: Jonas Rickert <Jonas.Rickert at amd.com>
Date: Wed, 21 Aug 2024 23:18:21 +0200
Subject: [PATCH 051/116] [mlir] Add nodiscard attribute to
 allowsUnregisteredDialects (#105530)

This getter can easily be confused with the similarly named
allowUnregisteredDialects setter.
---
 mlir/include/mlir/IR/MLIRContext.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mlir/include/mlir/IR/MLIRContext.h b/mlir/include/mlir/IR/MLIRContext.h
index 11e5329f43e681..d17bbac81655b5 100644
--- a/mlir/include/mlir/IR/MLIRContext.h
+++ b/mlir/include/mlir/IR/MLIRContext.h
@@ -133,7 +133,7 @@ class MLIRContext {
   Dialect *getOrLoadDialect(StringRef name);
 
   /// Return true if we allow to create operation for unregistered dialects.
-  bool allowsUnregisteredDialects();
+  [[nodiscard]] bool allowsUnregisteredDialects();
 
   /// Enables creating operations in unregistered dialects.
   /// This option is **heavily discouraged**: it is convenient during testing

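The effect of marking the getter `[[nodiscard]]` can be seen with a minimal sketch. `Context`, `allowsUnregistered`, and `allowUnregistered` are hypothetical stand-ins for the MLIR names, not MLIR's API.

```cpp
// Hypothetical class (not mlir::MLIRContext) with a getter/setter pair whose
// names differ by a single letter.
class Context {
public:
  // Getter: [[nodiscard]] makes compilers warn when the result is ignored,
  // which usually means the caller meant to call the setter.
  [[nodiscard]] bool allowsUnregistered() const { return allow_; }

  // Setter.
  void allowUnregistered(bool allow = true) { allow_ = allow; }

private:
  bool allow_ = false;
};
```

A statement such as `ctx.allowsUnregistered();`, written when `ctx.allowUnregistered();` was intended, previously compiled silently; with the attribute, compilers emit a warning that the return value is discarded.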
From f709cd5add0ea36bb14259e9716bd74e5c762128 Mon Sep 17 00:00:00 2001
From: Dmitri Gribenko <gribozavr at gmail.com>
Date: Wed, 21 Aug 2024 23:49:45 +0200
Subject: [PATCH 052/116] Revert "[Coroutines] Salvage the debug information
 for coroutine frames within optimizations"

This reverts commit 522c253f47ea27d8eeb759e06f8749092b1de71e.

This series of commits causes Clang crashes. The reproducer is posted on
https://github.com/llvm/llvm-project/commit/08a0dece2b2431db8abe650bb43cba01e781e1ce.
---
 .../test/CodeGenCoroutines/coro-dwarf-O2.cpp  | 39 -------------------
 llvm/lib/Transforms/Coroutines/CoroFrame.cpp  | 31 ++++++++-------
 llvm/lib/Transforms/Coroutines/CoroInternal.h |  8 ++--
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 12 ++++--
 .../Transforms/Coroutines/coro-debug-O2.ll    |  6 +--
 5 files changed, 31 insertions(+), 65 deletions(-)
 delete mode 100644 clang/test/CodeGenCoroutines/coro-dwarf-O2.cpp

diff --git a/clang/test/CodeGenCoroutines/coro-dwarf-O2.cpp b/clang/test/CodeGenCoroutines/coro-dwarf-O2.cpp
deleted file mode 100644
index 53f4a07982e427..00000000000000
--- a/clang/test/CodeGenCoroutines/coro-dwarf-O2.cpp
+++ /dev/null
@@ -1,39 +0,0 @@
-// Check that we can still observe the value of the coroutine frame
-// with optimizations.
-//
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
-// RUN:   -emit-llvm %s -debug-info-kind=limited -dwarf-version=5 \
-// RUN:   -O2 -o - | FileCheck %s
-
-#include "Inputs/coroutine.h"
-
-template <>
-struct std::coroutine_traits<void> {
-  struct promise_type {
-    void get_return_object();
-    std::suspend_always initial_suspend();
-    std::suspend_always final_suspend() noexcept;
-    void return_void();
-    void unhandled_exception();
-  };
-};
-
-struct ScalarAwaiter {
-  template <typename F> void await_suspend(F);
-  bool await_ready();
-  int await_resume();
-};
-
-extern "C" void UseScalar(int);
-
-extern "C" void f() {
-  UseScalar(co_await ScalarAwaiter{});
-
-  int Val = co_await ScalarAwaiter{};
-
-  co_await ScalarAwaiter{};
-}
-
-// CHECK: define {{.*}}@f.resume({{.*}} %[[ARG:.*]])
-// CHECK:  #dbg_value(ptr %[[ARG]], ![[CORO_NUM:[0-9]+]], !DIExpression(DW_OP_deref)
-// CHECK: ![[CORO_NUM]] = !DILocalVariable(name: "__coro_frame"
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index 00f49b7bdce294..fa04735340406d 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -1914,7 +1914,8 @@ static void insertSpills(const FrameDataInfo &FrameData, coro::Shape &Shape) {
           }
           // This dbg.declare is for the main function entry point.  It
           // will be deleted in all coro-split functions.
-          coro::salvageDebugInfo(ArgToAllocaMap, *DDI, false /*UseEntryValue*/);
+          coro::salvageDebugInfo(ArgToAllocaMap, *DDI, Shape.OptimizeFrame,
+                                 false /*UseEntryValue*/);
         };
         for_each(DIs, SalvageOne);
         for_each(DVRs, SalvageOne);
@@ -2851,8 +2852,9 @@ static void collectFrameAlloca(AllocaInst *AI, coro::Shape &Shape,
 
 static std::optional<std::pair<Value &, DIExpression &>>
 salvageDebugInfoImpl(SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
-                     bool UseEntryValue, Function *F, Value *Storage,
-                     DIExpression *Expr, bool SkipOutermostLoad) {
+                     bool OptimizeFrame, bool UseEntryValue, Function *F,
+                     Value *Storage, DIExpression *Expr,
+                     bool SkipOutermostLoad) {
   IRBuilder<> Builder(F->getContext());
   auto InsertPt = F->getEntryBlock().getFirstInsertionPt();
   while (isa<IntrinsicInst>(InsertPt))
@@ -2904,9 +2906,10 @@ salvageDebugInfoImpl(SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
 
   // If the coroutine frame is an Argument, store it in an alloca to improve
   // its availability (e.g. registers may be clobbered).
-  // Avoid this if the value is guaranteed to be available through other means
-  // (e.g. swift ABI guarantees).
-  if (StorageAsArg && !IsSwiftAsyncArg) {
+  // Avoid this if optimizations are enabled (they would remove the alloca) or
+  // if the value is guaranteed to be available through other means (e.g. swift
+  // ABI guarantees).
+  if (StorageAsArg && !OptimizeFrame && !IsSwiftAsyncArg) {
     auto &Cached = ArgToAllocaMap[StorageAsArg];
     if (!Cached) {
       Cached = Builder.CreateAlloca(Storage->getType(), 0, nullptr,
@@ -2929,7 +2932,7 @@ salvageDebugInfoImpl(SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
 
 void coro::salvageDebugInfo(
     SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
-    DbgVariableIntrinsic &DVI, bool UseEntryValue) {
+    DbgVariableIntrinsic &DVI, bool OptimizeFrame, bool UseEntryValue) {
 
   Function *F = DVI.getFunction();
   // Follow the pointer arithmetic all the way to the incoming
@@ -2937,9 +2940,9 @@ void coro::salvageDebugInfo(
   bool SkipOutermostLoad = !isa<DbgValueInst>(DVI);
   Value *OriginalStorage = DVI.getVariableLocationOp(0);
 
-  auto SalvagedInfo =
-      ::salvageDebugInfoImpl(ArgToAllocaMap, UseEntryValue, F, OriginalStorage,
-                             DVI.getExpression(), SkipOutermostLoad);
+  auto SalvagedInfo = ::salvageDebugInfoImpl(
+      ArgToAllocaMap, OptimizeFrame, UseEntryValue, F, OriginalStorage,
+      DVI.getExpression(), SkipOutermostLoad);
   if (!SalvagedInfo)
     return;
 
@@ -2971,7 +2974,7 @@ void coro::salvageDebugInfo(
 
 void coro::salvageDebugInfo(
     SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
-    DbgVariableRecord &DVR, bool UseEntryValue) {
+    DbgVariableRecord &DVR, bool OptimizeFrame, bool UseEntryValue) {
 
   Function *F = DVR.getFunction();
   // Follow the pointer arithmetic all the way to the incoming
@@ -2979,9 +2982,9 @@ void coro::salvageDebugInfo(
   bool SkipOutermostLoad = DVR.isDbgDeclare();
   Value *OriginalStorage = DVR.getVariableLocationOp(0);
 
-  auto SalvagedInfo =
-      ::salvageDebugInfoImpl(ArgToAllocaMap, UseEntryValue, F, OriginalStorage,
-                             DVR.getExpression(), SkipOutermostLoad);
+  auto SalvagedInfo = ::salvageDebugInfoImpl(
+      ArgToAllocaMap, OptimizeFrame, UseEntryValue, F, OriginalStorage,
+      DVR.getExpression(), SkipOutermostLoad);
   if (!SalvagedInfo)
     return;
 
diff --git a/llvm/lib/Transforms/Coroutines/CoroInternal.h b/llvm/lib/Transforms/Coroutines/CoroInternal.h
index d535ad7f85d74a..5716fd0ea4ab96 100644
--- a/llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ b/llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -29,14 +29,14 @@ void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 /// Attempts to rewrite the location operand of debug intrinsics in terms of
 /// the coroutine frame pointer, folding pointer offsets into the DIExpression
 /// of the intrinsic.
-/// If the frame pointer is an Argument, store it into an alloca to enhance the
-/// debugability.
+/// If the frame pointer is an Argument, store it into an alloca if
+/// OptimizeFrame is false.
 void salvageDebugInfo(
     SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
-    DbgVariableIntrinsic &DVI, bool IsEntryPoint);
+    DbgVariableIntrinsic &DVI, bool OptimizeFrame, bool IsEntryPoint);
 void salvageDebugInfo(
     SmallDenseMap<Argument *, AllocaInst *, 4> &ArgToAllocaMap,
-    DbgVariableRecord &DVR, bool UseEntryValue);
+    DbgVariableRecord &DVR, bool OptimizeFrame, bool UseEntryValue);
 
 // Keeps data and helper functions for lowering coroutine intrinsics.
 struct LowererBase {
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 40bc932c3e0eef..8eceaef59a1e1f 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -735,9 +735,11 @@ void CoroCloner::salvageDebugInfo() {
   bool UseEntryValue =
       llvm::Triple(OrigF.getParent()->getTargetTriple()).isArch64Bit();
   for (DbgVariableIntrinsic *DVI : Worklist)
-    coro::salvageDebugInfo(ArgToAllocaMap, *DVI, UseEntryValue);
+    coro::salvageDebugInfo(ArgToAllocaMap, *DVI, Shape.OptimizeFrame,
+                           UseEntryValue);
   for (DbgVariableRecord *DVR : DbgVariableRecords)
-    coro::salvageDebugInfo(ArgToAllocaMap, *DVR, UseEntryValue);
+    coro::salvageDebugInfo(ArgToAllocaMap, *DVR, Shape.OptimizeFrame,
+                           UseEntryValue);
 
   // Remove all salvaged dbg.declare intrinsics that became
   // either unreachable or stale due to the CoroSplit transformation.
@@ -1960,9 +1962,11 @@ splitCoroutine(Function &F, SmallVectorImpl<Function *> &Clones,
   SmallDenseMap<Argument *, AllocaInst *, 4> ArgToAllocaMap;
   auto [DbgInsts, DbgVariableRecords] = collectDbgVariableIntrinsics(F);
   for (auto *DDI : DbgInsts)
-    coro::salvageDebugInfo(ArgToAllocaMap, *DDI, false /*UseEntryValue*/);
+    coro::salvageDebugInfo(ArgToAllocaMap, *DDI, Shape.OptimizeFrame,
+                           false /*UseEntryValue*/);
   for (DbgVariableRecord *DVR : DbgVariableRecords)
-    coro::salvageDebugInfo(ArgToAllocaMap, *DVR, false /*UseEntryValue*/);
+    coro::salvageDebugInfo(ArgToAllocaMap, *DVR, Shape.OptimizeFrame,
+                           false /*UseEntryValue*/);
   return Shape;
 }
 
diff --git a/llvm/test/Transforms/Coroutines/coro-debug-O2.ll b/llvm/test/Transforms/Coroutines/coro-debug-O2.ll
index 588f47959cc5d5..7ffa2ac153c853 100644
--- a/llvm/test/Transforms/Coroutines/coro-debug-O2.ll
+++ b/llvm/test/Transforms/Coroutines/coro-debug-O2.ll
@@ -1,14 +1,12 @@
 ; RUN: opt < %s -passes='module(coro-early),cgscc(coro-split<reuse-storage>),function(sroa)' -S | FileCheck %s
 ; RUN: opt --try-experimental-debuginfo-iterators < %s -passes='module(coro-early),cgscc(coro-split<reuse-storage>),function(sroa)' -S | FileCheck %s
 
-; Checks the dbg informations about promise and coroutine frames under O2.
+; Checks whether the dbg.declare for `__promise` remains valid under O2.
 
 ; CHECK-LABEL: define internal fastcc void @f.resume({{.*}})
 ; CHECK:       entry.resume:
-; CHECK:        #dbg_value(ptr poison, ![[PROMISEVAR_RESUME:[0-9]+]], !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 16
-; CHECK:        #dbg_value(ptr %begin, ![[CORO_FRAME:[0-9]+]], !DIExpression(DW_OP_deref)
+; CHECK:        #dbg_declare(ptr %begin, ![[PROMISEVAR_RESUME:[0-9]+]], !DIExpression(
 ;
-; CHECK: ![[CORO_FRAME]] = !DILocalVariable(name: "__coro_frame"
 ; CHECK: ![[PROMISEVAR_RESUME]] = !DILocalVariable(name: "__promise"
 %promise_type = type { i32, i32, double }
 

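After this revert, `salvageDebugInfoImpl` again skips spilling an Argument-based coroutine frame pointer to an alloca when `OptimizeFrame` is set, because, per the restored comment, optimizations would remove the alloca anyway. The restored condition reduces to the predicate below. The function name and parameters are hypothetical and are not LLVM API.

```cpp
// Hypothetical restatement of the condition restored in salvageDebugInfoImpl:
// spill the frame pointer to an alloca only if it is a function Argument,
// frame optimization is off, and the Swift async ABI does not already
// guarantee the value's availability.
bool ShouldSpillFrameArgToAlloca(bool storageIsArgument, bool optimizeFrame,
                                 bool isSwiftAsyncArg) {
  return storageIsArgument && !optimizeFrame && !isSwiftAsyncArg;
}
```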
From dc12ccd13f98a3f3ec4af07e60f6fe1344965e17 Mon Sep 17 00:00:00 2001
From: Dmitri Gribenko <gribozavr at gmail.com>
Date: Wed, 21 Aug 2024 23:50:19 +0200
Subject: [PATCH 053/116] Revert "[Coroutines] Fix -Wunused-variable in
 CoroFrame.cpp (NFC)"

This reverts commit d48b807aa8abd1cbfe8ac5d1ba27b8b3617fc5e6.

This series of commits causes Clang crashes. The reproducer is posted on
https://github.com/llvm/llvm-project/commit/08a0dece2b2431db8abe650bb43cba01e781e1ce
---
 llvm/lib/Transforms/Coroutines/CoroFrame.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index fa04735340406d..e0e4edd2800b29 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -1121,7 +1121,8 @@ static void buildFrameDebugInfo(Function &F, coro::Shape &Shape,
 
   DIBuilder DBuilder(*F.getParent(), /*AllowUnresolved*/ false);
 
-  assert(Shape.getPromiseAlloca() &&
+  AllocaInst *PromiseAlloca = Shape.getPromiseAlloca();
+  assert(PromiseAlloca &&
          "Coroutine with switch ABI should own Promise alloca");
 
   DIFile *DFile = DIS->getFile();

From 5c7ae42c526b21acf65ab4b017d0a5fd4ac654a1 Mon Sep 17 00:00:00 2001
From: Dmitri Gribenko <gribozavr at gmail.com>
Date: Wed, 21 Aug 2024 23:50:46 +0200
Subject: [PATCH 054/116] Revert "[Coroutines] [NFCI] Don't search the
 DILocalVariable for __promise when constructing the debug variable for
 __coro_frame"

This reverts commit 08a0dece2b2431db8abe650bb43cba01e781e1ce.

This series of commits causes Clang crashes. The reproducer is posted on
https://github.com/llvm/llvm-project/commit/08a0dece2b2431db8abe650bb43cba01e781e1ce.
---
 llvm/lib/Transforms/Coroutines/CoroFrame.cpp  | 48 ++++++++++++-------
 .../Coroutines/coro-debug-coro-frame.ll       | 16 +++----
 .../Coroutines/coro-debug-dbg.values.ll       |  2 -
 .../Coroutines/coro-debug-frame-variable.ll   |  2 -
 4 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index e0e4edd2800b29..73e30ea00a0e29 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -1125,8 +1125,26 @@ static void buildFrameDebugInfo(Function &F, coro::Shape &Shape,
   assert(PromiseAlloca &&
          "Coroutine with switch ABI should own Promise alloca");
 
-  DIFile *DFile = DIS->getFile();
-  unsigned LineNum = DIS->getLine();
+  TinyPtrVector<DbgDeclareInst *> DIs = findDbgDeclares(PromiseAlloca);
+  TinyPtrVector<DbgVariableRecord *> DVRs = findDVRDeclares(PromiseAlloca);
+
+  DILocalVariable *PromiseDIVariable = nullptr;
+  DILocation *DILoc = nullptr;
+  if (!DIs.empty()) {
+    DbgDeclareInst *PromiseDDI = DIs.front();
+    PromiseDIVariable = PromiseDDI->getVariable();
+    DILoc = PromiseDDI->getDebugLoc().get();
+  } else if (!DVRs.empty()) {
+    DbgVariableRecord *PromiseDVR = DVRs.front();
+    PromiseDIVariable = PromiseDVR->getVariable();
+    DILoc = PromiseDVR->getDebugLoc().get();
+  } else {
+    return;
+  }
+
+  DILocalScope *PromiseDIScope = PromiseDIVariable->getScope();
+  DIFile *DFile = PromiseDIScope->getFile();
+  unsigned LineNum = PromiseDIVariable->getLine();
 
   DICompositeType *FrameDITy = DBuilder.createStructType(
       DIS->getUnit(), Twine(F.getName() + ".coro_frame_ty").str(),
@@ -1236,9 +1254,10 @@ static void buildFrameDebugInfo(Function &F, coro::Shape &Shape,
 
   DBuilder.replaceArrays(FrameDITy, DBuilder.getOrCreateArray(Elements));
 
-  auto *FrameDIVar =
-      DBuilder.createAutoVariable(DIS, "__coro_frame", DFile, LineNum,
-                                  FrameDITy, true, DINode::FlagArtificial);
+  auto *FrameDIVar = DBuilder.createAutoVariable(PromiseDIScope, "__coro_frame",
+                                                 DFile, LineNum, FrameDITy,
+                                                 true, DINode::FlagArtificial);
+  assert(FrameDIVar->isValidLocationForIntrinsic(DILoc));
 
   // Subprogram would have ContainedNodes field which records the debug
   // variables it contained. So we need to add __coro_frame to the
@@ -1247,17 +1266,14 @@ static void buildFrameDebugInfo(Function &F, coro::Shape &Shape,
   // If we don't add __coro_frame to the RetainedNodes, user may get
   // `no symbol __coro_frame in context` rather than `__coro_frame`
   // is optimized out, which is more precise.
-  auto RetainedNodes = DIS->getRetainedNodes();
-  SmallVector<Metadata *, 32> RetainedNodesVec(RetainedNodes.begin(),
-                                               RetainedNodes.end());
-  RetainedNodesVec.push_back(FrameDIVar);
-  DIS->replaceOperandWith(7, (MDTuple::get(F.getContext(), RetainedNodesVec)));
-
-  // Construct the location for the frame debug variable. The column number
-  // is fake but it should be fine.
-  DILocation *DILoc =
-      DILocation::get(DIS->getContext(), LineNum, /*Column=*/1, DIS);
-  assert(FrameDIVar->isValidLocationForIntrinsic(DILoc));
+  if (auto *SubProgram = dyn_cast<DISubprogram>(PromiseDIScope)) {
+    auto RetainedNodes = SubProgram->getRetainedNodes();
+    SmallVector<Metadata *, 32> RetainedNodesVec(RetainedNodes.begin(),
+                                                 RetainedNodes.end());
+    RetainedNodesVec.push_back(FrameDIVar);
+    SubProgram->replaceOperandWith(
+        7, (MDTuple::get(F.getContext(), RetainedNodesVec)));
+  }
 
   if (UseNewDbgInfoFormat) {
     DbgVariableRecord *NewDVR =
diff --git a/llvm/test/Transforms/Coroutines/coro-debug-coro-frame.ll b/llvm/test/Transforms/Coroutines/coro-debug-coro-frame.ll
index 1d668fd0222f77..8e5c4ab52e78eb 100644
--- a/llvm/test/Transforms/Coroutines/coro-debug-coro-frame.ll
+++ b/llvm/test/Transforms/Coroutines/coro-debug-coro-frame.ll
@@ -15,7 +15,8 @@
 ;
 ; CHECK-DAG: ![[FILE:[0-9]+]] = !DIFile(filename: "coro-debug.cpp"
 ; CHECK-DAG: ![[RAMP:[0-9]+]] = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov",
-; CHECK-DAG: ![[CORO_FRAME]] = !DILocalVariable(name: "__coro_frame", scope: ![[RAMP]], file: ![[FILE]], line: [[CORO_FRAME_LINE:[0-9]+]], type: ![[FRAME_TYPE:[0-9]+]], flags: DIFlagArtificial)
+; CHECK-DAG: ![[RAMP_SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: ![[RAMP]], file: ![[FILE]], line: 23
+; CHECK-DAG: ![[CORO_FRAME]] = !DILocalVariable(name: "__coro_frame", scope: ![[RAMP_SCOPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE:[0-9]+]], type: ![[FRAME_TYPE:[0-9]+]], flags: DIFlagArtificial)
 ; CHECK-DAG: ![[FRAME_TYPE]] = !DICompositeType(tag: DW_TAG_structure_type, name: "f.coro_frame_ty", {{.*}}elements: ![[ELEMENTS:[0-9]+]]
 ; CHECK-DAG: ![[ELEMENTS]] = !{![[RESUME_FN:[0-9]+]], ![[DESTROY_FN:[0-9]+]], ![[PROMISE:[0-9]+]], ![[VECTOR_TYPE:[0-9]+]], ![[INT64_0:[0-9]+]], ![[DOUBLE_1:[0-9]+]], ![[INT64_PTR:[0-9]+]], ![[INT32_2:[0-9]+]], ![[INT32_3:[0-9]+]], ![[UNALIGNED_UNKNOWN:[0-9]+]], ![[STRUCT:[0-9]+]], ![[CORO_INDEX:[0-9]+]], ![[SMALL_UNKNOWN:[0-9]+]]
 ; CHECK-DAG: ![[RESUME_FN]] = !DIDerivedType(tag: DW_TAG_member, name: "__resume_fn"{{.*}}, baseType: ![[RESUME_FN_TYPE:[0-9]+]]{{.*}}, flags: DIFlagArtificial
@@ -28,26 +29,25 @@
 ; CHECK-DAG: ![[UNKNOWN_TYPE_BASE]] = !DIBasicType(name: "UnknownType", size: 8, encoding: DW_ATE_unsigned_char, flags: DIFlagArtificial)
 ; CHECK-DAG: ![[VECTOR_TYPE_BASE_ELEMENTS]] = !{![[VECTOR_TYPE_BASE_SUBRANGE:[0-9]+]]}
 ; CHECK-DAG: ![[VECTOR_TYPE_BASE_SUBRANGE]] = !DISubrange(count: 16, lowerBound: 0)
-; CHECK-DAG: ![[INT64_0]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_64_1", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[CORO_FRAME_LINE]], baseType: ![[I64_BASE:[0-9]+]],{{.*}}, flags: DIFlagArtificial
+; CHECK-DAG: ![[INT64_0]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_64_1", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]], baseType: ![[I64_BASE:[0-9]+]],{{.*}}, flags: DIFlagArtificial
 ; CHECK-DAG: ![[I64_BASE]] = !DIBasicType(name: "__int_64", size: 64, encoding: DW_ATE_signed, flags: DIFlagArtificial)
-; CHECK-DAG: ![[DOUBLE_1]] = !DIDerivedType(tag: DW_TAG_member, name: "__double__2", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[CORO_FRAME_LINE]], baseType: ![[DOUBLE_BASE:[0-9]+]]{{.*}}, flags: DIFlagArtificial
+; CHECK-DAG: ![[DOUBLE_1]] = !DIDerivedType(tag: DW_TAG_member, name: "__double__2", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]], baseType: ![[DOUBLE_BASE:[0-9]+]]{{.*}}, flags: DIFlagArtificial
 ; CHECK-DAG: ![[DOUBLE_BASE]] = !DIBasicType(name: "__double_", size: 64, encoding: DW_ATE_float, flags: DIFlagArtificial)
-; CHECK-DAG: ![[INT32_2]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_32_4", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[CORO_FRAME_LINE]], baseType: ![[I32_BASE:[0-9]+]]{{.*}}, flags: DIFlagArtificial
+; CHECK-DAG: ![[INT32_2]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_32_4", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]], baseType: ![[I32_BASE:[0-9]+]]{{.*}}, flags: DIFlagArtificial
 ; CHECK-DAG: ![[I32_BASE]] = !DIBasicType(name: "__int_32", size: 32, encoding: DW_ATE_signed, flags: DIFlagArtificial)
-; CHECK-DAG: ![[INT32_3]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_32_5", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[CORO_FRAME_LINE]], baseType: ![[I32_BASE]]
+; CHECK-DAG: ![[INT32_3]] = !DIDerivedType(tag: DW_TAG_member, name: "__int_32_5", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]], baseType: ![[I32_BASE]]
 ; CHECK-DAG: ![[UNALIGNED_UNKNOWN]] = !DIDerivedType(tag: DW_TAG_member, name: "_6",{{.*}}baseType: ![[UNALIGNED_UNKNOWN_BASE:[0-9]+]], size: 9
 ; CHECK-DAG: ![[UNALIGNED_UNKNOWN_BASE]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[UNKNOWN_TYPE_BASE]], size: 16,{{.*}} elements: ![[UNALIGNED_UNKNOWN_ELEMENTS:[0-9]+]])
 ; CHECK-DAG: ![[UNALIGNED_UNKNOWN_ELEMENTS]] = !{![[UNALIGNED_UNKNOWN_SUBRANGE:[0-9]+]]}
 ; CHECk-DAG: ![[UNALIGNED_UNKNOWN_SUBRANGE]] = !DISubrange(count: 2, lowerBound: 0)
-; CHECK-DAG: ![[STRUCT]] = !DIDerivedType(tag: DW_TAG_member, name: "struct_big_structure_7", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[CORO_FRAME_LINE]], baseType: ![[STRUCT_BASE:[0-9]+]]
+; CHECK-DAG: ![[STRUCT]] = !DIDerivedType(tag: DW_TAG_member, name: "struct_big_structure_7", scope: ![[FRAME_TYPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]], baseType: ![[STRUCT_BASE:[0-9]+]]
 ; CHECK-DAG: ![[STRUCT_BASE]] = !DICompositeType(tag: DW_TAG_structure_type, name: "struct_big_structure"{{.*}}, align: 64, flags: DIFlagArtificial, elements: ![[STRUCT_ELEMENTS:[0-9]+]]
 ; CHECK-DAG: ![[STRUCT_ELEMENTS]] = !{![[MEM_TYPE:[0-9]+]]}
 ; CHECK-DAG: ![[MEM_TYPE]] = !DIDerivedType(tag: DW_TAG_member,{{.*}} baseType: ![[MEM_TYPE_BASE:[0-9]+]], size: 4000
 ; CHECK-DAG: ![[MEM_TYPE_BASE]] = !DICompositeType(tag: DW_TAG_array_type, baseType: ![[UNKNOWN_TYPE_BASE]], size: 4000,
 ; CHECK-DAG: ![[CORO_INDEX]] = !DIDerivedType(tag: DW_TAG_member, name: "__coro_index"
 ; CHECK-DAG: ![[SMALL_UNKNOWN]] = !DIDerivedType(tag: DW_TAG_member, name: "UnknownType_8",{{.*}} baseType: ![[UNKNOWN_TYPE_BASE]], size: 5
-; CHECK-DAG: ![[PROMISE_VAR:[0-9]+]] = !DILocalVariable(name: "__promise", scope: ![[RAMP_SCOPE:[0-9]+]], file: ![[FILE]]
-; CHECK-DAG: ![[RAMP_SCOPE]] = distinct !DILexicalBlock(scope: ![[RAMP]], file: ![[FILE]], line: 23
+; CHECK-DAG: ![[PROMISE_VAR:[0-9]+]] = !DILocalVariable(name: "__promise", scope: ![[RAMP_SCOPE]], file: ![[FILE]], line: [[PROMISE_VAR_LINE]]
 ; CHECK-DAG: ![[BAR_FUNC:[0-9]+]] = distinct !DISubprogram(name: "bar", linkageName: "_Z3barv",
 ; CHECK-DAG: ![[BAR_SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: ![[BAR_FUNC]], file: !1
 ; CHECK-DAG: ![[FRAME_TYPE_IN_BAR:[0-9]+]] = !DICompositeType(tag: DW_TAG_structure_type, name: "bar.coro_frame_ty", file: ![[FILE]], line: [[BAR_LINE:[0-9]+]]{{.*}}elements: ![[ELEMENTS_IN_BAR:[0-9]+]]
diff --git a/llvm/test/Transforms/Coroutines/coro-debug-dbg.values.ll b/llvm/test/Transforms/Coroutines/coro-debug-dbg.values.ll
index 28f5841bb20af7..0b3acc30a1eee0 100644
--- a/llvm/test/Transforms/Coroutines/coro-debug-dbg.values.ll
+++ b/llvm/test/Transforms/Coroutines/coro-debug-dbg.values.ll
@@ -25,7 +25,6 @@
 ; CHECK-SAME:                 ptr {{.*}} %[[frame:.*]])
 ; CHECK-SAME:  !dbg ![[RESUME_FN_DBG_NUM:[0-9]+]]
 ; CHECK:         %[[frame_alloca:.*]] = alloca ptr
-; CHECK-NEXT:    #dbg_declare(ptr %begin.debug, ![[FRAME_DI_NUM:[0-9]+]],
 ; CHECK-NEXT:    store ptr %[[frame]], ptr %[[frame_alloca]]
 ; CHECK:       init.ready:
 ; CHECK:         #dbg_value(ptr %[[frame_alloca]], ![[XVAR_RESUME:[0-9]+]],
@@ -39,7 +38,6 @@
 ; CHECK-SAME:        !DIExpression(DW_OP_deref, DW_OP_plus_uconst, [[OffsetJ]], DW_OP_deref)
 ;
 ; CHECK: ![[RESUME_FN_DBG_NUM]] = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov"
-; CHECK: ![[FRAME_DI_NUM]] = !DILocalVariable(name: "__coro_frame"
 ; CHECK: ![[IVAR_RESUME]] = !DILocalVariable(name: "i"
 ; CHECK: ![[XVAR_RESUME]] = !DILocalVariable(name: "x"
 ; CHECK: ![[JVAR_RESUME]] = !DILocalVariable(name: "j"
diff --git a/llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll b/llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
index 93b22081cf12f6..4f5cdcf15618c7 100644
--- a/llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
+++ b/llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
@@ -42,14 +42,12 @@
 ; CHECK-NEXT:    %[[DBG_PTR:.*]] = alloca ptr
 ; CHECK-NEXT:    #dbg_declare(ptr %[[DBG_PTR]], ![[XVAR_RESUME:[0-9]+]],   !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 32),
 ; CHECK-NEXT:    #dbg_declare(ptr %[[DBG_PTR]], ![[IVAR_RESUME:[0-9]+]], !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 20), ![[IDBGLOC_RESUME:[0-9]+]]
-; CHECK-NEXT:    #dbg_declare(ptr %[[DBG_PTR]], ![[FRAME_RESUME:[0-9]+]], !DIExpression(DW_OP_deref),
 ; CHECK-NEXT:    store ptr {{.*}}, ptr %[[DBG_PTR]]
 ; CHECK:         %[[J:.*]] = alloca i32, align 4
 ; CHECK-NEXT:    #dbg_declare(ptr %[[J]], ![[JVAR_RESUME:[0-9]+]], !DIExpression(), ![[JDBGLOC_RESUME:[0-9]+]]
 ; CHECK:       init.ready:
 ; CHECK:       await.ready:
 ;
-; CHECK-DAG: ![[FRAME_RESUME]] = !DILocalVariable(name: "__coro_frame"
 ; CHECK-DAG: ![[IVAR]] = !DILocalVariable(name: "i"
 ; CHECK-DAG: ![[PROG_SCOPE:[0-9]+]] = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov"
 ; CHECK-DAG: ![[BLK_SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: ![[PROG_SCOPE]], file: !1, line: 23, column: 12)

>From be7d08cd59b0f23eea88e791b2413b44301949d3 Mon Sep 17 00:00:00 2001
From: Volodymyr Vasylkun <vvmposeydon at gmail.com>
Date: Wed, 21 Aug 2024 23:15:24 +0100
Subject: [PATCH 055/116] [InstCombine] Fold `sext(A < B) + zext(A > B)` into
 `ucmp/scmp(A, B)` (#103833)

This change also covers the fold of `zext(A > B) - zext(A < B)` since it
is already being canonicalized into the aforementioned pattern.

Proof: https://alive2.llvm.org/ce/z/AgnfMn
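
Why the fold is sound can be checked outside the compiler with a small C++ sketch. It is not part of the patch, and every helper name in it is hypothetical. It models `sext`/`zext` of an `i1` as producing 0/-1 and 0/1, and it assumes `i8` inputs and an `i8` result, matching the tests below. It then compares both the add pattern and the sub pattern with a three-way compare over every pair of 8-bit inputs. `int8_t` stands in for the signed predicates (`scmp`) and `uint8_t` for the unsigned ones (`ucmp`). The Alive2 proof above remains the authoritative check.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of scmp/ucmp: -1 if A < B, 0 if A == B, 1 if A > B.
// Signedness of the comparison is selected by T (int8_t or uint8_t).
template <typename T> int8_t threeWayCmp(T A, T B) {
  return A < B ? int8_t(-1) : (A > B ? int8_t(1) : int8_t(0));
}

// sext of an i1 yields 0 or -1 (all ones); zext of an i1 yields 0 or 1.
int8_t sextI1(bool V) { return V ? int8_t(-1) : int8_t(0); }
int8_t zextI1(bool V) { return V ? int8_t(1) : int8_t(0); }

// The pattern matched by the fold: sext(A < B) + zext(A > B).
template <typename T> int8_t addPattern(T A, T B) {
  return static_cast<int8_t>(sextI1(A < B) + zextI1(A > B));
}

// The form already canonicalized into the add pattern:
// zext(A > B) - zext(A < B).
template <typename T> int8_t subPattern(T A, T B) {
  return static_cast<int8_t>(zextI1(A > B) - zextI1(A < B));
}

// Exhaustively compares both patterns with threeWayCmp for all 8-bit pairs.
template <typename T> bool foldHoldsFor8Bit() {
  for (int I = 0; I < 256; ++I)
    for (int J = 0; J < 256; ++J) {
      T A = static_cast<T>(I), B = static_cast<T>(J);
      if (addPattern(A, B) != threeWayCmp(A, B) ||
          subPattern(A, B) != threeWayCmp(A, B))
        return false;
    }
  return true;
}

// Usage: both the signed (scmp) and unsigned (ucmp) variants hold.
void checkFold() {
  assert(foldHoldsFor8Bit<int8_t>());
  assert(foldHoldsFor8Bit<uint8_t>());
}
```

The same bit pattern can compare differently under the two predicates. The byte 250 is less than 3 when read as signed (-6), and greater than 3 when read as unsigned. This is why the patch requires both predicates to agree in signedness before choosing `scmp` or `ucmp`.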
---
 .../InstCombine/InstCombineAddSub.cpp         |  20 ++
 llvm/test/Transforms/InstCombine/add.ll       |   4 +-
 .../sext-a-lt-b-plus-zext-a-gt-b-to-uscmp.ll  | 184 ++++++++++++++++++
 3 files changed, 206 insertions(+), 2 deletions(-)
 create mode 100644 llvm/test/Transforms/InstCombine/sext-a-lt-b-plus-zext-a-gt-b-to-uscmp.ll

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index dd4a64050f878a..d7758b5fbf1786 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1626,6 +1626,26 @@ Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
       A->getType()->isIntOrIntVectorTy(1))
     return replaceInstUsesWith(I, Constant::getNullValue(I.getType()));
 
+  // sext(A < B) + zext(A > B) => ucmp/scmp(A, B)
+  ICmpInst::Predicate LTPred, GTPred;
+  if (match(&I,
+            m_c_Add(m_SExt(m_c_ICmp(LTPred, m_Value(A), m_Value(B))),
+                    m_ZExt(m_c_ICmp(GTPred, m_Deferred(A), m_Deferred(B))))) &&
+      A->getType()->isIntOrIntVectorTy()) {
+    if (ICmpInst::isGT(LTPred)) {
+      std::swap(LTPred, GTPred);
+      std::swap(A, B);
+    }
+
+    if (ICmpInst::isLT(LTPred) && ICmpInst::isGT(GTPred) &&
+        ICmpInst::isSigned(LTPred) == ICmpInst::isSigned(GTPred))
+      return replaceInstUsesWith(
+          I, Builder.CreateIntrinsic(
+                 Ty,
+                 ICmpInst::isSigned(LTPred) ? Intrinsic::scmp : Intrinsic::ucmp,
+                 {A, B}));
+  }
+
   // A+B --> A|B iff A and B have no bits set in common.
   WithCache<const Value *> LHSCache(LHS), RHSCache(RHS);
   if (haveNoCommonBitsSet(LHSCache, RHSCache, SQ.getWithInstruction(&I)))
diff --git a/llvm/test/Transforms/InstCombine/add.ll b/llvm/test/Transforms/InstCombine/add.ll
index 36da56d8441bf7..417c3a950d7805 100644
--- a/llvm/test/Transforms/InstCombine/add.ll
+++ b/llvm/test/Transforms/InstCombine/add.ll
@@ -1315,8 +1315,8 @@ define <2 x i8> @ashr_add_commute(<2 x i1> %x, <2 x i1> %y) {
 
 define i32 @cmp_math(i32 %x, i32 %y) {
 ; CHECK-LABEL: @cmp_math(
-; CHECK-NEXT:    [[LT:%.*]] = icmp ult i32 [[X:%.*]], [[Y:%.*]]
-; CHECK-NEXT:    [[R:%.*]] = zext i1 [[LT]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp ult i32 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT:    [[R:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
   %gt = icmp ugt i32 %x, %y
diff --git a/llvm/test/Transforms/InstCombine/sext-a-lt-b-plus-zext-a-gt-b-to-uscmp.ll b/llvm/test/Transforms/InstCombine/sext-a-lt-b-plus-zext-a-gt-b-to-uscmp.ll
new file mode 100644
index 00000000000000..02ae7ce82f13ce
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/sext-a-lt-b-plus-zext-a-gt-b-to-uscmp.ll
@@ -0,0 +1,184 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+; sext(A s< B) + zext(A s> B) => scmp(A, B)
+define i8 @signed_add(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+; Unsigned version
+define i8 @unsigned_add(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @unsigned_add(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp ult i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp ugt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+; Commuted operands
+define i8 @signed_add_commuted1(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_commuted1(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[B]], i32 [[A]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = zext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = sext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+define i8 @signed_add_commuted2(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_commuted2(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp sgt i32 %b, %a
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+; zext(A s> B) - zext(A s< B) => scmp(A, B)
+define i8 @signed_sub(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_sub(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.scmp.i8.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = zext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = sub i8 %gt8, %lt8
+  ret i8 %r
+}
+
+; Unsigned version
+define i8 @unsigned_sub(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @unsigned_sub(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[R:%.*]] = call i8 @llvm.ucmp.i8.i32(i32 [[A]], i32 [[B]])
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp ult i32 %a, %b
+  %lt8 = zext i1 %lt to i8
+  %gt = icmp ugt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = sub i8 %gt8, %lt8
+  ret i8 %r
+}
+
+; Negative test: incorrect predicates
+define i8 @signed_add_neg1(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_neg1(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[LT:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[LT8:%.*]] = sext i1 [[LT]] to i8
+; CHECK-NEXT:    [[GT:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[GT8:%.*]] = zext i1 [[GT]] to i8
+; CHECK-NEXT:    [[R:%.*]] = add nsw i8 [[LT8]], [[GT8]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp sgt i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+define i8 @signed_add_neg2(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_neg2(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[LT:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[LT8:%.*]] = sext i1 [[LT]] to i8
+; CHECK-NEXT:    [[GT:%.*]] = icmp ne i32 [[A]], [[B]]
+; CHECK-NEXT:    [[GT8:%.*]] = zext i1 [[GT]] to i8
+; CHECK-NEXT:    [[R:%.*]] = add nsw i8 [[LT8]], [[GT8]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp ne i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+; Negative test: mismatched signedness of predicates
+define i8 @signed_add_neg3(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_neg3(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[LT:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[LT8:%.*]] = sext i1 [[LT]] to i8
+; CHECK-NEXT:    [[GT:%.*]] = icmp ugt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[GT8:%.*]] = zext i1 [[GT]] to i8
+; CHECK-NEXT:    [[R:%.*]] = add nsw i8 [[LT8]], [[GT8]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp ugt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+; Negative test: zext instead of sext or vice-versa (NOT commuted operands)
+define i8 @signed_add_neg4(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_neg4(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[LT:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[LT8:%.*]] = sext i1 [[LT]] to i8
+; CHECK-NEXT:    [[GT:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[GT8:%.*]] = sext i1 [[GT]] to i8
+; CHECK-NEXT:    [[R:%.*]] = add nsw i8 [[LT8]], [[GT8]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = sext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = sext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}
+
+define i8 @signed_add_neg5(i32 %a, i32 %b) {
+; CHECK-LABEL: define i8 @signed_add_neg5(
+; CHECK-SAME: i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-NEXT:    [[LT:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[LT8:%.*]] = zext i1 [[LT]] to i8
+; CHECK-NEXT:    [[GT:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[GT8:%.*]] = zext i1 [[GT]] to i8
+; CHECK-NEXT:    [[R:%.*]] = add nuw nsw i8 [[LT8]], [[GT8]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %lt = icmp slt i32 %a, %b
+  %lt8 = zext i1 %lt to i8
+  %gt = icmp sgt i32 %a, %b
+  %gt8 = zext i1 %gt to i8
+  %r = add i8 %lt8, %gt8
+  ret i8 %r
+}

>From aa4c6557a1281df627cdf06684bdb08da2707200 Mon Sep 17 00:00:00 2001
From: Jorge Gorbe Moya <jgorbe at google.com>
Date: Wed, 21 Aug 2024 15:23:42 -0700
Subject: [PATCH 056/116] [SandboxIR] Fix use-of-uninitialized in
 ShuffleVectorInst unit test. (#105592)

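I accidentally created a dangling ArrayRef local variable:
`ArrayRef<int> Mask({1, 2})` points into the temporary array backing an
initializer list, and that array is destroyed at the end of the
declaration. Use a SmallVector, which owns its elements, instead.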
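
The pitfall can be reproduced with a simplified, hypothetical `IntView` class. Like `llvm::ArrayRef`, it stores only a pointer and a length. A braced list passed straight into a call stays alive until the end of that full-expression. A local view initialized from a braced list, however, outlives the list's backing array. An owning container sidesteps the problem. The sketch shows only the safe patterns and describes the unsafe one in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical non-owning view, loosely modeled on llvm::ArrayRef: it keeps
// a pointer and a length, never a copy of the elements.
class IntView {
  const int *Data = nullptr;
  std::size_t Length = 0;

public:
  IntView(const std::vector<int> &V) : Data(V.data()), Length(V.size()) {}
  // Accepts a braced list. The list's backing array is a temporary, so the
  // view is only valid until the end of the enclosing full-expression.
  IntView(std::initializer_list<int> IL)
      : Data(IL.begin()), Length(IL.size()) {}

  std::size_t size() const { return Length; }
  int operator[](std::size_t I) const { return Data[I]; }
};

// Sums the viewed elements; views are cheap to pass by value.
int sum(IntView V) {
  int Total = 0;
  for (std::size_t I = 0; I < V.size(); ++I)
    Total += V[I];
  return Total;
}

// OK: the temporary list lives until the end of this full-expression, which
// includes the whole call to sum().
int sumTemporary() { return sum({1, 2}); }

// NOT OK (deliberately not written as code): `IntView Mask({1, 2});` as a
// local leaves Mask pointing at a destroyed array after the declaration.

// OK: the vector owns its elements, so a view of it stays valid as long as
// the vector does. This is the shape of the SmallVector fix.
int sumOwned() {
  std::vector<int> Mask{1, 2};
  IntView View(Mask);
  return sum(View);
}

// Usage check for the two safe patterns.
void checkViews() {
  assert(sumTemporary() == 3);
  assert(sumOwned() == 3);
}
```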
---
 llvm/unittests/SandboxIR/SandboxIRTest.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/unittests/SandboxIR/SandboxIRTest.cpp b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
index 94d8ac27be3bc8..8315ee38dbe187 100644
--- a/llvm/unittests/SandboxIR/SandboxIRTest.cpp
+++ b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
@@ -801,7 +801,7 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
   // isValidOperands
   auto *LLVMArgV1 = LLVMF.getArg(0);
   auto *LLVMArgV2 = LLVMF.getArg(1);
-  ArrayRef<int> Mask({1, 2});
+  SmallVector<int, 2> Mask({1, 2});
   EXPECT_EQ(
       sandboxir::ShuffleVectorInst::isValidOperands(ArgV1, ArgV2, Mask),
       llvm::ShuffleVectorInst::isValidOperands(LLVMArgV1, LLVMArgV2, Mask));

>From 9ebe8b9abde02340494883d1ed1897ef5837473b Mon Sep 17 00:00:00 2001
From: Rahul Joshi <rjoshi at nvidia.com>
Date: Wed, 21 Aug 2024 15:27:00 -0700
Subject: [PATCH 057/116] [NFC][TableGen] Change global variables from
 anonymous NS to static (#105504)

- Move global variables and functions in TableGen.cpp out of the anonymous
namespace and make them static, per LLVM coding standards.
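
The convention applied here can be sketched as follows. All names are illustrative and none come from TableGen.cpp. File-local functions and variables are marked `static`. An anonymous namespace, if used at all, stays small and wraps only class declarations, which cannot be given internal linkage with `static`. This follows the LLVM Coding Standards' guidance on anonymous namespaces.

```cpp
#include <cassert>
#include <string>

// File-local variable: internal linkage via `static` at file scope.
static std::string DefaultClass = "Set";

// File-local helper function, likewise marked `static`.
static std::string describe(const std::string &Name) {
  return "records of class " + Name;
}

// The anonymous namespace is kept as small as possible and wraps only a
// class declaration.
namespace {
class Counter {
  int Count = 0;

public:
  void add() { ++Count; }
  int get() const { return Count; }
};
} // end anonymous namespace

// Uses the file-local class from a file-local function.
static int countTo(int N) {
  Counter C;
  for (int I = 0; I < N; ++I)
    C.add();
  return C.get();
}

// Usage check.
void checkLinkageExample() {
  assert(describe(DefaultClass) == "records of class Set");
  assert(countTo(3) == 3);
}
```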
---
 llvm/utils/TableGen/TableGen.cpp | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/llvm/utils/TableGen/TableGen.cpp b/llvm/utils/TableGen/TableGen.cpp
index b2ed48cffe6be5..c420843574cbf3 100644
--- a/llvm/utils/TableGen/TableGen.cpp
+++ b/llvm/utils/TableGen/TableGen.cpp
@@ -33,24 +33,23 @@ cl::opt<bool> EmitLongStrLiterals(
     cl::Hidden, cl::init(true));
 } // end namespace llvm
 
-namespace {
+static cl::OptionCategory PrintEnumsCat("Options for -print-enums");
+static cl::opt<std::string> Class("class",
+                                  cl::desc("Print Enum list for this class"),
+                                  cl::value_desc("class name"),
+                                  cl::cat(PrintEnumsCat));
 
-cl::OptionCategory PrintEnumsCat("Options for -print-enums");
-cl::opt<std::string> Class("class", cl::desc("Print Enum list for this class"),
-                           cl::value_desc("class name"),
-                           cl::cat(PrintEnumsCat));
-
-void PrintRecords(RecordKeeper &Records, raw_ostream &OS) {
+static void PrintRecords(RecordKeeper &Records, raw_ostream &OS) {
   OS << Records; // No argument, dump all contents
 }
 
-void PrintEnums(RecordKeeper &Records, raw_ostream &OS) {
+static void PrintEnums(RecordKeeper &Records, raw_ostream &OS) {
   for (Record *Rec : Records.getAllDerivedDefinitions(Class))
     OS << Rec->getName() << ", ";
   OS << "\n";
 }
 
-void PrintSets(RecordKeeper &Records, raw_ostream &OS) {
+static void PrintSets(RecordKeeper &Records, raw_ostream &OS) {
   SetTheory Sets;
   Sets.addFieldExpander("Set", "Elements");
   for (Record *Rec : Records.getAllDerivedDefinitions("Set")) {
@@ -63,7 +62,7 @@ void PrintSets(RecordKeeper &Records, raw_ostream &OS) {
   }
 }
 
-TableGen::Emitter::Opt X[] = {
+static TableGen::Emitter::Opt X[] = {
     {"print-records", PrintRecords, "Print all records to stdout (default)",
      true},
     {"print-detailed-records", EmitDetailedRecords,
@@ -75,8 +74,6 @@ TableGen::Emitter::Opt X[] = {
     {"print-sets", PrintSets, "Print expanded sets for testing DAG exprs"},
 };
 
-} // namespace
-
 int main(int argc, char **argv) {
   InitLLVM X(argc, argv);
   cl::ParseCommandLineOptions(argc, argv);

>From b5ba726577f7e7af880b62a6352c6208bda4cd0b Mon Sep 17 00:00:00 2001
From: Jorge Gorbe Moya <jgorbe at google.com>
Date: Wed, 21 Aug 2024 15:56:55 -0700
Subject: [PATCH 058/116] [SandboxIR] Add tracking for
 `ShuffleVectorInst::setShuffleMask`. (#105590)
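
The record-then-mutate pattern that `ShuffleVectorSetMask` follows can be sketched in a self-contained way. The commit message itself does not describe it, and every class below is a hypothetical simplification, not the SandboxIR API. The change object snapshots the previous mask when it is created, before the setter mutates the instruction, and reverting restores that snapshot. The sketch restores through an untracked setter, so undoing a change never records a new one.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Tracker;

// Hypothetical stand-in for IRChangeBase: one recorded, undoable change.
class Change {
public:
  virtual ~Change() = default;
  virtual void revert() = 0;
};

// Hypothetical stand-in for a shufflevector instruction holding a mask.
class Shuffle {
  std::vector<int> Mask;
  Tracker &Trk;

public:
  Shuffle(std::vector<int> M, Tracker &Tr) : Mask(std::move(M)), Trk(Tr) {}
  const std::vector<int> &getMask() const { return Mask; }
  void setMask(const std::vector<int> &NewMask); // Records, then mutates.
  void setMaskUntracked(const std::vector<int> &M) { Mask = M; }
};

class Tracker {
  bool Recording = false;
  std::vector<std::unique_ptr<Change>> Changes;

public:
  void save() { Recording = true; }
  bool isTracking() const { return Recording; }
  void track(std::unique_ptr<Change> C) { Changes.push_back(std::move(C)); }
  // Stops recording, then undoes changes newest-first.
  void revert() {
    Recording = false;
    for (auto It = Changes.rbegin(); It != Changes.rend(); ++It)
      (*It)->revert();
    Changes.clear();
  }
  void accept() {
    Recording = false;
    Changes.clear();
  }
};

// Mirrors the idea of ShuffleVectorSetMask: snapshot the old mask on creation.
class SetMaskChange final : public Change {
  Shuffle &S;
  std::vector<int> PrevMask;

public:
  explicit SetMaskChange(Shuffle &S) : S(S), PrevMask(S.getMask()) {}
  void revert() override { S.setMaskUntracked(PrevMask); }
};

void Shuffle::setMask(const std::vector<int> &NewMask) {
  // Record first, so the snapshot captures the mask before it changes.
  if (Trk.isTracking())
    Trk.track(std::make_unique<SetMaskChange>(*this));
  Mask = NewMask;
}

// Usage: mirrors the ShuffleVectorInstSetters unit test below.
void demoSetMaskRevert() {
  Tracker Trk;
  Shuffle S({1, 2}, Trk);
  Trk.save();
  S.setMask({0, 0});
  assert(S.getMask() == std::vector<int>({0, 0}));
  Trk.revert();
  assert(S.getMask() == std::vector<int>({1, 2}));
}
```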

---
 llvm/include/llvm/SandboxIR/SandboxIR.h  |  4 +---
 llvm/include/llvm/SandboxIR/Tracker.h    | 15 ++++++++++++++
 llvm/lib/SandboxIR/SandboxIR.cpp         |  5 +++++
 llvm/lib/SandboxIR/Tracker.cpp           | 14 +++++++++++++
 llvm/unittests/SandboxIR/TrackerTest.cpp | 26 ++++++++++++++++++++++++
 5 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/llvm/include/llvm/SandboxIR/SandboxIR.h b/llvm/include/llvm/SandboxIR/SandboxIR.h
index 01ef8013ea42a0..278951113aed84 100644
--- a/llvm/include/llvm/SandboxIR/SandboxIR.h
+++ b/llvm/include/llvm/SandboxIR/SandboxIR.h
@@ -1024,9 +1024,7 @@ class ShuffleVectorInst final
   static Constant *convertShuffleMaskForBitcode(ArrayRef<int> Mask,
                                                 Type *ResultTy, Context &Ctx);
 
-  void setShuffleMask(ArrayRef<int> Mask) {
-    cast<llvm::ShuffleVectorInst>(Val)->setShuffleMask(Mask);
-  }
+  void setShuffleMask(ArrayRef<int> Mask);
 
   ArrayRef<int> getShuffleMask() const {
     return cast<llvm::ShuffleVectorInst>(Val)->getShuffleMask();
diff --git a/llvm/include/llvm/SandboxIR/Tracker.h b/llvm/include/llvm/SandboxIR/Tracker.h
index 6f205ae2a075c6..c8a9e99a34341d 100644
--- a/llvm/include/llvm/SandboxIR/Tracker.h
+++ b/llvm/include/llvm/SandboxIR/Tracker.h
@@ -62,6 +62,7 @@ class AllocaInst;
 class CatchSwitchInst;
 class SwitchInst;
 class ConstantInt;
+class ShuffleVectorInst;
 
 /// The base class for IR Change classes.
 class IRChangeBase {
@@ -355,6 +356,20 @@ class CreateAndInsertInst final : public IRChangeBase {
 #endif
 };
 
+class ShuffleVectorSetMask final : public IRChangeBase {
+  ShuffleVectorInst *SVI;
+  SmallVector<int, 8> PrevMask;
+
+public:
+  ShuffleVectorSetMask(ShuffleVectorInst *SVI);
+  void revert(Tracker &Tracker) final;
+  void accept() final {}
+#ifndef NDEBUG
+  void dump(raw_ostream &OS) const final { OS << "ShuffleVectorSetMask"; }
+  LLVM_DUMP_METHOD void dump() const final;
+#endif
+};
+
 /// The tracker collects all the change objects and implements the main API for
 /// saving / reverting / accepting.
 class Tracker {
diff --git a/llvm/lib/SandboxIR/SandboxIR.cpp b/llvm/lib/SandboxIR/SandboxIR.cpp
index a62c879b91e8b9..92054e7cab86ee 100644
--- a/llvm/lib/SandboxIR/SandboxIR.cpp
+++ b/llvm/lib/SandboxIR/SandboxIR.cpp
@@ -1868,6 +1868,11 @@ Value *ShuffleVectorInst::create(Value *V1, Value *V2, ArrayRef<int> Mask,
   return Ctx.getOrCreateConstant(cast<llvm::Constant>(NewV));
 }
 
+void ShuffleVectorInst::setShuffleMask(ArrayRef<int> Mask) {
+  Ctx.getTracker().emplaceIfTracking<ShuffleVectorSetMask>(this);
+  cast<llvm::ShuffleVectorInst>(Val)->setShuffleMask(Mask);
+}
+
 Constant *ShuffleVectorInst::getShuffleMaskForBitcode() const {
   return Ctx.getOrCreateConstant(
       cast<llvm::ShuffleVectorInst>(Val)->getShuffleMaskForBitcode());
diff --git a/llvm/lib/SandboxIR/Tracker.cpp b/llvm/lib/SandboxIR/Tracker.cpp
index 38a1c03556650e..953d4bd51353a9 100644
--- a/llvm/lib/SandboxIR/Tracker.cpp
+++ b/llvm/lib/SandboxIR/Tracker.cpp
@@ -234,6 +234,20 @@ void CreateAndInsertInst::dump() const {
 }
 #endif
 
+ShuffleVectorSetMask::ShuffleVectorSetMask(ShuffleVectorInst *SVI)
+    : SVI(SVI), PrevMask(SVI->getShuffleMask()) {}
+
+void ShuffleVectorSetMask::revert(Tracker &Tracker) {
+  SVI->setShuffleMask(PrevMask);
+}
+
+#ifndef NDEBUG
+void ShuffleVectorSetMask::dump() const {
+  dump(dbgs());
+  dbgs() << "\n";
+}
+#endif
+
 void Tracker::save() { State = TrackerState::Record; }
 
 void Tracker::revert() {
diff --git a/llvm/unittests/SandboxIR/TrackerTest.cpp b/llvm/unittests/SandboxIR/TrackerTest.cpp
index 9f502375204024..a2c3080011f162 100644
--- a/llvm/unittests/SandboxIR/TrackerTest.cpp
+++ b/llvm/unittests/SandboxIR/TrackerTest.cpp
@@ -13,6 +13,7 @@
 #include "llvm/IR/Module.h"
 #include "llvm/SandboxIR/SandboxIR.h"
 #include "llvm/Support/SourceMgr.h"
+#include "gmock/gmock-matchers.h"
 #include "gtest/gtest.h"
 
 using namespace llvm;
@@ -792,6 +793,31 @@ define void @foo(i32 %cond0, i32 %cond1) {
   EXPECT_EQ(Switch->findCaseDest(BB1), One);
 }
 
+TEST_F(TrackerTest, ShuffleVectorInstSetters) {
+  parseIR(C, R"IR(
+define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
+  %shuf = shufflevector <2 x i8> %v1, <2 x i8> %v2, <2 x i32> <i32 1, i32 2>
+  ret void
+}
+)IR");
+  Function &LLVMF = *M->getFunction("foo");
+  sandboxir::Context Ctx(C);
+
+  auto *F = Ctx.createFunction(&LLVMF);
+  auto *BB = &*F->begin();
+  auto It = BB->begin();
+  auto *SVI = cast<sandboxir::ShuffleVectorInst>(&*It++);
+
+  // Check setShuffleMask.
+  SmallVector<int, 2> OrigMask(SVI->getShuffleMask());
+  Ctx.save();
+  SVI->setShuffleMask(ArrayRef<int>({0, 0}));
+  EXPECT_THAT(SVI->getShuffleMask(),
+              testing::Not(testing::ElementsAreArray(OrigMask)));
+  Ctx.revert();
+  EXPECT_THAT(SVI->getShuffleMask(), testing::ElementsAreArray(OrigMask));
+}
+
 TEST_F(TrackerTest, AtomicRMWSetters) {
   parseIR(C, R"IR(
 define void @foo(ptr %ptr, i8 %arg) {

>From 6b98a723653214a6cde05ae3cb5233af328ff101 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Wed, 21 Aug 2024 18:02:04 -0500
Subject: [PATCH 059/116] [libc] Add `scanf` support to the GPU build (#104812)

Summary:
The `scanf` implementation has a "system file" configuration
(`LIBC_COPT_STDIO_USE_SYSTEM_FILE`), which is essentially what the GPU
implementation does at this point, so the GPU build can use it in much the
same way.
---
 libc/config/gpu/entrypoints.txt              |  2 +
 libc/docs/gpu/support.rst                    |  2 +
 libc/src/stdio/CMakeLists.txt                |  2 +-
 libc/src/stdio/scanf_core/CMakeLists.txt     | 46 ++++++++++++--------
 libc/src/stdio/scanf_core/vfscanf_internal.h | 28 +++++++++++-
 5 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index bbae3298fae615..d7f35bc1edf5a0 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -192,6 +192,8 @@ set(TARGET_LIBC_ENTRYPOINTS
     libc.src.stdio.vsprintf
     libc.src.stdio.asprintf
     libc.src.stdio.vasprintf
+    libc.src.stdio.scanf
+    libc.src.stdio.fscanf
     libc.src.stdio.sscanf
     libc.src.stdio.vsscanf
     libc.src.stdio.feof
diff --git a/libc/docs/gpu/support.rst b/libc/docs/gpu/support.rst
index 5ef298a2ba58f2..c8b1052ce16895 100644
--- a/libc/docs/gpu/support.rst
+++ b/libc/docs/gpu/support.rst
@@ -239,6 +239,8 @@ snprintf       |check|
 vsprintf       |check|
 vsnprintf      |check|
 sscanf         |check|
+scanf          |check|
+fscanf         |check|
 putchar        |check|    |check|
 fclose         |check|    |check|
 fopen          |check|    |check|
diff --git a/libc/src/stdio/CMakeLists.txt b/libc/src/stdio/CMakeLists.txt
index bc5ef5fe0e9b48..372b8fc8192455 100644
--- a/libc/src/stdio/CMakeLists.txt
+++ b/libc/src/stdio/CMakeLists.txt
@@ -101,7 +101,7 @@ list(APPEND scanf_deps
       libc.hdr.types.FILE
 )
 
-if(LLVM_LIBC_FULL_BUILD)
+if(LLVM_LIBC_FULL_BUILD AND NOT LIBC_TARGET_OS_IS_GPU)
   list(APPEND scanf_deps
       libc.src.__support.File.file
       libc.src.__support.File.platform_file
diff --git a/libc/src/stdio/scanf_core/CMakeLists.txt b/libc/src/stdio/scanf_core/CMakeLists.txt
index e2b49e0c915284..5c00ae0c9973c2 100644
--- a/libc/src/stdio/scanf_core/CMakeLists.txt
+++ b/libc/src/stdio/scanf_core/CMakeLists.txt
@@ -92,21 +92,33 @@ add_object_library(
     libc.src.__support.str_to_float
 )
 
-if(NOT (TARGET libc.src.__support.File.file) AND LLVM_LIBC_FULL_BUILD)
-  # Not all platforms have a file implementation. If file is unvailable, and a
-  # full build is requested, then we must skip all file based printf sections.
-  return()
+if(LIBC_TARGET_OS_IS_GPU)
+  add_header_library(
+    vfscanf_internal
+    HDRS
+      vfscanf_internal.h
+    DEPENDS
+      .reader
+      .scanf_main
+      libc.include.stdio
+      libc.src.__support.arg_list
+      libc.src.stdio.getc
+      libc.src.stdio.ungetc
+      libc.src.stdio.ferror
+    COMPILE_OPTIONS
+      -DLIBC_COPT_STDIO_USE_SYSTEM_FILE
+  )
+elseif(TARGET libc.src.__support.File.file OR (NOT LLVM_LIBC_FULL_BUILD))
+  add_header_library(
+    vfscanf_internal
+    HDRS
+      vfscanf_internal.h
+    DEPENDS
+      .reader
+      .scanf_main
+      libc.include.stdio
+      libc.src.__support.File.file
+      libc.src.__support.arg_list
+    ${use_system_file}
+  )
 endif()
-
-add_header_library(
-  vfscanf_internal
-  HDRS
-    vfscanf_internal.h
-  DEPENDS
-    .reader
-    .scanf_main
-    libc.include.stdio
-    libc.src.__support.File.file
-    libc.src.__support.arg_list
-  ${use_system_file}
-)
diff --git a/libc/src/stdio/scanf_core/vfscanf_internal.h b/libc/src/stdio/scanf_core/vfscanf_internal.h
index 2b0072a6ae35f3..67126431fcded5 100644
--- a/libc/src/stdio/scanf_core/vfscanf_internal.h
+++ b/libc/src/stdio/scanf_core/vfscanf_internal.h
@@ -12,9 +12,16 @@
 #include "src/__support/File/file.h"
 #include "src/__support/arg_list.h"
 #include "src/__support/macros/config.h"
+#include "src/__support/macros/properties/architectures.h"
 #include "src/stdio/scanf_core/reader.h"
 #include "src/stdio/scanf_core/scanf_main.h"
 
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+#include "src/stdio/ferror.h"
+#include "src/stdio/getc.h"
+#include "src/stdio/ungetc.h"
+#endif
+
 #include "hdr/types/FILE.h"
 #include <stddef.h>
 
@@ -22,7 +29,26 @@ namespace LIBC_NAMESPACE_DECL {
 
 namespace internal {
 
-#ifndef LIBC_COPT_STDIO_USE_SYSTEM_FILE
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+// The GPU build provides FILE access through the host operating system's
+// library. So here we simply use the public entrypoints like in the SYSTEM_FILE
+// interface. Entrypoints should normally not call others; this is an exception.
+// FIXME: We do not acquire any locks here, so this is not thread safe.
+LIBC_INLINE void flockfile(::FILE *) { return; }
+
+LIBC_INLINE void funlockfile(::FILE *) { return; }
+
+LIBC_INLINE int getc(void *f) {
+  return LIBC_NAMESPACE::getc(reinterpret_cast<::FILE *>(f));
+}
+
+LIBC_INLINE void ungetc(int c, void *f) {
+  LIBC_NAMESPACE::ungetc(c, reinterpret_cast<::FILE *>(f));
+}
+
+LIBC_INLINE int ferror_unlocked(::FILE *f) { return LIBC_NAMESPACE::ferror(f); }
+
+#elif !defined(LIBC_COPT_STDIO_USE_SYSTEM_FILE)
 
 LIBC_INLINE void flockfile(FILE *f) {
   reinterpret_cast<LIBC_NAMESPACE::File *>(f)->lock();

From c557d8520413476221a4f3bf2b7b3fed17681691 Mon Sep 17 00:00:00 2001
From: Peter Klausler <35819229+klausler at users.noreply.github.com>
Date: Wed, 21 Aug 2024 16:08:06 -0700
Subject: [PATCH 060/116] [flang][runtime] Add build-time flags to runtime to
 adjust SELECTED_x_KIND() (#105575)

Add FLANG_RUNTIME_NO_INTEGER_16 and FLANG_RUNTIME_NO_REAL_{2,10,16} to
allow one to prevent those kinds from being returned by
SELECTED_INT_KIND and SELECTED_REAL_KIND even if they are actually
available in the C++ build compiler.
---
 flang/runtime/numeric.cpp | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/flang/runtime/numeric.cpp b/flang/runtime/numeric.cpp
index 7c40beb31083ff..28687b1971b7ed 100644
--- a/flang/runtime/numeric.cpp
+++ b/flang/runtime/numeric.cpp
@@ -105,7 +105,7 @@ inline RT_API_ATTRS CppTypeFor<TypeCategory::Integer, 4> SelectedIntKind(T x) {
     return 4;
   } else if (x <= 18) {
     return 8;
-#ifdef __SIZEOF_INT128__
+#if defined __SIZEOF_INT128__ && !defined FLANG_RUNTIME_NO_INTEGER_16
   } else if (x <= 38) {
     return 16;
 #endif
@@ -137,23 +137,35 @@ inline RT_API_ATTRS CppTypeFor<TypeCategory::Integer, 4> SelectedRealKind(
     return -5;
   }
 
+#ifndef FLANG_RUNTIME_NO_REAL_2
+  constexpr bool hasReal2{true};
+#else
+  constexpr bool hasReal2{false};
+#endif
+#if LDBL_MANT_DIG == 64 && !defined FLANG_RUNTIME_NO_REAL_10
+  constexpr bool hasReal10{true};
+#else
+  constexpr bool hasReal10{false};
+#endif
+#if (LDBL_MANT_DIG == 64 || LDBL_MANT_DIG == 113) && \
+    !defined FLANG_RUNTIME_NO_REAL_16
+  constexpr bool hasReal16{true};
+#else
+  constexpr bool hasReal16{false};
+#endif
+
   int error{0};
   int kind{0};
-  if (p <= 3) {
+  if (hasReal2 && p <= 3) {
     kind = 2;
   } else if (p <= 6) {
     kind = 4;
   } else if (p <= 15) {
     kind = 8;
-#if LDBL_MANT_DIG == 64
-  } else if (p <= 18) {
+  } else if (hasReal10 && p <= 18) {
     kind = 10;
-  } else if (p <= 33) {
-    kind = 16;
-#elif LDBL_MANT_DIG == 113
-  } else if (p <= 33) {
+  } else if (hasReal16 && p <= 33) {
     kind = 16;
-#endif
   } else {
     error -= 1;
   }
@@ -164,13 +176,10 @@ inline RT_API_ATTRS CppTypeFor<TypeCategory::Integer, 4> SelectedRealKind(
     kind = kind < 3 ? (p == 3 ? 4 : 3) : kind;
   } else if (r <= 307) {
     kind = kind < 8 ? 8 : kind;
-#if LDBL_MANT_DIG == 64
-  } else if (r <= 4931) {
+  } else if (hasReal10 && r <= 4931) {
     kind = kind < 10 ? 10 : kind;
-#elif LDBL_MANT_DIG == 113
-  } else if (r <= 4931) {
+  } else if (hasReal16 && r <= 4931) {
     kind = kind < 16 ? 16 : kind;
-#endif
   } else {
     error -= 2;
   }

From ec8fe7ad81af6c211fb26c34824092e5bca08f5e Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Wed, 21 Aug 2024 16:53:01 -0700
Subject: [PATCH 061/116] [LTO] Use enum class for ImportFailureReason (NFC)
 (#105564)

It turns out that all uses of the enum values here are already
qualified like FunctionImporter::ImportFailureReason::None, so we can
switch to enum class without touching the rest of the codebase.
---
 llvm/include/llvm/Transforms/IPO/FunctionImport.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/include/llvm/Transforms/IPO/FunctionImport.h b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
index 6df597c300c180..5dad572532c8ae 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionImport.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
@@ -42,7 +42,7 @@ class FunctionImporter {
 
   /// The different reasons selectCallee will chose not to import a
   /// candidate.
-  enum ImportFailureReason {
+  enum class ImportFailureReason {
     None,
     // We can encounter a global variable instead of a function in rare
     // situations with SamplePGO. See comments where this failure type is

From fdbc4089e7a6eafa4002a7981bcde94fc378bc18 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Wed, 21 Aug 2024 16:53:18 -0700
Subject: [PATCH 062/116] [LTO] Compare std::optional<ImportKind> directly with
 ImportKind (NFC) (#105561)

Note that:

  Opt == Val if and only if (Opt && *Opt == Val)

where:

  std::optional<T> Opt;
  T Val;
---
 llvm/lib/Transforms/IPO/FunctionImport.cpp | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index 6ae89a49b6b9a3..92371720e0eceb 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -1814,10 +1814,7 @@ Expected<bool> FunctionImporter::importFunctions(
         continue;
       auto GUID = F.getGUID();
       auto MaybeImportType = getImportType(ImportGUIDs, GUID);
-
-      bool ImportDefinition =
-          (MaybeImportType &&
-           (*MaybeImportType == GlobalValueSummary::Definition));
+      bool ImportDefinition = MaybeImportType == GlobalValueSummary::Definition;
 
       LLVM_DEBUG(dbgs() << (MaybeImportType ? "Is" : "Not")
                         << " importing function"
@@ -1853,10 +1850,7 @@ Expected<bool> FunctionImporter::importFunctions(
         continue;
       auto GUID = GV.getGUID();
       auto MaybeImportType = getImportType(ImportGUIDs, GUID);
-
-      bool ImportDefinition =
-          (MaybeImportType &&
-           (*MaybeImportType == GlobalValueSummary::Definition));
+      bool ImportDefinition = MaybeImportType == GlobalValueSummary::Definition;
 
       LLVM_DEBUG(dbgs() << (MaybeImportType ? "Is" : "Not")
                         << " importing global"
@@ -1876,10 +1870,7 @@ Expected<bool> FunctionImporter::importFunctions(
         continue;
       auto GUID = GA.getGUID();
       auto MaybeImportType = getImportType(ImportGUIDs, GUID);
-
-      bool ImportDefinition =
-          (MaybeImportType &&
-           (*MaybeImportType == GlobalValueSummary::Definition));
+      bool ImportDefinition = MaybeImportType == GlobalValueSummary::Definition;
 
       LLVM_DEBUG(dbgs() << (MaybeImportType ? "Is" : "Not")
                         << " importing alias"

From 19d3f3417100dc99caa4394fbd26fc0c4702264e Mon Sep 17 00:00:00 2001
From: Adrian Prantl <aprantl at apple.com>
Date: Wed, 21 Aug 2024 16:51:54 -0700
Subject: [PATCH 063/116] [lldb] Speculative fix for trap_frame_sym_ctx.test

Unfortunately I can't actually reproduce this locally.
---
 lldb/test/Shell/Unwind/trap_frame_sym_ctx.test | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test b/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test
index 1bf1fb1d6e85f9..08a26616240e68 100644
--- a/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test
+++ b/lldb/test/Shell/Unwind/trap_frame_sym_ctx.test
@@ -15,7 +15,7 @@ breakpoint set -n bar
 process launch
 # CHECK: stop reason = breakpoint 1.1
 
-thread backtrace
+thread backtrace -u
 # CHECK: frame #0: {{.*}}`bar
 # CHECK: frame #1: {{.*}}`tramp
 # CHECK: frame #2: {{.*}}`main

From 1e70122cbc187c08de91a3fb42843efb1221e0e9 Mon Sep 17 00:00:00 2001
From: Mircea Trofin <mtrofin at google.com>
Date: Wed, 21 Aug 2024 17:17:46 -0700
Subject: [PATCH 064/116] [ctx_prof] API to get the instrumentation of a BB
 (#105468)

Analogous to PR #104491

Issue #89287
---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  5 +++++
 llvm/lib/Analysis/CtxProfAnalysis.cpp         |  7 ++++++
 .../Analysis/CtxProfAnalysisTest.cpp          | 22 +++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -95,7 +95,12 @@ class CtxProfAnalysis : public AnalysisInfoMixin<CtxProfAnalysis> {
 
   PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM);
 
+  /// Get the instruction instrumenting a callsite, or nullptr if that cannot be
+  /// found.
   static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB);
+
+  /// Get the instruction instrumenting a BB, or nullptr if not present.
+  static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB);
 };
 
 class CtxProfAnalysisPrinterPass
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index ceebb2cf06d235..3fc1bc34afb97e 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -202,6 +202,13 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) {
   return nullptr;
 }
 
+InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) {
+  for (auto &I : BB)
+    if (auto *Incr = dyn_cast<InstrProfIncrementInst>(&I))
+      return Incr;
+  return nullptr;
+}
+
 static void
 preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles,
               function_ref<void(const PGOCtxProfContext &)> Visitor) {
diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
index 5f9bf3ec540eb3..fbe3a6e45109cc 100644
--- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
+++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
@@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) {
   EXPECT_EQ(IndIns, nullptr);
 }
 
+TEST_F(CtxProfAnalysisTest, GetBBIDTest) {
+  ModulePassManager MPM;
+  MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF));
+  EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved());
+  auto *F = M->getFunction("foo");
+  ASSERT_NE(F, nullptr);
+  std::map<std::string, int> BBNameAndID;
+
+  for (auto &BB : *F) {
+    auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB);
+    if (Ins)
+      BBNameAndID[BB.getName().str()] =
+          static_cast<int>(Ins->getIndex()->getZExtValue());
+    else
+      BBNameAndID[BB.getName().str()] = -1;
+  }
+
+  EXPECT_THAT(BBNameAndID,
+              testing::UnorderedElementsAre(
+                  testing::Pair("", 0), testing::Pair("yes", 1),
+                  testing::Pair("no", -1), testing::Pair("exit", -1)));
+}
 } // namespace

From f25e6515aa04e53a642bc79eb09a96e418cbbb03 Mon Sep 17 00:00:00 2001
From: Connie Zhu <60797237+connieyzhu at users.noreply.github.com>
Date: Wed, 21 Aug 2024 17:26:16 -0700
Subject: [PATCH 065/116] [compiler-rt][test] Added REQUIRES:shell to fuzzer
 test with for-loop (#105557)

This patch makes the features_dir.test file require a shell when
running. This will make the test file unsupported when running llvm-lit
with its internal shell implementation, which is enabled by turning on
the LIT_USE_INTERNAL_SHELL environment variable. Lit's internal shell
currently does not support for-loop syntax.
---
 compiler-rt/test/fuzzer/features_dir.test | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compiler-rt/test/fuzzer/features_dir.test b/compiler-rt/test/fuzzer/features_dir.test
index c6beec01bc3ab2..ce63b3920708cc 100644
--- a/compiler-rt/test/fuzzer/features_dir.test
+++ b/compiler-rt/test/fuzzer/features_dir.test
@@ -1,5 +1,5 @@
 # Tests -features_dir=F
-# REQUIRES: linux
+# REQUIRES: linux, shell
 RUN: %cpp_compiler %S/SimpleTest.cpp -o %t-SimpleTest
 RUN: rm -rf %t-C %t-F
 RUN: mkdir %t-C %t-F

From 04c827d0b5e629ba53e8ede94811a13a96db36a4 Mon Sep 17 00:00:00 2001
From: Jorge Gorbe Moya <jgorbe at google.com>
Date: Wed, 21 Aug 2024 17:37:17 -0700
Subject: [PATCH 066/116] [SandboxIR] Simplify matchers in ShuffleVectorInst
 unit test (NFC) (#105596)

Replace instances of `testing::ContainerEq(ArrayRef<int>({1, 2, 3, 4}))`
with `testing::ElementsAre(1, 2, 3, 4)` which is simpler and more
readable.
---
 llvm/unittests/SandboxIR/SandboxIRTest.cpp | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/llvm/unittests/SandboxIR/SandboxIRTest.cpp b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
index 8315ee38dbe187..b6981027b4c040 100644
--- a/llvm/unittests/SandboxIR/SandboxIRTest.cpp
+++ b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
@@ -815,8 +815,7 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
     I->commute();
     EXPECT_EQ(I->getOperand(0), ArgV2);
     EXPECT_EQ(I->getOperand(1), ArgV1);
-    EXPECT_THAT(I->getShuffleMask(),
-                testing::ContainerEq(ArrayRef<int>({2, 0})));
+    EXPECT_THAT(I->getShuffleMask(), testing::ElementsAre(2, 0));
   }
 
   // getType
@@ -828,17 +827,16 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
 
   // getShuffleMask / getShuffleMaskForBitcode
   {
-    EXPECT_THAT(SVI->getShuffleMask(),
-                testing::ContainerEq(ArrayRef<int>({0, 2})));
+    EXPECT_THAT(SVI->getShuffleMask(), testing::ElementsAre(0, 2));
 
     SmallVector<int, 2> Result;
     SVI->getShuffleMask(Result);
-    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({0, 2})));
+    EXPECT_THAT(Result, testing::ElementsAre(0, 2));
 
     Result.clear();
     sandboxir::ShuffleVectorInst::getShuffleMask(
         SVI->getShuffleMaskForBitcode(), Result);
-    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({0, 2})));
+    EXPECT_THAT(Result, testing::ElementsAre(0, 2));
   }
 
   // convertShuffleMaskForBitcode
@@ -847,15 +845,14 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
         ArrayRef<int>({2, 3}), ArgV1->getType(), Ctx);
     SmallVector<int, 2> Result;
     sandboxir::ShuffleVectorInst::getShuffleMask(C, Result);
-    EXPECT_THAT(Result, testing::ContainerEq(ArrayRef<int>({2, 3})));
+    EXPECT_THAT(Result, testing::ElementsAre(2, 3));
   }
 
   // setShuffleMask
   {
     auto *I = CreateShuffleWithMask(0, 1);
     I->setShuffleMask(ArrayRef<int>({2, 3}));
-    EXPECT_THAT(I->getShuffleMask(),
-                testing::ContainerEq(ArrayRef<int>({2, 3})));
+    EXPECT_THAT(I->getShuffleMask(), testing::ElementsAre(2, 3));
   }
 
   // The following functions check different mask properties. Note that most
@@ -1107,7 +1104,7 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
   {
     SmallVector<int, 4> M = {0, 2, 1, 3};
     ShuffleVectorInst::commuteShuffleMask(M, 2);
-    EXPECT_THAT(M, testing::ContainerEq(ArrayRef<int>({2, 0, 3, 1})));
+    EXPECT_THAT(M, testing::ElementsAre(2, 0, 3, 1));
   }
 
   // isInterleave / isInterleaveMask
@@ -1119,7 +1116,7 @@ define void @foo(<2 x i8> %v1, <2 x i8> %v2) {
     SmallVector<unsigned, 4> StartIndexes;
     EXPECT_TRUE(sandboxir::ShuffleVectorInst::isInterleaveMask(
         I->getShuffleMask(), 2, 4, StartIndexes));
-    EXPECT_THAT(StartIndexes, testing::ContainerEq(ArrayRef<unsigned>({0, 2})));
+    EXPECT_THAT(StartIndexes, testing::ElementsAre(0, 2));
   }
   {
     auto *I = CreateShuffleWithMask(0, 3, 1, 2);

From 64e464349bfca0d90e07f6db2f710d4d53cdacd4 Mon Sep 17 00:00:00 2001
From: eddyz87 <eddyz87 at gmail.com>
Date: Thu, 22 Aug 2024 03:40:56 +0300
Subject: [PATCH 067/116] [BPF] introduce __attribute__((bpf_fastcall))
 (#105417)

This commit introduces attribute bpf_fastcall to declare BPF functions
that do not clobber some of the caller saved registers (R0-R5).

The idea is to generate code that complies with the generic BPF ABI,
but allows a compatible Linux kernel to remove unnecessary spills and
fills of non-scratched registers (given some compiler assistance).

For such functions, register allocation is done as if caller-saved
registers are not clobbered, but the calls are later wrapped in spill
and fill patterns that are simple for the kernel to recognize.

For example, for the following C code:

    #define __bpf_fastcall __attribute__((bpf_fastcall))

    void bar(void) __bpf_fastcall;
    void buz(long i, long j, long k);

    void foo(long i, long j, long k) {
      bar();
      buz(i, j, k);
    }

First, registers are allocated as if:

    foo:
      call bar    # note: no spills for i,j,k (r1,r2,r3)
      call buz
      exit

Then, spills and fills are inserted during the peephole phase:

    foo:
      *(u64 *)(r10 - 8) = r1;  # Such call pattern is
      *(u64 *)(r10 - 16) = r2; # correct when used with
      *(u64 *)(r10 - 24) = r3; # old kernels.
      call bar
      r3 = *(u64 *)(r10 - 24); # But also allows new
      r2 = *(u64 *)(r10 - 16); # kernels to recognize the
      r1 = *(u64 *)(r10 - 8);  # pattern and remove spills/fills.
      call buz
      exit

The offsets for the generated spills/fills are placed below the lowest
existing stack offset of the function. The allocated stack slots are not
used for any other purpose, in order to simplify in-kernel analysis.
---
 clang/include/clang/Basic/Attr.td             |   9 ++
 clang/include/clang/Basic/AttrDocs.td         |  19 +++
 clang/lib/CodeGen/CGCall.cpp                  |   2 +
 clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c  |  24 ++++
 ...a-attribute-supported-attributes-list.test |   1 +
 clang/test/Sema/bpf-attr-bpf-fastcall.c       |  14 +++
 llvm/lib/Target/BPF/BPFCallingConv.td         |   1 +
 llvm/lib/Target/BPF/BPFISelLowering.cpp       |  31 +++++
 llvm/lib/Target/BPF/BPFInstrInfo.td           |   4 +-
 llvm/lib/Target/BPF/BPFMIPeephole.cpp         |  84 +++++++++++++
 llvm/lib/Target/BPF/BPFRegisterInfo.cpp       |  11 ++
 llvm/lib/Target/BPF/BPFRegisterInfo.h         |   3 +
 llvm/test/CodeGen/BPF/bpf-fastcall-1.ll       |  46 ++++++++
 llvm/test/CodeGen/BPF/bpf-fastcall-2.ll       |  68 +++++++++++
 llvm/test/CodeGen/BPF/bpf-fastcall-3.ll       |  62 ++++++++++
 .../CodeGen/BPF/bpf-fastcall-regmask-1.ll     | 110 ++++++++++++++++++
 16 files changed, 486 insertions(+), 3 deletions(-)
 create mode 100644 clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c
 create mode 100644 clang/test/Sema/bpf-attr-bpf-fastcall.c
 create mode 100644 llvm/test/CodeGen/BPF/bpf-fastcall-1.ll
 create mode 100644 llvm/test/CodeGen/BPF/bpf-fastcall-2.ll
 create mode 100644 llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
 create mode 100644 llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll

diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 10a9d9e899e007..98bedfe20f5d98 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2200,6 +2200,15 @@ def BTFTypeTag : TypeAttr {
   let LangOpts = [COnly];
 }
 
+def BPFFastCall : InheritableAttr,
+                  TargetSpecificAttr<TargetBPF> {
+  let Spellings = [Clang<"bpf_fastcall">];
+  let Subjects = SubjectList<[FunctionLike]>;
+  let Documentation = [BPFFastCallDocs];
+  let LangOpts = [COnly];
+  let SimpleHandler = 1;
+}
+
 def WebAssemblyExportName : InheritableAttr,
                             TargetSpecificAttr<TargetWebAssembly> {
   let Spellings = [Clang<"export_name">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 19cbb9a0111a28..df36a2163b9f0b 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -2345,6 +2345,25 @@ section.
   }];
 }
 
+def BPFFastCallDocs : Documentation {
+  let Category = DocCatType;
+  let Content = [{
+Functions annotated with this attribute are likely to be inlined by the BPF JIT.
+It is assumed that the inlined implementation uses fewer caller-saved registers
+than a regular function.
+Specifically, the following registers are likely to be preserved:
+- ``R0`` if the function returns ``void``;
+- ``R2-R5`` if the function takes 1 argument;
+- ``R3-R5`` if the function takes 2 arguments;
+- ``R4-R5`` if the function takes 3 arguments;
+- ``R5`` if the function takes 4 arguments.
+
+For such functions, Clang generates a code pattern that allows the BPF JIT
+to recognize and remove unnecessary spills and fills of the preserved
+registers.
+  }];
+}
+
 def MipsInterruptDocs : Documentation {
   let Category = DocCatFunction;
   let Heading = "interrupt (MIPS)";
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 34ca2227608361..ca2c79b51ac96b 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2421,6 +2421,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
       FuncAttrs.addAttribute(llvm::Attribute::NoCfCheck);
     if (TargetDecl->hasAttr<LeafAttr>())
       FuncAttrs.addAttribute(llvm::Attribute::NoCallback);
+    if (TargetDecl->hasAttr<BPFFastCallAttr>())
+      FuncAttrs.addAttribute("bpf_fastcall");
 
     HasOptnone = TargetDecl->hasAttr<OptimizeNoneAttr>();
     if (auto *AllocSize = TargetDecl->getAttr<AllocSizeAttr>()) {
diff --git a/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c b/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c
new file mode 100644
index 00000000000000..fa740d8e44ff51
--- /dev/null
+++ b/clang/test/CodeGen/bpf-attr-bpf-fastcall-1.c
@@ -0,0 +1,24 @@
+// REQUIRES: bpf-registered-target
+// RUN: %clang_cc1 -triple bpf -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s
+
+#define __bpf_fastcall __attribute__((bpf_fastcall))
+
+void test(void) __bpf_fastcall;
+void (*ptr)(void) __bpf_fastcall;
+
+void foo(void) {
+  test();
+  (*ptr)();
+}
+
+// CHECK: @ptr = global ptr null
+// CHECK: define {{.*}} void @foo()
+// CHECK: entry:
+// CHECK:   call void @test() #[[call_attr:[0-9]+]]
+// CHECK:   %[[ptr:.*]] = load ptr, ptr @ptr, align 8
+// CHECK:   call void %[[ptr]]() #[[call_attr]]
+// CHECK:   ret void
+
+// CHECK: declare void @test() #[[func_attr:[0-9]+]]
+// CHECK: attributes #[[func_attr]] = { {{.*}}"bpf_fastcall"{{.*}} }
+// CHECK: attributes #[[call_attr]] = { "bpf_fastcall" }
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index 1a71556213bb16..a7e425e3d5f431 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -22,6 +22,7 @@
 // CHECK-NEXT: AssumeAligned (SubjectMatchRule_objc_method, SubjectMatchRule_function)
 // CHECK-NEXT: Availability ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
 // CHECK-NEXT: AvailableOnlyInDefaultEvalMethod (SubjectMatchRule_type_alias)
+// CHECK-NEXT: BPFFastCall (SubjectMatchRule_hasType_functionType)
 // CHECK-NEXT: BPFPreserveAccessIndex (SubjectMatchRule_record)
 // CHECK-NEXT: BPFPreserveStaticOffset (SubjectMatchRule_record)
 // CHECK-NEXT: BTFDeclTag (SubjectMatchRule_variable, SubjectMatchRule_function, SubjectMatchRule_record, SubjectMatchRule_field, SubjectMatchRule_type_alias)
diff --git a/clang/test/Sema/bpf-attr-bpf-fastcall.c b/clang/test/Sema/bpf-attr-bpf-fastcall.c
new file mode 100644
index 00000000000000..178b1f50741e87
--- /dev/null
+++ b/clang/test/Sema/bpf-attr-bpf-fastcall.c
@@ -0,0 +1,14 @@
+// REQUIRES: bpf-registered-target
+// RUN: %clang_cc1 %s -triple bpf -verify
+
+__attribute__((bpf_fastcall)) int var; // expected-warning {{'bpf_fastcall' attribute only applies to functions and function pointers}}
+
+__attribute__((bpf_fastcall)) void func();
+__attribute__((bpf_fastcall(1))) void func_invalid(); // expected-error {{'bpf_fastcall' attribute takes no arguments}}
+
+void (*ptr1)(void) __attribute__((bpf_fastcall));
+void (*ptr2)(void);
+void foo(void) {
+  ptr2 = ptr1; // not an error
+  ptr1 = ptr2; // not an error
+}
diff --git a/llvm/lib/Target/BPF/BPFCallingConv.td b/llvm/lib/Target/BPF/BPFCallingConv.td
index ef4ef1930aa8fb..a557211437e95f 100644
--- a/llvm/lib/Target/BPF/BPFCallingConv.td
+++ b/llvm/lib/Target/BPF/BPFCallingConv.td
@@ -46,3 +46,4 @@ def CC_BPF32 : CallingConv<[
 ]>;
 
 def CSR : CalleeSavedRegs<(add R6, R7, R8, R9, R10)>;
+def CSR_PreserveAll : CalleeSavedRegs<(add R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10)>;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index 071fe004806e3e..ff23d3b055d0d5 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -402,6 +402,21 @@ SDValue BPFTargetLowering::LowerFormalArguments(
 
 const size_t BPFTargetLowering::MaxArgs = 5;
 
+static void resetRegMaskBit(const TargetRegisterInfo *TRI, uint32_t *RegMask,
+                            MCRegister Reg) {
+  for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg))
+    RegMask[SubReg / 32] &= ~(1u << (SubReg % 32));
+}
+
+static uint32_t *regMaskFromTemplate(const TargetRegisterInfo *TRI,
+                                     MachineFunction &MF,
+                                     const uint32_t *BaseRegMask) {
+  uint32_t *RegMask = MF.allocateRegMask();
+  unsigned RegMaskSize = MachineOperand::getRegMaskSize(TRI->getNumRegs());
+  memcpy(RegMask, BaseRegMask, sizeof(RegMask[0]) * RegMaskSize);
+  return RegMask;
+}
+
 SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                                      SmallVectorImpl<SDValue> &InVals) const {
   SelectionDAG &DAG = CLI.DAG;
@@ -513,6 +528,22 @@ SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   for (auto &Reg : RegsToPass)
     Ops.push_back(DAG.getRegister(Reg.first, Reg.second.getValueType()));
 
+  bool HasFastCall =
+      (CLI.CB && isa<CallInst>(CLI.CB) && CLI.CB->hasFnAttr("bpf_fastcall"));
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  if (HasFastCall) {
+    uint32_t *RegMask = regMaskFromTemplate(
+        TRI, MF, TRI->getCallPreservedMask(MF, CallingConv::PreserveAll));
+    for (auto const &RegPair : RegsToPass)
+      resetRegMaskBit(TRI, RegMask, RegPair.first);
+    if (!CLI.CB->getType()->isVoidTy())
+      resetRegMaskBit(TRI, RegMask, BPF::R0);
+    Ops.push_back(DAG.getRegisterMask(RegMask));
+  } else {
+    Ops.push_back(
+        DAG.getRegisterMask(TRI->getCallPreservedMask(MF, CLI.CallConv)));
+  }
+
   if (InGlue.getNode())
     Ops.push_back(InGlue);
 
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 2ee630e29790f3..4baeeb017699d6 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -677,9 +677,7 @@ let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
 }
 
 // Jump and link
-let isCall=1, hasDelaySlot=0, Uses = [R11],
-    // Potentially clobbered registers
-    Defs = [R0, R1, R2, R3, R4, R5] in {
+let isCall=1, hasDelaySlot=0, Uses = [R11] in {
   def JAL  : CALL<"call">;
   def JALX  : CALLX<"callx">;
 }
diff --git a/llvm/lib/Target/BPF/BPFMIPeephole.cpp b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
index f0edf706bd8fd7..c41eab319dbb9b 100644
--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -24,6 +24,8 @@
 #include "BPFInstrInfo.h"
 #include "BPFTargetMachine.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/LivePhysRegs.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
@@ -319,6 +321,7 @@ struct BPFMIPreEmitPeephole : public MachineFunctionPass {
   bool in16BitRange(int Num);
   bool eliminateRedundantMov();
   bool adjustBranch();
+  bool insertMissingCallerSavedSpills();
 
 public:
 
@@ -333,6 +336,7 @@ struct BPFMIPreEmitPeephole : public MachineFunctionPass {
     Changed = eliminateRedundantMov();
     if (SupportGotol)
       Changed = adjustBranch() || Changed;
+    Changed |= insertMissingCallerSavedSpills();
     return Changed;
   }
 };
@@ -596,6 +600,86 @@ bool BPFMIPreEmitPeephole::adjustBranch() {
   return Changed;
 }
 
+static const unsigned CallerSavedRegs[] = {BPF::R0, BPF::R1, BPF::R2,
+                                           BPF::R3, BPF::R4, BPF::R5};
+
+struct BPFFastCall {
+  MachineInstr *MI;
+  unsigned LiveCallerSavedRegs;
+};
+
+static void collectBPFFastCalls(const TargetRegisterInfo *TRI,
+                                LivePhysRegs &LiveRegs, MachineBasicBlock &BB,
+                                SmallVectorImpl<BPFFastCall> &Calls) {
+  LiveRegs.init(*TRI);
+  LiveRegs.addLiveOuts(BB);
+  Calls.clear();
+  for (MachineInstr &MI : llvm::reverse(BB)) {
+    if (MI.isCall()) {
+      unsigned LiveCallerSavedRegs = 0;
+      for (MCRegister R : CallerSavedRegs) {
+        bool DoSpillFill = !MI.definesRegister(R, TRI) && LiveRegs.contains(R);
+        if (!DoSpillFill)
+          continue;
+        LiveCallerSavedRegs |= 1 << R;
+      }
+      if (LiveCallerSavedRegs)
+        Calls.push_back({&MI, LiveCallerSavedRegs});
+    }
+    LiveRegs.stepBackward(MI);
+  }
+}
+
+static int64_t computeMinFixedObjOffset(MachineFrameInfo &MFI,
+                                        unsigned SlotSize) {
+  int64_t MinFixedObjOffset = 0;
+  // Same logic as in X86FrameLowering::adjustFrameForMsvcCxxEh()
+  for (int I = MFI.getObjectIndexBegin(); I < MFI.getObjectIndexEnd(); ++I) {
+    if (MFI.isDeadObjectIndex(I))
+      continue;
+    MinFixedObjOffset = std::min(MinFixedObjOffset, MFI.getObjectOffset(I));
+  }
+  MinFixedObjOffset -=
+      (SlotSize + MinFixedObjOffset % SlotSize) & (SlotSize - 1);
+  return MinFixedObjOffset;
+}
+
+bool BPFMIPreEmitPeephole::insertMissingCallerSavedSpills() {
+  MachineFrameInfo &MFI = MF->getFrameInfo();
+  SmallVector<BPFFastCall, 8> Calls;
+  LivePhysRegs LiveRegs;
+  const unsigned SlotSize = 8;
+  int64_t MinFixedObjOffset = computeMinFixedObjOffset(MFI, SlotSize);
+  bool Changed = false;
+  for (MachineBasicBlock &BB : *MF) {
+    collectBPFFastCalls(TRI, LiveRegs, BB, Calls);
+    Changed |= !Calls.empty();
+    for (BPFFastCall &Call : Calls) {
+      int64_t CurOffset = MinFixedObjOffset;
+      for (MCRegister Reg : CallerSavedRegs) {
+        if (((1 << Reg) & Call.LiveCallerSavedRegs) == 0)
+          continue;
+        // Allocate stack object
+        CurOffset -= SlotSize;
+        MFI.CreateFixedSpillStackObject(SlotSize, CurOffset);
+        // Generate spill
+        BuildMI(BB, Call.MI->getIterator(), Call.MI->getDebugLoc(),
+                TII->get(BPF::STD))
+            .addReg(Reg, RegState::Kill)
+            .addReg(BPF::R10)
+            .addImm(CurOffset);
+        // Generate fill
+        BuildMI(BB, ++Call.MI->getIterator(), Call.MI->getDebugLoc(),
+                TII->get(BPF::LDD))
+            .addReg(Reg, RegState::Define)
+            .addReg(BPF::R10)
+            .addImm(CurOffset);
+      }
+    }
+  }
+  return Changed;
+}
+
 } // end default namespace
 
 INITIALIZE_PASS(BPFMIPreEmitPeephole, "bpf-mi-pemit-peephole",
diff --git a/llvm/lib/Target/BPF/BPFRegisterInfo.cpp b/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
index 84af6806abb36c..69e1318954a973 100644
--- a/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
+++ b/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
@@ -40,6 +40,17 @@ BPFRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
   return CSR_SaveList;
 }
 
+const uint32_t *
+BPFRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
+                                      CallingConv::ID CC) const {
+  switch (CC) {
+  default:
+    return CSR_RegMask;
+  case CallingConv::PreserveAll:
+    return CSR_PreserveAll_RegMask;
+  }
+}
+
 BitVector BPFRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   BitVector Reserved(getNumRegs());
   markSuperRegs(Reserved, BPF::W10); // [W|R]10 is read only frame pointer
diff --git a/llvm/lib/Target/BPF/BPFRegisterInfo.h b/llvm/lib/Target/BPF/BPFRegisterInfo.h
index f7dea75ebea6f9..db868769a1579a 100644
--- a/llvm/lib/Target/BPF/BPFRegisterInfo.h
+++ b/llvm/lib/Target/BPF/BPFRegisterInfo.h
@@ -26,6 +26,9 @@ struct BPFRegisterInfo : public BPFGenRegisterInfo {
 
   const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
 
+  const uint32_t *getCallPreservedMask(const MachineFunction &MF,
+                                       CallingConv::ID) const override;
+
   BitVector getReservedRegs(const MachineFunction &MF) const override;
 
   bool eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll
new file mode 100644
index 00000000000000..fd81314a495ef8
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-1.ll
@@ -0,0 +1,46 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar(void) __bpf_fastcall;
+;   void buz(long i, long j, long k);
+;
+;   void foo(long i, long j, long k) {
+;     bar();
+;     buz(i, j, k);
+;   }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that a function marked with bpf_fastcall does not clobber R1-R5.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call void @bar() #1
+  tail call void @buz(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      foo:
+; CHECK:      # %bb.0:
+; CHECK-NEXT:   *(u64 *)(r10 - 8) = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = *(u64 *)(r10 - 8)
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar() #0
+declare dso_local void @buz(i64 noundef, i64 noundef, i64 noundef)
+
+attributes #0 = { "bpf_fastcall" }
+attributes #1 = { nounwind "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll
new file mode 100644
index 00000000000000..e3e29cdddca8ea
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-2.ll
@@ -0,0 +1,68 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar(void) __bpf_fastcall;
+;   void buz(long i, long j);
+;
+;   void foo(long i, long j, long k, long l) {
+;     bar();
+;     if (k > 42l)
+;       buz(i, 1);
+;     else
+;       buz(1, j);
+;   }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that a function marked with bpf_fastcall does not clobber R1-R5.
+; Use R1 in one branch following call and R2 in another branch following call.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j, i64 noundef %k, i64 noundef %l) {
+entry:
+  tail call void @bar() #0
+  %cmp = icmp sgt i64 %k, 42
+  br i1 %cmp, label %if.then, label %if.else
+
+if.then:
+  tail call void @buz(i64 noundef %i, i64 noundef 1)
+  br label %if.end
+
+if.else:
+  tail call void @buz(i64 noundef 1, i64 noundef %j)
+  br label %if.end
+
+if.end:
+  ret void
+}
+
+; CHECK:      foo:                                    # @foo
+; CHECK:      # %bb.0:                                # %entry
+; CHECK-NEXT:   *(u64 *)(r10 - 8) = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = *(u64 *)(r10 - 8)
+; CHECK-NEXT:   r4 = 43
+; CHECK-NEXT:   if r4 s> r3 goto [[ELSE:.*]]
+; CHECK-NEXT: # %bb.1:                                # %if.then
+; CHECK-NEXT:   r2 = 1
+; CHECK-NEXT:   goto [[END:.*]]
+; CHECK-NEXT: [[ELSE]]:                               # %if.else
+; CHECK-NEXT:   r1 = 1
+; CHECK-NEXT: [[END]]:                                # %if.end
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar() #0
+declare dso_local void @buz(i64 noundef, i64 noundef)
+
+attributes #0 = { "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
new file mode 100644
index 00000000000000..81ca4e1ac57bc7
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-3.ll
@@ -0,0 +1,62 @@
+; RUN: llc -O2 --march=bpfel %s -o - | FileCheck %s
+
+; Generated from the following C code:
+;
+; #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+; void quux(void *);
+; void bar(long) __bpf_fastcall;
+; void buz(long i, long j);
+;
+; void foo(long i, long j) {
+;   long k;
+;   bar(i);
+;   bar(i);
+;   buz(i, j);
+;   quux(&k);
+; }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check that a function marked with bpf_fastcall does not clobber R1-R5.
+; Check that spills/fills wrapping the call use and reuse lowest stack offsets.
+
+define dso_local void @foo(i64 noundef %i, i64 noundef %j) {
+entry:
+  %k = alloca i64, align 8
+  tail call void @bar(i64 noundef %i) #0
+  tail call void @bar(i64 noundef %i) #0
+  tail call void @buz(i64 noundef %i, i64 noundef %j)
+  call void @quux(ptr noundef nonnull %k)
+  ret void
+}
+
+; CHECK:      # %bb.0:
+; CHECK-NEXT:   r3 = r1
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = r3
+; CHECK-NEXT:   *(u64 *)(r10 - 16) = r2
+; CHECK-NEXT:   *(u64 *)(r10 - 24) = r3
+; CHECK-NEXT:   call bar
+; CHECK-NEXT:   r3 = *(u64 *)(r10 - 24)
+; CHECK-NEXT:   r2 = *(u64 *)(r10 - 16)
+; CHECK-NEXT:   r1 = r3
+; CHECK-NEXT:   call buz
+; CHECK-NEXT:   r1 = r10
+; CHECK-NEXT:   r1 += -8
+; CHECK-NEXT:   call quux
+; CHECK-NEXT:   exit
+
+declare dso_local void @bar(i64 noundef) #0
+declare dso_local void @buz(i64 noundef, i64 noundef)
+declare dso_local void @quux(ptr noundef)
+
+attributes #0 = { "bpf_fastcall" }
diff --git a/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll b/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll
new file mode 100644
index 00000000000000..857d2f000d1d5a
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/bpf-fastcall-regmask-1.ll
@@ -0,0 +1,110 @@
+; RUN: llc -O2 --march=bpfel \
+; RUN:   -print-after=stack-slot-coloring %s \
+; RUN:   -o /dev/null 2>&1 | FileCheck %s
+
+; Generated from the following C code:
+;
+;   #define __bpf_fastcall __attribute__((bpf_fastcall))
+;
+;   void bar1(void) __bpf_fastcall;
+;   void buz1(long i, long j, long k);
+;   void foo1(long i, long j, long k) {
+;     bar1();
+;     buz1(i, j, k);
+;   }
+;
+;   long bar2(void) __bpf_fastcall;
+;   void buz2(long i, long j, long k);
+;   void foo2(long i, long j, long k) {
+;     bar2();
+;     buz2(i, j, k);
+;   }
+;
+;   void bar3(long) __bpf_fastcall;
+;   void buz3(long i, long j, long k);
+;   void foo3(long i, long j, long k) {
+;     bar3(i);
+;     buz3(i, j, k);
+;   }
+;
+;   long bar4(long, long) __bpf_fastcall;
+;   void buz4(long i, long j, long k);
+;   void foo4(long i, long j, long k) {
+;     bar4(i, j);
+;     buz4(i, j, k);
+;   }
+;
+; Using the following command:
+;
+;   clang --target=bpf -emit-llvm -O2 -S -o - t.c
+;
+; (unnecessary attrs removed manually)
+
+; Check regmask for calls to functions marked with bpf_fastcall:
+; - void function w/o parameters
+; - non-void function w/o parameters
+; - void function with parameters
+; - non-void function with parameters
+
+declare dso_local void @bar1() #0
+declare dso_local void @buz1(i64 noundef, i64 noundef, i64 noundef)
+define dso_local void @foo1(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call void @bar1() #1
+  tail call void @buz1(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      JAL @bar1, <regmask $r0 $r1 $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10
+; CHECK-SAME:                     $w0 $w1 $w2 $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit-def $r11
+; CHECK:      JAL @buz1, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit $r2, implicit $r3, implicit-def $r11
+
+declare dso_local i64 @bar2() #0
+declare dso_local void @buz2(i64 noundef, i64 noundef, i64 noundef)
+define dso_local void @foo2(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call i64 @bar2() #1
+  tail call void @buz2(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      JAL @bar2, <regmask $r1 $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10
+; CHECK-SAME:                     $w1 $w2 $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit-def $r11, implicit-def dead $r0
+; CHECK:      JAL @buz2, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit $r2, implicit $r3, implicit-def $r11
+
+declare dso_local void @bar3(i64) #0
+declare dso_local void @buz3(i64 noundef, i64 noundef, i64 noundef)
+define dso_local void @foo3(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call void @bar3(i64 noundef %i) #1
+  tail call void @buz3(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      JAL @bar3, <regmask $r0 $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10
+; CHECK-SAME:                     $w0 $w2 $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit-def $r11
+; CHECK:      JAL @buz3, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit $r2, implicit $r3, implicit-def $r11
+
+declare dso_local i64 @bar4(i64 noundef, i64 noundef) #0
+declare dso_local void @buz4(i64 noundef, i64 noundef, i64 noundef)
+define dso_local void @foo4(i64 noundef %i, i64 noundef %j, i64 noundef %k) {
+entry:
+  tail call i64 @bar4(i64 noundef %i, i64 noundef %j) #1
+  tail call void @buz4(i64 noundef %i, i64 noundef %j, i64 noundef %k)
+  ret void
+}
+
+; CHECK:      JAL @bar4, <regmask $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10
+; CHECK-SAME:                     $w3 $w4 $w5 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit $r2, implicit-def $r11, implicit-def dead $r0
+; CHECK:      JAL @buz4, <regmask $r6 $r7 $r8 $r9 $r10 $w6 $w7 $w8 $w9 $w10>
+; CHECK-SAME:          , implicit $r11, implicit $r1, implicit $r2, implicit $r3, implicit-def $r11
+
+attributes #0 = { "bpf_fastcall" }
+attributes #1 = { nounwind "bpf_fastcall" }

>From e2b97f3802ac5a75a603c9cacd2f3ab19b6cf9b5 Mon Sep 17 00:00:00 2001
From: Vitaly Buka <vitalybuka at google.com>
Date: Wed, 21 Aug 2024 17:44:05 -0700
Subject: [PATCH 068/116] Revert "Speculative fix for
 asan/TestCases/Darwin/cstring_section.c"

This fix is not enough, and the breaking patch was reverted with 2704b804bec50c2b016bf678bd534c330ec655b6.

This reverts commit bf71c64839c0082e761a4f070ed92e01ced0187c.
---
 compiler-rt/test/asan/TestCases/Darwin/cstring_section.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c b/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
index e40c4b1b8ed6ba..d72b0ba8a8bb33 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
+++ b/compiler-rt/test/asan/TestCases/Darwin/cstring_section.c
@@ -6,10 +6,10 @@
 // Check that "Hello.\n" is in __asan_cstring and not in __cstring.
 // CHECK: Contents of section {{.*}}__asan_cstring:
 // CHECK: 48656c6c {{.*}} Hello.
-// CHECK: Contents of section {{.*}}__cstring:
-// CHECK-NOT: 48656c6c {{.*}} Hello.
 // CHECK: Contents of section {{.*}}__const:
 // CHECK-NOT: 48656c6c {{.*}} Hello.
+// CHECK: Contents of section {{.*}}__cstring:
+// CHECK-NOT: 48656c6c {{.*}} Hello.
 
 int main(int argc, char *argv[]) {
   argv[0] = "Hello.\n";

>From 359c704004ec0826059578c79974d9ea29a8fbff Mon Sep 17 00:00:00 2001
From: Shubham Sandeep Rastogi <srastogi22 at apple.com>
Date: Wed, 21 Aug 2024 17:52:37 -0700
Subject: [PATCH 069/116] Handle #dbg_values in SROA. (#94070)

This patch makes SROA handle #dbg_values properly. Each #dbg_value is
either moved to before a store, just as #dbg_declares are, or updated to
refer to the right alloca after an aggregate alloca is broken up.

The issue stems from Swift, which emits #dbg_values rather than
#dbg_declares. SROA does not handle those #dbg_values correctly, which
causes all of them to end up as undef.

Consider this simple-ish test case. It is as far as I could reduce it,
and I am still relatively bad at writing LLVM IR by hand, so I apologize
in advance:

```
%T4main1TV13TangentVectorV = type <{ %T4main1UV13TangentVectorV, [7 x i8], %T4main1UV13TangentVectorV }>
%T4main1UV13TangentVectorV = type <{ %T1M1SVySfG, [7 x i8], %T4main1VV13TangentVectorV }>
%T1M1SVySfG = type <{ ptr, %Ts4Int8V }>
%Ts4Int8V = type <{ i8 }>
%T4main1VV13TangentVectorV = type <{ %T1M1SVySfG }>
define hidden swiftcc void @"$s4main1TV13TangentVectorV1poiyA2E_AEtFZ"(ptr noalias nocapture sret(%T4main1TV13TangentVectorV) %0, ptr noalias nocapture dereferenceable(57) %1, ptr noalias nocapture dereferenceable(57) %2) #0 !dbg !44 {
entry:
  %3 = alloca %T4main1VV13TangentVectorV
  %4 = alloca %T4main1UV13TangentVectorV
  %5 = alloca %T4main1VV13TangentVectorV
  %6 = alloca %T4main1UV13TangentVectorV
  %7 = alloca %T4main1VV13TangentVectorV
  %8 = alloca %T4main1UV13TangentVectorV
  %9 = alloca %T4main1VV13TangentVectorV
  %10 = alloca %T4main1UV13TangentVectorV
  call void @llvm.lifetime.start.p0(i64 9, ptr %3)
  call void @llvm.lifetime.start.p0(i64 25, ptr %4)
  call void @llvm.lifetime.start.p0(i64 9, ptr %5)
  call void @llvm.lifetime.start.p0(i64 25, ptr %6)
  call void @llvm.lifetime.start.p0(i64 9, ptr %7)
  call void @llvm.lifetime.start.p0(i64 25, ptr %8)
  call void @llvm.lifetime.start.p0(i64 9, ptr %9)
  call void @llvm.lifetime.start.p0(i64 25, ptr %10)
  %.u1 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %1, i32 0, i32 0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %.u1, i64 25, i1 false)
  %.u11 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %2, i32 0, i32 0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %6, ptr align 8 %.u11, i64 25, i1 false)
  call void @llvm.dbg.value(metadata ptr %4, metadata !62, metadata !DIExpression(DW_OP_deref)), !dbg !75
  %.s = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 0
  %.s.c = getelementptr inbounds %T1M1SVySfG, ptr %.s, i32 0, i32 0
  %11 = load ptr, ptr %.s.c
  %.s.b = getelementptr inbounds %T1M1SVySfG, ptr %.s, i32 0, i32 1
  %.s.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s.b, i32 0, i32 0
  %12 = load i8, ptr %.s.b._value
  %.s2 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %6, i32 0, i32 0
  %.s2.c = getelementptr inbounds %T1M1SVySfG, ptr %.s2, i32 0, i32 0
  %13 = load ptr, ptr %.s2.c
  %.s2.b = getelementptr inbounds %T1M1SVySfG, ptr %.s2, i32 0, i32 1
  %.s2.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s2.b, i32 0, i32 0
  %14 = load i8, ptr %.s2.b._value
  %.v = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %3, ptr align 8 %.v, i64 9, i1 false)
  %.v3 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %6, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %5, ptr align 8 %.v3, i64 9, i1 false)
  %.s4 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %3, i32 0, i32 0
  %.s4.c = getelementptr inbounds %T1M1SVySfG, ptr %.s4, i32 0, i32 0
  %18 = load ptr, ptr %.s4.c
  %.s5 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %5, i32 0, i32 0
  %.s5.c = getelementptr inbounds %T1M1SVySfG, ptr %.s5, i32 0, i32 0
  %20 = load ptr, ptr %.s5.c
  %.u2 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %1, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %8, ptr align 8 %.u2, i64 25, i1 false)
  %.u26 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %2, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %10, ptr align 8 %.u26, i64 25, i1 false)
  %.s7 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %8, i32 0, i32 0
  %.s7.c = getelementptr inbounds %T1M1SVySfG, ptr %.s7, i32 0, i32 0
  %25 = load ptr, ptr %.s7.c
  %.s7.b = getelementptr inbounds %T1M1SVySfG, ptr %.s7, i32 0, i32 1
  %.s7.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s7.b, i32 0, i32 0
  %26 = load i8, ptr %.s7.b._value
  %.s8 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %10, i32 0, i32 0
  %.s8.c = getelementptr inbounds %T1M1SVySfG, ptr %.s8, i32 0, i32 0
  %27 = load ptr, ptr %.s8.c
  %.s8.b = getelementptr inbounds %T1M1SVySfG, ptr %.s8, i32 0, i32 1
  %.s8.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s8.b, i32 0, i32 0
  %28 = load i8, ptr %.s8.b._value
  %.v9 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %8, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %7, ptr align 8 %.v9, i64 9, i1 false)
  %.v10 = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %10, i32 0, i32 2
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %9, ptr align 8 %.v10, i64 9, i1 false)
  %.s11 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %7, i32 0, i32 0
  %.s11.c = getelementptr inbounds %T1M1SVySfG, ptr %.s11, i32 0, i32 0
  %32 = load ptr, ptr %.s11.c
  %.s12 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %9, i32 0, i32 0
  %.s12.c = getelementptr inbounds %T1M1SVySfG, ptr %.s12, i32 0, i32 0
  %34 = load ptr, ptr %.s12.c
  call void @llvm.lifetime.end.p0(i64 25, ptr %10)
  call void @llvm.lifetime.end.p0(i64 9, ptr %9)
  call void @llvm.lifetime.end.p0(i64 25, ptr %8)
  call void @llvm.lifetime.end.p0(i64 9, ptr %7)
  call void @llvm.lifetime.end.p0(i64 25, ptr %6)
  call void @llvm.lifetime.end.p0(i64 9, ptr %5)
  call void @llvm.lifetime.end.p0(i64 25, ptr %4)
  call void @llvm.lifetime.end.p0(i64 9, ptr %3)
  ret void
}
!llvm.module.flags = !{!0, !1, !2, !3, !4, !6, !7, !8, !9, !10, !11, !12, !13, !14, !15}
!swift.module.flags = !{!33}
!llvm.linker.options = !{!34, !35, !36, !37, !38, !39, !40, !41, !42, !43}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 4]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA, no_dead_strip"}
!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
!6 = !{i32 7, !"Dwarf Version", i32 4}
!7 = !{i32 2, !"Debug Info Version", i32 3}
!8 = !{i32 1, !"wchar_size", i32 4}
!9 = !{i32 8, !"PIC Level", i32 2}
!10 = !{i32 7, !"uwtable", i32 1}
!11 = !{i32 7, !"frame-pointer", i32 1}
!12 = !{i32 1, !"Swift Version", i32 7}
!13 = !{i32 1, !"Swift ABI Version", i32 7}
!14 = !{i32 1, !"Swift Major Version", i8 6}
!15 = !{i32 1, !"Swift Minor Version", i8 0}
!16 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !17, imports: !18, sdk: "MacOSX14.4.sdk")
!17 = !DIFile(filename: "/Users/emilpedersen/swift2/swift/test/IRGen/debug_scope_distinct.swift", directory: "/Users/emilpedersen/swift2")
!18 = !{!19, !21, !23, !25, !27, !29, !31}
!19 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !20, file: !17)
!20 = !DIModule(scope: null, name: "main", includePath: "/Users/emilpedersen/swift2/swift/test/IRGen")
!21 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !22, file: !17)
!22 = !DIModule(scope: null, name: "Swift", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/Swift.swiftmodule/arm64-apple-macos.swiftmodule")
!23 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !24, line: 60)
!24 = !DIModule(scope: null, name: "_Differentiation", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Differentiation.swiftmodule/arm64-apple-macos.swiftmodule")
!25 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !26, line: 61)
!26 = !DIModule(scope: null, name: "M", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/test-macosx-arm64/IRGen/Output/debug_scope_distinct.swift.tmp/M.swiftmodule")
!27 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !28, file: !17)
!28 = !DIModule(scope: null, name: "_StringProcessing", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_StringProcessing.swiftmodule/arm64-apple-macos.swiftmodule")
!29 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !30, file: !17)
!30 = !DIModule(scope: null, name: "_SwiftConcurrencyShims", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/shims")
!31 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !32, file: !17)
!32 = !DIModule(scope: null, name: "_Concurrency", includePath: "/Users/emilpedersen/swift2/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Concurrency.swiftmodule/arm64-apple-macos.swiftmodule")
!33 = !{i1 false}
!34 = !{!"-lswiftCore"}
!35 = !{!"-lswift_StringProcessing"}
!36 = !{!"-lswift_Differentiation"}
!37 = !{!"-lswiftDarwin"}
!38 = !{!"-lswift_Concurrency"}
!39 = !{!"-lswiftSwiftOnoneSupport"}
!40 = !{!"-lobjc"}
!41 = !{!"-lswiftCompatibilityConcurrency"}
!42 = !{!"-lswiftCompatibility56"}
!43 = !{!"-lswiftCompatibilityPacks"}
!44 = distinct !DISubprogram( unit: !16, declaration: !52, retainedNodes: !53)
!45 = !DIFile(filename: "<compiler-generated>", directory: "/")
!46 = !DICompositeType(tag: DW_TAG_structure_type, scope: !47, elements: !48, identifier: "$s4main1TV13TangentVectorVD")
!47 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TVD")
!48 = !{}
!49 = !DISubroutineType(types: !50)
!50 = !{!51}
!51 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TV13TangentVectorVXMtD")
!52 = !DISubprogram( file: !45, type: !49, spFlags: DISPFlagOptimized)
!53 = !{!54, !56, !57}
!54 = !DILocalVariable( scope: !44, type: !55, flags: DIFlagArtificial)
!55 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !46)
!56 = !DILocalVariable( scope: !44, flags: DIFlagArtificial)
!57 = !DILocalVariable( scope: !44, type: !58, flags: DIFlagArtificial)
!58 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !51)
!62 = !DILocalVariable( scope: !63, type: !72, flags: DIFlagArtificial)
!63 = distinct !DISubprogram( type: !66, unit: !16, declaration: !69, retainedNodes: !70)
!64 = !DICompositeType(tag: DW_TAG_structure_type, scope: !65, identifier: "$s4main1UV13TangentVectorVD")
!65 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1UVD")
!66 = !DISubroutineType(types: !67)
!67 = !{!68}
!68 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1UV13TangentVectorVXMtD")
!69 = !DISubprogram( spFlags: DISPFlagOptimized)
!70 = !{!71, !73}
!71 = !DILocalVariable( scope: !63, flags: DIFlagArtificial)
!72 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !64)
!73 = !DILocalVariable( scope: !63, type: !74, flags: DIFlagArtificial)
!74 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !68)
!75 = !DILocation( scope: !63, inlinedAt: !76)
!76 = distinct !DILocation( scope: !44)

```

If we run

`opt -S -passes=sroa file.ll -o -`

then with this patch we see:
```
%.sroa.5.sroa.021 = alloca [7 x i8], align 8
tail call void @llvm.dbg.value(metadata ptr %.sroa.5.sroa.021, metadata !59, metadata !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 72, 56)), !dbg !72
%.sroa.5.sroa.014 = alloca [7 x i8], align 8
```

Without this patch we see:

```
%.sroa.5.sroa.021 = alloca [7 x i8], align 8
%.sroa.5.sroa.014 = alloca [7 x i8], align 8
```

Thus, this patch ensures that llvm.dbg.values using allocas that get broken up still carry the correct metadata, so the debug information is preserved.

This is part of a stack of patches and is preceded by: https://github.com/llvm/llvm-project/pull/94068
---
 llvm/include/llvm/IR/DebugInfo.h              |   2 +
 .../include/llvm/IR/DebugProgramInstruction.h |   5 +
 llvm/include/llvm/IR/IntrinsicInst.h          |   8 ++
 llvm/include/llvm/Transforms/Utils/Local.h    |  10 ++
 llvm/lib/IR/DebugInfo.cpp                     |  21 ++-
 llvm/lib/Transforms/Scalar/SROA.cpp           |  23 ++--
 llvm/lib/Transforms/Utils/Local.cpp           |  34 +++++
 .../Utils/PromoteMemoryToRegister.cpp         |   4 +
 .../Generic/mem2reg-promote-alloca-1.ll       |   2 +-
 llvm/test/DebugInfo/sroa-handle-dbg-value.ll  | 110 ++++++++++++++++
 llvm/test/Transforms/SROA/alignment.ll        |  56 ++++----
 llvm/test/Transforms/SROA/vector-promotion.ll | 120 +++++++++++-------
 12 files changed, 313 insertions(+), 82 deletions(-)
 create mode 100644 llvm/test/DebugInfo/sroa-handle-dbg-value.ll

diff --git a/llvm/include/llvm/IR/DebugInfo.h b/llvm/include/llvm/IR/DebugInfo.h
index 5b80218d6c5ccd..73f45c3769be44 100644
--- a/llvm/include/llvm/IR/DebugInfo.h
+++ b/llvm/include/llvm/IR/DebugInfo.h
@@ -43,6 +43,8 @@ class Module;
 TinyPtrVector<DbgDeclareInst *> findDbgDeclares(Value *V);
 /// As above, for DVRDeclares.
 TinyPtrVector<DbgVariableRecord *> findDVRDeclares(Value *V);
+/// As above, for DVRValues.
+TinyPtrVector<DbgVariableRecord *> findDVRValues(Value *V);
 
 /// Finds the llvm.dbg.value intrinsics describing a value.
 void findDbgValues(
diff --git a/llvm/include/llvm/IR/DebugProgramInstruction.h b/llvm/include/llvm/IR/DebugProgramInstruction.h
index 8d7427cc67e2d9..e6dd1e979794e2 100644
--- a/llvm/include/llvm/IR/DebugProgramInstruction.h
+++ b/llvm/include/llvm/IR/DebugProgramInstruction.h
@@ -427,6 +427,11 @@ class DbgVariableRecord : public DbgRecord, protected DebugValueUser {
   /// Does this describe the address of a local variable. True for dbg.addr
   /// and dbg.declare, but not dbg.value, which describes its value.
   bool isAddressOfVariable() const { return Type == LocationType::Declare; }
+
+  /// Determine if this describes the value of a local variable. It is false for
+  /// dbg.declare, but true for dbg.value, which describes its value.
+  bool isValueOfVariable() const { return Type == LocationType::Value; }
+
   LocationType getType() const { return Type; }
 
   void setKillLocation();
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 2f1e2c08c3ecec..c188bec631a239 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -344,6 +344,14 @@ class DbgVariableIntrinsic : public DbgInfoIntrinsic {
     return getIntrinsicID() == Intrinsic::dbg_declare;
   }
 
+  /// Determine if this describes the value of a local variable. It is true for
+  /// dbg.value, but false for dbg.declare, which describes its address, and
+  /// false for dbg.assign, which describes a combination of the variable's
+  /// value and address.
+  bool isValueOfVariable() const {
+    return getIntrinsicID() == Intrinsic::dbg_value;
+  }
+
   void setKillLocation() {
     // TODO: When/if we remove duplicate values from DIArgLists, we don't need
     // this set anymore.
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index b17ff6539a25a4..bbf29e6f46b47b 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -259,6 +259,16 @@ CallInst *changeToCall(InvokeInst *II, DomTreeUpdater *DTU = nullptr);
 ///  Dbg Intrinsic utilities
 ///
 
+/// Creates and inserts a dbg_value record intrinsic before a store
+/// that has an associated llvm.dbg.value intrinsic.
+void InsertDebugValueAtStoreLoc(DbgVariableRecord *DVR, StoreInst *SI,
+                                DIBuilder &Builder);
+
+/// Creates and inserts an llvm.dbg.value intrinsic before a store
+/// that has an associated llvm.dbg.value intrinsic.
+void InsertDebugValueAtStoreLoc(DbgVariableIntrinsic *DII, StoreInst *SI,
+                                DIBuilder &Builder);
+
 /// Inserts a llvm.dbg.value intrinsic before a store to an alloca'd value
 /// that has an associated llvm.dbg.declare intrinsic.
 void ConvertDebugDeclareToDebugValue(DbgVariableIntrinsic *DII,
diff --git a/llvm/lib/IR/DebugInfo.cpp b/llvm/lib/IR/DebugInfo.cpp
index 7fa1f9696d43b2..e50b6f6335ef5f 100644
--- a/llvm/lib/IR/DebugInfo.cpp
+++ b/llvm/lib/IR/DebugInfo.cpp
@@ -46,7 +46,7 @@ using namespace llvm::dwarf;
 
 TinyPtrVector<DbgDeclareInst *> llvm::findDbgDeclares(Value *V) {
   // This function is hot. Check whether the value has any metadata to avoid a
-  // DenseMap lookup.
+  // DenseMap lookup. This check only reads a bitfield data member.
   if (!V->isUsedByMetadata())
     return {};
   auto *L = LocalAsMetadata::getIfExists(V);
@@ -65,7 +65,7 @@ TinyPtrVector<DbgDeclareInst *> llvm::findDbgDeclares(Value *V) {
 }
 TinyPtrVector<DbgVariableRecord *> llvm::findDVRDeclares(Value *V) {
   // This function is hot. Check whether the value has any metadata to avoid a
-  // DenseMap lookup.
+  // DenseMap lookup. This check only reads a bitfield data member.
   if (!V->isUsedByMetadata())
     return {};
   auto *L = LocalAsMetadata::getIfExists(V);
@@ -80,6 +80,23 @@ TinyPtrVector<DbgVariableRecord *> llvm::findDVRDeclares(Value *V) {
   return Declares;
 }
 
+TinyPtrVector<DbgVariableRecord *> llvm::findDVRValues(Value *V) {
+  // This function is hot. Check whether the value has any metadata to avoid a
+  // DenseMap lookup. This check only reads a bitfield data member.
+  if (!V->isUsedByMetadata())
+    return {};
+  auto *L = LocalAsMetadata::getIfExists(V);
+  if (!L)
+    return {};
+
+  TinyPtrVector<DbgVariableRecord *> Values;
+  for (DbgVariableRecord *DVR : L->getAllDbgVariableRecordUsers())
+    if (DVR->isValueOfVariable())
+      Values.push_back(DVR);
+
+  return Values;
+}
+
 template <typename IntrinsicT, bool DbgAssignAndValuesOnly>
 static void
 findDbgIntrinsics(SmallVectorImpl<IntrinsicT *> &Result, Value *V,
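findDVRValues has the same shape as findDVRDeclares: a cheap isUsedByMetadata() flag test, then a LocalAsMetadata lookup, then a filter over the record users that keeps only #dbg_value records. The sketch below models just that control flow with standard-library types; `LocKind`, `Record`, `Value` and `findValueRecords` are hypothetical stand-ins, not LLVM classes or APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the LocationType of a DbgVariableRecord.
enum class LocKind { Value, Declare, Assign };

// Hypothetical stand-in for llvm::DbgVariableRecord.
struct Record {
  LocKind Kind;
  bool isValueOfVariable() const { return Kind == LocKind::Value; }
};

// Hypothetical stand-in for an llvm::Value and its LocalAsMetadata wrapper.
struct Value {
  bool UsedByMetadata = false;       // models Value::isUsedByMetadata()
  bool HasLocalAsMetadata = false;   // models LocalAsMetadata::getIfExists() != nullptr
  std::vector<Record *> RecordUsers; // models getAllDbgVariableRecordUsers()
};

// Same control flow as findDVRValues: flag test, metadata lookup, filter.
std::vector<Record *> findValueRecords(const Value &V) {
  if (!V.UsedByMetadata)     // fast path: skip the lookup entirely
    return {};
  if (!V.HasLocalAsMetadata) // no LocalAsMetadata wraps this value
    return {};
  std::vector<Record *> Values;
  for (Record *R : V.RecordUsers) {
    assert(R && "record users are never null");
    if (R->isValueOfVariable()) // keep #dbg_value, skip #dbg_declare/#dbg_assign
      Values.push_back(R);
  }
  return Values;
}
```

For example, a value used by one #dbg_declare, one #dbg_value and one #dbg_assign yields only the #dbg_value record, and a value with the metadata flag cleared yields an empty result without any lookup.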
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index c738a2a6f39a45..26b62cb79cdedf 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -4977,8 +4977,6 @@ const Value *getAddress(const DbgVariableIntrinsic *DVI) {
 }
 
 const Value *getAddress(const DbgVariableRecord *DVR) {
-  assert(DVR->getType() == DbgVariableRecord::LocationType::Declare ||
-         DVR->getType() == DbgVariableRecord::LocationType::Assign);
   return DVR->getAddress();
 }
 
@@ -4989,8 +4987,6 @@ bool isKillAddress(const DbgVariableIntrinsic *DVI) {
 }
 
 bool isKillAddress(const DbgVariableRecord *DVR) {
-  assert(DVR->getType() == DbgVariableRecord::LocationType::Declare ||
-         DVR->getType() == DbgVariableRecord::LocationType::Assign);
   if (DVR->getType() == DbgVariableRecord::LocationType::Assign)
     return DVR->isKillAddress();
   return DVR->isKillLocation();
@@ -5003,8 +4999,6 @@ const DIExpression *getAddressExpression(const DbgVariableIntrinsic *DVI) {
 }
 
 const DIExpression *getAddressExpression(const DbgVariableRecord *DVR) {
-  assert(DVR->getType() == DbgVariableRecord::LocationType::Declare ||
-         DVR->getType() == DbgVariableRecord::LocationType::Assign);
   if (DVR->getType() == DbgVariableRecord::LocationType::Assign)
     return DVR->getAddressExpression();
   return DVR->getExpression();
@@ -5187,6 +5181,19 @@ insertNewDbgInst(DIBuilder &DIB, DbgVariableRecord *Orig, AllocaInst *NewAddr,
     return;
   }
 
+  if (Orig->isDbgValue()) {
+    DbgVariableRecord *DVR = DbgVariableRecord::createDbgVariableRecord(
+        NewAddr, Orig->getVariable(), NewFragmentExpr, Orig->getDebugLoc());
+    // Drop the location if the expression doesn't start with a DW_OP_deref:
+    // without the deref, the #dbg_value would describe the address of the
+    // alloca rather than the value stored inside it.
+    if (!NewFragmentExpr->startsWithDeref())
+      DVR->setKillAddress();
+    BeforeInst->getParent()->insertDbgRecordBefore(DVR,
+                                                   BeforeInst->getIterator());
+    return;
+  }
+
   // Apply a DIAssignID to the store if it doesn't already have it.
   if (!NewAddr->hasMetadata(LLVMContext::MD_DIAssignID)) {
     NewAddr->setMetadata(LLVMContext::MD_DIAssignID,
@@ -5389,7 +5396,7 @@ bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) {
       };
       for_each(findDbgDeclares(Fragment.Alloca), RemoveOne);
       for_each(findDVRDeclares(Fragment.Alloca), RemoveOne);
-
+      for_each(findDVRValues(Fragment.Alloca), RemoveOne);
       insertNewDbgInst(DIB, DbgVariable, Fragment.Alloca, NewExpr, &AI,
                        NewDbgFragment, BitExtractOffset);
     }
@@ -5399,6 +5406,7 @@ bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) {
   // and the individual partitions.
   for_each(findDbgDeclares(&AI), MigrateOne);
   for_each(findDVRDeclares(&AI), MigrateOne);
+  for_each(findDVRValues(&AI), MigrateOne);
   for_each(at::getAssignmentMarkers(&AI), MigrateOne);
   for_each(at::getDVRAssignmentMarkers(&AI), MigrateOne);
 
@@ -5545,7 +5553,6 @@ bool SROA::deleteDeadInstructions(
   }
   return Changed;
 }
-
 /// Promote the allocas, using the best available technique.
 ///
 /// This attempts to promote whatever allocas have been identified as viable in
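The new `Orig->isDbgValue()` branch in insertNewDbgInst re-points a #dbg_value at the new slice alloca with the fragment expression, and is meant to drop the location when that expression has no leading DW_OP_deref. The sketch below models that intent only; `ValueRecord`, `migrateValueRecord` and the `Killed` flag are hypothetical simplifications (the flag stands in for whatever setKillAddress() does), and the fragment opcode value is illustrative. It needs C++14 or later for the aggregate initialisation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// DW_OP_deref is 0x06 in DWARF; the fragment opcode value is illustrative only.
constexpr uint64_t OpDeref = 0x06;
constexpr uint64_t OpFragment = 0x1000;

// Hypothetical, simplified #dbg_value record.
struct ValueRecord {
  std::string Location;       // e.g. the name of the new alloca
  std::vector<uint64_t> Expr; // flattened DIExpression operands
  bool Killed = false;        // stand-in for the intended effect of setKillAddress()
};

bool startsWithDeref(const std::vector<uint64_t> &Expr) {
  return !Expr.empty() && Expr.front() == OpDeref;
}

// Models the intent of the new Orig->isDbgValue() branch in insertNewDbgInst.
ValueRecord migrateValueRecord(const std::string &NewAlloca,
                               const std::vector<uint64_t> &NewFragmentExpr) {
  assert(!NewAlloca.empty() && "need a replacement alloca");
  ValueRecord R{NewAlloca, NewFragmentExpr};
  // Without a leading deref the record would describe the alloca's address,
  // not the variable's value, so the hunk's comment says to drop it.
  if (!startsWithDeref(NewFragmentExpr))
    R.Killed = true;
  return R;
}
```

One thing worth double-checking against the updated alignment.ll expectations: in test2 the new line `#dbg_value(ptr [[A_SROA_0]], ..., !DIExpression(DW_OP_LLVM_fragment, 8, 16), ...)` still uses the alloca as its location even though the expression has no leading DW_OP_deref, which suggests setKillAddress() may not touch the location operand of a plain value record.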
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index efb02fdec56d7e..d3710de1964ece 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -1731,6 +1731,26 @@ void llvm::ConvertDebugDeclareToDebugValue(DbgVariableIntrinsic *DII,
                                     SI->getIterator());
 }
 
+static DIExpression *dropInitialDeref(const DIExpression *DIExpr) {
+  int NumEltDropped = DIExpr->getElements()[0] == dwarf::DW_OP_LLVM_arg ? 3 : 1;
+  return DIExpression::get(DIExpr->getContext(),
+                           DIExpr->getElements().drop_front(NumEltDropped));
+}
+
+void llvm::InsertDebugValueAtStoreLoc(DbgVariableIntrinsic *DII, StoreInst *SI,
+                                      DIBuilder &Builder) {
+  auto *DIVar = DII->getVariable();
+  assert(DIVar && "Missing variable");
+  auto *DIExpr = DII->getExpression();
+  DIExpr = dropInitialDeref(DIExpr);
+  Value *DV = SI->getValueOperand();
+
+  DebugLoc NewLoc = getDebugValueLoc(DII);
+
+  insertDbgValueOrDbgVariableRecord(Builder, DV, DIVar, DIExpr, NewLoc,
+                                    SI->getIterator());
+}
+
 /// Inserts a llvm.dbg.value intrinsic before a load of an alloca'd value
 /// that has an associated llvm.dbg.declare intrinsic.
 void llvm::ConvertDebugDeclareToDebugValue(DbgVariableIntrinsic *DII,
@@ -1805,6 +1825,20 @@ void llvm::ConvertDebugDeclareToDebugValue(DbgVariableRecord *DVR,
   SI->getParent()->insertDbgRecordBefore(NewDVR, SI->getIterator());
 }
 
+void llvm::InsertDebugValueAtStoreLoc(DbgVariableRecord *DVR, StoreInst *SI,
+                                      DIBuilder &Builder) {
+  auto *DIVar = DVR->getVariable();
+  assert(DIVar && "Missing variable");
+  auto *DIExpr = DVR->getExpression();
+  DIExpr = dropInitialDeref(DIExpr);
+  Value *DV = SI->getValueOperand();
+
+  DebugLoc NewLoc = getDebugValueLoc(DVR);
+
+  insertDbgValueOrDbgVariableRecord(Builder, DV, DIVar, DIExpr, NewLoc,
+                                    SI->getIterator());
+}
+
 /// Inserts a llvm.dbg.value intrinsic after a phi that has an associated
 /// llvm.dbg.declare intrinsic.
 void llvm::ConvertDebugDeclareToDebugValue(DbgVariableIntrinsic *DII,
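Both InsertDebugValueAtStoreLoc overloads take the variable from the existing record, strip the leading deref with dropInitialDeref, and emit a #dbg_value for the store's value operand just before the store. The reasoning: `#dbg_value(ptr %alloca, DW_OP_deref)` says "the variable's value is what %alloca points to"; once the alloca is promoted, the stored value is that value directly, so the deref has to go while any trailing operations (such as a fragment) are kept. The sketch below models dropInitialDeref on a flattened operand list; the DW_OP_LLVM_arg value is an illustrative assumption, and like the LLVM code it assumes the three leading elements in the arg case are DW_OP_LLVM_arg, its index, and DW_OP_deref.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// DW_OP_deref is 0x06 in DWARF. DW_OP_LLVM_arg is an LLVM-internal opcode;
// the value below is illustrative, only its identity matters for the sketch.
constexpr uint64_t DW_OP_deref = 0x06;
constexpr uint64_t DW_OP_LLVM_arg = 0x1005;

// Models dropInitialDeref(): remove a leading "DW_OP_deref", or a leading
// "DW_OP_LLVM_arg, N, DW_OP_deref" triple, and keep the rest (e.g. a fragment).
std::vector<uint64_t> dropInitialDeref(const std::vector<uint64_t> &Ops) {
  // The LLVM helper reads element 0 unconditionally, so callers must have
  // checked startsWithDeref() first; make that precondition explicit here.
  assert(!Ops.empty() && "expression must start with a deref");
  const std::ptrdiff_t NumDropped = Ops[0] == DW_OP_LLVM_arg ? 3 : 1;
  assert(Ops.size() >= static_cast<std::size_t>(NumDropped));
  return std::vector<uint64_t>(Ops.begin() + NumDropped, Ops.end());
}
```

In the patch this precondition holds because rewriteSingleStoreAlloca only calls InsertDebugValueAtStoreLoc after `DbgItem->getExpression()->startsWithDeref()` has returned true.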
diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index cfae63405966ff..5251eb86bca926 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -596,6 +596,10 @@ rewriteSingleStoreAlloca(AllocaInst *AI, AllocaInfo &Info, LargeBlockInfo &LBI,
       if (DbgItem->isAddressOfVariable()) {
         ConvertDebugDeclareToDebugValue(DbgItem, Info.OnlyStore, DIB);
         DbgItem->eraseFromParent();
+      } else if (DbgItem->isValueOfVariable() &&
+                 DbgItem->getExpression()->startsWithDeref()) {
+        InsertDebugValueAtStoreLoc(DbgItem, Info.OnlyStore, DIB);
+        DbgItem->eraseFromParent();
       } else if (DbgItem->getExpression()->startsWithDeref()) {
         DbgItem->eraseFromParent();
       }
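After this patch, rewriteSingleStoreAlloca sorts each debug record attached to the promoted alloca into one of four outcomes, and the order of the if/else-if chain matters. The sketch below restates that decision table; `DbgKind`, `Action` and `classify` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical classification of a debug record attached to a promoted alloca.
enum class DbgKind { Declare, Value, Assign };

// What rewriteSingleStoreAlloca does with each record after this patch.
enum class Action {
  ConvertDeclareToValue, // ConvertDebugDeclareToDebugValue, then erase
  InsertValueAtStore,    // InsertDebugValueAtStoreLoc, then erase
  Erase,                 // other deref-based locations point at dead memory
  Keep                   // left in place by this loop
};

// Mirrors the if / else-if chain in the hunk above, in the same order.
Action classify(DbgKind Kind, bool StartsWithDeref) {
  if (Kind == DbgKind::Declare) // isAddressOfVariable()
    return Action::ConvertDeclareToValue;
  if (Kind == DbgKind::Value && StartsWithDeref) // isValueOfVariable() && deref
    return Action::InsertValueAtStore;
  if (StartsWithDeref)
    return Action::Erase;
  return Action::Keep;
}
```

The new value-of-variable branch has to precede the generic startsWithDeref() branch; otherwise deref-based #dbg_value records would still be erased. Presumably this is why mem2reg-promote-alloca-1.ll now expects a second `#dbg_value(i32 %param, ...)` instead of the earlier CHECK-NOT.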
diff --git a/llvm/test/DebugInfo/Generic/mem2reg-promote-alloca-1.ll b/llvm/test/DebugInfo/Generic/mem2reg-promote-alloca-1.ll
index 3d469965d1cfa2..d76dcfe317b31f 100644
--- a/llvm/test/DebugInfo/Generic/mem2reg-promote-alloca-1.ll
+++ b/llvm/test/DebugInfo/Generic/mem2reg-promote-alloca-1.ll
@@ -21,7 +21,7 @@
 ; CHECK: define dso_local void @fun(i32 %param)
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: #dbg_value(i32 %param, ![[PARAM:[0-9]+]], !DIExpression(),
-; CHECK-NOT: #dbg_value({{.*}}, ![[PARAM]]
+; CHECK-NEXT: #dbg_value(i32 %param, ![[PARAM]], !DIExpression(),
 ; CHECK: ![[PARAM]] = !DILocalVariable(name: "param",
 
 @g = dso_local global i32 0, align 4, !dbg !0
diff --git a/llvm/test/DebugInfo/sroa-handle-dbg-value.ll b/llvm/test/DebugInfo/sroa-handle-dbg-value.ll
new file mode 100644
index 00000000000000..dc9abde884b376
--- /dev/null
+++ b/llvm/test/DebugInfo/sroa-handle-dbg-value.ll
@@ -0,0 +1,110 @@
+; This test was obtained from Swift source code and then automatically reduced via Delta.
+; The Swift source code came from the test test/IRGen/debug_scope_distinct.swift.
+
+; RUN: opt %s -S -p=sroa -o - | FileCheck %s
+
+; CHECK: [[SROA_5_SROA_21:%.*]] = alloca [7 x i8], align 8
+; CHECK-NEXT: #dbg_value(ptr [[SROA_5_SROA_21]], !59, !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 72, 56), [[DBG72:![0-9]+]])
+
+; CHECK: #dbg_value(ptr [[REG1:%[0-9]+]], [[META54:![0-9]+]], !DIExpression(DW_OP_deref), [[DBG78:![0-9]+]])
+; CHECK-NEXT: #dbg_value(ptr [[REG2:%[0-9]+]], [[META56:![0-9]+]], !DIExpression(DW_OP_deref), [[DBG78]])
+; CHECK-NEXT: #dbg_value(i64 0, [[META57:![0-9]+]], !DIExpression(), [[DBG78]])
+
+; CHECK: [[SROA_418_SROA_COPYLOAD:%.*]] = load i8, ptr [[SROA_418_0_U1_IDX:%.*]], align 8, !dbg [[DBG78]]
+; CHECK-NEXT: #dbg_value(i8 [[SROA_418_SROA_COPYLOAD]], !59, !DIExpression(DW_OP_deref, DW_OP_LLVM_fragment, 64, 8), [[DBG72]])
+
+%T4main1TV13TangentVectorV = type <{ %T4main1UV13TangentVectorV, [7 x i8], %T4main1UV13TangentVectorV }>
+%T4main1UV13TangentVectorV = type <{ %T1M1SVySfG, [7 x i8], %T4main1VV13TangentVectorV }>
+%T1M1SVySfG = type <{ ptr, %Ts4Int8V }>
+%Ts4Int8V = type <{ i8 }>
+%T4main1VV13TangentVectorV = type <{ %T1M1SVySfG }>
+define hidden swiftcc void @"$s4main1TV13TangentVectorV1poiyA2E_AEtFZ"(ptr noalias nocapture sret(%T4main1TV13TangentVectorV) %0, ptr noalias nocapture dereferenceable(57) %1, ptr noalias nocapture dereferenceable(57) %2) #0 !dbg !44 {
+entry:
+  %3 = alloca %T4main1VV13TangentVectorV
+  %4 = alloca %T4main1UV13TangentVectorV
+  call void @llvm.dbg.value(metadata ptr %1, metadata !54, metadata !DIExpression(DW_OP_deref)), !dbg !61
+  call void @llvm.dbg.value(metadata ptr %2, metadata !56, metadata !DIExpression(DW_OP_deref)), !dbg !61
+  call void @llvm.dbg.value(metadata i64 0, metadata !57, metadata !DIExpression()), !dbg !61
+  %.u1 = getelementptr inbounds %T4main1TV13TangentVectorV, ptr %1, i32 0, i32 0
+  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %.u1, i64 25, i1 false), !dbg !61
+  call void @llvm.dbg.value(metadata ptr %4, metadata !62, metadata !DIExpression(DW_OP_deref)), !dbg !75
+  %.s = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 0
+  %.s.b = getelementptr inbounds %T1M1SVySfG, ptr %.s, i32 0, i32 1
+  %.s.b._value = getelementptr inbounds %Ts4Int8V, ptr %.s.b, i32 0, i32 0
+  %12 = load i8, ptr %.s.b._value
+  %.v = getelementptr inbounds %T4main1UV13TangentVectorV, ptr %4, i32 0, i32 2
+  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %3, ptr align 8 %.v, i64 9, i1 false)
+  %.s4 = getelementptr inbounds %T4main1VV13TangentVectorV, ptr %3, i32 0, i32 0
+  %.s4.c = getelementptr inbounds %T1M1SVySfG, ptr %.s4, i32 0, i32 0
+  %18 = load ptr, ptr %.s4.c
+  ret void
+}
+!llvm.module.flags = !{!0, !1, !2, !3, !4, !6, !7, !8, !9, !10, !11, !12, !13, !14, !15}
+!swift.module.flags = !{!33}
+!llvm.linker.options = !{!34, !35, !36, !37, !38, !39, !40, !41, !42, !43}
+!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 14, i32 4]}
+!1 = !{i32 1, !"Objective-C Version", i32 2}
+!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
+!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,no_dead_strip"}
+!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
+!6 = !{i32 7, !"Dwarf Version", i32 4}
+!7 = !{i32 2, !"Debug Info Version", i32 3}
+!8 = !{i32 1, !"wchar_size", i32 4}
+!9 = !{i32 8, !"PIC Level", i32 2}
+!10 = !{i32 7, !"uwtable", i32 1}
+!11 = !{i32 7, !"frame-pointer", i32 1}
+!12 = !{i32 1, !"Swift Version", i32 7}
+!13 = !{i32 1, !"Swift ABI Version", i32 7}
+!14 = !{i32 1, !"Swift Major Version", i8 6}
+!15 = !{i32 1, !"Swift Minor Version", i8 0}
+!16 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !17, imports: !18, sdk: "MacOSX14.4.sdk")
+!17 = !DIFile(filename: "swift/swift/test/IRGen/debug_scope_distinct.swift", directory: "swift")
+!18 = !{!19, !21, !23, !25, !27, !29, !31}
+!19 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !20, file: !17)
+!20 = !DIModule(scope: null, name: "main", includePath: "swift/swift/test/IRGen")
+!21 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !22, file: !17)
+!22 = !DIModule(scope: null, name: "Swift", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/Swift.swiftmodule/arm64-apple-macos.swiftmodule")
+!23 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !24, line: 60)
+!24 = !DIModule(scope: null, name: "_Differentiation", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Differentiation.swiftmodule/arm64-apple-macos.swiftmodule")
+!25 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !26, line: 61)
+!26 = !DIModule(scope: null, name: "M", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/test-macosx-arm64/IRGen/Output/debug_scope_distinct.swift.tmp/M.swiftmodule")
+!27 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !28, file: !17)
+!28 = !DIModule(scope: null, name: "_StringProcessing", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_StringProcessing.swiftmodule/arm64-apple-macos.swiftmodule")
+!29 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !30, file: !17)
+!30 = !DIModule(scope: null, name: "_SwiftConcurrencyShims", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/shims")
+!31 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !17, entity: !32, file: !17)
+!32 = !DIModule(scope: null, name: "_Concurrency", includePath: "swift/_build/Ninja-RelWithDebInfoAssert+stdlib-RelWithDebInfo/swift-macosx-arm64/lib/swift/macosx/_Concurrency.swiftmodule/arm64-apple-macos.swiftmodule")
+!33 = !{ i1 false}
+!34 = !{!"-lswiftCore"}
+!35 = !{!"-lswift_StringProcessing"}
+!36 = !{!"-lswift_Differentiation"}
+!37 = !{!"-lswiftDarwin"}
+!38 = !{!"-lswift_Concurrency"}
+!39 = !{!"-lswiftSwiftOnoneSupport"}
+!40 = !{!"-lobjc"}
+!41 = !{!"-lswiftCompatibilityConcurrency"}
+!42 = !{!"-lswiftCompatibility56"}
+!43 = !{!"-lswiftCompatibilityPacks"}
+!44 = distinct !DISubprogram(file: !45, type: !49, unit: !16, declaration: !52, retainedNodes: !53)
+!45 = !DIFile(filename: "<compiler-generated>", directory: "/")
+!46 = !DICompositeType(tag: DW_TAG_structure_type, scope: !47, elements: !48, identifier: "$s4main1TV13TangentVectorVD")
+!47 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TVD")
+!48 = !{}
+!49 = !DISubroutineType(types: !50)
+!50 = !{ !51}
+!51 = !DICompositeType(tag: DW_TAG_structure_type, identifier: "$s4main1TV13TangentVectorVXMtD")
+!52 = !DISubprogram(spFlags: DISPFlagOptimized)
+!53 = !{!54, !56, !57}
+!54 = !DILocalVariable(name: "a", scope: !44, flags: DIFlagArtificial)
+!55 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !46)
+!56 = !DILocalVariable(name: "b", scope: !44, type: !55, flags: DIFlagArtificial)
+!57 = !DILocalVariable(name: "c", scope: !44, type: !58, flags: DIFlagArtificial)
+!58 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !51)
+!61 = !DILocation(scope: !44)
+!62 = !DILocalVariable(name: "d", scope: !63, type: !72, flags: DIFlagArtificial)
+!63 = distinct !DISubprogram(unit: !16, retainedNodes: !70)
+!64 = !DICompositeType(tag: DW_TAG_structure_type, size: 200, identifier: "$s4main1UV13TangentVectorVD")
+!70 = !{}
+!72 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !64)
+!75 = !DILocation(scope: !63, inlinedAt: !76)
+!76 = distinct !DILocation(scope: !44)
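The fragment operands in the CHECK lines of this test can be checked by hand. DW_OP_LLVM_fragment takes a bit offset and a bit size within the variable, so an SROA slice maps to a fragment by multiplying its byte offset and byte size by 8. The sketch below recomputes them from the packed (`<{ ... }>`) struct layouts in the test. It assumes 8-byte pointers because the test sets no datalayout. The constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// DW_OP_LLVM_fragment operands are (offset in bits, size in bits).
std::pair<uint64_t, uint64_t> fragmentBits(uint64_t OffsetBytes,
                                           uint64_t SizeBytes) {
  assert(SizeBytes > 0 && "fragments cannot be empty");
  return {OffsetBytes * 8, SizeBytes * 8};
}

// Packed layout of %T4main1UV13TangentVectorV, assuming 8-byte pointers:
//   [0, 8)   ptr       (first field of %T1M1SVySfG)
//   [8, 9)   i8        (second field of %T1M1SVySfG)
//   [9, 16)  [7 x i8]
//   [16, 25) %T4main1VV13TangentVectorV (wraps another %T1M1SVySfG)
constexpr uint64_t PtrBytes = 8;
constexpr uint64_t SBytes = PtrBytes + 1; // %T1M1SVySfG is packed: 9 bytes
constexpr uint64_t Int8Offset = PtrBytes; // byte 8
constexpr uint64_t PadOffset = SBytes;    // byte 9
constexpr uint64_t PadBytes = 7;
constexpr uint64_t UBytes = SBytes + PadBytes + SBytes;

static_assert(UBytes == 25, "packed size of %T4main1UV13TangentVectorV");
```

This agrees with `DW_OP_LLVM_fragment, 72, 56` for the `[7 x i8]` alloca and `DW_OP_LLVM_fragment, 64, 8` for the `i8` copy load. The 25-byte total also matches both the `i64 25` memcpy into `%4` and `size: 200` on `!64`.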
diff --git a/llvm/test/Transforms/SROA/alignment.ll b/llvm/test/Transforms/SROA/alignment.ll
index 98be495e5eb354..8322da189aeeaa 100644
--- a/llvm/test/Transforms/SROA/alignment.ll
+++ b/llvm/test/Transforms/SROA/alignment.ll
@@ -23,7 +23,9 @@ define void @test1(ptr %a, ptr %b) {
 ;
 ; CHECK-DEBUGLOC-LABEL: @test1(
 ; CHECK-DEBUGLOC-NEXT:  entry:
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META9:![0-9]+]], !DIExpression(), [[META14:![0-9]+]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META9:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 0, 8), [[META14:![0-9]+]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META9]], !DIExpression(DW_OP_LLVM_fragment, 8, 8), [[META14]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META9]], !DIExpression(), [[META14]])
 ; CHECK-DEBUGLOC-NEXT:    [[GEP_A:%.*]] = getelementptr { i8, i8 }, ptr [[A:%.*]], i32 0, i32 0, !dbg [[DBG15:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[GEP_A]], [[META11:![0-9]+]], !DIExpression(), [[DBG15]])
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META12:![0-9]+]], !DIExpression(), [[META16:![0-9]+]])
@@ -57,24 +59,25 @@ define void @test2() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[A_SROA_0:%.*]] = alloca i16, align 2
 ; CHECK-NEXT:    store volatile i16 0, ptr [[A_SROA_0]], align 2
-; CHECK-NEXT:    [[A_SROA_0_1_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1
-; CHECK-NEXT:    [[A_SROA_0_1_A_SROA_0_2_RESULT:%.*]] = load i8, ptr [[A_SROA_0_1_SROA_IDX]], align 1
-; CHECK-NEXT:    [[A_SROA_0_1_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1
-; CHECK-NEXT:    store i8 42, ptr [[A_SROA_0_1_SROA_IDX2]], align 1
+; CHECK-NEXT:    [[A_SROA_0_1_GEP2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1
+; CHECK-NEXT:    [[A_SROA_0_1_A_SROA_0_2_RESULT:%.*]] = load i8, ptr [[A_SROA_0_1_GEP2_SROA_IDX]], align 1
+; CHECK-NEXT:    [[A_SROA_0_1_GEP2_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1
+; CHECK-NEXT:    store i8 42, ptr [[A_SROA_0_1_GEP2_SROA_IDX2]], align 1
 ; CHECK-NEXT:    ret void
 ;
 ; CHECK-DEBUGLOC-LABEL: @test2(
 ; CHECK-DEBUGLOC-NEXT:  entry:
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0:%.*]] = alloca i16, align 2, !dbg [[DBG28:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META23:![0-9]+]], !DIExpression(), [[DBG28]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[A_SROA_0]], [[META23:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 8, 16), [[DBG28]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META23]], !DIExpression(), [[DBG28]])
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META24:![0-9]+]], !DIExpression(), [[META29:![0-9]+]])
 ; CHECK-DEBUGLOC-NEXT:    store volatile i16 0, ptr [[A_SROA_0]], align 2, !dbg [[DBG30:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META25:![0-9]+]], !DIExpression(), [[META31:![0-9]+]])
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1, !dbg [[DBG32:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_A_SROA_0_2_RESULT:%.*]] = load i8, ptr [[A_SROA_0_1_SROA_IDX]], align 1, !dbg [[DBG32]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_GEP2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1, !dbg [[DBG32:![0-9]+]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_A_SROA_0_2_RESULT:%.*]] = load i8, ptr [[A_SROA_0_1_GEP2_SROA_IDX]], align 1, !dbg [[DBG32]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(i8 [[A_SROA_0_1_A_SROA_0_2_RESULT]], [[META26:![0-9]+]], !DIExpression(), [[DBG32]])
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1, !dbg [[DBG33:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:    store i8 42, ptr [[A_SROA_0_1_SROA_IDX2]], align 1, !dbg [[DBG33]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_1_GEP2_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 1, !dbg [[DBG33:![0-9]+]]
+; CHECK-DEBUGLOC-NEXT:    store i8 42, ptr [[A_SROA_0_1_GEP2_SROA_IDX2]], align 1, !dbg [[DBG33]]
 ; CHECK-DEBUGLOC-NEXT:    ret void, !dbg [[DBG34:![0-9]+]]
 ;
 entry:
@@ -117,7 +120,6 @@ define void @test3(ptr %x) {
 ; expecting. However, also check that any offset within an alloca can in turn
 ; reduce the alignment.
 ;
-;
 ; CHECK-LABEL: @test3(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[A_SROA_0:%.*]] = alloca [22 x i8], align 8
@@ -129,9 +131,11 @@ define void @test3(ptr %x) {
 ; CHECK-DEBUGLOC-LABEL: @test3(
 ; CHECK-DEBUGLOC-NEXT:  entry:
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0:%.*]] = alloca [22 x i8], align 8, !dbg [[DBG47:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META44:![0-9]+]], !DIExpression(), [[DBG47]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[A_SROA_0]], [[META44:![0-9]+]], !DIExpression(), [[DBG47]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META44]], !DIExpression(), [[DBG47]])
 ; CHECK-DEBUGLOC-NEXT:    [[B_SROA_0:%.*]] = alloca [18 x i8], align 2, !dbg [[DBG48:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META45:![0-9]+]], !DIExpression(), [[DBG48]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[B_SROA_0]], [[META45:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 48, 16), [[DBG48]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META45]], !DIExpression(), [[DBG48]])
 ; CHECK-DEBUGLOC-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 8 [[A_SROA_0]], ptr align 8 [[X:%.*]], i32 22, i1 false), !dbg [[DBG49:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META46:![0-9]+]], !DIExpression(), [[META50:![0-9]+]])
 ; CHECK-DEBUGLOC-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 2 [[B_SROA_0]], ptr align 2 [[X]], i32 18, i1 false), !dbg [[DBG51:![0-9]+]]
@@ -158,31 +162,32 @@ define void @test5() {
 ; CHECK-NEXT:    [[A_SROA_0:%.*]] = alloca [9 x i8], align 1
 ; CHECK-NEXT:    [[A_SROA_3:%.*]] = alloca [9 x i8], align 1
 ; CHECK-NEXT:    store volatile double 0.000000e+00, ptr [[A_SROA_0]], align 1
-; CHECK-NEXT:    [[A_SROA_0_7_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 7
-; CHECK-NEXT:    [[A_SROA_0_7_A_SROA_0_7_WEIRD_LOAD1:%.*]] = load volatile i16, ptr [[A_SROA_0_7_SROA_IDX1]], align 1
+; CHECK-NEXT:    [[A_SROA_0_7_WEIRD_GEP1_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 7
+; CHECK-NEXT:    [[A_SROA_0_7_A_SROA_0_7_WEIRD_LOAD1:%.*]] = load volatile i16, ptr [[A_SROA_0_7_WEIRD_GEP1_SROA_IDX1]], align 1
 ; CHECK-NEXT:    [[A_SROA_0_0_A_SROA_0_0_D1:%.*]] = load double, ptr [[A_SROA_0]], align 1
 ; CHECK-NEXT:    store volatile double [[A_SROA_0_0_A_SROA_0_0_D1]], ptr [[A_SROA_3]], align 1
-; CHECK-NEXT:    [[A_SROA_3_7_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_3]], i64 7
-; CHECK-NEXT:    [[A_SROA_3_7_A_SROA_3_16_WEIRD_LOAD2:%.*]] = load volatile i16, ptr [[A_SROA_3_7_SROA_IDX]], align 1
+; CHECK-NEXT:    [[A_SROA_3_7_WEIRD_GEP2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_3]], i64 7
+; CHECK-NEXT:    [[A_SROA_3_7_A_SROA_3_16_WEIRD_LOAD2:%.*]] = load volatile i16, ptr [[A_SROA_3_7_WEIRD_GEP2_SROA_IDX]], align 1
 ; CHECK-NEXT:    ret void
 ;
 ; CHECK-DEBUGLOC-LABEL: @test5(
 ; CHECK-DEBUGLOC-NEXT:  entry:
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0:%.*]] = alloca [9 x i8], align 1, !dbg [[DBG63:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_3:%.*]] = alloca [9 x i8], align 1, !dbg [[DBG63]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META55:![0-9]+]], !DIExpression(), [[DBG63]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[A_SROA_0]], [[META55:![0-9]+]], !DIExpression(), [[DBG63]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META55]], !DIExpression(), [[DBG63]])
 ; CHECK-DEBUGLOC-NEXT:    store volatile double 0.000000e+00, ptr [[A_SROA_0]], align 1, !dbg [[DBG64:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META56:![0-9]+]], !DIExpression(), [[META65:![0-9]+]])
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_7_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 7, !dbg [[DBG66:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_7_A_SROA_0_7_WEIRD_LOAD1:%.*]] = load volatile i16, ptr [[A_SROA_0_7_SROA_IDX1]], align 1, !dbg [[DBG66]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_7_WEIRD_GEP1_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_0]], i64 7, !dbg [[DBG66:![0-9]+]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_7_A_SROA_0_7_WEIRD_LOAD1:%.*]] = load volatile i16, ptr [[A_SROA_0_7_WEIRD_GEP1_SROA_IDX1]], align 1, !dbg [[DBG66]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(i16 [[A_SROA_0_7_A_SROA_0_7_WEIRD_LOAD1]], [[META57:![0-9]+]], !DIExpression(), [[DBG66]])
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META59:![0-9]+]], !DIExpression(), [[META67:![0-9]+]])
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_0_A_SROA_0_0_D1:%.*]] = load double, ptr [[A_SROA_0]], align 1, !dbg [[DBG68:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(double [[A_SROA_0_0_A_SROA_0_0_D1]], [[META60:![0-9]+]], !DIExpression(), [[DBG68]])
 ; CHECK-DEBUGLOC-NEXT:    store volatile double [[A_SROA_0_0_A_SROA_0_0_D1]], ptr [[A_SROA_3]], align 1, !dbg [[DBG69:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META61:![0-9]+]], !DIExpression(), [[META70:![0-9]+]])
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_3_7_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_3]], i64 7, !dbg [[DBG71:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:    [[A_SROA_3_7_A_SROA_3_16_WEIRD_LOAD2:%.*]] = load volatile i16, ptr [[A_SROA_3_7_SROA_IDX]], align 1, !dbg [[DBG71]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_3_7_WEIRD_GEP2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[A_SROA_3]], i64 7, !dbg [[DBG71:![0-9]+]]
+; CHECK-DEBUGLOC-NEXT:    [[A_SROA_3_7_A_SROA_3_16_WEIRD_LOAD2:%.*]] = load volatile i16, ptr [[A_SROA_3_7_WEIRD_GEP2_SROA_IDX]], align 1, !dbg [[DBG71]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(i16 [[A_SROA_3_7_A_SROA_3_16_WEIRD_LOAD2]], [[META62:![0-9]+]], !DIExpression(), [[DBG71]])
 ; CHECK-DEBUGLOC-NEXT:    ret void, !dbg [[DBG72:![0-9]+]]
 ;
@@ -219,7 +224,8 @@ define void @test6() {
 ; CHECK-DEBUGLOC-NEXT:  entry:
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0:%.*]] = alloca double, align 8, !dbg [[DBG78:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_2:%.*]] = alloca double, align 8, !dbg [[DBG78]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META75:![0-9]+]], !DIExpression(), [[DBG78]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[A_SROA_0]], [[META75:![0-9]+]], !DIExpression(), [[DBG78]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META75]], !DIExpression(), [[DBG78]])
 ; CHECK-DEBUGLOC-NEXT:    store volatile double 0.000000e+00, ptr [[A_SROA_0]], align 8, !dbg [[DBG79:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META76:![0-9]+]], !DIExpression(), [[META80:![0-9]+]])
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_0_A_SROA_0_0_VAL:%.*]] = load double, ptr [[A_SROA_0]], align 8, !dbg [[DBG81:![0-9]+]]
@@ -256,6 +262,7 @@ define void @test7(ptr %out) {
 ; CHECK-DEBUGLOC-LABEL: @test7(
 ; CHECK-DEBUGLOC-NEXT:  entry:
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META86:![0-9]+]], !DIExpression(), [[META90:![0-9]+]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META86]], !DIExpression(), [[META90]])
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META87:![0-9]+]], !DIExpression(), [[META91:![0-9]+]])
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_0_0_COPYLOAD:%.*]] = load double, ptr [[OUT:%.*]], align 1, !dbg [[DBG92:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:    [[A_SROA_4_0_OUT_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OUT]], i64 8, !dbg [[DBG92]]
@@ -442,7 +449,8 @@ define dso_local i32 @pr45010(ptr %A) {
 ;
 ; CHECK-DEBUGLOC-LABEL: @pr45010(
 ; CHECK-DEBUGLOC-NEXT:    [[B_SROA_0:%.*]] = alloca i32, align 4, !dbg [[DBG129:![0-9]+]]
-; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META125:![0-9]+]], !DIExpression(), [[DBG129]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr [[B_SROA_0]], [[META125:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 0, 32), [[DBG129]])
+; CHECK-DEBUGLOC-NEXT:      #dbg_value(ptr undef, [[META125]], !DIExpression(), [[DBG129]])
 ; CHECK-DEBUGLOC-NEXT:    [[TMP1:%.*]] = load i32, ptr [[A:%.*]], align 4, !dbg [[DBG130:![0-9]+]]
 ; CHECK-DEBUGLOC-NEXT:      #dbg_value(i32 [[TMP1]], [[META126:![0-9]+]], !DIExpression(), [[DBG130]])
 ; CHECK-DEBUGLOC-NEXT:    store atomic volatile i32 [[TMP1]], ptr [[B_SROA_0]] release, align 4, !dbg [[DBG131:![0-9]+]]
diff --git a/llvm/test/Transforms/SROA/vector-promotion.ll b/llvm/test/Transforms/SROA/vector-promotion.ll
index 8624ab27ed3cc9..08863dce1c7879 100644
--- a/llvm/test/Transforms/SROA/vector-promotion.ll
+++ b/llvm/test/Transforms/SROA/vector-promotion.ll
@@ -23,6 +23,7 @@ define i32 @test1(<4 x i32> %x, <4 x i32> %y) {
 ; DEBUG-LABEL: @test1(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META9:![0-9]+]], !DIExpression(), [[META21:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META9]], !DIExpression(), [[META21]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META11:![0-9]+]], !DIExpression(), [[META22:![0-9]+]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META12:![0-9]+]], !DIExpression(), [[META23:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_8_VEC_EXTRACT:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2, !dbg [[DBG24:![0-9]+]]
@@ -72,6 +73,7 @@ define i32 @test2(<4 x i32> %x, <4 x i32> %y) {
 ; DEBUG-LABEL: @test2(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META34:![0-9]+]], !DIExpression(), [[META45:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META34]], !DIExpression(), [[META45]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META35:![0-9]+]], !DIExpression(), [[META46:![0-9]+]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META36:![0-9]+]], !DIExpression(), [[META47:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_8_VEC_EXTRACT:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2, !dbg [[DBG48:![0-9]+]]
@@ -124,6 +126,7 @@ define i32 @test3(<4 x i32> %x, <4 x i32> %y) {
 ; DEBUG-LABEL: @test3(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META59:![0-9]+]], !DIExpression(), [[META69:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META59]], !DIExpression(), [[META69]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META60:![0-9]+]], !DIExpression(), [[META70:![0-9]+]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META61:![0-9]+]], !DIExpression(), [[META71:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_8_VEC_INSERT:%.*]] = insertelement <4 x i32> [[X:%.*]], i32 -1, i32 2, !dbg [[DBG72:![0-9]+]]
@@ -180,6 +183,7 @@ define i32 @test4(<4 x i32> %x, <4 x i32> %y, ptr %z) {
 ; DEBUG-LABEL: @test4(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META83:![0-9]+]], !DIExpression(), [[META94:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META83]], !DIExpression(), [[META94]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META84:![0-9]+]], !DIExpression(), [[META95:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_3_16_COPYLOAD:%.*]] = load <4 x i32>, ptr [[Z:%.*]], align 1, !dbg [[DBG96:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META85:![0-9]+]], !DIExpression(), [[META97:![0-9]+]])
@@ -244,6 +248,7 @@ define i32 @test4_as1(<4 x i32> %x, <4 x i32> %y, ptr addrspace(1) %z) {
 ; DEBUG-LABEL: @test4_as1(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META110:![0-9]+]], !DIExpression(), [[META121:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META110]], !DIExpression(), [[META121]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META111:![0-9]+]], !DIExpression(), [[META122:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_3_16_COPYLOAD:%.*]] = load <4 x i32>, ptr addrspace(1) [[Z:%.*]], align 1, !dbg [[DBG123:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META112:![0-9]+]], !DIExpression(), [[META124:![0-9]+]])
@@ -306,6 +311,7 @@ define i32 @test5(<4 x i32> %x, <4 x i32> %y, ptr %z) {
 ; DEBUG-LABEL: @test5(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META137:![0-9]+]], !DIExpression(), [[META148:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META137]], !DIExpression(), [[META148]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META138:![0-9]+]], !DIExpression(), [[META149:![0-9]+]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META139:![0-9]+]], !DIExpression(), [[META150:![0-9]+]])
 ; DEBUG-NEXT:    [[Z_TMP1:%.*]] = getelementptr inbounds <4 x i32>, ptr [[Z:%.*]], i64 0, i64 2, !dbg [[DBG151:![0-9]+]]
@@ -596,7 +602,8 @@ define i32 @PR14212(<3 x i8> %val) {
 ;
 ; DEBUG-LABEL: @PR14212(
 ; DEBUG-NEXT:  entry:
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META250:![0-9]+]], !DIExpression(), [[META252:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META250:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 24, 8), [[META252:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META250]], !DIExpression(), [[META252]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <3 x i8> [[VAL:%.*]] to i24, !dbg [[DBG253:![0-9]+]]
 ; DEBUG-NEXT:    [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i8 undef to i32, !dbg [[DBG254:![0-9]+]]
 ; DEBUG-NEXT:    [[RETVAL_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[RETVAL_SROA_2_0_INSERT_EXT]], 24, !dbg [[DBG254]]
@@ -630,7 +637,9 @@ define <2 x i8> @PR14349.1(i32 %x) {
 ;
 ; DEBUG-LABEL: @PR14349.1(
 ; DEBUG-NEXT:  entry:
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META257:![0-9]+]], !DIExpression(), [[META260:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META257:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 0, 16), [[META260:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META257]], !DIExpression(DW_OP_LLVM_fragment, 16, 16), [[META260]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META257]], !DIExpression(), [[META260]])
 ; DEBUG-NEXT:    [[A_SROA_0_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[X:%.*]] to i16, !dbg [[DBG261:![0-9]+]]
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast i16 [[A_SROA_0_0_EXTRACT_TRUNC]] to <2 x i8>, !dbg [[DBG261]]
 ; DEBUG-NEXT:    [[A_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i32 [[X]], 16, !dbg [[DBG261]]
@@ -666,7 +675,9 @@ define i32 @PR14349.2(<2 x i8> %x) {
 ;
 ; DEBUG-LABEL: @PR14349.2(
 ; DEBUG-NEXT:  entry:
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META266:![0-9]+]], !DIExpression(), [[META268:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META266:![0-9]+]], !DIExpression(DW_OP_LLVM_fragment, 0, 16), [[META268:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META266]], !DIExpression(DW_OP_LLVM_fragment, 16, 16), [[META268]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META266]], !DIExpression(), [[META268]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <2 x i8> [[X:%.*]] to i16, !dbg [[DBG269:![0-9]+]]
 ; DEBUG-NEXT:    [[A_SROA_2_0_INSERT_EXT:%.*]] = zext i16 undef to i32, !dbg [[DBG270:![0-9]+]]
 ; DEBUG-NEXT:    [[A_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_2_0_INSERT_EXT]], 16, !dbg [[DBG270]]
@@ -703,6 +714,7 @@ define i32 @test7(<2 x i32> %x, <2 x i32> %y) {
 ; DEBUG-LABEL: @test7(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META273:![0-9]+]], !DIExpression(), [[META283:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META273]], !DIExpression(), [[META283]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META274:![0-9]+]], !DIExpression(), [[META284:![0-9]+]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META275:![0-9]+]], !DIExpression(), [[META285:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x i32> [[X:%.*]], i32 1, !dbg [[DBG286:![0-9]+]]
@@ -752,6 +764,7 @@ define i32 @test8(<2 x i32> %x) {
 ; DEBUG-LABEL: @test8(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META296:![0-9]+]], !DIExpression(), [[META301:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META296]], !DIExpression(), [[META301]])
 ; DEBUG-NEXT:    [[A_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x i32> [[X:%.*]], i32 0, !dbg [[DBG302:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(i32 [[A_SROA_0_0_VEC_EXTRACT]], [[META297:![0-9]+]], !DIExpression(), [[DBG302]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META298:![0-9]+]], !DIExpression(), [[META303:![0-9]+]])
@@ -787,6 +800,7 @@ define <2 x i32> @test9(i32 %x, i32 %y) {
 ; DEBUG-LABEL: @test9(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META309:![0-9]+]], !DIExpression(), [[META312:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META309]], !DIExpression(), [[META312]])
 ; DEBUG-NEXT:    [[A_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i32> undef, i32 [[X:%.*]], i32 0, !dbg [[DBG313:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META310:![0-9]+]], !DIExpression(), [[META314:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x i32> [[A_SROA_0_0_VEC_INSERT]], i32 [[Y:%.*]], i32 1, !dbg [[DBG315:![0-9]+]]
@@ -818,6 +832,7 @@ define <2 x i32> @test10(<4 x i16> %x, i32 %y) {
 ; DEBUG-LABEL: @test10(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META320:![0-9]+]], !DIExpression(), [[META323:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META320]], !DIExpression(), [[META323]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <4 x i16> [[X:%.*]] to <2 x i32>, !dbg [[DBG324:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META321:![0-9]+]], !DIExpression(), [[META325:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[Y:%.*]], i32 1, !dbg [[DBG326:![0-9]+]]
@@ -851,6 +866,7 @@ define <2 x float> @test11(<4 x i16> %x, i32 %y) {
 ; DEBUG-LABEL: @test11(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META331:![0-9]+]], !DIExpression(), [[META334:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META331]], !DIExpression(), [[META334]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <4 x i16> [[X:%.*]] to <2 x i32>, !dbg [[DBG335:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META332:![0-9]+]], !DIExpression(), [[META336:![0-9]+]])
 ; DEBUG-NEXT:    [[A_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[Y:%.*]], i32 1, !dbg [[DBG337:![0-9]+]]
@@ -877,6 +893,7 @@ define <4 x float> @test12(<4 x i32> %val) {
 ;
 ; DEBUG-LABEL: @test12(
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META342:![0-9]+]], !DIExpression(), [[META344:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META342]], !DIExpression(), [[META344]])
 ; DEBUG-NEXT:    [[TMP1:%.*]] = bitcast <4 x i32> [[VAL:%.*]] to <4 x float>, !dbg [[DBG345:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(<4 x float> [[TMP1]], [[META343:![0-9]+]], !DIExpression(), [[DBG345]])
 ; DEBUG-NEXT:    ret <4 x float> [[TMP1]], !dbg [[DBG346:![0-9]+]]
@@ -905,6 +922,7 @@ define <2 x i64> @test13(i32 %a, i32 %b, i32 %c, i32 %d) {
 ; DEBUG-LABEL: @test13(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META349:![0-9]+]], !DIExpression(), [[META354:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META349]], !DIExpression(), [[META354]])
 ; DEBUG-NEXT:    [[X_SROA_0_0_VEC_INSERT:%.*]] = insertelement <4 x i32> undef, i32 [[A:%.*]], i32 0, !dbg [[DBG355:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META350:![0-9]+]], !DIExpression(), [[META356:![0-9]+]])
 ; DEBUG-NEXT:    [[X_SROA_0_4_VEC_INSERT:%.*]] = insertelement <4 x i32> [[X_SROA_0_0_VEC_INSERT]], i32 [[B:%.*]], i32 1, !dbg [[DBG357:![0-9]+]]
@@ -947,6 +965,7 @@ define i32 @test14(<2 x i64> %x) {
 ; DEBUG-LABEL: @test14(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META366:![0-9]+]], !DIExpression(), [[META378:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META366]], !DIExpression(), [[META378]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <2 x i64> [[X:%.*]] to <4 x i32>, !dbg [[DBG379:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META367:![0-9]+]], !DIExpression(), [[META380:![0-9]+]])
 ; DEBUG-NEXT:    [[X_ADDR_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP0]], i32 0, !dbg [[DBG381:![0-9]+]]
@@ -990,29 +1009,30 @@ define <4 x ptr> @test15(i32 %a, i32 %b, i32 %c, i32 %d) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[X_SROA_0:%.*]] = alloca <4 x ptr>, align 32
 ; CHECK-NEXT:    store i32 [[A:%.*]], ptr [[X_SROA_0]], align 32
-; CHECK-NEXT:    [[X_SROA_0_4_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4
-; CHECK-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_SROA_IDX1]], align 4
-; CHECK-NEXT:    [[X_SROA_0_8_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 8
-; CHECK-NEXT:    store i32 [[C:%.*]], ptr [[X_SROA_0_8_SROA_IDX2]], align 8
-; CHECK-NEXT:    [[X_SROA_0_12_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 12
-; CHECK-NEXT:    store i32 [[D:%.*]], ptr [[X_SROA_0_12_SROA_IDX3]], align 4
+; CHECK-NEXT:    [[X_SROA_0_4_X_TMP2_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4
+; CHECK-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_X_TMP2_SROA_IDX1]], align 4
+; CHECK-NEXT:    [[X_SROA_0_8_X_TMP3_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 8
+; CHECK-NEXT:    store i32 [[C:%.*]], ptr [[X_SROA_0_8_X_TMP3_SROA_IDX2]], align 8
+; CHECK-NEXT:    [[X_SROA_0_12_X_TMP4_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 12
+; CHECK-NEXT:    store i32 [[D:%.*]], ptr [[X_SROA_0_12_X_TMP4_SROA_IDX3]], align 4
 ; CHECK-NEXT:    [[X_SROA_0_0_X_SROA_0_0_RESULT:%.*]] = load <4 x ptr>, ptr [[X_SROA_0]], align 32
 ; CHECK-NEXT:    ret <4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]]
 ;
 ; DEBUG-LABEL: @test15(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:    [[X_SROA_0:%.*]] = alloca <4 x ptr>, align 32, !dbg [[DBG400:![0-9]+]]
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META394:![0-9]+]], !DIExpression(), [[DBG400]])
+; DEBUG-NEXT:      #dbg_value(ptr [[X_SROA_0]], [[META394:![0-9]+]], !DIExpression(), [[DBG400]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META394]], !DIExpression(), [[DBG400]])
 ; DEBUG-NEXT:    store i32 [[A:%.*]], ptr [[X_SROA_0]], align 32, !dbg [[DBG401:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META395:![0-9]+]], !DIExpression(), [[META402:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_4_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4, !dbg [[DBG403:![0-9]+]]
-; DEBUG-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_SROA_IDX1]], align 4, !dbg [[DBG403]]
+; DEBUG-NEXT:    [[X_SROA_0_4_X_TMP2_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4, !dbg [[DBG403:![0-9]+]]
+; DEBUG-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_X_TMP2_SROA_IDX1]], align 4, !dbg [[DBG403]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META396:![0-9]+]], !DIExpression(), [[META404:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_8_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 8, !dbg [[DBG405:![0-9]+]]
-; DEBUG-NEXT:    store i32 [[C:%.*]], ptr [[X_SROA_0_8_SROA_IDX2]], align 8, !dbg [[DBG405]]
+; DEBUG-NEXT:    [[X_SROA_0_8_X_TMP3_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 8, !dbg [[DBG405:![0-9]+]]
+; DEBUG-NEXT:    store i32 [[C:%.*]], ptr [[X_SROA_0_8_X_TMP3_SROA_IDX2]], align 8, !dbg [[DBG405]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META397:![0-9]+]], !DIExpression(), [[META406:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_12_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 12, !dbg [[DBG407:![0-9]+]]
-; DEBUG-NEXT:    store i32 [[D:%.*]], ptr [[X_SROA_0_12_SROA_IDX3]], align 4, !dbg [[DBG407]]
+; DEBUG-NEXT:    [[X_SROA_0_12_X_TMP4_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 12, !dbg [[DBG407:![0-9]+]]
+; DEBUG-NEXT:    store i32 [[D:%.*]], ptr [[X_SROA_0_12_X_TMP4_SROA_IDX3]], align 4, !dbg [[DBG407]]
 ; DEBUG-NEXT:    [[X_SROA_0_0_X_SROA_0_0_RESULT:%.*]] = load <4 x ptr>, ptr [[X_SROA_0]], align 32, !dbg [[DBG408:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(<4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]], [[META398:![0-9]+]], !DIExpression(), [[DBG408]])
 ; DEBUG-NEXT:    ret <4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]], !dbg [[DBG409:![0-9]+]]
@@ -1046,6 +1066,7 @@ define <4 x ptr> @test16(i64 %a, i64 %b, i64 %c, i64 %d) {
 ; DEBUG-LABEL: @test16(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META412:![0-9]+]], !DIExpression(), [[META417:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META412]], !DIExpression(), [[META417]])
 ; DEBUG-NEXT:    [[TMP0:%.*]] = inttoptr i64 [[A:%.*]] to ptr, !dbg [[DBG418:![0-9]+]]
 ; DEBUG-NEXT:    [[X_SROA_0_0_VEC_INSERT:%.*]] = insertelement <4 x ptr> undef, ptr [[TMP0]], i32 0, !dbg [[DBG418]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META413:![0-9]+]], !DIExpression(), [[META419:![0-9]+]])
@@ -1078,29 +1099,30 @@ define <4 x ptr> @test17(i32 %a, i32 %b, i64 %c, i64 %d) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[X_SROA_0:%.*]] = alloca <4 x ptr>, align 32
 ; CHECK-NEXT:    store i32 [[A:%.*]], ptr [[X_SROA_0]], align 32
-; CHECK-NEXT:    [[X_SROA_0_4_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4
-; CHECK-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_SROA_IDX1]], align 4
-; CHECK-NEXT:    [[X_SROA_0_16_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 16
-; CHECK-NEXT:    store i64 [[C:%.*]], ptr [[X_SROA_0_16_SROA_IDX2]], align 16
-; CHECK-NEXT:    [[X_SROA_0_24_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 24
-; CHECK-NEXT:    store i64 [[D:%.*]], ptr [[X_SROA_0_24_SROA_IDX3]], align 8
+; CHECK-NEXT:    [[X_SROA_0_4_X_TMP2_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4
+; CHECK-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_X_TMP2_SROA_IDX1]], align 4
+; CHECK-NEXT:    [[X_SROA_0_16_X_TMP3_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 16
+; CHECK-NEXT:    store i64 [[C:%.*]], ptr [[X_SROA_0_16_X_TMP3_SROA_IDX2]], align 16
+; CHECK-NEXT:    [[X_SROA_0_24_X_TMP4_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 24
+; CHECK-NEXT:    store i64 [[D:%.*]], ptr [[X_SROA_0_24_X_TMP4_SROA_IDX3]], align 8
 ; CHECK-NEXT:    [[X_SROA_0_0_X_SROA_0_0_RESULT:%.*]] = load <4 x ptr>, ptr [[X_SROA_0]], align 32
 ; CHECK-NEXT:    ret <4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]]
 ;
 ; DEBUG-LABEL: @test17(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:    [[X_SROA_0:%.*]] = alloca <4 x ptr>, align 32, !dbg [[DBG434:![0-9]+]]
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META429:![0-9]+]], !DIExpression(), [[DBG434]])
+; DEBUG-NEXT:      #dbg_value(ptr [[X_SROA_0]], [[META429:![0-9]+]], !DIExpression(), [[DBG434]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META429]], !DIExpression(), [[DBG434]])
 ; DEBUG-NEXT:    store i32 [[A:%.*]], ptr [[X_SROA_0]], align 32, !dbg [[DBG435:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META430:![0-9]+]], !DIExpression(), [[META436:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_4_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4, !dbg [[DBG437:![0-9]+]]
-; DEBUG-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_SROA_IDX1]], align 4, !dbg [[DBG437]]
+; DEBUG-NEXT:    [[X_SROA_0_4_X_TMP2_SROA_IDX1:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 4, !dbg [[DBG437:![0-9]+]]
+; DEBUG-NEXT:    store i32 [[B:%.*]], ptr [[X_SROA_0_4_X_TMP2_SROA_IDX1]], align 4, !dbg [[DBG437]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META431:![0-9]+]], !DIExpression(), [[META438:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_16_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 16, !dbg [[DBG439:![0-9]+]]
-; DEBUG-NEXT:    store i64 [[C:%.*]], ptr [[X_SROA_0_16_SROA_IDX2]], align 16, !dbg [[DBG439]]
+; DEBUG-NEXT:    [[X_SROA_0_16_X_TMP3_SROA_IDX2:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 16, !dbg [[DBG439:![0-9]+]]
+; DEBUG-NEXT:    store i64 [[C:%.*]], ptr [[X_SROA_0_16_X_TMP3_SROA_IDX2]], align 16, !dbg [[DBG439]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META432:![0-9]+]], !DIExpression(), [[META440:![0-9]+]])
-; DEBUG-NEXT:    [[X_SROA_0_24_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 24, !dbg [[DBG441:![0-9]+]]
-; DEBUG-NEXT:    store i64 [[D:%.*]], ptr [[X_SROA_0_24_SROA_IDX3]], align 8, !dbg [[DBG441]]
+; DEBUG-NEXT:    [[X_SROA_0_24_X_TMP4_SROA_IDX3:%.*]] = getelementptr inbounds i8, ptr [[X_SROA_0]], i64 24, !dbg [[DBG441:![0-9]+]]
+; DEBUG-NEXT:    store i64 [[D:%.*]], ptr [[X_SROA_0_24_X_TMP4_SROA_IDX3]], align 8, !dbg [[DBG441]]
 ; DEBUG-NEXT:    [[X_SROA_0_0_X_SROA_0_0_RESULT:%.*]] = load <4 x ptr>, ptr [[X_SROA_0]], align 32, !dbg [[DBG442:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(<4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]], [[META433:![0-9]+]], !DIExpression(), [[DBG442]])
 ; DEBUG-NEXT:    ret <4 x ptr> [[X_SROA_0_0_X_SROA_0_0_RESULT]], !dbg [[DBG443:![0-9]+]]
@@ -1129,7 +1151,8 @@ define i1 @test18() {
 ;
 ; DEBUG-LABEL: @test18(
 ; DEBUG-NEXT:    [[A_SROA_0:%.*]] = alloca <2 x i64>, align 32, !dbg [[DBG449:![0-9]+]]
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META446:![0-9]+]], !DIExpression(), [[DBG449]])
+; DEBUG-NEXT:      #dbg_value(ptr [[A_SROA_0]], [[META446:![0-9]+]], !DIExpression(), [[DBG449]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META446]], !DIExpression(), [[DBG449]])
 ; DEBUG-NEXT:    store <2 x i64> <i64 0, i64 -1>, ptr [[A_SROA_0]], align 32, !dbg [[DBG450:![0-9]+]]
 ; DEBUG-NEXT:    [[A_SROA_0_0_A_SROA_0_0_L:%.*]] = load i1, ptr [[A_SROA_0]], align 32, !dbg [[DBG451:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(i1 [[A_SROA_0_0_A_SROA_0_0_L]], [[META447:![0-9]+]], !DIExpression(), [[DBG451]])
@@ -1150,6 +1173,7 @@ define void @swap-8bytes(ptr %x, ptr %y) {
 ;
 ; DEBUG-LABEL: @swap-8bytes(
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META455:![0-9]+]], !DIExpression(), [[META456:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META455]], !DIExpression(), [[META456]])
 ; DEBUG-NEXT:    [[TMP_SROA_0_0_COPYLOAD:%.*]] = load i64, ptr [[X:%.*]], align 1, !dbg [[DBG457:![0-9]+]]
 ; DEBUG-NEXT:    tail call void @llvm.memcpy.p0.p0.i64(ptr [[X]], ptr [[Y:%.*]], i64 8, i1 false), !dbg [[DBG458:![0-9]+]]
 ; DEBUG-NEXT:    store i64 [[TMP_SROA_0_0_COPYLOAD]], ptr [[Y]], align 1, !dbg [[DBG459:![0-9]+]]
@@ -1276,10 +1300,10 @@ define <4 x float> @ptrLoadStoreTysFloat(ptr %init, float %val2) {
 ; CHECK-NEXT:    [[OBJ:%.*]] = alloca <4 x float>, align 16
 ; CHECK-NEXT:    store <4 x float> zeroinitializer, ptr [[OBJ]], align 16
 ; CHECK-NEXT:    store ptr [[VAL0]], ptr [[OBJ]], align 16
-; CHECK-NEXT:    [[OBJ_8_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8
-; CHECK-NEXT:    store float [[VAL2:%.*]], ptr [[OBJ_8_SROA_IDX]], align 8
-; CHECK-NEXT:    [[OBJ_12_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12
-; CHECK-NEXT:    store float 1.310720e+05, ptr [[OBJ_12_SROA_IDX]], align 4
+; CHECK-NEXT:    [[OBJ_8_PTR2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8
+; CHECK-NEXT:    store float [[VAL2:%.*]], ptr [[OBJ_8_PTR2_SROA_IDX]], align 8
+; CHECK-NEXT:    [[OBJ_12_PTR3_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12
+; CHECK-NEXT:    store float 1.310720e+05, ptr [[OBJ_12_PTR3_SROA_IDX]], align 4
 ; CHECK-NEXT:    [[OBJ_0_SROAVAL:%.*]] = load <4 x float>, ptr [[OBJ]], align 16
 ; CHECK-NEXT:    ret <4 x float> [[OBJ_0_SROAVAL]]
 ;
@@ -1291,11 +1315,11 @@ define <4 x float> @ptrLoadStoreTysFloat(ptr %init, float %val2) {
 ; DEBUG-NEXT:    store <4 x float> zeroinitializer, ptr [[OBJ]], align 16, !dbg [[DBG510:![0-9]+]]
 ; DEBUG-NEXT:    store ptr [[VAL0]], ptr [[OBJ]], align 16, !dbg [[DBG511:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META505:![0-9]+]], !DIExpression(), [[META512:![0-9]+]])
-; DEBUG-NEXT:    [[OBJ_8_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8, !dbg [[DBG513:![0-9]+]]
-; DEBUG-NEXT:    store float [[VAL2:%.*]], ptr [[OBJ_8_SROA_IDX]], align 8, !dbg [[DBG513]]
+; DEBUG-NEXT:    [[OBJ_8_PTR2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8, !dbg [[DBG513:![0-9]+]]
+; DEBUG-NEXT:    store float [[VAL2:%.*]], ptr [[OBJ_8_PTR2_SROA_IDX]], align 8, !dbg [[DBG513]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META506:![0-9]+]], !DIExpression(), [[META514:![0-9]+]])
-; DEBUG-NEXT:    [[OBJ_12_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12, !dbg [[DBG515:![0-9]+]]
-; DEBUG-NEXT:    store float 1.310720e+05, ptr [[OBJ_12_SROA_IDX]], align 4, !dbg [[DBG515]]
+; DEBUG-NEXT:    [[OBJ_12_PTR3_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12, !dbg [[DBG515:![0-9]+]]
+; DEBUG-NEXT:    store float 1.310720e+05, ptr [[OBJ_12_PTR3_SROA_IDX]], align 4, !dbg [[DBG515]]
 ; DEBUG-NEXT:    [[OBJ_0_SROAVAL:%.*]] = load <4 x float>, ptr [[OBJ]], align 16, !dbg [[DBG516:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(<4 x float> [[OBJ_0_SROAVAL]], [[META507:![0-9]+]], !DIExpression(), [[DBG516]])
 ; DEBUG-NEXT:    ret <4 x float> [[OBJ_0_SROAVAL]], !dbg [[DBG517:![0-9]+]]
@@ -1356,10 +1380,10 @@ define <4 x ptr> @ptrLoadStoreTysPtr(ptr %init, i64 %val2) {
 ; CHECK-NEXT:    [[OBJ:%.*]] = alloca <4 x ptr>, align 16
 ; CHECK-NEXT:    store <4 x ptr> zeroinitializer, ptr [[OBJ]], align 16
 ; CHECK-NEXT:    store ptr [[VAL0]], ptr [[OBJ]], align 16
-; CHECK-NEXT:    [[OBJ_8_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8
-; CHECK-NEXT:    store i64 [[VAL2:%.*]], ptr [[OBJ_8_SROA_IDX]], align 8
-; CHECK-NEXT:    [[OBJ_12_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12
-; CHECK-NEXT:    store i64 131072, ptr [[OBJ_12_SROA_IDX]], align 4
+; CHECK-NEXT:    [[OBJ_8_PTR2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8
+; CHECK-NEXT:    store i64 [[VAL2:%.*]], ptr [[OBJ_8_PTR2_SROA_IDX]], align 8
+; CHECK-NEXT:    [[OBJ_12_PTR3_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12
+; CHECK-NEXT:    store i64 131072, ptr [[OBJ_12_PTR3_SROA_IDX]], align 4
 ; CHECK-NEXT:    [[OBJ_0_SROAVAL:%.*]] = load <4 x ptr>, ptr [[OBJ]], align 16
 ; CHECK-NEXT:    ret <4 x ptr> [[OBJ_0_SROAVAL]]
 ;
@@ -1371,11 +1395,11 @@ define <4 x ptr> @ptrLoadStoreTysPtr(ptr %init, i64 %val2) {
 ; DEBUG-NEXT:    store <4 x ptr> zeroinitializer, ptr [[OBJ]], align 16, !dbg [[DBG543:![0-9]+]]
 ; DEBUG-NEXT:    store ptr [[VAL0]], ptr [[OBJ]], align 16, !dbg [[DBG544:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META538:![0-9]+]], !DIExpression(), [[META545:![0-9]+]])
-; DEBUG-NEXT:    [[OBJ_8_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8, !dbg [[DBG546:![0-9]+]]
-; DEBUG-NEXT:    store i64 [[VAL2:%.*]], ptr [[OBJ_8_SROA_IDX]], align 8, !dbg [[DBG546]]
+; DEBUG-NEXT:    [[OBJ_8_PTR2_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 8, !dbg [[DBG546:![0-9]+]]
+; DEBUG-NEXT:    store i64 [[VAL2:%.*]], ptr [[OBJ_8_PTR2_SROA_IDX]], align 8, !dbg [[DBG546]]
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META539:![0-9]+]], !DIExpression(), [[META547:![0-9]+]])
-; DEBUG-NEXT:    [[OBJ_12_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12, !dbg [[DBG548:![0-9]+]]
-; DEBUG-NEXT:    store i64 131072, ptr [[OBJ_12_SROA_IDX]], align 4, !dbg [[DBG548]]
+; DEBUG-NEXT:    [[OBJ_12_PTR3_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr [[OBJ]], i64 12, !dbg [[DBG548:![0-9]+]]
+; DEBUG-NEXT:    store i64 131072, ptr [[OBJ_12_PTR3_SROA_IDX]], align 4, !dbg [[DBG548]]
 ; DEBUG-NEXT:    [[OBJ_0_SROAVAL:%.*]] = load <4 x ptr>, ptr [[OBJ]], align 16, !dbg [[DBG549:![0-9]+]]
 ; DEBUG-NEXT:      #dbg_value(<4 x ptr> [[OBJ_0_SROAVAL]], [[META540:![0-9]+]], !DIExpression(), [[DBG549]])
 ; DEBUG-NEXT:    ret <4 x ptr> [[OBJ_0_SROAVAL]], !dbg [[DBG550:![0-9]+]]
@@ -1405,6 +1429,7 @@ define <4 x i32> @validLoadStoreTy([2 x i64] %cond.coerce) {
 ; DEBUG-LABEL: @validLoadStoreTy(
 ; DEBUG-NEXT:  entry:
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META553:![0-9]+]], !DIExpression(), [[META557:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META553]], !DIExpression(), [[META557]])
 ; DEBUG-NEXT:      #dbg_value(ptr undef, [[META554:![0-9]+]], !DIExpression(), [[META558:![0-9]+]])
 ; DEBUG-NEXT:    [[COND_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [2 x i64] [[COND_COERCE:%.*]], 0, !dbg [[DBG559:![0-9]+]]
 ; DEBUG-NEXT:    [[COND_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i64> undef, i64 [[COND_COERCE_FCA_0_EXTRACT]], i32 0, !dbg [[DBG559]]
@@ -1455,7 +1480,8 @@ define noundef zeroext i1 @CandidateTysRealloc() personality ptr null {
 ;
 ; DEBUG-LABEL: @CandidateTysRealloc(
 ; DEBUG-NEXT:  entry:
-; DEBUG-NEXT:      #dbg_value(ptr undef, [[META565:![0-9]+]], !DIExpression(), [[META570:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr poison, [[META565:![0-9]+]], !DIExpression(), [[META570:![0-9]+]])
+; DEBUG-NEXT:      #dbg_value(ptr undef, [[META565]], !DIExpression(), [[META570]])
 ; DEBUG-NEXT:    br label [[BB_1:%.*]], !dbg [[DBG571:![0-9]+]]
 ; DEBUG:       bb.1:
 ; DEBUG-NEXT:    br label [[BB_1]], !dbg [[DBG572:![0-9]+]]

>From d23c24f336674727d281258157fc5b15ce9040a4 Mon Sep 17 00:00:00 2001
From: Alexander Shaposhnikov <ashaposhnikov at google.com>
Date: Wed, 21 Aug 2024 18:08:31 -0700
Subject: [PATCH 070/116] [llvm][nsan] Skip function declarations (#105598)

Skip function declarations in the instrumentation pass.
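As a rough illustration of why the guard matters: a declaration such as declaration_only in the updated test has no body, so there is nothing for the pass to instrument. The toy model below is only a sketch. ToyFunction, HasSanitizeAttr and shouldInstrument are hypothetical stand-ins for llvm::Function, the sanitize_numerical_stability attribute check and the early return in sanitizeFunction(); they are not LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function (not LLVM API).
// A function with an empty body models a declaration.
struct ToyFunction {
  std::string Name;
  bool HasSanitizeAttr;          // models Attribute::SanitizeNumericalStability
  std::vector<std::string> Body; // empty => declaration only

  bool isDeclaration() const { return Body.empty(); }
};

// Mirrors the shape of the patched guard: bail out unless the function is
// marked for sanitizing AND has a body that can actually be instrumented.
bool shouldInstrument(const ToyFunction &F) {
  if (!F.HasSanitizeAttr || F.isDeclaration())
    return false;
  return true;
}
```

Checking the attribute first keeps the common case (unmarked functions) cheap; because both conditions are side-effect free, their order does not change the result.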
---
 .../Transforms/Instrumentation/NumericalStabilitySanitizer.cpp | 3 ++-
 llvm/test/Instrumentation/NumericalStabilitySanitizer/basic.ll | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Instrumentation/NumericalStabilitySanitizer.cpp b/llvm/lib/Transforms/Instrumentation/NumericalStabilitySanitizer.cpp
index 5872396669435a..ffd9faff1d3a53 100644
--- a/llvm/lib/Transforms/Instrumentation/NumericalStabilitySanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/NumericalStabilitySanitizer.cpp
@@ -2038,7 +2038,8 @@ static void moveFastMathFlags(Function &F,
 
 bool NumericalStabilitySanitizer::sanitizeFunction(
     Function &F, const TargetLibraryInfo &TLI) {
-  if (!F.hasFnAttribute(Attribute::SanitizeNumericalStability))
+  if (!F.hasFnAttribute(Attribute::SanitizeNumericalStability) ||
+      F.isDeclaration())
     return false;
 
   // This is required to prevent instrumenting call to __nsan_init from within
diff --git a/llvm/test/Instrumentation/NumericalStabilitySanitizer/basic.ll b/llvm/test/Instrumentation/NumericalStabilitySanitizer/basic.ll
index 5da68320d91f90..2131162bf4bf3f 100644
--- a/llvm/test/Instrumentation/NumericalStabilitySanitizer/basic.ll
+++ b/llvm/test/Instrumentation/NumericalStabilitySanitizer/basic.ll
@@ -4,6 +4,8 @@
 
 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
 
+declare float @declaration_only(float %a) sanitize_numerical_stability
+
 ; Tests with simple control flow.
 
 @float_const = private unnamed_addr constant float 0.5

>From 2b66417d08d8e87f42cd154370ad1722ae7842c8 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Wed, 21 Aug 2024 20:08:39 -0500
Subject: [PATCH 071/116] [libc] Fix accidentally using system file on GPU

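Summary:
Remove the leftover -DLIBC_COPT_STDIO_USE_SYSTEM_FILE compile option from the
GPU build of the scanf core; it should have been deleted in an earlier change.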
---
 libc/src/stdio/scanf_core/CMakeLists.txt | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libc/src/stdio/scanf_core/CMakeLists.txt b/libc/src/stdio/scanf_core/CMakeLists.txt
index 5c00ae0c9973c2..a8935d464417c2 100644
--- a/libc/src/stdio/scanf_core/CMakeLists.txt
+++ b/libc/src/stdio/scanf_core/CMakeLists.txt
@@ -105,8 +105,6 @@ if(LIBC_TARGET_OS_IS_GPU)
       libc.src.stdio.getc
       libc.src.stdio.ungetc
       libc.src.stdio.ferror
-    COMPILE_OPTIONS
-      -DLIBC_COPT_STDIO_USE_SYSTEM_FILE
   )
 elseif(TARGET libc.src.__support.File.file OR (NOT LLVM_LIBC_FULL_BUILD))
   add_header_library(

>From 8e0b9c85924ca22a65d57988ea2c5c22a5181ed9 Mon Sep 17 00:00:00 2001
From: John Harrison <harjohn at google.com>
Date: Wed, 21 Aug 2024 18:52:48 -0700
Subject: [PATCH 072/116] [lldb-dap] Skip the lldb-dap output test on windows,
 it seems all the lldb-dap tests are disabled on windows. (#105604)

This should fix https://lab.llvm.org/buildbot/#/builders/141/builds/1747
---
 lldb/test/API/tools/lldb-dap/output/TestDAP_output.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
index 0d40ce993dc31c..02c34ba10321bd 100644
--- a/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
+++ b/lldb/test/API/tools/lldb-dap/output/TestDAP_output.py
@@ -8,6 +8,7 @@
 
 
 class TestDAP_output(lldbdap_testcase.DAPTestCaseBase):
+    @skipIfWindows
     def test_output(self):
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program)

>From 7854b16d2699ca7cc02d4ea066230d370c751ba9 Mon Sep 17 00:00:00 2001
From: vporpo <vporpodas at google.com>
Date: Wed, 21 Aug 2024 19:05:30 -0700
Subject: [PATCH 073/116] [SandboxIR] Implement FuncletPadInst, CatchPadInst
 and CleanupInst (#105294)

This patch implements sandboxir::FuncletPadInst, CatchPadInst and
CleanupPadInst, mirroring their llvm:: counterparts.
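The new FuncletPadInst::classof() accepts either the CatchPad or the CleanupPad class ID, which is what lets isa<FuncletPadInst>() succeed on both subclasses under LLVM-style RTTI. A minimal self-contained sketch of that pattern follows; the trimmed ClassID enum, the toy Value base and the simplified isa/dyn_cast templates are hypothetical and only approximate llvm/Support/Casting.h and the real sandboxir classes.

```cpp
#include <cassert>

// Hypothetical, trimmed-down class IDs; the real sandboxir enum is larger.
enum class ClassID { CatchPad, CleanupPad, Call };

class Value {
  ClassID SubclassID;

protected:
  explicit Value(ClassID ID) : SubclassID(ID) {}

public:
  virtual ~Value() = default;
  ClassID getSubclassID() const { return SubclassID; }
};

// Abstract parent: classof() matches either concrete funclet pad kind.
class FuncletPadInst : public Value {
protected:
  explicit FuncletPadInst(ClassID ID) : Value(ID) {}

public:
  static bool classof(const Value *From) {
    return From->getSubclassID() == ClassID::CatchPad ||
           From->getSubclassID() == ClassID::CleanupPad;
  }
};

class CatchPadInst : public FuncletPadInst {
public:
  CatchPadInst() : FuncletPadInst(ClassID::CatchPad) {}
  static bool classof(const Value *From) {
    return From->getSubclassID() == ClassID::CatchPad;
  }
};

class CleanupPadInst : public FuncletPadInst {
public:
  CleanupPadInst() : FuncletPadInst(ClassID::CleanupPad) {}
  static bool classof(const Value *From) {
    return From->getSubclassID() == ClassID::CleanupPad;
  }
};

// An unrelated instruction kind, to show a negative isa<> result.
class CallInst : public Value {
public:
  CallInst() : Value(ClassID::Call) {}
  static bool classof(const Value *From) {
    return From->getSubclassID() == ClassID::Call;
  }
};

// Simplified isa<>/dyn_cast<> in the spirit of llvm/Support/Casting.h:
// both just dispatch to the target class's static classof().
template <typename To> bool isa(const Value *V) { return To::classof(V); }

template <typename To> To *dyn_cast(Value *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}
```

With this layout the parent's classof() is the union of its concrete subclasses' IDs, so adding another funclet pad kind would also require updating FuncletPadInst::classof().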
---
 llvm/include/llvm/SandboxIR/SandboxIR.h       | 75 ++++++++++++++++
 .../llvm/SandboxIR/SandboxIRValues.def        |  2 +
 llvm/lib/SandboxIR/SandboxIR.cpp              | 83 ++++++++++++++++-
 llvm/unittests/SandboxIR/SandboxIRTest.cpp    | 90 +++++++++++++++++++
 llvm/unittests/SandboxIR/TrackerTest.cpp      | 51 +++++++++++
 5 files changed, 300 insertions(+), 1 deletion(-)

diff --git a/llvm/include/llvm/SandboxIR/SandboxIR.h b/llvm/include/llvm/SandboxIR/SandboxIR.h
index 278951113aed84..ed5b6f9c9da852 100644
--- a/llvm/include/llvm/SandboxIR/SandboxIR.h
+++ b/llvm/include/llvm/SandboxIR/SandboxIR.h
@@ -127,6 +127,9 @@ class CallBase;
 class CallInst;
 class InvokeInst;
 class CallBrInst;
+class FuncletPadInst;
+class CatchPadInst;
+class CleanupPadInst;
 class GetElementPtrInst;
 class CastInst;
 class PtrToIntInst;
@@ -256,6 +259,9 @@ class Value {
   friend class CallInst;              // For getting `Val`.
   friend class InvokeInst;            // For getting `Val`.
   friend class CallBrInst;            // For getting `Val`.
+  friend class FuncletPadInst;        // For getting `Val`.
+  friend class CatchPadInst;          // For getting `Val`.
+  friend class CleanupPadInst;        // For getting `Val`.
   friend class GetElementPtrInst;     // For getting `Val`.
   friend class CatchSwitchInst;       // For getting `Val`.
   friend class SwitchInst;            // For getting `Val`.
@@ -679,6 +685,8 @@ class Instruction : public sandboxir::User {
   friend class CallInst;           // For getTopmostLLVMInstruction().
   friend class InvokeInst;         // For getTopmostLLVMInstruction().
   friend class CallBrInst;         // For getTopmostLLVMInstruction().
+  friend class CatchPadInst;       // For getTopmostLLVMInstruction().
+  friend class CleanupPadInst;     // For getTopmostLLVMInstruction().
   friend class GetElementPtrInst;  // For getTopmostLLVMInstruction().
   friend class CatchSwitchInst;    // For getTopmostLLVMInstruction().
   friend class SwitchInst;         // For getTopmostLLVMInstruction().
@@ -845,6 +853,7 @@ template <typename LLVMT> class SingleLLVMInstructionImpl : public Instruction {
 #include "llvm/SandboxIR/SandboxIRValues.def"
   friend class UnaryInstruction;
   friend class CallBase;
+  friend class FuncletPadInst;
 
   Use getOperandUseInternal(unsigned OpIdx, bool Verify) const final {
     return getOperandUseDefault(OpIdx, Verify);
@@ -1843,6 +1852,68 @@ class CallBrInst final : public CallBase {
   }
 };
 
+class FuncletPadInst : public SingleLLVMInstructionImpl<llvm::FuncletPadInst> {
+  FuncletPadInst(ClassID SubclassID, Opcode Opc, llvm::Instruction *I,
+                 Context &Ctx)
+      : SingleLLVMInstructionImpl(SubclassID, Opc, I, Ctx) {}
+  friend class CatchPadInst;   // For constructor.
+  friend class CleanupPadInst; // For constructor.
+
+public:
+  /// Return the number of funcletpad arguments.
+  unsigned arg_size() const {
+    return cast<llvm::FuncletPadInst>(Val)->arg_size();
+  }
+  /// Return the outer EH-pad this funclet is nested within.
+  ///
+  /// Note: This returns the associated CatchSwitchInst if this FuncletPadInst
+  /// is a CatchPadInst.
+  Value *getParentPad() const;
+  void setParentPad(Value *ParentPad);
+  /// Return the Idx-th funcletpad argument.
+  Value *getArgOperand(unsigned Idx) const;
+  /// Set the Idx-th funcletpad argument.
+  void setArgOperand(unsigned Idx, Value *V);
+
+  // TODO: Implement missing functions: arg_operands().
+  static bool classof(const Value *From) {
+    return From->getSubclassID() == ClassID::CatchPad ||
+           From->getSubclassID() == ClassID::CleanupPad;
+  }
+};
+
+class CatchPadInst : public FuncletPadInst {
+  CatchPadInst(llvm::CatchPadInst *CPI, Context &Ctx)
+      : FuncletPadInst(ClassID::CatchPad, Opcode::CatchPad, CPI, Ctx) {}
+  friend class Context; // For constructor.
+
+public:
+  CatchSwitchInst *getCatchSwitch() const;
+  // TODO: We have not implemented setCatchSwitch() because we can't revert it
+  // for now, as there is no CatchPadInst member function that can undo it.
+
+  static CatchPadInst *create(Value *ParentPad, ArrayRef<Value *> Args,
+                              BBIterator WhereIt, BasicBlock *WhereBB,
+                              Context &Ctx, const Twine &Name = "");
+  static bool classof(const Value *From) {
+    return From->getSubclassID() == ClassID::CatchPad;
+  }
+};
+
+class CleanupPadInst : public FuncletPadInst {
+  CleanupPadInst(llvm::CleanupPadInst *CPI, Context &Ctx)
+      : FuncletPadInst(ClassID::CleanupPad, Opcode::CleanupPad, CPI, Ctx) {}
+  friend class Context; // For constructor.
+
+public:
+  static CleanupPadInst *create(Value *ParentPad, ArrayRef<Value *> Args,
+                                BBIterator WhereIt, BasicBlock *WhereBB,
+                                Context &Ctx, const Twine &Name = "");
+  static bool classof(const Value *From) {
+    return From->getSubclassID() == ClassID::CleanupPad;
+  }
+};
+
 class GetElementPtrInst final
     : public SingleLLVMInstructionImpl<llvm::GetElementPtrInst> {
   /// Use Context::createGetElementPtrInst(). Don't call
@@ -2745,6 +2816,10 @@ class Context {
   friend InvokeInst; // For createInvokeInst()
   CallBrInst *createCallBrInst(llvm::CallBrInst *I);
   friend CallBrInst; // For createCallBrInst()
+  CatchPadInst *createCatchPadInst(llvm::CatchPadInst *I);
+  friend CatchPadInst; // For createCatchPadInst()
+  CleanupPadInst *createCleanupPadInst(llvm::CleanupPadInst *I);
+  friend CleanupPadInst; // For createCleanupPadInst()
   GetElementPtrInst *createGetElementPtrInst(llvm::GetElementPtrInst *I);
   friend GetElementPtrInst; // For createGetElementPtrInst()
   CatchSwitchInst *createCatchSwitchInst(llvm::CatchSwitchInst *I);
diff --git a/llvm/include/llvm/SandboxIR/SandboxIRValues.def b/llvm/include/llvm/SandboxIR/SandboxIRValues.def
index 56720f564a7cae..a75f872bc88acb 100644
--- a/llvm/include/llvm/SandboxIR/SandboxIRValues.def
+++ b/llvm/include/llvm/SandboxIR/SandboxIRValues.def
@@ -46,6 +46,8 @@ DEF_INSTR(Ret,            OP(Ret),            ReturnInst)
 DEF_INSTR(Call,           OP(Call),           CallInst)
 DEF_INSTR(Invoke,         OP(Invoke),         InvokeInst)
 DEF_INSTR(CallBr,         OP(CallBr),         CallBrInst)
+DEF_INSTR(CatchPad,       OP(CatchPad),       CatchPadInst)
+DEF_INSTR(CleanupPad,     OP(CleanupPad),     CleanupPadInst)
 DEF_INSTR(GetElementPtr,  OP(GetElementPtr),  GetElementPtrInst)
 DEF_INSTR(CatchSwitch,    OP(CatchSwitch),    CatchSwitchInst)
 DEF_INSTR(Switch,         OP(Switch),         SwitchInst)
diff --git a/llvm/lib/SandboxIR/SandboxIR.cpp b/llvm/lib/SandboxIR/SandboxIR.cpp
index 92054e7cab86ee..1ff82a968a717f 100644
--- a/llvm/lib/SandboxIR/SandboxIR.cpp
+++ b/llvm/lib/SandboxIR/SandboxIR.cpp
@@ -1043,6 +1043,68 @@ BasicBlock *CallBrInst::getSuccessor(unsigned Idx) const {
       Ctx.getValue(cast<llvm::CallBrInst>(Val)->getSuccessor(Idx)));
 }
 
+Value *FuncletPadInst::getParentPad() const {
+  return Ctx.getValue(cast<llvm::FuncletPadInst>(Val)->getParentPad());
+}
+
+void FuncletPadInst::setParentPad(Value *ParentPad) {
+  Ctx.getTracker()
+      .emplaceIfTracking<GenericSetter<&FuncletPadInst::getParentPad,
+                                       &FuncletPadInst::setParentPad>>(this);
+  cast<llvm::FuncletPadInst>(Val)->setParentPad(ParentPad->Val);
+}
+
+Value *FuncletPadInst::getArgOperand(unsigned Idx) const {
+  return Ctx.getValue(cast<llvm::FuncletPadInst>(Val)->getArgOperand(Idx));
+}
+
+void FuncletPadInst::setArgOperand(unsigned Idx, Value *V) {
+  Ctx.getTracker()
+      .emplaceIfTracking<GenericSetterWithIdx<&FuncletPadInst::getArgOperand,
+                                              &FuncletPadInst::setArgOperand>>(
+          this, Idx);
+  cast<llvm::FuncletPadInst>(Val)->setArgOperand(Idx, V->Val);
+}
+
+CatchSwitchInst *CatchPadInst::getCatchSwitch() const {
+  return cast<CatchSwitchInst>(
+      Ctx.getValue(cast<llvm::CatchPadInst>(Val)->getCatchSwitch()));
+}
+
+CatchPadInst *CatchPadInst::create(Value *ParentPad, ArrayRef<Value *> Args,
+                                   BBIterator WhereIt, BasicBlock *WhereBB,
+                                   Context &Ctx, const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  if (WhereIt != WhereBB->end())
+    Builder.SetInsertPoint((*WhereIt).getTopmostLLVMInstruction());
+  else
+    Builder.SetInsertPoint(cast<llvm::BasicBlock>(WhereBB->Val));
+  SmallVector<llvm::Value *> LLVMArgs;
+  LLVMArgs.reserve(Args.size());
+  for (auto *Arg : Args)
+    LLVMArgs.push_back(Arg->Val);
+  llvm::CatchPadInst *LLVMI =
+      Builder.CreateCatchPad(ParentPad->Val, LLVMArgs, Name);
+  return Ctx.createCatchPadInst(LLVMI);
+}
+
+CleanupPadInst *CleanupPadInst::create(Value *ParentPad, ArrayRef<Value *> Args,
+                                       BBIterator WhereIt, BasicBlock *WhereBB,
+                                       Context &Ctx, const Twine &Name) {
+  auto &Builder = Ctx.getLLVMIRBuilder();
+  if (WhereIt != WhereBB->end())
+    Builder.SetInsertPoint((*WhereIt).getTopmostLLVMInstruction());
+  else
+    Builder.SetInsertPoint(cast<llvm::BasicBlock>(WhereBB->Val));
+  SmallVector<llvm::Value *> LLVMArgs;
+  LLVMArgs.reserve(Args.size());
+  for (auto *Arg : Args)
+    LLVMArgs.push_back(Arg->Val);
+  llvm::CleanupPadInst *LLVMI =
+      Builder.CreateCleanupPad(ParentPad->Val, LLVMArgs, Name);
+  return Ctx.createCleanupPadInst(LLVMI);
+}
+
 Value *GetElementPtrInst::create(Type *Ty, Value *Ptr,
                                  ArrayRef<Value *> IdxList,
                                  BasicBlock::iterator WhereIt,
@@ -2064,6 +2126,18 @@ Value *Context::getOrCreateValueInternal(llvm::Value *LLVMV, llvm::User *U) {
     It->second = std::unique_ptr<CallBrInst>(new CallBrInst(LLVMCallBr, *this));
     return It->second.get();
   }
+  case llvm::Instruction::CatchPad: {
+    auto *LLVMCPI = cast<llvm::CatchPadInst>(LLVMV);
+    It->second =
+        std::unique_ptr<CatchPadInst>(new CatchPadInst(LLVMCPI, *this));
+    return It->second.get();
+  }
+  case llvm::Instruction::CleanupPad: {
+    auto *LLVMCPI = cast<llvm::CleanupPadInst>(LLVMV);
+    It->second =
+        std::unique_ptr<CleanupPadInst>(new CleanupPadInst(LLVMCPI, *this));
+    return It->second.get();
+  }
   case llvm::Instruction::GetElementPtr: {
     auto *LLVMGEP = cast<llvm::GetElementPtrInst>(LLVMV);
     It->second = std::unique_ptr<GetElementPtrInst>(
@@ -2240,7 +2314,14 @@ UnreachableInst *Context::createUnreachableInst(llvm::UnreachableInst *UI) {
       std::unique_ptr<UnreachableInst>(new UnreachableInst(UI, *this));
   return cast<UnreachableInst>(registerValue(std::move(NewPtr)));
 }
-
+CatchPadInst *Context::createCatchPadInst(llvm::CatchPadInst *I) {
+  auto NewPtr = std::unique_ptr<CatchPadInst>(new CatchPadInst(I, *this));
+  return cast<CatchPadInst>(registerValue(std::move(NewPtr)));
+}
+CleanupPadInst *Context::createCleanupPadInst(llvm::CleanupPadInst *I) {
+  auto NewPtr = std::unique_ptr<CleanupPadInst>(new CleanupPadInst(I, *this));
+  return cast<CleanupPadInst>(registerValue(std::move(NewPtr)));
+}
 GetElementPtrInst *
 Context::createGetElementPtrInst(llvm::GetElementPtrInst *I) {
   auto NewPtr =
diff --git a/llvm/unittests/SandboxIR/SandboxIRTest.cpp b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
index b6981027b4c040..28894397a60d6f 100644
--- a/llvm/unittests/SandboxIR/SandboxIRTest.cpp
+++ b/llvm/unittests/SandboxIR/SandboxIRTest.cpp
@@ -1867,6 +1867,96 @@ define void @foo(i8 %arg) {
   }
 }
 
+TEST_F(SandboxIRTest, FuncletPadInst_CatchPadInst_CleanupPadInst) {
+  parseIR(C, R"IR(
+define void @foo() {
+dispatch:
+  %cs = catchswitch within none [label %handler0] unwind to caller
+handler0:
+  %catchpad = catchpad within %cs [ptr @foo]
+  ret void
+handler1:
+  %cleanuppad = cleanuppad within %cs [ptr @foo]
+  ret void
+bb:
+  ret void
+}
+)IR");
+  Function &LLVMF = *M->getFunction("foo");
+  BasicBlock *LLVMDispatch = getBasicBlockByName(LLVMF, "dispatch");
+  BasicBlock *LLVMHandler0 = getBasicBlockByName(LLVMF, "handler0");
+  BasicBlock *LLVMHandler1 = getBasicBlockByName(LLVMF, "handler1");
+  auto *LLVMCP = cast<llvm::CatchPadInst>(&*LLVMHandler0->begin());
+  auto *LLVMCLP = cast<llvm::CleanupPadInst>(&*LLVMHandler1->begin());
+
+  sandboxir::Context Ctx(C);
+  [[maybe_unused]] auto &F = *Ctx.createFunction(&LLVMF);
+  auto *Dispatch = cast<sandboxir::BasicBlock>(Ctx.getValue(LLVMDispatch));
+  auto *Handler0 = cast<sandboxir::BasicBlock>(Ctx.getValue(LLVMHandler0));
+  auto *Handler1 = cast<sandboxir::BasicBlock>(Ctx.getValue(LLVMHandler1));
+  auto *BB = cast<sandboxir::BasicBlock>(
+      Ctx.getValue(getBasicBlockByName(LLVMF, "bb")));
+  auto *BBRet = cast<sandboxir::ReturnInst>(&*BB->begin());
+  auto *CS = cast<sandboxir::CatchSwitchInst>(&*Dispatch->begin());
+  [[maybe_unused]] auto *CP =
+      cast<sandboxir::CatchPadInst>(&*Handler0->begin());
+  [[maybe_unused]] auto *CLP =
+      cast<sandboxir::CleanupPadInst>(&*Handler1->begin());
+
+  // Check getCatchSwitch().
+  EXPECT_EQ(CP->getCatchSwitch(), CS);
+  EXPECT_EQ(CP->getCatchSwitch(), Ctx.getValue(LLVMCP->getCatchSwitch()));
+
+  for (llvm::FuncletPadInst *LLVMFPI :
+       {static_cast<llvm::FuncletPadInst *>(LLVMCP),
+        static_cast<llvm::FuncletPadInst *>(LLVMCLP)}) {
+    auto *FPI = cast<sandboxir::FuncletPadInst>(Ctx.getValue(LLVMFPI));
+    // Check arg_size().
+    EXPECT_EQ(FPI->arg_size(), LLVMFPI->arg_size());
+    // Check getParentPad().
+    EXPECT_EQ(FPI->getParentPad(), Ctx.getValue(LLVMFPI->getParentPad()));
+    // Check setParentPad().
+    auto *OrigParentPad = FPI->getParentPad();
+    auto *NewParentPad = Dispatch;
+    EXPECT_NE(NewParentPad, OrigParentPad);
+    FPI->setParentPad(NewParentPad);
+    EXPECT_EQ(FPI->getParentPad(), NewParentPad);
+    FPI->setParentPad(OrigParentPad);
+    EXPECT_EQ(FPI->getParentPad(), OrigParentPad);
+    // Check getArgOperand().
+    for (auto Idx : seq<unsigned>(0, FPI->arg_size()))
+      EXPECT_EQ(FPI->getArgOperand(Idx),
+                Ctx.getValue(LLVMFPI->getArgOperand(Idx)));
+    // Check setArgOperand().
+    auto *OrigArgOperand = FPI->getArgOperand(0);
+    auto *NewArgOperand = Dispatch;
+    EXPECT_NE(NewArgOperand, OrigArgOperand);
+    FPI->setArgOperand(0, NewArgOperand);
+    EXPECT_EQ(FPI->getArgOperand(0), NewArgOperand);
+    FPI->setArgOperand(0, OrigArgOperand);
+    EXPECT_EQ(FPI->getArgOperand(0), OrigArgOperand);
+  }
+  // Check CatchPadInst::create().
+  auto *NewCPI = cast<sandboxir::CatchPadInst>(sandboxir::CatchPadInst::create(
+      CS, {}, BBRet->getIterator(), BB, Ctx, "NewCPI"));
+  EXPECT_EQ(NewCPI->getCatchSwitch(), CS);
+  EXPECT_EQ(NewCPI->arg_size(), 0u);
+  EXPECT_EQ(NewCPI->getNextNode(), BBRet);
+#ifndef NDEBUG
+  EXPECT_EQ(NewCPI->getName(), "NewCPI");
+#endif // NDEBUG
+  // Check CleanupPadInst::create().
+  auto *NewCLPI =
+      cast<sandboxir::CleanupPadInst>(sandboxir::CleanupPadInst::create(
+          CS, {}, BBRet->getIterator(), BB, Ctx, "NewCLPI"));
+  EXPECT_EQ(NewCLPI->getParentPad(), CS);
+  EXPECT_EQ(NewCLPI->arg_size(), 0u);
+  EXPECT_EQ(NewCLPI->getNextNode(), BBRet);
+#ifndef NDEBUG
+  EXPECT_EQ(NewCLPI->getName(), "NewCLPI");
+#endif // NDEBUG
+}
+
 TEST_F(SandboxIRTest, GetElementPtrInstruction) {
   parseIR(C, R"IR(
 define void @foo(ptr %ptr, <2 x ptr> %ptrs) {
diff --git a/llvm/unittests/SandboxIR/TrackerTest.cpp b/llvm/unittests/SandboxIR/TrackerTest.cpp
index a2c3080011f162..c2faf60a57f3b8 100644
--- a/llvm/unittests/SandboxIR/TrackerTest.cpp
+++ b/llvm/unittests/SandboxIR/TrackerTest.cpp
@@ -1033,6 +1033,57 @@ define void @foo(i8 %arg) {
   EXPECT_EQ(CallBr->getIndirectDest(0), OrigIndirectDest);
 }
 
+TEST_F(TrackerTest, FuncletPadInstSetters) {
+  parseIR(C, R"IR(
+define void @foo() {
+dispatch:
+  %cs = catchswitch within none [label %handler0] unwind to caller
+handler0:
+  %catchpad = catchpad within %cs [ptr @foo]
+  ret void
+handler1:
+  %cleanuppad = cleanuppad within %cs [ptr @foo]
+  ret void
+bb:
+  ret void
+}
+)IR");
+  Function &LLVMF = *M->getFunction("foo");
+  sandboxir::Context Ctx(C);
+  [[maybe_unused]] auto &F = *Ctx.createFunction(&LLVMF);
+  auto *Dispatch = cast<sandboxir::BasicBlock>(
+      Ctx.getValue(getBasicBlockByName(LLVMF, "dispatch")));
+  auto *Handler0 = cast<sandboxir::BasicBlock>(
+      Ctx.getValue(getBasicBlockByName(LLVMF, "handler0")));
+  auto *Handler1 = cast<sandboxir::BasicBlock>(
+      Ctx.getValue(getBasicBlockByName(LLVMF, "handler1")));
+  auto *CP = cast<sandboxir::CatchPadInst>(&*Handler0->begin());
+  auto *CLP = cast<sandboxir::CleanupPadInst>(&*Handler1->begin());
+
+  for (auto *FPI : {static_cast<sandboxir::FuncletPadInst *>(CP),
+                    static_cast<sandboxir::FuncletPadInst *>(CLP)}) {
+    // Check setParentPad().
+    auto *OrigParentPad = FPI->getParentPad();
+    auto *NewParentPad = Dispatch;
+    EXPECT_NE(NewParentPad, OrigParentPad);
+    Ctx.save();
+    FPI->setParentPad(NewParentPad);
+    EXPECT_EQ(FPI->getParentPad(), NewParentPad);
+    Ctx.revert();
+    EXPECT_EQ(FPI->getParentPad(), OrigParentPad);
+
+    // Check setArgOperand().
+    auto *OrigArgOperand = FPI->getArgOperand(0);
+    auto *NewArgOperand = Dispatch;
+    EXPECT_NE(NewArgOperand, OrigArgOperand);
+    Ctx.save();
+    FPI->setArgOperand(0, NewArgOperand);
+    EXPECT_EQ(FPI->getArgOperand(0), NewArgOperand);
+    Ctx.revert();
+    EXPECT_EQ(FPI->getArgOperand(0), OrigArgOperand);
+  }
+}
+
 TEST_F(TrackerTest, PHINodeSetters) {
   parseIR(C, R"IR(
 define void @foo(i8 %arg0, i8 %arg1, i8 %arg2) {

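The new `FuncletPadInst::setParentPad()` and `setArgOperand()` first record an undo entry through `Ctx.getTracker().emplaceIfTracking<GenericSetter<...>>` / `GenericSetterWithIdx<...>`. Only then do they mutate the underlying LLVM instruction. This is what allows `TrackerTest.FuncletPadInstSetters` to restore the original operands with `Ctx.save()` and `Ctx.revert()`. Below is a minimal, self-contained C++17 sketch of that getter/setter-pair undo idea. It assumes a toy `int`-valued pad class. `Tracker`, `IRChangeBase`, `GenericSetter`, `GenericSetterWithIdx` and `ToyPad` here are simplified hypothetical stand-ins, not the real `sandboxir` classes or their exact interfaces.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a change tracker; not the sandboxir API.
struct IRChangeBase {
  virtual ~IRChangeBase() = default;
  virtual void revert() = 0;
};

class Tracker {
  std::vector<std::unique_ptr<IRChangeBase>> Changes;
  bool Tracking = false;

public:
  void save() { Tracking = true; }
  // Record an undo entry only while a checkpoint is active.
  template <typename ChangeT, typename... ArgsT>
  void emplaceIfTracking(ArgsT... Args) {
    if (Tracking)
      Changes.push_back(std::make_unique<ChangeT>(Args...));
  }
  // Undo recorded changes in reverse order. Tracking is switched off first so
  // that the setters called from revert() do not record new entries.
  void revert() {
    Tracking = false;
    for (auto It = Changes.rbegin(); It != Changes.rend(); ++It)
      (*It)->revert();
    Changes.clear();
  }
};

// Deduce the owning class and the value type from a const getter.
template <typename T> struct GetterTraits;
template <typename C, typename R> struct GetterTraits<R (C::*)() const> {
  using Class = C;
  using Value = R;
};
template <typename C, typename R>
struct GetterTraits<R (C::*)(unsigned) const> {
  using Class = C;
  using Value = R;
};

// Snapshot the old value through GetterFn; restore it through SetterFn.
template <auto GetterFn, auto SetterFn>
class GenericSetter final : public IRChangeBase {
  using Traits = GetterTraits<decltype(GetterFn)>;
  typename Traits::Class *Obj;
  typename Traits::Value OrigVal;

public:
  explicit GenericSetter(typename Traits::Class *Obj)
      : Obj(Obj), OrigVal((Obj->*GetterFn)()) {}
  void revert() override { (Obj->*SetterFn)(OrigVal); }
};

// Same idea for indexed accessors such as getArgOperand(Idx).
template <auto GetterFn, auto SetterFn>
class GenericSetterWithIdx final : public IRChangeBase {
  using Traits = GetterTraits<decltype(GetterFn)>;
  typename Traits::Class *Obj;
  unsigned Idx;
  typename Traits::Value OrigVal;

public:
  GenericSetterWithIdx(typename Traits::Class *Obj, unsigned Idx)
      : Obj(Obj), Idx(Idx), OrigVal((Obj->*GetterFn)(Idx)) {}
  void revert() override { (Obj->*SetterFn)(Idx, OrigVal); }
};

// Toy stand-in for a funclet pad: one parent pad plus indexed arguments.
class ToyPad {
  Tracker &T;
  int ParentPad;
  std::vector<int> Args;

public:
  ToyPad(Tracker &T, int ParentPad, std::vector<int> Args)
      : T(T), ParentPad(ParentPad), Args(std::move(Args)) {}
  int getParentPad() const { return ParentPad; }
  void setParentPad(int P) {
    T.emplaceIfTracking<
        GenericSetter<&ToyPad::getParentPad, &ToyPad::setParentPad>>(this);
    ParentPad = P;
  }
  int getArgOperand(unsigned Idx) const {
    assert(Idx < Args.size() && "argument index out of range");
    return Args[Idx];
  }
  void setArgOperand(unsigned Idx, int V) {
    T.emplaceIfTracking<GenericSetterWithIdx<&ToyPad::getArgOperand,
                                             &ToyPad::setArgOperand>>(this, Idx);
    Args[Idx] = V;
  }
};
```

Usage mirrors `TrackerTest.FuncletPadInstSetters`. Call `save()`, mutate through the setters, then call `revert()` and check that the original parent pad and argument values come back. Recording the old value via the getter keeps each setter to a single line of bookkeeping. It also explains why the patch leaves out `setCatchSwitch()`: it has no matching getter/setter pair that could undo it.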
From 0ca77f6656a772624a591261957f6b313a0d544e Mon Sep 17 00:00:00 2001
From: Craig Topper <craig.topper at sifive.com>
Date: Wed, 21 Aug 2024 19:23:07 -0700
Subject: [PATCH 074/116] [RISCV] Add CSRs and an instruction for Smctr and
 Ssctr extensions. (#105148)

https://github.com/riscv/riscv-control-transfer-records/releases/tag/v1.0_rc3
---
 .../Driver/print-supported-extensions-riscv.c |  2 +
 .../test/Preprocessor/riscv-target-features.c | 18 ++++++++
 llvm/docs/RISCVUsage.rst                      |  3 ++
 llvm/docs/ReleaseNotes.rst                    |  1 +
 llvm/lib/Target/RISCV/RISCVFeatures.td        | 13 ++++++
 llvm/lib/Target/RISCV/RISCVInstrInfo.td       |  8 ++++
 llvm/lib/Target/RISCV/RISCVSystemOperands.td  |  9 ++++
 llvm/test/CodeGen/RISCV/attributes.ll         |  8 ++++
 llvm/test/MC/RISCV/attribute-arch.s           |  6 +++
 llvm/test/MC/RISCV/hypervisor-csr-names.s     | 17 ++++++++
 llvm/test/MC/RISCV/machine-csr-names.s        | 17 ++++++++
 llvm/test/MC/RISCV/smctr-ssctr-valid.s        | 30 +++++++++++++
 llvm/test/MC/RISCV/supervisor-csr-names.s     | 43 +++++++++++++++++++
 .../TargetParser/RISCVISAInfoTest.cpp         |  2 +
 14 files changed, 177 insertions(+)
 create mode 100644 llvm/test/MC/RISCV/smctr-ssctr-valid.s

diff --git a/clang/test/Driver/print-supported-extensions-riscv.c b/clang/test/Driver/print-supported-extensions-riscv.c
index 9497d01a832604..312c462f715d5e 100644
--- a/clang/test/Driver/print-supported-extensions-riscv.c
+++ b/clang/test/Driver/print-supported-extensions-riscv.c
@@ -175,8 +175,10 @@
 // CHECK-NEXT:     zalasr               0.1       'Zalasr' (Load-Acquire and Store-Release Instructions)
 // CHECK-NEXT:     zvbc32e              0.7       'Zvbc32e' (Vector Carryless Multiplication with 32-bits elements)
 // CHECK-NEXT:     zvkgs                0.7       'Zvkgs' (Vector-Scalar GCM instructions for Cryptography)
+// CHECK-NEXT:     smctr                1.0       'Smctr' (Control Transfer Records Machine Level)
 // CHECK-NEXT:     smmpm                1.0       'Smmpm' (Machine-level Pointer Masking for M-mode)
 // CHECK-NEXT:     smnpm                1.0       'Smnpm' (Machine-level Pointer Masking for next lower privilege mode)
+// CHECK-NEXT:     ssctr                1.0       'Ssctr' (Control Transfer Records Supervisor Level)
 // CHECK-NEXT:     ssnpm                1.0       'Ssnpm' (Supervisor-level Pointer Masking for next lower privilege mode)
 // CHECK-NEXT:     sspm                 1.0       'Sspm' (Indicates Supervisor-mode Pointer Masking)
 // CHECK-NEXT:     supm                 1.0       'Supm' (Indicates User-mode Pointer Masking)
diff --git a/clang/test/Preprocessor/riscv-target-features.c b/clang/test/Preprocessor/riscv-target-features.c
index 5bb6c10f85f1a7..60675065495bba 100644
--- a/clang/test/Preprocessor/riscv-target-features.c
+++ b/clang/test/Preprocessor/riscv-target-features.c
@@ -176,8 +176,10 @@
 
 // Experimental extensions
 
+// CHECK-NOT: __riscv_smctr{{.*$}}
 // CHECK-NOT: __riscv_smmpm{{.*$}}
 // CHECK-NOT: __riscv_smnpm{{.*$}}
+// CHECK-NOT: __riscv_ssctr{{.*$}}
 // CHECK-NOT: __riscv_ssnpm{{.*$}}
 // CHECK-NOT: __riscv_sspm{{.*$}}
 // CHECK-NOT: __riscv_supm{{.*$}}
@@ -1748,6 +1750,22 @@
 // RUN:   -o - | FileCheck --check-prefix=CHECK-SUPM-EXT %s
 // CHECK-SUPM-EXT: __riscv_supm 1000000{{$}}
 
+// RUN: %clang --target=riscv32 -menable-experimental-extensions \
+// RUN:   -march=rv32i_smctr1p0 -E -dM %s \
+// RUN:   -o - | FileCheck --check-prefix=CHECK-SMCTR-EXT %s
+// RUN: %clang --target=riscv64 -menable-experimental-extensions \
+// RUN:   -march=rv64i_smctr1p0 -E -dM %s \
+// RUN:   -o - | FileCheck --check-prefix=CHECK-SMCTR-EXT %s
+// CHECK-SMCTR-EXT: __riscv_smctr 1000000{{$}}
+
+// RUN: %clang --target=riscv32 -menable-experimental-extensions \
+// RUN:   -march=rv32i_ssctr1p0 -E -dM %s \
+// RUN:   -o - | FileCheck --check-prefix=CHECK-SSCTR-EXT %s
+// RUN: %clang --target=riscv64 -menable-experimental-extensions \
+// RUN:   -march=rv64i_ssctr1p0 -E -dM %s \
+// RUN:   -o - | FileCheck --check-prefix=CHECK-SSCTR-EXT %s
+// CHECK-SSCTR-EXT: __riscv_ssctr 1000000{{$}}
+
 // Misaligned
 
 // RUN: %clang --target=riscv32-unknown-linux-gnu -march=rv32i -E -dM %s \
diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index 4e50f55e4cb60b..8846b82fcaea59 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -303,6 +303,9 @@ The primary goal of experimental support is to assist in the process of ratifica
 ``experimental-zvbc32e``, ``experimental-zvkgs``
   LLVM implements the `0.7 release specification <https://github.com/user-attachments/files/16450464/riscv-crypto-spec-vector-extra_v0.0.7.pdf>`__.
 
+``experimental-smctr``, ``experimental-ssctr``
+  LLVM implements the `1.0-rc3 specification <https://github.com/riscv/riscv-control-transfer-records/releases/tag/v1.0_rc3>`__.
+
 To use an experimental extension from `clang`, you must add `-menable-experimental-extensions` to the command line, and specify the exact version of the experimental extension you are using.  To use an experimental extension with LLVM's internal developer tools (e.g. `llc`, `llvm-objdump`, `llvm-mc`), you must prefix the extension name with `experimental-`.  Note that you don't need to specify the version with internal tools, and shouldn't include the `experimental-` prefix with `clang`.
 
 Vendor Extensions
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 65fa21e517940b..c9eb5eea896905 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -114,6 +114,7 @@ Changes to the RISC-V Backend
   means Zve32x and Zve32f will also require Zvl64b. The prior support was
   largely untested.
 * The ``Zvbc32e`` and ``Zvkgs`` extensions are now supported experimentally.
+* The ``Smctr`` and ``Ssctr`` extensions are now supported experimentally.
 
 Changes to the WebAssembly Backend
 ----------------------------------
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index d448f9301f3ae8..fa141c31f94dbd 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1054,6 +1054,19 @@ def FeatureStdExtSupm
     : RISCVExperimentalExtension<"supm", 1, 0,
                                  "'Supm' (Indicates User-mode Pointer Masking)">;
 
+def FeatureStdExtSmctr
+    : RISCVExperimentalExtension<"smctr", 1, 0,
+                                 "'Smctr' (Control Transfer Records Machine Level)",
+                                 [FeatureStdExtSscsrind]>;
+def FeatureStdExtSsctr
+    : RISCVExperimentalExtension<"ssctr", 1, 0,
+                                 "'Ssctr' (Control Transfer Records Supervisor Level)",
+                                 [FeatureStdExtSscsrind]>;
+def HasStdExtSmctrOrSsctr : Predicate<"Subtarget->hasStdExtSmctrOrSsctr()">,
+                            AssemblerPredicate<(any_of FeatureStdExtSmctr, FeatureStdExtSsctr),
+                               "'Smctr' (Control Transfer Records Machine Level) or "
+                               "'Ssctr' (Control Transfer Records Supervisor Level)">;
+
 //===----------------------------------------------------------------------===//
 // Vendor extensions
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 74406bf4b10471..6d0952a42eda9f 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -839,6 +839,14 @@ def HLV_D   : HLoad_r<0b0110110, 0b00000, "hlv.d">, Sched<[]>;
 def HSV_D   : HStore_rr<0b0110111, "hsv.d">, Sched<[]>;
 }
 
+let Predicates = [HasStdExtSmctrOrSsctr] in {
+def SCTRCLR : Priv<"sctrclr", 0b0001000>, Sched<[]> {
+  let rd = 0;
+  let rs1 = 0;
+  let rs2 = 0b00100;
+}
+}
+
 //===----------------------------------------------------------------------===//
 // Debug instructions
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/RISCVSystemOperands.td b/llvm/lib/Target/RISCV/RISCVSystemOperands.td
index a836227e18957c..d85b4a9cf77b33 100644
--- a/llvm/lib/Target/RISCV/RISCVSystemOperands.td
+++ b/llvm/lib/Target/RISCV/RISCVSystemOperands.td
@@ -455,3 +455,12 @@ def : SysReg<"mnscratch", 0x740>;
 def : SysReg<"mnepc", 0x741>;
 def : SysReg<"mncause", 0x742>;
 def : SysReg<"mnstatus", 0x744>;
+
+//===-----------------------------------------------
+// Control Transfer Records CSRs
+//===-----------------------------------------------
+def : SysReg<"sctrctl", 0x14e>;
+def : SysReg<"sctrstatus", 0x14f>;
+def : SysReg<"sctrdepth", 0x15f>;
+def : SysReg<"vsctrctl", 0x24e>;
+def : SysReg<"mctrctl", 0x34e>;
diff --git a/llvm/test/CodeGen/RISCV/attributes.ll b/llvm/test/CodeGen/RISCV/attributes.ll
index 2a02327cd3c7b0..1d4a634c89a22f 100644
--- a/llvm/test/CodeGen/RISCV/attributes.ll
+++ b/llvm/test/CodeGen/RISCV/attributes.ll
@@ -133,6 +133,8 @@
 ; RUN: llc -mtriple=riscv32 -mattr=+experimental-smmpm %s -o - | FileCheck --check-prefix=RV32SMMPM %s
 ; RUN: llc -mtriple=riscv32 -mattr=+experimental-sspm %s -o - | FileCheck --check-prefix=RV32SSPM %s
 ; RUN: llc -mtriple=riscv32 -mattr=+experimental-supm %s -o - | FileCheck --check-prefix=RV32SUPM %s
+; RUN: llc -mtriple=riscv32 -mattr=+experimental-smctr %s -o - | FileCheck --check-prefix=RV32SMCTR %s
+; RUN: llc -mtriple=riscv32 -mattr=+experimental-ssctr %s -o - | FileCheck --check-prefix=RV32SSCTR %s
 
 ; RUN: llc -mtriple=riscv64 %s -o - | FileCheck %s
 ; RUN: llc -mtriple=riscv64 -mattr=+m %s -o - | FileCheck --check-prefixes=CHECK,RV64M %s
@@ -273,6 +275,8 @@
 ; RUN: llc -mtriple=riscv64 -mattr=+experimental-smmpm %s -o - | FileCheck --check-prefix=RV64SMMPM %s
 ; RUN: llc -mtriple=riscv64 -mattr=+experimental-sspm %s -o - | FileCheck --check-prefix=RV64SSPM %s
 ; RUN: llc -mtriple=riscv64 -mattr=+experimental-supm %s -o - | FileCheck --check-prefix=RV64SUPM %s
+; RUN: llc -mtriple=riscv64 -mattr=+experimental-smctr %s -o - | FileCheck --check-prefix=RV64SMCTR %s
+; RUN: llc -mtriple=riscv64 -mattr=+experimental-ssctr %s -o - | FileCheck --check-prefix=RV64SSCTR %s
 
 ; Tests for profile features.
 ; RUN: llc -mtriple=riscv32 -mattr=+rvi20u32 %s -o - | FileCheck --check-prefix=RVI20U32 %s
@@ -421,6 +425,8 @@
 ; RV32SMMPM: .attribute 5, "rv32i2p1_smmpm1p0"
 ; RV32SSPM: .attribute 5, "rv32i2p1_sspm1p0"
 ; RV32SUPM: .attribute 5, "rv32i2p1_supm1p0"
+; RV32SMCTR: .attribute 5, "rv32i2p1_smctr1p0_sscsrind1p0"
+; RV32SSCTR: .attribute 5, "rv32i2p1_sscsrind1p0_ssctr1p0"
 
 ; RV64M: .attribute 5, "rv64i2p1_m2p0_zmmul1p0"
 ; RV64ZMMUL: .attribute 5, "rv64i2p1_zmmul1p0"
@@ -559,6 +565,8 @@
 ; RV64SMMPM: .attribute 5, "rv64i2p1_smmpm1p0"
 ; RV64SSPM: .attribute 5, "rv64i2p1_sspm1p0"
 ; RV64SUPM: .attribute 5, "rv64i2p1_supm1p0"
+; RV64SMCTR: .attribute 5, "rv64i2p1_smctr1p0_sscsrind1p0"
+; RV64SSCTR: .attribute 5, "rv64i2p1_sscsrind1p0_ssctr1p0"
 
 ; RVI20U32: .attribute 5, "rv32i2p1"
 ; RVI20U64: .attribute 5, "rv64i2p1"
diff --git a/llvm/test/MC/RISCV/attribute-arch.s b/llvm/test/MC/RISCV/attribute-arch.s
index 0ba15cfd489cb1..1c0b2a59d0693f 100644
--- a/llvm/test/MC/RISCV/attribute-arch.s
+++ b/llvm/test/MC/RISCV/attribute-arch.s
@@ -446,3 +446,9 @@
 
 .attribute arch, "rv64i_supm1p0"
 # CHECK: attribute      5, "rv64i2p1_supm1p0"
+
+.attribute arch, "rv32i_smctr1p0"
+# CHECK: attribute      5, "rv32i2p1_smctr1p0_sscsrind1p0"
+
+.attribute arch, "rv32i_ssctr1p0"
+# CHECK: attribute      5, "rv32i2p1_sscsrind1p0_ssctr1p0"
diff --git a/llvm/test/MC/RISCV/hypervisor-csr-names.s b/llvm/test/MC/RISCV/hypervisor-csr-names.s
index 950570c74746a9..2f29e5dacbeb95 100644
--- a/llvm/test/MC/RISCV/hypervisor-csr-names.s
+++ b/llvm/test/MC/RISCV/hypervisor-csr-names.s
@@ -633,3 +633,20 @@ csrrs t2, 0x25C, zero
 csrrs t1, vstopi, zero
 # uimm12
 csrrs t2, 0xEB0, zero
+
+##################################
+# Control Transfer Records
+##################################
+
+# vsctrctl
+# name
+# CHECK-INST: csrrs t1, vsctrctl, zero
+# CHECK-ENC: encoding: [0x73,0x23,0xe0,0x24]
+# CHECK-INST-ALIAS: csrr t1, vsctrctl
+# uimm12
+# CHECK-INST: csrrs t2, vsctrctl, zero
+# CHECK-ENC: encoding: [0xf3,0x23,0xe0,0x24]
+# CHECK-INST-ALIAS: csrr t2, vsctrctl
+csrrs t1, vsctrctl, zero
+# uimm12
+csrrs t2, 0x24E, zero
diff --git a/llvm/test/MC/RISCV/machine-csr-names.s b/llvm/test/MC/RISCV/machine-csr-names.s
index 5f668aea00485d..ae1af1fc8abc35 100644
--- a/llvm/test/MC/RISCV/machine-csr-names.s
+++ b/llvm/test/MC/RISCV/machine-csr-names.s
@@ -2568,3 +2568,20 @@ csrrs t2, 0x308, zero
 csrrs t1, mvip, zero
 # uimm12
 csrrs t2, 0x309, zero
+
+##################################
+# Control Transfer Records
+##################################
+
+# mctrctl
+# name
+# CHECK-INST: csrrs t1, mctrctl, zero
+# CHECK-ENC: encoding: [0x73,0x23,0xe0,0x34]
+# CHECK-INST-ALIAS: csrr t1, mctrctl
+# uimm12
+# CHECK-INST: csrrs t2, mctrctl, zero
+# CHECK-ENC: encoding: [0xf3,0x23,0xe0,0x34]
+# CHECK-INST-ALIAS: csrr t2, mctrctl
+csrrs t1, mctrctl, zero
+# uimm12
+csrrs t2, 0x34E, zero
diff --git a/llvm/test/MC/RISCV/smctr-ssctr-valid.s b/llvm/test/MC/RISCV/smctr-ssctr-valid.s
new file mode 100644
index 00000000000000..0b4fe47ae33f4b
--- /dev/null
+++ b/llvm/test/MC/RISCV/smctr-ssctr-valid.s
@@ -0,0 +1,30 @@
+# RUN: llvm-mc %s -triple=riscv32 -mattr=+experimental-smctr -riscv-no-aliases -show-encoding \
+# RUN:     | FileCheck -check-prefixes=CHECK,CHECK-INST %s
+# RUN: llvm-mc %s -triple=riscv64 -mattr=+experimental-smctr -riscv-no-aliases -show-encoding \
+# RUN:     | FileCheck -check-prefixes=CHECK,CHECK-INST %s
+# RUN: llvm-mc %s -triple=riscv32 -mattr=+experimental-ssctr -riscv-no-aliases -show-encoding \
+# RUN:     | FileCheck -check-prefixes=CHECK,CHECK-INST %s
+# RUN: llvm-mc %s -triple=riscv64 -mattr=+experimental-ssctr -riscv-no-aliases -show-encoding \
+# RUN:     | FileCheck -check-prefixes=CHECK,CHECK-INST %s
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+experimental-smctr < %s \
+# RUN:     | llvm-objdump --mattr=+experimental-smctr -M no-aliases -d - \
+# RUN:     | FileCheck -check-prefix=CHECK-INST %s
+# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+experimental-smctr < %s \
+# RUN:     | llvm-objdump --mattr=+experimental-smctr -M no-aliases -d - \
+# RUN:     | FileCheck -check-prefix=CHECK-INST %s
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+experimental-ssctr < %s \
+# RUN:     | llvm-objdump --mattr=+experimental-ssctr -M no-aliases -d - \
+# RUN:     | FileCheck -check-prefix=CHECK-INST %s
+# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+experimental-ssctr < %s \
+# RUN:     | llvm-objdump --mattr=+experimental-ssctr -M no-aliases -d - \
+# RUN:     | FileCheck -check-prefix=CHECK-INST %s
+
+# RUN: not llvm-mc -triple riscv32 -riscv-no-aliases -show-encoding < %s 2>&1 \
+# RUN:     | FileCheck -check-prefixes=CHECK-NO-EXT %s
+# RUN: not llvm-mc -triple riscv64 -defsym=RV64=1 -riscv-no-aliases -show-encoding < %s 2>&1 \
+# RUN:     | FileCheck -check-prefixes=CHECK-NO-EXT %s
+
+# CHECK-INST: sctrclr
+# CHECK: encoding: [0x73,0x00,0x40,0x10]
+# CHECK-NO-EXT: error: instruction requires the following: 'Smctr' (Control Transfer Records Machine Level) or 'Ssctr' (Control Transfer Records Supervisor Level){{$}}
+sctrclr
diff --git a/llvm/test/MC/RISCV/supervisor-csr-names.s b/llvm/test/MC/RISCV/supervisor-csr-names.s
index 481f11e0082b8d..db0fcb381ef2a4 100644
--- a/llvm/test/MC/RISCV/supervisor-csr-names.s
+++ b/llvm/test/MC/RISCV/supervisor-csr-names.s
@@ -457,3 +457,46 @@ csrrs t2, 0xDB0, zero
 csrrs t1, scountinhibit, zero
 # uimm12
 csrrs t2, 0x120, zero
+
+##################################
+# Control Transfer Records
+##################################
+
+# sctrctl
+# name
+# CHECK-INST: csrrs t1, sctrctl, zero
+# CHECK-ENC: encoding: [0x73,0x23,0xe0,0x14]
+# CHECK-INST-ALIAS: csrr t1, sctrctl
+# uimm12
+# CHECK-INST: csrrs t2, sctrctl, zero
+# CHECK-ENC: encoding: [0xf3,0x23,0xe0,0x14]
+# CHECK-INST-ALIAS: csrr t2, sctrctl
+csrrs t1, sctrctl, zero
+# uimm12
+csrrs t2, 0x14E, zero
+
+# sctrstatus
+# name
+# CHECK-INST: csrrs t1, sctrstatus, zero
+# CHECK-ENC: encoding: [0x73,0x23,0xf0,0x14]
+# CHECK-INST-ALIAS: csrr t1, sctrstatus
+# uimm12
+# CHECK-INST: csrrs t2, sctrstatus, zero
+# CHECK-ENC: encoding: [0xf3,0x23,0xf0,0x14]
+# CHECK-INST-ALIAS: csrr t2, sctrstatus
+csrrs t1, sctrstatus, zero
+# uimm12
+csrrs t2, 0x14F, zero
+
+# sctrdepth
+# name
+# CHECK-INST: csrrs t1, sctrdepth, zero
+# CHECK-ENC: encoding: [0x73,0x23,0xf0,0x15]
+# CHECK-INST-ALIAS: csrr t1, sctrdepth
+# uimm12
+# CHECK-INST: csrrs t2, sctrdepth, zero
+# CHECK-ENC: encoding: [0xf3,0x23,0xf0,0x15]
+# CHECK-INST-ALIAS: csrr t2, sctrdepth
+csrrs t1, sctrdepth, zero
+# uimm12
+csrrs t2, 0x15F, zero
diff --git a/llvm/unittests/TargetParser/RISCVISAInfoTest.cpp b/llvm/unittests/TargetParser/RISCVISAInfoTest.cpp
index 6172e48c484ce8..6662421eb26d9d 100644
--- a/llvm/unittests/TargetParser/RISCVISAInfoTest.cpp
+++ b/llvm/unittests/TargetParser/RISCVISAInfoTest.cpp
@@ -1120,8 +1120,10 @@ Experimental extensions
     zalasr               0.1
     zvbc32e              0.7
     zvkgs                0.7
+    smctr                1.0
     smmpm                1.0
     smnpm                1.0
+    ssctr                1.0
     ssnpm                1.0
     sspm                 1.0
     supm                 1.0

From 65f66d2c605f0c9b0af26244f4d42ca93f552ec8 Mon Sep 17 00:00:00 2001
From: "Ivan R. Ivanov" <ivanov.i.aa at m.titech.ac.jp>
Date: Wed, 21 Aug 2024 22:59:11 +0900
Subject: [PATCH 075/116] [flang][NFC] Move OpenMP related passes into a
 separate directory (#104732)

Reapplied with fixed library dependencies for shared lib build
---
 flang/docs/OpenMP-declare-target.md           |  4 +-
 flang/docs/OpenMP-descriptor-management.md    |  4 +-
 flang/include/flang/Optimizer/CMakeLists.txt  |  1 +
 .../flang/Optimizer/OpenMP/CMakeLists.txt     |  4 ++
 flang/include/flang/Optimizer/OpenMP/Passes.h | 30 ++++++++++++++
 .../include/flang/Optimizer/OpenMP/Passes.td  | 40 +++++++++++++++++++
 .../flang/Optimizer/Transforms/Passes.td      | 26 ------------
 flang/include/flang/Tools/CLOptions.inc       |  7 ++--
 flang/lib/Frontend/CMakeLists.txt             |  1 +
 flang/lib/Optimizer/CMakeLists.txt            |  1 +
 flang/lib/Optimizer/OpenMP/CMakeLists.txt     | 26 ++++++++++++
 .../FunctionFiltering.cpp}                    | 18 ++++-----
 .../MapInfoFinalization.cpp}                  | 21 +++++-----
 .../MarkDeclareTarget.cpp}                    | 26 ++++++++----
 flang/lib/Optimizer/Transforms/CMakeLists.txt |  3 --
 flang/tools/bbc/CMakeLists.txt                |  1 +
 flang/tools/fir-opt/CMakeLists.txt            |  1 +
 flang/tools/fir-opt/fir-opt.cpp               |  2 +
 flang/tools/tco/CMakeLists.txt                |  1 +
 19 files changed, 154 insertions(+), 63 deletions(-)
 create mode 100644 flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Passes.h
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Passes.td
 create mode 100644 flang/lib/Optimizer/OpenMP/CMakeLists.txt
 rename flang/lib/Optimizer/{Transforms/OMPFunctionFiltering.cpp => OpenMP/FunctionFiltering.cpp} (90%)
 rename flang/lib/Optimizer/{Transforms/OMPMapInfoFinalization.cpp => OpenMP/MapInfoFinalization.cpp} (96%)
 rename flang/lib/Optimizer/{Transforms/OMPMarkDeclareTarget.cpp => OpenMP/MarkDeclareTarget.cpp} (80%)

diff --git a/flang/docs/OpenMP-declare-target.md b/flang/docs/OpenMP-declare-target.md
index d29a46807e1eaf..45062469007b65 100644
--- a/flang/docs/OpenMP-declare-target.md
+++ b/flang/docs/OpenMP-declare-target.md
@@ -149,7 +149,7 @@ flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.
 
 There are currently two passes within Flang that are related to the processing 
 of `declare target`:
-* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured
+* `MarkDeclareTarget` - This pass is in charge of marking functions captured
 (called from) in `target` regions or other `declare target` marked functions as
 `declare target`. It does so recursively, i.e. nested calls will also be 
 implicitly marked. It currently will try to mark things as conservatively as 
@@ -157,7 +157,7 @@ possible, e.g. if captured in a `target` region it will apply `nohost`, unless
 it encounters a `host` `declare target` in which case it will apply the `any` 
 device type. Functions are handled similarly, except we utilise the parent's 
 device type where possible.
-* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget`
+* `FunctionFiltering` - This is executed after the `MarkDeclareTarget`
 pass, and its job is to conservatively remove host functions from
 the module where possible when compiling for the device. This helps make 
 sure that most incompatible code for the host is not lowered for the 
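The marking rule described for `MarkDeclareTarget` can be modelled in a few lines of C++. In this minimal, hypothetical sketch, device types are propagated transitively along call edges, and a function reached with two different device types is widened to "any". `DeviceType`, `CallGraph`, `Marks`, `merge` and `markNested` are made-up names, not flang or MLIR APIs. The widening rule is an assumption drawn from the prose above. The sketch also ignores the capture clause that the real `markNestedFuncs` tracks alongside the device type.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model, not the pass's data structures.
enum class DeviceType { Host, NoHost, Any };
using CallGraph = std::map<std::string, std::vector<std::string>>; // caller -> callees
using Marks = std::map<std::string, DeviceType>;                 // implicit marks

// Assumed widening rule: the same type is kept, a disagreement becomes Any.
DeviceType merge(std::optional<DeviceType> existing, DeviceType incoming) {
  if (!existing || *existing == incoming)
    return incoming;
  return DeviceType::Any;
}

// Push parentTy to every function reachable from fn; each callee passes its own
// (possibly widened) type on to its callees. Recursion stops once a mark stops
// changing, which also terminates on recursive call cycles.
void markNested(const CallGraph &cg, const std::string &fn,
                DeviceType parentTy, Marks &marks) {
  auto it = cg.find(fn);
  if (it == cg.end())
    return;
  for (const std::string &callee : it->second) {
    std::optional<DeviceType> old;
    if (auto m = marks.find(callee); m != marks.end())
      old = m->second;
    DeviceType updated = merge(old, parentTy);
    if (old && *old == updated)
      continue; // nothing new to propagate
    marks[callee] = updated;
    markNested(cg, callee, updated, marks);
  }
}
```

Stopping when a mark no longer changes, rather than keeping a visited set, lets a function be revisited when it is later widened. It still terminates on cycles because each mark can change at most twice (unset, then a concrete type, then `Any`).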
diff --git a/flang/docs/OpenMP-descriptor-management.md b/flang/docs/OpenMP-descriptor-management.md
index d0eb01b00f9bb9..66c153914f70da 100644
--- a/flang/docs/OpenMP-descriptor-management.md
+++ b/flang/docs/OpenMP-descriptor-management.md
@@ -44,7 +44,7 @@ Currently, Flang will lower these descriptor types in the OpenMP lowering (lower
 to all other map types, generating an omp.MapInfoOp containing relevant information required for lowering
 the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. However, after 
 the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass for Fortran, 
-`OMPMapInfoFinalizationPass` (Optimizer/OMPMapInfoFinalization.cpp) will expand the 
+`MapInfoFinalizationPass` (Optimizer/OpenMP/MapInfoFinalization.cpp) will expand the 
 `omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` or `BoxAddrOp`) into multiple 
 mappings, with one extra per pointer member in the descriptor that is supported on top of the original
 descriptor map operation. These pointers members are linked to the parent descriptor by adding them to 
@@ -53,7 +53,7 @@ owning operation's (`omp.TargetOp`, `omp.TargetDataOp` etc.) map operand list an
 operation is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to canonicalize the mappings and
 simplify lowering.
 
-An example transformation by the `OMPMapInfoFinalizationPass`:
+An example transformation by the `MapInfoFinalizationPass`:
 
 ```
 
diff --git a/flang/include/flang/Optimizer/CMakeLists.txt b/flang/include/flang/Optimizer/CMakeLists.txt
index 89e43a9ee8d621..3336ac935e1012 100644
--- a/flang/include/flang/Optimizer/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/CMakeLists.txt
@@ -2,3 +2,4 @@ add_subdirectory(CodeGen)
 add_subdirectory(Dialect)
 add_subdirectory(HLFIR)
 add_subdirectory(Transforms)
+add_subdirectory(OpenMP)
diff --git a/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt b/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
new file mode 100644
index 00000000000000..d59573f0f7fd91
--- /dev/null
+++ b/flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
@@ -0,0 +1,4 @@
+set(LLVM_TARGET_DEFINITIONS Passes.td)
+mlir_tablegen(Passes.h.inc -gen-pass-decls -name FlangOpenMP)
+
+add_public_tablegen_target(FlangOpenMPPassesIncGen)
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h b/flang/include/flang/Optimizer/OpenMP/Passes.h
new file mode 100644
index 00000000000000..403d79667bf448
--- /dev/null
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -0,0 +1,30 @@
+//===- Passes.h - OpenMP pass entry points ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This header declares the flang OpenMP passes.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_OPENMP_PASSES_H
+#define FORTRAN_OPTIMIZER_OPENMP_PASSES_H
+
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Pass/PassRegistry.h"
+
+#include <memory>
+
+namespace flangomp {
+#define GEN_PASS_DECL
+#define GEN_PASS_REGISTRATION
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+
+} // namespace flangomp
+
+#endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td
new file mode 100644
index 00000000000000..395178e26a5762
--- /dev/null
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -0,0 +1,40 @@
+//===-- Passes.td - flang OpenMP pass definition -----------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_OPENMP_PASSES
+#define FORTRAN_OPTIMIZER_OPENMP_PASSES
+
+include "mlir/Pass/PassBase.td"
+
+def MapInfoFinalizationPass
+    : Pass<"omp-map-info-finalization"> {
+  let summary = "expands OpenMP MapInfo operations containing descriptors";
+  let description = [{
+    Expands MapInfo operations containing descriptor types into multiple
+    MapInfo's for each pointer element in the descriptor that requires
+    explicit individual mapping by the OpenMP runtime.
+  }];
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
+def MarkDeclareTargetPass
+    : Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
+  let summary = "Marks all functions called by an OpenMP declare target function as declare target";
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
+def FunctionFiltering : Pass<"omp-function-filtering"> {
+  let summary = "Filters out functions intended for the host when compiling "
+                "for the target device.";
+  let dependentDialects = [
+    "mlir::func::FuncDialect",
+    "fir::FIROpsDialect"
+  ];
+}
+
+#endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
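The expansion performed by `MapInfoFinalizationPass` can be made concrete with a simplified sketch. `MapInfo` and `OwningOp` are hypothetical stand-ins for `mlir::omp::MapInfoOp` and an owning op such as `omp.TargetOp`. `expandDescriptorMap` is an illustrative name, and `base_addr` in the usage below is an illustrative member name. As in the descriptor-management document, each pointer member gets its own map. That map is linked to the parent through the parent's member list and added to the owning op's map operands, with a block argument added as well when the owner is `IsolatedFromAbove`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a MapInfoOp and its owning op.
struct MapInfo {
  std::string name;                 // what is being mapped
  std::vector<std::size_t> members; // indices of member maps (cf. getMembers())
};

struct OwningOp {
  std::vector<std::size_t> mapOperands; // indices into the map table
  std::size_t numBlockArgs = 0;
  bool isolatedFromAbove = true;        // e.g. a target region
};

// Expand one descriptor map: one new map per pointer member, linked to the
// parent through `members` and also added to the owning op's operand list.
void expandDescriptorMap(std::vector<MapInfo> &maps, std::size_t descIdx,
                         const std::vector<std::string> &pointerMembers,
                         OwningOp &owner) {
  for (const std::string &member : pointerMembers) {
    // The new element is built before push_back, so reading maps[descIdx] is safe.
    maps.push_back(MapInfo{maps[descIdx].name + "." + member, {}});
    std::size_t newIdx = maps.size() - 1;
    maps[descIdx].members.push_back(newIdx);
    owner.mapOperands.push_back(newIdx);
    if (owner.isolatedFromAbove)
      ++owner.numBlockArgs; // the region needs a block argument for the new map
  }
}
```

Indices are used instead of pointers so that growing `maps` cannot leave dangling references to the parent entry.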
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index a0211384667ed1..49bd4f5349a754 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -358,32 +358,6 @@ def LoopVersioning : Pass<"loop-versioning", "mlir::func::FuncOp"> {
   let dependentDialects = [ "fir::FIROpsDialect" ];
 }
 
-def OMPMapInfoFinalizationPass
-    : Pass<"omp-map-info-finalization"> {
-  let summary = "expands OpenMP MapInfo operations containing descriptors";
-  let description = [{
-    Expands MapInfo operations containing descriptor types into multiple 
-    MapInfo's for each pointer element in the descriptor that requires 
-    explicit individual mapping by the OpenMP runtime.
-  }];
-  let dependentDialects = ["mlir::omp::OpenMPDialect"];
-}
-
-def OMPMarkDeclareTargetPass
-    : Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {
-  let summary = "Marks all functions called by an OpenMP declare target function as declare target";
-  let dependentDialects = ["mlir::omp::OpenMPDialect"];
-}
-
-def OMPFunctionFiltering : Pass<"omp-function-filtering"> {
-  let summary = "Filters out functions intended for the host when compiling "
-                "for the target device.";
-  let dependentDialects = [
-    "mlir::func::FuncDialect",
-    "fir::FIROpsDialect"
-  ];
-}
-
 def VScaleAttr : Pass<"vscale-attr", "mlir::func::FuncOp"> {
   let summary = "Add vscale_range attribute to functions";
   let description = [{
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index 57b90017d052e4..1881e23b00045a 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -17,6 +17,7 @@
 #include "mlir/Transforms/Passes.h"
 #include "flang/Optimizer/CodeGen/CodeGen.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Transforms/Passes.h"
 #include "llvm/Passes/OptimizationLevel.h"
 #include "llvm/Support/CommandLine.h"
@@ -367,10 +368,10 @@ inline void createHLFIRToFIRPassPipeline(
 inline void createOpenMPFIRPassPipeline(
     mlir::PassManager &pm, bool isTargetDevice) {
   addNestedPassToAllTopLevelOperations(
-      pm, fir::createOMPMapInfoFinalizationPass);
-  pm.addPass(fir::createOMPMarkDeclareTargetPass());
+      pm, flangomp::createMapInfoFinalizationPass);
+  pm.addPass(flangomp::createMarkDeclareTargetPass());
   if (isTargetDevice)
-    pm.addPass(fir::createOMPFunctionFiltering());
+    pm.addPass(flangomp::createFunctionFiltering());
 }
 
 #if !defined(FLANG_EXCLUDE_CODEGEN)
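The rename leaves the registered pass arguments in `Passes.td` unchanged (`omp-map-info-finalization`, `omp-mark-declare-target`, `omp-function-filtering`). The order in which `createOpenMPFIRPassPipeline` schedules the three passes can therefore be summarized in terms of those arguments. `openMPFIRPassOrder` is a hypothetical helper written for illustration; it is not a flang API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the pass arguments (from Passes.td) in the order that
// createOpenMPFIRPassPipeline adds the corresponding passes.
std::vector<std::string> openMPFIRPassOrder(bool isTargetDevice) {
  std::vector<std::string> passes = {
      "omp-map-info-finalization", // nested on every top-level operation
      "omp-mark-declare-target",   // runs on the mlir::ModuleOp
  };
  if (isTargetDevice)
    passes.push_back("omp-function-filtering"); // device compilation only
  return passes;
}
```

Function filtering comes last, so it only sees functions after `MarkDeclareTarget` has propagated the implicit markings. This matches the ordering stated in `OpenMP-declare-target.md`.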
diff --git a/flang/lib/Frontend/CMakeLists.txt b/flang/lib/Frontend/CMakeLists.txt
index c20b9096aff496..ecdcc73d61ec1f 100644
--- a/flang/lib/Frontend/CMakeLists.txt
+++ b/flang/lib/Frontend/CMakeLists.txt
@@ -38,6 +38,7 @@ add_flang_library(flangFrontend
   FIRTransforms
   HLFIRDialect
   HLFIRTransforms
+  FlangOpenMPTransforms
   MLIRTransforms
   MLIRBuiltinToLLVMIRTranslation
   MLIRLLVMToLLVMIRTranslation
diff --git a/flang/lib/Optimizer/CMakeLists.txt b/flang/lib/Optimizer/CMakeLists.txt
index 4a602162ed2b77..dd153ac33c0fbb 100644
--- a/flang/lib/Optimizer/CMakeLists.txt
+++ b/flang/lib/Optimizer/CMakeLists.txt
@@ -5,3 +5,4 @@ add_subdirectory(HLFIR)
 add_subdirectory(Support)
 add_subdirectory(Transforms)
 add_subdirectory(Analysis)
+add_subdirectory(OpenMP)
diff --git a/flang/lib/Optimizer/OpenMP/CMakeLists.txt b/flang/lib/Optimizer/OpenMP/CMakeLists.txt
new file mode 100644
index 00000000000000..92051634f0378b
--- /dev/null
+++ b/flang/lib/Optimizer/OpenMP/CMakeLists.txt
@@ -0,0 +1,26 @@
+get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
+
+add_flang_library(FlangOpenMPTransforms
+  FunctionFiltering.cpp
+  MapInfoFinalization.cpp
+  MarkDeclareTarget.cpp
+
+  DEPENDS
+  FIRDialect
+  HLFIROpsIncGen
+  FlangOpenMPPassesIncGen
+
+  LINK_LIBS
+  FIRAnalysis
+  FIRBuilder
+  FIRCodeGen
+  FIRDialect
+  FIRDialectSupport
+  FIRSupport
+  FortranCommon
+  MLIRFuncDialect
+  MLIROpenMPDialect
+  HLFIRDialect
+  MLIRIR
+  MLIRPass
+)
diff --git a/flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
similarity index 90%
rename from flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
rename to flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
index 0c472246c2a44c..bd9005d3e2df6f 100644
--- a/flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
+++ b/flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
@@ -1,4 +1,4 @@
-//===- OMPFunctionFiltering.cpp -------------------------------------------===//
+//===- FunctionFiltering.cpp -------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -13,7 +13,7 @@
 
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
-#include "flang/Optimizer/Transforms/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
@@ -21,18 +21,18 @@
 #include "mlir/IR/BuiltinOps.h"
 #include "llvm/ADT/SmallVector.h"
 
-namespace fir {
-#define GEN_PASS_DEF_OMPFUNCTIONFILTERING
-#include "flang/Optimizer/Transforms/Passes.h.inc"
-} // namespace fir
+namespace flangomp {
+#define GEN_PASS_DEF_FUNCTIONFILTERING
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
 
 using namespace mlir;
 
 namespace {
-class OMPFunctionFilteringPass
-    : public fir::impl::OMPFunctionFilteringBase<OMPFunctionFilteringPass> {
+class FunctionFilteringPass
+    : public flangomp::impl::FunctionFilteringBase<FunctionFilteringPass> {
 public:
-  OMPFunctionFilteringPass() = default;
+  FunctionFilteringPass() = default;
 
   void runOnOperation() override {
     MLIRContext *context = &getContext();
diff --git a/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
similarity index 96%
rename from flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
rename to flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index ddaa3c5f404f0b..6e9cd03dca8f3f 100644
--- a/flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -1,5 +1,4 @@
-//===- OMPMapInfoFinalization.cpp
-//---------------------------------------------------===//
+//===- MapInfoFinalization.cpp -----------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -28,7 +27,7 @@
 #include "flang/Optimizer/Builder/FIRBuilder.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/Support/KindMapping.h"
-#include "flang/Optimizer/Transforms/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/BuiltinDialect.h"
@@ -41,15 +40,15 @@
 #include "llvm/Frontend/OpenMP/OMPConstants.h"
 #include <iterator>
 
-namespace fir {
-#define GEN_PASS_DEF_OMPMAPINFOFINALIZATIONPASS
-#include "flang/Optimizer/Transforms/Passes.h.inc"
-} // namespace fir
+namespace flangomp {
+#define GEN_PASS_DEF_MAPINFOFINALIZATIONPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
 
 namespace {
-class OMPMapInfoFinalizationPass
-    : public fir::impl::OMPMapInfoFinalizationPassBase<
-          OMPMapInfoFinalizationPass> {
+class MapInfoFinalizationPass
+    : public flangomp::impl::MapInfoFinalizationPassBase<
+          MapInfoFinalizationPass> {
 
   void genDescriptorMemberMaps(mlir::omp::MapInfoOp op,
                                fir::FirOpBuilder &builder,
@@ -245,7 +244,7 @@ class OMPMapInfoFinalizationPass
       // all users appropriately, making sure to only add a single member link
       // per new generation for the original originating descriptor MapInfoOp.
       assert(llvm::hasSingleElement(op->getUsers()) &&
-             "OMPMapInfoFinalization currently only supports single users "
+             "MapInfoFinalization currently only supports single users "
              "of a MapInfoOp");
 
       if (!op.getMembers().empty()) {
diff --git a/flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp b/flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp
similarity index 80%
rename from flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
rename to flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp
index 4946e13b22865d..a7ffd5fda82b7f 100644
--- a/flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
+++ b/flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp
@@ -1,4 +1,16 @@
-#include "flang/Optimizer/Transforms/Passes.h"
+//===- MarkDeclareTarget.cpp -------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Mark functions called from explicit target code as implicitly declare target.
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
@@ -10,14 +22,14 @@
 #include "mlir/Support/LLVM.h"
 #include "llvm/ADT/SmallPtrSet.h"
 
-namespace fir {
-#define GEN_PASS_DEF_OMPMARKDECLARETARGETPASS
-#include "flang/Optimizer/Transforms/Passes.h.inc"
-} // namespace fir
+namespace flangomp {
+#define GEN_PASS_DEF_MARKDECLARETARGETPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
 
 namespace {
-class OMPMarkDeclareTargetPass
-    : public fir::impl::OMPMarkDeclareTargetPassBase<OMPMarkDeclareTargetPass> {
+class MarkDeclareTargetPass
+    : public flangomp::impl::MarkDeclareTargetPassBase<MarkDeclareTargetPass> {
 
   void markNestedFuncs(mlir::omp::DeclareTargetDeviceType parentDevTy,
                        mlir::omp::DeclareTargetCaptureClause parentCapClause,
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index bf0a8d14d95df6..b32f2ef86fca44 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -22,9 +22,6 @@ add_flang_library(FIRTransforms
   AddDebugInfo.cpp
   PolymorphicOpConversion.cpp
   LoopVersioning.cpp
-  OMPFunctionFiltering.cpp
-  OMPMapInfoFinalization.cpp
-  OMPMarkDeclareTarget.cpp
   StackReclaim.cpp
   VScaleAttr.cpp
   FunctionAttr.cpp
diff --git a/flang/tools/bbc/CMakeLists.txt b/flang/tools/bbc/CMakeLists.txt
index 9410fd00566006..69316d4dc61de3 100644
--- a/flang/tools/bbc/CMakeLists.txt
+++ b/flang/tools/bbc/CMakeLists.txt
@@ -25,6 +25,7 @@ FIRTransforms
 FIRBuilder
 HLFIRDialect
 HLFIRTransforms
+FlangOpenMPTransforms
 ${dialect_libs}
 ${extension_libs}
 MLIRAffineToStandard
diff --git a/flang/tools/fir-opt/CMakeLists.txt b/flang/tools/fir-opt/CMakeLists.txt
index 43679a9d535782..4c6dbf7d9c8c37 100644
--- a/flang/tools/fir-opt/CMakeLists.txt
+++ b/flang/tools/fir-opt/CMakeLists.txt
@@ -19,6 +19,7 @@ target_link_libraries(fir-opt PRIVATE
   FIRCodeGen
   HLFIRDialect
   HLFIRTransforms
+  FlangOpenMPTransforms
   FIRAnalysis
   ${test_libs}
   ${dialect_libs}
diff --git a/flang/tools/fir-opt/fir-opt.cpp b/flang/tools/fir-opt/fir-opt.cpp
index 1846c1b317848f..f75fba27c68f08 100644
--- a/flang/tools/fir-opt/fir-opt.cpp
+++ b/flang/tools/fir-opt/fir-opt.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Tools/mlir-opt/MlirOptMain.h"
 #include "flang/Optimizer/CodeGen/CodeGen.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Support/InitFIR.h"
 #include "flang/Optimizer/Transforms/Passes.h"
 
@@ -34,6 +35,7 @@ int main(int argc, char **argv) {
   fir::registerOptCodeGenPasses();
   fir::registerOptTransformPasses();
   hlfir::registerHLFIRPasses();
+  flangomp::registerFlangOpenMPPasses();
 #ifdef FLANG_INCLUDE_TESTS
   fir::test::registerTestFIRAliasAnalysisPass();
   mlir::registerSideEffectTestPasses();
diff --git a/flang/tools/tco/CMakeLists.txt b/flang/tools/tco/CMakeLists.txt
index 808219ac361f2a..698a398547c773 100644
--- a/flang/tools/tco/CMakeLists.txt
+++ b/flang/tools/tco/CMakeLists.txt
@@ -17,6 +17,7 @@ target_link_libraries(tco PRIVATE
   FIRBuilder
   HLFIRDialect
   HLFIRTransforms
+  FlangOpenMPTransforms
   ${dialect_libs}
   ${extension_libs}
   MLIRIR

From a55c1f2e2b62b8c6e2e374893a995c2908eb8d2f Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Wed, 31 Jul 2024 14:09:09 +0900
Subject: [PATCH 076/116] [MLIR][omp] Add omp.workshare op

---
 .../Dialect/OpenMP/OpenMPClauseOperands.h     |  3 +++
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 22 +++++++++++++++++++
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp  | 13 +++++++++++
 3 files changed, 38 insertions(+)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
index 38e4d8f245e4fa..d14e5e17afbb08 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
@@ -17,6 +17,7 @@
 
 #include "mlir/IR/BuiltinAttributes.h"
 #include "llvm/ADT/SmallVector.h"
+#include <mlir/Dialect/OpenMP/OpenMPDialect.h>
 
 #include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"
 
@@ -316,6 +317,8 @@ using TeamsOperands =
     detail::Clauses<AllocateClauseOps, IfClauseOps, NumTeamsClauseOps,
                     PrivateClauseOps, ReductionClauseOps, ThreadLimitClauseOps>;
 
+using WorkshareOperands = detail::Clauses<NowaitClauseOps>;
+
 using WsloopOperands =
     detail::Clauses<AllocateClauseOps, LinearClauseOps, NowaitClauseOps,
                     OrderClauseOps, OrderedClauseOps, PrivateClauseOps,
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index d63fdd88f79104..e360018cde28cf 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -287,6 +287,28 @@ def SingleOp : OpenMP_Op<"single", traits = [
   let hasVerifier = 1;
 }
 
+//===----------------------------------------------------------------------===//
+// 2.8.3 Workshare Construct
+//===----------------------------------------------------------------------===//
+
+def WorkshareOp : OpenMP_Op<"workshare", clauses = [
+    OpenMP_NowaitClause,
+  ], singleRegion = true> {
+  let summary = "workshare directive";
+  let description = [{
+    The workshare construct divides the execution of the enclosed structured
+    block into separate units of work, and causes the threads of the team to
+    share the work such that each unit is executed only once by one thread, in
+    the context of its implicit task.
+  }] # clausesDescription;
+
+  let builders = [
+    OpBuilder<(ins CArg<"const WorkshareOperands &">:$clauses)>
+  ];
+
+  let hasVerifier = 1;
+}
+
 //===----------------------------------------------------------------------===//
 // Loop Nest
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 4c943ebbe3144f..b3489004a7ac2b 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1689,6 +1689,19 @@ LogicalResult SingleOp::verify() {
                                   getCopyprivateSyms());
 }
 
+//===----------------------------------------------------------------------===//
+// WorkshareOp
+//===----------------------------------------------------------------------===//
+
+void WorkshareOp::build(OpBuilder &builder, OperationState &state,
+                        const WorkshareOperands &clauses) {
+  WorkshareOp::build(builder, state, clauses.nowait);
+}
+
+LogicalResult WorkshareOp::verify() {
+  return (*this)->getRegion(0).getBlocks().size() == 1 ? success() : failure();
+}
+
 //===----------------------------------------------------------------------===//
 // WsloopOp
 //===----------------------------------------------------------------------===//

From ef2715c7b2213f382627ce06804a8f079a3fd561 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 16:10:25 +0900
Subject: [PATCH 077/116] Add custom omp loop wrapper

---
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index e360018cde28cf..ea302878923745 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -309,6 +309,17 @@ def WorkshareOp : OpenMP_Op<"workshare", clauses = [
   let hasVerifier = 1;
 }
 
+def WorkshareLoopWrapperOp : OpenMP_Op<"workshare_loop_wrapper", traits = [
+    DeclareOpInterfaceMethods<LoopWrapperInterface>,
+    RecursiveMemoryEffects, SingleBlock
+  ], singleRegion = true> {
+  let summary = "contains loop nests to be parallelized by workshare";
+
+  let builders = [
+    OpBuilder<(ins), [{ build($_builder, $_state, {}); }]>
+  ];
+}
+
 //===----------------------------------------------------------------------===//
 // Loop Nest
 //===----------------------------------------------------------------------===//

From 8cc6bc001073c1df4e2b766d0f8616a36fef07a7 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 16:08:58 +0900
Subject: [PATCH 078/116] Add recursive memory effects trait to workshare

---
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index ea302878923745..dbb25f7cc7dd00 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -291,7 +291,9 @@ def SingleOp : OpenMP_Op<"single", traits = [
 // 2.8.3 Workshare Construct
 //===----------------------------------------------------------------------===//
 
-def WorkshareOp : OpenMP_Op<"workshare", clauses = [
+def WorkshareOp : OpenMP_Op<"workshare", traits = [
+    RecursiveMemoryEffects,
+  ], clauses = [
     OpenMP_NowaitClause,
   ], singleRegion = true> {
   let summary = "workshare directive";

From bdf8397305a45c774fa1912292f415633a331b69 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 17:04:07 +0900
Subject: [PATCH 079/116] Remove stray include

---
 mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
index d14e5e17afbb08..896ca9581c3fc8 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h
@@ -17,7 +17,6 @@
 
 #include "mlir/IR/BuiltinAttributes.h"
 #include "llvm/ADT/SmallVector.h"
-#include <mlir/Dialect/OpenMP/OpenMPDialect.h>
 
 #include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"
 

From d443b23d136bdf2e95da488e87d5d95afce3719e Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 21:56:13 +0900
Subject: [PATCH 080/116] Remove omp.workshare verifier

---
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 2 --
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp  | 4 ----
 2 files changed, 6 deletions(-)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index dbb25f7cc7dd00..94806dc4cc5e34 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -307,8 +307,6 @@ def WorkshareOp : OpenMP_Op<"workshare", traits = [
   let builders = [
     OpBuilder<(ins CArg<"const WorkshareOperands &">:$clauses)>
   ];
-
-  let hasVerifier = 1;
 }
 
 def WorkshareLoopWrapperOp : OpenMP_Op<"workshare_loop_wrapper", traits = [
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index b3489004a7ac2b..230af836c1531e 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1698,10 +1698,6 @@ void WorkshareOp::build(OpBuilder &builder, OperationState &state,
   WorkshareOp::build(builder, state, clauses.nowait);
 }
 
-LogicalResult WorkshareOp::verify() {
-  return (*this)->getRegion(0).getBlocks().size() == 1 ? success() : failure();
-}
-
 //===----------------------------------------------------------------------===//
 // WsloopOp
 //===----------------------------------------------------------------------===//

From 7bd28715f6db2caa4b6c7a5293f03ec6676ea385 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Tue, 6 Aug 2024 13:41:22 +0900
Subject: [PATCH 081/116] Add assembly format for wrapper and add test

---
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td |  2 +-
 mlir/test/Dialect/OpenMP/ops.mlir             | 61 +++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 94806dc4cc5e34..b41c53ef282047 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -314,10 +314,10 @@ def WorkshareLoopWrapperOp : OpenMP_Op<"workshare_loop_wrapper", traits = [
     RecursiveMemoryEffects, SingleBlock
   ], singleRegion = true> {
   let summary = "contains loop nests to be parallelized by workshare";
-
   let builders = [
     OpBuilder<(ins), [{ build($_builder, $_state, {}); }]>
   ];
+  let assemblyFormat = "$region attr-dict";
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index 9ac97e069addd2..8ef49ddc807b6a 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -2810,3 +2810,64 @@ func.func @omp_target_private(%map1: memref<?xi32>, %map2: memref<?xi32>, %priv_
 
   return
 }
+
+// CHECK-LABEL: func @omp_workshare
+func.func @omp_workshare() {
+  // CHECK: omp.workshare {
+  omp.workshare {
+    "test.payload"() : () -> ()
+    // CHECK: omp.terminator
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: func @omp_workshare_nowait
+func.func @omp_workshare_nowait() {
+  // CHECK: omp.workshare nowait {
+  omp.workshare nowait {
+    "test.payload"() : () -> ()
+    // CHECK: omp.terminator
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: func @omp_workshare_multiple_blocks
+func.func @omp_workshare_multiple_blocks() {
+  // CHECK: omp.workshare {
+  omp.workshare {
+    cf.br ^bb2
+    ^bb2:
+    // CHECK: omp.terminator
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: func @omp_workshare_loop_wrapper
+func.func @omp_workshare_loop_wrapper(%idx : index) {
+  // CHECK-NEXT: omp.workshare_loop_wrapper
+  omp.workshare_loop_wrapper {
+    // CHECK-NEXT: omp.loop_nest
+    omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+      omp.yield
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: func @omp_workshare_loop_wrapper_attrs
+func.func @omp_workshare_loop_wrapper_attrs(%idx : index) {
+  // CHECK-NEXT: omp.workshare_loop_wrapper {
+  omp.workshare_loop_wrapper {
+    // CHECK-NEXT: omp.loop_nest
+    omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+      omp.yield
+    }
+    omp.terminator
+  // CHECK: } {attr_in_dict}
+  } {attr_in_dict}
+  return
+}

>From 85037970cd12e17ee0ca704557e5ff9d67e3be07 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 14:42:35 +0900
Subject: [PATCH 082/116] Add verification and descriptions

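The verifier added in this patch applies three structural checks to omp.workshare_loop_wrapper, in this order: the op must be a loop wrapper, the op it wraps must not itself be a wrapper, and an omp.workshare must enclose it somewhere up the parent chain. As a rough illustration only, the self-contained C++ sketch below replays those checks on a toy op tree. `ToyOp`, `isLoopWrapperName` and `verifyWorkshareLoopWrapper` are hypothetical names, not MLIR API. The wrapper test is a simplified assumption about `isWrapper()` (one wrapped op, either omp.loop_nest or another wrapper, followed by a terminator), and the set of wrapper op names is also assumed.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy op: a name, a parent pointer and the ops in its single
// region block (terminator included). Not MLIR API.
struct ToyOp {
  std::string name;
  ToyOp *parent = nullptr;
  std::vector<ToyOp *> body;
};

// Assumed set of op names that act as loop wrappers.
static bool isLoopWrapperName(const std::string &name) {
  static const std::set<std::string> wrappers = {
      "omp.simd", "omp.wsloop", "omp.distribute", "omp.taskloop",
      "omp.workshare_loop_wrapper"};
  return wrappers.count(name) != 0;
}

// Simplified isWrapper(): exactly one wrapped op followed by a terminator,
// where the wrapped op is omp.loop_nest or another wrapper.
static bool isWrapper(const ToyOp &op) {
  if (op.body.size() != 2 || op.body[1]->name != "omp.terminator")
    return false;
  const std::string &inner = op.body[0]->name;
  return inner == "omp.loop_nest" || isLoopWrapperName(inner);
}

// Same order of checks as WorkshareLoopWrapperOp::verify().
// Returns the error message, or std::nullopt if the op is valid.
static std::optional<std::string>
verifyWorkshareLoopWrapper(const ToyOp &op) {
  if (!isWrapper(op))
    return "must be a loop wrapper";
  if (isLoopWrapperName(op.body[0]->name))
    return "nested wrappers not supported";
  // Like getParentOfType, search every ancestor, not only the direct parent.
  for (const ToyOp *p = op.parent; p; p = p->parent)
    if (p->name == "omp.workshare")
      return std::nullopt;
  return "must be nested in an omp.workshare";
}
```

Each new case in invalid.mlir maps to one early return: @not_wrapper fails the first check, @nested_wrapper the second and @missing_workshare the third.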
---
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 10 +++++
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp  | 14 +++++++
 mlir/test/Dialect/OpenMP/invalid.mlir         | 42 +++++++++++++++++++
 mlir/test/Dialect/OpenMP/ops.mlir             | 34 +++++++++------
 4 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index b41c53ef282047..f8d7e934edbc61 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -302,6 +302,10 @@ def WorkshareOp : OpenMP_Op<"workshare", traits = [
     block into separate units of work, and causes the threads of the team to
     share the work such that each unit is executed only once by one thread, in
     the context of its implicit task
+
+    This operation is used for the intermediate representation of the workshare
+    block before the work gets divided between the threads. See the flang
+    LowerWorkshare pass for details.
   }] # clausesDescription;
 
   let builders = [
@@ -314,10 +318,16 @@ def WorkshareLoopWrapperOp : OpenMP_Op<"workshare_loop_wrapper", traits = [
     RecursiveMemoryEffects, SingleBlock
   ], singleRegion = true> {
   let summary = "contains loop nests to be parallelized by workshare";
+  let description = [{
+    This operation wraps a loop nest that is marked for dividing into units of
+    work by an encompassing omp.workshare operation.
+  }];
+
   let builders = [
     OpBuilder<(ins), [{ build($_builder, $_state, {}); }]>
   ];
   let assemblyFormat = "$region attr-dict";
+  let hasVerifier = 1;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 230af836c1531e..f4acbd97ca6d1a 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1698,6 +1698,20 @@ void WorkshareOp::build(OpBuilder &builder, OperationState &state,
   WorkshareOp::build(builder, state, clauses.nowait);
 }
 
+//===----------------------------------------------------------------------===//
+// WorkshareLoopWrapperOp
+//===----------------------------------------------------------------------===//
+
+LogicalResult WorkshareLoopWrapperOp::verify() {
+  if (!isWrapper())
+    return emitOpError() << "must be a loop wrapper";
+  if (getNestedWrapper())
+    return emitOpError() << "nested wrappers not supported";
+  if (!(*this)->getParentOfType<WorkshareOp>())
+    return emitOpError() << "must be nested in an omp.workshare";
+  return success();
+}
+
 //===----------------------------------------------------------------------===//
 // WsloopOp
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index c76b07ec94a597..a67dcb34427062 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -2545,3 +2545,45 @@ func.func @omp_taskloop_invalid_composite(%lb: index, %ub: index, %step: index)
   } {omp.composite}
   return
 }
+
+// -----
+func.func @nested_wrapper(%idx : index) {
+  omp.workshare {
+    // expected-error @below {{nested wrappers not supported}}
+    omp.workshare_loop_wrapper {
+      omp.simd {
+        omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+          omp.yield
+        }
+        omp.terminator
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// -----
+func.func @not_wrapper() {
+  omp.workshare {
+    // expected-error @below {{must be a loop wrapper}}
+    omp.workshare_loop_wrapper {
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// -----
+func.func @missing_workshare(%idx : index) {
+  // expected-error @below {{must be nested in an omp.workshare}}
+  omp.workshare_loop_wrapper {
+    omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+      omp.yield
+    }
+    omp.terminator
+  }
+  return
+}
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index 8ef49ddc807b6a..1e0ccfb79ed54e 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -2847,11 +2847,15 @@ func.func @omp_workshare_multiple_blocks() {
 
 // CHECK-LABEL: func @omp_workshare_loop_wrapper
 func.func @omp_workshare_loop_wrapper(%idx : index) {
-  // CHECK-NEXT: omp.workshare_loop_wrapper
-  omp.workshare_loop_wrapper {
-    // CHECK-NEXT: omp.loop_nest
-    omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
-      omp.yield
+  // CHECK-NEXT: omp.workshare {
+  omp.workshare {
+    // CHECK-NEXT: omp.workshare_loop_wrapper
+    omp.workshare_loop_wrapper {
+      // CHECK-NEXT: omp.loop_nest
+      omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+        omp.yield
+      }
+      omp.terminator
     }
     omp.terminator
   }
@@ -2860,14 +2864,18 @@ func.func @omp_workshare_loop_wrapper(%idx : index) {
 
 // CHECK-LABEL: func @omp_workshare_loop_wrapper_attrs
 func.func @omp_workshare_loop_wrapper_attrs(%idx : index) {
-  // CHECK-NEXT: omp.workshare_loop_wrapper {
-  omp.workshare_loop_wrapper {
-    // CHECK-NEXT: omp.loop_nest
-    omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
-      omp.yield
-    }
+  // CHECK-NEXT: omp.workshare {
+  omp.workshare {
+    // CHECK-NEXT: omp.workshare_loop_wrapper {
+    omp.workshare_loop_wrapper {
+      // CHECK-NEXT: omp.loop_nest
+      omp.loop_nest (%iv) : index = (%idx) to (%idx) step (%idx) {
+        omp.yield
+      }
+      omp.terminator
+    // CHECK: } {attr_in_dict}
+    } {attr_in_dict}
     omp.terminator
-  // CHECK: } {attr_in_dict}
-  } {attr_in_dict}
+  }
   return
 }

>From e5789180a3dd1fd8c46a5d7dfc446921110642ca Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Wed, 31 Jul 2024 14:11:47 +0900
Subject: [PATCH 083/116] [flang][omp] Emit omp.workshare in frontend

---
 flang/lib/Lower/OpenMP/OpenMP.cpp | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index d614db8b68ef65..83c90374afa5e3 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1272,6 +1272,15 @@ static void genTaskwaitClauses(lower::AbstractConverter &converter,
       loc, llvm::omp::Directive::OMPD_taskwait);
 }
 
+static void genWorkshareClauses(lower::AbstractConverter &converter,
+                                semantics::SemanticsContext &semaCtx,
+                                lower::StatementContext &stmtCtx,
+                                const List<Clause> &clauses, mlir::Location loc,
+                                mlir::omp::WorkshareOperands &clauseOps) {
+  ClauseProcessor cp(converter, semaCtx, clauses);
+  cp.processNowait(clauseOps);
+}
+
 static void genTeamsClauses(lower::AbstractConverter &converter,
                             semantics::SemanticsContext &semaCtx,
                             lower::StatementContext &stmtCtx,
@@ -1897,6 +1906,22 @@ genTaskyieldOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
   return converter.getFirOpBuilder().create<mlir::omp::TaskyieldOp>(loc);
 }
 
+static mlir::omp::WorkshareOp
+genWorkshareOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+               semantics::SemanticsContext &semaCtx,
+               lower::pft::Evaluation &eval, mlir::Location loc,
+               const ConstructQueue &queue, ConstructQueue::iterator item) {
+  lower::StatementContext stmtCtx;
+  mlir::omp::WorkshareOperands clauseOps;
+  genWorkshareClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps);
+
+  return genOpWithBody<mlir::omp::WorkshareOp>(
+      OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
+                        llvm::omp::Directive::OMPD_workshare)
+          .setClauses(&item->clauses),
+      queue, item, clauseOps);
+}
+
 static mlir::omp::TeamsOp
 genTeamsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
            semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
@@ -2309,10 +2334,7 @@ static void genOMPDispatch(lower::AbstractConverter &converter,
                   llvm::omp::getOpenMPDirectiveName(dir) + ")");
   // case llvm::omp::Directive::OMPD_workdistribute:
   case llvm::omp::Directive::OMPD_workshare:
-    // FIXME: Workshare is not a commonly used OpenMP construct, an
-    // implementation for this feature will come later. For the codes
-    // that use this construct, add a single construct for now.
-    genSingleOp(converter, symTable, semaCtx, eval, loc, queue, item);
+    genWorkshareOp(converter, symTable, semaCtx, eval, loc, queue, item);
     break;
   default:
     // Combined and composite constructs should have been split into a sequence

>From 70daa016c0c39861926b1b82e31b96db005cfba1 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 16:02:37 +0900
Subject: [PATCH 084/116] Fix lower test for workshare

---
 flang/test/Lower/OpenMP/workshare.f90 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/flang/test/Lower/OpenMP/workshare.f90 b/flang/test/Lower/OpenMP/workshare.f90
index 1e11677a15e1f0..8e771952f5b6da 100644
--- a/flang/test/Lower/OpenMP/workshare.f90
+++ b/flang/test/Lower/OpenMP/workshare.f90
@@ -6,7 +6,7 @@ subroutine sb1(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single  {
+!CHECK: omp.workshare {
   !$omp workshare
     arr = 0
   !$omp end workshare
@@ -20,7 +20,7 @@ subroutine sb2(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single nowait {
+!CHECK: omp.workshare nowait {
   !$omp workshare
     arr = 0
   !$omp end workshare nowait
@@ -33,7 +33,7 @@ subroutine sb2(arr)
 subroutine sb3(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
-!CHECK: omp.single  {
+!CHECK: omp.workshare  {
   !$omp parallel workshare
     arr = 0
   !$omp end parallel workshare

>From 81606df746e9862c330681ed8ae9113a43e577a2 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Wed, 31 Jul 2024 14:12:34 +0900
Subject: [PATCH 085/116] [flang] Introduce ws loop nest generation for HLFIR
 lowering

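With `emitWsLoop` set, genLoopNest emits a single omp.loop_nest whose bounds are pushed from `llvm::reverse(extents)`, so block argument `dim` ranges over `extents[n - 1 - dim]` and is stored into `oneBasedIndices[n - 1 - dim]`. This reproduces the ordering of the fir.do_loop path, where the outermost loop walks the last dimension and the first index varies fastest (column-major). The C++ sketch below models only that index bookkeeping, assuming the first loop_nest induction variable is the outermost one; `visitColumnMajor` is a hypothetical helper and builds no IR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: lists the one-based index tuples in the order a loop
// nest built like genLoopNest visits them. Builds no IR.
std::vector<std::vector<long>>
visitColumnMajor(const std::vector<long> &extents) {
  std::vector<std::vector<long>> visited;
  const std::size_t n = extents.size();
  if (n == 0)
    return visited;
  for (long e : extents)
    if (e < 1)
      return visited; // inclusive bounds 1..0: empty iteration space
  // Bounds in the order genLoopNest pushes them (reversed extents), so loop
  // level 0, the outermost, covers extents[n - 1].
  std::vector<long> bounds(extents.rbegin(), extents.rend());
  std::vector<long> iv(n, 1); // one-based induction variable per level
  while (true) {
    // Loop level `dim` feeds array dimension n - 1 - dim.
    std::vector<long> oneBased(n);
    for (std::size_t dim = 0; dim < n; ++dim)
      oneBased[n - dim - 1] = iv[dim];
    visited.push_back(oneBased);
    // Step the innermost level, carrying into outer levels when it wraps.
    std::size_t level = n;
    while (true) {
      --level;
      if (iv[level] < bounds[level]) {
        ++iv[level];
        break;
      }
      iv[level] = 1;
      if (level == 0)
        return visited; // outermost level wrapped: iteration is complete
    }
  }
}
```

For extents {2, 3} the sketch visits (1,1), (2,1), (1,2), (2,2), (1,3), (2,3), which is the same order the nested fir.do_loop form produces.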
---
 .../flang/Optimizer/Builder/HLFIRTools.h      | 12 +++--
 flang/lib/Lower/ConvertCall.cpp               |  2 +-
 flang/lib/Lower/OpenMP/ReductionProcessor.cpp |  4 +-
 flang/lib/Optimizer/Builder/HLFIRTools.cpp    | 52 ++++++++++++++-----
 .../HLFIR/Transforms/BufferizeHLFIR.cpp       |  3 +-
 .../LowerHLFIROrderedAssignments.cpp          | 30 +++++------
 .../Transforms/OptimizedBufferization.cpp     |  6 +--
 7 files changed, 69 insertions(+), 40 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 6b41025eea0780..14e42c6f358e46 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp(
 
 /// Structure to describe a loop nest.
 struct LoopNest {
-  fir::DoLoopOp outerLoop;
-  fir::DoLoopOp innerLoop;
+  mlir::Operation *outerOp;
+  mlir::Block *body;
   llvm::SmallVector<mlir::Value> oneBasedIndices;
 };
 
@@ -366,11 +366,13 @@ struct LoopNest {
 /// \p isUnordered specifies whether the loops in the loop nest
 /// are unordered.
 LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
-                     mlir::ValueRange extents, bool isUnordered = false);
+                     mlir::ValueRange extents, bool isUnordered = false,
+                     bool emitWsLoop = false);
 inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
-                            mlir::Value shape, bool isUnordered = false) {
+                            mlir::Value shape, bool isUnordered = false,
+                            bool emitWsLoop = false) {
   return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape),
-                     isUnordered);
+                     isUnordered, emitWsLoop);
 }
 
 /// Inline the body of an hlfir.elemental at the current insertion point
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index fd873f55dd844e..0689d6e033dd9c 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -2128,7 +2128,7 @@ class ElementalCallBuilder {
           hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered);
       mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices;
       auto insPt = builder.saveInsertionPoint();
-      builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+      builder.setInsertionPointToStart(loopNest.body);
       callContext.stmtCtx.pushScope();
       for (auto &preparedActual : loweredActuals)
         if (preparedActual)
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index c3c1f363033c27..72a90dd0d6f29d 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
   // know this won't miss any opportuinties for clever elemental inlining
   hlfir::LoopNest nest = hlfir::genLoopNest(
       loc, builder, shapeShift.getExtents(), /*isUnordered=*/true);
-  builder.setInsertionPointToStart(nest.innerLoop.getBody());
+  builder.setInsertionPointToStart(nest.body);
   mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy());
   auto lhsEleAddr = builder.create<fir::ArrayCoorOp>(
       loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{},
@@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc,
       builder, loc, redId, refTy, lhsEle, rhsEle);
   builder.create<fir::StoreOp>(loc, scalarReduction, lhsEleAddr);
 
-  builder.setInsertionPointAfter(nest.outerLoop);
+  builder.setInsertionPointAfter(nest.outerOp);
   builder.create<mlir::omp::YieldOp>(loc, lhsAddr);
 }
 
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 8d0ae2f195178c..cd07cb741eb4bb 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -20,6 +20,7 @@
 #include "mlir/IR/IRMapping.h"
 #include "mlir/Support/LLVM.h"
 #include "llvm/ADT/TypeSwitch.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include <optional>
 
 // Return explicit extents. If the base is a fir.box, this won't read it to
@@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp(
 
 hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
                                    fir::FirOpBuilder &builder,
-                                   mlir::ValueRange extents, bool isUnordered) {
+                                   mlir::ValueRange extents, bool isUnordered,
+                                   bool emitWsLoop) {
   hlfir::LoopNest loopNest;
   assert(!extents.empty() && "must have at least one extent");
-  auto insPt = builder.saveInsertionPoint();
+  mlir::OpBuilder::InsertionGuard guard(builder);
   loopNest.oneBasedIndices.assign(extents.size(), mlir::Value{});
   // Build loop nest from column to row.
   auto one = builder.create<mlir::arith::ConstantIndexOp>(loc, 1);
   mlir::Type indexType = builder.getIndexType();
-  unsigned dim = extents.size() - 1;
-  for (auto extent : llvm::reverse(extents)) {
-    auto ub = builder.createConvert(loc, indexType, extent);
-    loopNest.innerLoop =
-        builder.create<fir::DoLoopOp>(loc, one, ub, one, isUnordered);
-    builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
-    // Reverse the indices so they are in column-major order.
-    loopNest.oneBasedIndices[dim--] = loopNest.innerLoop.getInductionVar();
-    if (!loopNest.outerLoop)
-      loopNest.outerLoop = loopNest.innerLoop;
+  if (emitWsLoop) {
+    auto wsloop = builder.create<mlir::omp::WsloopOp>(
+        loc, mlir::ArrayRef<mlir::NamedAttribute>());
+    loopNest.outerOp = wsloop;
+    builder.createBlock(&wsloop.getRegion());
+    mlir::omp::LoopNestOperands lnops;
+    lnops.loopInclusive = builder.getUnitAttr();
+    for (auto extent : llvm::reverse(extents)) {
+      lnops.loopLowerBounds.push_back(one);
+      lnops.loopUpperBounds.push_back(extent);
+      lnops.loopSteps.push_back(one);
+    }
+    auto lnOp = builder.create<mlir::omp::LoopNestOp>(loc, lnops);
+    builder.create<mlir::omp::TerminatorOp>(loc);
+    mlir::Block *block = builder.createBlock(&lnOp.getRegion());
+    for (auto extent : llvm::reverse(extents))
+      block->addArgument(extent.getType(), extent.getLoc());
+    loopNest.body = block;
+    builder.create<mlir::omp::YieldOp>(loc);
+    for (unsigned dim = 0; dim < extents.size(); dim++)
+      loopNest.oneBasedIndices[extents.size() - dim - 1] =
+          lnOp.getRegion().front().getArgument(dim);
+  } else {
+    unsigned dim = extents.size() - 1;
+    for (auto extent : llvm::reverse(extents)) {
+      auto ub = builder.createConvert(loc, indexType, extent);
+      auto doLoop =
+          builder.create<fir::DoLoopOp>(loc, one, ub, one, isUnordered);
+      loopNest.body = doLoop.getBody();
+      builder.setInsertionPointToStart(loopNest.body);
+      // Reverse the indices so they are in column-major order.
+      loopNest.oneBasedIndices[dim--] = doLoop.getInductionVar();
+      if (!loopNest.outerOp)
+        loopNest.outerOp = doLoop;
+    }
   }
-  builder.restoreInsertionPoint(insPt);
   return loopNest;
 }
 
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index a70a6b388c4b1a..b608677c526310 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -31,6 +31,7 @@
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/DialectConversion.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "llvm/ADT/TypeSwitch.h"
 
 namespace hlfir {
@@ -793,7 +794,7 @@ struct ElementalOpConversion
     hlfir::LoopNest loopNest =
         hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
     auto insPt = builder.saveInsertionPoint();
-    builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+    builder.setInsertionPointToStart(loopNest.body);
     auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
                                           loopNest.oneBasedIndices);
     hlfir::Entity elementValue(yield.getElementValue());
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
index 85dd517cb57914..645abf65d10a32 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp
@@ -464,7 +464,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
       // if the LHS is not).
       mlir::Value shape = hlfir::genShape(loc, builder, lhsEntity);
       elementalLoopNest = hlfir::genLoopNest(loc, builder, shape);
-      builder.setInsertionPointToStart(elementalLoopNest->innerLoop.getBody());
+      builder.setInsertionPointToStart(elementalLoopNest->body);
       lhsEntity = hlfir::getElementAt(loc, builder, lhsEntity,
                                       elementalLoopNest->oneBasedIndices);
       rhsEntity = hlfir::getElementAt(loc, builder, rhsEntity,
@@ -484,7 +484,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
     for (auto &cleanupConversion : argConversionCleanups)
       cleanupConversion();
     if (elementalLoopNest)
-      builder.setInsertionPointAfter(elementalLoopNest->outerLoop);
+      builder.setInsertionPointAfter(elementalLoopNest->outerOp);
   } else {
     // TODO: preserve allocatable assignment aspects for forall once
     // they are conveyed in hlfir.region_assign.
@@ -493,7 +493,7 @@ void OrderedAssignmentRewriter::pre(hlfir::RegionAssignOp regionAssignOp) {
   generateCleanupIfAny(loweredLhs.elementalCleanup);
   if (loweredLhs.vectorSubscriptLoopNest)
     builder.setInsertionPointAfter(
-        loweredLhs.vectorSubscriptLoopNest->outerLoop);
+        loweredLhs.vectorSubscriptLoopNest->outerOp);
   generateCleanupIfAny(oldRhsYield);
   generateCleanupIfAny(loweredLhs.nonElementalCleanup);
 }
@@ -518,8 +518,8 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
       hlfir::Entity savedMask{maybeSaved->first};
       mlir::Value shape = hlfir::genShape(loc, builder, savedMask);
       whereLoopNest = hlfir::genLoopNest(loc, builder, shape);
-      constructStack.push_back(whereLoopNest->outerLoop.getOperation());
-      builder.setInsertionPointToStart(whereLoopNest->innerLoop.getBody());
+      constructStack.push_back(whereLoopNest->outerOp);
+      builder.setInsertionPointToStart(whereLoopNest->body);
       mlir::Value cdt = hlfir::getElementAt(loc, builder, savedMask,
                                             whereLoopNest->oneBasedIndices);
       generateMaskIfOp(cdt);
@@ -527,7 +527,7 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
         // If this is the same run as the one that saved the value, the clean-up
         // was left-over to be done now.
         auto insertionPoint = builder.saveInsertionPoint();
-        builder.setInsertionPointAfter(whereLoopNest->outerLoop);
+        builder.setInsertionPointAfter(whereLoopNest->outerOp);
         generateCleanupIfAny(maybeSaved->second);
         builder.restoreInsertionPoint(insertionPoint);
       }
@@ -539,8 +539,8 @@ void OrderedAssignmentRewriter::pre(hlfir::WhereOp whereOp) {
     mask.generateNoneElementalPart(builder, mapper);
     mlir::Value shape = mask.generateShape(builder, mapper);
     whereLoopNest = hlfir::genLoopNest(loc, builder, shape);
-    constructStack.push_back(whereLoopNest->outerLoop.getOperation());
-    builder.setInsertionPointToStart(whereLoopNest->innerLoop.getBody());
+    constructStack.push_back(whereLoopNest->outerOp);
+    builder.setInsertionPointToStart(whereLoopNest->body);
     mlir::Value cdt = generateMaskedEntity(mask);
     generateMaskIfOp(cdt);
     return;
@@ -754,7 +754,7 @@ OrderedAssignmentRewriter::generateYieldedLHS(
       loweredLhs.vectorSubscriptLoopNest = hlfir::genLoopNest(
           loc, builder, loweredLhs.vectorSubscriptShape.value());
       builder.setInsertionPointToStart(
-          loweredLhs.vectorSubscriptLoopNest->innerLoop.getBody());
+          loweredLhs.vectorSubscriptLoopNest->body);
     }
     loweredLhs.lhs = temp->second.fetch(loc, builder);
     return loweredLhs;
@@ -772,7 +772,7 @@ OrderedAssignmentRewriter::generateYieldedLHS(
         hlfir::genLoopNest(loc, builder, *loweredLhs.vectorSubscriptShape,
                            !elementalAddrLhs.isOrdered());
     builder.setInsertionPointToStart(
-        loweredLhs.vectorSubscriptLoopNest->innerLoop.getBody());
+        loweredLhs.vectorSubscriptLoopNest->body);
     mapper.map(elementalAddrLhs.getIndices(),
                loweredLhs.vectorSubscriptLoopNest->oneBasedIndices);
     for (auto &op : elementalAddrLhs.getBody().front().without_terminator())
@@ -798,11 +798,11 @@ OrderedAssignmentRewriter::generateMaskedEntity(MaskedArrayExpr &maskedExpr) {
   if (!maskedExpr.noneElementalPartWasGenerated) {
     // Generate none elemental part before the where loops (but inside the
     // current forall loops if any).
-    builder.setInsertionPoint(whereLoopNest->outerLoop);
+    builder.setInsertionPoint(whereLoopNest->outerOp);
     maskedExpr.generateNoneElementalPart(builder, mapper);
   }
   // Generate the none elemental part cleanup after the where loops.
-  builder.setInsertionPointAfter(whereLoopNest->outerLoop);
+  builder.setInsertionPointAfter(whereLoopNest->outerOp);
   maskedExpr.generateNoneElementalCleanupIfAny(builder, mapper);
   // Generate the value of the current element for the masked expression
   // at the current insertion point (inside the where loops, and any fir.if
@@ -1242,7 +1242,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
   LhsValueAndCleanUp loweredLhs = generateYieldedLHS(loc, region);
   fir::factory::TemporaryStorage *temp = nullptr;
   if (loweredLhs.vectorSubscriptLoopNest)
-    constructStack.push_back(loweredLhs.vectorSubscriptLoopNest->outerLoop);
+    constructStack.push_back(loweredLhs.vectorSubscriptLoopNest->outerOp);
   if (loweredLhs.vectorSubscriptLoopNest && !rhsIsArray(regionAssignOp)) {
     // Vector subscripted entity for which the shape must also be saved on top
     // of the element addresses (e.g. the shape may change in each forall
@@ -1265,7 +1265,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
     // subscripted LHS.
     auto &vectorTmp = temp->cast<fir::factory::AnyVectorSubscriptStack>();
     auto insertionPoint = builder.saveInsertionPoint();
-    builder.setInsertionPoint(loweredLhs.vectorSubscriptLoopNest->outerLoop);
+    builder.setInsertionPoint(loweredLhs.vectorSubscriptLoopNest->outerOp);
     vectorTmp.pushShape(loc, builder, shape);
     builder.restoreInsertionPoint(insertionPoint);
   } else {
@@ -1291,7 +1291,7 @@ void OrderedAssignmentRewriter::saveLeftHandSide(
   if (loweredLhs.vectorSubscriptLoopNest) {
     constructStack.pop_back();
     builder.setInsertionPointAfter(
-        loweredLhs.vectorSubscriptLoopNest->outerLoop);
+        loweredLhs.vectorSubscriptLoopNest->outerOp);
   }
 }
 
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 7553e05b470634..3a0a98dc594463 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -483,7 +483,7 @@ llvm::LogicalResult ElementalAssignBufferization::matchAndRewrite(
   // hlfir.elemental region inside the inner loop
   hlfir::LoopNest loopNest =
       hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
-  builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+  builder.setInsertionPointToStart(loopNest.body);
   auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
                                         loopNest.oneBasedIndices);
   hlfir::Entity elementValue{yield.getElementValue()};
@@ -554,7 +554,7 @@ llvm::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
       hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
       hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
-  builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+  builder.setInsertionPointToStart(loopNest.body);
   auto arrayElement =
       hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices);
   builder.create<hlfir::AssignOp>(loc, rhs, arrayElement);
@@ -649,7 +649,7 @@ llvm::LogicalResult VariableAssignBufferization::matchAndRewrite(
       hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
       hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
-  builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+  builder.setInsertionPointToStart(loopNest.body);
   auto rhsArrayElement =
       hlfir::getElementAt(loc, builder, rhs, loopNest.oneBasedIndices);
   rhsArrayElement = hlfir::loadTrivialScalar(loc, builder, rhsArrayElement);

>From 2d741e50c39cf322277b92f3f4e874ef5aa1daf4 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 16:08:34 +0900
Subject: [PATCH 086/116] Emit loop nests in a custom wrapper

---
 flang/include/flang/Optimizer/Builder/HLFIRTools.h |  6 +++---
 flang/lib/Optimizer/Builder/HLFIRTools.cpp         | 11 +++++------
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 14e42c6f358e46..69874719572186 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -367,12 +367,12 @@ struct LoopNest {
 /// are unordered.
 LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
                      mlir::ValueRange extents, bool isUnordered = false,
-                     bool emitWsLoop = false);
+                     bool emitWorkshareLoop = false);
 inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
                             mlir::Value shape, bool isUnordered = false,
-                            bool emitWsLoop = false) {
+                            bool emitWorkshareLoop = false) {
   return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape),
-                     isUnordered, emitWsLoop);
+                     isUnordered, emitWorkshareLoop);
 }
 
 /// Inline the body of an hlfir.elemental at the current insertion point
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index cd07cb741eb4bb..91b1b3d774a012 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -857,7 +857,7 @@ mlir::Value hlfir::inlineElementalOp(
 hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
                                    fir::FirOpBuilder &builder,
                                    mlir::ValueRange extents, bool isUnordered,
-                                   bool emitWsLoop) {
+                                   bool emitWorkshareLoop) {
   hlfir::LoopNest loopNest;
   assert(!extents.empty() && "must have at least one extent");
   mlir::OpBuilder::InsertionGuard guard(builder);
@@ -865,11 +865,10 @@ hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
   // Build loop nest from column to row.
   auto one = builder.create<mlir::arith::ConstantIndexOp>(loc, 1);
   mlir::Type indexType = builder.getIndexType();
-  if (emitWsLoop) {
-    auto wsloop = builder.create<mlir::omp::WsloopOp>(
-        loc, mlir::ArrayRef<mlir::NamedAttribute>());
-    loopNest.outerOp = wsloop;
-    builder.createBlock(&wsloop.getRegion());
+  if (emitWorkshareLoop) {
+    auto wslw = builder.create<mlir::omp::WorkshareLoopWrapperOp>(loc);
+    loopNest.outerOp = wslw;
+    builder.createBlock(&wslw.getRegion());
     mlir::omp::LoopNestOperands lnops;
     lnops.loopInclusive = builder.getUnitAttr();
     for (auto extent : llvm::reverse(extents)) {

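For readers following the `genLoopNest` change: the hunk builds the `omp.loop_nest` bounds by iterating `llvm::reverse(extents)`. Assuming the first bound of `omp.loop_nest` is its outermost loop, this puts the last dimension outermost and the first dimension innermost, which is Fortran column-major order. The plain C++ sketch below uses no MLIR, and `columnMajorIndices` is a hypothetical helper. It lists the one-based index tuples in the order such a nest is assumed to visit them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Enumerate one-based index tuples of an array with the given extents in
// column-major (Fortran) order: the first index varies fastest. This models
// a loop nest built "from column to row": the outermost loop runs over the
// last extent and the innermost loop over the first.
std::vector<std::vector<long>> columnMajorIndices(const std::vector<long> &extents) {
  std::vector<std::vector<long>> result;
  if (extents.empty())
    return result;
  for (long e : extents)
    if (e <= 0)
      return result; // empty iteration space
  std::vector<long> idx(extents.size(), 1); // one-based, like oneBasedIndices
  while (true) {
    result.push_back(idx);
    // Step the innermost loop (dimension 0) first and carry outward.
    std::size_t d = 0;
    while (d < extents.size() && ++idx[d] > extents[d]) {
      idx[d] = 1;
      ++d;
    }
    if (d == extents.size())
      break; // the outermost loop (last dimension) has finished
  }
  return result;
}
```

For extents {2, 3} this yields (1,1), (2,1), (1,2), ..., (2,3), so the first index varies fastest.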
>From f2c2d5caad1a401e564114a97267a82ca8a353d7 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 22:05:47 +0900
Subject: [PATCH 087/116] Only emit unordered loops as omp loops

---
 flang/lib/Optimizer/Builder/HLFIRTools.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 91b1b3d774a012..333331378841ed 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -858,6 +858,7 @@ hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
                                    fir::FirOpBuilder &builder,
                                    mlir::ValueRange extents, bool isUnordered,
                                    bool emitWorkshareLoop) {
+  emitWorkshareLoop = emitWorkshareLoop && isUnordered;
   hlfir::LoopNest loopNest;
   assert(!extents.empty() && "must have at least one extent");
   mlir::OpBuilder::InsertionGuard guard(builder);

>From 51efcced19a4e4dd9697ce778c68b74909879eb3 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 17:16:22 +0900
Subject: [PATCH 088/116] Fix uninitialized memory bug in genLoopNest

---
 flang/include/flang/Optimizer/Builder/HLFIRTools.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 69874719572186..f073f494b3fb21 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp(
 
 /// Structure to describe a loop nest.
 struct LoopNest {
-  mlir::Operation *outerOp;
-  mlir::Block *body;
+  mlir::Operation *outerOp = nullptr;
+  mlir::Block *body = nullptr;
   llvm::SmallVector<mlir::Value> oneBasedIndices;
 };
 

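The commit title attributes this change to an uninitialized-memory bug. The sketch below uses simplified stand-ins for the MLIR types and is not the real `hlfir::LoopNest`. The forward-declared `Operation`/`Block`, the `int` index element type, and the `hasOuterOp` helper are all assumptions. It shows the effect of the default member initializers. A default-initialized `LoopNest`, like the one declared at the top of `genLoopNest`, now starts with null pointers rather than indeterminate ones.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for mlir::Operation / mlir::Block; only pointers are stored.
struct Operation;
struct Block;

// Mirrors the patched struct: with default member initializers,
// `LoopNest loopNest;` no longer leaves outerOp/body indeterminate.
struct LoopNest {
  Operation *outerOp = nullptr;
  Block *body = nullptr;
  std::vector<int> oneBasedIndices; // simplified element type
};

// Hypothetical helper: checking for a missing op is now well defined,
// e.g. before any loop of the nest has been built.
bool hasOuterOp(const LoopNest &nest) { return nest.outerOp != nullptr; }
```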
>From 41d85aa5b0e9ac478dee312929237c402876f825 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 22:06:55 +0900
Subject: [PATCH 089/116] [flang] Lower omp.workshare to other omp constructs

---
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |   4 +
 flang/include/flang/Tools/CLOptions.inc       |   1 +
 flang/lib/Optimizer/OpenMP/CMakeLists.txt     |   1 +
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 259 ++++++++++++++++++
 .../Transforms/OpenMP/lower-workshare.mlir    |  81 ++++++
 .../Transforms/OpenMP/lower-workshare5.mlir   |  42 +++
 7 files changed, 390 insertions(+)
 create mode 100644 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare.mlir
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare5.mlir

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h b/flang/include/flang/Optimizer/OpenMP/Passes.h
index 403d79667bf448..11fa4e59f891ea 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -25,6 +25,8 @@ namespace flangomp {
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/OpenMP/Passes.h.inc"
 
+bool shouldUseWorkshareLowering(mlir::Operation *op);
+
 } // namespace flangomp
 
 #endif // FORTRAN_OPTIMIZER_OPENMP_PASSES_H
diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 395178e26a5762..1c9d75d8cfaa18 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -37,4 +37,8 @@ def FunctionFiltering : Pass<"omp-function-filtering"> {
   ];
 }
 
+def LowerWorkshare : Pass<"lower-workshare"> {
+  let summary = "Lower workshare construct";
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index 1881e23b00045a..d43e1c736020a2 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -354,6 +354,7 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  pm.addPass(flangomp::createLowerWorkshare());
 }
 
 /// Create a pass pipeline for handling certain OpenMP transformations needed
diff --git a/flang/lib/Optimizer/OpenMP/CMakeLists.txt b/flang/lib/Optimizer/OpenMP/CMakeLists.txt
index 92051634f0378b..39e92d388288d4 100644
--- a/flang/lib/Optimizer/OpenMP/CMakeLists.txt
+++ b/flang/lib/Optimizer/OpenMP/CMakeLists.txt
@@ -4,6 +4,7 @@ add_flang_library(FlangOpenMPTransforms
   FunctionFiltering.cpp
   MapInfoFinalization.cpp
   MarkDeclareTarget.cpp
+  LowerWorkshare.cpp
 
   DEPENDS
   FIRDialect
diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
new file mode 100644
index 00000000000000..40975552d1fe33
--- /dev/null
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -0,0 +1,259 @@
+//===- LowerWorkshare.cpp - Lower the omp.workshare construct ------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Lower omp workshare construct.
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/IRMapping.h"
+#include "mlir/IR/OpDefinition.h"
+#include "mlir/IR/PatternMatch.h"
+#include "mlir/Support/LLVM.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/iterator_range.h"
+
+#include <variant>
+
+namespace flangomp {
+#define GEN_PASS_DEF_LOWERWORKSHARE
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+#define DEBUG_TYPE "lower-workshare"
+
+using namespace mlir;
+
+namespace flangomp {
+bool shouldUseWorkshareLowering(Operation *op) {
+  auto workshare = dyn_cast<omp::WorkshareOp>(op->getParentOp());
+  if (!workshare)
+    return false;
+  return workshare->getParentOfType<omp::ParallelOp>();
+}
+} // namespace flangomp
+
+namespace {
+
+struct SingleRegion {
+  Block::iterator begin, end;
+};
+
+static bool isSupportedByFirAlloca(Type ty) {
+  return !isa<fir::ReferenceType>(ty);
+}
+
+static bool isSafeToParallelize(Operation *op) {
+  if (isa<fir::DeclareOp>(op))
+    return true;
+
+  llvm::SmallVector<MemoryEffects::EffectInstance> effects;
+  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
+  if (!interface) {
+    return false;
+  }
+  interface.getEffects(effects);
+  if (effects.empty())
+    return true;
+
+  return false;
+}
+
+/// Lowers workshare to a sequence of single-thread regions and parallel loops
+///
+/// For example:
+///
+/// omp.workshare {
+///   %a = fir.allocmem
+///   omp.wsloop {}
+///   fir.call Assign %b %a
+///   fir.freemem %a
+/// }
+///
+/// becomes
+///
+/// omp.single {
+///   %a = fir.allocmem
+///   fir.store %a %tmp
+/// }
+/// %a_reloaded = fir.load %tmp
+/// omp.wsloop {}
+/// omp.single {
+///   fir.call Assign %b %a_reloaded
+///   fir.freemem %a_reloaded
+/// }
+///
+/// Note that we allocate temporary memory for values in omp.single's which need
+/// to be accessed in all threads in the closest omp.parallel
+///
+/// TODO currently we need to be able to access the encompassing omp.parallel so
+/// that we can allocate temporaries accessible by all threads outside of it.
+/// In case we do not find it, we fall back to converting the omp.workshare to
+/// omp.single.
+/// To better handle this we should probably enable yielding values out of an
+/// omp.single which will be supported by the omp runtime.
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
+  assert(wsOp.getRegion().getBlocks().size() == 1);
+
+  Location loc = wsOp->getLoc();
+
+  omp::ParallelOp parallelOp = wsOp->getParentOfType<omp::ParallelOp>();
+  if (!parallelOp) {
+    wsOp.emitWarning("cannot handle workshare, converting to single");
+    Operation *terminator = wsOp.getRegion().front().getTerminator();
+    wsOp->getBlock()->getOperations().splice(
+        wsOp->getIterator(), wsOp.getRegion().front().getOperations());
+    terminator->erase();
+    return;
+  }
+
+  OpBuilder allocBuilder(parallelOp);
+  OpBuilder rootBuilder(wsOp);
+  IRMapping rootMapping;
+
+  omp::SingleOp singleOp = nullptr;
+
+  auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder,
+                              IRMapping singleMapping) {
+    if (auto reloaded = rootMapping.lookupOrNull(v))
+      return;
+    Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext());
+    Type ty = v.getType();
+    Value alloc, reloaded;
+    if (isSupportedByFirAlloca(ty)) {
+      alloc = allocBuilder.create<fir::AllocaOp>(loc, ty);
+      singleBuilder.create<fir::StoreOp>(loc, singleMapping.lookup(v), alloc);
+      reloaded = rootBuilder.create<fir::LoadOp>(loc, ty, alloc);
+    } else {
+      auto one = allocBuilder.create<LLVM::ConstantOp>(
+          loc, allocBuilder.getI32Type(), 1);
+      alloc =
+          allocBuilder.create<LLVM::AllocaOp>(loc, llvmPtrTy, llvmPtrTy, one);
+      Value toStore = singleBuilder
+                          .create<UnrealizedConversionCastOp>(
+                              loc, llvmPtrTy, singleMapping.lookup(v))
+                          .getResult(0);
+      singleBuilder.create<LLVM::StoreOp>(loc, toStore, alloc);
+      reloaded = rootBuilder.create<LLVM::LoadOp>(loc, llvmPtrTy, alloc);
+      reloaded =
+          rootBuilder.create<UnrealizedConversionCastOp>(loc, ty, reloaded)
+              .getResult(0);
+    }
+    rootMapping.map(v, reloaded);
+  };
+
+  auto moveToSingle = [&](SingleRegion sr, OpBuilder singleBuilder) {
+    IRMapping singleMapping = rootMapping;
+
+    for (Operation &op : llvm::make_range(sr.begin, sr.end)) {
+      singleBuilder.clone(op, singleMapping);
+      if (isSafeToParallelize(&op)) {
+        rootBuilder.clone(op, rootMapping);
+      } else {
+        // Prepare reloaded values for results of operations that cannot be
+        // safely parallelized and which are used after the region `sr`
+        for (auto res : op.getResults()) {
+          for (auto &use : res.getUses()) {
+            Operation *user = use.getOwner();
+            while (user->getParentOp() != wsOp)
+              user = user->getParentOp();
+            if (!user->isBeforeInBlock(&*sr.end)) {
+              // We need to reload
+              mapReloadedValue(use.get(), singleBuilder, singleMapping);
+            }
+          }
+        }
+      }
+    }
+    singleBuilder.create<omp::TerminatorOp>(loc);
+  };
+
+  Block *wsBlock = &wsOp.getRegion().front();
+  assert(wsBlock->getTerminator()->getNumOperands() == 0);
+  Operation *terminator = wsBlock->getTerminator();
+
+  SmallVector<std::variant<SingleRegion, omp::WsloopOp>> regions;
+
+  auto it = wsBlock->begin();
+  auto getSingleRegion = [&]() {
+    if (&*it == terminator)
+      return false;
+    if (auto pop = dyn_cast<omp::WsloopOp>(&*it)) {
+      regions.push_back(pop);
+      it++;
+      return true;
+    }
+    SingleRegion sr;
+    sr.begin = it;
+    while (&*it != terminator && !isa<omp::WsloopOp>(&*it))
+      it++;
+    sr.end = it;
+    assert(sr.begin != sr.end);
+    regions.push_back(sr);
+    return true;
+  };
+  while (getSingleRegion())
+    ;
+
+  for (auto [i, loopOrSingle] : llvm::enumerate(regions)) {
+    bool isLast = i + 1 == regions.size();
+    if (std::holds_alternative<SingleRegion>(loopOrSingle)) {
+      omp::SingleOperands singleOperands;
+      if (isLast)
+        singleOperands.nowait = rootBuilder.getUnitAttr();
+      singleOp = rootBuilder.create<omp::SingleOp>(loc, singleOperands);
+      OpBuilder singleBuilder(singleOp);
+      singleBuilder.createBlock(&singleOp.getRegion());
+      moveToSingle(std::get<SingleRegion>(loopOrSingle), singleBuilder);
+    } else {
+      rootBuilder.clone(*std::get<omp::WsloopOp>(loopOrSingle), rootMapping);
+      if (!isLast)
+        rootBuilder.create<omp::BarrierOp>(loc);
+    }
+  }
+
+  if (!wsOp.getNowait())
+    rootBuilder.create<omp::BarrierOp>(loc);
+
+  wsOp->erase();
+
+  return;
+}
+
+class LowerWorksharePass
+    : public flangomp::impl::LowerWorkshareBase<LowerWorksharePass> {
+public:
+  void runOnOperation() override {
+    SmallPtrSet<Operation *, 8> parents;
+    getOperation()->walk([&](mlir::omp::WorkshareOp wsOp) {
+      Operation *isolatedParent =
+          wsOp->getParentWithTrait<OpTrait::IsIsolatedFromAbove>();
+      parents.insert(isolatedParent);
+
+      lowerWorkshare(wsOp);
+    });
+
+    // Do folding
+    for (Operation *isolatedParent : parents) {
+      RewritePatternSet patterns(&getContext());
+      GreedyRewriteConfig config;
+      // prevent the pattern driver from merging blocks
+      config.enableRegionSimplification =
+          mlir::GreedySimplifyRegionLevel::Disabled;
+      if (failed(applyPatternsAndFoldGreedily(isolatedParent,
+                                              std::move(patterns), config))) {
+        emitError(isolatedParent->getLoc(), "error in lower workshare\n");
+        signalPassFailure();
+      }
+    }
+  }
+};
+} // namespace
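The `getSingleRegion` lambda in `lowerWorkshare` splits the single block of the `omp.workshare` into a list of loop ops and the maximal runs of other ops between them, stopping at the terminator. In this patch the loop op is `omp.wsloop`; from the next patch on it is `omp.workshare_loop_wrapper`. The sketch below is a standalone C++ model of that partitioning. It uses op positions instead of `Block::iterator`s, and the `OpKind` enum and `partition` function are illustrative, not part of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Simplified model of the ops in a single-block omp.workshare body.
enum class OpKind { Other, LoopWrapper, Terminator };

// A maximal run [begin, end) of non-loop ops, to be wrapped in omp.single.
struct SingleRegion {
  std::size_t begin, end;
};
// Position of an op that is lowered to a worksharing loop.
struct LoopRegion {
  std::size_t index;
};

using Region = std::variant<SingleRegion, LoopRegion>;

// Walk the block up to the terminator: each loop op becomes its own entry,
// and every run of other ops between loops becomes one SingleRegion.
std::vector<Region> partition(const std::vector<OpKind> &ops) {
  std::vector<Region> regions;
  std::size_t it = 0;
  while (it < ops.size() && ops[it] != OpKind::Terminator) {
    if (ops[it] == OpKind::LoopWrapper) {
      regions.push_back(LoopRegion{it});
      ++it;
      continue;
    }
    SingleRegion sr{it, it};
    while (it < ops.size() && ops[it] != OpKind::Terminator &&
           ops[it] != OpKind::LoopWrapper)
      ++it;
    sr.end = it; // never equal to sr.begin: ops[sr.begin] is OpKind::Other
    regions.push_back(sr);
  }
  return regions;
}
```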
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
new file mode 100644
index 00000000000000..a8d36443f08bda
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -0,0 +1,81 @@
+// RUN: fir-opt --lower-workshare %s | FileCheck %s
+
+module {
+// CHECK-LABEL:   func.func @simple(
+// CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
+// CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_2:.*]] = arith.constant 1 : i32
+// CHECK:           %[[VAL_3:.*]] = arith.constant 42 : index
+// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i32) : i32
+// CHECK:           %[[VAL_5:.*]] = llvm.alloca %[[VAL_4]] x !llvm.ptr : (i32) -> !llvm.ptr
+// CHECK:           %[[VAL_6:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:           omp.parallel {
+// CHECK:             omp.single {
+// CHECK:               %[[VAL_7:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_7]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_9:.*]] = builtin.unrealized_conversion_cast %[[VAL_8]]#0 : !fir.ref<!fir.array<42xi32>> to !llvm.ptr
+// CHECK:               llvm.store %[[VAL_9]], %[[VAL_5]] : !llvm.ptr, !llvm.ptr
+// CHECK:               %[[VAL_10:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_10]](%[[VAL_7]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:               fir.store %[[VAL_11]]#0 to %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             %[[VAL_12:.*]] = llvm.load %[[VAL_5]] : !llvm.ptr -> !llvm.ptr
+// CHECK:             %[[VAL_13:.*]] = builtin.unrealized_conversion_cast %[[VAL_12]] : !llvm.ptr to !fir.ref<!fir.array<42xi32>>
+// CHECK:             %[[VAL_14:.*]] = fir.load %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             omp.wsloop {
+// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_3]]) inclusive step (%[[VAL_1]]) {
+// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_13]] (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_18:.*]] = arith.subi %[[VAL_17]], %[[VAL_2]] : i32
+// CHECK:                 %[[VAL_19:.*]] = hlfir.designate %[[VAL_14]] (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 hlfir.assign %[[VAL_18]] to %[[VAL_19]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:                 omp.yield
+// CHECK:               }
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             omp.barrier
+// CHECK:             omp.single nowait {
+// CHECK:               hlfir.assign %[[VAL_14]] to %[[VAL_13]] : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:               fir.freemem %[[VAL_14]] : !fir.heap<!fir.array<42xi32>>
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             omp.barrier
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+  func.func @simple(%arg0: !fir.ref<!fir.array<42xi32>>) {
+    omp.parallel {
+      omp.workshare {
+        %c42 = arith.constant 42 : index
+        %c1_i32 = arith.constant 1 : i32
+        %0 = fir.shape %c42 : (index) -> !fir.shape<1>
+        %1:2 = hlfir.declare %arg0(%0) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+        %2 = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+        %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+        %true = arith.constant true
+        %c1 = arith.constant 1 : index
+        omp.wsloop {
+          omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+            %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+            %8 = fir.load %7 : !fir.ref<i32>
+            %9 = arith.subi %8, %c1_i32 : i32
+            %10 = hlfir.designate %3#0 (%arg1)  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+            hlfir.assign %9 to %10 temporary_lhs : i32, !fir.ref<i32>
+            omp.yield
+          }
+          omp.terminator
+        }
+        %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+        %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+        %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+        hlfir.assign %3#0 to %1#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+        fir.freemem %3#0 : !fir.heap<!fir.array<42xi32>>
+        omp.terminator
+      }
+      omp.terminator
+    }
+    return
+  }
+}
diff --git a/flang/test/Transforms/OpenMP/lower-workshare5.mlir b/flang/test/Transforms/OpenMP/lower-workshare5.mlir
new file mode 100644
index 00000000000000..177f8aa8f86c7c
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare5.mlir
@@ -0,0 +1,42 @@
+// XFAIL: *
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
+
+// TODO we can lower these but we have no guarantee that the parent of
+// omp.workshare supports multi-block regions, thus we fail for now.
+
+func.func @wsfunc() {
+  %a = fir.alloca i32
+  omp.parallel {
+    omp.workshare {
+    ^bb1:
+      %c1 = arith.constant 1 : i32
+      cf.br ^bb3(%c1: i32)
+    ^bb3(%arg1: i32):
+      "test.test2"(%arg1) : (i32) -> ()
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+func.func @wsfunc() {
+  %a = fir.alloca i32
+  omp.parallel {
+    omp.workshare {
+    ^bb1:
+      %c1 = arith.constant 1 : i32
+      cf.br ^bb3(%c1: i32)
+    ^bb2:
+      "test.test2"(%r) : (i32) -> ()
+      omp.terminator
+    ^bb3(%arg1: i32):
+      %r = "test.test2"(%arg1) : (i32) -> i32
+      cf.br ^bb2
+    }
+    omp.terminator
+  }
+  return
+}

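In `moveToSingle`, every op of a single region is cloned into the `omp.single`. Ops accepted by `isSafeToParallelize` are also cloned for every thread. For the remaining ops, any result used at or after the end of the region is stored to a temporary allocated before the enclosing `omp.parallel` and then reloaded by all threads (`mapReloadedValue`). The sketch below models only that selection step in plain C++. It assumes every user has already been mapped to its ancestor directly in the workshare block, which is what the `while (user->getParentOp() != wsOp)` loop does. `OpInfo` and `valuesToReload` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified model of a top-level op in the workshare block. `users` holds
// the block positions of the top-level ops that use this op's result (for a
// nested user, the position of its ancestor directly in the block).
struct OpInfo {
  bool pure;                      // stands in for isSafeToParallelize
  std::vector<std::size_t> users; // top-level positions of result users
};

// For the single region [begin, end), return the positions of ops whose
// results must be stored inside omp.single and reloaded afterwards: ops run
// by one thread only (not pure) with a user at or after `end`. Pure ops are
// re-executed by every thread, so they need no temporary.
std::set<std::size_t> valuesToReload(const std::vector<OpInfo> &ops,
                                     std::size_t begin, std::size_t end) {
  std::set<std::size_t> result;
  for (std::size_t i = begin; i < end; ++i) {
    if (ops[i].pure)
      continue;
    for (std::size_t user : ops[i].users)
      if (user >= end) // corresponds to !user->isBeforeInBlock(&*sr.end)
        result.insert(i);
  }
  return result;
}
```

In the `@simple` test, this is why the results of the two `hlfir.declare` ops are stored inside the first `omp.single` and loaded again before the `omp.wsloop`.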
>From 855f56732f8ede017f9440f5b5d258de27496cc1 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 16:41:09 +0900
Subject: [PATCH 090/116] Change to workshare loop wrapper op

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 24 ++++++++++++-------
 .../Transforms/OpenMP/lower-workshare.mlir    |  5 ++--
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 40975552d1fe33..cb342b60de4e8d 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -21,6 +21,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/iterator_range.h"
 
+#include <mlir/Dialect/OpenMP/OpenMPClauseOperands.h>
 #include <variant>
 
 namespace flangomp {
@@ -73,7 +74,7 @@ static bool isSafeToParallelize(Operation *op) {
 ///
 /// omp.workshare {
 ///   %a = fir.allocmem
-///   omp.wsloop {}
+///   omp.workshare_loop_wrapper {}
 ///   fir.call Assign %b %a
 ///   fir.freemem %a
 /// }
@@ -85,7 +86,7 @@ static bool isSafeToParallelize(Operation *op) {
 ///   fir.store %a %tmp
 /// }
 /// %a_reloaded = fir.load %tmp
-/// omp.wsloop {}
+/// omp.workshare_loop_wrapper {}
 /// omp.single {
 ///   fir.call Assign %b %a_reloaded
 ///   fir.freemem %a_reloaded
@@ -180,20 +181,20 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
   assert(wsBlock->getTerminator()->getNumOperands() == 0);
   Operation *terminator = wsBlock->getTerminator();
 
-  SmallVector<std::variant<SingleRegion, omp::WsloopOp>> regions;
+  SmallVector<std::variant<SingleRegion, omp::WorkshareLoopWrapperOp>> regions;
 
   auto it = wsBlock->begin();
   auto getSingleRegion = [&]() {
     if (&*it == terminator)
       return false;
-    if (auto pop = dyn_cast<omp::WsloopOp>(&*it)) {
+    if (auto pop = dyn_cast<omp::WorkshareLoopWrapperOp>(&*it)) {
       regions.push_back(pop);
       it++;
       return true;
     }
     SingleRegion sr;
     sr.begin = it;
-    while (&*it != terminator && !isa<omp::WsloopOp>(&*it))
+    while (&*it != terminator && !isa<omp::WorkshareLoopWrapperOp>(&*it))
       it++;
     sr.end = it;
     assert(sr.begin != sr.end);
@@ -214,9 +215,16 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
       singleBuilder.createBlock(&singleOp.getRegion());
       moveToSingle(std::get<SingleRegion>(loopOrSingle), singleBuilder);
     } else {
-      rootBuilder.clone(*std::get<omp::WsloopOp>(loopOrSingle), rootMapping);
-      if (!isLast)
-        rootBuilder.create<omp::BarrierOp>(loc);
+      omp::WsloopOperands wsloopOperands;
+      if (isLast)
+        wsloopOperands.nowait = rootBuilder.getUnitAttr();
+      auto wsloop =
+          rootBuilder.create<mlir::omp::WsloopOp>(loc, wsloopOperands);
+      auto wslw = std::get<omp::WorkshareLoopWrapperOp>(loopOrSingle);
+      auto clonedWslw = cast<omp::WorkshareLoopWrapperOp>(
+          rootBuilder.clone(*wslw, rootMapping));
+      wsloop.getRegion().takeBody(clonedWslw.getRegion());
+      clonedWslw->erase();
     }
   }
 
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index a8d36443f08bda..cb5791d35916a9 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -34,7 +34,6 @@ module {
 // CHECK:               }
 // CHECK:               omp.terminator
 // CHECK:             }
-// CHECK:             omp.barrier
 // CHECK:             omp.single nowait {
 // CHECK:               hlfir.assign %[[VAL_14]] to %[[VAL_13]] : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
 // CHECK:               fir.freemem %[[VAL_14]] : !fir.heap<!fir.array<42xi32>>
@@ -56,7 +55,7 @@ module {
         %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
         %true = arith.constant true
         %c1 = arith.constant 1 : index
-        omp.wsloop {
+        "omp.workshare_loop_wrapper"() ({
           omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
             %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
             %8 = fir.load %7 : !fir.ref<i32>
@@ -66,7 +65,7 @@ module {
             omp.yield
           }
           omp.terminator
-        }
+        }) : () -> ()
         %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
         %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
         %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>

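With this patch, a loop that is not the last region becomes an `omp.wsloop` without `nowait`. Its implicit end-of-loop barrier replaces the explicit `omp.barrier` that the previous version emitted after it, which is why that CHECK line is dropped from the test. The sketch below models the resulting synchronization in plain C++, using a hypothetical `schedule` helper that returns op names as strings. Only the last region gets `nowait`, and a trailing `omp.barrier` is added unless the `omp.workshare` itself is `nowait`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Single, Loop };

// Every omp.single / omp.wsloop keeps its implicit end barrier except the
// last one, which gets `nowait`; a trailing omp.barrier follows unless the
// enclosing omp.workshare was itself `nowait`.
std::vector<std::string> schedule(const std::vector<Kind> &regions,
                                  bool workshareNowait) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    bool isLast = i + 1 == regions.size();
    std::string op = regions[i] == Kind::Single ? "omp.single" : "omp.wsloop";
    if (isLast)
      op += " nowait";
    out.push_back(op);
  }
  if (!workshareNowait)
    out.push_back("omp.barrier");
  return out;
}
```

For the single/loop/single sequence of the `@simple` test, without `nowait`, it returns omp.single, omp.wsloop, omp.single nowait, omp.barrier. This matches the updated CHECK lines.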
>From 7402e03182c6e0bcd7af524363a7871006c70b52 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 16:47:27 +0900
Subject: [PATCH 091/116] Move single op declaration

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index cb342b60de4e8d..2322d2acbc0138 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -120,8 +120,6 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
   OpBuilder rootBuilder(wsOp);
   IRMapping rootMapping;
 
-  omp::SingleOp singleOp = nullptr;
-
   auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder,
                               IRMapping singleMapping) {
     if (auto reloaded = rootMapping.lookupOrNull(v))
@@ -210,7 +208,8 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
       omp::SingleOperands singleOperands;
       if (isLast)
         singleOperands.nowait = rootBuilder.getUnitAttr();
-      singleOp = rootBuilder.create<omp::SingleOp>(loc, singleOperands);
+      omp::SingleOp singleOp =
+          rootBuilder.create<omp::SingleOp>(loc, singleOperands);
       OpBuilder singleBuilder(singleOp);
       singleBuilder.createBlock(&singleOp.getRegion());
       moveToSingle(std::get<SingleRegion>(loopOrSingle), singleBuilder);

>From 3ad20f64f6aeade627dc0a5e08e3b8623363da55 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Fri, 2 Aug 2024 17:13:58 +0900
Subject: [PATCH 092/116] Schedule pass properly

---
 flang/include/flang/Tools/CLOptions.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index d43e1c736020a2..fc2b474b034a09 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -354,7 +354,7 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
-  pm.addPass(flangomp::createLowerWorkshare());
+  addNestedPassToAllTopLevelOperations(pm, flangomp::createLowerWorkshare);
 }
 
 /// Create a pass pipeline for handling certain OpenMP transformations needed

>From ddeb40391fd9d07d91e1f530ec3c6553634f1863 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 00:30:40 +0900
Subject: [PATCH 093/116] Correctly handle nested loop nests to be
 parallelized by workshare

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 256 ++++++++++--------
 1 file changed, 138 insertions(+), 118 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 2322d2acbc0138..8e79d1401c01c6 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -19,9 +19,14 @@
 #include "mlir/Support/LLVM.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallVectorExtras.h"
 #include "llvm/ADT/iterator_range.h"
 
+#include <mlir/Dialect/Arith/IR/Arith.h>
 #include <mlir/Dialect/OpenMP/OpenMPClauseOperands.h>
+#include <mlir/Dialect/SCF/IR/SCF.h>
+#include <mlir/IR/Visitors.h>
+#include <mlir/Interfaces/SideEffectInterfaces.h>
 #include <variant>
 
 namespace flangomp {
@@ -52,90 +57,40 @@ static bool isSupportedByFirAlloca(Type ty) {
   return !isa<fir::ReferenceType>(ty);
 }
 
-static bool isSafeToParallelize(Operation *op) {
-  if (isa<fir::DeclareOp>(op))
-    return true;
-
-  llvm::SmallVector<MemoryEffects::EffectInstance> effects;
-  MemoryEffectOpInterface interface = dyn_cast<MemoryEffectOpInterface>(op);
-  if (!interface) {
-    return false;
-  }
-  interface.getEffects(effects);
-  if (effects.empty())
-    return true;
-
-  return false;
+static bool mustParallelizeOp(Operation *op) {
+  return op
+      ->walk(
+          [](omp::WorkshareLoopWrapperOp) { return WalkResult::interrupt(); })
+      .wasInterrupted();
 }
 
-/// Lowers workshare to a sequence of single-thread regions and parallel loops
-///
-/// For example:
-///
-/// omp.workshare {
-///   %a = fir.allocmem
-///   omp.workshare_loop_wrapper {}
-///   fir.call Assign %b %a
-///   fir.freemem %a
-/// }
-///
-/// becomes
-///
-/// omp.single {
-///   %a = fir.allocmem
-///   fir.store %a %tmp
-/// }
-/// %a_reloaded = fir.load %tmp
-/// omp.workshare_loop_wrapper {}
-/// omp.single {
-///   fir.call Assign %b %a_reloaded
-///   fir.freemem %a_reloaded
-/// }
-///
-/// Note that we allocate temporary memory for values in omp.single's which need
-/// to be accessed in all threads in the closest omp.parallel
-///
-/// TODO currently we need to be able to access the encompassing omp.parallel so
-/// that we can allocate temporaries accessible by all threads outside of it.
-/// In case we do not find it, we fall back to converting the omp.workshare to
-/// omp.single.
-/// To better handle this we should probably enable yielding values out of an
-/// omp.single which will be supported by the omp runtime.
-void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
-  assert(wsOp.getRegion().getBlocks().size() == 1);
-
-  Location loc = wsOp->getLoc();
+static bool isSafeToParallelize(Operation *op) {
+  return isa<fir::DeclareOp>(op) || isPure(op);
+}
 
-  omp::ParallelOp parallelOp = wsOp->getParentOfType<omp::ParallelOp>();
-  if (!parallelOp) {
-    wsOp.emitWarning("cannot handle workshare, converting to single");
-    Operation *terminator = wsOp.getRegion().front().getTerminator();
-    wsOp->getBlock()->getOperations().splice(
-        wsOp->getIterator(), wsOp.getRegion().front().getOperations());
-    terminator->erase();
-    return;
-  }
-
-  OpBuilder allocBuilder(parallelOp);
-  OpBuilder rootBuilder(wsOp);
-  IRMapping rootMapping;
+static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
+                              IRMapping &rootMapping, Location loc) {
+  Operation *parentOp = sourceRegion.getParentOp();
+  OpBuilder rootBuilder(sourceRegion.getContext());
 
+  // TODO need to copyprivate the alloca's
   auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder,
                               IRMapping singleMapping) {
+    OpBuilder allocaBuilder(&targetRegion.front().front());
     if (auto reloaded = rootMapping.lookupOrNull(v))
       return;
-    Type llvmPtrTy = LLVM::LLVMPointerType::get(allocBuilder.getContext());
+    Type llvmPtrTy = LLVM::LLVMPointerType::get(allocaBuilder.getContext());
     Type ty = v.getType();
     Value alloc, reloaded;
     if (isSupportedByFirAlloca(ty)) {
-      alloc = allocBuilder.create<fir::AllocaOp>(loc, ty);
+      alloc = allocaBuilder.create<fir::AllocaOp>(loc, ty);
       singleBuilder.create<fir::StoreOp>(loc, singleMapping.lookup(v), alloc);
       reloaded = rootBuilder.create<fir::LoadOp>(loc, ty, alloc);
     } else {
-      auto one = allocBuilder.create<LLVM::ConstantOp>(
-          loc, allocBuilder.getI32Type(), 1);
+      auto one = allocaBuilder.create<LLVM::ConstantOp>(
+          loc, allocaBuilder.getI32Type(), 1);
       alloc =
-          allocBuilder.create<LLVM::AllocaOp>(loc, llvmPtrTy, llvmPtrTy, one);
+          allocaBuilder.create<LLVM::AllocaOp>(loc, llvmPtrTy, llvmPtrTy, one);
       Value toStore = singleBuilder
                           .create<UnrealizedConversionCastOp>(
                               loc, llvmPtrTy, singleMapping.lookup(v))
@@ -162,9 +117,10 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
         for (auto res : op.getResults()) {
           for (auto &use : res.getUses()) {
             Operation *user = use.getOwner();
-            while (user->getParentOp() != wsOp)
+            while (user->getParentOp() != parentOp)
               user = user->getParentOp();
-            if (!user->isBeforeInBlock(&*sr.end)) {
+            if (!(user->isBeforeInBlock(&*sr.end) &&
+                  sr.begin->isBeforeInBlock(user))) {
               // We need to reload
               mapReloadedValue(use.get(), singleBuilder, singleMapping);
             }
@@ -175,61 +131,125 @@ void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
     singleBuilder.create<omp::TerminatorOp>(loc);
   };
 
-  Block *wsBlock = &wsOp.getRegion().front();
-  assert(wsBlock->getTerminator()->getNumOperands() == 0);
-  Operation *terminator = wsBlock->getTerminator();
+  // TODO Need to handle these (clone them) in dominator tree order
+  for (Block &block : sourceRegion) {
+    rootBuilder.createBlock(
+        &targetRegion, {}, block.getArgumentTypes(),
+        llvm::map_to_vector(block.getArguments(),
+                            [](BlockArgument arg) { return arg.getLoc(); }));
+    Operation *terminator = block.getTerminator();
 
-  SmallVector<std::variant<SingleRegion, omp::WorkshareLoopWrapperOp>> regions;
+    SmallVector<std::variant<SingleRegion, Operation *>> regions;
 
-  auto it = wsBlock->begin();
-  auto getSingleRegion = [&]() {
-    if (&*it == terminator)
-      return false;
-    if (auto pop = dyn_cast<omp::WorkshareLoopWrapperOp>(&*it)) {
-      regions.push_back(pop);
-      it++;
+    auto it = block.begin();
+    auto getOneRegion = [&]() {
+      if (&*it == terminator)
+        return false;
+      if (mustParallelizeOp(&*it)) {
+        regions.push_back(&*it);
+        it++;
+        return true;
+      }
+      SingleRegion sr;
+      sr.begin = it;
+      while (&*it != terminator && !mustParallelizeOp(&*it))
+        it++;
+      sr.end = it;
+      assert(sr.begin != sr.end);
+      regions.push_back(sr);
       return true;
+    };
+    while (getOneRegion())
+      ;
+
+    for (auto [i, opOrSingle] : llvm::enumerate(regions)) {
+      bool isLast = i + 1 == regions.size();
+      if (std::holds_alternative<SingleRegion>(opOrSingle)) {
+        omp::SingleOperands singleOperands;
+        if (isLast)
+          singleOperands.nowait = rootBuilder.getUnitAttr();
+        omp::SingleOp singleOp =
+            rootBuilder.create<omp::SingleOp>(loc, singleOperands);
+        OpBuilder singleBuilder(singleOp);
+        singleBuilder.createBlock(&singleOp.getRegion());
+        moveToSingle(std::get<SingleRegion>(opOrSingle), singleBuilder);
+      } else {
+        auto op = std::get<Operation *>(opOrSingle);
+        if (auto wslw = dyn_cast<omp::WorkshareLoopWrapperOp>(op)) {
+          omp::WsloopOperands wsloopOperands;
+          if (isLast)
+            wsloopOperands.nowait = rootBuilder.getUnitAttr();
+          auto wsloop =
+              rootBuilder.create<mlir::omp::WsloopOp>(loc, wsloopOperands);
+          auto clonedWslw = cast<omp::WorkshareLoopWrapperOp>(
+              rootBuilder.clone(*wslw, rootMapping));
+          wsloop.getRegion().takeBody(clonedWslw.getRegion());
+          clonedWslw->erase();
+        } else {
+          assert(mustParallelizeOp(op));
+          Operation *cloned = rootBuilder.cloneWithoutRegions(*op, rootMapping);
+          for (auto [region, clonedRegion] :
+               llvm::zip(op->getRegions(), cloned->getRegions()))
+            parallelizeRegion(region, clonedRegion, rootMapping, loc);
+        }
+      }
     }
-    SingleRegion sr;
-    sr.begin = it;
-    while (&*it != terminator && !isa<omp::WorkshareLoopWrapperOp>(&*it))
-      it++;
-    sr.end = it;
-    assert(sr.begin != sr.end);
-    regions.push_back(sr);
-    return true;
-  };
-  while (getSingleRegion())
-    ;
-
-  for (auto [i, loopOrSingle] : llvm::enumerate(regions)) {
-    bool isLast = i + 1 == regions.size();
-    if (std::holds_alternative<SingleRegion>(loopOrSingle)) {
-      omp::SingleOperands singleOperands;
-      if (isLast)
-        singleOperands.nowait = rootBuilder.getUnitAttr();
-      omp::SingleOp singleOp =
-          rootBuilder.create<omp::SingleOp>(loc, singleOperands);
-      OpBuilder singleBuilder(singleOp);
-      singleBuilder.createBlock(&singleOp.getRegion());
-      moveToSingle(std::get<SingleRegion>(loopOrSingle), singleBuilder);
-    } else {
-      omp::WsloopOperands wsloopOperands;
-      if (isLast)
-        wsloopOperands.nowait = rootBuilder.getUnitAttr();
-      auto wsloop =
-          rootBuilder.create<mlir::omp::WsloopOp>(loc, wsloopOperands);
-      auto wslw = std::get<omp::WorkshareLoopWrapperOp>(loopOrSingle);
-      auto clonedWslw = cast<omp::WorkshareLoopWrapperOp>(
-          rootBuilder.clone(*wslw, rootMapping));
-      wsloop.getRegion().takeBody(clonedWslw.getRegion());
-      clonedWslw->erase();
-    }
+
+    rootBuilder.clone(*block.getTerminator(), rootMapping);
   }
+}
+
+/// Lowers workshare to a sequence of single-thread regions and parallel loops
+///
+/// For example:
+///
+/// omp.workshare {
+///   %a = fir.allocmem
+///   omp.workshare_loop_wrapper {}
+///   fir.call Assign %b %a
+///   fir.freemem %a
+/// }
+///
+/// becomes
+///
+/// omp.single {
+///   %a = fir.allocmem
+///   fir.store %a %tmp
+/// }
+/// %a_reloaded = fir.load %tmp
+/// omp.workshare_loop_wrapper {}
+/// omp.single {
+///   fir.call Assign %b %a_reloaded
+///   fir.freemem %a_reloaded
+/// }
+///
+/// Note that we allocate temporary memory for values in omp.single regions
+/// that need to be accessed by all threads in the closest omp.parallel.
+void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
+  Location loc = wsOp->getLoc();
+  IRMapping rootMapping;
+
+  OpBuilder rootBuilder(wsOp);
+
+  // TODO We need something like scf.execute_region here, but that is not
+  // registered, so use fir.if for now. fir.if does not appear to support
+  // multiple blocks, so this does not work for the multi-block case.
+  auto ifOp = rootBuilder.create<fir::IfOp>(
+      loc, rootBuilder.create<arith::ConstantIntOp>(loc, 1, 1), false);
+  ifOp.getThenRegion().front().erase();
+
+  parallelizeRegion(wsOp.getRegion(), ifOp.getThenRegion(), rootMapping, loc);
+
+  Operation *terminatorOp = ifOp.getThenRegion().back().getTerminator();
+  assert(isa<omp::TerminatorOp>(terminatorOp));
+  OpBuilder termBuilder(terminatorOp);
 
   if (!wsOp.getNowait())
-    rootBuilder.create<omp::BarrierOp>(loc);
+    termBuilder.create<omp::BarrierOp>(loc);
+
+  termBuilder.create<fir::ResultOp>(loc, ValueRange());
 
+  terminatorOp->erase();
   wsOp->erase();
 
   return;

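A rough model of how parallelizeRegion splits one block (the getOneRegion loop above):
- Each maximal run of ops that need no parallelization becomes one omp.single.
- Each op for which mustParallelizeOp holds becomes its own segment.
- Only the last segment gets nowait.

This is a hypothetical, self-contained sketch. `Segment` and `partitionBlock` are illustrative names. Every op is reduced to one boolean standing in for mustParallelizeOp, so terminators, nested regions and value reloading are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the block-splitting step; not the MLIR API.
// mustParallelize[i] stands in for mustParallelizeOp on the i-th op
// (the block terminator is excluded).
struct Segment {
  bool isSingle;     // true: run of ops wrapped in one omp.single
  std::size_t begin; // index of the first op in the segment
  std::size_t end;   // one past the last op in the segment
  bool nowait;       // only the last segment gets nowait
};

std::vector<Segment> partitionBlock(const std::vector<bool> &mustParallelize) {
  std::vector<Segment> segments;
  std::size_t it = 0;
  const std::size_t n = mustParallelize.size();
  while (it < n) {
    if (mustParallelize[it]) {
      // An op that must be parallelized forms a segment on its own.
      segments.push_back({false, it, it + 1, false});
      ++it;
      continue;
    }
    // Otherwise take the maximal run of ops that can share one omp.single.
    const std::size_t begin = it;
    while (it < n && !mustParallelize[it])
      ++it;
    segments.push_back({true, begin, it, false});
  }
  if (!segments.empty())
    segments.back().nowait = true;
  return segments;
}
```

Take the allocmem / workshare_loop_wrapper / call / freemem example in the lowerWorkshare comment. It yields single [0,1), loop [1,2), and single [2,4) with nowait, matching the sequence shown there.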
>From a5eb8616e3f6e8498286f60fee293a5401806fcb Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 00:33:57 +0900
Subject: [PATCH 094/116] Leave comments for shouldUseWorkshareLowering

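The nesting rule the TODO below asks for could look roughly like this. Walk outwards from the op, and use the workshare lowering only if an omp.workshare is reached before any construct that forms its own unit of work.

This is a hypothetical sketch on a toy parent-linked tree. `Op`, `OpKind` and `shouldUseWorkshareLoweringSketch` are illustrative, not MLIR API. It also assumes omp.single and omp.critical behave like omp.parallel, which the TODO leaves open.

```cpp
#include <cassert>

// Hypothetical stand-in for the op hierarchy; not the MLIR API.
enum class OpKind { Func, Parallel, Workshare, Single, Critical, Elemental };

struct Op {
  OpKind kind;
  const Op *parent; // nullptr for the outermost op
};

// Use the workshare lowering only if the closest enclosing construct that
// starts a new unit of work is an omp.workshare (assumption: single and
// critical are treated like parallel).
bool shouldUseWorkshareLoweringSketch(const Op &op) {
  for (const Op *p = op.parent; p != nullptr; p = p->parent) {
    switch (p->kind) {
    case OpKind::Workshare:
      return true;
    case OpKind::Parallel:
    case OpKind::Single:
    case OpKind::Critical:
      return false; // a new unit of work hides any outer workshare
    default:
      break; // keep walking outwards
    }
  }
  return false;
}
```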
---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 21 +++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 8e79d1401c01c6..40dae0fd848ef8 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -40,10 +40,23 @@ using namespace mlir;
 
 namespace flangomp {
 bool shouldUseWorkshareLowering(Operation *op) {
-  auto workshare = dyn_cast<omp::WorkshareOp>(op->getParentOp());
-  if (!workshare)
-    return false;
-  return workshare->getParentOfType<omp::ParallelOp>();
+  // TODO this is insufficient, as we could have
+  // omp.parallel {
+  //   omp.workshare {
+  //     omp.parallel {
+  //       hlfir.elemental {}
+  //
+  // Then this hlfir.elemental shall _not_ use the lowering for workshare
+  //
+  // Standard says:
+  //   For a parallel construct, the construct is a unit of work with respect to
+  //   the workshare construct. The statements contained in the parallel
+  //   construct are executed by a new thread team.
+  //
+  // TODO similarly for single, critical, etc. Need to think through the
+  // patterns and implement this function.
+  //
+  return op->getParentOfType<omp::WorkshareOp>();
 }
 } // namespace flangomp
 

>From d4129b7c9bce732387b2fe3babd9750da4387902 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 13:14:38 +0900
Subject: [PATCH 095/116] Use copyprivate to scatter val from omp.single

TODO still need to implement copy function
TODO transitive check for usage outside of omp.single not implemented yet
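The following models, sequentially, what copyprivate is meant to achieve here. Only one thread runs the single region and fills its private slot. Afterwards, the copy function is called for each other thread with (destination, source) to broadcast the value.

This is a hypothetical sketch. `singleWithCopyprivate` is not an OpenMP runtime entry point, threads are just vector slots, and the (destination, source) argument order is an assumption about the copy callback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential model of `omp.single copyprivate(...)`; not the
// OpenMP runtime API. privateSlots[t] stands for thread t's private copy of
// the alloca that the single region stores into.
template <typename T, typename BodyFn, typename CopyFn>
void singleWithCopyprivate(std::vector<T> &privateSlots,
                           std::size_t executingThread, BodyFn singleBody,
                           CopyFn copyFn) {
  // Only the executing thread runs the body of the single region.
  privateSlots[executingThread] = singleBody();
  // Then every other thread copies the executing thread's value into its
  // own private slot (assumed order: destination, source).
  for (std::size_t t = 0; t < privateSlots.size(); ++t)
    if (t != executingThread)
      copyFn(privateSlots[t], privateSlots[executingThread]);
}
```

With an empty copy function, like the _workshare_copy stubs this patch still emits, the other slots keep their old values. That is why the copy-function TODO matters.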
---
 .../include/flang/Optimizer/OpenMP/Passes.td  |   3 +-
 flang/include/flang/Tools/CLOptions.inc       |   2 +-
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 138 ++++++++++++++----
 3 files changed, 109 insertions(+), 34 deletions(-)

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 1c9d75d8cfaa18..041240cad12eb3 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -37,7 +37,8 @@ def FunctionFiltering : Pass<"omp-function-filtering"> {
   ];
 }
 
-def LowerWorkshare : Pass<"lower-workshare"> {
+// Needs to be scheduled on Module as we create functions in it
+def LowerWorkshare : Pass<"lower-workshare", "::mlir::ModuleOp"> {
   let summary = "Lower workshare construct";
 }
 
diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index fc2b474b034a09..d43e1c736020a2 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -354,7 +354,7 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
-  addNestedPassToAllTopLevelOperations(pm, flangomp::createLowerWorkshare);
+  pm.addPass(flangomp::createLowerWorkshare());
 }
 
 /// Create a pass pipeline for handling certain OpenMP transformations needed
diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 40dae0fd848ef8..950737fccada79 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -8,25 +8,27 @@
 // Lower omp workshare construct.
 //===----------------------------------------------------------------------===//
 
-#include "flang/Optimizer/Dialect/FIROps.h"
-#include "flang/Optimizer/Dialect/FIRType.h"
-#include "flang/Optimizer/OpenMP/Passes.h"
-#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
-#include "mlir/IR/BuiltinOps.h"
-#include "mlir/IR/IRMapping.h"
-#include "mlir/IR/OpDefinition.h"
-#include "mlir/IR/PatternMatch.h"
-#include "mlir/Support/LLVM.h"
-#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
-#include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/SmallVectorExtras.h"
-#include "llvm/ADT/iterator_range.h"
-
+#include <flang/Optimizer/Builder/FIRBuilder.h>
+#include <flang/Optimizer/Dialect/FIROps.h>
+#include <flang/Optimizer/Dialect/FIRType.h>
+#include <flang/Optimizer/HLFIR/HLFIROps.h>
+#include <flang/Optimizer/OpenMP/Passes.h>
+#include <llvm/ADT/STLExtras.h>
+#include <llvm/ADT/SmallVectorExtras.h>
+#include <llvm/ADT/iterator_range.h>
+#include <llvm/Support/ErrorHandling.h>
 #include <mlir/Dialect/Arith/IR/Arith.h>
-#include <mlir/Dialect/OpenMP/OpenMPClauseOperands.h>
+#include <mlir/Dialect/OpenMP/OpenMPDialect.h>
 #include <mlir/Dialect/SCF/IR/SCF.h>
+#include <mlir/IR/BuiltinOps.h>
+#include <mlir/IR/IRMapping.h>
+#include <mlir/IR/OpDefinition.h>
+#include <mlir/IR/PatternMatch.h>
 #include <mlir/IR/Visitors.h>
 #include <mlir/Interfaces/SideEffectInterfaces.h>
+#include <mlir/Support/LLVM.h>
+#include <mlir/Transforms/GreedyPatternRewriteDriver.h>
+
 #include <variant>
 
 namespace flangomp {
@@ -71,6 +73,8 @@ static bool isSupportedByFirAlloca(Type ty) {
 }
 
 static bool mustParallelizeOp(Operation *op) {
+  // TODO as in shouldUseWorkshareLowering we be careful not to pick up
+  // workshare_loop_wrapper in nested omp.parallel ops
   return op
       ->walk(
           [](omp::WorkshareLoopWrapperOp) { return WalkResult::interrupt(); })
@@ -78,7 +82,33 @@ static bool mustParallelizeOp(Operation *op) {
 }
 
 static bool isSafeToParallelize(Operation *op) {
-  return isa<fir::DeclareOp>(op) || isPure(op);
+  return isa<hlfir::DeclareOp>(op) || isa<fir::DeclareOp>(op) ||
+         isMemoryEffectFree(op);
+}
+
+static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
+                                         fir::FirOpBuilder builder) {
+  mlir::ModuleOp module = builder.getModule();
+  mlir::Type eleTy = mlir::cast<fir::ReferenceType>(varType).getEleTy();
+
+  std::string copyFuncName =
+      fir::getTypeAsString(eleTy, builder.getKindMap(), "_workshare_copy");
+
+  if (auto decl = module.lookupSymbol<mlir::func::FuncOp>(copyFuncName))
+    return decl;
+  // create function
+  mlir::OpBuilder::InsertionGuard guard(builder);
+  mlir::OpBuilder modBuilder(module.getBodyRegion());
+  llvm::SmallVector<mlir::Type> argsTy = {varType, varType};
+  auto funcType = mlir::FunctionType::get(builder.getContext(), argsTy, {});
+  mlir::func::FuncOp funcOp =
+      modBuilder.create<mlir::func::FuncOp>(loc, copyFuncName, funcType);
+  funcOp.setVisibility(mlir::SymbolTable::Visibility::Private);
+  builder.createBlock(&funcOp.getRegion(), funcOp.getRegion().end(), argsTy,
+                      {loc, loc});
+  builder.setInsertionPointToStart(&funcOp.getRegion().back());
+  builder.create<mlir::func::ReturnOp>(loc);
+  return funcOp;
 }
 
 static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
@@ -86,19 +116,23 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
   Operation *parentOp = sourceRegion.getParentOp();
   OpBuilder rootBuilder(sourceRegion.getContext());
 
+  ModuleOp m = sourceRegion.getParentOfType<ModuleOp>();
+  OpBuilder copyFuncBuilder(m.getBodyRegion());
+  fir::FirOpBuilder firCopyFuncBuilder(copyFuncBuilder, m);
+
   // TODO need to copyprivate the alloca's
-  auto mapReloadedValue = [&](Value v, OpBuilder singleBuilder,
-                              IRMapping singleMapping) {
-    OpBuilder allocaBuilder(&targetRegion.front().front());
+  auto mapReloadedValue =
+      [&](Value v, OpBuilder allocaBuilder, OpBuilder singleBuilder,
+          OpBuilder parallelBuilder, IRMapping singleMapping) -> Value {
     if (auto reloaded = rootMapping.lookupOrNull(v))
-      return;
+      return nullptr;
     Type llvmPtrTy = LLVM::LLVMPointerType::get(allocaBuilder.getContext());
     Type ty = v.getType();
     Value alloc, reloaded;
     if (isSupportedByFirAlloca(ty)) {
       alloc = allocaBuilder.create<fir::AllocaOp>(loc, ty);
       singleBuilder.create<fir::StoreOp>(loc, singleMapping.lookup(v), alloc);
-      reloaded = rootBuilder.create<fir::LoadOp>(loc, ty, alloc);
+      reloaded = parallelBuilder.create<fir::LoadOp>(loc, ty, alloc);
     } else {
       auto one = allocaBuilder.create<LLVM::ConstantOp>(
           loc, allocaBuilder.getI32Type(), 1);
@@ -109,21 +143,25 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
                               loc, llvmPtrTy, singleMapping.lookup(v))
                           .getResult(0);
       singleBuilder.create<LLVM::StoreOp>(loc, toStore, alloc);
-      reloaded = rootBuilder.create<LLVM::LoadOp>(loc, llvmPtrTy, alloc);
+      reloaded = parallelBuilder.create<LLVM::LoadOp>(loc, llvmPtrTy, alloc);
       reloaded =
-          rootBuilder.create<UnrealizedConversionCastOp>(loc, ty, reloaded)
+          parallelBuilder.create<UnrealizedConversionCastOp>(loc, ty, reloaded)
               .getResult(0);
     }
     rootMapping.map(v, reloaded);
+    return alloc;
   };
 
-  auto moveToSingle = [&](SingleRegion sr, OpBuilder singleBuilder) {
+  auto moveToSingle = [&](SingleRegion sr, OpBuilder allocaBuilder,
+                          OpBuilder singleBuilder,
+                          OpBuilder parallelBuilder) -> SmallVector<Value> {
     IRMapping singleMapping = rootMapping;
+    SmallVector<Value> copyPrivate;
 
     for (Operation &op : llvm::make_range(sr.begin, sr.end)) {
       singleBuilder.clone(op, singleMapping);
       if (isSafeToParallelize(&op)) {
-        rootBuilder.clone(op, rootMapping);
+        parallelBuilder.clone(op, rootMapping);
       } else {
         // Prepare reloaded values for results of operations that cannot be
         // safely parallelized and which are used after the region `sr`
@@ -132,16 +170,21 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
             Operation *user = use.getOwner();
             while (user->getParentOp() != parentOp)
               user = user->getParentOp();
-            if (!(user->isBeforeInBlock(&*sr.end) &&
-                  sr.begin->isBeforeInBlock(user))) {
-              // We need to reload
-              mapReloadedValue(use.get(), singleBuilder, singleMapping);
+            // TODO we need to look at transitively used vals
+            if (true || !(user->isBeforeInBlock(&*sr.end) &&
+                          sr.begin->isBeforeInBlock(user))) {
+              auto alloc =
+                  mapReloadedValue(use.get(), allocaBuilder, singleBuilder,
+                                   parallelBuilder, singleMapping);
+              if (alloc)
+                copyPrivate.push_back(alloc);
             }
           }
         }
       }
     }
     singleBuilder.create<omp::TerminatorOp>(loc);
+    return copyPrivate;
   };
 
   // TODO Need to handle these (clone them) in dominator tree order
@@ -178,14 +221,45 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
     for (auto [i, opOrSingle] : llvm::enumerate(regions)) {
       bool isLast = i + 1 == regions.size();
       if (std::holds_alternative<SingleRegion>(opOrSingle)) {
+        OpBuilder singleBuilder(sourceRegion.getContext());
+        Block *singleBlock = new Block();
+        singleBuilder.setInsertionPointToStart(singleBlock);
+
+        OpBuilder allocaBuilder(sourceRegion.getContext());
+        Block *allocaBlock = new Block();
+        allocaBuilder.setInsertionPointToStart(allocaBlock);
+
+        OpBuilder parallelBuilder(sourceRegion.getContext());
+        Block *parallelBlock = new Block();
+        parallelBuilder.setInsertionPointToStart(parallelBlock);
+
         omp::SingleOperands singleOperands;
         if (isLast)
           singleOperands.nowait = rootBuilder.getUnitAttr();
+        auto insPtAtSingle = rootBuilder.saveInsertionPoint();
+        singleOperands.copyprivateVars =
+            moveToSingle(std::get<SingleRegion>(opOrSingle), allocaBuilder,
+                         singleBuilder, parallelBuilder);
+        for (auto var : singleOperands.copyprivateVars) {
+          Type ty;
+          if (auto firAlloca = var.getDefiningOp<fir::AllocaOp>()) {
+            ty = firAlloca.getAllocatedType();
+          } else {
+            llvm_unreachable("unexpected");
+          }
+          mlir::func::FuncOp funcOp =
+              createCopyFunc(loc, var.getType(), firCopyFuncBuilder);
+          singleOperands.copyprivateSyms.push_back(SymbolRefAttr::get(funcOp));
+        }
         omp::SingleOp singleOp =
             rootBuilder.create<omp::SingleOp>(loc, singleOperands);
-        OpBuilder singleBuilder(singleOp);
-        singleBuilder.createBlock(&singleOp.getRegion());
-        moveToSingle(std::get<SingleRegion>(opOrSingle), singleBuilder);
+        singleOp.getRegion().push_back(singleBlock);
+        rootBuilder.getInsertionBlock()->getOperations().splice(
+            rootBuilder.getInsertionPoint(), parallelBlock->getOperations());
+        targetRegion.front().getOperations().splice(
+            singleOp->getIterator(), allocaBlock->getOperations());
+        delete allocaBlock;
+        delete parallelBlock;
       } else {
         auto op = std::get<Operation *>(opOrSingle);
         if (auto wslw = dyn_cast<omp::WorkshareLoopWrapperOp>(op)) {

>From 75d694ef17b630d4bfb0f80698850bda4d70de79 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 13:39:49 +0900
Subject: [PATCH 096/116] Transitively check for users outside of single op

TODO need to implement copy func
TODO need to hoist allocas outside of single regions
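The recursion in isTransitivelyUsedOutside can be shown on a flat toy block. A value defined in the single region must be broadcast in either of two cases:
- Some user lies outside the region.
- A user that is safe to parallelize (and is therefore also cloned outside the region) has results that are themselves used outside.

This is a hypothetical sketch. Op i is assumed to produce exactly one value i, nested regions are ignored, and the use graph is assumed acyclic as in SSA. The names are illustrative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat model of one block; not the MLIR API.
// Op i produces one value i; users[i] lists the ops that use it, and
// safe[i] marks ops that are safe to parallelize.
struct BlockModel {
  std::vector<std::vector<int>> users;
  std::vector<bool> safe;
};

// The single region covers ops [begin, end).
struct SingleRegionModel {
  int begin, end;
  bool contains(int op) const { return begin <= op && op < end; }
};

bool isTransitivelyUsedOutsideSketch(const BlockModel &b, int value,
                                     SingleRegionModel sr) {
  for (int user : b.users[value]) {
    if (!sr.contains(user))
      return true;
    // Unsafe users are handled separately: their own results get reloaded.
    if (!b.safe[user])
      continue;
    // A safe user is also cloned outside the region, so it needs `value`
    // there whenever its own results escape the region.
    if (isTransitivelyUsedOutsideSketch(b, user, sr))
      return true;
  }
  return false;
}
```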
---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 51 ++++++++++++++-----
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 950737fccada79..2e88d852ff2cba 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -111,6 +111,38 @@ static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
   return funcOp;
 }
 
+static bool isUserOutsideSR(Operation *user, Operation *parentOp,
+                            SingleRegion sr) {
+  while (user->getParentOp() != parentOp)
+    user = user->getParentOp();
+  return sr.begin->getBlock() != user->getBlock() ||
+         !(user->isBeforeInBlock(&*sr.end) && sr.begin->isBeforeInBlock(user));
+}
+
+static bool isTransitivelyUsedOutside(Value v, SingleRegion sr) {
+  Block *srBlock = sr.begin->getBlock();
+  Operation *parentOp = srBlock->getParentOp();
+
+  for (auto &use : v.getUses()) {
+    Operation *user = use.getOwner();
+    if (isUserOutsideSR(user, parentOp, sr))
+      return true;
+
+    // Results of nested users cannot be used outside of the SR
+    if (user->getBlock() != srBlock)
+      continue;
+
+    // An operation that is not safe to parallelize is handled separately
+    if (!isSafeToParallelize(user))
+      continue;
+
+    for (auto res : user->getResults())
+      if (isTransitivelyUsedOutside(res, sr))
+        return true;
+  }
+  return false;
+}
+
 static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
                               IRMapping &rootMapping, Location loc) {
   Operation *parentOp = sourceRegion.getParentOp();
@@ -166,19 +198,11 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
         // Prepare reloaded values for results of operations that cannot be
         // safely parallelized and which are used after the region `sr`
         for (auto res : op.getResults()) {
-          for (auto &use : res.getUses()) {
-            Operation *user = use.getOwner();
-            while (user->getParentOp() != parentOp)
-              user = user->getParentOp();
-            // TODO we need to look at transitively used vals
-            if (true || !(user->isBeforeInBlock(&*sr.end) &&
-                          sr.begin->isBeforeInBlock(user))) {
-              auto alloc =
-                  mapReloadedValue(use.get(), allocaBuilder, singleBuilder,
-                                   parallelBuilder, singleMapping);
-              if (alloc)
-                copyPrivate.push_back(alloc);
-            }
+          if (isTransitivelyUsedOutside(res, sr)) {
+            auto alloc = mapReloadedValue(res, allocaBuilder, singleBuilder,
+                                          parallelBuilder, singleMapping);
+            if (alloc)
+              copyPrivate.push_back(alloc);
           }
         }
       }
@@ -236,7 +260,6 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
         omp::SingleOperands singleOperands;
         if (isLast)
           singleOperands.nowait = rootBuilder.getUnitAttr();
-        auto insPtAtSingle = rootBuilder.saveInsertionPoint();
         singleOperands.copyprivateVars =
             moveToSingle(std::get<SingleRegion>(opOrSingle), allocaBuilder,
                          singleBuilder, parallelBuilder);

>From 20f409fcb2df6e5d81b5df4b703b592267420083 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 14:30:43 +0900
Subject: [PATCH 097/116] Add tests

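createCopyFunc below follows a lookup-or-create pattern keyed on the function name. Each element type gets one copy function, and non-fir.ref values share `_workshare_copy_llvm_ptr`.

This is a hypothetical sketch in which a std::map stands in for the module symbol table. The `_workshare_copy_` plus type-string naming mirrors the `@_workshare_copy_heap_42xi32` symbol in the test, and `getOrCreateCopyFunc` is an illustrative name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a module symbol table; not the MLIR/FIR API.
struct ModuleModel {
  std::map<std::string, int> functions; // symbol name -> function id
  int nextId = 0;
};

// An empty typeString stands for a value that is not a fir.ref (assumption).
int getOrCreateCopyFunc(ModuleModel &module, const std::string &typeString) {
  const std::string name = typeString.empty()
                               ? std::string("_workshare_copy_llvm_ptr")
                               : "_workshare_copy_" + typeString;
  auto found = module.functions.find(name);
  if (found != module.functions.end())
    return found->second; // reuse the existing declaration
  const int id = module.nextId++;
  module.functions.emplace(name, id);
  return id;
}
```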
---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp |  25 +-
 .../Transforms/OpenMP/lower-workshare.mlir    | 230 +++++++++++++-----
 2 files changed, 188 insertions(+), 67 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 2e88d852ff2cba..30af2556cf4cae 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -18,6 +18,7 @@
 #include <llvm/ADT/iterator_range.h>
 #include <llvm/Support/ErrorHandling.h>
 #include <mlir/Dialect/Arith/IR/Arith.h>
+#include <mlir/Dialect/LLVMIR/LLVMTypes.h>
 #include <mlir/Dialect/OpenMP/OpenMPDialect.h>
 #include <mlir/Dialect/SCF/IR/SCF.h>
 #include <mlir/IR/BuiltinOps.h>
@@ -75,6 +76,14 @@ static bool isSupportedByFirAlloca(Type ty) {
 static bool mustParallelizeOp(Operation *op) {
   // TODO as in shouldUseWorkshareLowering we be careful not to pick up
   // workshare_loop_wrapper in nested omp.parallel ops
+  //
+  // e.g.
+  //
+  // omp.parallel {
+  //   omp.workshare {
+  //     omp.parallel {
+  //       omp.workshare {
+  //         omp.workshare_loop_wrapper {}
   return op
       ->walk(
           [](omp::WorkshareLoopWrapperOp) { return WalkResult::interrupt(); })
@@ -89,10 +98,14 @@ static bool isSafeToParallelize(Operation *op) {
 static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
                                          fir::FirOpBuilder builder) {
   mlir::ModuleOp module = builder.getModule();
-  mlir::Type eleTy = mlir::cast<fir::ReferenceType>(varType).getEleTy();
-
-  std::string copyFuncName =
-      fir::getTypeAsString(eleTy, builder.getKindMap(), "_workshare_copy");
+  std::string copyFuncName;
+  if (auto rt = dyn_cast<fir::ReferenceType>(varType)) {
+    mlir::Type eleTy = rt.getEleTy();
+    copyFuncName =
+        fir::getTypeAsString(eleTy, builder.getKindMap(), "_workshare_copy");
+  } else {
+    copyFuncName = "_workshare_copy_llvm_ptr";
+  }
 
   if (auto decl = module.lookupSymbol<mlir::func::FuncOp>(copyFuncName))
     return decl;
@@ -145,9 +158,7 @@ static bool isTransitivelyUsedOutside(Value v, SingleRegion sr) {
 
 static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
                               IRMapping &rootMapping, Location loc) {
-  Operation *parentOp = sourceRegion.getParentOp();
   OpBuilder rootBuilder(sourceRegion.getContext());
-
   ModuleOp m = sourceRegion.getParentOfType<ModuleOp>();
   OpBuilder copyFuncBuilder(m.getBodyRegion());
   fir::FirOpBuilder firCopyFuncBuilder(copyFuncBuilder, m);
@@ -268,7 +279,7 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
           if (auto firAlloca = var.getDefiningOp<fir::AllocaOp>()) {
             ty = firAlloca.getAllocatedType();
           } else {
-            llvm_unreachable("unexpected");
+            ty = LLVM::LLVMPointerType::get(allocaBuilder.getContext());
           }
           mlir::func::FuncOp funcOp =
               createCopyFunc(loc, var.getType(), firCopyFuncBuilder);
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index cb5791d35916a9..19123e71cacf60 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -1,80 +1,190 @@
-// RUN: fir-opt --lower-workshare %s | FileCheck %s
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
-module {
-// CHECK-LABEL:   func.func @simple(
+func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
+  omp.parallel {
+    omp.workshare {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %0 = fir.shape %c42 : (index) -> !fir.shape<1>
+      %1:2 = hlfir.declare %arg0(%0) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %2 = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+      %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+      %true = arith.constant true
+      %c1 = arith.constant 1 : index
+      "omp.workshare_loop_wrapper"() ({
+        omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+          %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+          %8 = fir.load %7 : !fir.ref<i32>
+          %9 = arith.subi %8, %c1_i32 : i32
+          %10 = hlfir.designate %3#0 (%arg1)  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+          hlfir.assign %9 to %10 temporary_lhs : i32, !fir.ref<i32>
+          omp.yield
+        }
+        omp.terminator
+      }) : () -> ()
+      %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+      %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+      %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+      hlfir.assign %3#0 to %1#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+      fir.freemem %3#0 : !fir.heap<!fir.array<42xi32>>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+
+// -----
+
+func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
+  omp.workshare {
+    %c1_i32 = arith.constant 1 : i32
+    %alloc = fir.alloca i32
+    fir.store %c1_i32 to %alloc : !fir.ref<i32>
+    %c42 = arith.constant 42 : index
+    %0 = fir.shape %c42 : (index) -> !fir.shape<1>
+    %1:2 = hlfir.declare %arg0(%0) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+    %2 = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+    %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+    %true = arith.constant true
+    %c1 = arith.constant 1 : index
+    "omp.workshare_loop_wrapper"() ({
+      omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+        %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+        %8 = fir.load %7 : !fir.ref<i32>
+        %ld = fir.load %alloc : !fir.ref<i32>
+        %n8 = arith.subi %8, %ld : i32
+        %9 = arith.subi %n8, %c1_i32 : i32
+        %10 = hlfir.designate %3#0 (%arg1)  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+        hlfir.assign %9 to %10 temporary_lhs : i32, !fir.ref<i32>
+        omp.yield
+      }
+      omp.terminator
+    }) : () -> ()
+    %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+    %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+    %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+    "test.test1"(%alloc) : (!fir.ref<i32>) -> ()
+    hlfir.assign %3#0 to %1#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+    fir.freemem %3#0 : !fir.heap<!fir.array<42xi32>>
+    omp.terminator
+  }
+  return
+}
+
+
+// CHECK-LABEL:   func.func private @_workshare_copy_heap_42xi32(
+// CHECK-SAME:                                                   %[[VAL_0:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>,
+// CHECK-SAME:                                                   %[[VAL_1:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:           return
+// CHECK:         }
+
+// CHECK-LABEL:   func.func @wsfunc(
 // CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
 // CHECK:           %[[VAL_2:.*]] = arith.constant 1 : i32
 // CHECK:           %[[VAL_3:.*]] = arith.constant 42 : index
-// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i32) : i32
-// CHECK:           %[[VAL_5:.*]] = llvm.alloca %[[VAL_4]] x !llvm.ptr : (i32) -> !llvm.ptr
-// CHECK:           %[[VAL_6:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:           %[[VAL_4:.*]] = arith.constant true
 // CHECK:           omp.parallel {
-// CHECK:             omp.single {
-// CHECK:               %[[VAL_7:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
-// CHECK:               %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_7]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:               %[[VAL_9:.*]] = builtin.unrealized_conversion_cast %[[VAL_8]]#0 : !fir.ref<!fir.array<42xi32>> to !llvm.ptr
-// CHECK:               llvm.store %[[VAL_9]], %[[VAL_5]] : !llvm.ptr, !llvm.ptr
-// CHECK:               %[[VAL_10:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
-// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_10]](%[[VAL_7]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:               fir.store %[[VAL_11]]#0 to %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             fir.if %[[VAL_4]] {
+// CHECK:               %[[VAL_5:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:               omp.single copyprivate(%[[VAL_5]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:                 %[[VAL_6:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
+// CHECK:                 %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_6]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:                 %[[VAL_8:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:                 fir.store %[[VAL_8]] to %[[VAL_5]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:                 %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_8]](%[[VAL_6]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               %[[VAL_10:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_10]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_12:.*]] = fir.load %[[VAL_5]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:               %[[VAL_13:.*]]:2 = hlfir.declare %[[VAL_12]](%[[VAL_10]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:               omp.wsloop {
+// CHECK:                 omp.loop_nest (%[[VAL_14:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_3]]) inclusive step (%[[VAL_1]]) {
+// CHECK:                   %[[VAL_15:.*]] = hlfir.designate %[[VAL_11]]#0 (%[[VAL_14]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                   %[[VAL_16:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
+// CHECK:                   %[[VAL_17:.*]] = arith.subi %[[VAL_16]], %[[VAL_2]] : i32
+// CHECK:                   %[[VAL_18:.*]] = hlfir.designate %[[VAL_13]]#0 (%[[VAL_14]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                   hlfir.assign %[[VAL_17]] to %[[VAL_18]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:                   omp.yield
+// CHECK:                 }
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               omp.single nowait {
+// CHECK:                 hlfir.assign %[[VAL_13]]#0 to %[[VAL_11]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:                 fir.freemem %[[VAL_13]]#0 : !fir.heap<!fir.array<42xi32>>
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               omp.barrier
+// CHECK:             }
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+
+// CHECK-LABEL:   func.func private @_workshare_copy_heap_42xi32(
+// CHECK-SAME:                                                   %[[VAL_0:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>,
+// CHECK-SAME:                                                   %[[VAL_1:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:           return
+// CHECK:         }
+
+// CHECK-LABEL:   func.func private @_workshare_copy_llvm_ptr(
+// CHECK-SAME:                                                %[[VAL_0:.*]]: !llvm.ptr,
+// CHECK-SAME:                                                %[[VAL_1:.*]]: !llvm.ptr) {
+// CHECK:           return
+// CHECK:         }
+
+// CHECK-LABEL:   func.func @wsfunc(
+// CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
+// CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_2:.*]] = arith.constant 42 : index
+// CHECK:           %[[VAL_3:.*]] = arith.constant 1 : i32
+// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i32) : i32
+// CHECK:           %[[VAL_5:.*]] = arith.constant true
+// CHECK:           fir.if %[[VAL_5]] {
+// CHECK:             %[[VAL_6:.*]] = llvm.alloca %[[VAL_4]] x !llvm.ptr : (i32) -> !llvm.ptr
+// CHECK:             %[[VAL_7:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:             omp.single copyprivate(%[[VAL_6]] -> @_workshare_copy_llvm_ptr : !llvm.ptr, %[[VAL_7]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:               %[[VAL_8:.*]] = fir.alloca i32
+// CHECK:               %[[VAL_9:.*]] = builtin.unrealized_conversion_cast %[[VAL_8]] : !fir.ref<i32> to !llvm.ptr
+// CHECK:               llvm.store %[[VAL_9]], %[[VAL_6]] : !llvm.ptr, !llvm.ptr
+// CHECK:               fir.store %[[VAL_3]] to %[[VAL_8]] : !fir.ref<i32>
+// CHECK:               %[[VAL_10:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_10]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_12:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:               fir.store %[[VAL_12]] to %[[VAL_7]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:               %[[VAL_13:.*]]:2 = hlfir.declare %[[VAL_12]](%[[VAL_10]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:               omp.terminator
 // CHECK:             }
-// CHECK:             %[[VAL_12:.*]] = llvm.load %[[VAL_5]] : !llvm.ptr -> !llvm.ptr
-// CHECK:             %[[VAL_13:.*]] = builtin.unrealized_conversion_cast %[[VAL_12]] : !llvm.ptr to !fir.ref<!fir.array<42xi32>>
-// CHECK:             %[[VAL_14:.*]] = fir.load %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             %[[VAL_14:.*]] = llvm.load %[[VAL_6]] : !llvm.ptr -> !llvm.ptr
+// CHECK:             %[[VAL_15:.*]] = builtin.unrealized_conversion_cast %[[VAL_14]] : !llvm.ptr to !fir.ref<i32>
+// CHECK:             %[[VAL_16:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+// CHECK:             %[[VAL_17:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_16]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_18:.*]] = fir.load %[[VAL_7]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             %[[VAL_19:.*]]:2 = hlfir.declare %[[VAL_18]](%[[VAL_16]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:             omp.wsloop {
-// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_3]]) inclusive step (%[[VAL_1]]) {
-// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_13]] (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_18:.*]] = arith.subi %[[VAL_17]], %[[VAL_2]] : i32
-// CHECK:                 %[[VAL_19:.*]] = hlfir.designate %[[VAL_14]] (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 hlfir.assign %[[VAL_18]] to %[[VAL_19]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:               omp.loop_nest (%[[VAL_20:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_2]]) inclusive step (%[[VAL_1]]) {
+// CHECK:                 %[[VAL_21:.*]] = hlfir.designate %[[VAL_17]]#0 (%[[VAL_20]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 %[[VAL_22:.*]] = fir.load %[[VAL_21]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_23:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_24:.*]] = arith.subi %[[VAL_22]], %[[VAL_23]] : i32
+// CHECK:                 %[[VAL_25:.*]] = arith.subi %[[VAL_24]], %[[VAL_3]] : i32
+// CHECK:                 %[[VAL_26:.*]] = hlfir.designate %[[VAL_19]]#0 (%[[VAL_20]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 hlfir.assign %[[VAL_25]] to %[[VAL_26]] temporary_lhs : i32, !fir.ref<i32>
 // CHECK:                 omp.yield
 // CHECK:               }
 // CHECK:               omp.terminator
 // CHECK:             }
 // CHECK:             omp.single nowait {
-// CHECK:               hlfir.assign %[[VAL_14]] to %[[VAL_13]] : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
-// CHECK:               fir.freemem %[[VAL_14]] : !fir.heap<!fir.array<42xi32>>
+// CHECK:               "test.test1"(%[[VAL_15]]) : (!fir.ref<i32>) -> ()
+// CHECK:               hlfir.assign %[[VAL_19]]#0 to %[[VAL_17]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:               fir.freemem %[[VAL_19]]#0 : !fir.heap<!fir.array<42xi32>>
 // CHECK:               omp.terminator
 // CHECK:             }
 // CHECK:             omp.barrier
-// CHECK:             omp.terminator
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
-  func.func @simple(%arg0: !fir.ref<!fir.array<42xi32>>) {
-    omp.parallel {
-      omp.workshare {
-        %c42 = arith.constant 42 : index
-        %c1_i32 = arith.constant 1 : i32
-        %0 = fir.shape %c42 : (index) -> !fir.shape<1>
-        %1:2 = hlfir.declare %arg0(%0) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-        %2 = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
-        %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-        %true = arith.constant true
-        %c1 = arith.constant 1 : index
-        "omp.workshare_loop_wrapper"() ({
-          omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
-            %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-            %8 = fir.load %7 : !fir.ref<i32>
-            %9 = arith.subi %8, %c1_i32 : i32
-            %10 = hlfir.designate %3#0 (%arg1)  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-            hlfir.assign %9 to %10 temporary_lhs : i32, !fir.ref<i32>
-            omp.yield
-          }
-          omp.terminator
-        }) : () -> ()
-        %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
-        %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
-        %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
-        hlfir.assign %3#0 to %1#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
-        fir.freemem %3#0 : !fir.heap<!fir.array<42xi32>>
-        omp.terminator
-      }
-      omp.terminator
-    }
-    return
-  }
-}
+

From 991d0dbe8c560c1affe5c7b9f422311eb9015aeb Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 15:50:58 +0900
Subject: [PATCH 098/116] Hoist allocas

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 10 ++-
 .../Transforms/OpenMP/lower-workshare.mlir    | 69 +++++++++----------
 2 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 30af2556cf4cae..d0cd235d3eb079 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -163,7 +163,6 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
   OpBuilder copyFuncBuilder(m.getBodyRegion());
   fir::FirOpBuilder firCopyFuncBuilder(copyFuncBuilder, m);
 
-  // TODO need to copyprivate the alloca's
   auto mapReloadedValue =
       [&](Value v, OpBuilder allocaBuilder, OpBuilder singleBuilder,
           OpBuilder parallelBuilder, IRMapping singleMapping) -> Value {
@@ -202,10 +201,17 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
     SmallVector<Value> copyPrivate;
 
     for (Operation &op : llvm::make_range(sr.begin, sr.end)) {
-      singleBuilder.clone(op, singleMapping);
       if (isSafeToParallelize(&op)) {
+        singleBuilder.clone(op, singleMapping);
         parallelBuilder.clone(op, rootMapping);
+      } else if (auto alloca = dyn_cast<fir::AllocaOp>(&op)) {
+        auto hoisted =
+            cast<fir::AllocaOp>(allocaBuilder.clone(*alloca, singleMapping));
+        rootMapping.map(&*alloca, &*hoisted);
+        rootMapping.map(alloca.getResult(), hoisted.getResult());
+        copyPrivate.push_back(hoisted);
       } else {
+        singleBuilder.clone(op, singleMapping);
         // Prepare reloaded values for results of operations that cannot be
         // safely parallelized and which are used after the region `sr`
         for (auto res : op.getResults()) {
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index 19123e71cacf60..b78cfd80e17acb 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -1,5 +1,7 @@
 // RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
+// checks:
+// nowait on final omp.single
 func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
   omp.parallel {
     omp.workshare {
@@ -37,6 +39,8 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 
 // -----
 
+// checks:
+// fir.alloca hoisted out and copyprivate'd
 func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
   omp.workshare {
     %c1_i32 = arith.constant 1 : i32
@@ -73,7 +77,6 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
   return
 }
 
-
 // CHECK-LABEL:   func.func private @_workshare_copy_heap_42xi32(
 // CHECK-SAME:                                                   %[[VAL_0:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>,
 // CHECK-SAME:                                                   %[[VAL_1:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
@@ -130,9 +133,9 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:           return
 // CHECK:         }
 
-// CHECK-LABEL:   func.func private @_workshare_copy_llvm_ptr(
-// CHECK-SAME:                                                %[[VAL_0:.*]]: !llvm.ptr,
-// CHECK-SAME:                                                %[[VAL_1:.*]]: !llvm.ptr) {
+// CHECK-LABEL:   func.func private @_workshare_copy_i32(
+// CHECK-SAME:                                           %[[VAL_0:.*]]: !fir.ref<i32>,
+// CHECK-SAME:                                           %[[VAL_1:.*]]: !fir.ref<i32>) {
 // CHECK:           return
 // CHECK:         }
 
@@ -141,46 +144,40 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
 // CHECK:           %[[VAL_2:.*]] = arith.constant 42 : index
 // CHECK:           %[[VAL_3:.*]] = arith.constant 1 : i32
-// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i32) : i32
-// CHECK:           %[[VAL_5:.*]] = arith.constant true
-// CHECK:           fir.if %[[VAL_5]] {
-// CHECK:             %[[VAL_6:.*]] = llvm.alloca %[[VAL_4]] x !llvm.ptr : (i32) -> !llvm.ptr
-// CHECK:             %[[VAL_7:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
-// CHECK:             omp.single copyprivate(%[[VAL_6]] -> @_workshare_copy_llvm_ptr : !llvm.ptr, %[[VAL_7]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
-// CHECK:               %[[VAL_8:.*]] = fir.alloca i32
-// CHECK:               %[[VAL_9:.*]] = builtin.unrealized_conversion_cast %[[VAL_8]] : !fir.ref<i32> to !llvm.ptr
-// CHECK:               llvm.store %[[VAL_9]], %[[VAL_6]] : !llvm.ptr, !llvm.ptr
-// CHECK:               fir.store %[[VAL_3]] to %[[VAL_8]] : !fir.ref<i32>
-// CHECK:               %[[VAL_10:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
-// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_10]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:               %[[VAL_12:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
-// CHECK:               fir.store %[[VAL_12]] to %[[VAL_7]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:               %[[VAL_13:.*]]:2 = hlfir.declare %[[VAL_12]](%[[VAL_10]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:           %[[VAL_4:.*]] = arith.constant true
+// CHECK:           fir.if %[[VAL_4]] {
+// CHECK:             %[[VAL_5:.*]] = fir.alloca i32
+// CHECK:             %[[VAL_6:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:             omp.single copyprivate(%[[VAL_5]] -> @_workshare_copy_i32 : !fir.ref<i32>, %[[VAL_6]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:               fir.store %[[VAL_3]] to %[[VAL_5]] : !fir.ref<i32>
+// CHECK:               %[[VAL_7:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_7]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_9:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:               fir.store %[[VAL_9]] to %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:               %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_9]](%[[VAL_7]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:               omp.terminator
 // CHECK:             }
-// CHECK:             %[[VAL_14:.*]] = llvm.load %[[VAL_6]] : !llvm.ptr -> !llvm.ptr
-// CHECK:             %[[VAL_15:.*]] = builtin.unrealized_conversion_cast %[[VAL_14]] : !llvm.ptr to !fir.ref<i32>
-// CHECK:             %[[VAL_16:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
-// CHECK:             %[[VAL_17:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_16]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:             %[[VAL_18:.*]] = fir.load %[[VAL_7]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:             %[[VAL_19:.*]]:2 = hlfir.declare %[[VAL_18]](%[[VAL_16]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_11:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+// CHECK:             %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_11]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_13:.*]] = fir.load %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             %[[VAL_14:.*]]:2 = hlfir.declare %[[VAL_13]](%[[VAL_11]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:             omp.wsloop {
-// CHECK:               omp.loop_nest (%[[VAL_20:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_2]]) inclusive step (%[[VAL_1]]) {
-// CHECK:                 %[[VAL_21:.*]] = hlfir.designate %[[VAL_17]]#0 (%[[VAL_20]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 %[[VAL_22:.*]] = fir.load %[[VAL_21]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_23:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_24:.*]] = arith.subi %[[VAL_22]], %[[VAL_23]] : i32
-// CHECK:                 %[[VAL_25:.*]] = arith.subi %[[VAL_24]], %[[VAL_3]] : i32
-// CHECK:                 %[[VAL_26:.*]] = hlfir.designate %[[VAL_19]]#0 (%[[VAL_20]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 hlfir.assign %[[VAL_25]] to %[[VAL_26]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_2]]) inclusive step (%[[VAL_1]]) {
+// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_18:.*]] = fir.load %[[VAL_5]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_19:.*]] = arith.subi %[[VAL_17]], %[[VAL_18]] : i32
+// CHECK:                 %[[VAL_20:.*]] = arith.subi %[[VAL_19]], %[[VAL_3]] : i32
+// CHECK:                 %[[VAL_21:.*]] = hlfir.designate %[[VAL_14]]#0 (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 hlfir.assign %[[VAL_20]] to %[[VAL_21]] temporary_lhs : i32, !fir.ref<i32>
 // CHECK:                 omp.yield
 // CHECK:               }
 // CHECK:               omp.terminator
 // CHECK:             }
 // CHECK:             omp.single nowait {
-// CHECK:               "test.test1"(%[[VAL_15]]) : (!fir.ref<i32>) -> ()
-// CHECK:               hlfir.assign %[[VAL_19]]#0 to %[[VAL_17]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
-// CHECK:               fir.freemem %[[VAL_19]]#0 : !fir.heap<!fir.array<42xi32>>
+// CHECK:               "test.test1"(%[[VAL_5]]) : (!fir.ref<i32>) -> ()
+// CHECK:               hlfir.assign %[[VAL_14]]#0 to %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:               fir.freemem %[[VAL_14]]#0 : !fir.heap<!fir.array<42xi32>>
 // CHECK:               omp.terminator
 // CHECK:             }
 // CHECK:             omp.barrier

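The partitioning rule that "Hoist allocas" adds to `parallelizeRegion` can be modelled outside MLIR. The sketch below is a hypothetical, simplified C++ model: the `ModelOp` and `Partition` types and the `partition` function are illustrative and are not part of LowerWorkshare.cpp. It keeps the patch's branch order (safe-to-parallelize first, then `fir.alloca`, then everything else) but omits the reload bookkeeping the real pass performs for results that are used after the single region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an MLIR operation: it carries only the two
// properties the partitioning step inspects.
struct ModelOp {
  std::string name;
  bool safeToParallelize; // plays the role of isSafeToParallelize(&op)
  bool isAlloca;          // plays the role of isa<fir::AllocaOp>(&op)
};

// Where each op ends up after partitioning.
struct Partition {
  std::vector<std::string> hoisted;      // cloned before omp.single
  std::vector<std::string> singleBody;   // cloned into omp.single
  std::vector<std::string> parallelBody; // cloned for every thread
  std::vector<std::string> copyPrivate;  // operands of copyprivate(...)
};

Partition partition(const std::vector<ModelOp> &ops) {
  Partition p;
  for (const ModelOp &op : ops) {
    if (op.safeToParallelize) {
      // Duplicated: one copy in the single region, one in the parallel part.
      p.singleBody.push_back(op.name);
      p.parallelBody.push_back(op.name);
    } else if (op.isAlloca) {
      // Hoisted above omp.single so every thread owns a slot, which the
      // copyprivate clause then fills from the thread that ran the single.
      p.hoisted.push_back(op.name);
      p.copyPrivate.push_back(op.name);
    } else {
      // Everything else executes once, inside omp.single.
      p.singleBody.push_back(op.name);
    }
  }
  return p;
}
```

Fed `fir.alloca`, `fir.store`, `fir.shape` and `fir.allocmem` from the second test case, the model puts the alloca in both `hoisted` and `copyPrivate`, matching the `fir.alloca i32` that the updated CHECK lines expect before `omp.single copyprivate(...)`. In the real pass the `fir.allocmem` result additionally gets a reload slot (the second `copyprivate` operand), which this model leaves out.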
From 5a903333bd588097974c99074d0851f7d743fc64 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 15:52:23 +0900
Subject: [PATCH 099/116] More tests

---
 .../Transforms/OpenMP/lower-workshare2.mlir   | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare2.mlir

diff --git a/flang/test/Transforms/OpenMP/lower-workshare2.mlir b/flang/test/Transforms/OpenMP/lower-workshare2.mlir
new file mode 100644
index 00000000000000..325a40d4184453
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare2.mlir
@@ -0,0 +1,21 @@
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
+
+// CHECK-LABEL:   func.func @nonowait
+func.func @nonowait(%arg0: !fir.ref<!fir.array<42xi32>>) {
+  // CHECK: omp.barrier
+  omp.workshare {
+    omp.terminator
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL:   func.func @nowait
+func.func @nowait(%arg0: !fir.ref<!fir.array<42xi32>>) {
+  // CHECK-NOT: omp.barrier
+  omp.workshare nowait {
+    omp.terminator
+  }
+  return
+}

From ee3f346ff690f58574c6c00ec412fccc5ff8cd4d Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 16:13:39 +0900
Subject: [PATCH 100/116] Emit body for copy func

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 44 +++++--------------
 .../Transforms/OpenMP/lower-workshare.mlir    |  6 +++
 2 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index d0cd235d3eb079..20f45296a8159a 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -69,10 +69,6 @@ struct SingleRegion {
   Block::iterator begin, end;
 };
 
-static bool isSupportedByFirAlloca(Type ty) {
-  return !isa<fir::ReferenceType>(ty);
-}
-
 static bool mustParallelizeOp(Operation *op) {
   // TODO as in shouldUseWorkshareLowering we be careful not to pick up
   // workshare_loop_wrapper in nested omp.parallel ops
@@ -98,14 +94,10 @@ static bool isSafeToParallelize(Operation *op) {
 static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
                                          fir::FirOpBuilder builder) {
   mlir::ModuleOp module = builder.getModule();
-  std::string copyFuncName;
-  if (auto rt = dyn_cast<fir::ReferenceType>(varType)) {
-    mlir::Type eleTy = rt.getEleTy();
-    copyFuncName =
-        fir::getTypeAsString(eleTy, builder.getKindMap(), "_workshare_copy");
-  } else {
-    copyFuncName = "_workshare_copy_llvm_ptr";
-  }
+  auto rt = cast<fir::ReferenceType>(varType);
+  mlir::Type eleTy = rt.getEleTy();
+  std::string copyFuncName =
+      fir::getTypeAsString(eleTy, builder.getKindMap(), "_workshare_copy");
 
   if (auto decl = module.lookupSymbol<mlir::func::FuncOp>(copyFuncName))
     return decl;
@@ -120,6 +112,10 @@ static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
   builder.createBlock(&funcOp.getRegion(), funcOp.getRegion().end(), argsTy,
                       {loc, loc});
   builder.setInsertionPointToStart(&funcOp.getRegion().back());
+
+  Value loaded = builder.create<fir::LoadOp>(loc, funcOp.getArgument(0));
+  builder.create<fir::StoreOp>(loc, loaded, funcOp.getArgument(1));
+
   builder.create<mlir::func::ReturnOp>(loc);
   return funcOp;
 }
@@ -168,28 +164,10 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
           OpBuilder parallelBuilder, IRMapping singleMapping) -> Value {
     if (auto reloaded = rootMapping.lookupOrNull(v))
       return nullptr;
-    Type llvmPtrTy = LLVM::LLVMPointerType::get(allocaBuilder.getContext());
     Type ty = v.getType();
-    Value alloc, reloaded;
-    if (isSupportedByFirAlloca(ty)) {
-      alloc = allocaBuilder.create<fir::AllocaOp>(loc, ty);
-      singleBuilder.create<fir::StoreOp>(loc, singleMapping.lookup(v), alloc);
-      reloaded = parallelBuilder.create<fir::LoadOp>(loc, ty, alloc);
-    } else {
-      auto one = allocaBuilder.create<LLVM::ConstantOp>(
-          loc, allocaBuilder.getI32Type(), 1);
-      alloc =
-          allocaBuilder.create<LLVM::AllocaOp>(loc, llvmPtrTy, llvmPtrTy, one);
-      Value toStore = singleBuilder
-                          .create<UnrealizedConversionCastOp>(
-                              loc, llvmPtrTy, singleMapping.lookup(v))
-                          .getResult(0);
-      singleBuilder.create<LLVM::StoreOp>(loc, toStore, alloc);
-      reloaded = parallelBuilder.create<LLVM::LoadOp>(loc, llvmPtrTy, alloc);
-      reloaded =
-          parallelBuilder.create<UnrealizedConversionCastOp>(loc, ty, reloaded)
-              .getResult(0);
-    }
+    Value alloc = allocaBuilder.create<fir::AllocaOp>(loc, ty);
+    singleBuilder.create<fir::StoreOp>(loc, singleMapping.lookup(v), alloc);
+    Value reloaded = parallelBuilder.create<fir::LoadOp>(loc, ty, alloc);
     rootMapping.map(v, reloaded);
     return alloc;
   };
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index b78cfd80e17acb..997bc8d79f9b3f 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -80,6 +80,8 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK-LABEL:   func.func private @_workshare_copy_heap_42xi32(
 // CHECK-SAME:                                                   %[[VAL_0:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>,
 // CHECK-SAME:                                                   %[[VAL_1:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:           fir.store %[[VAL_2]] to %[[VAL_1]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
 // CHECK:           return
 // CHECK:         }
 
@@ -130,12 +132,16 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK-LABEL:   func.func private @_workshare_copy_heap_42xi32(
 // CHECK-SAME:                                                   %[[VAL_0:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>,
 // CHECK-SAME:                                                   %[[VAL_1:.*]]: !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:           fir.store %[[VAL_2]] to %[[VAL_1]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
 // CHECK:           return
 // CHECK:         }
 
 // CHECK-LABEL:   func.func private @_workshare_copy_i32(
 // CHECK-SAME:                                           %[[VAL_0:.*]]: !fir.ref<i32>,
 // CHECK-SAME:                                           %[[VAL_1:.*]]: !fir.ref<i32>) {
+// CHECK:           %[[VAL_2:.*]] = fir.load %[[VAL_0]] : !fir.ref<i32>
+// CHECK:           fir.store %[[VAL_2]] to %[[VAL_1]] : !fir.ref<i32>
 // CHECK:           return
 // CHECK:         }
 

From 72f5c1a8ef5ecdeaf01767784905da51b1cb3944 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 16:59:34 +0900
Subject: [PATCH 101/116] Test the tmp storing logic

---
 .../Transforms/OpenMP/lower-workshare.mlir    |  2 -
 .../Transforms/OpenMP/lower-workshare3.mlir   | 74 +++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare3.mlir

diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index 997bc8d79f9b3f..063d3865065e01 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -36,7 +36,6 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
   return
 }
 
-
 // -----
 
 // checks:
@@ -190,4 +189,3 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
-
diff --git a/flang/test/Transforms/OpenMP/lower-workshare3.mlir b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
new file mode 100644
index 00000000000000..84eded94503282
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
@@ -0,0 +1,74 @@
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
+
+
+// Tests that the correct values are stored.
+
+func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
+  omp.parallel {
+  // CHECK: fir.alloca
+  // CHECK: fir.alloca
+  // CHECK: fir.alloca
+  // CHECK: fir.alloca
+  // CHECK: fir.alloca
+  // CHECK-NOT: fir.alloca
+    omp.workshare {
+
+      %t1 = "test.test1"() : () -> i32
+      // CHECK: %[[T1:.*]] = "test.test1"
+      // CHECK: fir.store %[[T1]]
+      %t2 = "test.test2"() : () -> i32
+      // CHECK: %[[T2:.*]] = "test.test2"
+      // CHECK: fir.store %[[T2]]
+      %t3 = "test.test3"() : () -> i32
+      // CHECK: %[[T3:.*]] = "test.test3"
+      // CHECK-NOT: fir.store %[[T3]]
+      %t4 = "test.test4"() : () -> i32
+      // CHECK: %[[T4:.*]] = "test.test4"
+      // CHECK: fir.store %[[T4]]
+      %t5 = "test.test5"() : () -> i32
+      // CHECK: %[[T5:.*]] = "test.test5"
+      // CHECK: fir.store %[[T5]]
+      %t6 = "test.test6"() : () -> i32
+      // CHECK: %[[T6:.*]] = "test.test6"
+      // CHECK-NOT: fir.store %[[T6]]
+
+
+      "test.test1"(%t1) : (i32) -> ()
+      "test.test1"(%t2) : (i32) -> ()
+      "test.test1"(%t3) : (i32) -> ()
+
+      %true = arith.constant true
+      fir.if %true {
+        "test.test2"(%t3) : (i32) -> ()
+      }
+
+      %c1_i32 = arith.constant 1 : i32
+
+      %t5_pure_use = arith.addi %t5, %c1_i32 : i32
+
+      %t6_mem_effect_use = "test.test8"(%t6) : (i32) -> i32
+      // CHECK: %[[T6_USE:.*]] = "test.test8"
+      // CHECK: fir.store %[[T6_USE]]
+
+      %c42 = arith.constant 42 : index
+      %c1 = arith.constant 1 : index
+      "omp.workshare_loop_wrapper"() ({
+        omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+          "test.test10"(%t1) : (i32) -> ()
+          "test.test10"(%t5_pure_use) : (i32) -> ()
+          "test.test10"(%t6_mem_effect_use) : (i32) -> ()
+          omp.yield
+        }
+        omp.terminator
+      }) : () -> ()
+
+      "test.test10"(%t2) : (i32) -> ()
+      fir.if %true {
+        "test.test10"(%t4) : (i32) -> ()
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
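When the temporaries are created with `fir.alloca` inside `omp.parallel`, each thread owns its own slot. The thread that runs the `omp.single` body stores the value, and the `copyprivate` copy function propagates it to the other threads' slots before they reload it. The updated CHECK lines for `@_workshare_copy_i32` above show that copy function as a plain `fir.load` from the source followed by a `fir.store` to the destination. The C++ sketch below models that data flow with `std::thread`. It is only an illustration: thread 0 stands in for the single thread, and the hand-rolled `OneShotBarrier` stands in for the synchronization the OpenMP runtime performs around `copyprivate`. All names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// One-shot barrier: every participant blocks until all have arrived.
class OneShotBarrier {
public:
  explicit OneShotBarrier(int count) : remaining_(count) {}

  void arriveAndWait() {
    std::unique_lock<std::mutex> lock(mu_);
    if (--remaining_ == 0) {
      cv_.notify_all();
      return;
    }
    cv_.wait(lock, [this] { return remaining_ == 0; });
  }

private:
  std::mutex mu_;
  std::condition_variable cv_;
  int remaining_;
};

// Mirrors what the generated @_workshare_copy_i32 does: load from the
// source slot and store into the destination slot.
void workshareCopyI32(const int *src, int *dst) { *dst = *src; }

// Hypothetical model: privateSlots[tid] is thread tid's own temporary (the
// fir.alloca), thread 0 runs the single body and stores its result, and the
// other threads receive it through the copy function before reloading it.
std::vector<int> copyprivateBroadcast(int numThreads, int (*singleBody)()) {
  std::vector<int> privateSlots(numThreads, 0);
  std::vector<int> reloaded(numThreads, -1);
  const int *published = nullptr; // address of the single thread's slot
  OneShotBarrier barrier(numThreads);

  std::vector<std::thread> threads;
  for (int tid = 0; tid < numThreads; ++tid) {
    threads.emplace_back([&, tid] {
      if (tid == 0) {
        privateSlots[tid] = singleBody(); // fir.store inside omp.single
        published = &privateSlots[tid];
      }
      barrier.arriveAndWait(); // the store is now visible to all threads
      if (tid != 0)
        workshareCopyI32(published, &privateSlots[tid]); // copyprivate copy
      reloaded[tid] = privateSlots[tid]; // fir.load in the parallel part
    });
  }
  for (std::thread &t : threads)
    t.join();
  return reloaded;
}
```

For example, `copyprivateBroadcast(4, [] { return 42; })` yields `42` in every thread. A real runtime (libomp's `__kmpc_copyprivate`, for instance) also synchronizes after the copies so the source cannot be overwritten early. The sketch omits that step because the source slot is never written again.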

From cabb42617ba08f6d6c3654f67e96c98d402b4814 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 20:24:36 +0900
Subject: [PATCH 102/116] Clean up trivially dead ops

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 32 ++++-------
 .../Transforms/OpenMP/lower-workshare3.mlir   |  2 +-
 .../Transforms/OpenMP/lower-workshare4.mlir   | 55 +++++++++++++++++++
 3 files changed, 68 insertions(+), 21 deletions(-)
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare4.mlir

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 20f45296a8159a..a147db2cb5d59a 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -152,6 +152,14 @@ static bool isTransitivelyUsedOutside(Value v, SingleRegion sr) {
   return false;
 }
 
+/// We clone pure operations in both the parallel and single blocks. This
+/// function cleans them up if they end up with no uses.
+static void cleanupBlock(Block *block) {
+  for (Operation &op : llvm::make_early_inc_range(*block))
+    if (isOpTriviallyDead(&op))
+      op.erase();
+}
+
 static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
                               IRMapping &rootMapping, Location loc) {
   OpBuilder rootBuilder(sourceRegion.getContext());
@@ -258,13 +266,8 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
         singleOperands.copyprivateVars =
             moveToSingle(std::get<SingleRegion>(opOrSingle), allocaBuilder,
                          singleBuilder, parallelBuilder);
+        cleanupBlock(singleBlock);
         for (auto var : singleOperands.copyprivateVars) {
-          Type ty;
-          if (auto firAlloca = var.getDefiningOp<fir::AllocaOp>()) {
-            ty = firAlloca.getAllocatedType();
-          } else {
-            ty = LLVM::LLVMPointerType::get(allocaBuilder.getContext());
-          }
           mlir::func::FuncOp funcOp =
               createCopyFunc(loc, var.getType(), firCopyFuncBuilder);
           singleOperands.copyprivateSyms.push_back(SymbolRefAttr::get(funcOp));
@@ -302,6 +305,9 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
 
     rootBuilder.clone(*block.getTerminator(), rootMapping);
   }
+
+  for (Block &targetBlock : targetRegion)
+    cleanupBlock(&targetBlock);
 }
 
 /// Lowers workshare to a sequence of single-thread regions and parallel loops
@@ -372,20 +378,6 @@ class LowerWorksharePass
 
       lowerWorkshare(wsOp);
     });
-
-    // Do folding
-    for (Operation *isolatedParent : parents) {
-      RewritePatternSet patterns(&getContext());
-      GreedyRewriteConfig config;
-      // prevent the pattern driver form merging blocks
-      config.enableRegionSimplification =
-          mlir::GreedySimplifyRegionLevel::Disabled;
-      if (failed(applyPatternsAndFoldGreedily(isolatedParent,
-                                              std::move(patterns), config))) {
-        emitError(isolatedParent->getLoc(), "error in lower workshare\n");
-        signalPassFailure();
-      }
-    }
   }
 };
 } // namespace
diff --git a/flang/test/Transforms/OpenMP/lower-workshare3.mlir b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
index 84eded94503282..aee95a464a31bd 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare3.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
@@ -3,7 +3,7 @@
 
 // tests if the correct values are stored
 
-func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
+func.func @wsfunc() {
   omp.parallel {
   // CHECK: fir.alloca
   // CHECK: fir.alloca
diff --git a/flang/test/Transforms/OpenMP/lower-workshare4.mlir b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
new file mode 100644
index 00000000000000..6cff0075b4fe50
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
@@ -0,0 +1,55 @@
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
+
+func.func @wsfunc() {
+  %a = fir.alloca i32
+  omp.parallel {
+    omp.workshare {
+      %t1 = "test.test1"() : () -> i32
+
+      %c1 = arith.constant 1 : index
+      %c42 = arith.constant 42 : index
+
+      %c2 = arith.constant 2 : index
+      "test.test3"(%c2) : (index) -> ()
+
+      "omp.workshare_loop_wrapper"() ({
+        omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+          "test.test2"() : () -> ()
+          omp.yield
+        }
+        omp.terminator
+      }) : () -> ()
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL:   func.func @wsfunc() {
+// CHECK:           %[[VAL_0:.*]] = fir.alloca i32
+// CHECK:           omp.parallel {
+// CHECK:             %[[VAL_1:.*]] = arith.constant true
+// CHECK:             fir.if %[[VAL_1]] {
+// CHECK:               omp.single {
+// CHECK:                 %[[VAL_2:.*]] = "test.test1"() : () -> i32
+// CHECK:                 %[[VAL_3:.*]] = arith.constant 2 : index
+// CHECK:                 "test.test3"(%[[VAL_3]]) : (index) -> ()
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               %[[VAL_4:.*]] = arith.constant 1 : index
+// CHECK:               %[[VAL_5:.*]] = arith.constant 42 : index
+// CHECK:               omp.wsloop nowait {
+// CHECK:                 omp.loop_nest (%[[VAL_6:.*]]) : index = (%[[VAL_4]]) to (%[[VAL_5]]) inclusive step (%[[VAL_4]]) {
+// CHECK:                   "test.test2"() : () -> ()
+// CHECK:                   omp.yield
+// CHECK:                 }
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               omp.barrier
+// CHECK:             }
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+
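`cleanupBlock` relies on `llvm::make_early_inc_range`, which advances the iterator before the current operation can be erased. The sketch below replays that pattern on a hypothetical, heavily simplified op model. `Op`, `Block`, `appendOp` and the other helpers are made up for illustration, and the dead-op test only approximates `mlir::isOpTriviallyDead` as "no uses and no side effects". It also shows a consequence of walking forward once: a pure producer whose only user is erased later in the same walk survives, whereas a reverse walk removes the whole dead chain.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an MLIR operation.
struct Op {
  std::string name;
  bool hasSideEffects;
  std::vector<Op *> operands;
  int numUses = 0;
};

using Block = std::list<Op>; // std::list keeps Op addresses stable

Op *appendOp(Block &block, std::string name, bool hasSideEffects,
             std::vector<Op *> operands = {}) {
  for (Op *operand : operands)
    ++operand->numUses;
  block.push_back(Op{std::move(name), hasSideEffects, std::move(operands)});
  return &block.back();
}

// Approximation of isOpTriviallyDead: no uses and no side effects.
bool isTriviallyDead(const Op &op) {
  return op.numUses == 0 && !op.hasSideEffects;
}

// Erase an op and release the uses it held on its operands.
Block::iterator eraseOp(Block &block, Block::iterator it) {
  for (Op *operand : it->operands)
    --operand->numUses;
  return block.erase(it);
}

// Forward walk with an early-incremented iterator, like cleanupBlock.
void cleanupForward(Block &block) {
  for (auto it = block.begin(); it != block.end();) {
    auto next = std::next(it); // advance before a possible erase
    if (isTriviallyDead(*it))
      eraseOp(block, it);
    it = next;
  }
}

// Reverse walk: users are visited before their producers, so a chain of
// dead pure ops disappears in a single pass.
void cleanupReverse(Block &block) {
  for (auto it = block.end(); it != block.begin();) {
    --it;
    if (isTriviallyDead(*it))
      it = eraseOp(block, it);
  }
}

std::vector<std::string> opNames(const Block &block) {
  std::vector<std::string> result;
  for (const Op &op : block)
    result.push_back(op.name);
  return result;
}
```

With a block holding a pure `arith.constant`, an unused pure `arith.addi` that consumes it, and a side-effecting op, `cleanupForward` leaves the constant and the side-effecting op, while `cleanupReverse` leaves only the side-effecting op. Whether this matters for the pass depends on the shape of the cloned IR; the reverse walk is shown only as an alternative.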

From 9aefed404e49396d8edb77513fed402c8ab583cd Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 21:55:14 +0900
Subject: [PATCH 103/116] Only handle single-block regions for now

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp |  80 +++++----
 .../Transforms/OpenMP/lower-workshare.mlir    | 154 +++++++++---------
 .../Transforms/OpenMP/lower-workshare4.mlir   |  31 ++--
 3 files changed, 143 insertions(+), 122 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index a147db2cb5d59a..5998489c13d382 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -13,12 +13,14 @@
 #include <flang/Optimizer/Dialect/FIRType.h>
 #include <flang/Optimizer/HLFIR/HLFIROps.h>
 #include <flang/Optimizer/OpenMP/Passes.h>
+#include <llvm/ADT/BreadthFirstIterator.h>
 #include <llvm/ADT/STLExtras.h>
 #include <llvm/ADT/SmallVectorExtras.h>
 #include <llvm/ADT/iterator_range.h>
 #include <llvm/Support/ErrorHandling.h>
 #include <mlir/Dialect/Arith/IR/Arith.h>
 #include <mlir/Dialect/LLVMIR/LLVMTypes.h>
+#include <mlir/Dialect/OpenMP/OpenMPClauseOperands.h>
 #include <mlir/Dialect/OpenMP/OpenMPDialect.h>
 #include <mlir/Dialect/SCF/IR/SCF.h>
 #include <mlir/IR/BuiltinOps.h>
@@ -161,7 +163,8 @@ static void cleanupBlock(Block *block) {
 }
 
 static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
-                              IRMapping &rootMapping, Location loc) {
+                              IRMapping &rootMapping, Location loc,
+                              mlir::DominanceInfo &di) {
   OpBuilder rootBuilder(sourceRegion.getContext());
   ModuleOp m = sourceRegion.getParentOfType<ModuleOp>();
   OpBuilder copyFuncBuilder(m.getBodyRegion());
@@ -214,14 +217,19 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
     return copyPrivate;
   };
 
-  // TODO Need to handle these (clone them) in dominator tree order
   for (Block &block : sourceRegion) {
-    rootBuilder.createBlock(
+    Block *targetBlock = rootBuilder.createBlock(
         &targetRegion, {}, block.getArgumentTypes(),
         llvm::map_to_vector(block.getArguments(),
                             [](BlockArgument arg) { return arg.getLoc(); }));
-    Operation *terminator = block.getTerminator();
+    rootMapping.map(&block, targetBlock);
+    rootMapping.map(block.getArguments(), targetBlock->getArguments());
+  }
 
+  auto handleOneBlock = [&](Block &block) {
+    Block &targetBlock = *rootMapping.lookup(&block);
+    rootBuilder.setInsertionPointToStart(&targetBlock);
+    Operation *terminator = block.getTerminator();
     SmallVector<std::variant<SingleRegion, Operation *>> regions;
 
     auto it = block.begin();
@@ -298,12 +306,21 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
           Operation *cloned = rootBuilder.cloneWithoutRegions(*op, rootMapping);
           for (auto [region, clonedRegion] :
                llvm::zip(op->getRegions(), cloned->getRegions()))
-            parallelizeRegion(region, clonedRegion, rootMapping, loc);
+            parallelizeRegion(region, clonedRegion, rootMapping, loc, di);
         }
       }
     }
 
     rootBuilder.clone(*block.getTerminator(), rootMapping);
+  };
+
+  if (sourceRegion.hasOneBlock()) {
+    handleOneBlock(sourceRegion.front());
+  } else {
+    auto &domTree = di.getDomTree(&sourceRegion);
+    for (auto node : llvm::breadth_first(domTree.getRootNode())) {
+      handleOneBlock(*node->getBlock());
+    }
   }
 
   for (Block &targetBlock : targetRegion)
@@ -336,47 +353,46 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
 ///
 /// Note that we allocate temporary memory for values in omp.single's which need
 /// to be accessed in all threads in the closest omp.parallel
-void lowerWorkshare(mlir::omp::WorkshareOp wsOp) {
+LogicalResult lowerWorkshare(mlir::omp::WorkshareOp wsOp, DominanceInfo &di) {
   Location loc = wsOp->getLoc();
   IRMapping rootMapping;
 
   OpBuilder rootBuilder(wsOp);
 
-  // TODO We need something like an scf;execute here, but that is not registered
-  // so using fir.if for now but it looks like it does not support multiple
-  // blocks so it doesnt work for multi block case...
-  auto ifOp = rootBuilder.create<fir::IfOp>(
-      loc, rootBuilder.create<arith::ConstantIntOp>(loc, 1, 1), false);
-  ifOp.getThenRegion().front().erase();
-
-  parallelizeRegion(wsOp.getRegion(), ifOp.getThenRegion(), rootMapping, loc);
-
-  Operation *terminatorOp = ifOp.getThenRegion().back().getTerminator();
-  assert(isa<omp::TerminatorOp>(terminatorOp));
-  OpBuilder termBuilder(terminatorOp);
-
+  // TODO We need something like an scf.execute here, but that is not registered
+  // so we use omp.workshare as a placeholder. We need this op because
+  // parallelizeRegion works on regions and not blocks.
+  omp::WorkshareOp newOp =
+      rootBuilder.create<omp::WorkshareOp>(loc, omp::WorkshareOperands());
   if (!wsOp.getNowait())
-    termBuilder.create<omp::BarrierOp>(loc);
-
-  termBuilder.create<fir::ResultOp>(loc, ValueRange());
-
-  terminatorOp->erase();
+    rootBuilder.create<omp::BarrierOp>(loc);
+
+  parallelizeRegion(wsOp.getRegion(), newOp.getRegion(), rootMapping, loc, di);
+
+  if (wsOp.getRegion().getBlocks().size() != 1)
+    return failure();
+
+  // Inline the contents of the placeholder workshare op into its parent block.
+  Block *theBlock = &newOp.getRegion().front();
+  Operation *term = theBlock->getTerminator();
+  Block *parentBlock = wsOp->getBlock();
+  parentBlock->getOperations().splice(newOp->getIterator(),
+                                      theBlock->getOperations());
+  assert(term->getNumOperands() == 0);
+  term->erase();
+  newOp->erase();
   wsOp->erase();
-
-  return;
+  return success();
 }
 
 class LowerWorksharePass
     : public flangomp::impl::LowerWorkshareBase<LowerWorksharePass> {
 public:
   void runOnOperation() override {
-    SmallPtrSet<Operation *, 8> parents;
+    mlir::DominanceInfo &di = getAnalysis<mlir::DominanceInfo>();
     getOperation()->walk([&](mlir::omp::WorkshareOp wsOp) {
-      Operation *isolatedParent =
-          wsOp->getParentWithTrait<OpTrait::IsIsolatedFromAbove>();
-      parents.insert(isolatedParent);
-
-      lowerWorkshare(wsOp);
+      if (failed(lowerWorkshare(wsOp, di)))
+        signalPassFailure();
     });
   }
 };
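This patch replaces the dominator-tree-order `TODO` with a two-step scheme. First, every target block is created and its arguments are mapped. Then, for multi-block regions, the source blocks are visited with `llvm::breadth_first(domTree.getRootNode())`. Visiting a block only after its immediate dominator presumably guarantees that values defined in dominating blocks already have an entry in `rootMapping` when their uses are cloned. The sketch below shows that ordering in standalone C++. It uses a hypothetical string-keyed dominator tree (`DomTree`, `breadthFirstBlockOrder`) rather than MLIR's `DominanceInfo`.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical dominator tree: each block maps to the blocks it
// immediately dominates.
using DomTree = std::map<std::string, std::vector<std::string>>;

// Breadth-first walk from the entry block; every block is emitted after
// its immediate dominator.
std::vector<std::string> breadthFirstBlockOrder(const DomTree &tree,
                                                const std::string &entry) {
  std::vector<std::string> order;
  std::queue<std::string> worklist;
  worklist.push(entry);
  while (!worklist.empty()) {
    std::string block = worklist.front();
    worklist.pop();
    order.push_back(block); // stands in for handleOneBlock(block)
    auto it = tree.find(block);
    if (it == tree.end())
      continue; // leaf of the dominator tree
    for (const std::string &child : it->second)
      worklist.push(child);
  }
  return order;
}
```

For a diamond CFG (`entry` branching to `then` and `else`, which rejoin at `merge`), all three other blocks are immediately dominated by `entry`, so the order is `entry, then, else, merge`. Note that at this point in the series `lowerWorkshare` still returns `failure()` for regions with more than one block, so only the single-block path is actually exercised.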
diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index 063d3865065e01..b31e951223d56f 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -86,43 +86,46 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 
 // CHECK-LABEL:   func.func @wsfunc(
 // CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
-// CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
-// CHECK:           %[[VAL_2:.*]] = arith.constant 1 : i32
-// CHECK:           %[[VAL_3:.*]] = arith.constant 42 : index
-// CHECK:           %[[VAL_4:.*]] = arith.constant true
 // CHECK:           omp.parallel {
-// CHECK:             fir.if %[[VAL_4]] {
-// CHECK:               %[[VAL_5:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
-// CHECK:               omp.single copyprivate(%[[VAL_5]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
-// CHECK:                 %[[VAL_6:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
-// CHECK:                 %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_6]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:                 %[[VAL_8:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
-// CHECK:                 fir.store %[[VAL_8]] to %[[VAL_5]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:                 %[[VAL_9:.*]]:2 = hlfir.declare %[[VAL_8]](%[[VAL_6]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:                 omp.terminator
-// CHECK:               }
-// CHECK:               %[[VAL_10:.*]] = fir.shape %[[VAL_3]] : (index) -> !fir.shape<1>
-// CHECK:               %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_10]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:               %[[VAL_12:.*]] = fir.load %[[VAL_5]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:               %[[VAL_13:.*]]:2 = hlfir.declare %[[VAL_12]](%[[VAL_10]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:               omp.wsloop {
-// CHECK:                 omp.loop_nest (%[[VAL_14:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_3]]) inclusive step (%[[VAL_1]]) {
-// CHECK:                   %[[VAL_15:.*]] = hlfir.designate %[[VAL_11]]#0 (%[[VAL_14]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                   %[[VAL_16:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
-// CHECK:                   %[[VAL_17:.*]] = arith.subi %[[VAL_16]], %[[VAL_2]] : i32
-// CHECK:                   %[[VAL_18:.*]] = hlfir.designate %[[VAL_13]]#0 (%[[VAL_14]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                   hlfir.assign %[[VAL_17]] to %[[VAL_18]] temporary_lhs : i32, !fir.ref<i32>
-// CHECK:                   omp.yield
-// CHECK:                 }
-// CHECK:                 omp.terminator
-// CHECK:               }
-// CHECK:               omp.single nowait {
-// CHECK:                 hlfir.assign %[[VAL_13]]#0 to %[[VAL_11]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
-// CHECK:                 fir.freemem %[[VAL_13]]#0 : !fir.heap<!fir.array<42xi32>>
-// CHECK:                 omp.terminator
+// CHECK:             %[[VAL_1:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:             omp.single copyprivate(%[[VAL_1]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:               %[[VAL_2:.*]] = arith.constant 42 : index
+// CHECK:               %[[VAL_3:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_3]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_5:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:               fir.store %[[VAL_5]] to %[[VAL_1]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:               %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_5]](%[[VAL_3]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             %[[VAL_7:.*]] = arith.constant 42 : index
+// CHECK:             %[[VAL_8:.*]] = arith.constant 1 : i32
+// CHECK:             %[[VAL_9:.*]] = fir.shape %[[VAL_7]] : (index) -> !fir.shape<1>
+// CHECK:             %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_9]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_11:.*]] = fir.load %[[VAL_1]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_11]](%[[VAL_9]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_13:.*]] = arith.constant true
+// CHECK:             %[[VAL_14:.*]] = arith.constant 1 : index
+// CHECK:             omp.wsloop {
+// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_14]]) to (%[[VAL_7]]) inclusive step (%[[VAL_14]]) {
+// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_10]]#0 (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_18:.*]] = arith.subi %[[VAL_17]], %[[VAL_8]] : i32
+// CHECK:                 %[[VAL_19:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 hlfir.assign %[[VAL_18]] to %[[VAL_19]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:                 omp.yield
 // CHECK:               }
-// CHECK:               omp.barrier
+// CHECK:               omp.terminator
 // CHECK:             }
+// CHECK:             omp.single nowait {
+// CHECK:               %[[VAL_20:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:               %[[VAL_21:.*]] = fir.insert_value %[[VAL_20]], %[[VAL_13]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:               hlfir.assign %[[VAL_12]]#0 to %[[VAL_10]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:               fir.freemem %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             %[[VAL_22:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:             %[[VAL_23:.*]] = fir.insert_value %[[VAL_22]], %[[VAL_13]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:             omp.barrier
 // CHECK:             omp.terminator
 // CHECK:           }
 // CHECK:           return
@@ -146,46 +149,51 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 
 // CHECK-LABEL:   func.func @wsfunc(
 // CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
-// CHECK:           %[[VAL_1:.*]] = arith.constant 1 : index
-// CHECK:           %[[VAL_2:.*]] = arith.constant 42 : index
-// CHECK:           %[[VAL_3:.*]] = arith.constant 1 : i32
-// CHECK:           %[[VAL_4:.*]] = arith.constant true
-// CHECK:           fir.if %[[VAL_4]] {
-// CHECK:             %[[VAL_5:.*]] = fir.alloca i32
-// CHECK:             %[[VAL_6:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
-// CHECK:             omp.single copyprivate(%[[VAL_5]] -> @_workshare_copy_i32 : !fir.ref<i32>, %[[VAL_6]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
-// CHECK:               fir.store %[[VAL_3]] to %[[VAL_5]] : !fir.ref<i32>
-// CHECK:               %[[VAL_7:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
-// CHECK:               %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_7]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:               %[[VAL_9:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
-// CHECK:               fir.store %[[VAL_9]] to %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:               %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_9]](%[[VAL_7]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:               omp.terminator
-// CHECK:             }
-// CHECK:             %[[VAL_11:.*]] = fir.shape %[[VAL_2]] : (index) -> !fir.shape<1>
-// CHECK:             %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_11]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
-// CHECK:             %[[VAL_13:.*]] = fir.load %[[VAL_6]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
-// CHECK:             %[[VAL_14:.*]]:2 = hlfir.declare %[[VAL_13]](%[[VAL_11]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:             omp.wsloop {
-// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_1]]) to (%[[VAL_2]]) inclusive step (%[[VAL_1]]) {
-// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_18:.*]] = fir.load %[[VAL_5]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_19:.*]] = arith.subi %[[VAL_17]], %[[VAL_18]] : i32
-// CHECK:                 %[[VAL_20:.*]] = arith.subi %[[VAL_19]], %[[VAL_3]] : i32
-// CHECK:                 %[[VAL_21:.*]] = hlfir.designate %[[VAL_14]]#0 (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 hlfir.assign %[[VAL_20]] to %[[VAL_21]] temporary_lhs : i32, !fir.ref<i32>
-// CHECK:                 omp.yield
-// CHECK:               }
-// CHECK:               omp.terminator
-// CHECK:             }
-// CHECK:             omp.single nowait {
-// CHECK:               "test.test1"(%[[VAL_5]]) : (!fir.ref<i32>) -> ()
-// CHECK:               hlfir.assign %[[VAL_14]]#0 to %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
-// CHECK:               fir.freemem %[[VAL_14]]#0 : !fir.heap<!fir.array<42xi32>>
-// CHECK:               omp.terminator
+// CHECK:           %[[VAL_1:.*]] = fir.alloca i32
+// CHECK:           %[[VAL_2:.*]] = fir.alloca !fir.heap<!fir.array<42xi32>>
+// CHECK:           omp.single copyprivate(%[[VAL_1]] -> @_workshare_copy_i32 : !fir.ref<i32>, %[[VAL_2]] -> @_workshare_copy_heap_42xi32 : !fir.ref<!fir.heap<!fir.array<42xi32>>>) {
+// CHECK:             %[[VAL_3:.*]] = arith.constant 1 : i32
+// CHECK:             fir.store %[[VAL_3]] to %[[VAL_1]] : !fir.ref<i32>
+// CHECK:             %[[VAL_4:.*]] = arith.constant 42 : index
+// CHECK:             %[[VAL_5:.*]] = fir.shape %[[VAL_4]] : (index) -> !fir.shape<1>
+// CHECK:             %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_5]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:             %[[VAL_7:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:             fir.store %[[VAL_7]] to %[[VAL_2]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:             %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_7]](%[[VAL_5]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           %[[VAL_9:.*]] = arith.constant 1 : i32
+// CHECK:           %[[VAL_10:.*]] = arith.constant 42 : index
+// CHECK:           %[[VAL_11:.*]] = fir.shape %[[VAL_10]] : (index) -> !fir.shape<1>
+// CHECK:           %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_11]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:           %[[VAL_13:.*]] = fir.load %[[VAL_2]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
+// CHECK:           %[[VAL_14:.*]]:2 = hlfir.declare %[[VAL_13]](%[[VAL_11]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:           %[[VAL_15:.*]] = arith.constant true
+// CHECK:           %[[VAL_16:.*]] = arith.constant 1 : index
+// CHECK:           omp.wsloop {
+// CHECK:             omp.loop_nest (%[[VAL_17:.*]]) : index = (%[[VAL_16]]) to (%[[VAL_10]]) inclusive step (%[[VAL_16]]) {
+// CHECK:               %[[VAL_18:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_17]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:               %[[VAL_19:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
+// CHECK:               %[[VAL_20:.*]] = fir.load %[[VAL_1]] : !fir.ref<i32>
+// CHECK:               %[[VAL_21:.*]] = arith.subi %[[VAL_19]], %[[VAL_20]] : i32
+// CHECK:               %[[VAL_22:.*]] = arith.subi %[[VAL_21]], %[[VAL_9]] : i32
+// CHECK:               %[[VAL_23:.*]] = hlfir.designate %[[VAL_14]]#0 (%[[VAL_17]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:               hlfir.assign %[[VAL_22]] to %[[VAL_23]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:               omp.yield
 // CHECK:             }
-// CHECK:             omp.barrier
+// CHECK:             omp.terminator
 // CHECK:           }
+// CHECK:           omp.single nowait {
+// CHECK:             %[[VAL_24:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:             %[[VAL_25:.*]] = fir.insert_value %[[VAL_24]], %[[VAL_15]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:             "test.test1"(%[[VAL_1]]) : (!fir.ref<i32>) -> ()
+// CHECK:             hlfir.assign %[[VAL_14]]#0 to %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:             fir.freemem %[[VAL_14]]#0 : !fir.heap<!fir.array<42xi32>>
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           %[[VAL_26:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:           %[[VAL_27:.*]] = fir.insert_value %[[VAL_26]], %[[VAL_15]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:           omp.barrier
 // CHECK:           return
 // CHECK:         }
+
diff --git a/flang/test/Transforms/OpenMP/lower-workshare4.mlir b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
index 6cff0075b4fe50..d695a1c354517b 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare4.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
@@ -29,25 +29,22 @@ func.func @wsfunc() {
 // CHECK-LABEL:   func.func @wsfunc() {
 // CHECK:           %[[VAL_0:.*]] = fir.alloca i32
 // CHECK:           omp.parallel {
-// CHECK:             %[[VAL_1:.*]] = arith.constant true
-// CHECK:             fir.if %[[VAL_1]] {
-// CHECK:               omp.single {
-// CHECK:                 %[[VAL_2:.*]] = "test.test1"() : () -> i32
-// CHECK:                 %[[VAL_3:.*]] = arith.constant 2 : index
-// CHECK:                 "test.test3"(%[[VAL_3]]) : (index) -> ()
-// CHECK:                 omp.terminator
-// CHECK:               }
-// CHECK:               %[[VAL_4:.*]] = arith.constant 1 : index
-// CHECK:               %[[VAL_5:.*]] = arith.constant 42 : index
-// CHECK:               omp.wsloop nowait {
-// CHECK:                 omp.loop_nest (%[[VAL_6:.*]]) : index = (%[[VAL_4]]) to (%[[VAL_5]]) inclusive step (%[[VAL_4]]) {
-// CHECK:                   "test.test2"() : () -> ()
-// CHECK:                   omp.yield
-// CHECK:                 }
-// CHECK:                 omp.terminator
+// CHECK:             omp.single {
+// CHECK:               %[[VAL_1:.*]] = "test.test1"() : () -> i32
+// CHECK:               %[[VAL_2:.*]] = arith.constant 2 : index
+// CHECK:               "test.test3"(%[[VAL_2]]) : (index) -> ()
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             %[[VAL_3:.*]] = arith.constant 1 : index
+// CHECK:             %[[VAL_4:.*]] = arith.constant 42 : index
+// CHECK:             omp.wsloop nowait {
+// CHECK:               omp.loop_nest (%[[VAL_5:.*]]) : index = (%[[VAL_3]]) to (%[[VAL_4]]) inclusive step (%[[VAL_3]]) {
+// CHECK:                 "test.test2"() : () -> ()
+// CHECK:                 omp.yield
 // CHECK:               }
-// CHECK:               omp.barrier
+// CHECK:               omp.terminator
 // CHECK:             }
+// CHECK:             omp.barrier
 // CHECK:             omp.terminator
 // CHECK:           }
 // CHECK:           return

>From c03a38536c8266ed5d482ee2433bbf372b77a6c5 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Tue, 6 Aug 2024 13:52:20 +0900
Subject: [PATCH 104/116] Fix tests for custom assembly for loop wrapper

---
 flang/test/Transforms/OpenMP/lower-workshare.mlir  | 8 ++++----
 flang/test/Transforms/OpenMP/lower-workshare3.mlir | 4 ++--
 flang/test/Transforms/OpenMP/lower-workshare4.mlir | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index b31e951223d56f..9347863dc4a609 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -13,7 +13,7 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
       %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
       %true = arith.constant true
       %c1 = arith.constant 1 : index
-      "omp.workshare_loop_wrapper"() ({
+      omp.workshare_loop_wrapper {
         omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
           %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
           %8 = fir.load %7 : !fir.ref<i32>
@@ -23,7 +23,7 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
           omp.yield
         }
         omp.terminator
-      }) : () -> ()
+      }
       %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
       %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
       %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
@@ -52,7 +52,7 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
     %3:2 = hlfir.declare %2(%0) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
     %true = arith.constant true
     %c1 = arith.constant 1 : index
-    "omp.workshare_loop_wrapper"() ({
+    omp.workshare_loop_wrapper {
       omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
         %7 = hlfir.designate %1#0 (%arg1)  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
         %8 = fir.load %7 : !fir.ref<i32>
@@ -64,7 +64,7 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
         omp.yield
       }
       omp.terminator
-    }) : () -> ()
+    }
     %4 = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
     %5 = fir.insert_value %4, %true, [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
     %6 = fir.insert_value %5, %3#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
diff --git a/flang/test/Transforms/OpenMP/lower-workshare3.mlir b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
index aee95a464a31bd..afb41d95e7198e 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare3.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
@@ -52,7 +52,7 @@ func.func @wsfunc() {
 
       %c42 = arith.constant 42 : index
       %c1 = arith.constant 1 : index
-      "omp.workshare_loop_wrapper"() ({
+      omp.workshare_loop_wrapper {
         omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
           "test.test10"(%t1) : (i32) -> ()
           "test.test10"(%t5_pure_use) : (i32) -> ()
@@ -60,7 +60,7 @@ func.func @wsfunc() {
           omp.yield
         }
         omp.terminator
-      }) : () -> ()
+      }
 
       "test.test10"(%t2) : (i32) -> ()
       fir.if %true {
diff --git a/flang/test/Transforms/OpenMP/lower-workshare4.mlir b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
index d695a1c354517b..0a70007a9e78dd 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare4.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
@@ -12,13 +12,13 @@ func.func @wsfunc() {
       %c2 = arith.constant 2 : index
       "test.test3"(%c2) : (index) -> ()
 
-      "omp.workshare_loop_wrapper"() ({
+      omp.workshare_loop_wrapper {
         omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
           "test.test2"() : () -> ()
           omp.yield
         }
         omp.terminator
-      }) : () -> ()
+      }
       omp.terminator
     }
     omp.terminator

>From 16b39fb33c20f4aac0bca09dc01114bb107f4e72 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 14:43:50 +0900
Subject: [PATCH 105/116] Only run the lower workshare pass if OpenMP is
 enabled

---
 flang/include/flang/Tools/CLOptions.inc      |  7 ++++---
 flang/include/flang/Tools/CrossToolHelpers.h |  1 +
 flang/lib/Frontend/FrontendActions.cpp       | 10 +++++++++-
 flang/tools/bbc/bbc.cpp                      |  5 ++++-
 flang/tools/tco/tco.cpp                      |  1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/flang/include/flang/Tools/CLOptions.inc b/flang/include/flang/Tools/CLOptions.inc
index d43e1c736020a2..bb00e079008a0b 100644
--- a/flang/include/flang/Tools/CLOptions.inc
+++ b/flang/include/flang/Tools/CLOptions.inc
@@ -337,7 +337,7 @@ inline void createDefaultFIROptimizerPassPipeline(
 /// \param optLevel - optimization level used for creating FIR optimization
 ///   passes pipeline
 inline void createHLFIRToFIRPassPipeline(
-    mlir::PassManager &pm, llvm::OptimizationLevel optLevel = defaultOptLevel) {
+    mlir::PassManager &pm, bool enableOpenMP, llvm::OptimizationLevel optLevel = defaultOptLevel) {
   if (optLevel.isOptimizingForSpeed()) {
     addCanonicalizerPassWithoutRegionSimplification(pm);
     addNestedPassToAllTopLevelOperations(
@@ -354,7 +354,8 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
-  pm.addPass(flangomp::createLowerWorkshare());
+  if (enableOpenMP)
+    pm.addPass(flangomp::createLowerWorkshare());
 }
 
 /// Create a pass pipeline for handling certain OpenMP transformations needed
@@ -426,7 +427,7 @@ inline void createDefaultFIRCodeGenPassPipeline(mlir::PassManager &pm,
 ///   passes pipeline
 inline void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
     MLIRToLLVMPassPipelineConfig &config, llvm::StringRef inputFilename = {}) {
-  fir::createHLFIRToFIRPassPipeline(pm, config.OptLevel);
+  fir::createHLFIRToFIRPassPipeline(pm, config.EnableOpenMP, config.OptLevel);
 
   // Add default optimizer pass pipeline.
   fir::createDefaultFIROptimizerPassPipeline(pm, config);
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index 75fd783af237d0..0911b9bca67332 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -123,6 +123,7 @@ struct MLIRToLLVMPassPipelineConfig : public FlangEPCallBacks {
       false; ///< Set no-signed-zeros-fp-math attribute for functions.
   bool UnsafeFPMath = false; ///< Set unsafe-fp-math attribute for functions.
   bool NSWOnLoopVarInc = false; ///< Add nsw flag to loop variable increments.
+  bool EnableOpenMP = false; ///< Enable OpenMP lowering.
 };
 
 struct OffloadModuleOpts {
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 5c86bd947ce73f..db5c5649337528 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -711,7 +711,11 @@ void CodeGenAction::lowerHLFIRToFIR() {
   pm.enableVerifier(/*verifyPasses=*/true);
 
   // Create the pass pipeline
-  fir::createHLFIRToFIRPassPipeline(pm, level);
+  fir::createHLFIRToFIRPassPipeline(
+      pm,
+      ci.getInvocation().getFrontendOpts().features.IsEnabled(
+          Fortran::common::LanguageFeature::OpenMP),
+      level);
   (void)mlir::applyPassManagerCLOptions(pm);
 
   if (!mlir::succeeded(pm.run(*mlirModule))) {
@@ -824,6 +828,10 @@ void CodeGenAction::generateLLVMIR() {
     config.VScaleMax = vsr->second;
   }
 
+  if (ci.getInvocation().getFrontendOpts().features.IsEnabled(
+          Fortran::common::LanguageFeature::OpenMP))
+    config.EnableOpenMP = true;
+
   if (ci.getInvocation().getLoweringOpts().getNSWOnLoopVarInc())
     config.NSWOnLoopVarInc = true;
 
diff --git a/flang/tools/bbc/bbc.cpp b/flang/tools/bbc/bbc.cpp
index 736d68219581dd..1a7dac1b76bc20 100644
--- a/flang/tools/bbc/bbc.cpp
+++ b/flang/tools/bbc/bbc.cpp
@@ -440,7 +440,8 @@ static llvm::LogicalResult convertFortranSourceToMLIR(
 
     if (emitFIR && useHLFIR) {
       // lower HLFIR to FIR
-      fir::createHLFIRToFIRPassPipeline(pm, llvm::OptimizationLevel::O2);
+      fir::createHLFIRToFIRPassPipeline(pm, enableOpenMP,
+                                        llvm::OptimizationLevel::O2);
       if (mlir::failed(pm.run(mlirModule))) {
         llvm::errs() << "FATAL: lowering from HLFIR to FIR failed";
         return mlir::failure();
@@ -455,6 +456,8 @@ static llvm::LogicalResult convertFortranSourceToMLIR(
 
     // Add O2 optimizer pass pipeline.
     MLIRToLLVMPassPipelineConfig config(llvm::OptimizationLevel::O2);
+    if (enableOpenMP)
+      config.EnableOpenMP = true;
     config.NSWOnLoopVarInc = setNSW;
     fir::registerDefaultInlinerPass(config);
     fir::createDefaultFIROptimizerPassPipeline(pm, config);
diff --git a/flang/tools/tco/tco.cpp b/flang/tools/tco/tco.cpp
index a8c64333109aeb..06892cdc3f6a80 100644
--- a/flang/tools/tco/tco.cpp
+++ b/flang/tools/tco/tco.cpp
@@ -138,6 +138,7 @@ compileFIR(const mlir::PassPipelineCLParser &passPipeline) {
       return mlir::failure();
   } else {
     MLIRToLLVMPassPipelineConfig config(llvm::OptimizationLevel::O2);
+    config.EnableOpenMP = true;  // assume the input contains OpenMP
     config.AliasAnalysis = true; // enabled when optimizing for speed
     if (codeGenLLVM) {
       // Run only CodeGen passes.

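The effect of the new `enableOpenMP` parameter can be illustrated without MLIR. The sketch below is a hypothetical model, not flang code. `PassList` and `buildHLFIRToFIRPipeline` stand in for `mlir::PassManager` and `fir::createHLFIRToFIRPassPipeline`, and only the tail of the pipeline is modeled. The real pipeline also adds passes that depend on the optimization level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an mlir::PassManager: it only records pass names
// in the order in which they are scheduled.
using PassList = std::vector<std::string>;

// Models the tail of the HLFIR-to-FIR pipeline after this patch. The real
// function takes an mlir::PassManager and an llvm::OptimizationLevel.
PassList buildHLFIRToFIRPipeline(bool enableOpenMP) {
  PassList pm;
  pm.push_back("LowerHLFIRIntrinsics");
  pm.push_back("BufferizeHLFIR");
  pm.push_back("ConvertHLFIRtoFIR");
  // Mirrors `if (enableOpenMP) pm.addPass(flangomp::createLowerWorkshare());`:
  // the workshare lowering is scheduled only for OpenMP compilations.
  if (enableOpenMP)
    pm.push_back("LowerWorkshare");
  return pm;
}
```

`tco` now sets `config.EnableOpenMP = true` unconditionally, so a pass-list dump produced through it would show `LowerWorkshare` immediately after `ConvertHLFIRtoFIR`. This matches what `flang/test/Fir/basic-program.fir` expects after a later test fix in this series, assuming that test is driven through `tco`.
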
>From 9bb403e153c4d885f78eab5870efd2c54275b8a1 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 16:16:38 +0900
Subject: [PATCH 106/116] Implement some missing functionality

---
 flang/include/flang/Optimizer/OpenMP/Passes.h |   3 +
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 113 ++++++++++++------
 2 files changed, 81 insertions(+), 35 deletions(-)

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.h b/flang/include/flang/Optimizer/OpenMP/Passes.h
index 11fa4e59f891ea..feb395f1a12dbd 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.h
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.h
@@ -25,6 +25,9 @@ namespace flangomp {
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/OpenMP/Passes.h.inc"
 
+/// Implements the logic specified in section 2.8.3 (workshare Construct) of
+/// the OpenMP standard, which specifies what statements or constructs shall
+/// be divided into units of work.
 bool shouldUseWorkshareLowering(mlir::Operation *op);
 
 } // namespace flangomp
diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 5998489c13d382..e921b80d0c571e 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -5,7 +5,15 @@
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
-// Lower omp workshare construct.
+//
+// This file implements the lowering of omp.workshare to other omp constructs.
+//
+// This pass is tasked with parallelizing the loops nested in
+// workshare_loop_wrapper ops, while both the Fortran-to-MLIR lowering and the
+// HLFIR-to-FIR lowering pipelines are responsible for emitting the
+// workshare_loop_wrapper ops where appropriate according to the
+// `shouldUseWorkshareLowering` function.
+//
 //===----------------------------------------------------------------------===//
 
 #include <flang/Optimizer/Builder/FIRBuilder.h>
@@ -44,25 +52,52 @@ namespace flangomp {
 using namespace mlir;
 
 namespace flangomp {
+
+// Checks for the nesting pattern below, since we must avoid sharing the work
+// of statements that are nested in constructs such as omp.critical or
+// another omp.parallel.
+//
+// omp.workshare { // `wsOp`
+//   ...
+//     omp.T { // `parent`
+//       ...
+//         `op`
+//
+template <typename T>
+static bool isNestedIn(omp::WorkshareOp wsOp, Operation *op) {
+  T parent = op->getParentOfType<T>();
+  if (!parent)
+    return false;
+  return wsOp->isProperAncestor(parent);
+}
+
 bool shouldUseWorkshareLowering(Operation *op) {
-  // TODO this is insufficient, as we could have
-  // omp.parallel {
-  //   omp.workshare {
-  //     omp.parallel {
-  //       hlfir.elemental {}
-  //
-  // Then this hlfir.elemental shall _not_ use the lowering for workshare
-  //
-  // Standard says:
-  //   For a parallel construct, the construct is a unit of work with respect to
-  //   the workshare construct. The statements contained in the parallel
-  //   construct are executed by a new thread team.
-  //
-  // TODO similarly for single, critical, etc. Need to think through the
-  // patterns and implement this function.
-  //
-  return op->getParentOfType<omp::WorkshareOp>();
+  auto parentWorkshare = op->getParentOfType<omp::WorkshareOp>();
+
+  if (!parentWorkshare)
+    return false;
+
+  if (isNestedIn<omp::CriticalOp>(parentWorkshare, op))
+    return false;
+
+  // 2.8.3 workshare Construct
+  // For a parallel construct, the construct is a unit of work with respect to
+  // the workshare construct. The statements contained in the parallel construct
+  // are executed by a new thread team.
+  if (isNestedIn<omp::ParallelOp>(parentWorkshare, op))
+    return false;
+
+  // 2.8.2 single Construct
+  // Binding: The binding thread set for a single region is the current team.
+  // A single region binds to the innermost enclosing parallel region.
+  // Description: Only one of the encountering threads will execute the
+  // structured block associated with the single construct.
+  if (isNestedIn<omp::SingleOp>(parentWorkshare, op))
+    return false;
+
+  return true;
 }
+
 } // namespace flangomp
 
 namespace {
@@ -72,19 +107,27 @@ struct SingleRegion {
 };
 
 static bool mustParallelizeOp(Operation *op) {
-  // TODO as in shouldUseWorkshareLowering we be careful not to pick up
-  // workshare_loop_wrapper in nested omp.parallel ops
-  //
-  // e.g.
-  //
-  // omp.parallel {
-  //   omp.workshare {
-  //     omp.parallel {
-  //       omp.workshare {
-  //         omp.workshare_loop_wrapper {}
   return op
-      ->walk(
-          [](omp::WorkshareLoopWrapperOp) { return WalkResult::interrupt(); })
+      ->walk([&](Operation *nested) {
+        // We need to be careful not to pick up workshare_loop_wrapper in nested
+        // omp.parallel{omp.workshare} regions, i.e. make sure that `nested`
+        // binds to the workshare region we are currently handling.
+        //
+        // For example:
+        //
+        // omp.parallel {
+        //   omp.workshare { // currently handling this
+        //     omp.parallel {
+        //       omp.workshare { // nested workshare
+        //         omp.workshare_loop_wrapper {}
+        //
+        // Therefore, we skip if we encounter a nested omp.workshare.
+        if (isa<omp::WorkshareOp>(op))
+          WalkResult::skip();
+        if (isa<omp::WorkshareLoopWrapperOp>(op))
+          WalkResult::interrupt();
+        WalkResult::advance();
+      })
       .wasInterrupted();
 }
 
@@ -340,7 +383,8 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
 ///
 /// becomes
 ///
-/// omp.single {
+/// %tmp = fir.alloca
+/// omp.single copyprivate(%tmp) {
 ///   %a = fir.allocmem
 ///   fir.store %a %tmp
 /// }
@@ -352,16 +396,15 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
 /// }
 ///
 /// Note that we allocate temporary memory for values in omp.single's which need
-/// to be accessed in all threads in the closest omp.parallel
+/// to be accessed by all threads and broadcast them using single's copyprivate
 LogicalResult lowerWorkshare(mlir::omp::WorkshareOp wsOp, DominanceInfo &di) {
   Location loc = wsOp->getLoc();
   IRMapping rootMapping;
 
   OpBuilder rootBuilder(wsOp);
 
-  // TODO We need something like an scf.execute here, but that is not registered
-  // so using omp.workshare as a placeholder. We need this op as our
-  // parallelizeRegion works on regions and not blocks.
+  // This operation is just a placeholder which will be erased later. We need it
+  // because our `parallelizeRegion` function works on regions and not blocks.
   omp::WorkshareOp newOp =
       rootBuilder.create<omp::WorkshareOp>(loc, omp::WorkshareOperands());
   if (!wsOp.getNowait())

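The binding rule that `shouldUseWorkshareLowering` and `isNestedIn` implement can be tested without MLIR. In the sketch below, the `Op` struct, the `Kind` enum and the helper names are hypothetical simplifications in which only the parent chain is modeled. `getParentOfKind` and `isProperAncestor` imitate `Operation::getParentOfType<T>()` and `Operation::isProperAncestor`.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an MLIR operation: only the op kind
// and a pointer to the enclosing (parent) op are modeled.
enum class Kind { Func, Parallel, Workshare, Critical, Single, Elemental };

struct Op {
  Kind kind;
  const Op *parent; // nullptr for the outermost op
};

// Closest enclosing op of the given kind, excluding `op` itself
// (imitates Operation::getParentOfType<T>()).
const Op *getParentOfKind(const Op *op, Kind kind) {
  for (const Op *p = op->parent; p; p = p->parent)
    if (p->kind == kind)
      return p;
  return nullptr;
}

// True if `ancestor` strictly encloses `op` (imitates isProperAncestor).
bool isProperAncestor(const Op *ancestor, const Op *op) {
  for (const Op *p = op->parent; p; p = p->parent)
    if (p == ancestor)
      return true;
  return false;
}

// The innermost `kind` op around `op` matters only if it lies inside the
// workshare; e.g. a parallel that encloses the workshare itself is ignored.
bool isNestedIn(Kind kind, const Op *wsOp, const Op *op) {
  const Op *parent = getParentOfKind(op, kind);
  return parent && isProperAncestor(wsOp, parent);
}

bool shouldUseWorkshareLowering(const Op *op) {
  const Op *ws = getParentOfKind(op, Kind::Workshare);
  if (!ws)
    return false;
  return !isNestedIn(Kind::Critical, ws, op) &&
         !isNestedIn(Kind::Parallel, ws, op) &&
         !isNestedIn(Kind::Single, ws, op);
}
```

The model inspects only the innermost enclosing op of each kind, and counts it only if it lies strictly inside the workshare. Consider `omp.workshare { omp.parallel { omp.workshare { op } } }`. Here the innermost parallel encloses the inner workshare rather than sitting inside it, so `op` is still lowered as part of that inner workshare.
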
>From b35b82633027b13ed60d1dc7a7400698739ecc80 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 16:53:22 +0900
Subject: [PATCH 107/116] Fix tests

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp |  6 +--
 .../Transforms/OpenMP/lower-workshare2.mlir   |  2 +
 .../Transforms/OpenMP/lower-workshare3.mlir   |  2 +-
 .../Transforms/OpenMP/lower-workshare4.mlir   |  3 ++
 .../Transforms/OpenMP/lower-workshare6.mlir   | 51 +++++++++++++++++++
 5 files changed, 60 insertions(+), 4 deletions(-)
 create mode 100644 flang/test/Transforms/OpenMP/lower-workshare6.mlir

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index e921b80d0c571e..9557dd200cacee 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -123,10 +123,10 @@ static bool mustParallelizeOp(Operation *op) {
         //
         // Therefore, we skip if we encounter a nested omp.workshare.
         if (isa<omp::WorkshareOp>(op))
-          WalkResult::skip();
+          return WalkResult::skip();
         if (isa<omp::WorkshareLoopWrapperOp>(op))
-          WalkResult::interrupt();
-        WalkResult::advance();
+          return WalkResult::interrupt();
+        return WalkResult::advance();
       })
       .wasInterrupted();
 }
diff --git a/flang/test/Transforms/OpenMP/lower-workshare2.mlir b/flang/test/Transforms/OpenMP/lower-workshare2.mlir
index 325a40d4184453..940662e0bdccc2 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare2.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare2.mlir
@@ -1,5 +1,7 @@
 // RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
+// Check that we correctly handle nowait
+
 // CHECK-LABEL:   func.func @nonowait
 func.func @nonowait(%arg0: !fir.ref<!fir.array<42xi32>>) {
   // CHECK: omp.barrier
diff --git a/flang/test/Transforms/OpenMP/lower-workshare3.mlir b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
index afb41d95e7198e..09217757512881 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare3.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare3.mlir
@@ -1,7 +1,7 @@
 // RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
 
-// tests if the correct values are stored
+// Check if we store the correct values
 
 func.func @wsfunc() {
   omp.parallel {
diff --git a/flang/test/Transforms/OpenMP/lower-workshare4.mlir b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
index 0a70007a9e78dd..44f68cd2ca3654 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare4.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
@@ -1,5 +1,8 @@
 // RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
+// Check that we cleanup unused pure operations from either the parallel or
+// single regions
+
 func.func @wsfunc() {
   %a = fir.alloca i32
   omp.parallel {
diff --git a/flang/test/Transforms/OpenMP/lower-workshare6.mlir b/flang/test/Transforms/OpenMP/lower-workshare6.mlir
new file mode 100644
index 00000000000000..b66f00a47c114f
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/lower-workshare6.mlir
@@ -0,0 +1,51 @@
+// RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
+
+// Checks that the omp.workshare_loop_wrapper binds to the correct omp.workshare
+
+func.func @wsfunc() {
+  %c1 = arith.constant 1 : index
+  %c42 = arith.constant 42 : index
+  omp.parallel {
+    omp.workshare nowait {
+      omp.parallel {
+        omp.workshare nowait {
+          omp.workshare_loop_wrapper {
+            omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
+              "test.test2"() : () -> ()
+              omp.yield
+            }
+            omp.terminator
+          }
+          omp.terminator
+        }
+        omp.terminator
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL:   func.func @wsfunc() {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_1:.*]] = arith.constant 42 : index
+// CHECK:           omp.parallel {
+// CHECK:             omp.single nowait {
+// CHECK:               omp.parallel {
+// CHECK:                 omp.wsloop nowait {
+// CHECK:                   omp.loop_nest (%[[VAL_2:.*]]) : index = (%[[VAL_0]]) to (%[[VAL_1]]) inclusive step (%[[VAL_0]]) {
+// CHECK:                     "test.test2"() : () -> ()
+// CHECK:                     omp.yield
+// CHECK:                   }
+// CHECK:                   omp.terminator
+// CHECK:                 }
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+

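The pruning that `mustParallelizeOp` relies on can be modeled with a small tree walk. The sketch below is a hypothetical model: the `Op` tree, `WalkResult` enum and `walkPreOrder` are illustrative stand-ins, not MLIR's API. Two choices are deliberate. First, the checks apply to the op being visited. Second, the walk is pre-order, so returning `Skip` stops the walk from descending into a nested workshare. In MLIR's `Operation::walk`, `WalkResult::skip()` can only prune nested regions in a `WalkOrder::PreOrder` walk, because the default post-order walk has already visited the children by the time the callback runs. The callback in the `mustParallelizeOp` hunks above tests the captured `op` rather than the visited `nested`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified op tree: only the kind and nested ops are modeled.
enum class Kind { Parallel, Workshare, LoopWrapper, Other };

struct Op {
  Kind kind;
  std::vector<Op> children;
};

enum class WalkResult { Advance, Skip, Interrupt };

// Pre-order walk: the callback sees an op before its children, so Skip prunes
// that op's subtree. Returns true if the walk was interrupted.
template <typename Fn>
bool walkPreOrder(const Op &op, Fn fn) {
  WalkResult r = fn(op);
  if (r == WalkResult::Interrupt)
    return true;
  if (r == WalkResult::Skip)
    return false;
  for (const Op &child : op.children)
    if (walkPreOrder(child, fn))
      return true;
  return false;
}

// Does `op` contain a loop wrapper that binds to the workshare being lowered?
// Loop wrappers under a nested workshare belong to that nested construct.
bool mustParallelizeOp(const Op &op) {
  return walkPreOrder(op, [](const Op &nested) {
    // Test the visited op, not the root of the walk.
    if (nested.kind == Kind::Workshare)
      return WalkResult::Skip;
    if (nested.kind == Kind::LoopWrapper)
      return WalkResult::Interrupt;
    return WalkResult::Advance;
  });
}
```

For the shape used in `lower-workshare6.mlir`, the outer workshare's block contains an `omp.parallel { omp.workshare { omp.workshare_loop_wrapper } }`. In this model that op does not need to be parallelized by the outer workshare, which fits the CHECK lines wrapping it in `omp.single nowait`.
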
>From c96c163ce65aed5535182cf11d6f9b7627c11df0 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 17:16:36 +0900
Subject: [PATCH 108/116] Fix test

---
 flang/test/Fir/basic-program.fir | 1 +
 1 file changed, 1 insertion(+)

diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index bca454c13ff9cc..4b18acb7c2b430 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -47,6 +47,7 @@ func.func @_QQmain() {
 // PASSES-NEXT:   LowerHLFIRIntrinsics
 // PASSES-NEXT:   BufferizeHLFIR
 // PASSES-NEXT:   ConvertHLFIRtoFIR
+// PASSES-NEXT:   LowerWorkshare
 // PASSES-NEXT:   CSE
 // PASSES-NEXT:   (S) 0 num-cse'd - Number of operations CSE'd
 // PASSES-NEXT:   (S) 0 num-dce'd - Number of operations DCE'd

>From 8a28e29f423df79d4c2baf4a540b4f87b18a3260 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Tue, 20 Aug 2024 09:28:15 +0900
Subject: [PATCH 109/116] Iterate backwards to find all trivially dead ops

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp |  3 +-
 .../Transforms/OpenMP/lower-workshare4.mlir   | 56 ++++++++++---------
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index 9557dd200cacee..bfb9708af70923 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -200,7 +200,8 @@ static bool isTransitivelyUsedOutside(Value v, SingleRegion sr) {
 /// We clone pure operations in both the parallel and single blocks. this
 /// functions cleans them up if they end up with no uses
 static void cleanupBlock(Block *block) {
-  for (Operation &op : llvm::make_early_inc_range(*block))
+  for (Operation &op : llvm::make_early_inc_range(
+           llvm::make_range(block->rbegin(), block->rend())))
     if (isOpTriviallyDead(&op))
       op.erase();
 }
diff --git a/flang/test/Transforms/OpenMP/lower-workshare4.mlir b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
index 44f68cd2ca3654..81bc20cb34b65d 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare4.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare4.mlir
@@ -1,8 +1,33 @@
 // RUN: fir-opt --split-input-file --lower-workshare --allow-unregistered-dialect %s | FileCheck %s
 
-// Check that we cleanup unused pure operations from either the parallel or
-// single regions
+// Check that we clean up unused pure operations from the parallel and single
+// regions.
 
+// CHECK-LABEL:   func.func @wsfunc() {
+// CHECK:           %[[VAL_0:.*]] = fir.alloca i32
+// CHECK:           omp.parallel {
+// CHECK:             omp.single {
+// CHECK:               %[[VAL_1:.*]] = "test.test1"() : () -> i32
+// CHECK:               %[[VAL_2:.*]] = arith.constant 2 : index
+// CHECK:               %[[VAL_3:.*]] = arith.constant 3 : index
+// CHECK:               %[[VAL_4:.*]] = arith.addi %[[VAL_2]], %[[VAL_3]] : index
+// CHECK:               "test.test3"(%[[VAL_4]]) : (index) -> ()
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             %[[VAL_5:.*]] = arith.constant 1 : index
+// CHECK:             %[[VAL_6:.*]] = arith.constant 42 : index
+// CHECK:             omp.wsloop nowait {
+// CHECK:               omp.loop_nest (%[[VAL_7:.*]]) : index = (%[[VAL_5]]) to (%[[VAL_6]]) inclusive step (%[[VAL_5]]) {
+// CHECK:                 "test.test2"() : () -> ()
+// CHECK:                 omp.yield
+// CHECK:               }
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             omp.barrier
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
 func.func @wsfunc() {
   %a = fir.alloca i32
   omp.parallel {
@@ -13,7 +38,9 @@ func.func @wsfunc() {
       %c42 = arith.constant 42 : index
 
       %c2 = arith.constant 2 : index
-      "test.test3"(%c2) : (index) -> ()
+      %c3 = arith.constant 3 : index
+      %add = arith.addi %c2, %c3 : index
+      "test.test3"(%add) : (index) -> ()
 
       omp.workshare_loop_wrapper {
         omp.loop_nest (%arg1) : index = (%c1) to (%c42) inclusive step (%c1) {
@@ -29,27 +56,4 @@ func.func @wsfunc() {
   return
 }
 
-// CHECK-LABEL:   func.func @wsfunc() {
-// CHECK:           %[[VAL_0:.*]] = fir.alloca i32
-// CHECK:           omp.parallel {
-// CHECK:             omp.single {
-// CHECK:               %[[VAL_1:.*]] = "test.test1"() : () -> i32
-// CHECK:               %[[VAL_2:.*]] = arith.constant 2 : index
-// CHECK:               "test.test3"(%[[VAL_2]]) : (index) -> ()
-// CHECK:               omp.terminator
-// CHECK:             }
-// CHECK:             %[[VAL_3:.*]] = arith.constant 1 : index
-// CHECK:             %[[VAL_4:.*]] = arith.constant 42 : index
-// CHECK:             omp.wsloop nowait {
-// CHECK:               omp.loop_nest (%[[VAL_5:.*]]) : index = (%[[VAL_3]]) to (%[[VAL_4]]) inclusive step (%[[VAL_3]]) {
-// CHECK:                 "test.test2"() : () -> ()
-// CHECK:                 omp.yield
-// CHECK:               }
-// CHECK:               omp.terminator
-// CHECK:             }
-// CHECK:             omp.barrier
-// CHECK:             omp.terminator
-// CHECK:           }
-// CHECK:           return
-// CHECK:         }
 

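The updated `lower-workshare4.mlir` test shows why iterating backwards matters. There, `arith.constant 2`, `arith.constant 3` and `arith.addi` are cloned into the parallel region, but their only user, `test.test3`, stays in the `omp.single`. The sketch below is a hypothetical model of one block, not the pass's code. The `Op` struct and index-based operands are illustrative, and nested regions are ignored. Within a block, uses follow definitions, so a single backward sweep sees every user before its definitions and can remove the whole dead chain. A forward sweep only removes the last link.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical straight-line block: operands refer to earlier ops by index,
// as SSA dominance within a single block requires.
struct Op {
  std::string name;
  bool pure;                    // no side effects, so removable if unused
  std::vector<size_t> operands; // indices of the ops whose results are used
  bool erased = false;
};

// An op is trivially dead if it is pure and no live op uses its result.
static bool isTriviallyDead(const std::vector<Op> &block, size_t idx) {
  if (!block[idx].pure)
    return false;
  for (const Op &user : block)
    if (!user.erased)
      for (size_t operand : user.operands)
        if (operand == idx)
          return false;
  return true;
}

// One sweep over a copy of the block, forwards or backwards, erasing dead
// ops as it goes; returns the number of ops still live afterwards.
size_t liveOpsAfterSweep(std::vector<Op> block, bool reverse) {
  size_t n = block.size();
  for (size_t k = 0; k < n; ++k) {
    size_t idx = reverse ? n - 1 - k : k;
    if (!block[idx].erased && isTriviallyDead(block, idx))
      block[idx].erased = true;
  }
  size_t live = 0;
  for (const Op &op : block)
    live += op.erased ? 0 : 1;
  return live;
}
```

With the three cloned ops above, a forward sweep erases only `arith.addi` and leaves both constants behind. A backward sweep erases all three.
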
>From c78d3aedf4131775ca7e25fea2bca2eb640ae969 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Tue, 20 Aug 2024 12:17:45 +0900
Subject: [PATCH 110/116] Add expalanation comment for createCopyFun

---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index bfb9708af70923..284b2bed9628fe 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -136,6 +136,9 @@ static bool isSafeToParallelize(Operation *op) {
          isMemoryEffectFree(op);
 }
 
+/// Simple shallow copies suffice for our purposes in this pass, so we implement
+/// this simpler alternative to the full-fledged `createCopyFunc` in the
+/// frontend.
 static mlir::func::FuncOp createCopyFunc(mlir::Location loc, mlir::Type varType,
                                          fir::FirOpBuilder builder) {
   mlir::ModuleOp module = builder.getModule();

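The `lowerWorkshare` doc comment earlier in this series explains why a shallow copy is enough. In its `%tmp` example, `copyprivate` broadcasts the address produced by `fir.allocmem` inside the `omp.single`. Every thread therefore ends up pointing at the same heap buffer, and the buffer contents never need to be duplicated. The following is a sequential C++ model under those assumptions. `shallowCopy` and `simulateCopyPrivate` are hypothetical names, and thread 0 arbitrarily plays the thread that executes the `single`. In real OpenMP, the broadcast happens after the structured block runs and before any thread leaves the barrier at the end of the `single` construct.

```cpp
#include <cassert>
#include <vector>

// Shallow copy callback, in the spirit of the pass's createCopyFunc: it copies
// the stored value itself (here a pointer), not the data it points to.
template <typename T>
void shallowCopy(T *dst, const T *src) {
  *dst = *src;
}

// Sequential model of `omp.single copyprivate(%tmp)`: thread 0 runs the single
// body and fills its private slot, then the value is broadcast to the other
// threads' slots through the copy callback.
std::vector<int *> simulateCopyPrivate(int numThreads, int *(*singleBody)()) {
  std::vector<int *> tmp(numThreads, nullptr); // one private %tmp per thread
  tmp[0] = singleBody();                       // e.g. %a = fir.allocmem
  for (int t = 1; t < numThreads; ++t)
    shallowCopy(&tmp[t], &tmp[0]);
  return tmp;
}
```
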
>From 1ef3d69b6cd0f4fc3a8755afc6bad9087e781aa0 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Tue, 20 Aug 2024 16:57:25 +0900
Subject: [PATCH 111/116] Update test

---
 .../Transforms/OpenMP/lower-workshare.mlir    | 42 +++++++------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/flang/test/Transforms/OpenMP/lower-workshare.mlir b/flang/test/Transforms/OpenMP/lower-workshare.mlir
index 9347863dc4a609..c189e54aaeb0d4 100644
--- a/flang/test/Transforms/OpenMP/lower-workshare.mlir
+++ b/flang/test/Transforms/OpenMP/lower-workshare.mlir
@@ -103,28 +103,23 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:             %[[VAL_10:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_9]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
 // CHECK:             %[[VAL_11:.*]] = fir.load %[[VAL_1]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
 // CHECK:             %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_11]](%[[VAL_9]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:             %[[VAL_13:.*]] = arith.constant true
-// CHECK:             %[[VAL_14:.*]] = arith.constant 1 : index
+// CHECK:             %[[VAL_13:.*]] = arith.constant 1 : index
 // CHECK:             omp.wsloop {
-// CHECK:               omp.loop_nest (%[[VAL_15:.*]]) : index = (%[[VAL_14]]) to (%[[VAL_7]]) inclusive step (%[[VAL_14]]) {
-// CHECK:                 %[[VAL_16:.*]] = hlfir.designate %[[VAL_10]]#0 (%[[VAL_15]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 %[[VAL_17:.*]] = fir.load %[[VAL_16]] : !fir.ref<i32>
-// CHECK:                 %[[VAL_18:.*]] = arith.subi %[[VAL_17]], %[[VAL_8]] : i32
-// CHECK:                 %[[VAL_19:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_15]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:                 hlfir.assign %[[VAL_18]] to %[[VAL_19]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:               omp.loop_nest (%[[VAL_14:.*]]) : index = (%[[VAL_13]]) to (%[[VAL_7]]) inclusive step (%[[VAL_13]]) {
+// CHECK:                 %[[VAL_15:.*]] = hlfir.designate %[[VAL_10]]#0 (%[[VAL_14]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 %[[VAL_16:.*]] = fir.load %[[VAL_15]] : !fir.ref<i32>
+// CHECK:                 %[[VAL_17:.*]] = arith.subi %[[VAL_16]], %[[VAL_8]] : i32
+// CHECK:                 %[[VAL_18:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_14]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                 hlfir.assign %[[VAL_17]] to %[[VAL_18]] temporary_lhs : i32, !fir.ref<i32>
 // CHECK:                 omp.yield
 // CHECK:               }
 // CHECK:               omp.terminator
 // CHECK:             }
 // CHECK:             omp.single nowait {
-// CHECK:               %[[VAL_20:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
-// CHECK:               %[[VAL_21:.*]] = fir.insert_value %[[VAL_20]], %[[VAL_13]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:               hlfir.assign %[[VAL_12]]#0 to %[[VAL_10]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
 // CHECK:               fir.freemem %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>
 // CHECK:               omp.terminator
 // CHECK:             }
-// CHECK:             %[[VAL_22:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
-// CHECK:             %[[VAL_23:.*]] = fir.insert_value %[[VAL_22]], %[[VAL_13]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:             omp.barrier
 // CHECK:             omp.terminator
 // CHECK:           }
@@ -168,31 +163,26 @@ func.func @wsfunc(%arg0: !fir.ref<!fir.array<42xi32>>) {
 // CHECK:           %[[VAL_12:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_11]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
 // CHECK:           %[[VAL_13:.*]] = fir.load %[[VAL_2]] : !fir.ref<!fir.heap<!fir.array<42xi32>>>
 // CHECK:           %[[VAL_14:.*]]:2 = hlfir.declare %[[VAL_13]](%[[VAL_11]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
-// CHECK:           %[[VAL_15:.*]] = arith.constant true
-// CHECK:           %[[VAL_16:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_15:.*]] = arith.constant 1 : index
 // CHECK:           omp.wsloop {
-// CHECK:             omp.loop_nest (%[[VAL_17:.*]]) : index = (%[[VAL_16]]) to (%[[VAL_10]]) inclusive step (%[[VAL_16]]) {
-// CHECK:               %[[VAL_18:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_17]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:               %[[VAL_19:.*]] = fir.load %[[VAL_18]] : !fir.ref<i32>
-// CHECK:               %[[VAL_20:.*]] = fir.load %[[VAL_1]] : !fir.ref<i32>
-// CHECK:               %[[VAL_21:.*]] = arith.subi %[[VAL_19]], %[[VAL_20]] : i32
-// CHECK:               %[[VAL_22:.*]] = arith.subi %[[VAL_21]], %[[VAL_9]] : i32
-// CHECK:               %[[VAL_23:.*]] = hlfir.designate %[[VAL_14]]#0 (%[[VAL_17]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
-// CHECK:               hlfir.assign %[[VAL_22]] to %[[VAL_23]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:             omp.loop_nest (%[[VAL_16:.*]]) : index = (%[[VAL_15]]) to (%[[VAL_10]]) inclusive step (%[[VAL_15]]) {
+// CHECK:               %[[VAL_17:.*]] = hlfir.designate %[[VAL_12]]#0 (%[[VAL_16]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:               %[[VAL_18:.*]] = fir.load %[[VAL_17]] : !fir.ref<i32>
+// CHECK:               %[[VAL_19:.*]] = fir.load %[[VAL_1]] : !fir.ref<i32>
+// CHECK:               %[[VAL_20:.*]] = arith.subi %[[VAL_18]], %[[VAL_19]] : i32
+// CHECK:               %[[VAL_21:.*]] = arith.subi %[[VAL_20]], %[[VAL_9]] : i32
+// CHECK:               %[[VAL_22:.*]] = hlfir.designate %[[VAL_14]]#0 (%[[VAL_16]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:               hlfir.assign %[[VAL_21]] to %[[VAL_22]] temporary_lhs : i32, !fir.ref<i32>
 // CHECK:               omp.yield
 // CHECK:             }
 // CHECK:             omp.terminator
 // CHECK:           }
 // CHECK:           omp.single nowait {
-// CHECK:             %[[VAL_24:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
-// CHECK:             %[[VAL_25:.*]] = fir.insert_value %[[VAL_24]], %[[VAL_15]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:             "test.test1"(%[[VAL_1]]) : (!fir.ref<i32>) -> ()
 // CHECK:             hlfir.assign %[[VAL_14]]#0 to %[[VAL_12]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
 // CHECK:             fir.freemem %[[VAL_14]]#0 : !fir.heap<!fir.array<42xi32>>
 // CHECK:             omp.terminator
 // CHECK:           }
-// CHECK:           %[[VAL_26:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
-// CHECK:           %[[VAL_27:.*]] = fir.insert_value %[[VAL_26]], %[[VAL_15]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:           omp.barrier
 // CHECK:           return
 // CHECK:         }
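The updated CHECK lines describe a worksharing loop (`omp.wsloop` + `omp.loop_nest`) that writes each computed element into a heap temporary. It is followed by an `omp.single nowait` that copies the temporary back and frees it, then a closing `omp.barrier`. The hypothetical `std::thread` model below illustrates that ordering. The temporary exists because an elemental expression may, in general, read elements that the assignment overwrites, so the copy-back must wait until every thread has finished its chunk. The subtracted constant is 1, as in `bufferize-workshare.fir`. The function name and the static chunking are assumptions; the real schedule is chosen by the OpenMP runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical std::thread model of the lowered structure (not code that flang
// generates): a worksharing loop fills a temporary, then a single thread
// copies the temporary back into `array`.
void workshareDecrement(std::vector<int> &array, unsigned numThreads) {
  if (numThreads == 0)
    numThreads = 1;
  const std::size_t n = array.size();
  std::vector<int> tmp(n); // plays the role of the ".tmp.array" fir.allocmem

  // omp.wsloop + omp.loop_nest: iterations 1..n (one-based, inclusive) are
  // split into contiguous chunks, one per thread (static schedule assumed).
  std::vector<std::thread> team;
  for (unsigned t = 0; t < numThreads; ++t) {
    team.emplace_back([&array, &tmp, n, numThreads, t] {
      const std::size_t lo = 1 + t * n / numThreads;
      const std::size_t hi = (t + 1) * n / numThreads;
      for (std::size_t i = lo; i <= hi; ++i)
        tmp[i - 1] = array[i - 1] - 1; // each thread writes only its own chunk
    });
  }
  // Joining models the implicit barrier at the end of omp.wsloop (no nowait).
  for (std::thread &th : team)
    th.join();

  // omp.single nowait: one thread (here, the caller) performs the
  // hlfir.assign of the temporary back to `array`. std::vector releases `tmp`
  // at scope exit in place of fir.freemem, and returning from the function
  // stands in for the closing omp.barrier.
  for (std::size_t i = 0; i < n; ++i)
    array[i] = tmp[i];
}
```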

From 1ee626d3d2caef7e0ca7c34ab0e110ea47f41ccb Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Sun, 4 Aug 2024 17:33:52 +0900
Subject: [PATCH 112/116] Add workshare loop wrapper lowerings

---
 .../lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp  |  6 ++++--
 .../HLFIR/Transforms/OptimizedBufferization.cpp        | 10 +++++++---
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index b608677c526310..1848dbe2c7a2c2 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -26,12 +26,13 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/Dominance.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/DialectConversion.h"
-#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "llvm/ADT/TypeSwitch.h"
 
 namespace hlfir {
@@ -792,7 +793,8 @@ struct ElementalOpConversion
     // Generate a loop nest looping around the fir.elemental shape and clone
     // fir.elemental region inside the inner loop.
     hlfir::LoopNest loopNest =
-        hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+        hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+                           flangomp::shouldUseWorkshareLowering(elemental));
     auto insPt = builder.saveInsertionPoint();
     builder.setInsertionPointToStart(loopNest.body);
     auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 3a0a98dc594463..f014724861e333 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -20,6 +20,7 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Transforms/Utils.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/IR/Dominance.h"
@@ -482,7 +483,8 @@ llvm::LogicalResult ElementalAssignBufferization::matchAndRewrite(
   // Generate a loop nest looping around the hlfir.elemental shape and clone
   // hlfir.elemental region inside the inner loop
   hlfir::LoopNest loopNest =
-      hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+      hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+                         flangomp::shouldUseWorkshareLowering(elemental));
   builder.setInsertionPointToStart(loopNest.body);
   auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
                                         loopNest.oneBasedIndices);
@@ -553,7 +555,8 @@ llvm::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
   llvm::SmallVector<mlir::Value> extents =
       hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
-      hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+      hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true,
+                         flangomp::shouldUseWorkshareLowering(assign));
   builder.setInsertionPointToStart(loopNest.body);
   auto arrayElement =
       hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices);
@@ -648,7 +651,8 @@ llvm::LogicalResult VariableAssignBufferization::matchAndRewrite(
   llvm::SmallVector<mlir::Value> extents =
       hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
-      hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+      hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true,
+                         flangomp::shouldUseWorkshareLowering(assign));
   builder.setInsertionPointToStart(loopNest.body);
   auto rhsArrayElement =
       hlfir::getElementAt(loc, builder, rhs, loopNest.oneBasedIndices);
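All three bufferization patterns, and `ElementalOpConversion`, now pass the result of `flangomp::shouldUseWorkshareLowering` to `hlfir::genLoopNest`. That helper then emits either a plain loop nest or an `omp.workshare_loop_wrapper` around an `omp.loop_nest`, as the updated CHECK lines expect. The sketch below models that choice as text rather than building ops with an `OpBuilder`. Its output is schematic, not valid MLIR, and `sketchLoopNest` is a hypothetical name. Two details are assumptions about the existing helper: the ordering of the per-dimension `fir.do_loop`s (last dimension outermost) and the dimension order inside `omp.loop_nest`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, text-only model of the loop structure a genLoopNest-like
// helper emits. `extents` is in Fortran dimension order (dimension 1 first).
std::string sketchLoopNest(const std::vector<std::size_t> &extents,
                           bool isUnordered, bool useWorkshareLowering) {
  std::string ir;
  if (useWorkshareLowering) {
    // One omp.loop_nest covers every dimension; bounds are 1..extent,
    // inclusive, step 1, matching the CHECK lines in this PR. This sketch
    // ignores isUnordered here; how the real helper combines the two flags
    // is not shown in the PR.
    ir += "omp.workshare_loop_wrapper {\n  omp.loop_nest";
    for (auto it = extents.rbegin(); it != extents.rend(); ++it)
      ir += " [1.." + std::to_string(*it) + "]";
    ir += " inclusive step 1\n}\n";
    return ir;
  }
  // Otherwise: one fir.do_loop per dimension, last dimension outermost, so
  // that the first (contiguous) dimension varies fastest.
  std::string indent;
  for (auto it = extents.rbegin(); it != extents.rend(); ++it) {
    ir += indent + "fir.do_loop [1.." + std::to_string(*it) + "]";
    if (isUnordered)
      ir += " unordered";
    ir += "\n";
    indent += "  ";
  }
  return ir;
}
```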

From f9cf0f4a20c0c9b4884a5c65dfcb9125cecb6161 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 15:01:31 +0900
Subject: [PATCH 113/116] Bufferize test

---
 flang/test/HLFIR/bufferize-workshare.fir | 58 ++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 flang/test/HLFIR/bufferize-workshare.fir

diff --git a/flang/test/HLFIR/bufferize-workshare.fir b/flang/test/HLFIR/bufferize-workshare.fir
new file mode 100644
index 00000000000000..86a2f031478dd7
--- /dev/null
+++ b/flang/test/HLFIR/bufferize-workshare.fir
@@ -0,0 +1,58 @@
+// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s
+
+// CHECK-LABEL:   func.func @simple(
+// CHECK-SAME:                      %[[VAL_0:.*]]: !fir.ref<!fir.array<42xi32>>) {
+// CHECK:           omp.parallel {
+// CHECK:             omp.workshare {
+// CHECK:               %[[VAL_1:.*]] = arith.constant 42 : index
+// CHECK:               %[[VAL_2:.*]] = arith.constant 1 : i32
+// CHECK:               %[[VAL_3:.*]] = fir.shape %[[VAL_1]] : (index) -> !fir.shape<1>
+// CHECK:               %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_0]](%[[VAL_3]]) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_5:.*]] = fir.allocmem !fir.array<42xi32> {bindc_name = ".tmp.array", uniq_name = ""}
+// CHECK:               %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_5]](%[[VAL_3]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
+// CHECK:               %[[VAL_7:.*]] = arith.constant true
+// CHECK:               %[[VAL_8:.*]] = arith.constant 1 : index
+// CHECK:               omp.wsloop {
+// CHECK:                 omp.loop_nest (%[[VAL_9:.*]]) : index = (%[[VAL_8]]) to (%[[VAL_1]]) inclusive step (%[[VAL_8]]) {
+// CHECK:                   %[[VAL_10:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_9]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                   %[[VAL_11:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
+// CHECK:                   %[[VAL_12:.*]] = arith.subi %[[VAL_11]], %[[VAL_2]] : i32
+// CHECK:                   %[[VAL_13:.*]] = hlfir.designate %[[VAL_6]]#0 (%[[VAL_9]])  : (!fir.heap<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+// CHECK:                   hlfir.assign %[[VAL_12]] to %[[VAL_13]] temporary_lhs : i32, !fir.ref<i32>
+// CHECK:                   omp.yield
+// CHECK:                 }
+// CHECK:                 omp.terminator
+// CHECK:               }
+// CHECK:               %[[VAL_14:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:               %[[VAL_15:.*]] = fir.insert_value %[[VAL_14]], %[[VAL_7]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:               %[[VAL_16:.*]] = fir.insert_value %[[VAL_15]], %[[VAL_6]]#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
+// CHECK:               hlfir.assign %[[VAL_6]]#0 to %[[VAL_4]]#0 : !fir.heap<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>
+// CHECK:               fir.freemem %[[VAL_6]]#0 : !fir.heap<!fir.array<42xi32>>
+// CHECK:               omp.terminator
+// CHECK:             }
+// CHECK:             omp.terminator
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }
+func.func @simple(%arg: !fir.ref<!fir.array<42xi32>>) {
+  omp.parallel {
+    omp.workshare {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+      %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+      ^bb0(%i: index):
+        %ref = hlfir.designate %array#0 (%i) : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
+        %val = fir.load %ref : !fir.ref<i32>
+        %sub = arith.subi %val, %c1_i32 : i32
+        hlfir.yield_element %sub : i32
+      }
+      hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+      hlfir.destroy %elemental : !hlfir.expr<42xi32>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}

From 8157417ccd1c143a06e434a4dc38effe56bb9b80 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 15:03:42 +0900
Subject: [PATCH 114/116] Bufferize test

---
 flang/test/HLFIR/bufferize-workshare.fir | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/flang/test/HLFIR/bufferize-workshare.fir b/flang/test/HLFIR/bufferize-workshare.fir
index 86a2f031478dd7..33b368a62eaabf 100644
--- a/flang/test/HLFIR/bufferize-workshare.fir
+++ b/flang/test/HLFIR/bufferize-workshare.fir
@@ -12,7 +12,7 @@
 // CHECK:               %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_5]](%[[VAL_3]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:               %[[VAL_7:.*]] = arith.constant true
 // CHECK:               %[[VAL_8:.*]] = arith.constant 1 : index
-// CHECK:               omp.wsloop {
+// CHECK:               "omp.workshare_loop_wrapper"() ({
 // CHECK:                 omp.loop_nest (%[[VAL_9:.*]]) : index = (%[[VAL_8]]) to (%[[VAL_1]]) inclusive step (%[[VAL_8]]) {
 // CHECK:                   %[[VAL_10:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_9]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
 // CHECK:                   %[[VAL_11:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
@@ -22,7 +22,7 @@
 // CHECK:                   omp.yield
 // CHECK:                 }
 // CHECK:                 omp.terminator
-// CHECK:               }
+// CHECK:               }) : () -> ()
 // CHECK:               %[[VAL_14:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:               %[[VAL_15:.*]] = fir.insert_value %[[VAL_14]], %[[VAL_7]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:               %[[VAL_16:.*]] = fir.insert_value %[[VAL_15]], %[[VAL_6]]#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>

From 106d832c3de8209e609332dcd3184658bd672f5e Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 15:04:39 +0900
Subject: [PATCH 115/116] Bufferize test

---
 flang/test/HLFIR/bufferize-workshare.fir | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/flang/test/HLFIR/bufferize-workshare.fir b/flang/test/HLFIR/bufferize-workshare.fir
index 33b368a62eaabf..02fe6b1d1799c3 100644
--- a/flang/test/HLFIR/bufferize-workshare.fir
+++ b/flang/test/HLFIR/bufferize-workshare.fir
@@ -12,7 +12,7 @@
 // CHECK:               %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_5]](%[[VAL_3]]) {uniq_name = ".tmp.array"} : (!fir.heap<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.heap<!fir.array<42xi32>>, !fir.heap<!fir.array<42xi32>>)
 // CHECK:               %[[VAL_7:.*]] = arith.constant true
 // CHECK:               %[[VAL_8:.*]] = arith.constant 1 : index
-// CHECK:               "omp.workshare_loop_wrapper"() ({
+// CHECK:               omp.workshare_loop_wrapper {
 // CHECK:                 omp.loop_nest (%[[VAL_9:.*]]) : index = (%[[VAL_8]]) to (%[[VAL_1]]) inclusive step (%[[VAL_8]]) {
 // CHECK:                   %[[VAL_10:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_9]])  : (!fir.ref<!fir.array<42xi32>>, index) -> !fir.ref<i32>
 // CHECK:                   %[[VAL_11:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
@@ -22,7 +22,7 @@
 // CHECK:                   omp.yield
 // CHECK:                 }
 // CHECK:                 omp.terminator
-// CHECK:               }) : () -> ()
+// CHECK:               }
 // CHECK:               %[[VAL_14:.*]] = fir.undefined tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:               %[[VAL_15:.*]] = fir.insert_value %[[VAL_14]], %[[VAL_7]], [1 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, i1) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>
 // CHECK:               %[[VAL_16:.*]] = fir.insert_value %[[VAL_15]], %[[VAL_6]]#0, [0 : index] : (tuple<!fir.heap<!fir.array<42xi32>>, i1>, !fir.heap<!fir.array<42xi32>>) -> tuple<!fir.heap<!fir.array<42xi32>>, i1>

From aa824a033a6e9111090e1697442c19392b9f08ff Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov <ivanov.i.aa at m.titech.ac.jp>
Date: Mon, 19 Aug 2024 16:53:40 +0900
Subject: [PATCH 116/116] Add test for shouldUseWorkshareLowering

---
 .../OpenMP/should-use-workshare-lowering.mlir | 140 ++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir

diff --git a/flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir b/flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir
new file mode 100644
index 00000000000000..2ba445faf780ea
--- /dev/null
+++ b/flang/test/Transforms/OpenMP/should-use-workshare-lowering.mlir
@@ -0,0 +1,140 @@
+// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s
+
+// Checks that we correctly identify when to use the lowering to
+// omp.workshare_loop_wrapper.
+
+// CHECK-LABEL: @should_parallelize_0
+// CHECK: omp.workshare_loop_wrapper
+func.func @should_parallelize_0(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.workshare {
+    %c42 = arith.constant 42 : index
+    %c1_i32 = arith.constant 1 : i32
+    %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+    %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+    %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+    ^bb0(%i: index):
+      hlfir.yield_element %c1_i32 : i32
+    }
+    hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+    hlfir.destroy %elemental : !hlfir.expr<42xi32>
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: @should_parallelize_1
+// CHECK: omp.workshare_loop_wrapper
+func.func @should_parallelize_1(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.parallel {
+    omp.workshare {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+      %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+      ^bb0(%i: index):
+        hlfir.yield_element %c1_i32 : i32
+      }
+      hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+      hlfir.destroy %elemental : !hlfir.expr<42xi32>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+
+// CHECK-LABEL: @should_not_parallelize_0
+// CHECK-NOT: omp.workshare_loop_wrapper
+func.func @should_not_parallelize_0(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.workshare {
+    omp.single {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+      %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+      ^bb0(%i: index):
+        hlfir.yield_element %c1_i32 : i32
+      }
+      hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+      hlfir.destroy %elemental : !hlfir.expr<42xi32>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: @should_not_parallelize_1
+// CHECK-NOT: omp.workshare_loop_wrapper
+func.func @should_not_parallelize_1(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.workshare {
+    omp.critical {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+      %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+      ^bb0(%i: index):
+        hlfir.yield_element %c1_i32 : i32
+      }
+      hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+      hlfir.destroy %elemental : !hlfir.expr<42xi32>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: @should_not_parallelize_2
+// CHECK-NOT: omp.workshare_loop_wrapper
+func.func @should_not_parallelize_2(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.workshare {
+    omp.parallel {
+      %c42 = arith.constant 42 : index
+      %c1_i32 = arith.constant 1 : i32
+      %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+      %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+      %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+      ^bb0(%i: index):
+        hlfir.yield_element %c1_i32 : i32
+      }
+      hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+      hlfir.destroy %elemental : !hlfir.expr<42xi32>
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
+
+// CHECK-LABEL: @should_not_parallelize_3
+// CHECK-NOT: omp.workshare_loop_wrapper
+func.func @should_not_parallelize_3(%arg: !fir.ref<!fir.array<42xi32>>, %idx : index) {
+  omp.workshare {
+    omp.parallel {
+      omp.workshare {
+        omp.parallel {
+          %c42 = arith.constant 42 : index
+          %c1_i32 = arith.constant 1 : i32
+          %shape = fir.shape %c42 : (index) -> !fir.shape<1>
+          %array:2 = hlfir.declare %arg(%shape) {uniq_name = "array"} : (!fir.ref<!fir.array<42xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<42xi32>>, !fir.ref<!fir.array<42xi32>>)
+          %elemental = hlfir.elemental %shape unordered : (!fir.shape<1>) -> !hlfir.expr<42xi32> {
+          ^bb0(%i: index):
+            hlfir.yield_element %c1_i32 : i32
+          }
+          hlfir.assign %elemental to %array#0 : !hlfir.expr<42xi32>, !fir.ref<!fir.array<42xi32>>
+          hlfir.destroy %elemental : !hlfir.expr<42xi32>
+          omp.terminator
+        }
+        omp.terminator
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  return
+}
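The six cases above define the intended behaviour. The lowering is used when the operation sits directly in an `omp.workshare`, optionally inside `omp.parallel`. It is not used when an `omp.single`, `omp.critical`, or nested `omp.parallel` intervenes. One rule consistent with all six cases is sketched below over a hypothetical list of enclosing ops. It is inferred from the tests, not copied from `flangomp::shouldUseWorkshareLowering`, and it assumes that non-OpenMP ops in between are transparent, which these tests do not exercise.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the enclosing operations that matter for
// the decision; Other stands for any op this sketch treats as transparent.
enum class OmpOp { Workshare, Parallel, Single, Critical, Other };

// `ancestors` lists the enclosing ops innermost-first. The lowering applies
// only if an omp.workshare is reached before any omp.parallel, omp.single or
// omp.critical.
bool shouldUseWorkshareLoweringSketch(const std::vector<OmpOp> &ancestors) {
  for (OmpOp op : ancestors) {
    switch (op) {
    case OmpOp::Workshare:
      return true;
    case OmpOp::Parallel:
    case OmpOp::Single:
    case OmpOp::Critical:
      return false;
    case OmpOp::Other:
      break; // keep walking outward
    }
  }
  return false; // no enclosing omp.workshare at all
}
```

For example, `should_not_parallelize_3` corresponds to the ancestor list `{Parallel, Workshare, Parallel, Workshare}`. The sketch stops at the innermost `Parallel` and returns `false`.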


