[flang-commits] [clang] [flang] [libc] [libcxx] [libcxxabi] [libunwind] [lldb] [llvm] [mlir] [clang][frontend] Add support for attribute plugins for statement attributes (PR #110334)

Eric Astor via flang-commits flang-commits at lists.llvm.org
Thu Oct 10 11:11:08 PDT 2024




https://github.com/ericastor updated https://github.com/llvm/llvm-project/pull/110334

From 0411b2939e10ca335e84731502126145509bef2d Mon Sep 17 00:00:00 2001
From: Eric Astor <epastor at google.com>
Date: Fri, 27 Sep 2024 22:35:28 +0000
Subject: [PATCH 01/20] [clang][frontend] Add support for attribute plugins for
 statement attributes

We already have support for declaration attributes; this is just a matter of extending the plugin infrastructure to cover one more case.
---
 clang/docs/ClangPlugins.rst                | 17 +++++---
 clang/examples/Attribute/Attribute.cpp     | 49 ++++++++++++++++++++++
 clang/include/clang/Basic/ParsedAttrInfo.h | 10 +++++
 clang/lib/Sema/SemaStmtAttr.cpp            |  4 ++
 clang/test/Frontend/plugin-attribute.cpp   | 40 ++++++++++++++----
 5 files changed, 108 insertions(+), 12 deletions(-)
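
Illustrative note: the default case added to ProcessStmtAttribute relies on a three-way result from handleStmtAttribute (NotHandled, AttributeApplied, AttributeNotApplied) plus an output parameter for the created attribute. The sketch below models only that control flow. Attr, Stmt, Context, AttrInfoModel, ExampleInfoModel and processStmtAttribute are hypothetical stand-ins written for this example, not Clang's real classes, and the FellThrough flag is an assumption added only to make the fall-through path observable.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Simplified stand-ins for illustration; these are NOT Clang's real classes.
struct Attr { std::string Annotation; };
struct Stmt {};

// Owns created attributes, loosely playing the role of an allocation context.
struct Context {
  std::deque<Attr> Pool; // deque keeps element addresses stable on push_back
  Attr *create(const std::string &Name) {
    Pool.push_back(Attr{Name});
    return &Pool.back();
  }
};

struct AttrInfoModel {
  enum AttrHandling { NotHandled, AttributeApplied, AttributeNotApplied };
  virtual ~AttrInfoModel() = default;
  // Default: this info object knows nothing about statement attributes.
  virtual AttrHandling handleStmtAttribute(Context &, const Stmt &, unsigned,
                                           Attr *&) const {
    return NotHandled;
  }
};

// Plays the role of a plugin: accepts at most three arguments.
struct ExampleInfoModel : AttrInfoModel {
  AttrHandling handleStmtAttribute(Context &Ctx, const Stmt &, unsigned NumArgs,
                                   Attr *&Result) const override {
    if (NumArgs > 3)
      return AttributeNotApplied; // handled, but rejected: Result stays null
    Result = Ctx.create("example");
    return AttributeApplied;
  }
};

// Same shape as the new default case: anything other than NotHandled ends
// processing and returns whatever the handler stored in the out-parameter.
Attr *processStmtAttribute(Context &Ctx, const AttrInfoModel &Info,
                           const Stmt &St, unsigned NumArgs,
                           bool &FellThrough) {
  FellThrough = false;
  Attr *AT = nullptr;
  if (Info.handleStmtAttribute(Ctx, St, NumArgs, AT) !=
      AttrInfoModel::NotHandled)
    return AT;
  FellThrough = true; // the real code continues to its existing diagnostics
  return nullptr;
}
```

In this model a rejected attribute and an unhandled attribute both yield a null pointer; only the unhandled one reaches the fall-through path. The patch itself writes the check as a C++17 if-with-initializer; the sketch spells it out as two statements.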

diff --git a/clang/docs/ClangPlugins.rst b/clang/docs/ClangPlugins.rst
index 001e66e434efb1..92e41fb5877fe8 100644
--- a/clang/docs/ClangPlugins.rst
+++ b/clang/docs/ClangPlugins.rst
@@ -92,11 +92,6 @@ The members of ``ParsedAttrInfo`` that a plugin attribute must define are:
    attribute, each of which consists of an attribute syntax and how the
    attribute name is spelled for that syntax. If the syntax allows a scope then
    the spelling must be "scope::attr" if a scope is present or "::attr" if not.
- * ``handleDeclAttribute``, which is the function that applies the attribute to
-   a declaration. It is responsible for checking that the attribute's arguments
-   are valid, and typically applies the attribute by adding an ``Attr`` to the
-   ``Decl``. It returns either ``AttributeApplied``, to indicate that the
-   attribute was successfully applied, or ``AttributeNotApplied`` if it wasn't.
 
 The members of ``ParsedAttrInfo`` that may need to be defined, depending on the
 attribute, are:
@@ -105,6 +100,18 @@ attribute, are:
    arguments to the attribute.
  * ``diagAppertainsToDecl``, which checks if the attribute has been used on the
    right kind of declaration and issues a diagnostic if not.
+ * ``handleDeclAttribute``, which is the function that applies the attribute to
+   a declaration. It is responsible for checking that the attribute's arguments
+   are valid, and typically applies the attribute by adding an ``Attr`` to the
+   ``Decl``. It returns either ``AttributeApplied``, to indicate that the
+   attribute was successfully applied, or ``AttributeNotApplied`` if it wasn't.
+ * ``diagAppertainsToStmt``, which checks if the attribute has been used on the
+   right kind of statement and issues a diagnostic if not.
+ * ``handleStmtAttribute``, which is the function that applies the attribute to
+   a statement. It is responsible for checking that the attribute's arguments
+   are valid, and typically applies the attribute by adding an ``Attr`` to the
+   ``Stmt``. It returns either ``AttributeApplied``, to indicate that the
+   attribute was successfully applied, or ``AttributeNotApplied`` if it wasn't.
  * ``diagLangOpts``, which checks if the attribute is permitted for the current
    language mode and issues a diagnostic if not.
  * ``existsInTarget``, which checks if the attribute is permitted for the given
diff --git a/clang/examples/Attribute/Attribute.cpp b/clang/examples/Attribute/Attribute.cpp
index 9d6cf9ae36c6a6..07dd19321195c8 100644
--- a/clang/examples/Attribute/Attribute.cpp
+++ b/clang/examples/Attribute/Attribute.cpp
@@ -94,6 +94,55 @@ struct ExampleAttrInfo : public ParsedAttrInfo {
     }
     return AttributeApplied;
   }
+
+  bool diagAppertainsToStmt(Sema &S, const ParsedAttr &Attr,
+                            const Stmt *St) const override {
+    // This attribute appertains to for loop statements only.
+    if (!isa<ForStmt>(St)) {
+      S.Diag(Attr.getLoc(), diag::warn_attribute_wrong_decl_type_str)
+          << Attr << Attr.isRegularKeywordAttribute() << "for loop statements";
+      return false;
+    }
+    return true;
+  }
+
+  AttrHandling handleStmtAttribute(Sema &S, Stmt *St, const ParsedAttr &Attr,
+                                   class Attr *&Result) const override {
+    // We make some rules here:
+    // 1. Only accept at most 3 arguments here.
+    // 2. The first argument must be a string literal if it exists.
+    if (Attr.getNumArgs() > 3) {
+      unsigned ID = S.getDiagnostics().getCustomDiagID(
+          DiagnosticsEngine::Error,
+          "'example' attribute only accepts at most three arguments");
+      S.Diag(Attr.getLoc(), ID);
+      return AttributeNotApplied;
+    }
+    // If there are arguments, the first argument should be a string literal.
+    if (Attr.getNumArgs() > 0) {
+      auto *Arg0 = Attr.getArgAsExpr(0);
+      StringLiteral *Literal =
+          dyn_cast<StringLiteral>(Arg0->IgnoreParenCasts());
+      if (!Literal) {
+        unsigned ID = S.getDiagnostics().getCustomDiagID(
+            DiagnosticsEngine::Error, "first argument to the 'example' "
+                                      "attribute must be a string literal");
+        S.Diag(Attr.getLoc(), ID);
+        return AttributeNotApplied;
+      }
+      SmallVector<Expr *, 16> ArgsBuf;
+      for (unsigned i = 0; i < Attr.getNumArgs(); i++) {
+        ArgsBuf.push_back(Attr.getArgAsExpr(i));
+      }
+      Result = AnnotateAttr::Create(S.Context, "example", ArgsBuf.data(),
+                                    ArgsBuf.size(), Attr.getRange());
+    } else {
+      // Attach an annotate attribute to the Decl.
+      Result = AnnotateAttr::Create(S.Context, "example", nullptr, 0,
+                                    Attr.getRange());
+    }
+    return AttributeApplied;
+  }
 };
 
 } // namespace
diff --git a/clang/include/clang/Basic/ParsedAttrInfo.h b/clang/include/clang/Basic/ParsedAttrInfo.h
index 537d8f3391d589..fab5c6f1377d27 100644
--- a/clang/include/clang/Basic/ParsedAttrInfo.h
+++ b/clang/include/clang/Basic/ParsedAttrInfo.h
@@ -24,6 +24,7 @@
 
 namespace clang {
 
+class Attr;
 class Decl;
 class LangOptions;
 class ParsedAttr;
@@ -154,6 +155,15 @@ struct ParsedAttrInfo {
                                            const ParsedAttr &Attr) const {
     return NotHandled;
   }
+  /// If this ParsedAttrInfo knows how to handle this ParsedAttr applied to this
+  /// Stmt then do so (referencing the resulting Attr in Result) and return
+  /// either AttributeApplied if it was applied or AttributeNotApplied if it
+  /// wasn't. Otherwise return NotHandled.
+  virtual AttrHandling handleStmtAttribute(Sema &S, Stmt *St,
+                                           const ParsedAttr &Attr,
+                                           class Attr *&Result) const {
+    return NotHandled;
+  }
 
   static const ParsedAttrInfo &get(const AttributeCommonInfo &A);
   static ArrayRef<const ParsedAttrInfo *> getAllBuiltin();
diff --git a/clang/lib/Sema/SemaStmtAttr.cpp b/clang/lib/Sema/SemaStmtAttr.cpp
index b9b3b4063bc383..3ebd1148e8bbfb 100644
--- a/clang/lib/Sema/SemaStmtAttr.cpp
+++ b/clang/lib/Sema/SemaStmtAttr.cpp
@@ -680,6 +680,10 @@ static Attr *ProcessStmtAttribute(Sema &S, Stmt *St, const ParsedAttr &A,
   case ParsedAttr::AT_NoConvergent:
     return handleNoConvergentAttr(S, St, A, Range);
   default:
+    if (Attr *AT = nullptr; A.getInfo().handleStmtAttribute(S, St, A, AT) !=
+                            ParsedAttrInfo::NotHandled) {
+      return AT;
+    }
     // N.B., ClangAttrEmitter.cpp emits a diagnostic helper that ensures a
     // declaration attribute is not written on a statement, but this code is
     // needed for attributes in Attr.td that do not list any subjects.
diff --git a/clang/test/Frontend/plugin-attribute.cpp b/clang/test/Frontend/plugin-attribute.cpp
index 1c5a2440b26888..2e9d171a0095a9 100644
--- a/clang/test/Frontend/plugin-attribute.cpp
+++ b/clang/test/Frontend/plugin-attribute.cpp
@@ -4,11 +4,33 @@
 // REQUIRES: plugins, examples
 //--- good_attr.cpp
 // expected-no-diagnostics
-void fn1a() __attribute__((example)) {}
-[[example]] void fn1b() {}
-[[plugin::example]] void fn1c() {}
-void fn2() __attribute__((example("somestring", 1, 2.0))) {}
-// CHECK-COUNT-4: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
+void fn1a() __attribute__((example)) {
+  __attribute__((example)) for (int i = 0; i < 10; ++i) {}
+}
+[[example]] void fn1b() {
+  [[example]] for (int i = 0; i < 10; ++i) {}
+}
+[[plugin::example]] void fn1c() {
+  [[plugin::example]] for (int i = 0; i < 10; ++i) {}
+}
+void fn2() __attribute__((example("somestring", 1, 2.0))) {
+  __attribute__((example("abc", 3, 4.0))) for (int i = 0; i < 10; ++i) {}
+}
+// CHECK: -AttributedStmt 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}}
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AttributedStmt 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}}
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AttributedStmt 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}}
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -AttributedStmt 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}}
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -StringLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'const char[{{[0-9]+}}]' lvalue "abc"
+// CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' 3
+// CHECK: -FloatingLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'double' 4.000000e+00
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
 // CHECK: -StringLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'const char[{{[0-9]+}}]' lvalue "somestring"
 // CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' 1
 // CHECK: -FloatingLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'double' 2.000000e+00
@@ -18,5 +40,9 @@ int var1 __attribute__((example("otherstring"))) = 1; // expected-warning {{'exa
 class Example {
   void __attribute__((example)) fn3(); // expected-error {{'example' attribute only allowed at file scope}}
 };
-void fn4() __attribute__((example(123))) { } // expected-error {{first argument to the 'example' attribute must be a string literal}}
-void fn5() __attribute__((example("a","b", 3, 4.0))) { } // expected-error {{'example' attribute only accepts at most three arguments}}
+void fn4() __attribute__((example(123))) { // expected-error {{first argument to the 'example' attribute must be a string literal}}
+  __attribute__((example("somestring"))) while (true); // expected-warning {{'example' attribute only applies to for loop statements}}
+}
+void fn5() __attribute__((example("a","b", 3, 4.0))) { // expected-error {{'example' attribute only accepts at most three arguments}}
+  __attribute__((example("a","b", 3, 4.0))) for (int i = 0; i < 10; ++i) {} // expected-error {{'example' attribute only accepts at most three arguments}}
+}

From 2398a73022d1f6d2d669e2265e8cdb7a5e3425a9 Mon Sep 17 00:00:00 2001
From: Eric Astor <epastor at google.com>
Date: Mon, 30 Sep 2024 13:20:23 +0000
Subject: [PATCH 02/20] Fix desync'd comment

---
 clang/examples/Attribute/Attribute.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/clang/examples/Attribute/Attribute.cpp b/clang/examples/Attribute/Attribute.cpp
index 07dd19321195c8..3b90724ad22205 100644
--- a/clang/examples/Attribute/Attribute.cpp
+++ b/clang/examples/Attribute/Attribute.cpp
@@ -137,7 +137,6 @@ struct ExampleAttrInfo : public ParsedAttrInfo {
       Result = AnnotateAttr::Create(S.Context, "example", ArgsBuf.data(),
                                     ArgsBuf.size(), Attr.getRange());
     } else {
-      // Attach an annotate attribute to the Decl.
       Result = AnnotateAttr::Create(S.Context, "example", nullptr, 0,
                                     Attr.getRange());
     }

From 5a5ff132a96c471c079b35851c6c067687dc28e0 Mon Sep 17 00:00:00 2001
From: Eric Astor <epastor at google.com>
Date: Wed, 9 Oct 2024 17:05:13 +0000
Subject: [PATCH 03/20] Add support for template-instantiations of AnnotateAttr

---
 clang/lib/Sema/SemaTemplateInstantiate.cpp | 14 ++++++++++++++
 clang/test/Frontend/plugin-attribute.cpp   | 20 ++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)
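
Illustrative note: TransformAnnotateAttr, as written in this patch, transforms each argument in turn, returns the original attribute untouched if any argument fails to transform, and otherwise builds a new implicit attribute from the transformed arguments. The sketch below models that policy with plain integers. AnnotateModel, TransformFn and transformAnnotate are hypothetical names invented for this example; the real code works on Expr pointers and ExprResult.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an annotate attribute with integer "arguments".
struct AnnotateModel {
  std::string Annotation;
  std::vector<int> Args;
  bool Implicit;
};

// Models an argument transform; returns false when the result is unusable.
using TransformFn = std::function<bool(int In, int &Out)>;

AnnotateModel transformAnnotate(const AnnotateModel &AA,
                                const TransformFn &Transform) {
  std::vector<int> Args;
  for (int Arg : AA.Args) {
    int Out = 0;
    if (!Transform(Arg, Out))
      return AA; // any failure: keep the original attribute as it was
    Args.push_back(Out);
  }
  // All arguments transformed: build a fresh attribute marked implicit.
  return AnnotateModel{AA.Annotation, Args, true};
}
```

PATCH 05 in this series replaces the early return with skipping the unusable argument, so the model above matches the behavior of this patch only.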

diff --git a/clang/lib/Sema/SemaTemplateInstantiate.cpp b/clang/lib/Sema/SemaTemplateInstantiate.cpp
index fd51fa4afcacbf..00a5f81dbca8fe 100644
--- a/clang/lib/Sema/SemaTemplateInstantiate.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiate.cpp
@@ -1537,6 +1537,7 @@ namespace {
                           NamedDecl *FirstQualifierInScope = nullptr,
                           bool AllowInjectedClassName = false);
 
+    const AnnotateAttr *TransformAnnotateAttr(const AnnotateAttr *AA);
     const CXXAssumeAttr *TransformCXXAssumeAttr(const CXXAssumeAttr *AA);
     const LoopHintAttr *TransformLoopHintAttr(const LoopHintAttr *LH);
     const NoInlineAttr *TransformStmtNoInlineAttr(const Stmt *OrigS,
@@ -2125,6 +2126,19 @@ TemplateInstantiator::TransformTemplateParmRefExpr(DeclRefExpr *E,
                                          Arg, PackIndex);
 }
 
+const AnnotateAttr *
+TemplateInstantiator::TransformAnnotateAttr(const AnnotateAttr *AA) {
+  SmallVector<Expr *> Args;
+  for (Expr *Arg : AA->args()) {
+    ExprResult Res = getDerived().TransformExpr(Arg);
+    if (!Res.isUsable())
+      return AA;
+    Args.push_back(Res.get());
+  }
+  return AnnotateAttr::CreateImplicit(getSema().Context, AA->getAnnotation(),
+                                      Args.data(), Args.size(), AA->getRange());
+}
+
 const CXXAssumeAttr *
 TemplateInstantiator::TransformCXXAssumeAttr(const CXXAssumeAttr *AA) {
   ExprResult Res = getDerived().TransformExpr(AA->getAssumption());
diff --git a/clang/test/Frontend/plugin-attribute.cpp b/clang/test/Frontend/plugin-attribute.cpp
index 2e9d171a0095a9..094ce9f5cbb85f 100644
--- a/clang/test/Frontend/plugin-attribute.cpp
+++ b/clang/test/Frontend/plugin-attribute.cpp
@@ -5,17 +5,21 @@
 //--- good_attr.cpp
 // expected-no-diagnostics
 void fn1a() __attribute__((example)) {
-  __attribute__((example)) for (int i = 0; i < 10; ++i) {}
+  __attribute__((example)) for (int i = 0; i < 9; ++i) {}
 }
 [[example]] void fn1b() {
-  [[example]] for (int i = 0; i < 10; ++i) {}
+  [[example]] for (int i = 0; i < 9; ++i) {}
 }
 [[plugin::example]] void fn1c() {
-  [[plugin::example]] for (int i = 0; i < 10; ++i) {}
+  [[plugin::example]] for (int i = 0; i < 9; ++i) {}
 }
 void fn2() __attribute__((example("somestring", 1, 2.0))) {
-  __attribute__((example("abc", 3, 4.0))) for (int i = 0; i < 10; ++i) {}
+  __attribute__((example("abc", 3, 4.0))) for (int i = 0; i < 9; ++i) {}
 }
+template <int N> void template_fn() __attribute__((example("template", N))) {
+  __attribute__((example("def", N + 1))) for (int i = 0; i < 9; ++i) {}
+}
+void fn3() { template_fn<5>(); }
 // CHECK: -AttributedStmt 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}}
 // CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} "example"
 // CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
@@ -34,6 +38,14 @@ void fn2() __attribute__((example("somestring", 1, 2.0))) {
 // CHECK: -StringLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'const char[{{[0-9]+}}]' lvalue "somestring"
 // CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' 1
 // CHECK: -FloatingLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'double' 2.000000e+00
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} Implicit "example"
+// CHECK: -StringLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'const char[{{[0-9]+}}]' lvalue "def"
+// CHECK: -BinaryOperator 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' '+'
+// CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} 'int' 5
+// CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' 1
+// CHECK: -AnnotateAttr 0x{{[0-9a-z]+}} {{<line:[0-9]+:[0-9]+(, col:[0-9]+)?>}} "example"
+// CHECK: -StringLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'const char[{{[0-9]+}}]' lvalue "template"
+// CHECK: -IntegerLiteral 0x{{[0-9a-z]+}} {{<col:[0-9]+(, col:[0-9]+)?>}} 'int' 5
 
 //--- bad_attr.cpp
 int var1 __attribute__((example("otherstring"))) = 1; // expected-warning {{'example' attribute only applies to functions}}

From 8fba34799d2936bef7bfe925ff825b499932b7a9 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 10 Oct 2024 11:07:14 -0500
Subject: [PATCH 04/20] [libc] Fix missing namespace declarations

---
 libc/src/stdio/asprintf.h  | 4 ++--
 libc/src/stdio/vasprintf.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libc/src/stdio/asprintf.h b/libc/src/stdio/asprintf.h
index 222dfdee9d4fd7..168721c4f98b98 100644
--- a/libc/src/stdio/asprintf.h
+++ b/libc/src/stdio/asprintf.h
@@ -11,10 +11,10 @@
 
 #include "src/__support/macros/config.h"
 
-namespace LIBC_NAMESPACE {
+namespace LIBC_NAMESPACE_DECL {
 
 int asprintf(char **__restrict s, const char *__restrict format, ...);
 
-} // namespace LIBC_NAMESPACE
+} // namespace LIBC_NAMESPACE_DECL
 
 #endif // LLVM_LIBC_SRC_STDIO_ASPRINTF_H
diff --git a/libc/src/stdio/vasprintf.h b/libc/src/stdio/vasprintf.h
index 8b286fe69bf203..b914c2f9ae0789 100644
--- a/libc/src/stdio/vasprintf.h
+++ b/libc/src/stdio/vasprintf.h
@@ -11,11 +11,11 @@
 
 #include <stdarg.h>
 
-namespace LIBC_NAMESPACE {
+namespace LIBC_NAMESPACE_DECL {
 
 int vasprintf(char **__restrict s, const char *__restrict format,
               va_list vlist);
 
-} // namespace LIBC_NAMESPACE
+} // namespace LIBC_NAMESPACE_DECL
 
 #endif // LLVM_LIBC_SRC_STDIO_VASPRINTF_H

From 3d2cac4ba368a2968d448c64f3d1e4adfe2bfef1 Mon Sep 17 00:00:00 2001
From: Eric Astor <epastor at google.com>
Date: Thu, 10 Oct 2024 12:21:34 -0400
Subject: [PATCH 05/20] [clang][frontend] Support applying the annotate
 attribute to statements (#111841)

By allowing AnnotateAttr to be applied to statements, users can place arbitrary information in the AST for later use.

For example, this can be used for HW-targeted language extensions that involve specialized loop annotations.
---
 clang/include/clang/AST/Attr.h                | 17 +++++++++
 clang/include/clang/Basic/Attr.td             |  7 +++-
 clang/include/clang/Sema/Sema.h               |  7 ++--
 clang/lib/Sema/Sema.cpp                       | 28 ++++++++++++++
 clang/lib/Sema/SemaDeclAttr.cpp               | 25 ++----------
 clang/lib/Sema/SemaStmtAttr.cpp               |  2 +
 clang/lib/Sema/SemaTemplateInstantiate.cpp    |  5 +--
 .../lib/Sema/SemaTemplateInstantiateDecl.cpp  |  5 ++-
 clang/test/AST/attr-print-emit.cpp            |  3 ++
 clang/test/Sema/annotate.c                    |  3 ++
 clang/test/SemaTemplate/attributes.cpp        | 38 +++++++++++++++++++
 clang/utils/TableGen/ClangAttrEmitter.cpp     | 30 ++++++++-------
 12 files changed, 126 insertions(+), 44 deletions(-)
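
Illustrative note: the tests in this patch apply the attribute to expression statements such as `[[clang::annotate("bar", 1)]] x2 += 1;`. A minimal sketch of user code in that form follows. EXAMPLE_HAS_STMT_ANNOTATE, STMT_ANNOTATE, sumAnnotated and the "hw.accumulate" string are hypothetical names chosen for this example. The macro is meant to be defined by the reader only when building with a Clang that contains this change, so the same source still compiles elsewhere.

```cpp
#include <cassert>

// EXAMPLE_HAS_STMT_ANNOTATE is a hypothetical opt-in macro (not defined by
// Clang). Define it only when the compiler accepts annotate on statements.
#ifdef EXAMPLE_HAS_STMT_ANNOTATE
#define STMT_ANNOTATE(...) [[clang::annotate(__VA_ARGS__)]]
#else
#define STMT_ANNOTATE(...)
#endif

// Sums an array; the accumulation statement carries an annotation string and
// one extra argument that a later AST consumer could look for.
int sumAnnotated(const int *Data, int N) {
  int Total = 0;
  for (int I = 0; I < N; ++I) {
    STMT_ANNOTATE("hw.accumulate", 4) Total += Data[I];
  }
  return Total;
}
```

The annotation does not change what the function computes. With the macro enabled, the statement should appear in the AST dump wrapped in an AttributedStmt carrying an AnnotateAttr, as in the CHECK lines of the tests.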

diff --git a/clang/include/clang/AST/Attr.h b/clang/include/clang/AST/Attr.h
index ac44e9fdd7c4e9..725498e132fc28 100644
--- a/clang/include/clang/AST/Attr.h
+++ b/clang/include/clang/AST/Attr.h
@@ -197,6 +197,23 @@ class InheritableParamAttr : public InheritableAttr {
   }
 };
 
+class InheritableParamOrStmtAttr : public InheritableParamAttr {
+protected:
+  InheritableParamOrStmtAttr(ASTContext &Context,
+                             const AttributeCommonInfo &CommonInfo,
+                             attr::Kind AK, bool IsLateParsed,
+                             bool InheritEvenIfAlreadyPresent)
+      : InheritableParamAttr(Context, CommonInfo, AK, IsLateParsed,
+                             InheritEvenIfAlreadyPresent) {}
+
+public:
+  // Implement isa/cast/dyncast/etc.
+  static bool classof(const Attr *A) {
+    return A->getKind() >= attr::FirstInheritableParamOrStmtAttr &&
+           A->getKind() <= attr::LastInheritableParamOrStmtAttr;
+  }
+};
+
 class HLSLAnnotationAttr : public InheritableAttr {
 protected:
   HLSLAnnotationAttr(ASTContext &Context, const AttributeCommonInfo &CommonInfo,
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index fbcbf0ed416416..ec3d6e0079f630 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -759,6 +759,11 @@ class TargetSpecificAttr<TargetSpec target> {
 /// redeclarations, even when it's written on a parameter.
 class InheritableParamAttr : InheritableAttr;
 
+/// An attribute that is either a declaration attribute or a statement attribute,
+/// and if used as a declaration attribute, is inherited by later
+/// redeclarations, even when it's written on a parameter.
+class InheritableParamOrStmtAttr : InheritableParamAttr;
+
 /// An attribute which changes the ABI rules for a specific parameter.
 class ParameterABIAttr : InheritableParamAttr {
   let Subjects = SubjectList<[ParmVar]>;
@@ -928,7 +933,7 @@ def AnalyzerNoReturn : InheritableAttr {
   let Documentation = [Undocumented];
 }
 
-def Annotate : InheritableParamAttr {
+def Annotate : InheritableParamOrStmtAttr {
   let Spellings = [Clang<"annotate">];
   let Args = [StringArgument<"Annotation">, VariadicExprArgument<"Args">];
   // Ensure that the annotate attribute can be used with
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index ef010fafb1573e..f8118ca64ad3f2 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4528,9 +4528,10 @@ class Sema final : public SemaBase {
   /// declaration.
   void AddAlignValueAttr(Decl *D, const AttributeCommonInfo &CI, Expr *E);
 
-  /// AddAnnotationAttr - Adds an annotation Annot with Args arguments to D.
-  void AddAnnotationAttr(Decl *D, const AttributeCommonInfo &CI,
-                         StringRef Annot, MutableArrayRef<Expr *> Args);
+  /// CreateAnnotationAttr - Creates an annotation Annot with Args arguments.
+  Attr *CreateAnnotationAttr(const AttributeCommonInfo &CI, StringRef Annot,
+                             MutableArrayRef<Expr *> Args);
+  Attr *CreateAnnotationAttr(const ParsedAttr &AL);
 
   bool checkMSInheritanceAttrOnDefinition(CXXRecordDecl *RD, SourceRange Range,
                                           bool BestCase,
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index f05760428458b1..9f91ee9a39f2f9 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -2777,3 +2777,31 @@ bool Sema::isDeclaratorFunctionLike(Declarator &D) {
   });
   return Result;
 }
+
+Attr *Sema::CreateAnnotationAttr(const AttributeCommonInfo &CI, StringRef Annot,
+                                 MutableArrayRef<Expr *> Args) {
+
+  auto *A = AnnotateAttr::Create(Context, Annot, Args.data(), Args.size(), CI);
+  if (!ConstantFoldAttrArgs(
+          CI, MutableArrayRef<Expr *>(A->args_begin(), A->args_end()))) {
+    return nullptr;
+  }
+  return A;
+}
+
+Attr *Sema::CreateAnnotationAttr(const ParsedAttr &AL) {
+  // Make sure that there is a string literal as the annotation's first
+  // argument.
+  StringRef Str;
+  if (!checkStringLiteralArgumentAttr(AL, 0, Str))
+    return nullptr;
+
+  llvm::SmallVector<Expr *, 4> Args;
+  Args.reserve(AL.getNumArgs() - 1);
+  for (unsigned Idx = 1; Idx < AL.getNumArgs(); Idx++) {
+    assert(!AL.isArgIdent(Idx));
+    Args.push_back(AL.getArgAsExpr(Idx));
+  }
+
+  return CreateAnnotationAttr(AL, Str, Args);
+}
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index e2174ba926f17f..6759aae37afac1 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3958,30 +3958,11 @@ static void handleTransparentUnionAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   RD->addAttr(::new (S.Context) TransparentUnionAttr(S.Context, AL));
 }
 
-void Sema::AddAnnotationAttr(Decl *D, const AttributeCommonInfo &CI,
-                             StringRef Str, MutableArrayRef<Expr *> Args) {
-  auto *Attr = AnnotateAttr::Create(Context, Str, Args.data(), Args.size(), CI);
-  if (ConstantFoldAttrArgs(
-          CI, MutableArrayRef<Expr *>(Attr->args_begin(), Attr->args_end()))) {
-    D->addAttr(Attr);
-  }
-}
-
 static void handleAnnotateAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
-  // Make sure that there is a string literal as the annotation's first
-  // argument.
-  StringRef Str;
-  if (!S.checkStringLiteralArgumentAttr(AL, 0, Str))
-    return;
-
-  llvm::SmallVector<Expr *, 4> Args;
-  Args.reserve(AL.getNumArgs() - 1);
-  for (unsigned Idx = 1; Idx < AL.getNumArgs(); Idx++) {
-    assert(!AL.isArgIdent(Idx));
-    Args.push_back(AL.getArgAsExpr(Idx));
+  auto *Attr = S.CreateAnnotationAttr(AL);
+  if (Attr) {
+    D->addAttr(Attr);
   }
-
-  S.AddAnnotationAttr(D, AL, Str, Args);
 }
 
 static void handleAlignValueAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
diff --git a/clang/lib/Sema/SemaStmtAttr.cpp b/clang/lib/Sema/SemaStmtAttr.cpp
index 3ebd1148e8bbfb..f801455596fe6f 100644
--- a/clang/lib/Sema/SemaStmtAttr.cpp
+++ b/clang/lib/Sema/SemaStmtAttr.cpp
@@ -679,6 +679,8 @@ static Attr *ProcessStmtAttribute(Sema &S, Stmt *St, const ParsedAttr &A,
     return handleMSConstexprAttr(S, St, A, Range);
   case ParsedAttr::AT_NoConvergent:
     return handleNoConvergentAttr(S, St, A, Range);
+  case ParsedAttr::AT_Annotate:
+    return S.CreateAnnotationAttr(A);
   default:
     if (Attr *AT = nullptr; A.getInfo().handleStmtAttribute(S, St, A, AT) !=
                             ParsedAttrInfo::NotHandled) {
diff --git a/clang/lib/Sema/SemaTemplateInstantiate.cpp b/clang/lib/Sema/SemaTemplateInstantiate.cpp
index 02f12f5c2d4220..2f60c0beb22e73 100644
--- a/clang/lib/Sema/SemaTemplateInstantiate.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiate.cpp
@@ -2188,9 +2188,8 @@ TemplateInstantiator::TransformAnnotateAttr(const AnnotateAttr *AA) {
   SmallVector<Expr *> Args;
   for (Expr *Arg : AA->args()) {
     ExprResult Res = getDerived().TransformExpr(Arg);
-    if (!Res.isUsable())
-      return AA;
-    Args.push_back(Res.get());
+    if (Res.isUsable())
+      Args.push_back(Res.get());
   }
   return AnnotateAttr::CreateImplicit(getSema().Context, AA->getAnnotation(),
                                       Args.data(), Args.size(), AA->getRange());
diff --git a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
index 34558e1a005d5a..6b1af35f5c80a8 100644
--- a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
@@ -230,7 +230,10 @@ static void instantiateDependentAnnotationAttr(
     ActualArgs.insert(ActualArgs.begin(), Args.begin() + 1, Args.end());
     std::swap(Args, ActualArgs);
   }
-  S.AddAnnotationAttr(New, *Attr, Str, Args);
+  auto *AA = S.CreateAnnotationAttr(*Attr, Str, Args);
+  if (AA) {
+    New->addAttr(AA);
+  }
 }
 
 static Expr *instantiateDependentFunctionAttrCondition(
diff --git a/clang/test/AST/attr-print-emit.cpp b/clang/test/AST/attr-print-emit.cpp
index d8e62ed5f6cd11..a9bca6778d0f1a 100644
--- a/clang/test/AST/attr-print-emit.cpp
+++ b/clang/test/AST/attr-print-emit.cpp
@@ -78,6 +78,9 @@ class C {
 ANNOTATE_ATTR int annotated_attr ANNOTATE_ATTR = 0;
 // CHECK: __attribute__((annotate("Annotated"))) int annotated_attr __attribute__((annotate("Annotated"))) = 0;
 
+void increment() { [[clang::annotate("Annotated")]] annotated_attr++; }
+// CHECK: {{\[\[}}clang::annotate("Annotated")]] annotated_attr++;
+
 // FIXME: We do not print the attribute as written after the type specifier.
 int ANNOTATE_ATTR annotated_attr_fixme = 0;
 // CHECK: __attribute__((annotate("Annotated"))) int annotated_attr_fixme = 0;
diff --git a/clang/test/Sema/annotate.c b/clang/test/Sema/annotate.c
index b4551a102e6174..f2ef08d6378975 100644
--- a/clang/test/Sema/annotate.c
+++ b/clang/test/Sema/annotate.c
@@ -3,10 +3,12 @@
 void __attribute__((annotate("foo"))) foo(float *a) {
   __attribute__((annotate("bar"))) int x;
   [[clang::annotate("bar")]] int x2;
+  [[clang::annotate("bar")]] x2 += 1;
   __attribute__((annotate(1))) int y; // expected-error {{expected string literal as argument of 'annotate' attribute}}
   [[clang::annotate(1)]] int y2; // expected-error {{expected string literal as argument of 'annotate' attribute}}
   __attribute__((annotate("bar", 1))) int z;
   [[clang::annotate("bar", 1)]] int z2;
+  [[clang::annotate("bar", 1)]] z2 += 1;
 
   int u = __builtin_annotation(z, (char*) 0); // expected-error {{second argument to __builtin_annotation must be a non-wide string constant}}
   int v = __builtin_annotation(z, (char*) L"bar"); // expected-error {{second argument to __builtin_annotation must be a non-wide string constant}}
@@ -15,4 +17,5 @@ void __attribute__((annotate("foo"))) foo(float *a) {
 
   __attribute__((annotate())) int c; // expected-error {{'annotate' attribute takes at least 1 argument}}
   [[clang::annotate()]] int c2;      // expected-error {{'annotate' attribute takes at least 1 argument}}
+  [[clang::annotate()]] c2 += 1;     // expected-error {{'annotate' attribute takes at least 1 argument}}
 }
diff --git a/clang/test/SemaTemplate/attributes.cpp b/clang/test/SemaTemplate/attributes.cpp
index f6c9f13f0842d2..dea19d09745ca2 100644
--- a/clang/test/SemaTemplate/attributes.cpp
+++ b/clang/test/SemaTemplate/attributes.cpp
@@ -65,6 +65,17 @@ namespace attribute_annotate {
 template<typename T> [[clang::annotate("ANNOTATE_FOO"), clang::annotate("ANNOTATE_BAR")]] void HasAnnotations();
 void UseAnnotations() { HasAnnotations<int>(); }
 
+// CHECK: FunctionTemplateDecl {{.*}} HasStmtAnnotations
+// CHECK:   AnnotateAttr {{.*}} "ANNOTATE_BAZ"
+// CHECK: FunctionDecl {{.*}} HasStmtAnnotations
+// CHECK:   TemplateArgument type 'int'
+// CHECK:   AnnotateAttr {{.*}} "ANNOTATE_BAZ"
+template<typename T> void HasStmtAnnotations() {
+  int x = 0;
+  [[clang::annotate("ANNOTATE_BAZ")]] x++;
+}
+void UseStmtAnnotations() { HasStmtAnnotations<int>(); }
+
 // CHECK:      FunctionTemplateDecl {{.*}} HasPackAnnotations
 // CHECK-NEXT:   NonTypeTemplateParmDecl {{.*}} referenced 'int' depth 0 index 0 ... Is
 // CHECK-NEXT:   FunctionDecl {{.*}} HasPackAnnotations 'void ()'
@@ -95,6 +106,33 @@ void UseAnnotations() { HasAnnotations<int>(); }
 template <int... Is> [[clang::annotate("ANNOTATE_BAZ", Is...)]] void HasPackAnnotations();
 void UsePackAnnotations() { HasPackAnnotations<1, 2, 3>(); }
 
+// CHECK:      FunctionTemplateDecl {{.*}} HasStmtPackAnnotations
+// CHECK-NEXT:   NonTypeTemplateParmDecl {{.*}} referenced 'int' depth 0 index 0 ... Is
+// CHECK-NEXT:   FunctionDecl {{.*}} HasStmtPackAnnotations 'void ()'
+// CHECK:          AttributedStmt {{.*}}
+// CHECK-NEXT:       AnnotateAttr {{.*}} "ANNOTATE_QUUX"
+// CHECK-NEXT:         PackExpansionExpr {{.*}} '<dependent type>'
+// CHECK-NEXT:           DeclRefExpr {{.*}} 'int' NonTypeTemplateParm {{.*}} 'Is' 'int'
+// CHECK:        FunctionDecl {{.*}} used HasStmtPackAnnotations 'void ()'
+// CHECK-NEXT:     TemplateArgument{{.*}} pack
+// CHECK-NEXT:       TemplateArgument{{.*}} integral '1'
+// CHECK-NEXT:       TemplateArgument{{.*}} integral '2'
+// CHECK-NEXT:       TemplateArgument{{.*}} integral '3'
+// CHECK:          AttributedStmt {{.*}}
+// CHECK-NEXT:       AnnotateAttr {{.*}} "ANNOTATE_QUUX"
+// CHECK-NEXT:         PackExpansionExpr {{.*}}
+// CHECK-NEXT:         SubstNonTypeTemplateParmPackExpr {{.*}}
+// CHECK-NEXT:         NonTypeTemplateParmDecl {{.*}} referenced 'int' depth 0 index 0 ... Is
+// CHECK-NEXT:           TemplateArgument pack '<1, 2, 3>'
+// CHECK-NEXT:             TemplateArgument integral '1'
+// CHECK-NEXT:             TemplateArgument integral '2'
+// CHECK-NEXT:             TemplateArgument integral '3'
+template <int... Is> void HasStmtPackAnnotations() {
+  int x = 0;
+  [[clang::annotate("ANNOTATE_QUUX", Is...)]] x++;
+}
+void UseStmtPackAnnotations() { HasStmtPackAnnotations<1, 2, 3>(); }
+
 template <int... Is> [[clang::annotate(Is...)]] void HasOnlyPackAnnotation() {} // expected-error {{expected string literal as argument of 'annotate' attribute}}
 
 void UseOnlyPackAnnotations() {
diff --git a/clang/utils/TableGen/ClangAttrEmitter.cpp b/clang/utils/TableGen/ClangAttrEmitter.cpp
index 28b7ec8f822cf8..4890d249c6d8f7 100644
--- a/clang/utils/TableGen/ClangAttrEmitter.cpp
+++ b/clang/utils/TableGen/ClangAttrEmitter.cpp
@@ -3282,16 +3282,16 @@ namespace {
 } // end anonymous namespace
 
 static const AttrClassDescriptor AttrClassDescriptors[] = {
-  { "ATTR", "Attr" },
-  { "TYPE_ATTR", "TypeAttr" },
-  { "STMT_ATTR", "StmtAttr" },
-  { "DECL_OR_STMT_ATTR", "DeclOrStmtAttr" },
-  { "INHERITABLE_ATTR", "InheritableAttr" },
-  { "DECL_OR_TYPE_ATTR", "DeclOrTypeAttr" },
-  { "INHERITABLE_PARAM_ATTR", "InheritableParamAttr" },
-  { "PARAMETER_ABI_ATTR", "ParameterABIAttr" },
-  { "HLSL_ANNOTATION_ATTR", "HLSLAnnotationAttr"}
-};
+    {"ATTR", "Attr"},
+    {"TYPE_ATTR", "TypeAttr"},
+    {"STMT_ATTR", "StmtAttr"},
+    {"DECL_OR_STMT_ATTR", "DeclOrStmtAttr"},
+    {"INHERITABLE_ATTR", "InheritableAttr"},
+    {"DECL_OR_TYPE_ATTR", "DeclOrTypeAttr"},
+    {"INHERITABLE_PARAM_ATTR", "InheritableParamAttr"},
+    {"INHERITABLE_PARAM_OR_STMT_ATTR", "InheritableParamOrStmtAttr"},
+    {"PARAMETER_ABI_ATTR", "ParameterABIAttr"},
+    {"HLSL_ANNOTATION_ATTR", "HLSLAnnotationAttr"}};
 
 static void emitDefaultDefine(raw_ostream &OS, StringRef name,
                               const char *superName) {
@@ -4319,10 +4319,12 @@ static void GenerateMutualExclusionsChecks(const Record &Attr,
 
   // This means the attribute is either a statement attribute, a decl
   // attribute, or both; find out which.
-  bool CurAttrIsStmtAttr =
-      Attr.isSubClassOf("StmtAttr") || Attr.isSubClassOf("DeclOrStmtAttr");
-  bool CurAttrIsDeclAttr =
-      !CurAttrIsStmtAttr || Attr.isSubClassOf("DeclOrStmtAttr");
+  bool CurAttrIsStmtAttr = Attr.isSubClassOf("StmtAttr") ||
+                           Attr.isSubClassOf("DeclOrStmtAttr") ||
+                           Attr.isSubClassOf("InheritableParamOrStmtAttr");
+  bool CurAttrIsDeclAttr = !CurAttrIsStmtAttr ||
+                           Attr.isSubClassOf("DeclOrStmtAttr") ||
+                           Attr.isSubClassOf("InheritableParamOrStmtAttr");
 
   std::vector<std::string> DeclAttrs, StmtAttrs;
 

>From ec7e46a51c9dcd7a7dc8e4fd5e9b9aa15976391f Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 10 Oct 2024 11:21:58 -0500
Subject: [PATCH 06/20] [libc] Add missing config include

---
 libc/src/stdio/vasprintf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libc/src/stdio/vasprintf.h b/libc/src/stdio/vasprintf.h
index b914c2f9ae0789..7a98568edbc071 100644
--- a/libc/src/stdio/vasprintf.h
+++ b/libc/src/stdio/vasprintf.h
@@ -9,6 +9,7 @@
 #ifndef LLVM_LIBC_SRC_STDIO_VASPRINTF_H
 #define LLVM_LIBC_SRC_STDIO_VASPRINTF_H
 
+#include "src/__support/macros/config.h"
 #include <stdarg.h>
 
 namespace LIBC_NAMESPACE_DECL {

>From 8297051c9403392bff4174a1478366463dd2b989 Mon Sep 17 00:00:00 2001
From: TatWai Chong <78814694+tatwaichong at users.noreply.github.com>
Date: Thu, 10 Oct 2024 09:54:34 -0700
Subject: [PATCH 07/20] [mlir][tosa] Change the type of profile option to
 ListOption (#111214)

In the TOSA validation pass, change the type of the profile option to
ListOption. TOSA profiles are now composable rather than hierarchical.
Each profile is an independent set, i.e. a target can implement multiple
profiles.

Set the profile option to none by default, and limit validation to the
requested profiles when any are given.
The profiles can be specified on the command line, e.g.
$ mlir-opt ... --tosa-validate="profile=bi,mi"
which tells the validation pass that BI and MI are enabled.
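
The composable-profile check described above can be sketched outside of
MLIR roughly as follows. This is a minimal standalone model, not the
pass itself: Profile, symbolizeProfile, collectProfiles, isEnabled and
floatTypesAllowed are hypothetical stand-ins for the generated
TosaProfileEnum, symbolizeTosaProfileEnum and the members added in this
patch.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the TableGen-generated TosaProfileEnum.
enum class Profile { BaseInference, MainInference, MainTraining, Undefined };

// Hypothetical stand-in for the generated symbolizeTosaProfileEnum():
// maps the command-line spelling to an enum value, if it is known.
std::optional<Profile> symbolizeProfile(const std::string &name) {
  if (name == "bi") return Profile::BaseInference;
  if (name == "mi") return Profile::MainInference;
  if (name == "mt") return Profile::MainTraining;
  if (name == "none") return Profile::Undefined;
  return std::nullopt;
}

// Collect every recognised profile name. Unknown names are skipped
// silently, which matches the loop added to TosaValidation.cpp.
std::vector<Profile> collectProfiles(const std::vector<std::string> &names) {
  std::vector<Profile> enabled;
  for (const std::string &name : names)
    if (auto symbol = symbolizeProfile(name))
      enabled.push_back(*symbol);
  return enabled;
}

// Membership test over the enabled set; profiles are independent, so
// enabling one never implies another.
bool isEnabled(const std::vector<Profile> &enabled, Profile p) {
  return std::find(enabled.begin(), enabled.end(), p) != enabled.end();
}

// Models the float branch of isValidElementType: floating-point element
// types are rejected unless the Main Inference profile is enabled.
bool floatTypesAllowed(const std::vector<Profile> &enabled) {
  return isEnabled(enabled, Profile::MainInference);
}
```

With this model, "profile=bi,mi" allows float element types, while
"profile=bi" or the default "none" does not.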

Change-Id: I1fb8d0c1b27eccd768349b6eb4234093313efb57
---
 .../mlir/Conversion/TosaToLinalg/TosaToLinalg.h |  4 ++--
 .../mlir/Dialect/Tosa/Transforms/Passes.td      | 17 +++--------------
 .../TosaToLinalg/TosaToLinalgPass.cpp           |  2 +-
 .../Dialect/Tosa/Transforms/TosaValidation.cpp  | 16 +++++++++++++++-
 mlir/test/Dialect/Tosa/invalid.mlir             |  8 +++++++-
 mlir/test/Dialect/Tosa/level_check.mlir         |  6 +++++-
 6 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/mlir/include/mlir/Conversion/TosaToLinalg/TosaToLinalg.h b/mlir/include/mlir/Conversion/TosaToLinalg/TosaToLinalg.h
index 192583f347b8a4..1822016fc88fe6 100644
--- a/mlir/include/mlir/Conversion/TosaToLinalg/TosaToLinalg.h
+++ b/mlir/include/mlir/Conversion/TosaToLinalg/TosaToLinalg.h
@@ -39,8 +39,8 @@ void addTosaToLinalgPasses(
         TosaToLinalgNamedOptions(),
     // Note: Default to 'none' level unless otherwise specified.
     std::optional<tosa::TosaValidationOptions> validationOptions =
-        tosa::TosaValidationOptions{tosa::TosaProfileEnum::Undefined, false,
-                                    tosa::TosaLevelEnum::None});
+        tosa::TosaValidationOptions{
+            {"none"}, false, tosa::TosaLevelEnum::None});
 
 /// Populates TOSA to linalg pipelines
 /// Currently, this includes only the "tosa-to-linalg-pipeline".
diff --git a/mlir/include/mlir/Dialect/Tosa/Transforms/Passes.td b/mlir/include/mlir/Dialect/Tosa/Transforms/Passes.td
index c0352fa88fe08d..dac67633769c76 100644
--- a/mlir/include/mlir/Dialect/Tosa/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/Tosa/Transforms/Passes.td
@@ -76,7 +76,7 @@ def TosaProfileType : I32EnumAttr<"TosaProfileEnum", "Tosa profile",
       I32EnumAttrCase<"BaseInference", 0, "bi">,
       I32EnumAttrCase<"MainInference", 1, "mi">,
       I32EnumAttrCase<"MainTraining", 2, "mt">,
-      I32EnumAttrCase<"Undefined", 3>
+      I32EnumAttrCase<"Undefined", 3, "none">
     ]>{
   let cppNamespace = "mlir::tosa";
 }
@@ -97,19 +97,8 @@ def TosaValidation : Pass<"tosa-validate", "mlir::ModuleOp"> {
   }];
 
   let options = [
-      Option<"profile", "profile", "mlir::tosa::TosaProfileEnum",
-             /*default=*/"mlir::tosa::TosaProfileEnum::Undefined",
-             "Validate if operations match for the given profile",
-             [{::llvm::cl::values(
-               clEnumValN(mlir::tosa::TosaProfileEnum::BaseInference, "bi",
-                "Use Base Inference profile."),
-               clEnumValN(mlir::tosa::TosaProfileEnum::MainInference, "mi",
-                "Use Main Inference profile."),
-               clEnumValN(mlir::tosa::TosaProfileEnum::MainTraining, "mt",
-                "Use Main Training profile."),
-               clEnumValN(mlir::tosa::TosaProfileEnum::Undefined, "undefined",
-                "Do not define a profile.")
-              )}]>,
+      ListOption<"profile", "profile", "std::string",
+             "Validate if operations match for the given profile set">,
       Option<"StrictOperationSpecAlignment", "strict-op-spec-alignment", "bool",
              /*default=*/"false",
              "Verify if the properties of certain operations align the spec requirement">,
diff --git a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp
index 44036d7c31a912..06a7262c467421 100644
--- a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp
+++ b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp
@@ -115,7 +115,7 @@ void mlir::tosa::registerTosaToLinalgPipelines() {
         TosaToLinalgOptions tosaToLinalgOptions;
         TosaToLinalgNamedOptions tosaToLinalgNamedOptions;
         TosaValidationOptions validationOptions;
-        validationOptions.profile = tosa::TosaProfileEnum::BaseInference;
+        validationOptions.profile = {"none"};
         validationOptions.StrictOperationSpecAlignment = true;
         validationOptions.level = tosa::TosaLevelEnum::EightK;
         tosa::addTosaToLinalgPasses(pm, tosaToLinalgOptions,
diff --git a/mlir/lib/Dialect/Tosa/Transforms/TosaValidation.cpp b/mlir/lib/Dialect/Tosa/Transforms/TosaValidation.cpp
index b78c372af77e64..e390a613b58077 100644
--- a/mlir/lib/Dialect/Tosa/Transforms/TosaValidation.cpp
+++ b/mlir/lib/Dialect/Tosa/Transforms/TosaValidation.cpp
@@ -405,14 +405,28 @@ struct TosaValidation : public tosa::impl::TosaValidationBase<TosaValidation> {
     if (level == TosaLevelEnum::EightK) {
       tosaLevel = TOSA_LEVEL_EIGHTK;
     }
+
+    if (!profile.empty()) {
+      for (std::string &prof : profile) {
+        auto profSymbol = symbolizeTosaProfileEnum(prof);
+        if (profSymbol) {
+          enabled_profiles.push_back(profSymbol.value());
+        }
+      }
+    }
   }
 
   bool CheckVariable(Operation *op);
   bool CheckVariableReadOrWrite(Operation *op);
 
   bool isValidElementType(Type type);
+  bool isEnabledProfile(TosaProfileEnum prof) {
+    return std::find(enabled_profiles.begin(), enabled_profiles.end(), prof) !=
+           std::end(enabled_profiles);
+  }
 
   SmallVector<std::function<LogicalResult(Operation *)>> constCheckers;
+  SmallVector<TosaProfileEnum, 3> enabled_profiles;
   TosaLevel tosaLevel;
   DenseMap<StringAttr, mlir::Type> variablesMap;
 };
@@ -507,7 +521,7 @@ LogicalResult TosaValidation::applyVariableCheck(Operation *op) {
 
 bool TosaValidation::isValidElementType(Type type) {
   if (isa<FloatType>(type)) {
-    if (profile == TosaProfileEnum::BaseInference)
+    if (!isEnabledProfile(TosaProfileEnum::MainInference))
       return false;
     return type.isF32() || type.isF16() || type.isBF16();
   }
diff --git a/mlir/test/Dialect/Tosa/invalid.mlir b/mlir/test/Dialect/Tosa/invalid.mlir
index e5c5b9b3663903..b9298b66643538 100644
--- a/mlir/test/Dialect/Tosa/invalid.mlir
+++ b/mlir/test/Dialect/Tosa/invalid.mlir
@@ -1,4 +1,10 @@
-// RUN: mlir-opt %s -split-input-file -verify-diagnostics --tosa-validate=strict-op-spec-alignment
+//--------------------------------------------------------------------------------------------------
+// Test expected errors in terms of the shape and type of tensor, and the argument type of
+// operation. Excludes the profile compilance checking since it is performed earlier in the
+// validation flow.
+//--------------------------------------------------------------------------------------------------
+
+// RUN: mlir-opt %s -split-input-file -verify-diagnostics --tosa-validate="profile=bi,mi,mt strict-op-spec-alignment"
 
 
 func.func @test_const() -> tensor<1xf32> {
diff --git a/mlir/test/Dialect/Tosa/level_check.mlir b/mlir/test/Dialect/Tosa/level_check.mlir
index 9b652f2d0bd142..e851019362958f 100644
--- a/mlir/test/Dialect/Tosa/level_check.mlir
+++ b/mlir/test/Dialect/Tosa/level_check.mlir
@@ -1,4 +1,8 @@
-// RUN: mlir-opt %s -split-input-file -verify-diagnostics --tosa-validate
+//--------------------------------------------------------------------------------------------------
+// Enable all supported profiles to focus the verification of expected level errors.
+//--------------------------------------------------------------------------------------------------
+
+// RUN: mlir-opt %s -split-input-file -verify-diagnostics --tosa-validate="profile=bi,mi,mt"
 
 
 func.func @test_argmax(%arg0: tensor<1x1x1x1x29x29x4xf32>) -> tensor<1x1x1x1x29x4xi32> {

>From 865cae51f8e8d5d377af98bd246b9bca9c81cb70 Mon Sep 17 00:00:00 2001
From: Jonas Devlieghere <jonas at devlieghere.com>
Date: Thu, 10 Oct 2024 09:53:45 -0700
Subject: [PATCH 08/20] [lldb] Fix a variety of LLDB_LOG format strings

LLVM now triggers an assertion when the format string and arguments
don't match. Fix a variety of incorrect format strings I discovered when
enabling logging with a debug build.
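
The kind of mismatch fixed here can be illustrated with a small
standalone check. This is only a sketch: requiredArgCount and
formatMatchesArgs are hypothetical helpers, not LLVM or LLDB APIs, and
the scan ignores escaped braces. It assumes that a format string matches
its arguments when the largest "{N" index plus one equals the number of
arguments passed.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: number of arguments a formatv-style string needs,
// i.e. one more than the largest "{N" index it mentions (0 if none).
std::size_t requiredArgCount(const std::string &fmt) {
  std::size_t required = 0;
  for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
    if (fmt[i] != '{' ||
        !std::isdigit(static_cast<unsigned char>(fmt[i + 1])))
      continue;
    // Parse the decimal index that follows the opening brace.
    std::size_t index = 0;
    std::size_t j = i + 1;
    while (j < fmt.size() &&
           std::isdigit(static_cast<unsigned char>(fmt[j])))
      index = index * 10 + static_cast<std::size_t>(fmt[j++] - '0');
    required = std::max(required, index + 1);
    i = j - 1; // resume scanning after the digits
  }
  return required;
}

// Hypothetical helper: true when the string references exactly the
// arguments that are supplied.
bool formatMatchesArgs(const std::string &fmt, std::size_t numArgs) {
  return requiredArgCount(fmt) == numArgs;
}
```

Under that assumption, "Created {1} (isa 0x{2:x})" with two arguments is
rejected because it needs three, and the corrected "Created {0} (isa
0x{1:x})" is accepted.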
---
 .../ExpressionParser/Clang/ClangExpressionDeclMap.cpp        | 4 ++--
 .../ObjC/AppleObjCRuntime/AppleObjCDeclVendor.cpp            | 2 +-
 lldb/source/Target/ScriptedThreadPlan.cpp                    | 5 +++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
index f994d025043352..5edaa9e4e053cc 100644
--- a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
+++ b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
@@ -934,7 +934,7 @@ void ClangExpressionDeclMap::LookUpLldbObjCClass(NameSearchContext &context) {
         QualType(interface_type, 0).getAsOpaquePtr(),
         function_decl_ctx.GetTypeSystem()->weak_from_this());
 
-    LLDB_LOG(log, "  FEVD[{0}] Adding type for $__lldb_objc_class: {1}",
+    LLDB_LOG(log, "  FEVD Adding type for $__lldb_objc_class: {0}",
              ClangUtil::ToString(interface_type));
 
     AddOneType(context, class_user_type);
@@ -974,7 +974,7 @@ void ClangExpressionDeclMap::LookUpLldbObjCClass(NameSearchContext &context) {
   if (!self_clang_type)
     return;
 
-  LLDB_LOG(log, "  FEVD[{0}] Adding type for $__lldb_objc_class: {1}",
+  LLDB_LOG(log, "  FEVD Adding type for $__lldb_objc_class: {0}",
            ClangUtil::ToString(self_type->GetFullCompilerType()));
 
   TypeFromUser class_user_type(self_clang_type);
diff --git a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCDeclVendor.cpp b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCDeclVendor.cpp
index f3a008ff1e8932..96a259b811b5e7 100644
--- a/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCDeclVendor.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCDeclVendor.cpp
@@ -605,7 +605,7 @@ uint32_t AppleObjCDeclVendor::FindDecls(ConstString name, bool append,
     if (log) {
       clang::QualType new_iface_type = ast_ctx.getObjCInterfaceType(iface_decl);
 
-      LLDB_LOG(log, "AOCTV::FT Created {1} (isa 0x{2:x})",
+      LLDB_LOG(log, "AOCTV::FT Created {0} (isa 0x{1:x})",
                new_iface_type.getAsString(), (uint64_t)isa);
     }
 
diff --git a/lldb/source/Target/ScriptedThreadPlan.cpp b/lldb/source/Target/ScriptedThreadPlan.cpp
index a8432f12258ee4..c4bdc8d080e350 100644
--- a/lldb/source/Target/ScriptedThreadPlan.cpp
+++ b/lldb/source/Target/ScriptedThreadPlan.cpp
@@ -184,8 +184,9 @@ void ScriptedThreadPlan::GetDescription(Stream *s,
       lldb::StreamSP stream = std::make_shared<lldb_private::StreamString>();
       llvm::Error err = m_interface->GetStopDescription(stream);
       if (err) {
-        LLDB_LOG_ERROR(GetLog(LLDBLog::Thread), std::move(err),
-                       "Can't call ScriptedThreadPlan::GetStopDescription.");
+        LLDB_LOG_ERROR(
+            GetLog(LLDBLog::Thread), std::move(err),
+            "Can't call ScriptedThreadPlan::GetStopDescription: {0}");
         s->Printf("Scripted thread plan implemented by class %s.",
                   m_class_name.c_str());
       } else

>From 51a2d67e4b4d8b0575ae449c4cdab9ab4aa0a6ed Mon Sep 17 00:00:00 2001
From: Ryosuke Niwa <rniwa at webkit.org>
Date: Thu, 10 Oct 2024 10:00:42 -0700
Subject: [PATCH 09/20] [alpha.webkit.UncountedCallArgsChecker] Skip
 std::forward in tryToFindPtrOrigin. (#111222)

Ignore std::forward when it appears while looking for the pointer
origin.
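
The idea can be sketched without clang as a loop that looks through
forwarding calls to the argument underneath. Node and skipStdForward
are hypothetical and only model the shape of the check added to
tryToFindPtrOrigin (callee in namespace std, named "forward", continue
with argument 0).

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature expression node; this is not clang's Expr.
struct Node {
  std::string callee; // callee name, empty for a non-call leaf
  bool calleeInStd;   // whether the callee lives in namespace std
  const Node *arg0;   // first call argument, nullptr for a leaf
};

// Look through any chain of std::forward calls and return the first
// expression that is not such a call. Other callees, including
// std::move or a user-defined "forward", are left untouched.
const Node *skipStdForward(const Node *e) {
  while (e && e->calleeInStd && e->callee == "forward" && e->arg0)
    e = e->arg0;
  return e;
}
```

In this model only std::forward is transparent, so the origin search
continues with its argument.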
---
 .../StaticAnalyzer/Checkers/WebKit/ASTUtils.cpp   |  5 +++++
 .../Checkers/WebKit/uncounted-obj-arg.cpp         | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/ASTUtils.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/ASTUtils.cpp
index 394cb26f03cf99..b7b2f8a16f07b3 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/ASTUtils.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/ASTUtils.cpp
@@ -101,6 +101,11 @@ bool tryToFindPtrOrigin(
         if (isSingleton(callee))
           return callback(E, true);
 
+        if (callee->isInStdNamespace() && safeGetName(callee) == "forward") {
+          E = call->getArg(0);
+          continue;
+        }
+
         if (isPtrConversion(callee)) {
           E = call->getArg(0);
           continue;
diff --git a/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp b/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
index 97efb354f0371d..b6ab369f69a87d 100644
--- a/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
+++ b/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
@@ -588,6 +588,8 @@ class UnrelatedClass {
     getFieldTrivial().nonTrivial23();
     // expected-warning@-1{{Call argument for 'this' parameter is uncounted and unsafe}}
   }
+
+  void setField(RefCounted*);
 };
 
 class UnrelatedClass2 {
@@ -598,11 +600,24 @@ class UnrelatedClass2 {
   RefCounted &getFieldTrivialRecursively() { return getFieldTrivial().getFieldTrivial(); }
   RefCounted *getFieldTrivialTernary() { return Field ? Field->getFieldTernary() : nullptr; }
 
+  template<typename T, typename ... AdditionalArgs>
+  void callSetField(T&& item, AdditionalArgs&&... args)
+  {
+    item.setField(std::forward<AdditionalArgs>(args)...);
+  }
+
+  template<typename T, typename ... AdditionalArgs>
+  void callSetField2(T&& item, AdditionalArgs&&... args)
+  {
+    item.setField(std::move<AdditionalArgs>(args)...);
+  }
+
   void test() {
     getFieldTrivialRecursively().trivial1(); // no-warning
     getFieldTrivialTernary()->trivial2(); // no-warning
     getFieldTrivialRecursively().someFunction();
     // expected-warning@-1{{Call argument for 'this' parameter is uncounted and unsafe}}
+    callSetField(getFieldTrivial(), refCountedObj()); // no-warning
   }
 };
 

>From 4839e50676bf57d4670daa51fd48181c2bdfb99b Mon Sep 17 00:00:00 2001
From: Ryosuke Niwa <rniwa at webkit.org>
Date: Thu, 10 Oct 2024 10:01:35 -0700
Subject: [PATCH 10/20] [alpha.webkit.UncountedCallArgsChecker] Add the support
 for trivial CXXInheritedCtorInitExpr. (#111198)

---
 .../Checkers/WebKit/PtrTypesSemantics.cpp     |  4 ++++
 .../Checkers/WebKit/uncounted-obj-arg.cpp     | 21 +++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
index 4d145be808f6d8..317642c5b9ca20 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
@@ -508,6 +508,10 @@ class TrivialFunctionAnalysisVisitor
     return IsFunctionTrivial(CE->getConstructor());
   }
 
+  bool VisitCXXInheritedCtorInitExpr(const CXXInheritedCtorInitExpr *E) {
+    return IsFunctionTrivial(E->getConstructor());
+  }
+
   bool VisitCXXNewExpr(const CXXNewExpr *NE) { return VisitChildren(NE); }
 
   bool VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
diff --git a/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp b/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
index b6ab369f69a87d..1a42de90105a55 100644
--- a/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
+++ b/clang/test/Analysis/Checkers/WebKit/uncounted-obj-arg.cpp
@@ -224,6 +224,20 @@ class ObjectWithMutatingDestructor {
   Number n;
 };
 
+class BaseType {
+public:
+  BaseType() : n(0) { }
+  BaseType(int v) : n(v) { }
+  BaseType(const char*);
+private:
+  Number n;
+};
+
+class SomeType : public BaseType {
+public:
+  using BaseType::BaseType;
+};
+
 class RefCounted {
 public:
   void ref() const;
@@ -336,6 +350,8 @@ class RefCounted {
   unsigned trivial60() { return ObjectWithNonTrivialDestructor { 5 }.value(); }
   unsigned trivial61() { return DerivedNumber('7').value(); }
   void trivial62() { WTFReportBacktrace(); }
+  SomeType trivial63() { return SomeType(0); }
+  SomeType trivial64() { return SomeType(); }
 
   static RefCounted& singleton() {
     static RefCounted s_RefCounted;
@@ -425,6 +441,7 @@ class RefCounted {
   unsigned nonTrivial21() { return Number("123").value(); }
   unsigned nonTrivial22() { return ComplexNumber(123, "456").real().value(); }
   unsigned nonTrivial23() { return DerivedNumber("123").value(); }
+  SomeType nonTrivial24() { return SomeType("123"); }
 
   static unsigned s_v;
   unsigned v { 0 };
@@ -515,6 +532,8 @@ class UnrelatedClass {
     getFieldTrivial().trivial60(); // no-warning
     getFieldTrivial().trivial61(); // no-warning
     getFieldTrivial().trivial62(); // no-warning
+    getFieldTrivial().trivial63(); // no-warning
+    getFieldTrivial().trivial64(); // no-warning
 
     RefCounted::singleton().trivial18(); // no-warning
     RefCounted::singleton().someFunction(); // no-warning
@@ -587,6 +606,8 @@ class UnrelatedClass {
     // expected-warning@-1{{Call argument for 'this' parameter is uncounted and unsafe}}
     getFieldTrivial().nonTrivial23();
     // expected-warning@-1{{Call argument for 'this' parameter is uncounted and unsafe}}
+    getFieldTrivial().nonTrivial24();
+    // expected-warning@-1{{Call argument for 'this' parameter is uncounted and unsafe}}
   }
 
   void setField(RefCounted*);

>From 842314f80753bd0455dc02c005069a5518eaefb2 Mon Sep 17 00:00:00 2001
From: Ryosuke Niwa <rniwa at webkit.org>
Date: Thu, 10 Oct 2024 10:02:07 -0700
Subject: [PATCH 11/20] isUncountedPtr should take QualType as an argument.
 (#110213)

Make isUncountedPtr take QualType as an argument instead of Type*. This
simplifies some code.
---
 .../Checkers/WebKit/PtrTypesSemantics.cpp        | 16 ++++------------
 .../Checkers/WebKit/PtrTypesSemantics.h          |  2 +-
 .../Checkers/WebKit/UncountedCallArgsChecker.cpp |  6 +-----
 .../WebKit/UncountedLambdaCapturesChecker.cpp    | 10 +++++-----
 .../WebKit/UncountedLocalVarsChecker.cpp         |  6 +-----
 5 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
index 317642c5b9ca20..2298fe39850de5 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.cpp
@@ -177,14 +177,10 @@ std::optional<bool> isUncounted(const CXXRecordDecl* Class)
   return (*IsRefCountable);
 }
 
-std::optional<bool> isUncountedPtr(const Type* T)
-{
-  assert(T);
-
+std::optional<bool> isUncountedPtr(const QualType T) {
   if (T->isPointerType() || T->isReferenceType()) {
-    if (auto *CXXRD = T->getPointeeCXXRecordDecl()) {
+    if (auto *CXXRD = T->getPointeeCXXRecordDecl())
       return isUncounted(CXXRD);
-    }
   }
   return false;
 }
@@ -208,12 +204,8 @@ std::optional<bool> isGetterOfRefCounted(const CXXMethodDecl* M)
     // Ref<T> -> T conversion
     // FIXME: Currently allowing any Ref<T> -> whatever cast.
     if (isRefType(className)) {
-      if (auto *maybeRefToRawOperator = dyn_cast<CXXConversionDecl>(M)) {
-        if (auto *targetConversionType =
-                maybeRefToRawOperator->getConversionType().getTypePtrOrNull()) {
-          return isUncountedPtr(targetConversionType);
-        }
-      }
+      if (auto *maybeRefToRawOperator = dyn_cast<CXXConversionDecl>(M))
+        return isUncountedPtr(maybeRefToRawOperator->getConversionType());
     }
   }
   return false;
diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.h b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.h
index 3528c52a7d659d..8e6aadf63b6d67 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.h
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/PtrTypesSemantics.h
@@ -61,7 +61,7 @@ std::optional<bool> isUncounted(const clang::CXXRecordDecl* Class);
 
 /// \returns true if \p T is either a raw pointer or reference to an uncounted
 /// class, false if not, std::nullopt if inconclusive.
-std::optional<bool> isUncountedPtr(const clang::Type* T);
+std::optional<bool> isUncountedPtr(const clang::QualType T);
 
 /// \returns true if Name is a RefPtr, Ref, or its variant, false if not.
 bool isRefType(const std::string &Name);
diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedCallArgsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedCallArgsChecker.cpp
index 0ed93ab26bf5ca..cea3503fa2c314 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedCallArgsChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedCallArgsChecker.cpp
@@ -115,12 +115,8 @@ class UncountedCallArgsChecker
         //  continue;
 
         QualType ArgType = (*P)->getType().getCanonicalType();
-        const auto *TypePtr = ArgType.getTypePtrOrNull();
-        if (!TypePtr)
-          continue; // FIXME? Should we bail?
-
         // FIXME: more complex types (arrays, references to raw pointers, etc)
-        std::optional<bool> IsUncounted = isUncountedPtr(TypePtr);
+        std::optional<bool> IsUncounted = isUncountedPtr(ArgType);
         if (!IsUncounted || !(*IsUncounted))
           continue;
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLambdaCapturesChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLambdaCapturesChecker.cpp
index a226a01ec0a579..998bd4ccee07db 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLambdaCapturesChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLambdaCapturesChecker.cpp
@@ -59,11 +59,11 @@ class UncountedLambdaCapturesChecker
     for (const LambdaCapture &C : L->captures()) {
       if (C.capturesVariable()) {
         ValueDecl *CapturedVar = C.getCapturedVar();
-        if (auto *CapturedVarType = CapturedVar->getType().getTypePtrOrNull()) {
-            std::optional<bool> IsUncountedPtr = isUncountedPtr(CapturedVarType);
-            if (IsUncountedPtr && *IsUncountedPtr) {
-                reportBug(C, CapturedVar, CapturedVarType);
-            }
+        QualType CapturedVarQualType = CapturedVar->getType();
+        if (auto *CapturedVarType = CapturedVarQualType.getTypePtrOrNull()) {
+          auto IsUncountedPtr = isUncountedPtr(CapturedVarQualType);
+          if (IsUncountedPtr && *IsUncountedPtr)
+            reportBug(C, CapturedVar, CapturedVarType);
         }
       }
     }
diff --git a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
index 9d0a3bb5da7325..81d21100de878d 100644
--- a/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
@@ -199,11 +199,7 @@ class UncountedLocalVarsChecker
     if (shouldSkipVarDecl(V))
       return;
 
-    const auto *ArgType = V->getType().getTypePtr();
-    if (!ArgType)
-      return;
-
-    std::optional<bool> IsUncountedPtr = isUncountedPtr(ArgType);
+    std::optional<bool> IsUncountedPtr = isUncountedPtr(V->getType());
     if (IsUncountedPtr && *IsUncountedPtr) {
       if (tryToFindPtrOrigin(
               Value, /*StopAtFirstRefCountedObj=*/false,

>From 00095b51e1953925d49d5da637f84cae83e1eb07 Mon Sep 17 00:00:00 2001
From: Abid Qadeer <haqadeer at amd.com>
Date: Thu, 10 Oct 2024 18:07:06 +0100
Subject: [PATCH 12/20] [mlir][debug] Support DICommonBlock. (#111706)

A COMMON block is a named area of memory that holds a collection of
variables. Fortran subprograms may map the COMMON block memory area to a
list of variables. A common block is represented in LLVM debug info by
DICommonBlock.

This PR adds support for this in MLIR. The changes are mostly mechanical,
apart from a small change to access the DICompileUnit when the scope of
the variable is a DICommonBlock.

---------

Co-authored-by: Tobias Gysi <tobias.gysi at nextsilicon.com>
---
 .../mlir/Dialect/LLVMIR/LLVMAttrDefs.td       | 16 ++++++++++
 mlir/lib/Dialect/LLVMIR/IR/LLVMAttrs.cpp      | 20 +++++++------
 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp    | 11 +++----
 mlir/lib/Target/LLVMIR/DebugImporter.cpp      |  9 ++++++
 mlir/lib/Target/LLVMIR/DebugImporter.h        |  1 +
 mlir/lib/Target/LLVMIR/DebugTranslation.cpp   | 20 +++++++++----
 mlir/lib/Target/LLVMIR/DebugTranslation.h     |  1 +
 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp  | 28 ++++++++++-------
 mlir/test/Dialect/LLVMIR/debuginfo.mlir       |  8 +++++
 mlir/test/Target/LLVMIR/Import/debug-info.ll  | 24 +++++++++++++++
 mlir/test/Target/LLVMIR/llvmir-debug.mlir     | 30 +++++++++++++++++++
 11 files changed, 138 insertions(+), 30 deletions(-)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td
index c298c8277eb0c3..0d904f13037c61 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td
@@ -701,6 +701,22 @@ def LLVM_DISubrangeAttr : LLVM_Attr<"DISubrange", "di_subrange", /*traits=*/[],
   let assemblyFormat = "`<` struct(params) `>`";
 }
 
+//===----------------------------------------------------------------------===//
+// DICommonBlockAttr
+//===----------------------------------------------------------------------===//
+
+def LLVM_DICommonBlockAttr : LLVM_Attr<"DICommonBlock", "di_common_block",
+                                       /*traits=*/[], "DIScopeAttr"> {
+  let parameters = (ins
+    "DIScopeAttr":$scope,
+    OptionalParameter<"DIGlobalVariableAttr">:$decl,
+    "StringAttr":$name,
+    OptionalParameter<"DIFileAttr">:$file,
+    OptionalParameter<"unsigned">:$line
+  );
+  let assemblyFormat = "`<` struct(params) `>`";
+}
+
 //===----------------------------------------------------------------------===//
 // DISubroutineTypeAttr
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMAttrs.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMAttrs.cpp
index 99871dac81d326..9640bbdf28df45 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMAttrs.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMAttrs.cpp
@@ -56,13 +56,14 @@ void LLVMDialect::registerAttributes() {
 //===----------------------------------------------------------------------===//
 
 bool DINodeAttr::classof(Attribute attr) {
-  return llvm::isa<DIBasicTypeAttr, DICompileUnitAttr, DICompositeTypeAttr,
-                   DIDerivedTypeAttr, DIFileAttr, DIGlobalVariableAttr,
-                   DIImportedEntityAttr, DILabelAttr, DILexicalBlockAttr,
-                   DILexicalBlockFileAttr, DILocalVariableAttr, DIModuleAttr,
-                   DINamespaceAttr, DINullTypeAttr, DIAnnotationAttr,
-                   DIStringTypeAttr, DISubprogramAttr, DISubrangeAttr,
-                   DISubroutineTypeAttr>(attr);
+  return llvm::isa<DIBasicTypeAttr, DICommonBlockAttr, DICompileUnitAttr,
+                   DICompositeTypeAttr, DIDerivedTypeAttr, DIFileAttr,
+                   DIGlobalVariableAttr, DIImportedEntityAttr, DILabelAttr,
+                   DILexicalBlockAttr, DILexicalBlockFileAttr,
+                   DILocalVariableAttr, DIModuleAttr, DINamespaceAttr,
+                   DINullTypeAttr, DIAnnotationAttr, DIStringTypeAttr,
+                   DISubprogramAttr, DISubrangeAttr, DISubroutineTypeAttr>(
+      attr);
 }
 
 //===----------------------------------------------------------------------===//
@@ -70,8 +71,9 @@ bool DINodeAttr::classof(Attribute attr) {
 //===----------------------------------------------------------------------===//
 
 bool DIScopeAttr::classof(Attribute attr) {
-  return llvm::isa<DICompileUnitAttr, DICompositeTypeAttr, DIFileAttr,
-                   DILocalScopeAttr, DIModuleAttr, DINamespaceAttr>(attr);
+  return llvm::isa<DICommonBlockAttr, DICompileUnitAttr, DICompositeTypeAttr,
+                   DIFileAttr, DILocalScopeAttr, DIModuleAttr, DINamespaceAttr>(
+      attr);
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
index 2c7af8712d420c..006d412936a337 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
@@ -3369,11 +3369,12 @@ struct LLVMOpAsmDialectInterface : public OpAsmDialectInterface {
   AliasResult getAlias(Attribute attr, raw_ostream &os) const override {
     return TypeSwitch<Attribute, AliasResult>(attr)
         .Case<AccessGroupAttr, AliasScopeAttr, AliasScopeDomainAttr,
-              DIBasicTypeAttr, DICompileUnitAttr, DICompositeTypeAttr,
-              DIDerivedTypeAttr, DIFileAttr, DIGlobalVariableAttr,
-              DIGlobalVariableExpressionAttr, DIImportedEntityAttr, DILabelAttr,
-              DILexicalBlockAttr, DILexicalBlockFileAttr, DILocalVariableAttr,
-              DIModuleAttr, DINamespaceAttr, DINullTypeAttr, DIStringTypeAttr,
+              DIBasicTypeAttr, DICommonBlockAttr, DICompileUnitAttr,
+              DICompositeTypeAttr, DIDerivedTypeAttr, DIFileAttr,
+              DIGlobalVariableAttr, DIGlobalVariableExpressionAttr,
+              DIImportedEntityAttr, DILabelAttr, DILexicalBlockAttr,
+              DILexicalBlockFileAttr, DILocalVariableAttr, DIModuleAttr,
+              DINamespaceAttr, DINullTypeAttr, DIStringTypeAttr,
               DISubprogramAttr, DISubroutineTypeAttr, LoopAnnotationAttr,
               LoopVectorizeAttr, LoopInterleaveAttr, LoopUnrollAttr,
               LoopUnrollAndJamAttr, LoopLICMAttr, LoopDistributeAttr,
diff --git a/mlir/lib/Target/LLVMIR/DebugImporter.cpp b/mlir/lib/Target/LLVMIR/DebugImporter.cpp
index cd992be62b4719..412125b6ea65f6 100644
--- a/mlir/lib/Target/LLVMIR/DebugImporter.cpp
+++ b/mlir/lib/Target/LLVMIR/DebugImporter.cpp
@@ -302,6 +302,13 @@ DISubrangeAttr DebugImporter::translateImpl(llvm::DISubrange *node) {
                              getAttrOrNull(node->getStride()));
 }
 
+DICommonBlockAttr DebugImporter::translateImpl(llvm::DICommonBlock *node) {
+  return DICommonBlockAttr::get(context, translate(node->getScope()),
+                                translate(node->getDecl()),
+                                getStringAttrOrNull(node->getRawName()),
+                                translate(node->getFile()), node->getLineNo());
+}
+
 DISubroutineTypeAttr
 DebugImporter::translateImpl(llvm::DISubroutineType *node) {
   SmallVector<DITypeAttr> types;
@@ -339,6 +346,8 @@ DINodeAttr DebugImporter::translate(llvm::DINode *node) {
   auto translateNode = [this](llvm::DINode *node) -> DINodeAttr {
     if (auto *casted = dyn_cast<llvm::DIBasicType>(node))
       return translateImpl(casted);
+    if (auto *casted = dyn_cast<llvm::DICommonBlock>(node))
+      return translateImpl(casted);
     if (auto *casted = dyn_cast<llvm::DICompileUnit>(node))
       return translateImpl(casted);
     if (auto *casted = dyn_cast<llvm::DICompositeType>(node))
diff --git a/mlir/lib/Target/LLVMIR/DebugImporter.h b/mlir/lib/Target/LLVMIR/DebugImporter.h
index cb796676759c39..a452e01a9f6041 100644
--- a/mlir/lib/Target/LLVMIR/DebugImporter.h
+++ b/mlir/lib/Target/LLVMIR/DebugImporter.h
@@ -79,6 +79,7 @@ class DebugImporter {
   DIScopeAttr translateImpl(llvm::DIScope *node);
   DISubprogramAttr translateImpl(llvm::DISubprogram *node);
   DISubrangeAttr translateImpl(llvm::DISubrange *node);
+  DICommonBlockAttr translateImpl(llvm::DICommonBlock *node);
   DISubroutineTypeAttr translateImpl(llvm::DISubroutineType *node);
   DITypeAttr translateImpl(llvm::DIType *node);
 
diff --git a/mlir/lib/Target/LLVMIR/DebugTranslation.cpp b/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
index 92ff079a10c8aa..2491db299af312 100644
--- a/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
@@ -397,6 +397,13 @@ llvm::DISubrange *DebugTranslation::translateImpl(DISubrangeAttr attr) {
                                getMetadataOrNull(attr.getStride()));
 }
 
+llvm::DICommonBlock *DebugTranslation::translateImpl(DICommonBlockAttr attr) {
+  return llvm::DICommonBlock::get(llvmCtx, translate(attr.getScope()),
+                                  translate(attr.getDecl()),
+                                  getMDStringOrNull(attr.getName()),
+                                  translate(attr.getFile()), attr.getLine());
+}
+
 llvm::DISubroutineType *
 DebugTranslation::translateImpl(DISubroutineTypeAttr attr) {
   // Concatenate the result and argument types into a single array.
@@ -428,12 +435,13 @@ llvm::DINode *DebugTranslation::translate(DINodeAttr attr) {
 
   if (!node)
     node = TypeSwitch<DINodeAttr, llvm::DINode *>(attr)
-               .Case<DIBasicTypeAttr, DICompileUnitAttr, DICompositeTypeAttr,
-                     DIDerivedTypeAttr, DIFileAttr, DIGlobalVariableAttr,
-                     DIImportedEntityAttr, DILabelAttr, DILexicalBlockAttr,
-                     DILexicalBlockFileAttr, DILocalVariableAttr, DIModuleAttr,
-                     DINamespaceAttr, DINullTypeAttr, DIStringTypeAttr,
-                     DISubprogramAttr, DISubrangeAttr, DISubroutineTypeAttr>(
+               .Case<DIBasicTypeAttr, DICommonBlockAttr, DICompileUnitAttr,
+                     DICompositeTypeAttr, DIDerivedTypeAttr, DIFileAttr,
+                     DIGlobalVariableAttr, DIImportedEntityAttr, DILabelAttr,
+                     DILexicalBlockAttr, DILexicalBlockFileAttr,
+                     DILocalVariableAttr, DIModuleAttr, DINamespaceAttr,
+                     DINullTypeAttr, DIStringTypeAttr, DISubprogramAttr,
+                     DISubrangeAttr, DISubroutineTypeAttr>(
                    [&](auto attr) { return translateImpl(attr); });
 
   if (node && !node->isTemporary())
diff --git a/mlir/lib/Target/LLVMIR/DebugTranslation.h b/mlir/lib/Target/LLVMIR/DebugTranslation.h
index 422aa34e28f3c9..ff4eaa46c564e2 100644
--- a/mlir/lib/Target/LLVMIR/DebugTranslation.h
+++ b/mlir/lib/Target/LLVMIR/DebugTranslation.h
@@ -88,6 +88,7 @@ class DebugTranslation {
   llvm::DIScope *translateImpl(DIScopeAttr attr);
   llvm::DISubprogram *translateImpl(DISubprogramAttr attr);
   llvm::DISubrange *translateImpl(DISubrangeAttr attr);
+  llvm::DICommonBlock *translateImpl(DICommonBlockAttr attr);
   llvm::DISubroutineType *translateImpl(DISubroutineTypeAttr attr);
   llvm::DIType *translateImpl(DITypeAttr attr);
 
diff --git a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
index cc0de5bc838c99..a5de90160c4145 100644
--- a/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
@@ -1064,19 +1064,27 @@ LogicalResult ModuleTranslation::convertGlobals() {
       // There is no `globals` field in DICompileUnitAttr which can be directly
       // assigned to DICompileUnit. We have to build the list by looking at the
       // dbgExpr of all the GlobalOps. The scope of the variable is used to get
-      // the DICompileUnit in which to add it. But for the languages that
-      // support modules, the scope hierarchy can be
-      // variable -> module -> compile unit
-      // If a variable scope points to the module then we use the scope of the
-      // module to get the compile unit.
-      // Global variables are also used for things like static local variables
-      // in C and local variables with the save attribute in Fortran. The scope
-      // of the variable is the parent function. We use the compile unit of the
-      // parent function in this case.
+      // the DICompileUnit in which to add it.
+      // But there are cases where the scope of a global does not
+      // directly point to the DICompileUnit and we have to do a bit more work
+      // to get to it. Some of those cases are:
+      //
+      // 1. For the languages that support modules, the scope hierarchy can be
+      // variable -> DIModule -> DICompileUnit
+      //
+      // 2. For Fortran common block variables, the scope hierarchy can be
+      // variable -> DICommonBlock -> DISubprogram -> DICompileUnit
+      //
+      // 3. For entities like static local variables in C or variables with
+      // the SAVE attribute in Fortran, the scope hierarchy can be
+      // variable -> DISubprogram -> DICompileUnit
       llvm::DIScope *scope = diGlobalVar->getScope();
       if (auto *mod = dyn_cast_if_present<llvm::DIModule>(scope))
         scope = mod->getScope();
-      else if (auto *sp = dyn_cast_if_present<llvm::DISubprogram>(scope))
+      else if (auto *cb = dyn_cast_if_present<llvm::DICommonBlock>(scope)) {
+        if (auto *sp = dyn_cast_if_present<llvm::DISubprogram>(cb->getScope()))
+          scope = sp->getUnit();
+      } else if (auto *sp = dyn_cast_if_present<llvm::DISubprogram>(scope))
         scope = sp->getUnit();
 
       // Get the compile unit (scope) of the the global variable.
diff --git a/mlir/test/Dialect/LLVMIR/debuginfo.mlir b/mlir/test/Dialect/LLVMIR/debuginfo.mlir
index af95ec97833a13..8475ec6c3510db 100644
--- a/mlir/test/Dialect/LLVMIR/debuginfo.mlir
+++ b/mlir/test/Dialect/LLVMIR/debuginfo.mlir
@@ -156,6 +156,14 @@
 // CHECK-DAG: #[[LABEL2:.*]] =  #llvm.di_label<scope = #[[BLOCK2]]>
 #label2 = #llvm.di_label<scope = #block2>
 
+// CHECK-DAG: #llvm.di_common_block<scope = #[[SP1]], name = "block", file = #[[FILE]], line = 3>
+#di_common_block = #llvm.di_common_block<scope = #sp1, name = "block", file = #file, line = 3>
+#global_var = #llvm.di_global_variable<scope = #di_common_block, name = "a",
+ file = #file, line = 2, type = #int0>
+#var_expression = #llvm.di_global_variable_expression<var = #global_var,
+ expr = <>>
+llvm.mlir.global common @block_() {dbg_expr = #var_expression} : i64
+
 // CHECK: llvm.func @addr(%[[ARG:.*]]: i64)
 llvm.func @addr(%arg: i64) {
   // CHECK: %[[ALLOC:.*]] = llvm.alloca
diff --git a/mlir/test/Target/LLVMIR/Import/debug-info.ll b/mlir/test/Target/LLVMIR/Import/debug-info.ll
index 6267990b0bf803..09909d7d63b2ab 100644
--- a/mlir/test/Target/LLVMIR/Import/debug-info.ll
+++ b/mlir/test/Target/LLVMIR/Import/debug-info.ll
@@ -843,3 +843,27 @@ define void @fn_with_annotations() !dbg !12 {
 
 
 ; CHECK-DAG: #llvm.di_subprogram<{{.*}}name = "fn_with_annotations"{{.*}}annotations = #llvm.di_annotation<name = "foo", value = "bar">>
+
+; // -----
+
+@block = common global [4 x i8] zeroinitializer, !dbg !0
+
+define void @test() !dbg !3 {
+  ret void
+}
+
+!llvm.module.flags = !{!10}
+!llvm.dbg.cu = !{!7}
+
+!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
+!1 = distinct !DIGlobalVariable(name: "alpha", scope: !2, file: !4, type: !9)
+!2 = !DICommonBlock(scope: !3, declaration: null, name: "block", file: !4, line: 3)
+!3 = distinct !DISubprogram(name: "test", scope: !4, file: !4, spFlags: DISPFlagDefinition, unit: !7)
+!4 = !DIFile(filename: "test.f90", directory: "")
+!7 = distinct !DICompileUnit(language: DW_LANG_Fortran95, file: !4)
+!9 = !DIBasicType(name: "integer", size: 32, encoding: DW_ATE_signed)
+!10 = !{i32 2, !"Debug Info Version", i32 3}
+
+; CHECK: #[[FILE:.+]] = #llvm.di_file<"test.f90" in "">
+; CHECK: #[[SP:.+]] = #llvm.di_subprogram<{{.*}}name = "test"{{.*}}>
+; CHECK: #llvm.di_common_block<scope = #[[SP]], name = "block", file = #[[FILE]], line = 3>
diff --git a/mlir/test/Target/LLVMIR/llvmir-debug.mlir b/mlir/test/Target/LLVMIR/llvmir-debug.mlir
index b09a60b8dcac90..826fda60c5efef 100644
--- a/mlir/test/Target/LLVMIR/llvmir-debug.mlir
+++ b/mlir/test/Target/LLVMIR/llvmir-debug.mlir
@@ -660,3 +660,33 @@ llvm.func @string_ty(%arg0: !llvm.ptr) {
 
 // CHECK-DAG: !DIStringType(name: "character(*)", stringLength: ![[VAR:[0-9]+]], stringLengthExpression: !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8), stringLocationExpression: !DIExpression(DW_OP_push_object_address, DW_OP_deref), size: 32, align: 8)
 // CHECK-DAG: ![[VAR]] = !DILocalVariable(name: "string_size"{{.*}} flags: DIFlagArtificial)
+
+// -----
+
+// Test translation of DICommonBlockAttr.
+#bt = #llvm.di_basic_type<tag = DW_TAG_base_type, name = "int", sizeInBits = 32>
+#file = #llvm.di_file<"test.f90" in "">
+#cu = #llvm.di_compile_unit<id = distinct[0]<>, sourceLanguage = DW_LANG_C,
+ file = #file, isOptimized = false, emissionKind = Full>
+#sp = #llvm.di_subprogram<compileUnit = #cu, scope = #file, name = "test",
+ file = #file, subprogramFlags = Definition>
+#di_common_block = #llvm.di_common_block<scope = #sp, name = "block",
+ file = #file, line = 3>
+#global_var = #llvm.di_global_variable<scope = #di_common_block, name = "a",
+ file = #file, line = 2, type = #bt>
+#var_expression = #llvm.di_global_variable_expression<var = #global_var,
+ expr = <>>
+
+llvm.mlir.global common @block_(dense<0> : tensor<8xi8>)
+  {dbg_expr = #var_expression} : !llvm.array<8 x i8>
+
+llvm.func @test() {
+  llvm.return
+} loc(#loc2)
+
+#loc1 = loc("test.f90":1:0)
+#loc2 = loc(fused<#sp>[#loc1])
+
+// CHECK: !DICommonBlock(scope: ![[SCOPE:[0-9]+]], declaration: null, name: "block", file: ![[FILE:[0-9]+]], line: 3)
+// CHECK: ![[SCOPE]] = {{.*}}!DISubprogram(name: "test"{{.*}})
+// CHECK: ![[FILE]] = !DIFile(filename: "test.f90"{{.*}})

>From 953dd253ad9384d4dc925a19ddd41d0b34288294 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 10 Oct 2024 13:13:17 -0400
Subject: [PATCH 13/20] [runtimes][NFC] Reindent CMake files (#111821)

This is a purely mechanical commit for fixing the indentation of the
runtimes' CMakeLists files after #80007. That PR didn't update the
indentation in order to make the diff easier to review and for merge
conflicts to be easier to resolve (for downstream changes).

This doesn't change any code, it only reindents it.
---
 libcxx/src/CMakeLists.txt    | 194 +++++++++++++++++------------------
 libcxxabi/src/CMakeLists.txt | 140 ++++++++++++-------------
 libunwind/src/CMakeLists.txt |  40 ++++----
 3 files changed, 187 insertions(+), 187 deletions(-)

diff --git a/libcxx/src/CMakeLists.txt b/libcxx/src/CMakeLists.txt
index 9f31822065be9d..4af04f202db1f7 100644
--- a/libcxx/src/CMakeLists.txt
+++ b/libcxx/src/CMakeLists.txt
@@ -173,76 +173,76 @@ split_list(LIBCXX_COMPILE_FLAGS)
 split_list(LIBCXX_LINK_FLAGS)
 
 # Build the shared library.
-  add_library(cxx_shared SHARED ${LIBCXX_SOURCES} ${LIBCXX_HEADERS})
-  target_include_directories(cxx_shared PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
-  target_link_libraries(cxx_shared PUBLIC cxx-headers libcxx-libc-shared
-                                   PRIVATE ${LIBCXX_LIBRARIES})
-  set_target_properties(cxx_shared
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXX_ENABLE_SHARED}>,FALSE,TRUE>"
-      COMPILE_FLAGS "${LIBCXX_COMPILE_FLAGS}"
-      LINK_FLAGS    "${LIBCXX_LINK_FLAGS}"
-      OUTPUT_NAME   "${LIBCXX_SHARED_OUTPUT_NAME}"
-      VERSION       "${LIBCXX_LIBRARY_VERSION}"
-      SOVERSION     "${LIBCXX_ABI_VERSION}"
-      DEFINE_SYMBOL ""
+add_library(cxx_shared SHARED ${LIBCXX_SOURCES} ${LIBCXX_HEADERS})
+target_include_directories(cxx_shared PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
+target_link_libraries(cxx_shared PUBLIC cxx-headers libcxx-libc-shared
+                                  PRIVATE ${LIBCXX_LIBRARIES})
+set_target_properties(cxx_shared
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXX_ENABLE_SHARED}>,FALSE,TRUE>"
+    COMPILE_FLAGS "${LIBCXX_COMPILE_FLAGS}"
+    LINK_FLAGS    "${LIBCXX_LINK_FLAGS}"
+    OUTPUT_NAME   "${LIBCXX_SHARED_OUTPUT_NAME}"
+    VERSION       "${LIBCXX_LIBRARY_VERSION}"
+    SOVERSION     "${LIBCXX_ABI_VERSION}"
+    DEFINE_SYMBOL ""
+)
+cxx_add_common_build_flags(cxx_shared)
+
+if(ZOS)
+  add_custom_command(TARGET cxx_shared POST_BUILD
+    COMMAND
+      ${LIBCXX_SOURCE_DIR}/utils/zos_rename_dll_side_deck.sh
+      $<TARGET_LINKER_FILE_NAME:cxx_shared> $<TARGET_FILE_NAME:cxx_shared> "${LIBCXX_DLL_NAME}"
+    COMMENT "Rename dll name inside the side deck file"
+    WORKING_DIRECTORY $<TARGET_FILE_DIR:cxx_shared>
   )
-  cxx_add_common_build_flags(cxx_shared)
-
-  if(ZOS)
-    add_custom_command(TARGET cxx_shared POST_BUILD
-      COMMAND
-        ${LIBCXX_SOURCE_DIR}/utils/zos_rename_dll_side_deck.sh
-        $<TARGET_LINKER_FILE_NAME:cxx_shared> $<TARGET_FILE_NAME:cxx_shared> "${LIBCXX_DLL_NAME}"
-      COMMENT "Rename dll name inside the side deck file"
-      WORKING_DIRECTORY $<TARGET_FILE_DIR:cxx_shared>
-    )
-  endif()
+endif()
 
-  # Link against libc++abi
-  if (LIBCXX_STATICALLY_LINK_ABI_IN_SHARED_LIBRARY)
-    target_link_libraries(cxx_shared PRIVATE libcxx-abi-shared-objects)
-  else()
-    target_link_libraries(cxx_shared PUBLIC libcxx-abi-shared)
-  endif()
+# Link against libc++abi
+if (LIBCXX_STATICALLY_LINK_ABI_IN_SHARED_LIBRARY)
+  target_link_libraries(cxx_shared PRIVATE libcxx-abi-shared-objects)
+else()
+  target_link_libraries(cxx_shared PUBLIC libcxx-abi-shared)
+endif()
 
-  # Maybe force some symbols to be weak, not weak or not exported.
-  # TODO: This shouldn't depend on the platform, and ideally it should be done in the sources.
-  if (APPLE AND LIBCXX_CXX_ABI MATCHES "libcxxabi$"
-            AND NOT LIBCXX_STATICALLY_LINK_ABI_IN_SHARED_LIBRARY)
-    target_link_libraries(cxx_shared PRIVATE
-      "-Wl,-force_symbols_not_weak_list,${CMAKE_CURRENT_SOURCE_DIR}/../lib/notweak.exp"
-      "-Wl,-force_symbols_weak_list,${CMAKE_CURRENT_SOURCE_DIR}/../lib/weak.exp")
-  endif()
+# Maybe force some symbols to be weak, not weak or not exported.
+# TODO: This shouldn't depend on the platform, and ideally it should be done in the sources.
+if (APPLE AND LIBCXX_CXX_ABI MATCHES "libcxxabi$"
+          AND NOT LIBCXX_STATICALLY_LINK_ABI_IN_SHARED_LIBRARY)
+  target_link_libraries(cxx_shared PRIVATE
+    "-Wl,-force_symbols_not_weak_list,${CMAKE_CURRENT_SOURCE_DIR}/../lib/notweak.exp"
+    "-Wl,-force_symbols_weak_list,${CMAKE_CURRENT_SOURCE_DIR}/../lib/weak.exp")
+endif()
 
-  # Generate a linker script in place of a libc++.so symlink.
-  if (LIBCXX_ENABLE_ABI_LINKER_SCRIPT)
-    set(link_libraries)
-
-    set(imported_libname "$<TARGET_PROPERTY:libcxx-abi-shared,IMPORTED_LIBNAME>")
-    set(output_name "$<TARGET_PROPERTY:libcxx-abi-shared,OUTPUT_NAME>")
-    string(APPEND link_libraries "${CMAKE_LINK_LIBRARY_FLAG}$<IF:$<BOOL:${imported_libname}>,${imported_libname},${output_name}>")
-
-    # TODO: Move to the same approach as above for the unwind library
-    if (LIBCXXABI_USE_LLVM_UNWINDER)
-      if (LIBCXXABI_STATICALLY_LINK_UNWINDER_IN_SHARED_LIBRARY)
-        # libunwind is already included in libc++abi
-      elseif (TARGET unwind_shared OR HAVE_LIBUNWIND)
-        string(APPEND link_libraries " ${CMAKE_LINK_LIBRARY_FLAG}$<TARGET_PROPERTY:unwind_shared,OUTPUT_NAME>")
-      else()
-        string(APPEND link_libraries " ${CMAKE_LINK_LIBRARY_FLAG}unwind")
-      endif()
-    endif()
+# Generate a linker script in place of a libc++.so symlink.
+if (LIBCXX_ENABLE_ABI_LINKER_SCRIPT)
+  set(link_libraries)
 
-    set(linker_script "INPUT($<TARGET_SONAME_FILE_NAME:cxx_shared> ${link_libraries})")
-    add_custom_command(TARGET cxx_shared POST_BUILD
-      COMMAND "${CMAKE_COMMAND}" -E remove "$<TARGET_LINKER_FILE:cxx_shared>"
-      COMMAND "${CMAKE_COMMAND}" -E echo "${linker_script}" > "$<TARGET_LINKER_FILE:cxx_shared>"
-      COMMENT "Generating linker script: '${linker_script}' as file $<TARGET_LINKER_FILE:cxx_shared>"
-      VERBATIM
-    )
+  set(imported_libname "$<TARGET_PROPERTY:libcxx-abi-shared,IMPORTED_LIBNAME>")
+  set(output_name "$<TARGET_PROPERTY:libcxx-abi-shared,OUTPUT_NAME>")
+  string(APPEND link_libraries "${CMAKE_LINK_LIBRARY_FLAG}$<IF:$<BOOL:${imported_libname}>,${imported_libname},${output_name}>")
+
+  # TODO: Move to the same approach as above for the unwind library
+  if (LIBCXXABI_USE_LLVM_UNWINDER)
+    if (LIBCXXABI_STATICALLY_LINK_UNWINDER_IN_SHARED_LIBRARY)
+      # libunwind is already included in libc++abi
+    elseif (TARGET unwind_shared OR HAVE_LIBUNWIND)
+      string(APPEND link_libraries " ${CMAKE_LINK_LIBRARY_FLAG}$<TARGET_PROPERTY:unwind_shared,OUTPUT_NAME>")
+    else()
+      string(APPEND link_libraries " ${CMAKE_LINK_LIBRARY_FLAG}unwind")
+    endif()
   endif()
 
+  set(linker_script "INPUT($<TARGET_SONAME_FILE_NAME:cxx_shared> ${link_libraries})")
+  add_custom_command(TARGET cxx_shared POST_BUILD
+    COMMAND "${CMAKE_COMMAND}" -E remove "$<TARGET_LINKER_FILE:cxx_shared>"
+    COMMAND "${CMAKE_COMMAND}" -E echo "${linker_script}" > "$<TARGET_LINKER_FILE:cxx_shared>"
+    COMMENT "Generating linker script: '${linker_script}' as file $<TARGET_LINKER_FILE:cxx_shared>"
+    VERBATIM
+  )
+endif()
+
 if (LIBCXX_ENABLE_SHARED)
   list(APPEND LIBCXX_BUILD_TARGETS "cxx_shared")
 endif()
@@ -263,43 +263,43 @@ endif()
 set(CMAKE_STATIC_LIBRARY_PREFIX "lib")
 
 # Build the static library.
-  add_library(cxx_static STATIC ${LIBCXX_SOURCES} ${LIBCXX_HEADERS})
-  target_include_directories(cxx_static PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
-  target_link_libraries(cxx_static PUBLIC cxx-headers libcxx-libc-static
-                                   PRIVATE ${LIBCXX_LIBRARIES}
-                                   PRIVATE libcxx-abi-static)
-  set_target_properties(cxx_static
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXX_ENABLE_STATIC}>,FALSE,TRUE>"
-      COMPILE_FLAGS "${LIBCXX_COMPILE_FLAGS}"
-      LINK_FLAGS    "${LIBCXX_LINK_FLAGS}"
-      OUTPUT_NAME   "${LIBCXX_STATIC_OUTPUT_NAME}"
-  )
-  cxx_add_common_build_flags(cxx_static)
-
-  if (LIBCXX_HERMETIC_STATIC_LIBRARY)
-    # If the hermetic library doesn't define the operator new/delete functions
-    # then its code shouldn't declare them with hidden visibility.  They might
-    # actually be provided by a shared library at link time.
-    if (LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS)
-      append_flags_if_supported(CXX_STATIC_LIBRARY_FLAGS -fvisibility-global-new-delete=force-hidden)
-      if (NOT CXX_SUPPORTS_FVISIBILITY_GLOBAL_NEW_DELETE_EQ_FORCE_HIDDEN_FLAG)
-        append_flags_if_supported(CXX_STATIC_LIBRARY_FLAGS -fvisibility-global-new-delete-hidden)
-      endif()
+add_library(cxx_static STATIC ${LIBCXX_SOURCES} ${LIBCXX_HEADERS})
+target_include_directories(cxx_static PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
+target_link_libraries(cxx_static PUBLIC cxx-headers libcxx-libc-static
+                                  PRIVATE ${LIBCXX_LIBRARIES}
+                                  PRIVATE libcxx-abi-static)
+set_target_properties(cxx_static
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXX_ENABLE_STATIC}>,FALSE,TRUE>"
+    COMPILE_FLAGS "${LIBCXX_COMPILE_FLAGS}"
+    LINK_FLAGS    "${LIBCXX_LINK_FLAGS}"
+    OUTPUT_NAME   "${LIBCXX_STATIC_OUTPUT_NAME}"
+)
+cxx_add_common_build_flags(cxx_static)
+
+if (LIBCXX_HERMETIC_STATIC_LIBRARY)
+  # If the hermetic library doesn't define the operator new/delete functions
+  # then its code shouldn't declare them with hidden visibility.  They might
+  # actually be provided by a shared library at link time.
+  if (LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS)
+    append_flags_if_supported(CXX_STATIC_LIBRARY_FLAGS -fvisibility-global-new-delete=force-hidden)
+    if (NOT CXX_SUPPORTS_FVISIBILITY_GLOBAL_NEW_DELETE_EQ_FORCE_HIDDEN_FLAG)
+      append_flags_if_supported(CXX_STATIC_LIBRARY_FLAGS -fvisibility-global-new-delete-hidden)
     endif()
-    target_compile_options(cxx_static PRIVATE ${CXX_STATIC_LIBRARY_FLAGS})
-    # _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS can be defined in __config_site
-    # too. Define it in the same way here, to avoid redefinition conflicts.
-    target_compile_definitions(cxx_static PRIVATE _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS=)
   endif()
+  target_compile_options(cxx_static PRIVATE ${CXX_STATIC_LIBRARY_FLAGS})
+  # _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS can be defined in __config_site
+  # too. Define it in the same way here, to avoid redefinition conflicts.
+  target_compile_definitions(cxx_static PRIVATE _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS=)
+endif()
 
-  if (LIBCXX_ENABLE_STATIC)
-    list(APPEND LIBCXX_BUILD_TARGETS "cxx_static")
-  endif()
-  # Attempt to merge the libc++.a archive and the ABI library archive into one.
-  if (LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY)
-    target_link_libraries(cxx_static PRIVATE libcxx-abi-static-objects)
-  endif()
+if (LIBCXX_ENABLE_STATIC)
+  list(APPEND LIBCXX_BUILD_TARGETS "cxx_static")
+endif()
+# Attempt to merge the libc++.a archive and the ABI library archive into one.
+if (LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY)
+  target_link_libraries(cxx_static PRIVATE libcxx-abi-static-objects)
+endif()
 
 # Add a meta-target for both libraries.
 add_custom_target(cxx DEPENDS ${LIBCXX_BUILD_TARGETS})
diff --git a/libcxxabi/src/CMakeLists.txt b/libcxxabi/src/CMakeLists.txt
index e496cf3339164e..84fe2784bec5ca 100644
--- a/libcxxabi/src/CMakeLists.txt
+++ b/libcxxabi/src/CMakeLists.txt
@@ -184,78 +184,78 @@ if (CMAKE_POSITION_INDEPENDENT_CODE OR NOT DEFINED CMAKE_POSITION_INDEPENDENT_CO
 endif()
 target_compile_options(cxxabi_shared_objects PRIVATE "${LIBCXXABI_ADDITIONAL_COMPILE_FLAGS}")
 
-  add_library(cxxabi_shared SHARED)
-  set_target_properties(cxxabi_shared
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXXABI_ENABLE_SHARED}>,FALSE,TRUE>"
-      LINK_FLAGS "${LIBCXXABI_LINK_FLAGS}"
-      OUTPUT_NAME "${LIBCXXABI_SHARED_OUTPUT_NAME}"
-      SOVERSION "1"
-      VERSION "${LIBCXXABI_LIBRARY_VERSION}"
-  )
+add_library(cxxabi_shared SHARED)
+set_target_properties(cxxabi_shared
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXXABI_ENABLE_SHARED}>,FALSE,TRUE>"
+    LINK_FLAGS "${LIBCXXABI_LINK_FLAGS}"
+    OUTPUT_NAME "${LIBCXXABI_SHARED_OUTPUT_NAME}"
+    SOVERSION "1"
+    VERSION "${LIBCXXABI_LIBRARY_VERSION}"
+)
 
-  if (ZOS)
-    add_custom_command(TARGET cxxabi_shared POST_BUILD
-      COMMAND
-        ${LIBCXXABI_LIBCXX_PATH}/utils/zos_rename_dll_side_deck.sh
-        $<TARGET_LINKER_FILE_NAME:cxxabi_shared> $<TARGET_FILE_NAME:cxxabi_shared> "${LIBCXXABI_DLL_NAME}"
-      COMMENT "Rename dll name inside the side deck file"
-      WORKING_DIRECTORY $<TARGET_FILE_DIR:cxxabi_shared>
-    )
-  endif ()
+if (ZOS)
+  add_custom_command(TARGET cxxabi_shared POST_BUILD
+    COMMAND
+      ${LIBCXXABI_LIBCXX_PATH}/utils/zos_rename_dll_side_deck.sh
+      $<TARGET_LINKER_FILE_NAME:cxxabi_shared> $<TARGET_FILE_NAME:cxxabi_shared> "${LIBCXXABI_DLL_NAME}"
+    COMMENT "Rename dll name inside the side deck file"
+    WORKING_DIRECTORY $<TARGET_FILE_DIR:cxxabi_shared>
+  )
+endif ()
 
-  target_link_libraries(cxxabi_shared
-    PUBLIC cxxabi_shared_objects
-    PRIVATE ${LIBCXXABI_LIBRARIES})
+target_link_libraries(cxxabi_shared
+  PUBLIC cxxabi_shared_objects
+  PRIVATE ${LIBCXXABI_LIBRARIES})
 
 if (LIBCXXABI_ENABLE_SHARED)
-  list(APPEND LIBCXXABI_BUILD_TARGETS "cxxabi_shared")
+list(APPEND LIBCXXABI_BUILD_TARGETS "cxxabi_shared")
 endif()
 if (LIBCXXABI_INSTALL_SHARED_LIBRARY)
-  list(APPEND LIBCXXABI_INSTALL_TARGETS "cxxabi_shared")
+list(APPEND LIBCXXABI_INSTALL_TARGETS "cxxabi_shared")
 endif()
 
-  # TODO: Move this to libc++'s HandleLibCXXABI.cmake since this is effectively trying to control
-  #       what libc++ re-exports.
-  add_library(cxxabi-reexports INTERFACE)
-  function(export_symbols file)
-    # -exported_symbols_list is only available on Apple platforms
-    if (APPLE)
-      target_link_libraries(cxxabi_shared PRIVATE "-Wl,-exported_symbols_list,${file}")
-    endif()
-  endfunction()
+# TODO: Move this to libc++'s HandleLibCXXABI.cmake since this is effectively trying to control
+#       what libc++ re-exports.
+add_library(cxxabi-reexports INTERFACE)
+function(export_symbols file)
+  # -exported_symbols_list is only available on Apple platforms
+  if (APPLE)
+    target_link_libraries(cxxabi_shared PRIVATE "-Wl,-exported_symbols_list,${file}")
+  endif()
+endfunction()
 
-  function(reexport_symbols file)
-    export_symbols("${file}")
-    # -reexported_symbols_list is only available on Apple platforms
-    if (APPLE)
-      target_link_libraries(cxxabi-reexports INTERFACE "-Wl,-reexported_symbols_list,${file}")
-    endif()
-  endfunction()
+function(reexport_symbols file)
+  export_symbols("${file}")
+  # -reexported_symbols_list is only available on Apple platforms
+  if (APPLE)
+    target_link_libraries(cxxabi-reexports INTERFACE "-Wl,-reexported_symbols_list,${file}")
+  endif()
+endfunction()
 
-  export_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/symbols-not-reexported.exp")
-  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/cxxabiv1.exp")
-  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/fundamental-types.exp")
-  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/itanium-base.exp")
-  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/std-misc.exp")
+export_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/symbols-not-reexported.exp")
+reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/cxxabiv1.exp")
+reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/fundamental-types.exp")
+reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/itanium-base.exp")
+reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/std-misc.exp")
 
-  if (LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS)
-    reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/new-delete.exp")
-  endif()
+if (LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS)
+  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/new-delete.exp")
+endif()
 
-  # Note that std:: exception types are always defined by the library regardless of
-  # whether the exception runtime machinery is provided.
-  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/std-exceptions.exp")
+# Note that std:: exception types are always defined by the library regardless of
+# whether the exception runtime machinery is provided.
+reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/std-exceptions.exp")
 
-  if (LIBCXXABI_ENABLE_EXCEPTIONS)
-    reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/itanium-exceptions.exp")
+if (LIBCXXABI_ENABLE_EXCEPTIONS)
+  reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/itanium-exceptions.exp")
 
-    if ("${CMAKE_OSX_ARCHITECTURES}" MATCHES "^(armv6|armv7|armv7s)$")
-      reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/personality-sjlj.exp")
-    else()
-      reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/personality-v0.exp")
-    endif()
+  if ("${CMAKE_OSX_ARCHITECTURES}" MATCHES "^(armv6|armv7|armv7s)$")
+    reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/personality-sjlj.exp")
+  else()
+    reexport_symbols("${CMAKE_CURRENT_SOURCE_DIR}/../lib/personality-v0.exp")
   endif()
+endif()
 
 # Build the static library.
 add_library(cxxabi_static_objects OBJECT EXCLUDE_FROM_ALL ${LIBCXXABI_SOURCES} ${LIBCXXABI_HEADERS})
@@ -295,19 +295,19 @@ if(LIBCXXABI_HERMETIC_STATIC_LIBRARY)
       _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS=)
 endif()
 
-  add_library(cxxabi_static STATIC)
-  if (LIBCXXABI_USE_LLVM_UNWINDER AND NOT LIBCXXABI_STATICALLY_LINK_UNWINDER_IN_STATIC_LIBRARY)
-    target_link_libraries(cxxabi_static PUBLIC unwind_static)
-  endif()
-  set_target_properties(cxxabi_static
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXXABI_ENABLE_STATIC}>,FALSE,TRUE>"
-      LINK_FLAGS "${LIBCXXABI_LINK_FLAGS}"
-      OUTPUT_NAME "${LIBCXXABI_STATIC_OUTPUT_NAME}"
-    )
-  target_link_libraries(cxxabi_static
-    PUBLIC cxxabi_static_objects
-    PRIVATE ${LIBCXXABI_STATIC_LIBRARIES} ${LIBCXXABI_LIBRARIES})
+add_library(cxxabi_static STATIC)
+if (LIBCXXABI_USE_LLVM_UNWINDER AND NOT LIBCXXABI_STATICALLY_LINK_UNWINDER_IN_STATIC_LIBRARY)
+  target_link_libraries(cxxabi_static PUBLIC unwind_static)
+endif()
+set_target_properties(cxxabi_static
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBCXXABI_ENABLE_STATIC}>,FALSE,TRUE>"
+    LINK_FLAGS "${LIBCXXABI_LINK_FLAGS}"
+    OUTPUT_NAME "${LIBCXXABI_STATIC_OUTPUT_NAME}"
+  )
+target_link_libraries(cxxabi_static
+  PUBLIC cxxabi_static_objects
+  PRIVATE ${LIBCXXABI_STATIC_LIBRARIES} ${LIBCXXABI_LIBRARIES})
 
 if (LIBCXXABI_ENABLE_STATIC)
   list(APPEND LIBCXXABI_BUILD_TARGETS "cxxabi_static")
diff --git a/libunwind/src/CMakeLists.txt b/libunwind/src/CMakeLists.txt
index 3065bfc8a07050..2e18b109656331 100644
--- a/libunwind/src/CMakeLists.txt
+++ b/libunwind/src/CMakeLists.txt
@@ -153,17 +153,17 @@ if (CMAKE_POSITION_INDEPENDENT_CODE OR NOT DEFINED CMAKE_POSITION_INDEPENDENT_CO
   set_target_properties(unwind_shared_objects PROPERTIES POSITION_INDEPENDENT_CODE ON) # must set manually because it's an object library
 endif()
 
-  add_library(unwind_shared SHARED)
-  target_link_libraries(unwind_shared PUBLIC unwind_shared_objects)
-  set_target_properties(unwind_shared
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBUNWIND_ENABLE_SHARED}>,FALSE,TRUE>"
-      LINK_FLAGS "${LIBUNWIND_LINK_FLAGS}"
-      LINKER_LANGUAGE C
-      OUTPUT_NAME "${LIBUNWIND_SHARED_OUTPUT_NAME}"
-      VERSION     "${LIBUNWIND_LIBRARY_VERSION}"
-      SOVERSION   "1"
-  )
+add_library(unwind_shared SHARED)
+target_link_libraries(unwind_shared PUBLIC unwind_shared_objects)
+set_target_properties(unwind_shared
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBUNWIND_ENABLE_SHARED}>,FALSE,TRUE>"
+    LINK_FLAGS "${LIBUNWIND_LINK_FLAGS}"
+    LINKER_LANGUAGE C
+    OUTPUT_NAME "${LIBUNWIND_SHARED_OUTPUT_NAME}"
+    VERSION     "${LIBUNWIND_LIBRARY_VERSION}"
+    SOVERSION   "1"
+)
 
 if (LIBUNWIND_ENABLE_SHARED)
   list(APPEND LIBUNWIND_BUILD_TARGETS "unwind_shared")
@@ -200,15 +200,15 @@ if(LIBUNWIND_HIDE_SYMBOLS)
   target_compile_definitions(unwind_static_objects PRIVATE _LIBUNWIND_HIDE_SYMBOLS)
 endif()
 
-  add_library(unwind_static STATIC)
-  target_link_libraries(unwind_static PUBLIC unwind_static_objects)
-  set_target_properties(unwind_static
-    PROPERTIES
-      EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBUNWIND_ENABLE_STATIC}>,FALSE,TRUE>"
-      LINK_FLAGS "${LIBUNWIND_LINK_FLAGS}"
-      LINKER_LANGUAGE C
-      OUTPUT_NAME "${LIBUNWIND_STATIC_OUTPUT_NAME}"
-  )
+add_library(unwind_static STATIC)
+target_link_libraries(unwind_static PUBLIC unwind_static_objects)
+set_target_properties(unwind_static
+  PROPERTIES
+    EXCLUDE_FROM_ALL "$<IF:$<BOOL:${LIBUNWIND_ENABLE_STATIC}>,FALSE,TRUE>"
+    LINK_FLAGS "${LIBUNWIND_LINK_FLAGS}"
+    LINKER_LANGUAGE C
+    OUTPUT_NAME "${LIBUNWIND_STATIC_OUTPUT_NAME}"
+)
 
 if (LIBUNWIND_ENABLE_STATIC)
   list(APPEND LIBUNWIND_BUILD_TARGETS "unwind_static")

>From 9c1d96f59e21ce53884e719237a49d6a9afe3579 Mon Sep 17 00:00:00 2001
From: Justin Fargnoli <jfargnoli at nvidia.com>
Date: Thu, 10 Oct 2024 10:24:02 -0700
Subject: [PATCH 14/20] [NVPTX] Prefer prmt.b32 over bfi.b32 (#110766)

In [[NVPTX] Improve lowering of
v4i8](https://github.com/llvm/llvm-project/commit/cbafb6f2f5c99474164dcc725820cbbeb2e02e14)
@Artem-B added the ability to lower ISD::BUILD_VECTOR with bfi PTX
instructions. @Artem-B gave this reason
([source](https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911)):

> Under the hood byte extraction/insertion ends up as BFI/BFE
instructions, so we may as well do that in PTX, too.
https://godbolt.org/z/Tb3zWbj9b

However, the example that @Artem-B linked was targeting sm_52. On modern
architectures, ptxas uses prmt.b32.
[Example](https://godbolt.org/z/Ye4W1n84o).

Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp  |  31 +-
 llvm/test/CodeGen/NVPTX/i8x4-instructions.ll | 614 ++++++++++---------
 llvm/test/CodeGen/NVPTX/sext-setcc.ll        |  18 +-
 3 files changed, 335 insertions(+), 328 deletions(-)
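
For readers unfamiliar with `prmt.b32`, the following is a minimal software model of its default (generic) mode, written to help read the selector constants that appear in the lowering and in the updated CHECK lines (13120 = 0x3340, 16435 = 0x4033, 21520 = 0x5410). It is a sketch, not code from the patch: `prmt` and `packBytes` are hypothetical names, and the model assumes the byte numbering described in the PTX ISA, where the first source operand holds bytes 0-3 and the second holds bytes 4-7. The sketch uses 0x3340 for both pairs so that the final 0x5410 reads the low half of each intermediate; the patch uses 0x4033 for the second pair, which under this model places those two bytes in the upper half instead.

```cpp
#include <cassert>
#include <cstdint>

// Software model of PTX `prmt.b32 d, a, b, c` in its default (generic) mode.
// Source bytes are numbered 0..7 across {b, a}: a holds bytes 0-3, b holds 4-7.
// Nibble i of the selector c picks the source byte for result byte i; bit 3 of
// the nibble requests replication of that byte's sign bit instead of a copy.
// Requires C++14 or later (loop inside a constexpr function).
constexpr uint32_t prmt(uint32_t a, uint32_t b, uint32_t c) {
  const uint64_t src = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t d = 0;
  for (int i = 0; i < 4; ++i) {
    const uint32_t sel = (c >> (4 * i)) & 0xF;
    uint32_t byte = static_cast<uint32_t>(src >> (8 * (sel & 0x7))) & 0xFF;
    if (sel & 0x8)
      byte = (byte & 0x80) ? 0xFF : 0x00;
    d |= byte << (8 * i);
  }
  return d;
}

// Hypothetical helper: pack four byte-sized values into one 32-bit word with
// e0 in the lowest byte, using three permutes.
constexpr uint32_t packBytes(uint32_t e0, uint32_t e1, uint32_t e2,
                             uint32_t e3) {
  const uint32_t lo = prmt(e0, e1, 0x3340); // bytes 1,0 = e1, e0
  const uint32_t hi = prmt(e2, e3, 0x3340); // bytes 1,0 = e3, e2
  return prmt(lo, hi, 0x5410);              // {hi.b1, hi.b0, lo.b1, lo.b0}
}

// 0x3340 puts the two low source bytes into the low half of the result.
static_assert(prmt(0xAA, 0xBB, 0x3340) == 0x0000BBAAu, "low-half pair");
// 0x4033 puts the two low source bytes into the upper half of the result.
static_assert(prmt(0xCC, 0xDD, 0x4033) == 0xDDCC0000u, "high-half pair");
static_assert(packBytes(0xAA, 0xBB, 0xCC, 0xDD) == 0xDDCCBBAAu, "pack");
```

The `static_assert` lines double as a usage example: they are evaluated at compile time, so the block checks itself when compiled with `-std=c++14` or newer.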

diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 57bc5fe0ac361c..d95f8f214be557 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -2332,20 +2332,23 @@ SDValue NVPTXTargetLowering::LowerBUILD_VECTOR(SDValue Op,
     // Lower non-const v4i8 vector as byte-wise constructed i32, which allows us
     // to optimize calculation of constant parts.
     if (VT == MVT::v4i8) {
-      SDValue C8 = DAG.getConstant(8, DL, MVT::i32);
-      SDValue E01 = DAG.getNode(
-          NVPTXISD::BFI, DL, MVT::i32,
-          DAG.getAnyExtOrTrunc(Op->getOperand(1), DL, MVT::i32),
-          DAG.getAnyExtOrTrunc(Op->getOperand(0), DL, MVT::i32), C8, C8);
-      SDValue E012 =
-          DAG.getNode(NVPTXISD::BFI, DL, MVT::i32,
-                      DAG.getAnyExtOrTrunc(Op->getOperand(2), DL, MVT::i32),
-                      E01, DAG.getConstant(16, DL, MVT::i32), C8);
-      SDValue E0123 =
-          DAG.getNode(NVPTXISD::BFI, DL, MVT::i32,
-                      DAG.getAnyExtOrTrunc(Op->getOperand(3), DL, MVT::i32),
-                      E012, DAG.getConstant(24, DL, MVT::i32), C8);
-      return DAG.getNode(ISD::BITCAST, DL, VT, E0123);
+      SDValue PRMT__10 = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {DAG.getAnyExtOrTrunc(Op->getOperand(0), DL, MVT::i32),
+           DAG.getAnyExtOrTrunc(Op->getOperand(1), DL, MVT::i32),
+           DAG.getConstant(0x3340, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      SDValue PRMT32__ = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {DAG.getAnyExtOrTrunc(Op->getOperand(2), DL, MVT::i32),
+           DAG.getAnyExtOrTrunc(Op->getOperand(3), DL, MVT::i32),
+           DAG.getConstant(0x4033, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      SDValue PRMT3210 = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {PRMT__10, PRMT32__, DAG.getConstant(0x5410, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      return DAG.getNode(ISD::BITCAST, DL, VT, PRMT3210);
     }
     return Op;
   }
diff --git a/llvm/test/CodeGen/NVPTX/i8x4-instructions.ll b/llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
index 96a4359d0ec43e..84dde539ce4c47 100644
--- a/llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
@@ -101,38 +101,38 @@ define <4 x i8> @test_add(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_add(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<13>;
-; CHECK-NEXT:    .reg .b32 %r<19>;
+; CHECK-NEXT:    .reg .b32 %r<18>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_add_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_add_param_0];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r3;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs2, %r4;
 ; CHECK-NEXT:    add.s16 %rs3, %rs2, %rs1;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs3;
-; CHECK-NEXT:    bfe.u32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r2, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs4, %r6;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r7;
 ; CHECK-NEXT:    add.s16 %rs6, %rs5, %rs4;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r5, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r9, %r8, %r5, 16435;
+; CHECK-NEXT:    bfe.u32 %r10, %r2, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs7, %r10;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs8, %r11;
 ; CHECK-NEXT:    add.s16 %rs9, %rs8, %rs7;
 ; CHECK-NEXT:    cvt.u32.u16 %r12, %rs9;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r9, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r2, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs10, %r14;
-; CHECK-NEXT:    bfe.u32 %r15, %r1, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs11, %r15;
+; CHECK-NEXT:    bfe.u32 %r13, %r2, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs10, %r13;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs11, %r14;
 ; CHECK-NEXT:    add.s16 %rs12, %rs11, %rs10;
-; CHECK-NEXT:    cvt.u32.u16 %r16, %rs12;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r13, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r15, %rs12;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r12, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r9, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r17;
 ; CHECK-NEXT:    ret;
   %r = add <4 x i8> %a, %b
@@ -143,29 +143,29 @@ define <4 x i8> @test_add_imm_0(<4 x i8> %a) #0 {
 ; CHECK-LABEL: test_add_imm_0(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<9>;
-; CHECK-NEXT:    .reg .b32 %r<14>;
+; CHECK-NEXT:    .reg .b32 %r<13>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_add_imm_0_param_0];
-; CHECK-NEXT:    bfe.u32 %r2, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r2, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r2;
-; CHECK-NEXT:    add.s16 %rs2, %rs1, 1;
+; CHECK-NEXT:    add.s16 %rs2, %rs1, 4;
 ; CHECK-NEXT:    cvt.u32.u16 %r3, %rs2;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs3, %r4;
-; CHECK-NEXT:    add.s16 %rs4, %rs3, 2;
+; CHECK-NEXT:    add.s16 %rs4, %rs3, 3;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs4;
-; CHECK-NEXT:    bfi.b32 %r6, %r5, %r3, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r6, %r5, %r3, 16435;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r7;
-; CHECK-NEXT:    add.s16 %rs6, %rs5, 3;
+; CHECK-NEXT:    add.s16 %rs6, %rs5, 2;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r6, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r1, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs7, %r10;
-; CHECK-NEXT:    add.s16 %rs8, %rs7, 4;
-; CHECK-NEXT:    cvt.u32.u16 %r11, %rs8;
-; CHECK-NEXT:    bfi.b32 %r12, %r11, %r9, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r1, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs7, %r9;
+; CHECK-NEXT:    add.s16 %rs8, %rs7, 1;
+; CHECK-NEXT:    cvt.u32.u16 %r10, %rs8;
+; CHECK-NEXT:    prmt.b32 %r11, %r10, %r8, 13120;
+; CHECK-NEXT:    prmt.b32 %r12, %r11, %r6, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r12;
 ; CHECK-NEXT:    ret;
   %r = add <4 x i8> <i8 1, i8 2, i8 3, i8 4>, %a
@@ -176,29 +176,29 @@ define <4 x i8> @test_add_imm_1(<4 x i8> %a) #0 {
 ; CHECK-LABEL: test_add_imm_1(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<9>;
-; CHECK-NEXT:    .reg .b32 %r<14>;
+; CHECK-NEXT:    .reg .b32 %r<13>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_add_imm_1_param_0];
-; CHECK-NEXT:    bfe.u32 %r2, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r2, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r2;
-; CHECK-NEXT:    add.s16 %rs2, %rs1, 1;
+; CHECK-NEXT:    add.s16 %rs2, %rs1, 4;
 ; CHECK-NEXT:    cvt.u32.u16 %r3, %rs2;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs3, %r4;
-; CHECK-NEXT:    add.s16 %rs4, %rs3, 2;
+; CHECK-NEXT:    add.s16 %rs4, %rs3, 3;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs4;
-; CHECK-NEXT:    bfi.b32 %r6, %r5, %r3, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r6, %r5, %r3, 16435;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r7;
-; CHECK-NEXT:    add.s16 %rs6, %rs5, 3;
+; CHECK-NEXT:    add.s16 %rs6, %rs5, 2;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r6, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r1, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs7, %r10;
-; CHECK-NEXT:    add.s16 %rs8, %rs7, 4;
-; CHECK-NEXT:    cvt.u32.u16 %r11, %rs8;
-; CHECK-NEXT:    bfi.b32 %r12, %r11, %r9, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r1, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs7, %r9;
+; CHECK-NEXT:    add.s16 %rs8, %rs7, 1;
+; CHECK-NEXT:    cvt.u32.u16 %r10, %rs8;
+; CHECK-NEXT:    prmt.b32 %r11, %r10, %r8, 13120;
+; CHECK-NEXT:    prmt.b32 %r12, %r11, %r6, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r12;
 ; CHECK-NEXT:    ret;
   %r = add <4 x i8> %a, <i8 1, i8 2, i8 3, i8 4>
@@ -209,38 +209,38 @@ define <4 x i8> @test_sub(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_sub(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<13>;
-; CHECK-NEXT:    .reg .b32 %r<19>;
+; CHECK-NEXT:    .reg .b32 %r<18>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_sub_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_sub_param_0];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r3;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs2, %r4;
 ; CHECK-NEXT:    sub.s16 %rs3, %rs2, %rs1;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs3;
-; CHECK-NEXT:    bfe.u32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r2, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs4, %r6;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r7;
 ; CHECK-NEXT:    sub.s16 %rs6, %rs5, %rs4;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r5, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r9, %r8, %r5, 16435;
+; CHECK-NEXT:    bfe.u32 %r10, %r2, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs7, %r10;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs8, %r11;
 ; CHECK-NEXT:    sub.s16 %rs9, %rs8, %rs7;
 ; CHECK-NEXT:    cvt.u32.u16 %r12, %rs9;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r9, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r2, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs10, %r14;
-; CHECK-NEXT:    bfe.u32 %r15, %r1, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs11, %r15;
+; CHECK-NEXT:    bfe.u32 %r13, %r2, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs10, %r13;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs11, %r14;
 ; CHECK-NEXT:    sub.s16 %rs12, %rs11, %rs10;
-; CHECK-NEXT:    cvt.u32.u16 %r16, %rs12;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r13, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r15, %rs12;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r12, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r9, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r17;
 ; CHECK-NEXT:    ret;
   %r = sub <4 x i8> %a, %b
@@ -251,38 +251,38 @@ define <4 x i8> @test_smax(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_smax(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<27>;
+; CHECK-NEXT:    .reg .b32 %r<26>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_smax_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_smax_param_0];
-; CHECK-NEXT:    bfe.s32 %r3, %r2, 24, 8;
-; CHECK-NEXT:    bfe.s32 %r4, %r1, 24, 8;
+; CHECK-NEXT:    bfe.s32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r4, %r1, 0, 8;
 ; CHECK-NEXT:    setp.gt.s32 %p1, %r4, %r3;
-; CHECK-NEXT:    bfe.s32 %r5, %r2, 16, 8;
-; CHECK-NEXT:    bfe.s32 %r6, %r1, 16, 8;
+; CHECK-NEXT:    bfe.s32 %r5, %r2, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    setp.gt.s32 %p2, %r6, %r5;
-; CHECK-NEXT:    bfe.s32 %r7, %r2, 8, 8;
-; CHECK-NEXT:    bfe.s32 %r8, %r1, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r7, %r2, 16, 8;
+; CHECK-NEXT:    bfe.s32 %r8, %r1, 16, 8;
 ; CHECK-NEXT:    setp.gt.s32 %p3, %r8, %r7;
-; CHECK-NEXT:    bfe.s32 %r9, %r2, 0, 8;
-; CHECK-NEXT:    bfe.s32 %r10, %r1, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r9, %r2, 24, 8;
+; CHECK-NEXT:    bfe.s32 %r10, %r1, 24, 8;
 ; CHECK-NEXT:    setp.gt.s32 %p4, %r10, %r9;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r12, %r1, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r13, %r1, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r15, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r13, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r15, %r2, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r16, %r14, %r15, %p4;
-; CHECK-NEXT:    bfe.u32 %r17, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r17, %r2, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r18, %r13, %r17, %p3;
-; CHECK-NEXT:    bfi.b32 %r19, %r18, %r16, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r20, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r19, %r18, %r16, 16435;
+; CHECK-NEXT:    bfe.u32 %r20, %r2, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r21, %r12, %r20, %p2;
-; CHECK-NEXT:    bfi.b32 %r22, %r21, %r19, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r23, %r2, 24, 8;
-; CHECK-NEXT:    selp.b32 %r24, %r11, %r23, %p1;
-; CHECK-NEXT:    bfi.b32 %r25, %r24, %r22, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r22, %r2, 0, 8;
+; CHECK-NEXT:    selp.b32 %r23, %r11, %r22, %p1;
+; CHECK-NEXT:    prmt.b32 %r24, %r23, %r21, 13120;
+; CHECK-NEXT:    prmt.b32 %r25, %r24, %r19, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r25;
 ; CHECK-NEXT:    ret;
   %cmp = icmp sgt <4 x i8> %a, %b
@@ -294,30 +294,30 @@ define <4 x i8> @test_umax(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_umax(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<19>;
+; CHECK-NEXT:    .reg .b32 %r<18>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_umax_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_umax_param_0];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p1, %r4, %r3;
-; CHECK-NEXT:    bfe.u32 %r5, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r6, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p2, %r6, %r5;
-; CHECK-NEXT:    bfe.u32 %r7, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r8, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r1, 16, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p3, %r8, %r7;
-; CHECK-NEXT:    bfe.u32 %r9, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r1, 24, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p4, %r10, %r9;
 ; CHECK-NEXT:    selp.b32 %r11, %r10, %r9, %p4;
 ; CHECK-NEXT:    selp.b32 %r12, %r8, %r7, %p3;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r11, 8, 8;
+; CHECK-NEXT:    prmt.b32 %r13, %r12, %r11, 16435;
 ; CHECK-NEXT:    selp.b32 %r14, %r6, %r5, %p2;
-; CHECK-NEXT:    bfi.b32 %r15, %r14, %r13, 16, 8;
-; CHECK-NEXT:    selp.b32 %r16, %r4, %r3, %p1;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r15, 24, 8;
+; CHECK-NEXT:    selp.b32 %r15, %r4, %r3, %p1;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r14, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r13, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r17;
 ; CHECK-NEXT:    ret;
   %cmp = icmp ugt <4 x i8> %a, %b
@@ -329,38 +329,38 @@ define <4 x i8> @test_smin(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_smin(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<27>;
+; CHECK-NEXT:    .reg .b32 %r<26>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_smin_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_smin_param_0];
-; CHECK-NEXT:    bfe.s32 %r3, %r2, 24, 8;
-; CHECK-NEXT:    bfe.s32 %r4, %r1, 24, 8;
+; CHECK-NEXT:    bfe.s32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r4, %r1, 0, 8;
 ; CHECK-NEXT:    setp.le.s32 %p1, %r4, %r3;
-; CHECK-NEXT:    bfe.s32 %r5, %r2, 16, 8;
-; CHECK-NEXT:    bfe.s32 %r6, %r1, 16, 8;
+; CHECK-NEXT:    bfe.s32 %r5, %r2, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    setp.le.s32 %p2, %r6, %r5;
-; CHECK-NEXT:    bfe.s32 %r7, %r2, 8, 8;
-; CHECK-NEXT:    bfe.s32 %r8, %r1, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r7, %r2, 16, 8;
+; CHECK-NEXT:    bfe.s32 %r8, %r1, 16, 8;
 ; CHECK-NEXT:    setp.le.s32 %p3, %r8, %r7;
-; CHECK-NEXT:    bfe.s32 %r9, %r2, 0, 8;
-; CHECK-NEXT:    bfe.s32 %r10, %r1, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r9, %r2, 24, 8;
+; CHECK-NEXT:    bfe.s32 %r10, %r1, 24, 8;
 ; CHECK-NEXT:    setp.le.s32 %p4, %r10, %r9;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r12, %r1, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r13, %r1, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r15, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r13, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r15, %r2, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r16, %r14, %r15, %p4;
-; CHECK-NEXT:    bfe.u32 %r17, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r17, %r2, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r18, %r13, %r17, %p3;
-; CHECK-NEXT:    bfi.b32 %r19, %r18, %r16, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r20, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r19, %r18, %r16, 16435;
+; CHECK-NEXT:    bfe.u32 %r20, %r2, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r21, %r12, %r20, %p2;
-; CHECK-NEXT:    bfi.b32 %r22, %r21, %r19, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r23, %r2, 24, 8;
-; CHECK-NEXT:    selp.b32 %r24, %r11, %r23, %p1;
-; CHECK-NEXT:    bfi.b32 %r25, %r24, %r22, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r22, %r2, 0, 8;
+; CHECK-NEXT:    selp.b32 %r23, %r11, %r22, %p1;
+; CHECK-NEXT:    prmt.b32 %r24, %r23, %r21, 13120;
+; CHECK-NEXT:    prmt.b32 %r25, %r24, %r19, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r25;
 ; CHECK-NEXT:    ret;
   %cmp = icmp sle <4 x i8> %a, %b
@@ -372,30 +372,30 @@ define <4 x i8> @test_umin(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_umin(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<19>;
+; CHECK-NEXT:    .reg .b32 %r<18>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_umin_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_umin_param_0];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
 ; CHECK-NEXT:    setp.ls.u32 %p1, %r4, %r3;
-; CHECK-NEXT:    bfe.u32 %r5, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r6, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    setp.ls.u32 %p2, %r6, %r5;
-; CHECK-NEXT:    bfe.u32 %r7, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r8, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r1, 16, 8;
 ; CHECK-NEXT:    setp.ls.u32 %p3, %r8, %r7;
-; CHECK-NEXT:    bfe.u32 %r9, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r1, 24, 8;
 ; CHECK-NEXT:    setp.ls.u32 %p4, %r10, %r9;
 ; CHECK-NEXT:    selp.b32 %r11, %r10, %r9, %p4;
 ; CHECK-NEXT:    selp.b32 %r12, %r8, %r7, %p3;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r11, 8, 8;
+; CHECK-NEXT:    prmt.b32 %r13, %r12, %r11, 16435;
 ; CHECK-NEXT:    selp.b32 %r14, %r6, %r5, %p2;
-; CHECK-NEXT:    bfi.b32 %r15, %r14, %r13, 16, 8;
-; CHECK-NEXT:    selp.b32 %r16, %r4, %r3, %p1;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r15, 24, 8;
+; CHECK-NEXT:    selp.b32 %r15, %r4, %r3, %p1;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r14, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r13, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r17;
 ; CHECK-NEXT:    ret;
   %cmp = icmp ule <4 x i8> %a, %b
@@ -407,35 +407,35 @@ define <4 x i8> @test_eq(<4 x i8> %a, <4 x i8> %b, <4 x i8> %c) #0 {
 ; CHECK-LABEL: test_eq(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<24>;
+; CHECK-NEXT:    .reg .b32 %r<23>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r3, [test_eq_param_2];
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_eq_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_eq_param_0];
-; CHECK-NEXT:    bfe.u32 %r4, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r5, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r1, 0, 8;
 ; CHECK-NEXT:    setp.eq.u32 %p1, %r5, %r4;
-; CHECK-NEXT:    bfe.u32 %r6, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
 ; CHECK-NEXT:    setp.eq.u32 %p2, %r7, %r6;
-; CHECK-NEXT:    bfe.u32 %r8, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r9, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r1, 16, 8;
 ; CHECK-NEXT:    setp.eq.u32 %p3, %r9, %r8;
-; CHECK-NEXT:    bfe.u32 %r10, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 24, 8;
 ; CHECK-NEXT:    setp.eq.u32 %p4, %r11, %r10;
-; CHECK-NEXT:    bfe.u32 %r12, %r3, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r3, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r13, %r11, %r12, %p4;
-; CHECK-NEXT:    bfe.u32 %r14, %r3, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r3, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r15, %r9, %r14, %p3;
-; CHECK-NEXT:    bfi.b32 %r16, %r15, %r13, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r17, %r3, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r13, 16435;
+; CHECK-NEXT:    bfe.u32 %r17, %r3, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r18, %r7, %r17, %p2;
-; CHECK-NEXT:    bfi.b32 %r19, %r18, %r16, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r20, %r3, 24, 8;
-; CHECK-NEXT:    selp.b32 %r21, %r5, %r20, %p1;
-; CHECK-NEXT:    bfi.b32 %r22, %r21, %r19, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r19, %r3, 0, 8;
+; CHECK-NEXT:    selp.b32 %r20, %r5, %r19, %p1;
+; CHECK-NEXT:    prmt.b32 %r21, %r20, %r18, 13120;
+; CHECK-NEXT:    prmt.b32 %r22, %r21, %r16, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r22;
 ; CHECK-NEXT:    ret;
   %cmp = icmp eq <4 x i8> %a, %b
@@ -447,35 +447,35 @@ define <4 x i8> @test_ne(<4 x i8> %a, <4 x i8> %b, <4 x i8> %c) #0 {
 ; CHECK-LABEL: test_ne(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<24>;
+; CHECK-NEXT:    .reg .b32 %r<23>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r3, [test_ne_param_2];
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_ne_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_ne_param_0];
-; CHECK-NEXT:    bfe.u32 %r4, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r5, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r1, 0, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p1, %r5, %r4;
-; CHECK-NEXT:    bfe.u32 %r6, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p2, %r7, %r6;
-; CHECK-NEXT:    bfe.u32 %r8, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r9, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r1, 16, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p3, %r9, %r8;
-; CHECK-NEXT:    bfe.u32 %r10, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 24, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p4, %r11, %r10;
-; CHECK-NEXT:    bfe.u32 %r12, %r3, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r3, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r13, %r11, %r12, %p4;
-; CHECK-NEXT:    bfe.u32 %r14, %r3, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r3, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r15, %r9, %r14, %p3;
-; CHECK-NEXT:    bfi.b32 %r16, %r15, %r13, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r17, %r3, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r13, 16435;
+; CHECK-NEXT:    bfe.u32 %r17, %r3, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r18, %r7, %r17, %p2;
-; CHECK-NEXT:    bfi.b32 %r19, %r18, %r16, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r20, %r3, 24, 8;
-; CHECK-NEXT:    selp.b32 %r21, %r5, %r20, %p1;
-; CHECK-NEXT:    bfi.b32 %r22, %r21, %r19, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r19, %r3, 0, 8;
+; CHECK-NEXT:    selp.b32 %r20, %r5, %r19, %p1;
+; CHECK-NEXT:    prmt.b32 %r21, %r20, %r18, 13120;
+; CHECK-NEXT:    prmt.b32 %r22, %r21, %r16, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r22;
 ; CHECK-NEXT:    ret;
   %cmp = icmp ne <4 x i8> %a, %b
@@ -487,38 +487,38 @@ define <4 x i8> @test_mul(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: test_mul(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<13>;
-; CHECK-NEXT:    .reg .b32 %r<19>;
+; CHECK-NEXT:    .reg .b32 %r<18>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_mul_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_mul_param_0];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r3;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs2, %r4;
 ; CHECK-NEXT:    mul.lo.s16 %rs3, %rs2, %rs1;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs3;
-; CHECK-NEXT:    bfe.u32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r2, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs4, %r6;
-; CHECK-NEXT:    bfe.u32 %r7, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r7;
 ; CHECK-NEXT:    mul.lo.s16 %rs6, %rs5, %rs4;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r5, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r9, %r8, %r5, 16435;
+; CHECK-NEXT:    bfe.u32 %r10, %r2, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs7, %r10;
-; CHECK-NEXT:    bfe.u32 %r11, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs8, %r11;
 ; CHECK-NEXT:    mul.lo.s16 %rs9, %rs8, %rs7;
 ; CHECK-NEXT:    cvt.u32.u16 %r12, %rs9;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r9, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r2, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs10, %r14;
-; CHECK-NEXT:    bfe.u32 %r15, %r1, 24, 8;
-; CHECK-NEXT:    cvt.u16.u32 %rs11, %r15;
+; CHECK-NEXT:    bfe.u32 %r13, %r2, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs10, %r13;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
+; CHECK-NEXT:    cvt.u16.u32 %rs11, %r14;
 ; CHECK-NEXT:    mul.lo.s16 %rs12, %rs11, %rs10;
-; CHECK-NEXT:    cvt.u32.u16 %r16, %rs12;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r13, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r15, %rs12;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r12, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r9, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r17;
 ; CHECK-NEXT:    ret;
   %r = mul <4 x i8> %a, %b
@@ -548,12 +548,13 @@ define <4 x i8> @test_or_computed(i8 %a) {
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u8 %rs1, [test_or_computed_param_0];
-; CHECK-NEXT:    cvt.u32.u16 %r1, %rs1;
-; CHECK-NEXT:    bfi.b32 %r2, 0, %r1, 8, 8;
-; CHECK-NEXT:    bfi.b32 %r3, 0, %r2, 16, 8;
-; CHECK-NEXT:    bfi.b32 %r4, 0, %r3, 24, 8;
-; CHECK-NEXT:    bfi.b32 %r6, 5, %r4, 8, 8;
-; CHECK-NEXT:    or.b32 %r8, %r6, %r4;
+; CHECK-NEXT:    mov.b32 %r1, 0;
+; CHECK-NEXT:    prmt.b32 %r2, %r1, 0, 16435;
+; CHECK-NEXT:    cvt.u32.u16 %r3, %rs1;
+; CHECK-NEXT:    prmt.b32 %r4, %r3, 0, 13120;
+; CHECK-NEXT:    prmt.b32 %r5, %r4, %r2, 21520;
+; CHECK-NEXT:    bfi.b32 %r6, 5, %r5, 8, 8;
+; CHECK-NEXT:    or.b32 %r8, %r6, %r5;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r8;
 ; CHECK-NEXT:    ret;
   %ins.0 = insertelement <4 x i8> zeroinitializer, i8 %a, i32 0
@@ -613,12 +614,13 @@ define <4 x i8> @test_xor_computed(i8 %a) {
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u8 %rs1, [test_xor_computed_param_0];
-; CHECK-NEXT:    cvt.u32.u16 %r1, %rs1;
-; CHECK-NEXT:    bfi.b32 %r2, 0, %r1, 8, 8;
-; CHECK-NEXT:    bfi.b32 %r3, 0, %r2, 16, 8;
-; CHECK-NEXT:    bfi.b32 %r4, 0, %r3, 24, 8;
-; CHECK-NEXT:    bfi.b32 %r6, 5, %r4, 8, 8;
-; CHECK-NEXT:    xor.b32 %r8, %r6, %r4;
+; CHECK-NEXT:    mov.b32 %r1, 0;
+; CHECK-NEXT:    prmt.b32 %r2, %r1, 0, 16435;
+; CHECK-NEXT:    cvt.u32.u16 %r3, %rs1;
+; CHECK-NEXT:    prmt.b32 %r4, %r3, 0, 13120;
+; CHECK-NEXT:    prmt.b32 %r5, %r4, %r2, 21520;
+; CHECK-NEXT:    bfi.b32 %r6, 5, %r5, 8, 8;
+; CHECK-NEXT:    xor.b32 %r8, %r6, %r5;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r8;
 ; CHECK-NEXT:    ret;
   %ins.0 = insertelement <4 x i8> zeroinitializer, i8 %a, i32 0
@@ -678,12 +680,13 @@ define <4 x i8> @test_and_computed(i8 %a) {
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u8 %rs1, [test_and_computed_param_0];
-; CHECK-NEXT:    cvt.u32.u16 %r1, %rs1;
-; CHECK-NEXT:    bfi.b32 %r2, 0, %r1, 8, 8;
-; CHECK-NEXT:    bfi.b32 %r3, 0, %r2, 16, 8;
-; CHECK-NEXT:    bfi.b32 %r4, 0, %r3, 24, 8;
-; CHECK-NEXT:    bfi.b32 %r6, 5, %r4, 8, 8;
-; CHECK-NEXT:    and.b32 %r8, %r6, %r4;
+; CHECK-NEXT:    mov.b32 %r1, 0;
+; CHECK-NEXT:    prmt.b32 %r2, %r1, 0, 16435;
+; CHECK-NEXT:    cvt.u32.u16 %r3, %rs1;
+; CHECK-NEXT:    prmt.b32 %r4, %r3, 0, 13120;
+; CHECK-NEXT:    prmt.b32 %r5, %r4, %r2, 21520;
+; CHECK-NEXT:    bfi.b32 %r6, 5, %r5, 8, 8;
+; CHECK-NEXT:    and.b32 %r8, %r6, %r5;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r8;
 ; CHECK-NEXT:    ret;
   %ins.0 = insertelement <4 x i8> zeroinitializer, i8 %a, i32 0
@@ -926,40 +929,40 @@ define <4 x i8> @test_select_cc(<4 x i8> %a, <4 x i8> %b, <4 x i8> %c, <4 x i8>
 ; CHECK-LABEL: test_select_cc(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<29>;
+; CHECK-NEXT:    .reg .b32 %r<28>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u32 %r4, [test_select_cc_param_3];
 ; CHECK-NEXT:    ld.param.u32 %r3, [test_select_cc_param_2];
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_select_cc_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_select_cc_param_0];
-; CHECK-NEXT:    bfe.u32 %r5, %r4, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r6, %r3, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r4, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r3, 0, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p1, %r6, %r5;
-; CHECK-NEXT:    bfe.u32 %r7, %r4, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r8, %r3, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r4, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r3, 8, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p2, %r8, %r7;
-; CHECK-NEXT:    bfe.u32 %r9, %r4, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r3, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r4, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r3, 16, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p3, %r10, %r9;
-; CHECK-NEXT:    bfe.u32 %r11, %r4, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r12, %r3, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r11, %r4, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r3, 24, 8;
 ; CHECK-NEXT:    setp.ne.u32 %p4, %r12, %r11;
-; CHECK-NEXT:    bfe.u32 %r13, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r14, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r13, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r1, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r15, %r14, %r13, %p4;
-; CHECK-NEXT:    bfe.u32 %r16, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r17, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r16, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r17, %r1, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r18, %r17, %r16, %p3;
-; CHECK-NEXT:    bfi.b32 %r19, %r18, %r15, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r20, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r21, %r1, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r19, %r18, %r15, 16435;
+; CHECK-NEXT:    bfe.u32 %r20, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r21, %r1, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r22, %r21, %r20, %p2;
-; CHECK-NEXT:    bfi.b32 %r23, %r22, %r19, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r24, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r25, %r1, 24, 8;
-; CHECK-NEXT:    selp.b32 %r26, %r25, %r24, %p1;
-; CHECK-NEXT:    bfi.b32 %r27, %r26, %r23, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r23, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r24, %r1, 0, 8;
+; CHECK-NEXT:    selp.b32 %r25, %r24, %r23, %p1;
+; CHECK-NEXT:    prmt.b32 %r26, %r25, %r22, 13120;
+; CHECK-NEXT:    prmt.b32 %r27, %r26, %r19, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r27;
 ; CHECK-NEXT:    ret;
   %cc = icmp ne <4 x i8> %c, %d
@@ -1006,32 +1009,32 @@ define <4 x i8> @test_select_cc_i8_i32(<4 x i8> %a, <4 x i8> %b,
 ; CHECK-LABEL: test_select_cc_i8_i32(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
-; CHECK-NEXT:    .reg .b32 %r<27>;
+; CHECK-NEXT:    .reg .b32 %r<26>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.v4.u32 {%r7, %r8, %r9, %r10}, [test_select_cc_i8_i32_param_3];
 ; CHECK-NEXT:    ld.param.v4.u32 {%r3, %r4, %r5, %r6}, [test_select_cc_i8_i32_param_2];
 ; CHECK-NEXT:    ld.param.u32 %r2, [test_select_cc_i8_i32_param_1];
 ; CHECK-NEXT:    ld.param.u32 %r1, [test_select_cc_i8_i32_param_0];
-; CHECK-NEXT:    setp.ne.s32 %p1, %r6, %r10;
-; CHECK-NEXT:    setp.ne.s32 %p2, %r5, %r9;
-; CHECK-NEXT:    setp.ne.s32 %p3, %r4, %r8;
-; CHECK-NEXT:    setp.ne.s32 %p4, %r3, %r7;
-; CHECK-NEXT:    bfe.u32 %r11, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r12, %r1, 0, 8;
+; CHECK-NEXT:    setp.ne.s32 %p1, %r3, %r7;
+; CHECK-NEXT:    setp.ne.s32 %p2, %r4, %r8;
+; CHECK-NEXT:    setp.ne.s32 %p3, %r5, %r9;
+; CHECK-NEXT:    setp.ne.s32 %p4, %r6, %r10;
+; CHECK-NEXT:    bfe.u32 %r11, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r12, %r1, 24, 8;
 ; CHECK-NEXT:    selp.b32 %r13, %r12, %r11, %p4;
-; CHECK-NEXT:    bfe.u32 %r14, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r15, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r14, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r15, %r1, 16, 8;
 ; CHECK-NEXT:    selp.b32 %r16, %r15, %r14, %p3;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r13, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r18, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r19, %r1, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r13, 16435;
+; CHECK-NEXT:    bfe.u32 %r18, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r19, %r1, 8, 8;
 ; CHECK-NEXT:    selp.b32 %r20, %r19, %r18, %p2;
-; CHECK-NEXT:    bfi.b32 %r21, %r20, %r17, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r22, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r23, %r1, 24, 8;
-; CHECK-NEXT:    selp.b32 %r24, %r23, %r22, %p1;
-; CHECK-NEXT:    bfi.b32 %r25, %r24, %r21, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r21, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r22, %r1, 0, 8;
+; CHECK-NEXT:    selp.b32 %r23, %r22, %r21, %p1;
+; CHECK-NEXT:    prmt.b32 %r24, %r23, %r20, 13120;
+; CHECK-NEXT:    prmt.b32 %r25, %r24, %r17, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r25;
 ; CHECK-NEXT:    ret;
                                           <4 x i32> %c, <4 x i32> %d) #0 {
@@ -1044,13 +1047,13 @@ define <4 x i8> @test_select_cc_i8_i32(<4 x i8> %a, <4 x i8> %b,
 define <4 x i8> @test_trunc_2xi32(<4 x i32> %a) #0 {
 ; CHECK-LABEL: test_trunc_2xi32(
 ; CHECK:       {
-; CHECK-NEXT:    .reg .b32 %r<9>;
+; CHECK-NEXT:    .reg .b32 %r<8>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.v4.u32 {%r1, %r2, %r3, %r4}, [test_trunc_2xi32_param_0];
-; CHECK-NEXT:    bfi.b32 %r5, %r2, %r1, 8, 8;
-; CHECK-NEXT:    bfi.b32 %r6, %r3, %r5, 16, 8;
-; CHECK-NEXT:    bfi.b32 %r7, %r4, %r6, 24, 8;
+; CHECK-NEXT:    prmt.b32 %r5, %r3, %r4, 16435;
+; CHECK-NEXT:    prmt.b32 %r6, %r1, %r2, 13120;
+; CHECK-NEXT:    prmt.b32 %r7, %r6, %r5, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r7;
 ; CHECK-NEXT:    ret;
   %r = trunc <4 x i32> %a to <4 x i8>
@@ -1060,19 +1063,19 @@ define <4 x i8> @test_trunc_2xi32(<4 x i32> %a) #0 {
 define <4 x i8> @test_trunc_2xi64(<4 x i64> %a) #0 {
 ; CHECK-LABEL: test_trunc_2xi64(
 ; CHECK:       {
-; CHECK-NEXT:    .reg .b32 %r<9>;
+; CHECK-NEXT:    .reg .b32 %r<8>;
 ; CHECK-NEXT:    .reg .b64 %rd<5>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.v2.u64 {%rd3, %rd4}, [test_trunc_2xi64_param_0+16];
 ; CHECK-NEXT:    ld.param.v2.u64 {%rd1, %rd2}, [test_trunc_2xi64_param_0];
-; CHECK-NEXT:    cvt.u32.u64 %r1, %rd1;
-; CHECK-NEXT:    cvt.u32.u64 %r2, %rd2;
-; CHECK-NEXT:    bfi.b32 %r3, %r2, %r1, 8, 8;
-; CHECK-NEXT:    cvt.u32.u64 %r4, %rd3;
-; CHECK-NEXT:    bfi.b32 %r5, %r4, %r3, 16, 8;
-; CHECK-NEXT:    cvt.u32.u64 %r6, %rd4;
-; CHECK-NEXT:    bfi.b32 %r7, %r6, %r5, 24, 8;
+; CHECK-NEXT:    cvt.u32.u64 %r1, %rd4;
+; CHECK-NEXT:    cvt.u32.u64 %r2, %rd3;
+; CHECK-NEXT:    prmt.b32 %r3, %r2, %r1, 16435;
+; CHECK-NEXT:    cvt.u32.u64 %r4, %rd2;
+; CHECK-NEXT:    cvt.u32.u64 %r5, %rd1;
+; CHECK-NEXT:    prmt.b32 %r6, %r5, %r4, 13120;
+; CHECK-NEXT:    prmt.b32 %r7, %r6, %r3, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r7;
 ; CHECK-NEXT:    ret;
   %r = trunc <4 x i64> %a to <4 x i8>
@@ -1184,15 +1187,16 @@ define <2 x half> @test_bitcast_4xi8_to_2xhalf(i8 %a) #0 {
 ; CHECK-LABEL: test_bitcast_4xi8_to_2xhalf(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<2>;
-; CHECK-NEXT:    .reg .b32 %r<6>;
+; CHECK-NEXT:    .reg .b32 %r<7>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.u8 %rs1, [test_bitcast_4xi8_to_2xhalf_param_0];
-; CHECK-NEXT:    cvt.u32.u16 %r1, %rs1;
-; CHECK-NEXT:    bfi.b32 %r2, 5, %r1, 8, 8;
-; CHECK-NEXT:    bfi.b32 %r3, 6, %r2, 16, 8;
-; CHECK-NEXT:    bfi.b32 %r4, 7, %r3, 24, 8;
-; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r4;
+; CHECK-NEXT:    mov.b32 %r1, 6;
+; CHECK-NEXT:    prmt.b32 %r2, %r1, 7, 16435;
+; CHECK-NEXT:    cvt.u32.u16 %r3, %rs1;
+; CHECK-NEXT:    prmt.b32 %r4, %r3, 5, 13120;
+; CHECK-NEXT:    prmt.b32 %r5, %r4, %r2, 21520;
+; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r5;
 ; CHECK-NEXT:    ret;
   %ins.0 = insertelement <4 x i8> undef, i8 %a, i32 0
   %ins.1 = insertelement <4 x i8> %ins.0, i8 5, i32 1
@@ -1255,27 +1259,27 @@ define <4 x i8> @test_fptosi_4xhalf_to_4xi8(<4 x half> %a) #0 {
 ; CHECK-LABEL: test_fptosi_4xhalf_to_4xi8(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<13>;
-; CHECK-NEXT:    .reg .b32 %r<15>;
+; CHECK-NEXT:    .reg .b32 %r<14>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.v2.u32 {%r3, %r4}, [test_fptosi_4xhalf_to_4xi8_param_0];
-; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r3;
+; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r4;
 ; CHECK-NEXT:    cvt.rzi.s16.f16 %rs3, %rs2;
 ; CHECK-NEXT:    cvt.rzi.s16.f16 %rs4, %rs1;
 ; CHECK-NEXT:    mov.b32 %r5, {%rs4, %rs3};
 ; CHECK-NEXT:    mov.b32 {%rs5, %rs6}, %r5;
-; CHECK-NEXT:    cvt.u32.u16 %r6, %rs5;
-; CHECK-NEXT:    cvt.u32.u16 %r7, %rs6;
-; CHECK-NEXT:    bfi.b32 %r8, %r7, %r6, 8, 8;
-; CHECK-NEXT:    mov.b32 {%rs7, %rs8}, %r4;
+; CHECK-NEXT:    cvt.u32.u16 %r6, %rs6;
+; CHECK-NEXT:    cvt.u32.u16 %r7, %rs5;
+; CHECK-NEXT:    prmt.b32 %r8, %r7, %r6, 16435;
+; CHECK-NEXT:    mov.b32 {%rs7, %rs8}, %r3;
 ; CHECK-NEXT:    cvt.rzi.s16.f16 %rs9, %rs8;
 ; CHECK-NEXT:    cvt.rzi.s16.f16 %rs10, %rs7;
 ; CHECK-NEXT:    mov.b32 %r9, {%rs10, %rs9};
 ; CHECK-NEXT:    mov.b32 {%rs11, %rs12}, %r9;
-; CHECK-NEXT:    cvt.u32.u16 %r10, %rs11;
-; CHECK-NEXT:    bfi.b32 %r11, %r10, %r8, 16, 8;
-; CHECK-NEXT:    cvt.u32.u16 %r12, %rs12;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r11, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r10, %rs12;
+; CHECK-NEXT:    cvt.u32.u16 %r11, %rs11;
+; CHECK-NEXT:    prmt.b32 %r12, %r11, %r10, 13120;
+; CHECK-NEXT:    prmt.b32 %r13, %r12, %r8, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r13;
 ; CHECK-NEXT:    ret;
   %r = fptosi <4 x half> %a to <4 x i8>
@@ -1286,27 +1290,27 @@ define <4 x i8> @test_fptoui_4xhalf_to_4xi8(<4 x half> %a) #0 {
 ; CHECK-LABEL: test_fptoui_4xhalf_to_4xi8(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<13>;
-; CHECK-NEXT:    .reg .b32 %r<15>;
+; CHECK-NEXT:    .reg .b32 %r<14>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.v2.u32 {%r3, %r4}, [test_fptoui_4xhalf_to_4xi8_param_0];
-; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r3;
+; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r4;
 ; CHECK-NEXT:    cvt.rzi.u16.f16 %rs3, %rs2;
 ; CHECK-NEXT:    cvt.rzi.u16.f16 %rs4, %rs1;
 ; CHECK-NEXT:    mov.b32 %r5, {%rs4, %rs3};
 ; CHECK-NEXT:    mov.b32 {%rs5, %rs6}, %r5;
-; CHECK-NEXT:    cvt.u32.u16 %r6, %rs5;
-; CHECK-NEXT:    cvt.u32.u16 %r7, %rs6;
-; CHECK-NEXT:    bfi.b32 %r8, %r7, %r6, 8, 8;
-; CHECK-NEXT:    mov.b32 {%rs7, %rs8}, %r4;
+; CHECK-NEXT:    cvt.u32.u16 %r6, %rs6;
+; CHECK-NEXT:    cvt.u32.u16 %r7, %rs5;
+; CHECK-NEXT:    prmt.b32 %r8, %r7, %r6, 16435;
+; CHECK-NEXT:    mov.b32 {%rs7, %rs8}, %r3;
 ; CHECK-NEXT:    cvt.rzi.u16.f16 %rs9, %rs8;
 ; CHECK-NEXT:    cvt.rzi.u16.f16 %rs10, %rs7;
 ; CHECK-NEXT:    mov.b32 %r9, {%rs10, %rs9};
 ; CHECK-NEXT:    mov.b32 {%rs11, %rs12}, %r9;
-; CHECK-NEXT:    cvt.u32.u16 %r10, %rs11;
-; CHECK-NEXT:    bfi.b32 %r11, %r10, %r8, 16, 8;
-; CHECK-NEXT:    cvt.u32.u16 %r12, %rs12;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r11, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r10, %rs12;
+; CHECK-NEXT:    cvt.u32.u16 %r11, %rs11;
+; CHECK-NEXT:    prmt.b32 %r12, %r11, %r10, 13120;
+; CHECK-NEXT:    prmt.b32 %r13, %r12, %r8, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r13;
 ; CHECK-NEXT:    ret;
   %r = fptoui <4 x half> %a to <4 x i8>
@@ -1326,33 +1330,33 @@ define void @test_srem_v4i8(ptr %a, ptr %b, ptr %c) {
 ; CHECK-NEXT:    ld.param.u64 %rd1, [test_srem_v4i8_param_0];
 ; CHECK-NEXT:    ld.u32 %r1, [%rd1];
 ; CHECK-NEXT:    ld.u32 %r2, [%rd2];
-; CHECK-NEXT:    bfe.s32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r3, %r2, 24, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs1, %r3;
-; CHECK-NEXT:    bfe.s32 %r4, %r1, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r4, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs2, %r4;
 ; CHECK-NEXT:    rem.s16 %rs3, %rs2, %rs1;
 ; CHECK-NEXT:    cvt.u32.u16 %r5, %rs3;
-; CHECK-NEXT:    bfe.s32 %r6, %r2, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r6, %r2, 16, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs4, %r6;
-; CHECK-NEXT:    bfe.s32 %r7, %r1, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r7, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs5, %r7;
 ; CHECK-NEXT:    rem.s16 %rs6, %rs5, %rs4;
 ; CHECK-NEXT:    cvt.u32.u16 %r8, %rs6;
-; CHECK-NEXT:    bfi.b32 %r9, %r8, %r5, 8, 8;
-; CHECK-NEXT:    bfe.s32 %r10, %r2, 16, 8;
+; CHECK-NEXT:    prmt.b32 %r9, %r8, %r5, 16435;
+; CHECK-NEXT:    bfe.s32 %r10, %r2, 8, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs7, %r10;
-; CHECK-NEXT:    bfe.s32 %r11, %r1, 16, 8;
+; CHECK-NEXT:    bfe.s32 %r11, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs8, %r11;
 ; CHECK-NEXT:    rem.s16 %rs9, %rs8, %rs7;
 ; CHECK-NEXT:    cvt.u32.u16 %r12, %rs9;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r9, 16, 8;
-; CHECK-NEXT:    bfe.s32 %r14, %r2, 24, 8;
-; CHECK-NEXT:    cvt.s8.s32 %rs10, %r14;
-; CHECK-NEXT:    bfe.s32 %r15, %r1, 24, 8;
-; CHECK-NEXT:    cvt.s8.s32 %rs11, %r15;
+; CHECK-NEXT:    bfe.s32 %r13, %r2, 0, 8;
+; CHECK-NEXT:    cvt.s8.s32 %rs10, %r13;
+; CHECK-NEXT:    bfe.s32 %r14, %r1, 0, 8;
+; CHECK-NEXT:    cvt.s8.s32 %rs11, %r14;
 ; CHECK-NEXT:    rem.s16 %rs12, %rs11, %rs10;
-; CHECK-NEXT:    cvt.u32.u16 %r16, %rs12;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r13, 24, 8;
+; CHECK-NEXT:    cvt.u32.u16 %r15, %rs12;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r12, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r9, 21520;
 ; CHECK-NEXT:    st.u32 [%rd3], %r17;
 ; CHECK-NEXT:    ret;
 entry:
@@ -1373,7 +1377,7 @@ define void @test_srem_v3i8(ptr %a, ptr %b, ptr %c) {
 ; CHECK-LABEL: test_srem_v3i8(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b16 %rs<20>;
-; CHECK-NEXT:    .reg .b32 %r<16>;
+; CHECK-NEXT:    .reg .b32 %r<17>;
 ; CHECK-NEXT:    .reg .b64 %rd<4>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0: // %entry
@@ -1392,25 +1396,25 @@ define void @test_srem_v3i8(ptr %a, ptr %b, ptr %c) {
 ; CHECK-NEXT:    or.b16 %rs9, %rs8, %rs6;
 ; CHECK-NEXT:    cvt.u32.u16 %r3, %rs9;
 ; CHECK-NEXT:    ld.s8 %rs10, [%rd2+2];
-; CHECK-NEXT:    bfe.s32 %r5, %r3, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r5, %r3, 8, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs11, %r5;
-; CHECK-NEXT:    bfe.s32 %r6, %r1, 0, 8;
+; CHECK-NEXT:    bfe.s32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs12, %r6;
 ; CHECK-NEXT:    rem.s16 %rs13, %rs12, %rs11;
 ; CHECK-NEXT:    cvt.u32.u16 %r7, %rs13;
-; CHECK-NEXT:    bfe.s32 %r8, %r3, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r8, %r3, 0, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs14, %r8;
-; CHECK-NEXT:    bfe.s32 %r9, %r1, 8, 8;
+; CHECK-NEXT:    bfe.s32 %r9, %r1, 0, 8;
 ; CHECK-NEXT:    cvt.s8.s32 %rs15, %r9;
 ; CHECK-NEXT:    rem.s16 %rs16, %rs15, %rs14;
 ; CHECK-NEXT:    cvt.u32.u16 %r10, %rs16;
-; CHECK-NEXT:    bfi.b32 %r11, %r10, %r7, 8, 8;
+; CHECK-NEXT:    prmt.b32 %r11, %r10, %r7, 13120;
 ; CHECK-NEXT:    // implicit-def: %r13
-; CHECK-NEXT:    bfi.b32 %r12, %r13, %r11, 16, 8;
-; CHECK-NEXT:    // implicit-def: %r15
-; CHECK-NEXT:    bfi.b32 %r14, %r15, %r12, 24, 8;
+; CHECK-NEXT:    // implicit-def: %r14
+; CHECK-NEXT:    prmt.b32 %r12, %r13, %r14, 16435;
+; CHECK-NEXT:    prmt.b32 %r15, %r11, %r12, 21520;
 ; CHECK-NEXT:    rem.s16 %rs17, %rs5, %rs10;
-; CHECK-NEXT:    cvt.u16.u32 %rs18, %r14;
+; CHECK-NEXT:    { .reg .b16 tmp; mov.b32 {%rs18, tmp}, %r15; }
 ; CHECK-NEXT:    st.u8 [%rd3], %rs18;
 ; CHECK-NEXT:    shr.u16 %rs19, %rs18, 8;
 ; CHECK-NEXT:    st.u8 [%rd3+1], %rs19;
@@ -1437,25 +1441,25 @@ define void @test_sext_v4i1_to_v4i8(ptr %a, ptr %b, ptr %c) {
 ; CHECK-NEXT:    ld.param.u64 %rd1, [test_sext_v4i1_to_v4i8_param_0];
 ; CHECK-NEXT:    ld.u32 %r1, [%rd1];
 ; CHECK-NEXT:    ld.u32 %r2, [%rd2];
-; CHECK-NEXT:    bfe.u32 %r3, %r2, 24, 8;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r2, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 0, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p1, %r4, %r3;
-; CHECK-NEXT:    bfe.u32 %r5, %r2, 16, 8;
-; CHECK-NEXT:    bfe.u32 %r6, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r2, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r6, %r1, 8, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p2, %r6, %r5;
-; CHECK-NEXT:    bfe.u32 %r7, %r2, 8, 8;
-; CHECK-NEXT:    bfe.u32 %r8, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r7, %r2, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r8, %r1, 16, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p3, %r8, %r7;
-; CHECK-NEXT:    bfe.u32 %r9, %r2, 0, 8;
-; CHECK-NEXT:    bfe.u32 %r10, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r9, %r2, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r10, %r1, 24, 8;
 ; CHECK-NEXT:    setp.hi.u32 %p4, %r10, %r9;
 ; CHECK-NEXT:    selp.s32 %r11, -1, 0, %p4;
 ; CHECK-NEXT:    selp.s32 %r12, -1, 0, %p3;
-; CHECK-NEXT:    bfi.b32 %r13, %r12, %r11, 8, 8;
+; CHECK-NEXT:    prmt.b32 %r13, %r12, %r11, 16435;
 ; CHECK-NEXT:    selp.s32 %r14, -1, 0, %p2;
-; CHECK-NEXT:    bfi.b32 %r15, %r14, %r13, 16, 8;
-; CHECK-NEXT:    selp.s32 %r16, -1, 0, %p1;
-; CHECK-NEXT:    bfi.b32 %r17, %r16, %r15, 24, 8;
+; CHECK-NEXT:    selp.s32 %r15, -1, 0, %p1;
+; CHECK-NEXT:    prmt.b32 %r16, %r15, %r14, 13120;
+; CHECK-NEXT:    prmt.b32 %r17, %r16, %r13, 21520;
 ; CHECK-NEXT:    st.u32 [%rd3], %r17;
 ; CHECK-NEXT:    ret;
 entry:
diff --git a/llvm/test/CodeGen/NVPTX/sext-setcc.ll b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
index f471d47077cf0d..8b7e5235443f05 100644
--- a/llvm/test/CodeGen/NVPTX/sext-setcc.ll
+++ b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
@@ -33,35 +33,35 @@ define <4 x i8> @sext_setcc_v4i1_to_v4i8(ptr %p) {
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .pred %p<5>;
 ; CHECK-NEXT:    .reg .b16 %rs<9>;
-; CHECK-NEXT:    .reg .b32 %r<14>;
+; CHECK-NEXT:    .reg .b32 %r<13>;
 ; CHECK-NEXT:    .reg .b64 %rd<2>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:    ld.param.u64 %rd1, [sext_setcc_v4i1_to_v4i8_param_0];
 ; CHECK-NEXT:    ld.u32 %r1, [%rd1];
-; CHECK-NEXT:    bfe.u32 %r2, %r1, 24, 8;
+; CHECK-NEXT:    bfe.u32 %r2, %r1, 0, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs1, %r2;
 ; CHECK-NEXT:    and.b16 %rs2, %rs1, 255;
 ; CHECK-NEXT:    setp.eq.s16 %p1, %rs2, 0;
-; CHECK-NEXT:    bfe.u32 %r3, %r1, 16, 8;
+; CHECK-NEXT:    bfe.u32 %r3, %r1, 8, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs3, %r3;
 ; CHECK-NEXT:    and.b16 %rs4, %rs3, 255;
 ; CHECK-NEXT:    setp.eq.s16 %p2, %rs4, 0;
-; CHECK-NEXT:    bfe.u32 %r4, %r1, 8, 8;
+; CHECK-NEXT:    bfe.u32 %r4, %r1, 16, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs5, %r4;
 ; CHECK-NEXT:    and.b16 %rs6, %rs5, 255;
 ; CHECK-NEXT:    setp.eq.s16 %p3, %rs6, 0;
-; CHECK-NEXT:    bfe.u32 %r5, %r1, 0, 8;
+; CHECK-NEXT:    bfe.u32 %r5, %r1, 24, 8;
 ; CHECK-NEXT:    cvt.u16.u32 %rs7, %r5;
 ; CHECK-NEXT:    and.b16 %rs8, %rs7, 255;
 ; CHECK-NEXT:    setp.eq.s16 %p4, %rs8, 0;
 ; CHECK-NEXT:    selp.s32 %r6, -1, 0, %p4;
 ; CHECK-NEXT:    selp.s32 %r7, -1, 0, %p3;
-; CHECK-NEXT:    bfi.b32 %r8, %r7, %r6, 8, 8;
+; CHECK-NEXT:    prmt.b32 %r8, %r7, %r6, 16435;
 ; CHECK-NEXT:    selp.s32 %r9, -1, 0, %p2;
-; CHECK-NEXT:    bfi.b32 %r10, %r9, %r8, 16, 8;
-; CHECK-NEXT:    selp.s32 %r11, -1, 0, %p1;
-; CHECK-NEXT:    bfi.b32 %r12, %r11, %r10, 24, 8;
+; CHECK-NEXT:    selp.s32 %r10, -1, 0, %p1;
+; CHECK-NEXT:    prmt.b32 %r11, %r10, %r9, 13120;
+; CHECK-NEXT:    prmt.b32 %r12, %r11, %r8, 21520;
 ; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r12;
 ; CHECK-NEXT:    ret;
 entry:

>From 2b6b0adb500c520216b63a7a493662ce7ed22130 Mon Sep 17 00:00:00 2001
From: Peter Klausler <pklausler at nvidia.com>
Date: Thu, 10 Oct 2024 10:24:59 -0700
Subject: [PATCH 15/20] [flang][runtime] Fix runtime crash after bad
 recoverable OPEN (#111454)

When an OPEN statement with a unit number fails in a recoverable manner,
the runtime needs to delete the ExternalFileUnit instance that was
created in the unit map. And we do this too soon -- that instance still
holds some of the I/O statement state that will be used by a later call
into the runtime for EndIoStatement.

Move the code that deletes the unit after a failed but recoverable OPEN
into ExternalIoStatementBase::EndIoStatement, and don't do things
afterwards that would need the I/O statement state that has been
destroyed.
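
The ordering problem described above can be sketched in isolation. This is
a minimal illustration, not the flang runtime code: Unit, UnitMap, and
OpenStatement are hypothetical stand-ins, and only the idea of a
"destroy" flag that is set on failure and acted on at the very end of
EndIoStatement is taken from this patch.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for a unit that also holds per-statement state.
struct Unit {
  int number{0};
  int statementState{0}; // still needed when the statement ends
};

// Hypothetical stand-in for the runtime's unit map.
class UnitMap {
public:
  Unit &Create(int n) {
    auto &slot = units_[n];
    slot = std::make_unique<Unit>();
    slot->number = n;
    return *slot;
  }
  bool Contains(int n) const { return units_.count(n) != 0; }
  void Destroy(int n) { units_.erase(n); }

private:
  std::map<int, std::unique_ptr<Unit>> units_;
};

class OpenStatement {
public:
  OpenStatement(UnitMap &map, Unit &unit) : map_{map}, unit_{unit} {}

  // A recoverable failure only records the request; the unit stays alive
  // because EndIoStatement() still has to read state stored in it.
  void CompleteOperation(bool failed) {
    if (failed) {
      destroy_ = true;
    }
  }

  int EndIoStatement() {
    int result = unit_.statementState; // last read of unit-held state
    int number = unit_.number;         // copy before the unit may vanish
    if (destroy_) {
      map_.Destroy(number); // unit_ dangles from here on; do not touch it
    }
    return result;
  }

private:
  UnitMap &map_;
  Unit &unit_;
  bool destroy_{false};
};
```

In this sketch the unit number is copied out before the unit is released,
and nothing reads unit_ after Destroy(), which mirrors the constraint the
commit message states for the real code.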

Fixes https://github.com/llvm/llvm-project/issues/111404.
---
 flang/runtime/io-stmt.cpp | 14 +++++++++-----
 flang/runtime/io-stmt.h   |  2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/flang/runtime/io-stmt.cpp b/flang/runtime/io-stmt.cpp
index cd7a196335d31e..f24eb929ce748a 100644
--- a/flang/runtime/io-stmt.cpp
+++ b/flang/runtime/io-stmt.cpp
@@ -243,7 +243,15 @@ int ExternalIoStatementBase::EndIoStatement() {
   CompleteOperation();
   auto result{IoStatementBase::EndIoStatement()};
 #if !defined(RT_USE_PSEUDO_FILE_UNIT)
+  auto unitNumber{unit_.unitNumber()};
   unit_.EndIoStatement(); // annihilates *this in unit_.u_
+  if (destroy_) {
+    if (ExternalFileUnit *
+        toClose{ExternalFileUnit::LookUpForClose(unitNumber)}) {
+      toClose->Close(CloseStatus::Delete, *this);
+      toClose->DestroyClosed();
+    }
+  }
 #else
   // Fetch the unit pointer before *this disappears.
   ExternalFileUnit *unitPtr{&unit_};
@@ -329,11 +337,7 @@ void OpenStatementState::CompleteOperation() {
   }
   if (!wasExtant_ && InError()) {
     // Release the new unit on failure
-    if (ExternalFileUnit *
-        toClose{unit().LookUpForClose(unit().unitNumber())}) {
-      toClose->Close(CloseStatus::Delete, *this);
-      toClose->DestroyClosed();
-    }
+    set_destroy();
   }
   IoStatementBase::CompleteOperation();
 }
diff --git a/flang/runtime/io-stmt.h b/flang/runtime/io-stmt.h
index 2e0ca46078ecdc..1f1419b249e5e5 100644
--- a/flang/runtime/io-stmt.h
+++ b/flang/runtime/io-stmt.h
@@ -455,6 +455,7 @@ class ExternalIoStatementBase : public IoStatementBase {
   RT_API_ATTRS MutableModes &mutableModes();
   RT_API_ATTRS ConnectionState &GetConnectionState();
   RT_API_ATTRS int asynchronousID() const { return asynchronousID_; }
+  RT_API_ATTRS void set_destroy(bool yes = true) { destroy_ = yes; }
   RT_API_ATTRS int EndIoStatement();
   RT_API_ATTRS ExternalFileUnit *GetExternalFileUnit() const { return &unit_; }
   RT_API_ATTRS void SetAsynchronous();
@@ -463,6 +464,7 @@ class ExternalIoStatementBase : public IoStatementBase {
 private:
   ExternalFileUnit &unit_;
   int asynchronousID_{-1};
+  bool destroy_{false};
 };
 
 template <Direction DIR>

>From 6faaa654eedb908b1c26b87b8d19734aee212566 Mon Sep 17 00:00:00 2001
From: Peter Klausler <pklausler at nvidia.com>
Date: Thu, 10 Oct 2024 10:25:19 -0700
Subject: [PATCH 16/20] [flang] Fix references to destroyed objects (#111582)

ProgramTree instances are created as local variables in the
Pre(const parser::ProgramUnit &) member function in name resolution.
But references to these ProgramTree instances can persist in
SubprogramNameDetails symbol table entries that outlive that function
call, and they cause trouble later when (e.g.) expression checking has
to deal with a possible forward reference in a function reference.

So put those ProgramTree instances into a longer-lived linked list
within the SemanticsContext.
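
A linked list suits this because std::list never relocates its elements,
so a reference handed out at insertion time stays valid for as long as
the owning object lives. The following is a minimal sketch of that
ownership pattern; Tree and Context are hypothetical stand-ins, not the
real ProgramTree and SemanticsContext classes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for ProgramTree.
struct Tree {
  std::string name;
};

// Hypothetical stand-in for the long-lived context that owns the trees.
class Context {
public:
  // Move the tree into the list and return a reference to the stored
  // element. (Since C++17, emplace_back returns this reference directly.)
  Tree &Save(Tree &&tree) {
    trees_.emplace_back(std::move(tree));
    return trees_.back();
  }
  std::size_t size() const { return trees_.size(); }

private:
  std::list<Tree> trees_; // node-based: later insertions never move elements
};
```

With a std::vector in place of the list, a later Save() could reallocate
the storage and invalidate earlier references, which is the same kind of
dangling reference this patch is meant to remove.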

Might fix some weird crashes reported on big-endian targets (AIX &
Solaris).
---
 flang/{lib => include/flang}/Semantics/program-tree.h | 4 ++--
 flang/include/flang/Semantics/semantics.h             | 7 ++++++-
 flang/lib/Semantics/program-tree.cpp                  | 8 ++++----
 flang/lib/Semantics/resolve-names.cpp                 | 5 +++--
 flang/lib/Semantics/semantics.cpp                     | 4 ++++
 5 files changed, 19 insertions(+), 9 deletions(-)
 rename flang/{lib => include/flang}/Semantics/program-tree.h (97%)

diff --git a/flang/lib/Semantics/program-tree.h b/flang/include/flang/Semantics/program-tree.h
similarity index 97%
rename from flang/lib/Semantics/program-tree.h
rename to flang/include/flang/Semantics/program-tree.h
index ab00261a964a13..1c89e6c175b964 100644
--- a/flang/lib/Semantics/program-tree.h
+++ b/flang/include/flang/Semantics/program-tree.h
@@ -9,8 +9,8 @@
 #ifndef FORTRAN_SEMANTICS_PROGRAM_TREE_H_
 #define FORTRAN_SEMANTICS_PROGRAM_TREE_H_
 
+#include "symbol.h"
 #include "flang/Parser/parse-tree.h"
-#include "flang/Semantics/symbol.h"
 #include <list>
 #include <variant>
 
@@ -35,7 +35,7 @@ class ProgramTree {
       std::list<common::Reference<const parser::GenericSpec>>;
 
   // Build the ProgramTree rooted at one of these program units.
-  static ProgramTree Build(const parser::ProgramUnit &, SemanticsContext &);
+  static ProgramTree &Build(const parser::ProgramUnit &, SemanticsContext &);
   static std::optional<ProgramTree> Build(
       const parser::MainProgram &, SemanticsContext &);
   static std::optional<ProgramTree> Build(
diff --git a/flang/include/flang/Semantics/semantics.h b/flang/include/flang/Semantics/semantics.h
index 606afbe288c38d..c981d86fbd94cb 100644
--- a/flang/include/flang/Semantics/semantics.h
+++ b/flang/include/flang/Semantics/semantics.h
@@ -9,6 +9,8 @@
 #ifndef FORTRAN_SEMANTICS_SEMANTICS_H_
 #define FORTRAN_SEMANTICS_SEMANTICS_H_
 
+#include "module-dependences.h"
+#include "program-tree.h"
 #include "scope.h"
 #include "symbol.h"
 #include "flang/Common/Fortran-features.h"
@@ -17,7 +19,6 @@
 #include "flang/Evaluate/intrinsics.h"
 #include "flang/Evaluate/target.h"
 #include "flang/Parser/message.h"
-#include "flang/Semantics/module-dependences.h"
 #include <iosfwd>
 #include <set>
 #include <string>
@@ -280,6 +281,9 @@ class SemanticsContext {
 
   void DumpSymbols(llvm::raw_ostream &);
 
+  // Top-level ProgramTrees are owned by the SemanticsContext for persistence.
+  ProgramTree &SaveProgramTree(ProgramTree &&);
+
 private:
   struct ScopeIndexComparator {
     bool operator()(parser::CharBlock, parser::CharBlock) const;
@@ -331,6 +335,7 @@ class SemanticsContext {
   ModuleDependences moduleDependences_;
   std::map<const Symbol *, SourceName> moduleFileOutputRenamings_;
   UnorderedSymbolSet isDefined_;
+  std::list<ProgramTree> programTrees_;
 };
 
 class Semantics {
diff --git a/flang/lib/Semantics/program-tree.cpp b/flang/lib/Semantics/program-tree.cpp
index 250f5801b39e1a..86085e78803a23 100644
--- a/flang/lib/Semantics/program-tree.cpp
+++ b/flang/lib/Semantics/program-tree.cpp
@@ -6,7 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "program-tree.h"
+#include "flang/Semantics/program-tree.h"
 #include "flang/Common/idioms.h"
 #include "flang/Parser/char-block.h"
 #include "flang/Semantics/scope.h"
@@ -130,13 +130,13 @@ static ProgramTree BuildModuleTree(
   return node;
 }
 
-ProgramTree ProgramTree::Build(
+ProgramTree &ProgramTree::Build(
     const parser::ProgramUnit &x, SemanticsContext &context) {
   return common::visit(
-      [&](const auto &y) {
+      [&](const auto &y) -> ProgramTree & {
         auto node{Build(y.value(), context)};
         CHECK(node.has_value());
-        return std::move(*node);
+        return context.SaveProgramTree(std::move(*node));
       },
       x.u);
 }
diff --git a/flang/lib/Semantics/resolve-names.cpp b/flang/lib/Semantics/resolve-names.cpp
index e5e03f644f1b00..f1ce0b415ebe9c 100644
--- a/flang/lib/Semantics/resolve-names.cpp
+++ b/flang/lib/Semantics/resolve-names.cpp
@@ -10,7 +10,6 @@
 #include "definable.h"
 #include "mod-file.h"
 #include "pointer-assignment.h"
-#include "program-tree.h"
 #include "resolve-directives.h"
 #include "resolve-names-utils.h"
 #include "rewrite-parse-tree.h"
@@ -32,6 +31,7 @@
 #include "flang/Parser/tools.h"
 #include "flang/Semantics/attr.h"
 #include "flang/Semantics/expression.h"
+#include "flang/Semantics/program-tree.h"
 #include "flang/Semantics/scope.h"
 #include "flang/Semantics/semantics.h"
 #include "flang/Semantics/symbol.h"
@@ -2490,6 +2490,7 @@ Symbol &ScopeHandler::CopySymbol(const SourceName &name, const Symbol &symbol) {
 }
 
 // Look for name only in scope, not in enclosing scopes.
+
 Symbol *ScopeHandler::FindInScope(
     const Scope &scope, const parser::Name &name) {
   return Resolve(name, FindInScope(scope, name.source));
@@ -9120,7 +9121,7 @@ bool ResolveNamesVisitor::Pre(const parser::ProgramUnit &x) {
     ResolveAccParts(context(), x, &topScope_);
     return false;
   }
-  auto root{ProgramTree::Build(x, context())};
+  ProgramTree &root{ProgramTree::Build(x, context())};
   SetScope(topScope_);
   ResolveSpecificationParts(root);
   FinishSpecificationParts(root);
diff --git a/flang/lib/Semantics/semantics.cpp b/flang/lib/Semantics/semantics.cpp
index 637088ff0171c0..58dc1f218b56f4 100644
--- a/flang/lib/Semantics/semantics.cpp
+++ b/flang/lib/Semantics/semantics.cpp
@@ -663,6 +663,10 @@ void SemanticsContext::DumpSymbols(llvm::raw_ostream &os) {
   DoDumpSymbols(os, globalScope());
 }
 
+ProgramTree &SemanticsContext::SaveProgramTree(ProgramTree &&tree) {
+  return programTrees_.emplace_back(std::move(tree));
+}
+
 void Semantics::DumpSymbols(llvm::raw_ostream &os) { context_.DumpSymbols(os); }
 
 void Semantics::DumpSymbolsSources(llvm::raw_ostream &os) const {

>From 9cc0e4890ac73a31d6ae264e182105ffa32067fa Mon Sep 17 00:00:00 2001
From: Peter Klausler <pklausler at nvidia.com>
Date: Thu, 10 Oct 2024 10:25:42 -0700
Subject: [PATCH 17/20] [flang] Minor cleanup (move function into /tools.cpp)
 (#111587)

The semantics utility GetAllNames has declarations in two header files
and a definition that really should be in the common utilities source
file. Remove the redundant declaration from resolve-names-utils.h and
move code from resolve-names-utils.cpp into Semantics/tools.cpp.
---
 flang/lib/Semantics/resolve-names-utils.cpp | 33 ---------------------
 flang/lib/Semantics/resolve-names-utils.h   |  5 ----
 flang/lib/Semantics/tools.cpp               | 31 +++++++++++++++++++
 3 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/flang/lib/Semantics/resolve-names-utils.cpp b/flang/lib/Semantics/resolve-names-utils.cpp
index b8ce8d14a33faa..a838d49c06104d 100644
--- a/flang/lib/Semantics/resolve-names-utils.cpp
+++ b/flang/lib/Semantics/resolve-names-utils.cpp
@@ -31,8 +31,6 @@ using common::NumericOperator;
 using common::RelationalOperator;
 using IntrinsicOperator = parser::DefinedOperator::IntrinsicOperator;
 
-static constexpr const char *operatorPrefix{"operator("};
-
 static GenericKind MapIntrinsicOperator(IntrinsicOperator);
 
 Symbol *Resolve(const parser::Name &name, Symbol *symbol) {
@@ -69,37 +67,6 @@ bool IsIntrinsicOperator(
   return false;
 }
 
-template <typename E>
-std::forward_list<std::string> GetOperatorNames(
-    const SemanticsContext &context, E opr) {
-  std::forward_list<std::string> result;
-  for (const char *name : context.languageFeatures().GetNames(opr)) {
-    result.emplace_front(std::string{operatorPrefix} + name + ')');
-  }
-  return result;
-}
-
-std::forward_list<std::string> GetAllNames(
-    const SemanticsContext &context, const SourceName &name) {
-  std::string str{name.ToString()};
-  if (!name.empty() && name.end()[-1] == ')' &&
-      name.ToString().rfind(std::string{operatorPrefix}, 0) == 0) {
-    for (int i{0}; i != common::LogicalOperator_enumSize; ++i) {
-      auto names{GetOperatorNames(context, LogicalOperator{i})};
-      if (llvm::is_contained(names, str)) {
-        return names;
-      }
-    }
-    for (int i{0}; i != common::RelationalOperator_enumSize; ++i) {
-      auto names{GetOperatorNames(context, RelationalOperator{i})};
-      if (llvm::is_contained(names, str)) {
-        return names;
-      }
-    }
-  }
-  return {str};
-}
-
 bool IsLogicalConstant(
     const SemanticsContext &context, const SourceName &name) {
   std::string str{name.ToString()};
diff --git a/flang/lib/Semantics/resolve-names-utils.h b/flang/lib/Semantics/resolve-names-utils.h
index 5b537d80e5f880..64784722ff4f84 100644
--- a/flang/lib/Semantics/resolve-names-utils.h
+++ b/flang/lib/Semantics/resolve-names-utils.h
@@ -51,11 +51,6 @@ parser::MessageFixedText WithSeverity(
 bool IsIntrinsicOperator(const SemanticsContext &, const SourceName &);
 bool IsLogicalConstant(const SemanticsContext &, const SourceName &);
 
-// Some intrinsic operators have more than one name (e.g. `operator(.eq.)` and
-// `operator(==)`). GetAllNames() returns them all, including symbolName.
-std::forward_list<std::string> GetAllNames(
-    const SemanticsContext &, const SourceName &);
-
 template <typename T>
 MaybeIntExpr EvaluateIntExpr(SemanticsContext &context, const T &expr) {
   if (MaybeExpr maybeExpr{
diff --git a/flang/lib/Semantics/tools.cpp b/flang/lib/Semantics/tools.cpp
index 4d2a0a607abe89..379d5d0eb3eef0 100644
--- a/flang/lib/Semantics/tools.cpp
+++ b/flang/lib/Semantics/tools.cpp
@@ -1654,6 +1654,37 @@ bool HasDefinedIo(common::DefinedIo which, const DerivedTypeSpec &derived,
   return parentType && HasDefinedIo(which, *parentType, scope);
 }
 
+template <typename E>
+std::forward_list<std::string> GetOperatorNames(
+    const SemanticsContext &context, E opr) {
+  std::forward_list<std::string> result;
+  for (const char *name : context.languageFeatures().GetNames(opr)) {
+    result.emplace_front("operator("s + name + ')');
+  }
+  return result;
+}
+
+std::forward_list<std::string> GetAllNames(
+    const SemanticsContext &context, const SourceName &name) {
+  std::string str{name.ToString()};
+  if (!name.empty() && name.end()[-1] == ')' &&
+      name.ToString().rfind("operator(", 0) == 0) {
+    for (int i{0}; i != common::LogicalOperator_enumSize; ++i) {
+      auto names{GetOperatorNames(context, common::LogicalOperator{i})};
+      if (llvm::is_contained(names, str)) {
+        return names;
+      }
+    }
+    for (int i{0}; i != common::RelationalOperator_enumSize; ++i) {
+      auto names{GetOperatorNames(context, common::RelationalOperator{i})};
+      if (llvm::is_contained(names, str)) {
+        return names;
+      }
+    }
+  }
+  return {str};
+}
+
 void WarnOnDeferredLengthCharacterScalar(SemanticsContext &context,
     const SomeExpr *expr, parser::CharBlock at, const char *what) {
   if (context.languageFeatures().ShouldWarn(

>From af9a8f4f42ef402c7ae8def50c714074f31a8e5c Mon Sep 17 00:00:00 2001
From: Adrian Vogelsgesang <avogelsgesang at salesforce.com>
Date: Thu, 10 Oct 2024 19:27:27 +0200
Subject: [PATCH 18/20] [lldb][libc++] Hide all libc++ implementation details
 from stacktraces (#108870)

This commit changes the libc++ frame recognizer to hide implementation
details of libc++ more aggressively. The applied heuristic is rather
straightforward: We consider every function name starting with `__` as
an implementation detail.

This works pretty neatly for `std::invoke`, `std::function`,
`std::sort`, `std::map::emplace` and many others. Also, this should
align quite nicely with libc++'s general coding convention of using the
`__` for their implementation details, thereby keeping the future
maintenance effort low.

However, this heuristic by itself does not work in 100% of the cases:
E.g., `std::ranges::sort` is not a function, but an object with an
overloaded `operator()`, which means that there is no actual call to
`std::ranges::sort` in the call stack. Instead, there is a
`std::ranges::__sort::operator()` call. To make sure that we don't hide
this stack frame, we never hide the frame which represents the entry
point from user code into libc++ code.
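
The heuristic described above can be sketched in a few lines of Python.
This is only an illustration, not the recognizer itself: the two regular
expressions are taken from this patch, but `is_hidden`, `visible_frames`
and the simplified frame names are hypothetical, and a backtrace is
modeled as a plain list of function names, innermost frame first.

```python
import re

# The two patterns used by this patch: names starting with `__` directly
# inside the versioned `std::__N` namespace, or inside `std::__N::ranges`.
HIDDEN_PATTERNS = [
    re.compile(r"^std::__[^:]*::__"),
    re.compile(r"^std::__[^:]*::ranges::__"),
]


def is_hidden(frames, index):
    """Return True if frames[index] would be hidden (hypothetical helper).

    `frames` is a list of function names, innermost frame first.
    """
    name = frames[index]
    if not any(p.search(name) for p in HIDDEN_PATTERNS):
        return False
    # Never hide the entry point from user code into libc++: only hide
    # the frame if its immediate caller is also inside `std::`.
    if index + 1 >= len(frames):
        return False
    return frames[index + 1].startswith("std::")


def visible_frames(frames):
    """Filter out all frames that the heuristic would hide."""
    return [n for i, n in enumerate(frames) if not is_hidden(frames, i)]


# Simplified, made-up frame names for illustration only.
sort_trace = [
    "my_comparator",
    "std::__1::__sort4",
    "std::__1::__introsort",
    "std::__1::sort",
    "main",
]
ranges_trace = [
    "ranges_sort_less",
    "std::__1::ranges::__sort::__sort_fn_impl",
    "std::__1::ranges::__sort::operator()",
    "test_algorithms",
]

print(visible_frames(sort_trace))
print(visible_frames(ranges_trace))
```

In the second trace, `std::__1::ranges::__sort::operator()` matches a
pattern but stays visible, because its caller is user code.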
---
 libcxx/docs/UserDocumentation.rst             | 29 +++++++
 .../CPlusPlus/CPPLanguageRuntime.cpp          | 49 ++++++-----
 .../Makefile                                  |  2 +-
 .../TestLibcxxInternalsRecognizer.py          | 67 +++++++++++++++
 .../cpp/libcxx-internals-recognizer/main.cpp  | 86 +++++++++++++++++++
 .../TestStdInvokeRecognizer.py                | 44 ----------
 .../lang/cpp/std-invoke-recognizer/main.cpp   | 30 -------
 7 files changed, 211 insertions(+), 96 deletions(-)
 rename lldb/test/API/lang/cpp/{std-invoke-recognizer => libcxx-internals-recognizer}/Makefile (68%)
 create mode 100644 lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
 create mode 100644 lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
 delete mode 100644 lldb/test/API/lang/cpp/std-invoke-recognizer/TestStdInvokeRecognizer.py
 delete mode 100644 lldb/test/API/lang/cpp/std-invoke-recognizer/main.cpp

diff --git a/libcxx/docs/UserDocumentation.rst b/libcxx/docs/UserDocumentation.rst
index f5e55994aa7572..1db437ce58b95e 100644
--- a/libcxx/docs/UserDocumentation.rst
+++ b/libcxx/docs/UserDocumentation.rst
@@ -355,6 +355,35 @@ Third-party Integrations
 
 Libc++ provides integration with a few third-party tools.
 
+Debugging libc++ internals in LLDB
+----------------------------------
+
+LLDB hides the implementation details of libc++ by default.
+
+E.g., when setting a breakpoint in a comparator passed to ``std::sort``, the
+backtrace will read as
+
+.. code-block::
+
+  (lldb) thread backtrace
+  * thread #1, name = 'a.out', stop reason = breakpoint 3.1
+    * frame #0: 0x000055555555520e a.out`my_comparator(a=1, b=8) at test-std-sort.cpp:6:3
+      frame #7: 0x0000555555555615 a.out`void std::__1::sort[abi:ne200000]<std::__1::__wrap_iter<int*>, bool (*)(int, int)>(__first=(item = 8), __last=(item = 0), __comp=(a.out`my_less(int, int) at test-std-sort.cpp:5)) at sort.h:1003:3
+      frame #8: 0x000055555555531a a.out`main at test-std-sort.cpp:24:3
+
+Note how the caller of ``my_comparator`` is shown as ``std::sort``. Looking at
+the frame numbers, we can see that frames #1 until #6 were hidden. Those frames
+represent internal implementation details such as ``__sort4`` and similar
+utility functions.
+
+To also show those implementation details, use ``thread backtrace -u``.
+Alternatively, to disable those compact backtraces, use ``frame recognizer list``
+and ``frame recognizer disable`` on the "libc++ frame recognizer".
+
+Furthermore, stepping into libc++ functions is disabled by default. This is controlled via the
+setting ``target.process.thread.step-avoid-regexp`` which defaults to ``^std::`` and can be
+disabled using ``settings set target.process.thread.step-avoid-regexp ""``.
+
 GDB Pretty printers for libc++
 ------------------------------
 
diff --git a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
index faa05e8f834ea1..e7ca3f655f237c 100644
--- a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
@@ -45,7 +45,7 @@ char CPPLanguageRuntime::ID = 0;
 /// A frame recognizer that is installed to hide libc++ implementation
 /// details from the backtrace.
 class LibCXXFrameRecognizer : public StackFrameRecognizer {
-  std::array<RegularExpression, 4> m_hidden_regex;
+  std::array<RegularExpression, 2> m_hidden_regex;
   RecognizedStackFrameSP m_hidden_frame;
 
   struct LibCXXHiddenFrame : public RecognizedStackFrame {
@@ -55,28 +55,17 @@ class LibCXXFrameRecognizer : public StackFrameRecognizer {
 public:
   LibCXXFrameRecognizer()
       : m_hidden_regex{
-            // internal implementation details of std::function
+            // internal implementation details in the `std::` namespace
             //    std::__1::__function::__alloc_func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()[abi:ne200000]
             //    std::__1::__function::__func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()
             //    std::__1::__function::__value_func<void ()>::operator()[abi:ne200000]() const
-            RegularExpression{""
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__function::.*::operator\(\))"},
-            // internal implementation details of std::function in ABI v2
             //    std::__2::__function::__policy_invoker<void (int, int)>::__call_impl[abi:ne200000]<std::__2::__function::__default_alloc_func<int (*)(int, int), int (int, int)>>
-            RegularExpression{""
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__function::.*::__call_impl)"},
-            // internal implementation details of std::invoke
-            //   std::__1::__invoke[abi:ne200000]<void (*&)()>
-            RegularExpression{
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__invoke)"},
-            // internal implementation details of std::invoke
-            //   std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
-            RegularExpression{
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__invoke_void_return_wrapper<.*>::__call)"}
+            //    std::__1::__invoke[abi:ne200000]<void (*&)()>
+            //    std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
+            RegularExpression{R"(^std::__[^:]*::__)"},
+            // internal implementation details in the `std::ranges` namespace
+            //    std::__1::ranges::__sort::__sort_fn_impl[abi:ne200000]<std::__1::__wrap_iter<int*>, std::__1::__wrap_iter<int*>, bool (*)(int, int), std::__1::identity>
+            RegularExpression{R"(^std::__[^:]*::ranges::__)"},
         },
         m_hidden_frame(new LibCXXHiddenFrame()) {}
 
@@ -90,9 +79,27 @@ class LibCXXFrameRecognizer : public StackFrameRecognizer {
     if (!sc.function)
       return {};
 
-    for (RegularExpression &r : m_hidden_regex)
-      if (r.Execute(sc.function->GetNameNoArguments()))
+    // Check if we have a regex match
+    for (RegularExpression &r : m_hidden_regex) {
+      if (!r.Execute(sc.function->GetNameNoArguments()))
+        continue;
+
+      // Only hide this frame if the immediate caller is also within libc++.
+      lldb::ThreadSP thread_sp = frame_sp->GetThread();
+      if (!thread_sp)
+        return {};
+      lldb::StackFrameSP parent_frame_sp =
+          thread_sp->GetStackFrameAtIndex(frame_sp->GetFrameIndex() + 1);
+      if (!parent_frame_sp)
+        return {};
+      const auto &parent_sc =
+          parent_frame_sp->GetSymbolContext(lldb::eSymbolContextFunction);
+      if (!parent_sc.function)
+        return {};
+      if (parent_sc.function->GetNameNoArguments().GetStringRef().starts_with(
+              "std::"))
         return m_hidden_frame;
+    }
 
     return {};
   }
diff --git a/lldb/test/API/lang/cpp/std-invoke-recognizer/Makefile b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
similarity index 68%
rename from lldb/test/API/lang/cpp/std-invoke-recognizer/Makefile
rename to lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
index 69014eb9c0f2eb..bb571299664934 100644
--- a/lldb/test/API/lang/cpp/std-invoke-recognizer/Makefile
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
@@ -1,5 +1,5 @@
 CXX_SOURCES := main.cpp
 USE_LIBCPP := 1
-CXXFLAGS_EXTRAS := -std=c++17
+CXXFLAGS_EXTRAS := -std=c++20
 
 include Makefile.rules
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
new file mode 100644
index 00000000000000..ad48208f21e502
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
@@ -0,0 +1,67 @@
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class LibCxxInternalsRecognizerTestCase(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    @add_test_categories(["libc++"])
+    def test_frame_recognizer(self):
+        """Test that implementation details of libc++ are hidden"""
+        self.build()
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "break here", lldb.SBFileSpec("main.cpp")
+        )
+
+        expected_parents = {
+            "sort_less(int, int)": ["::sort", "test_algorithms"],
+            # `std::ranges::sort` is implemented as an object of type `__sort`.
+            # We never hide the frame of the entry-point into the standard library, even
+            # if the name starts with `__` which usually indicates an internal function.
+            "ranges_sort_less(int, int)": [
+                "ranges::__sort::operator()",
+                "test_algorithms",
+            ],
+            # `ranges::views::transform` internally uses `std::invoke`, and that
+            # call also shows up in the stack trace
+            "view_transform(int)": [
+                "::invoke",
+                "ranges::transform_view",
+                "test_algorithms",
+            ],
+            # Various types of `invoke` calls
+            "consume_number(int)": ["::invoke", "test_invoke"],
+            "invoke_add(int, int)": ["::invoke", "test_invoke"],
+            "Callable::member_function(int) const": ["::invoke", "test_invoke"],
+            "Callable::operator()(int) const": ["::invoke", "test_invoke"],
+            # Containers
+            "MyKey::operator<(MyKey const&) const": [
+                "less",
+                "::emplace",
+                "test_containers",
+            ],
+        }
+        stop_set = set()
+        while process.GetState() != lldb.eStateExited:
+            fn = thread.GetFrameAtIndex(0).GetFunctionName()
+            stop_set.add(fn)
+            self.assertIn(fn, expected_parents.keys())
+            frame_id = 1
+            for expected_parent in expected_parents[fn]:
+                # Skip all hidden frames
+                while (
+                    frame_id < thread.GetNumFrames()
+                    and thread.GetFrameAtIndex(frame_id).IsHidden()
+                ):
+                    frame_id = frame_id + 1
+                # Expect the correct parent frame
+                self.assertIn(
+                    expected_parent, thread.GetFrameAtIndex(frame_id).GetFunctionName()
+                )
+                frame_id = frame_id + 1
+            process.Continue()
+
+        # Make sure that we actually verified all intended scenarios
+        self.assertEqual(len(stop_set), len(expected_parents))
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
new file mode 100644
index 00000000000000..870301b0970439
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
@@ -0,0 +1,86 @@
+#include <algorithm>
+#include <functional>
+#include <map>
+#include <ranges>
+#include <vector>
+
+bool sort_less(int a, int b) {
+  __builtin_printf("break here");
+  return a < b;
+}
+
+bool ranges_sort_less(int a, int b) {
+  __builtin_printf("break here");
+  return a < b;
+}
+
+int view_transform(int a) {
+  __builtin_printf("break here");
+  return a * a;
+}
+
+void test_algorithms() {
+  std::vector<int> vec{8, 1, 3, 2};
+
+  // The internal frames for `std::sort` should be hidden
+  std::sort(vec.begin(), vec.end(), sort_less);
+
+  // The internal frames for `ranges::sort` should be hidden
+  std::ranges::sort(vec.begin(), vec.end(), ranges_sort_less);
+
+  // Same for views
+  for (auto x : vec | std::ranges::views::transform(view_transform)) {
+    // no-op
+  }
+}
+
+void consume_number(int i) { __builtin_printf("break here"); }
+
+int invoke_add(int i, int j) {
+  __builtin_printf("break here");
+  return i + j;
+}
+
+struct Callable {
+  Callable(int num) : num_(num) {}
+  void operator()(int i) const { __builtin_printf("break here"); }
+  void member_function(int i) const { __builtin_printf("break here"); }
+  int num_;
+};
+
+void test_invoke() {
+  // Invoke a void-returning function
+  std::invoke(consume_number, -9);
+
+  // Invoke a non-void-returning function
+  std::invoke(invoke_add, 1, 10);
+
+  // Invoke a member function
+  const Callable foo(314159);
+  std::invoke(&Callable::member_function, foo, 1);
+
+  // Invoke a function object
+  std::invoke(Callable(12), 18);
+}
+
+struct MyKey {
+  int x;
+  bool operator==(const MyKey &) const = default;
+  bool operator<(const MyKey &other) const {
+    __builtin_printf("break here");
+    return x < other.x;
+  }
+};
+
+void test_containers() {
+  std::map<MyKey, int> map;
+  map.emplace(MyKey{1}, 2);
+  map.emplace(MyKey{2}, 3);
+}
+
+int main() {
+  test_algorithms();
+  test_invoke();
+  test_containers();
+  return 0;
+}
diff --git a/lldb/test/API/lang/cpp/std-invoke-recognizer/TestStdInvokeRecognizer.py b/lldb/test/API/lang/cpp/std-invoke-recognizer/TestStdInvokeRecognizer.py
deleted file mode 100644
index dbe29610bf7982..00000000000000
--- a/lldb/test/API/lang/cpp/std-invoke-recognizer/TestStdInvokeRecognizer.py
+++ /dev/null
@@ -1,44 +0,0 @@
-import lldb
-from lldbsuite.test.decorators import *
-from lldbsuite.test.lldbtest import *
-from lldbsuite.test import lldbutil
-
-
-class LibCxxStdFunctionRecognizerTestCase(TestBase):
-    NO_DEBUG_INFO_TESTCASE = True
-
-    @add_test_categories(["libc++"])
-    def test_frame_recognizer(self):
-        """Test that implementation details of `std::invoke` are hidden"""
-        self.build()
-        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
-            self, "break here", lldb.SBFileSpec("main.cpp")
-        )
-
-        stop_cnt = 0
-        while process.GetState() != lldb.eStateExited:
-            stop_cnt += 1
-            self.assertTrue(
-                any(
-                    f in thread.GetFrameAtIndex(0).GetFunctionName()
-                    for f in ["consume_number", "add", "Callable"]
-                )
-            )
-            # Skip all hidden frames
-            frame_id = 1
-            while (
-                frame_id < thread.GetNumFrames()
-                and thread.GetFrameAtIndex(frame_id).IsHidden()
-            ):
-                frame_id = frame_id + 1
-            # Expect `std::invoke` to be the direct parent
-            self.assertIn(
-                "::invoke", thread.GetFrameAtIndex(frame_id).GetFunctionName()
-            )
-            # And right above that, there should be the `main` frame
-            self.assertIn(
-                "main", thread.GetFrameAtIndex(frame_id + 1).GetFunctionName()
-            )
-            process.Continue()
-
-        self.assertEqual(stop_cnt, 4)
diff --git a/lldb/test/API/lang/cpp/std-invoke-recognizer/main.cpp b/lldb/test/API/lang/cpp/std-invoke-recognizer/main.cpp
deleted file mode 100644
index bafbbd28386e8b..00000000000000
--- a/lldb/test/API/lang/cpp/std-invoke-recognizer/main.cpp
+++ /dev/null
@@ -1,30 +0,0 @@
-#include <functional>
-
-void consume_number(int i) { __builtin_printf("break here"); }
-
-int add(int i, int j) {
-  // break here
-  return i + j;
-}
-
-struct Callable {
-  Callable(int num) : num_(num) {}
-  void operator()(int i) const { __builtin_printf("break here"); }
-  void member_function(int i) const { __builtin_printf("break here"); }
-  int num_;
-};
-
-int main() {
-  // Invoke a void-returning function
-  std::invoke(consume_number, -9);
-
-  // Invoke a non-void-returning function
-  std::invoke(add, 1, 10);
-
-  // Invoke a member function
-  const Callable foo(314159);
-  std::invoke(&Callable::member_function, foo, 1);
-
-  // Invoke a function object
-  std::invoke(Callable(12), 18);
-}

>From b32398e5311b1104000de5baa2a1d4182736ff82 Mon Sep 17 00:00:00 2001
From: Valentin Clement (バレンタイン クレメン) <clementval at gmail.com>
Date: Thu, 10 Oct 2024 10:31:03 -0700
Subject: [PATCH 19/20] [flang][runtime][NFC] Fix header guard typo (#111741)

Header guard was not in sync with the filename.
---
 flang/include/flang/Runtime/allocator-registry.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/flang/include/flang/Runtime/allocator-registry.h b/flang/include/flang/Runtime/allocator-registry.h
index acfada506fafc6..3ccee56dc3fc0f 100644
--- a/flang/include/flang/Runtime/allocator-registry.h
+++ b/flang/include/flang/Runtime/allocator-registry.h
@@ -6,8 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
-#ifndef FORTRAN_RUNTIME_ALLOCATOR_H_
-#define FORTRAN_RUNTIME_ALLOCATOR_H_
+#ifndef FORTRAN_RUNTIME_ALLOCATOR_REGISTRY_H_
+#define FORTRAN_RUNTIME_ALLOCATOR_REGISTRY_H_
 
 #include "flang/Common/api-attrs.h"
 #include <cstdlib>
@@ -62,4 +62,4 @@ RT_OFFLOAD_VAR_GROUP_END
 
 } // namespace Fortran::runtime
 
-#endif // FORTRAN_RUNTIME_ALLOCATOR_H_
+#endif // FORTRAN_RUNTIME_ALLOCATOR_REGISTRY_H_

>From a31c1f4b55881d2ec176a850daeb3a61f22f0346 Mon Sep 17 00:00:00 2001
From: Renato Golin <rengolin at systemcall.eu>
Date: Thu, 10 Oct 2024 18:52:20 +0100
Subject: [PATCH 20/20] Fix GCC build problem with 03483737a7a2

---
 mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
index 4f350ea236da84..c909d13e4314b4 100644
--- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
@@ -157,7 +157,7 @@ static void fillStructuredOpRegion(OpBuilder &opBuilder, Region &region,
 
 /// Helper to create a typical indexing map for MatmulOp. Returns a list of
 /// AffineMap.
-static SmallVector<AffineMap>
+static SmallVector<AffineMap, 3>
 getDefaultIndexingMapsForMatmul(MLIRContext *context) {
   AffineExpr d0, d1, d2;
   SmallVector<AffineMap, 3> indexingMaps;


