[Mlir-commits] [mlir] [MLIR] Make More Specific Function Header For StringLiteral Optimization in `Diagnostic` (PR #112154)

Andrew Luo llvmlistbot at llvm.org
Mon Oct 14 12:14:04 PDT 2024


https://github.com/AndrewZhaoLuo updated https://github.com/llvm/llvm-project/pull/112154

>From f01d5dc956ebf393b67784ae15399fdfbf0d55a1 Mon Sep 17 00:00:00 2001
From: Andrew Luo <andrew.zhao.luo at gmail.com>
Date: Sun, 13 Oct 2024 17:24:08 -0700
Subject: [PATCH] [MLIR] Make More Specific Function Header For StringLiteral
 Optimization in `Diagnostic`

Diagnostic stores various notes/error messages which might help the user in debugging. For the most part, the `Diagnostic` when receiving an error message will copy and own the contents of the string.

However, there is one optimization where given a `const char*`, the class will assume this is a StringLiteral which is immutable and lifetime matches that of the entire program. As a result, instead of copying the message in these cases the class will simply store the underlying pointer.

This is problematic since `const char*` is not specific enough to always imply a StringLiteral which can lead to bugs, e.g. if the underlying pointer is freed before the diagnostic reports.

We solve this problem by choosing a more specific function signature. While not full-proof, this should cover a lot more cases.

A potentially better alternative is just deleting this special handling of string literals, but I am unsure of the implications (it does sound safe to do however with a negligble impact on performance).
---
 mlir/include/mlir/IR/Diagnostics.h |  3 +-
 mlir/unittests/IR/CMakeLists.txt   |  1 +
 mlir/unittests/IR/Diagnostic.cpp   | 63 ++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 mlir/unittests/IR/Diagnostic.cpp

diff --git a/mlir/include/mlir/IR/Diagnostics.h b/mlir/include/mlir/IR/Diagnostics.h
index cb30bb3f59688a..8429325412dc97 100644
--- a/mlir/include/mlir/IR/Diagnostics.h
+++ b/mlir/include/mlir/IR/Diagnostics.h
@@ -183,7 +183,8 @@ class Diagnostic {
   Diagnostic &operator<<(StringAttr val);
 
   /// Stream in a string literal.
-  Diagnostic &operator<<(const char *val) {
+  template <size_t n>
+  Diagnostic &operator<<(const char (&val)[n]) {
     arguments.push_back(DiagnosticArgument(val));
     return *this;
   }
diff --git a/mlir/unittests/IR/CMakeLists.txt b/mlir/unittests/IR/CMakeLists.txt
index 547e536dd9cbbf..384116ba5c457e 100644
--- a/mlir/unittests/IR/CMakeLists.txt
+++ b/mlir/unittests/IR/CMakeLists.txt
@@ -4,6 +4,7 @@ add_mlir_unittest(MLIRIRTests
   AffineMapTest.cpp
   AttributeTest.cpp
   AttrTypeReplacerTest.cpp
+  Diagnostic.cpp
   DialectTest.cpp
   InterfaceTest.cpp
   IRMapping.cpp
diff --git a/mlir/unittests/IR/Diagnostic.cpp b/mlir/unittests/IR/Diagnostic.cpp
new file mode 100644
index 00000000000000..96e09d33309263
--- /dev/null
+++ b/mlir/unittests/IR/Diagnostic.cpp
@@ -0,0 +1,63 @@
+//===- Diagnostic.cpp - Dialect unit tests -------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/Support/TypeID.h"
+#include "gtest/gtest.h"
+
+using namespace mlir;
+using namespace mlir::detail;
+
+namespace {
+
+TEST(DiagnosticLifetime, TestCopiesConstCharStar) {
+  const auto *expectedMessage = "Error 1, don't mutate this";
+
+  // Copy expected message into a mutable container, and call the constructor.
+  std::string myStr(expectedMessage);
+
+  mlir::MLIRContext context;
+  Diagnostic diagnostic(mlir::UnknownLoc::get(&context),
+                        DiagnosticSeverity::Note);
+  diagnostic << myStr.c_str();
+
+  // Mutate underlying pointer, but ensure diagnostic still has orig. message
+  myStr[0] = '^';
+
+  std::string resultMessage;
+  llvm::raw_string_ostream stringStream(resultMessage);
+  diagnostic.print(stringStream);
+  ASSERT_STREQ(expectedMessage, resultMessage.c_str());
+}
+
+TEST(DiagnosticLifetime, TestLazyCopyStringLiteral) {
+  char charArr[21] = "Error 1, mutate this";
+  mlir::MLIRContext context;
+  Diagnostic diagnostic(mlir::UnknownLoc::get(&context),
+                        DiagnosticSeverity::Note);
+
+  // Diagnostic contains optimization which assumes string literals are
+  // represented by `const char[]` type. This is imperfect as we can sometimes
+  // trick the type system as seen below.
+  //
+  // Still we use this to check the diagnostic is lazily storing the pointer.
+  auto addToDiagnosticAsConst = [&diagnostic](const char(&charArr)[21]) {
+    diagnostic << charArr;
+  };
+  addToDiagnosticAsConst(charArr);
+
+  // Mutate the underlying pointer and ensure the string does change
+  charArr[0] = '^';
+
+  std::string resultMessage;
+  llvm::raw_string_ostream stringStream(resultMessage);
+  diagnostic.print(stringStream);
+  ASSERT_STREQ("^rror 1, mutate this", resultMessage.c_str());
+}
+
+} // namespace



More information about the Mlir-commits mailing list