[clang] [clang] Diagnose problematic diagnostic messages (PR #93229)
Aaron Ballman via cfe-commits
cfe-commits at lists.llvm.org
Thu May 23 12:00:37 PDT 2024
https://github.com/AaronBallman created https://github.com/llvm/llvm-project/pull/93229
Clang has some unwritten rules about diagnostic wording regarding things like punctuation and capitalization. This patch documents those rules and adds some tablegen support for checking diagnostics follow the rules.
Specifically: tablegen now checks that a diagnostic does not start with a capital letter or end with punctuation, except for the usual exceptions like proper nouns or ending with a question.
Now that the code base is clean of such issues, the diagnostics are emitted as an error rather than a warning to ensure that failure to follow these rules is either addressed by an author, or a new exception is added to the checking logic.
>From 38d6d9b809e1cf9d6a8f577630c838421486cd04 Mon Sep 17 00:00:00 2001
From: Aaron Ballman <aaron at aaronballman.com>
Date: Thu, 23 May 2024 14:55:16 -0400
Subject: [PATCH] Diagnose problematic diagnostic messages
Clang has some unwritten rules about diagnostic wording regarding
things like punctuation and capitalization. This patch documents those
rules and adds some tablegen support for checking diagnostics follow
the rules.
Specifically: tablegen now checks that a diagnostic does not start with
a capital letter or end with punctuation, except for the usual
exceptions like proper nouns or ending with a question.
---
clang/docs/InternalsManual.rst | 38 ++++
clang/test/TableGen/deferred-diag.td | 10 +-
clang/test/TableGen/text-substitution.td | 4 +-
clang/test/TableGen/wording-errors.td | 55 +++++
.../TableGen/ClangDiagnosticsEmitter.cpp | 191 ++++++++++++++++++
5 files changed, 291 insertions(+), 7 deletions(-)
create mode 100644 clang/test/TableGen/wording-errors.td
diff --git a/clang/docs/InternalsManual.rst b/clang/docs/InternalsManual.rst
index b3e2b870ae5f9..3d21e37784b36 100644
--- a/clang/docs/InternalsManual.rst
+++ b/clang/docs/InternalsManual.rst
@@ -123,6 +123,44 @@ severe that error recovery won't be able to recover sensibly from them (thus
spewing a ton of bogus errors). One example of this class of error are failure
to ``#include`` a file.
+Diagnostic Wording
+^^^^^^^^^^^^^^^^^^
+The wording used for a diagnostic is critical because it is the only way for a
+user to know how to correct their code. Use the following suggestions when
+wording a diagnostic.
+
+* Diagnostics in Clang do not start with a capital letter and do not end with
+ punctuation.
+
+ * This does not apply to proper nouns like ``Clang`` or ``OpenMP``, to
+ acronyms like ``GCC`` or ``ARC``, or to language standards like ``C23``
+ or ``C++17``.
+ * A trailing question mark is allowed. e.g., ``unknown identifier %0; did
+ you mean %1?``.
+
+* Appropriately capitalize proper nouns like ``Clang``, ``OpenCL``, ``GCC``,
+ ``Objective-C``, etc and language standard versions like ``C11`` or ``C++11``.
+* The wording should be succinct. If necessary, use a semicolon to combine
+ sentence fragments instead of using complete sentences. e.g., prefer wording
+ like ``'%0' is deprecated; it will be removed in a future release of Clang``
+ over wording like ``'%0' is deprecated. It will be removed in a future release
+ of Clang``.
+* The wording should be actionable and avoid using standards terms or grammar
+ productions that a new user would not be familiar with. e.g., prefer wording
+ like ``missing semicolon`` over wording like ``syntax error`` (which is not
+ actionable) or ``expected unqualified-id`` (which uses standards terminology).
+* The wording should clearly explain what is wrong with the code rather than
+ restating what the code does. e.g., prefer wording like ``type %0 requires a
+ value in the range %1 to %2`` over wording like ``%0 is invalid``.
+* The wording should have enough contextual information to help the user
+ identify the issue in a complex expression. e.g., prefer wording like
+ ``both sides of the %0 binary operator are identical`` over wording like
+ ``identical operands to binary operator``.
+* Use single quotes to denote syntactic constructs or command line arguments
+ named in a diagnostic message. e.g., prefer wording like ``'this' pointer
+ cannot be null in well-defined C++ code`` over wording like ``this pointer
+ cannot be null in well-defined C++ code``.
+
The Format String
^^^^^^^^^^^^^^^^^
diff --git a/clang/test/TableGen/deferred-diag.td b/clang/test/TableGen/deferred-diag.td
index c1906d4a9e45e..d7e8e694c7b3e 100644
--- a/clang/test/TableGen/deferred-diag.td
+++ b/clang/test/TableGen/deferred-diag.td
@@ -4,24 +4,24 @@ include "DiagnosticBase.inc"
// Test usage of Deferrable and NonDeferrable in diagnostics.
-def test_default : Error<"This error is non-deferrable by default">;
+def test_default : Error<"this error is non-deferrable by default">;
// CHECK-DAG: DIAG(test_default, {{.*}}SFINAE_SubstitutionFailure, false, true, true, false, 0)
-def test_deferrable : Error<"This error is deferrable">, Deferrable;
+def test_deferrable : Error<"this error is deferrable">, Deferrable;
// CHECK-DAG: DIAG(test_deferrable, {{.*}} SFINAE_SubstitutionFailure, false, true, true, true, 0)
-def test_non_deferrable : Error<"This error is non-deferrable">, NonDeferrable;
+def test_non_deferrable : Error<"this error is non-deferrable">, NonDeferrable;
// CHECK-DAG: DIAG(test_non_deferrable, {{.*}} SFINAE_SubstitutionFailure, false, true, true, false, 0)
let Deferrable = 1 in {
-def test_let : Error<"This error is deferrable by let">;
+def test_let : Error<"this error is deferrable by let">;
// CHECK-DAG: DIAG(test_let, {{.*}} SFINAE_SubstitutionFailure, false, true, true, true, 0)
// Make sure TextSubstitution is allowed in the let Deferrable block.
def textsub : TextSubstitution<"%select{text1|text2}0">;
-def test_let2 : Error<"This error is deferrable by let %sub{textsub}0">;
+def test_let2 : Error<"this error is deferrable by let %sub{textsub}0">;
// CHECK-DAG: DIAG(test_let2, {{.*}} SFINAE_SubstitutionFailure, false, true, true, true, 0)
}
diff --git a/clang/test/TableGen/text-substitution.td b/clang/test/TableGen/text-substitution.td
index aafdbe48c43be..b0d030aca6513 100644
--- a/clang/test/TableGen/text-substitution.td
+++ b/clang/test/TableGen/text-substitution.td
@@ -26,8 +26,8 @@ def sub_test_rewrite : TextSubstitution<
// CHECK-SAME: Q! %q1.
// CHECK-SAME: PLACEHOLDER! %0.OBJCCLASS!
// CHECK-SAME: %objcclass5. OBJCINSTANCE!
-// CHECK-SAME: %objcinstance4. DONE!",
-def test_rewrite: Error<"%sub{sub_test_rewrite}5,4,3,2,1,0 DONE!">;
+// CHECK-SAME: %objcinstance4. DONE",
+def test_rewrite: Error<"%sub{sub_test_rewrite}5,4,3,2,1,0 DONE">;
def test_sub_basic : Error<"%sub{yes_no}0">;
// CHECK: test_sub_basic
diff --git a/clang/test/TableGen/wording-errors.td b/clang/test/TableGen/wording-errors.td
new file mode 100644
index 0000000000000..eb5eb2f547c78
--- /dev/null
+++ b/clang/test/TableGen/wording-errors.td
@@ -0,0 +1,55 @@
+// RUN: not clang-tblgen -gen-clang-diags-defs -I%S %s -o /dev/null 2>&1 | FileCheck %s
+include "DiagnosticBase.inc"
+
+// Ensure we catch a capital letter at the start of a diagnostic.
+def zero : Error<
+ "This is bad">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not start with a capital letter; 'This' is invalid
+
+// Test that we also correctly handle selections.
+def one : Error<
+ "%select{|or}0 That">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not start with a capital letter; 'That' is invalid
+def two : Error<
+ "%select{as does|}0 This">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not start with a capital letter; 'This' is invalid
+def three : Error<
+ "%select{and||of course}0 Whatever">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not start with a capital letter; 'Whatever' is invalid
+
+// Test that we accept the following cases.
+def four : Error<
+ "this is fine">;
+def five : Error<
+ "%select{this|is|also}0 Fine">;
+def six : Error<
+ "%select{this|is|also|}0 fine">;
+def seven : Error<
+ "%select{ARC|C|C23|C++14|OpenMP}0 are also fine">;
+
+// Next, test that we catch punctuation at the end of the diagnostic.
+def eight : Error<
+ "punctuation is bad.">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+def nine : Error<
+ "it's really bad!">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '!' is invalid
+def ten : Error<
+ "we also catch %select{punctuation.|in select}0">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+def eleven : Error<
+ "and %select{|here.}0">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+def twelve : Error<
+ "and %select{here.|}0">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+def thirteen : Error<
+ "and even %select{|here.|}0">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+def fourteen : Error<
+ "and %select{here}0.">;
+// CHECK-DAG: wording-errors.td:[[@LINE-2]]:5: error: Diagnostics should not end with punctuation; '.' is invalid
+
+// Test that we accept the following cases.
+def fifteen : Error<
+ "question marks are intentionally okay?">;
diff --git a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
index f564689fff7cf..09201d59ddeb7 100644
--- a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
+++ b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
@@ -1213,6 +1213,194 @@ static bool isRemark(const Record &Diag) {
return ClsName == "CLASS_REMARK";
}
+// Presumes the text has been split at the first whitespace or hyphen.
+static bool isExemptAtStart(StringRef Text) {
+ // Fast path, the first character is lowercase or not alphanumeric.
+ if (isLower(Text[0]) || !isAlnum(Text[0]))
+ return true;
+
+ // If the text is all uppercase (or numbers, +, or _), then we assume it's an
+ // acronym and that's allowed. This covers cases like ISO, C23, C++14, and
+ // OBJECT_MODE. However, if there's only a single letter other than "C", we
+ // do not exempt it so that we catch a case like "A really bad idea" while
+ // still allowing a case like "C does not allow...".
+ if (llvm::all_of(Text, [](char C) {
+ return isUpper(C) || isDigit(C) || C == '+' || C == '_';
+ }))
+ return Text.size() > 1 || Text[0] == 'C';
+
+ // Otherwise, there are a few other exemptions.
+ return StringSwitch<bool>(Text)
+ .Case("AddressSanitizer", true)
+ .Case("CFString", true)
+ .Case("Clang", true)
+ .Case("Fuchsia", true)
+ .Case("GNUstep", true)
+ .Case("IBOutletCollection", true)
+ .Case("Microsoft", true)
+ .Case("Neon", true)
+ .StartsWith("NSInvocation", true) // NSInvocation, NSInvocation's
+ .Case("Objective", true) // Objective-C, Objective-C++
+ .Case("OpenACC", true)
+ .Case("OpenCL", true)
+ .Case("OpenMP", true)
+ .Case("Pascal", true)
+ .Case("Swift", true)
+ .Case("Unicode", true)
+ .Case("Vulkan", true)
+ .Case("WebAssembly", true)
+ .Default(false);
+}
+
+// Does not presume the text has been split at all.
+static bool isExemptAtEnd(StringRef Text) {
+ // Rather than come up with a list of characters that are allowed, we go the
+ // other way and look only for characters that are not allowed.
+ switch (Text.back()) {
+ default:
+ return true;
+ case '?':
+ // Explicitly allowed to support "; did you mean?".
+ return true;
+ case '.':
+ case '!':
+ return false;
+ }
+}
+
+static void verifyDiagnosticWording(const Record &Diag) {
+ StringRef FullDiagText = Diag.getValueAsString("Summary");
+
+ auto DiagnoseStart = [&](StringRef Text) {
+ // Verify that the text does not start with a capital letter, except for
+ // special cases that are exempt like ISO and C++. Find the first word
+ // by looking for a word breaking character.
+ char Separators[] = {' ', '-', ',', '}'};
+ auto Iter = std::find_first_of(
+ Text.begin(), Text.end(), std::begin(Separators), std::end(Separators));
+
+ StringRef First = Text.substr(0, Iter - Text.begin());
+ if (!isExemptAtStart(First)) {
+ PrintError(&Diag,
+ "Diagnostics should not start with a capital letter; '" +
+ First + "' is invalid");
+ }
+ };
+
+ auto DiagnoseEnd = [&](StringRef Text) {
+ // Verify that the text does not end with punctuation like '.' or '!'.
+ if (!isExemptAtEnd(Text)) {
+ PrintError(&Diag, "Diagnostics should not end with punctuation; '" +
+ Text.substr(Text.size() - 1, 1) + "' is invalid");
+ }
+ };
+
+ // If the diagnostic starts with %select, look through it to see whether any
+ // of the options will cause a problem.
+ if (FullDiagText.starts_with("%select{")) {
+ // Do a balanced delimiter scan from the start of the text to find the
+ // closing '}', skipping intermediary {} pairs.
+
+ size_t BraceCount = 1;
+ auto Iter = FullDiagText.begin() + /*%select{*/ 8;
+ for (auto End = FullDiagText.end(); Iter != End; ++Iter) {
+ char Ch = *Iter;
+ if (Ch == '{')
+ ++BraceCount;
+ else if (Ch == '}')
+ --BraceCount;
+ if (!BraceCount)
+ break;
+ }
+ // Defending against a malformed diagnostic string.
+ if (BraceCount != 0)
+ return;
+
+ StringRef SelectText = FullDiagText.substr(
+ /*%select{*/ 8, Iter - FullDiagText.begin() - /*%select{*/ 8);
+ SmallVector<StringRef, 4> SelectPieces;
+ SelectText.split(SelectPieces, '|');
+
+ // Walk over all of the individual pieces of select text to see if any of
+ // them start with an invalid character. If any of the select pieces is
+ // empty, we need to look at the first word after the %select to see
+ // whether that is invalid or not. If all of the pieces are fine, then we
+ // don't need to check anything else about the start of the diagnostic.
+ bool CheckSecondWord = false;
+ for (StringRef Piece : SelectPieces) {
+ if (Piece.empty())
+ CheckSecondWord = true;
+ else
+ DiagnoseStart(Piece);
+ }
+
+ if (CheckSecondWord) {
+ // There was an empty select piece, so we need to check the second
+ // word. This catches situations like '%select{|fine}0 Not okay'. Add
+ // two to account for the closing curly brace and the number after it.
+ StringRef AfterSelect =
+ FullDiagText.substr(Iter - FullDiagText.begin() + 2).ltrim();
+ DiagnoseStart(AfterSelect);
+ }
+ } else {
+ // If the start of the diagnostic is not %select, we can check the first
+ // word and be done with it.
+ DiagnoseStart(FullDiagText);
+ }
+
+ // If the last character in the diagnostic is a number preceded by a }, scan
+ // backwards to see if this is for a %select{...}0. If it is, we need to look
+ // at each piece to see whether it ends in punctuation or not.
+ bool StillNeedToDiagEnd = true;
+ if (isDigit(FullDiagText.back()) && *(FullDiagText.end() - 2) == '}') {
+ // Scan backwards to find the opening curly brace.
+ size_t BraceCount = 1;
+ auto Iter = FullDiagText.end() - /*}0*/ 3;
+ for (auto End = FullDiagText.begin(); Iter != End; --Iter) {
+ char Ch = *Iter;
+ if (Ch == '}')
+ ++BraceCount;
+ else if (Ch == '{')
+ --BraceCount;
+ if (!BraceCount)
+ break;
+ }
+ // Defending against a malformed diagnostic string.
+ if (BraceCount != 0)
+ return;
+
+ // Continue the backwards scan to find the word before the '{' to see if it
+ // is 'select'.
+ bool IsSelect =
+ (FullDiagText.substr(Iter - /*select*/ 6 - FullDiagText.begin(),
+ /*select*/ 6) == "select");
+ if (IsSelect) {
+ // Gather the content between the {} for the select in question so we can
+ // split it into pieces.
+ StillNeedToDiagEnd = false; // No longer need to handle the end.
+ StringRef SelectText =
+ FullDiagText.substr(Iter - FullDiagText.begin() + /*{*/ 1,
+ FullDiagText.end() - Iter - /*pos before }0*/ 3);
+ SmallVector<StringRef, 4> SelectPieces;
+ SelectText.split(SelectPieces, '|');
+ for (StringRef Piece : SelectPieces) {
+ // Not worrying about a situation like: "this is bar. %select{foo|}0".
+ if (!Piece.empty())
+ DiagnoseEnd(Piece);
+ }
+ }
+ }
+
+ // If we didn't already cover the diagnostic because of a %select, handle it
+ // now.
+ if (StillNeedToDiagEnd)
+ DiagnoseEnd(FullDiagText);
+
+ // FIXME: This could also be improved by looking for instances of clang or
+ // gcc in the diagnostic and recommend Clang or GCC instead. However, this
+ // runs into odd situations like [[clang::warn_unused_result]],
+ // #pragma clang, or --unwindlib=libgcc.
+}
/// ClangDiagsDefsEmitter - The top-level class emits .def files containing
/// declarations of Clang diagnostics.
@@ -1273,6 +1461,9 @@ void clang::EmitClangDiagsDefs(RecordKeeper &Records, raw_ostream &OS,
if (!Component.empty() && Component != R.getValueAsString("Component"))
continue;
+ // Validate diagnostic wording for common issues.
+ verifyDiagnosticWording(R);
+
OS << "DIAG(" << R.getName() << ", ";
OS << R.getValueAsDef("Class")->getName();
OS << ", (unsigned)diag::Severity::"
More information about the cfe-commits
mailing list