[clang] 899f263 - [NFC][analyzer] Document configuration options (#135169)
via cfe-commits
cfe-commits at lists.llvm.org
Wed May 14 09:28:48 PDT 2025
Author: Donát Nagy
Date: 2025-05-14T18:28:44+02:00
New Revision: 899f26315cdae25addfd0ddbc6c9390bfbb74c3e
URL: https://github.com/llvm/llvm-project/commit/899f26315cdae25addfd0ddbc6c9390bfbb74c3e
DIFF: https://github.com/llvm/llvm-project/commit/899f26315cdae25addfd0ddbc6c9390bfbb74c3e.diff
LOG: [NFC][analyzer] Document configuration options (#135169)
This commit documents the process of specifying values for the analyzer
options and checker options implemented in the static analyzer, and adds
a script which includes the documentation of the analyzer options (which
was previously only available through a command-line flag) in the
RST-based web documentation.
Added:
clang/docs/analyzer/user-docs/Options.rst.in
clang/docs/tools/generate_analyzer_options_docs.py
clang/test/Analysis/generate_analyzer_options_docs.test
Modified:
clang/docs/CMakeLists.txt
clang/docs/analyzer/user-docs.rst
clang/docs/analyzer/user-docs/CommandLineUsage.rst
clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
clang/test/lit.cfg.py
Removed:
################################################################################
diff --git a/clang/docs/CMakeLists.txt b/clang/docs/CMakeLists.txt
index ca625efc6ccef..1f06c040c96cb 100644
--- a/clang/docs/CMakeLists.txt
+++ b/clang/docs/CMakeLists.txt
@@ -134,6 +134,34 @@ if (LLVM_ENABLE_SPHINX)
gen_rst_file_from_td(DiagnosticsReference.rst -gen-diag-docs ../include/clang/Basic/Diagnostic.td "${docs_targets}")
gen_rst_file_from_td(ClangCommandLineReference.rst -gen-opt-docs ../include/clang/Driver/ClangOptionDocs.td "${docs_targets}")
+ # Another generated file from a different source
+ set(docs_tools_dir ${CMAKE_CURRENT_SOURCE_DIR}/tools)
+ set(aopts_rst_rel_path analyzer/user-docs/Options.rst)
+ set(aopts_rst "${CMAKE_CURRENT_BINARY_DIR}/${aopts_rst_rel_path}")
+ set(analyzeroptions_def "${CMAKE_CURRENT_SOURCE_DIR}/../include/clang/StaticAnalyzer/Core/AnalyzerOptions.def")
+ set(aopts_rst_in "${CMAKE_CURRENT_SOURCE_DIR}/${aopts_rst_rel_path}.in")
+ add_custom_command(
+ OUTPUT ${aopts_rst}
+ COMMAND ${Python3_EXECUTABLE} generate_analyzer_options_docs.py
+ --options-def "${analyzeroptions_def}"
+ --template "${aopts_rst_in}"
+ --out "${aopts_rst}"
+ WORKING_DIRECTORY ${docs_tools_dir}
+ VERBATIM
+ COMMENT "Generating ${aopts_rst}"
+ DEPENDS ${docs_tools_dir}/${generate_aopts_docs}
+ ${aopts_rst_in}
+ copy-clang-rst-docs
+ )
+ add_custom_target(generate-analyzer-options-rst DEPENDS ${aopts_rst})
+ foreach(target ${docs_targets})
+ add_dependencies(${target} generate-analyzer-options-rst)
+ endforeach()
+
+ # Technically this is redundant because generate-analyzer-options-rst
+ # depends on the copy operation (because it wants to drop a generated file
+ # into a subdirectory of the copied tree), but I'm leaving it here for the
+ # sake of clarity.
foreach(target ${docs_targets})
add_dependencies(${target} copy-clang-rst-docs)
endforeach()
diff --git a/clang/docs/analyzer/user-docs.rst b/clang/docs/analyzer/user-docs.rst
index e265f033a2c54..67c1dfaa40965 100644
--- a/clang/docs/analyzer/user-docs.rst
+++ b/clang/docs/analyzer/user-docs.rst
@@ -8,6 +8,7 @@ Contents:
user-docs/Installation
user-docs/CommandLineUsage
+ user-docs/Options
user-docs/UsingWithXCode
user-docs/FilingBugs
user-docs/CrossTranslationUnit
diff --git a/clang/docs/analyzer/user-docs/CommandLineUsage.rst b/clang/docs/analyzer/user-docs/CommandLineUsage.rst
index 59f8187f374a9..0252de80b788f 100644
--- a/clang/docs/analyzer/user-docs/CommandLineUsage.rst
+++ b/clang/docs/analyzer/user-docs/CommandLineUsage.rst
@@ -194,6 +194,8 @@ When compiling your application to run on the simulator, it is important that **
If you aren't certain which compiler Xcode uses to build your project, try just running ``xcodebuild`` (without **scan-build**). You should see the full path to the compiler that Xcode is using, and use that as an argument to ``--use-cc``.
+.. _command-line-usage-CodeChecker:
+
CodeChecker
-----------
diff --git a/clang/docs/analyzer/user-docs/Options.rst.in b/clang/docs/analyzer/user-docs/Options.rst.in
new file mode 100644
index 0000000000000..0d2883fb9ead1
--- /dev/null
+++ b/clang/docs/analyzer/user-docs/Options.rst.in
@@ -0,0 +1,114 @@
+========================
+Configuring the Analyzer
+========================
+
+The clang static analyzer supports two kinds of options:
+
+1. Global **analyzer options** influence the behavior of the analyzer engine.
+ They are documented on this page, in the section :ref:`List of analyzer
+ options<list-of-analyzer-options>`.
+2. The **checker options** belong to individual checkers (e.g.
+ ``core.BitwiseShift:Pedantic`` and ``unix.Stream:Pedantic`` are completely
+ separate options) and customize the behavior of that particular checker.
+ These are documented within the documentation of each individual checker at
+ :doc:`../checkers`.
+
+Assigning values to options
+===========================
+
+With the compiler frontend
+--------------------------
+
+All options can be configured by using the ``-analyzer-config`` flag of ``clang
+-cc1`` (the so-called *compiler frontend* part of clang). The values of the
+options are specified with the syntax ``-analyzer-config
+OPT=VAL,OPT2=VAL2,...`` which supports specifying multiple options, but
+separate flags like ``-analyzer-config OPT=VAL -analyzer-config OPT2=VAL2`` are
+also accepted (with equivalent behavior). Analyzer options and checker options
+can be freely intermixed here because it's easy to recognize that checker
+option names are always prefixed with ``some.groups.NameOfChecker:``.
+
+.. warning::
+ This is an internal interface, one should prefer `clang --analyze ...` for
+ regular use. Clang does not intend to preserve backwards compatibility or
+ announce breaking changes within the flags accepted by ``clang -cc1``
+ (but ``-analyzer-config`` survived many years without major changes).
+
+With the clang driver
+---------------------
+
+In a conventional workflow ``clang -cc1`` (which is a low-level internal
+interface) is invoked indirectly by the clang *driver* (i.e. plain ``clang``
+without the ``-cc1`` flag), which acts as an "even more frontend" wrapper layer
+around the ``clang -cc1`` *compiler frontend*. In this situation **each**
+command line argument intended for the *compiler frontend* must be prefixed
+with ``-Xclang``.
+
+For example the following command analyzes ``foo.c`` in :ref:`shallow mode
+<analyzer-option-mode>` with :ref:`loop unrolling
+<analyzer-option-unroll-loops>`:
+
+::
+
+ clang --analyze -Xclang -analyzer-config -Xclang mode=shallow,unroll-loops=true foo.c
+
+When this is executed, the *driver* will compose and execute the following
+``clang -cc1`` command (which can be inspected by passing the ``-v`` flag to
+the *driver*):
+
+::
+
+ clang -cc1 -analyze [...] -analyzer-config mode=shallow,unroll-loops=true foo.c
+
+Here ``[...]`` stands for dozens of low-level flags which ensure that ``clang
+-cc1`` does the right thing (e.g. ``-fcolor-diagnostics`` when it's suitable;
+``-analyzer-checker`` flags to enable the default set of checkers). Also
+note the distinction that the ``clang`` *driver* requires ``--analyze`` (double
+dashes) while the ``clang -cc1`` *compiler frontend* requires ``-analyze``
+(single dash).
+
+.. note::
+ The flag ``-Xanalyzer`` is equivalent to ``-Xclang`` in these situations
+ (but doesn't forward other options of the clang frontend).
+
+With CodeChecker
+----------------
+
+If the analysis is performed through :ref:`CodeChecker
+<command-line-usage-CodeChecker>` (which e.g. supports the analysis of a whole
+project instead of a single file) then it will act as another indirection
+layer. CodeChecker provides separate command-line flags called
+``--analyzer-config`` (for analyzer options) and ``--checker-config`` (for
+checker options):
+
+::
+
+ CodeChecker analyze -o outdir --checker-config clangsa:unix.Stream:Pedantic=true \
+ --analyzer-config clangsa:mode=shallow clangsa:unroll-loops=true \
+ -- compile_commands.json
+
+These CodeChecker flags may be followed by multiple ``OPT=VAL`` pairs as
+separate arguments (and this is why the example needs to use ``--`` before
+``compile_commands.json``). The option names are all prefixed with ``clangsa:``
+to ensure that they are passed to the clang static analyzer (and not other
+analyzer tools that are also supported by CodeChecker).
+
+.. _list-of-analyzer-options:
+
+List of analyzer options
+========================
+
+.. warning::
+ These options are primarily intended for development purposes and
+ non-default values are usually unsupported. Changing their values may
+ drastically alter the behavior of the analyzer, and may even result in
+ instabilities or crashes! Crash reports are welcome and depending on the
+ severity they may be fixed.
+
+..
+ The contents of this section are automatically generated by the script
+ clang/docs/tools/generate_analyzer_options_docs.py from the header file
+ AnalyzerOptions.def to ensure that the RST/web documentation is synchronized
+ with the command line help options.
+
+.. OPTIONS_LIST_PLACEHOLDER
diff --git a/clang/docs/tools/generate_analyzer_options_docs.py b/clang/docs/tools/generate_analyzer_options_docs.py
new file mode 100644
index 0000000000000..26c098d8514a0
--- /dev/null
+++ b/clang/docs/tools/generate_analyzer_options_docs.py
@@ -0,0 +1,293 @@
+#!/usr/bin/env python3
+# A tool to automatically generate documentation for the config options of the
+# clang static analyzer by reading `AnalyzerOptions.def`.
+
+import argparse
+from collections import namedtuple
+from enum import Enum, auto
+import re
+import sys
+import textwrap
+
+
+# The following code implements a trivial parser for the narrow subset of C++
+# which is used in AnalyzerOptions.def. This supports the following features:
+# - ignores preprocessor directives, even if they are continued with \ at EOL
+# - ignores comments: both /* ... */ and // ...
+# - parses string literals (even if they contain \" escapes)
+# - concatenates adjacent string literals
+# - parses numbers even if they contain ' as a thousands separator
+# - recognizes MACRO(arg1, arg2, ..., argN) calls
+
+
+class TT(Enum):
+ "Token type enum."
+ number = auto()
+ ident = auto()
+ string = auto()
+ punct = auto()
+
+
+TOKENS = [
+ (re.compile(r"-?[0-9']+"), TT.number),
+ (re.compile(r"\w+"), TT.ident),
+ (re.compile(r'"([^\\"]|\\.)*"'), TT.string),
+ (re.compile(r"[(),]"), TT.punct),
+ (re.compile(r"/\*((?!\*/).)*\*/", re.S), None), # C-style comment
+ (re.compile(r"//.*\n"), None), # C++ style oneline comment
+ (re.compile(r"#.*(\\\n.*)*(?<!\\)\n"), None), # preprocessor directive
+ (re.compile(r"\s+"), None), # whitespace
+]
+
+Token = namedtuple("Token", "kind code")
+
+
+class ErrorHandler:
+ def __init__(self):
+ self.seen_errors = False
+
+ # This script uses some heuristical tweaks to modify the documentation
+ # of some analyzer options. As this code is fragile, we record the use
+ # of these tweaks and report them if they become obsolete:
+ self.unused_tweaks = [
+ "escape star",
+ "escape underline",
+ "accepted values",
+ "example file content",
+ ]
+
+ def record_use_of_tweak(self, tweak_name):
+ try:
+ self.unused_tweaks.remove(tweak_name)
+ except ValueError:
+ pass
+
+ def replace_as_tweak(self, string, pattern, repl, tweak_name):
+ res = string.replace(pattern, repl)
+ if res != string:
+ self.record_use_of_tweak(tweak_name)
+ return res
+
+ def report_error(self, msg):
+ print("Error:", msg, file=sys.stderr)
+ self.seen_errors = True
+
+ def report_unexpected_char(self, s, pos):
+ lines = (s[:pos] + "X").split("\n")
+ lineno, col = (len(lines), len(lines[-1]))
+ self.report_error(
+ "unexpected character %r in AnalyzerOptions.def at line %d column %d"
+ % (s[pos], lineno, col),
+ )
+
+ def report_unused_tweaks(self):
+ if not self.unused_tweaks:
+ return
+ _is = " is" if len(self.unused_tweaks) == 1 else "s are"
+ names = ", ".join(self.unused_tweaks)
+ self.report_error(f"textual tweak{_is} unused in script: {names}")
+
+
+err_handler = ErrorHandler()
+
+
+def tokenize(s):
+ result = []
+ pos = 0
+ while pos < len(s):
+ for regex, kind in TOKENS:
+ if m := regex.match(s, pos):
+ if kind is not None:
+ result.append(Token(kind, m.group(0)))
+ pos = m.end()
+ break
+ else:
+ err_handler.report_unexpected_char(s, pos)
+ pos += 1
+ return result
+
+
+def join_strings(tokens):
+ result = []
+ for tok in tokens:
+ if tok.kind == TT.string and result and result[-1].kind == TT.string:
+ # If this token is a string, and the previous non-ignored token is
+ # also a string, then merge them into a single token. We need to
+ # discard the closing " of the previous string and the opening " of
+ # this string.
+ prev = result.pop()
+ result.append(Token(TT.string, prev.code[:-1] + tok.code[1:]))
+ else:
+ result.append(tok)
+ return result
+
+
+MacroCall = namedtuple("MacroCall", "name args")
+
+
+class State(Enum):
+ "States of the state machine used for parsing the macro calls."
+ init = auto()
+ after_ident = auto()
+ before_arg = auto()
+ after_arg = auto()
+
+
+def get_calls(tokens, macro_names):
+ state = State.init
+ result = []
+ current = None
+ for tok in tokens:
+ if state == State.init and tok.kind == TT.ident and tok.code in macro_names:
+ current = MacroCall(tok.code, [])
+ state = State.after_ident
+ elif state == State.after_ident and tok == Token(TT.punct, "("):
+ state = State.before_arg
+ elif state == State.before_arg:
+ if current is not None:
+ current.args.append(tok)
+ state = State.after_arg
+ elif state == State.after_arg and tok.kind == TT.punct:
+ if tok.code == ")":
+ result.append(current)
+ current = None
+ state = State.init
+ elif tok.code == ",":
+ state = State.before_arg
+ else:
+ current = None
+ state = State.init
+ return result
+
+
+# The information will be extracted from calls to these two macros:
+# #define ANALYZER_OPTION(TYPE, NAME, CMDFLAG, DESC, DEFAULT_VAL)
+# #define ANALYZER_OPTION_DEPENDS_ON_USER_MODE(TYPE, NAME, CMDFLAG, DESC,
+# SHALLOW_VAL, DEEP_VAL)
+
+MACRO_NAMES_PARAMCOUNTS = {
+ "ANALYZER_OPTION": 5,
+ "ANALYZER_OPTION_DEPENDS_ON_USER_MODE": 6,
+}
+
+
+def string_value(tok):
+ if tok.kind != TT.string:
+ raise ValueError(f"expected a string token, got {tok.kind.name}")
+ text = tok.code[1:-1] # Remove quotes
+ text = re.sub(r"\\(.)", r"\1", text) # Resolve backslash escapes
+ return text
+
+
+def cmdflag_to_rst_title(cmdflag_tok):
+ text = string_value(cmdflag_tok)
+ underline = "-" * len(text)
+ ref = f".. _analyzer-option-{text}:"
+
+ return f"{ref}\n\n{text}\n{underline}\n\n"
+
+
+def desc_to_rst_paragraphs(tok):
+ desc = string_value(tok)
+
+ # Escape some characters that have special meaning in RST:
+ desc = err_handler.replace_as_tweak(desc, "*", r"\*", "escape star")
+ desc = err_handler.replace_as_tweak(desc, "_", r"\_", "escape underline")
+
+ # Many descriptions end with "Value: <list of accepted values>", which is
+ # OK for a terse command line printout, but should be prettified for web
+ # documentation.
+ # Moreover, the option ctu-invocation-list shows some example file content
+ # which is formatted as a preformatted block.
+ paragraphs = [desc]
+ extra = ""
+ if m := re.search(r"(^|\s)Value:", desc):
+ err_handler.record_use_of_tweak("accepted values")
+ paragraphs = [desc[: m.start()], "Accepted values:" + desc[m.end() :]]
+ elif m := re.search(r"\s*Example file.content:", desc):
+ err_handler.record_use_of_tweak("example file content")
+ paragraphs = [desc[: m.start()]]
+ extra = "Example file content::\n\n " + desc[m.end() :] + "\n\n"
+
+ wrapped = [textwrap.fill(p, width=80) for p in paragraphs if p.strip()]
+
+ return "\n\n".join(wrapped + [""]) + extra
+
+
+def default_to_rst(tok):
+ if tok.kind == TT.string:
+ if tok.code == '""':
+ return "(empty string)"
+ return tok.code
+ if tok.kind == TT.ident:
+ return tok.code
+ if tok.kind == TT.number:
+ return tok.code.replace("'", "")
+ raise ValueError(f"unexpected token as default value: {tok.kind.name}")
+
+
+def defaults_to_rst_paragraph(defaults):
+ strs = [default_to_rst(d) for d in defaults]
+
+ if len(strs) == 1:
+ return f"Default value: {strs[0]}\n\n"
+ if len(strs) == 2:
+ return (
+ f"Default value: {strs[0]} (in shallow mode) / {strs[1]} (in deep mode)\n\n"
+ )
+ raise ValueError("unexpected count of default values: %d" % len(defaults))
+
+
+def macro_call_to_rst_paragraphs(macro_call):
+ try:
+ arg_count = len(macro_call.args)
+ param_count = MACRO_NAMES_PARAMCOUNTS[macro_call.name]
+ if arg_count != param_count:
+ raise ValueError(
+ f"expected {param_count} arguments for {macro_call.name}, found {arg_count}"
+ )
+
+ _, _, cmdflag, desc, *defaults = macro_call.args
+
+ return (
+ cmdflag_to_rst_title(cmdflag)
+ + desc_to_rst_paragraphs(desc)
+ + defaults_to_rst_paragraph(defaults)
+ )
+ except ValueError as ve:
+ err_handler.report_error(ve.args[0])
+ return ""
+
+
+def get_option_list(input_file):
+ with open(input_file, encoding="utf-8") as f:
+ contents = f.read()
+ tokens = join_strings(tokenize(contents))
+ macro_calls = get_calls(tokens, MACRO_NAMES_PARAMCOUNTS)
+
+ result = ""
+ for mc in macro_calls:
+ result += macro_call_to_rst_paragraphs(mc)
+ return result
+
+
+p = argparse.ArgumentParser()
+p.add_argument("--options-def", help="path to AnalyzerOptions.def")
+p.add_argument("--template", help="template file")
+p.add_argument("--out", help="output file")
+opts = p.parse_args()
+
+with open(opts.template, encoding="utf-8") as f:
+ doc_template = f.read()
+
+PLACEHOLDER = ".. OPTIONS_LIST_PLACEHOLDER\n"
+
+rst_output = doc_template.replace(PLACEHOLDER, get_option_list(opts.options_def))
+
+err_handler.report_unused_tweaks()
+
+with open(opts.out, "w", newline="", encoding="utf-8") as f:
+ f.write(rst_output)
+
+if err_handler.seen_errors:
+ sys.exit(1)
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index fab19c76a22fe..90b80e5201aa8 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -7,6 +7,9 @@
//===----------------------------------------------------------------------===//
//
// This file defines the analyzer options avaible with -analyzer-config.
+// Note that clang/docs/tools/generate_analyzer_options_docs.py relies on the
+// structure of this file, so if this file is refactored, then make sure to
+// update that script as well.
//
//===----------------------------------------------------------------------===//
diff --git a/clang/test/Analysis/generate_analyzer_options_docs.test b/clang/test/Analysis/generate_analyzer_options_docs.test
new file mode 100644
index 0000000000000..0c95346504ae3
--- /dev/null
+++ b/clang/test/Analysis/generate_analyzer_options_docs.test
@@ -0,0 +1,14 @@
+The documentation of analyzer options is generated by a script that parses
+AnalyzerOptions.def. The following line validates that this script
+"understands" everything in its input files:
+
+RUN: %python %src_dir/docs/tools/generate_analyzer_options_docs.py \
+RUN: --options-def %src_include_dir/clang/StaticAnalyzer/Core/AnalyzerOptions.def \
+RUN: --template %src_dir/docs/analyzer/user-docs/Options.rst.in \
+RUN: --out %t.rst
+
+Moreover, verify that the documentation (e.g. this fragment of the
+documentation of the "mode" option) can be found in the output file:
+
+RUN: FileCheck --input-file=%t.rst %s
+CHECK: Controls the high-level analyzer mode
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index f963b656b663c..2b35bb5dcbdaf 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -70,6 +70,8 @@
llvm_config.use_clang()
+config.substitutions.append(("%src_dir", config.clang_src_dir))
+
config.substitutions.append(("%src_include_dir", config.clang_src_dir + "/include"))
config.substitutions.append(("%target_triple", config.target_triple))
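
For readers skimming the diff, here is a minimal sketch of the tokenize-and-join approach that generate_analyzer_options_docs.py takes, followed by the "Value:" split it applies to descriptions. It is an illustration under stated assumptions, not the committed script: the option "example-limit" is made up and does not exist in AnalyzerOptions.def, token kinds are plain strings instead of the TT enum, the patterns for /* */ comments and preprocessor directives are left out, and the argument extraction assumes the input contains exactly one macro call.

```python
import re
from collections import namedtuple

Token = namedtuple("Token", "kind code")

# Patterns are tried in order; a kind of None means "match and discard".
# (The committed script also skips /* */ comments and preprocessor lines.)
TOKENS = [
    (re.compile(r"-?[0-9']+"), "number"),
    (re.compile(r"\w+"), "ident"),
    (re.compile(r'"([^\\"]|\\.)*"'), "string"),
    (re.compile(r"[(),]"), "punct"),
    (re.compile(r"//.*\n"), None),
    (re.compile(r"\s+"), None),
]

# Hypothetical input: "example-limit" is NOT a real analyzer option.
SAMPLE = """
// A made-up entry in the style of AnalyzerOptions.def
ANALYZER_OPTION(unsigned, ExampleLimit, "example-limit",
                "An illustrative option. "
                "Value: any non-negative integer.",
                10'000)
"""


def tokenize(s):
    """Split the input into tokens, dropping comments and whitespace."""
    result, pos = [], 0
    while pos < len(s):
        for regex, kind in TOKENS:
            m = regex.match(s, pos)
            if m:
                if kind is not None:
                    result.append(Token(kind, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {s[pos]!r} at offset {pos}")
    return result


def join_strings(tokens):
    """Merge adjacent string literals, as the C++ compiler would."""
    result = []
    for tok in tokens:
        if tok.kind == "string" and result and result[-1].kind == "string":
            prev = result.pop()
            # Drop the closing quote of prev and the opening quote of tok.
            result.append(Token("string", prev.code[:-1] + tok.code[1:]))
        else:
            result.append(tok)
    return result


tokens = join_strings(tokenize(SAMPLE))

# Simplification: with a single call in the sample, the arguments are the
# non-punctuation tokens that follow the macro name.
args = [t for t in tokens[1:] if t.kind != "punct"]
cmdflag = args[2].code[1:-1]             # strip the surrounding quotes
desc = args[3].code[1:-1]
default = args[4].code.replace("'", "")  # drop the digit separator

# Split a trailing "Value: ..." into its own "Accepted values:" paragraph.
m = re.search(r"(^|\s)Value:", desc)
paragraphs = [desc[: m.start()], "Accepted values:" + desc[m.end():]]

print(cmdflag)
print("-" * len(cmdflag))
for p in paragraphs:
    print()
    print(p)
print()
print(f"Default value: {default}")
```

The ordered pattern list keeps the tokenizer small. Because the number pattern is tried before the identifier pattern, 10'000 is read as a single number token.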