[clang] 41ac245 - [include-cleaner] Include-cleaner library structure, and simplistic AST walking.
Sam McCall via cfe-commits
cfe-commits at lists.llvm.org
Fri Apr 29 02:04:21 PDT 2022
Author: Sam McCall
Date: 2022-04-29T11:04:11+02:00
New Revision: 41ac245c10fc256c96c45cb874918a3bd5665369
URL: https://github.com/llvm/llvm-project/commit/41ac245c10fc256c96c45cb874918a3bd5665369
DIFF: https://github.com/llvm/llvm-project/commit/41ac245c10fc256c96c45cb874918a3bd5665369.diff
LOG: [include-cleaner] Include-cleaner library structure, and simplistic AST walking.
Include-cleaner is a library that uses the clang AST and preprocessor to
determine which headers are used. It will be used in clang-tidy, in
clangd, in a standalone tool at least for testing, and in out-of-tree tools.
Roughly, it walks the AST, finds referenced decls, maps these to
used sourcelocations, then to FileEntrys, then matching these against #includes.
However there are many wrinkles: dealing with macros, standard library
symbols, umbrella headers, IWYU directives etc.
It is not built on the C++20 modules concept of usage, to allow:
- use with existing non-modules codebases
- a flexible API embeddable in clang-tidy, clangd, and other tools
- avoiding a chicken-and-egg problem where include cleanups are needed
before modules can be adopted
This library is based on existing functionality in clangd that provides
an unused-include warning. However it has design changes:
- it accommodates diagnosing missing includes too (this means tracking
where references come from, not just the set of targets)
- it more clearly separates the different mappings
(symbol => location => header => include) for better testing
- it handles special cases like standard library symbols and IWYU directives
more elegantly by adding unified Location and Header types instead of
side-tables
- it will support some customization of policy where necessary (e.g.
for style questions of what constitutes a use, or to allow
both missing-include and unused-include modes to be conservative)
This patch adds the basic directory structure under clang-tools-extra
and a skeleton version of the AST traversal, which will be the central
piece.
A more end-to-end prototype is in https://reviews.llvm.org/D122677
RFC: https://discourse.llvm.org/t/rfc-lifting-include-cleaner-missing-unused-include-detection-out-of-clangd/61228
Differential Revision: https://reviews.llvm.org/D124164
Added:
clang-tools-extra/include-cleaner/CMakeLists.txt
clang-tools-extra/include-cleaner/README.md
clang-tools-extra/include-cleaner/lib/AnalysisInternal.h
clang-tools-extra/include-cleaner/lib/CMakeLists.txt
clang-tools-extra/include-cleaner/lib/WalkAST.cpp
clang-tools-extra/include-cleaner/test/CMakeLists.txt
clang-tools-extra/include-cleaner/test/Unit/lit.cfg.py
clang-tools-extra/include-cleaner/test/Unit/lit.site.cfg.py.in
clang-tools-extra/include-cleaner/test/lit.cfg.py
clang-tools-extra/include-cleaner/test/lit.site.cfg.py.in
clang-tools-extra/include-cleaner/unittests/CMakeLists.txt
clang-tools-extra/include-cleaner/unittests/WalkASTTest.cpp
Modified:
clang-tools-extra/CMakeLists.txt
clang/include/clang/Testing/TestAST.h
clang/lib/Testing/TestAST.cpp
Removed:
################################################################################
diff --git a/clang-tools-extra/CMakeLists.txt b/clang-tools-extra/CMakeLists.txt
index 285e5529e6fc6..4c1d4ada567f9 100644
--- a/clang-tools-extra/CMakeLists.txt
+++ b/clang-tools-extra/CMakeLists.txt
@@ -14,6 +14,7 @@ add_subdirectory(clang-doc)
add_subdirectory(clang-include-fixer)
add_subdirectory(clang-move)
add_subdirectory(clang-query)
+add_subdirectory(include-cleaner)
add_subdirectory(pp-trace)
add_subdirectory(pseudo)
add_subdirectory(tool-template)
diff --git a/clang-tools-extra/include-cleaner/CMakeLists.txt b/clang-tools-extra/include-cleaner/CMakeLists.txt
new file mode 100644
index 0000000000000..0550b02f603b5
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/CMakeLists.txt
@@ -0,0 +1,5 @@
+add_subdirectory(lib)
+if(CLANG_INCLUDE_TESTS)
+ add_subdirectory(test)
+ add_subdirectory(unittests)
+endif()
diff --git a/clang-tools-extra/include-cleaner/README.md b/clang-tools-extra/include-cleaner/README.md
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang-tools-extra/include-cleaner/lib/AnalysisInternal.h b/clang-tools-extra/include-cleaner/lib/AnalysisInternal.h
new file mode 100644
index 0000000000000..8b0c73fe7997b
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/lib/AnalysisInternal.h
@@ -0,0 +1,47 @@
+//===--- AnalysisInternal.h - Analysis building blocks ------------- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file provides smaller, testable pieces of the used-header analysis.
+// We find the headers by chaining together several mappings.
+//
+// AST => AST node => Symbol => Location => Header
+// /
+// Macro expansion =>
+//
+// The individual steps are declared here.
+// (AST => AST Node => Symbol is one API to avoid materializing DynTypedNodes).
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef CLANG_INCLUDE_CLEANER_ANALYSISINTERNAL_H
+#define CLANG_INCLUDE_CLEANER_ANALYSISINTERNAL_H
+
+#include "clang/Basic/SourceLocation.h"
+#include "llvm/ADT/STLFunctionalExtras.h"
+
+namespace clang {
+class Decl;
+class NamedDecl;
+namespace include_cleaner {
+
+/// Traverses part of the AST from \p Root, finding uses of symbols.
+///
+/// Each use is reported to the callback:
+/// - the SourceLocation describes where the symbol was used. This is usually
+/// the primary location of the AST node found under Root.
+/// - the NamedDecl is the symbol referenced. It is canonical, rather than e.g.
+/// the redecl actually found by lookup.
+///
+/// walkAST is typically called once per top-level declaration in the file
+/// being analyzed, in order to find all references within it.
+void walkAST(Decl &Root, llvm::function_ref<void(SourceLocation, NamedDecl &)>);
+
+} // namespace include_cleaner
+} // namespace clang
+
+#endif
diff --git a/clang-tools-extra/include-cleaner/lib/CMakeLists.txt b/clang-tools-extra/include-cleaner/lib/CMakeLists.txt
new file mode 100644
index 0000000000000..5e2807332f94d
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/lib/CMakeLists.txt
@@ -0,0 +1,10 @@
+set(LLVM_LINK_COMPONENTS Support)
+
+add_clang_library(clangIncludeCleaner
+ WalkAST.cpp
+
+ LINK_LIBS
+ clangBasic
+ clangAST
+ )
+
diff --git a/clang-tools-extra/include-cleaner/lib/WalkAST.cpp b/clang-tools-extra/include-cleaner/lib/WalkAST.cpp
new file mode 100644
index 0000000000000..b7354fe300e04
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/lib/WalkAST.cpp
@@ -0,0 +1,47 @@
+//===--- WalkAST.cpp - Find declaration references in the AST -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "AnalysisInternal.h"
+#include "clang/AST/RecursiveASTVisitor.h"
+
+namespace clang {
+namespace include_cleaner {
+namespace {
+using DeclCallback = llvm::function_ref<void(SourceLocation, NamedDecl &)>;
+
+class ASTWalker : public RecursiveASTVisitor<ASTWalker> {
+ DeclCallback Callback;
+
+ void report(SourceLocation Loc, NamedDecl *ND) {
+ if (!ND || Loc.isInvalid())
+ return;
+ Callback(Loc, *cast<NamedDecl>(ND->getCanonicalDecl()));
+ }
+
+public:
+ ASTWalker(DeclCallback Callback) : Callback(Callback) {}
+
+ bool VisitTagTypeLoc(TagTypeLoc TTL) {
+ report(TTL.getNameLoc(), TTL.getDecl());
+ return true;
+ }
+
+ bool VisitDeclRefExpr(DeclRefExpr *DRE) {
+ report(DRE->getLocation(), DRE->getFoundDecl());
+ return true;
+ }
+};
+
+} // namespace
+
+void walkAST(Decl &Root, DeclCallback Callback) {
+ ASTWalker(Callback).TraverseDecl(&Root);
+}
+
+} // namespace include_cleaner
+} // namespace clang
diff --git a/clang-tools-extra/include-cleaner/test/CMakeLists.txt b/clang-tools-extra/include-cleaner/test/CMakeLists.txt
new file mode 100644
index 0000000000000..a31858f13aedf
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/test/CMakeLists.txt
@@ -0,0 +1,25 @@
+set(CLANG_INCLUDE_CLEANER_TEST_DEPS
+ ClangIncludeCleanerTests
+ )
+
+foreach (dep FileCheck not count)
+ if(TARGET ${dep})
+ list(APPEND CLANG_INCLUDE_CLEANER_TEST_DEPS ${dep})
+ endif()
+endforeach()
+
+configure_lit_site_cfg(
+ ${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.py.in
+ ${CMAKE_CURRENT_BINARY_DIR}/lit.site.cfg.py
+ MAIN_CONFIG
+ ${CMAKE_CURRENT_BINARY_DIR}/lit.cfg.py)
+
+configure_lit_site_cfg(
+ ${CMAKE_CURRENT_SOURCE_DIR}/Unit/lit.site.cfg.py.in
+ ${CMAKE_CURRENT_BINARY_DIR}/Unit/lit.site.cfg.py
+ MAIN_CONFIG
+ ${CMAKE_CURRENT_BINARY_DIR}/Unit/lit.cfg.py)
+
+add_lit_testsuite(check-clang-include-cleaner "Running the clang-include-cleaner regression tests"
+ ${CMAKE_CURRENT_BINARY_DIR}
+ DEPENDS ${CLANG_INCLUDE_CLEANER_TEST_DEPS})
diff --git a/clang-tools-extra/include-cleaner/test/Unit/lit.cfg.py b/clang-tools-extra/include-cleaner/test/Unit/lit.cfg.py
new file mode 100644
index 0000000000000..baaf334b6a809
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/test/Unit/lit.cfg.py
@@ -0,0 +1,18 @@
+import lit.formats
+config.name = "clangIncludeCleaner Unit Tests"
+config.test_format = lit.formats.GoogleTest('.', 'Tests')
+config.test_source_root = config.clang_include_cleaner_binary_dir + "/unittests"
+config.test_exec_root = config.clang_include_cleaner_binary_dir + "/unittests"
+
+# Point the dynamic loader at dynamic libraries in 'lib'.
+# FIXME: it seems every project has a copy of this logic. Move it somewhere.
+import platform
+if platform.system() == 'Darwin':
+ shlibpath_var = 'DYLD_LIBRARY_PATH'
+elif platform.system() == 'Windows':
+ shlibpath_var = 'PATH'
+else:
+ shlibpath_var = 'LD_LIBRARY_PATH'
+config.environment[shlibpath_var] = os.path.pathsep.join((
+ "@SHLIBDIR@", "@LLVM_LIBS_DIR@",
+ config.environment.get(shlibpath_var,'')))
diff --git a/clang-tools-extra/include-cleaner/test/Unit/lit.site.cfg.py.in b/clang-tools-extra/include-cleaner/test/Unit/lit.site.cfg.py.in
new file mode 100644
index 0000000000000..4926a1a3288bf
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/test/Unit/lit.site.cfg.py.in
@@ -0,0 +1,10 @@
+ at LIT_SITE_CFG_IN_HEADER@
+# This is a shim to run the gtest unittests in ../unittests using lit.
+
+config.llvm_libs_dir = path("@LLVM_LIBS_DIR@")
+config.shlibdir = path("@SHLIBDIR@")
+
+config.clang_include_cleaner_binary_dir = path("@CMAKE_CURRENT_BINARY_DIR@/..")
+
+# Delegate logic to lit.cfg.py.
+lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/Unit/lit.cfg.py")
diff --git a/clang-tools-extra/include-cleaner/test/lit.cfg.py b/clang-tools-extra/include-cleaner/test/lit.cfg.py
new file mode 100644
index 0000000000000..1c189bc512d0b
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/test/lit.cfg.py
@@ -0,0 +1,16 @@
+import lit.llvm
+
+lit.llvm.initialize(lit_config, config)
+lit.llvm.llvm_config.use_default_substitutions()
+
+config.name = 'ClangIncludeCleaner'
+config.suffixes = ['.test', '.c', '.cpp']
+config.excludes = ['Inputs']
+config.test_format = lit.formats.ShTest(not lit.llvm.llvm_config.use_lit_shell)
+config.test_source_root = config.clang_include_cleaner_source_dir + "/test"
+config.test_exec_root = config.clang_include_cleaner_binary_dir + "/test"
+
+config.environment['PATH'] = os.path.pathsep.join((
+ config.clang_tools_dir,
+ config.llvm_tools_dir,
+ config.environment['PATH']))
diff --git a/clang-tools-extra/include-cleaner/test/lit.site.cfg.py.in b/clang-tools-extra/include-cleaner/test/lit.site.cfg.py.in
new file mode 100644
index 0000000000000..6391e5e3286b3
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/test/lit.site.cfg.py.in
@@ -0,0 +1,14 @@
+ at LIT_SITE_CFG_IN_HEADER@
+
+# Variables needed for common llvm config.
+config.clang_tools_dir = path("@CURRENT_TOOLS_DIR@")
+config.lit_tools_dir = path("@LLVM_LIT_TOOLS_DIR@")
+config.llvm_tools_dir = path(lit_config.substitute("@LLVM_TOOLS_DIR@"))
+config.llvm_libs_dir = path(lit_config.substitute("@LLVM_LIBS_DIR@"))
+config.target_triple = "@TARGET_TRIPLE@"
+config.python_executable = "@Python3_EXECUTABLE@"
+
+config.clang_include_cleaner_source_dir = path("@CMAKE_CURRENT_SOURCE_DIR@/..")
+config.clang_include_cleaner_binary_dir = path("@CMAKE_CURRENT_BINARY_DIR@/..")
+# Delegate logic to lit.cfg.py.
+lit_config.load_config(config, "@CMAKE_CURRENT_SOURCE_DIR@/lit.cfg.py")
diff --git a/clang-tools-extra/include-cleaner/unittests/CMakeLists.txt b/clang-tools-extra/include-cleaner/unittests/CMakeLists.txt
new file mode 100644
index 0000000000000..fe33c3bcd1371
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/unittests/CMakeLists.txt
@@ -0,0 +1,25 @@
+set(LLVM_LINK_COMPONENTS
+ Support
+ TestingSupport
+ )
+
+add_custom_target(ClangIncludeCleanerUnitTests)
+add_unittest(ClangIncludeCleanerUnitTests ClangIncludeCleanerTests
+ WalkASTTest.cpp
+)
+
+target_include_directories(ClangIncludeCleanerTests
+ PRIVATE
+ ${CMAKE_CURRENT_SOURCE_DIR}/../lib)
+
+clang_target_link_libraries(ClangIncludeCleanerTests
+ PRIVATE
+ clangBasic
+ )
+
+target_link_libraries(ClangIncludeCleanerTests
+ PRIVATE
+ clangIncludeCleaner
+ clangTesting
+ )
+
diff --git a/clang-tools-extra/include-cleaner/unittests/WalkASTTest.cpp b/clang-tools-extra/include-cleaner/unittests/WalkASTTest.cpp
new file mode 100644
index 0000000000000..3e4fd42bf9cc5
--- /dev/null
+++ b/clang-tools-extra/include-cleaner/unittests/WalkASTTest.cpp
@@ -0,0 +1,110 @@
+#include "AnalysisInternal.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Frontend/TextDiagnostic.h"
+#include "clang/Testing/TestAST.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Testing/Support/Annotations.h"
+#include "gtest/gtest.h"
+
+namespace clang {
+namespace include_cleaner {
+namespace {
+
+// Specifies a test of which symbols are referenced by a piece of code.
+//
+// Example:
+// Target: int ^foo();
+// Referencing: int x = ^foo();
+// There must be exactly one referencing location marked.
+void testWalk(llvm::StringRef TargetCode, llvm::StringRef ReferencingCode) {
+ llvm::Annotations Target(TargetCode);
+ llvm::Annotations Referencing(ReferencingCode);
+
+ TestInputs Inputs(Referencing.code());
+ Inputs.ExtraFiles["target.h"] = Target.code().str();
+ Inputs.ExtraArgs.push_back("-include");
+ Inputs.ExtraArgs.push_back("target.h");
+ TestAST AST(Inputs);
+ const auto &SM = AST.sourceManager();
+
+ // We're only going to record references from the nominated point,
+ // to the target file.
+ FileID ReferencingFile = SM.getMainFileID();
+ SourceLocation ReferencingLoc =
+ SM.getComposedLoc(ReferencingFile, Referencing.point());
+ FileID TargetFile = SM.translateFile(
+ llvm::cantFail(AST.fileManager().getFileRef("target.h")));
+
+ // Perform the walk, and capture the offsets of the referenced targets.
+ std::vector<size_t> ReferencedOffsets;
+ for (Decl *D : AST.context().getTranslationUnitDecl()->decls()) {
+ if (ReferencingFile != SM.getDecomposedExpansionLoc(D->getLocation()).first)
+ continue;
+ walkAST(*D, [&](SourceLocation Loc, NamedDecl &ND) {
+ if (SM.getFileLoc(Loc) != ReferencingLoc)
+ return;
+ auto NDLoc = SM.getDecomposedLoc(SM.getFileLoc(ND.getLocation()));
+ if (NDLoc.first != TargetFile)
+ return;
+ ReferencedOffsets.push_back(NDLoc.second);
+ });
+ }
+ llvm::sort(ReferencedOffsets);
+
+ // Compare results to the expected points.
+ // For each
diff erence, show the target point in context, like a diagnostic.
+ std::string DiagBuf;
+ llvm::raw_string_ostream DiagOS(DiagBuf);
+ auto *DiagOpts = new DiagnosticOptions();
+ DiagOpts->ShowLevel = 0;
+ DiagOpts->ShowNoteIncludeStack = 0;
+ TextDiagnostic Diag(DiagOS, AST.context().getLangOpts(), DiagOpts);
+ auto DiagnosePoint = [&](const char *Message, unsigned Offset) {
+ Diag.emitDiagnostic(
+ FullSourceLoc(SM.getComposedLoc(TargetFile, Offset), SM),
+ DiagnosticsEngine::Note, Message, {}, {});
+ };
+ for (auto Expected : Target.points())
+ if (!llvm::is_contained(ReferencedOffsets, Expected))
+ DiagnosePoint("location not marked used", Expected);
+ for (auto Actual : ReferencedOffsets)
+ if (!llvm::is_contained(Target.points(), Actual))
+ DiagnosePoint("location unexpectedly used", Actual);
+
+ // If there were any
diff erences, we print the entire referencing code once.
+ if (!DiagBuf.empty())
+ ADD_FAILURE() << DiagBuf << "\nfrom code:\n" << ReferencingCode;
+}
+
+TEST(WalkAST, DeclRef) {
+ testWalk("int ^x;", "int y = ^x;");
+ testWalk("int ^foo();", "int y = ^foo();");
+ testWalk("namespace ns { int ^x; }", "int y = ns::^x;");
+ testWalk("struct S { static int ^x; };", "int y = S::^x;");
+ // Canonical declaration only.
+ testWalk("extern int ^x; int x;", "int y = ^x;");
+}
+
+TEST(WalkAST, TagType) {
+ testWalk("struct ^S {};", "^S *y;");
+ testWalk("enum ^E {};", "^E *y;");
+ testWalk("struct ^S { static int x; };", "int y = ^S::x;");
+}
+
+TEST(WalkAST, Alias) {
+ testWalk(R"cpp(
+ namespace ns { int x; }
+ using ns::^x;
+ )cpp",
+ "int y = ^x;");
+ testWalk(R"cpp(
+ namespace ns { struct S; } // Not used
+ using ns::S; // FIXME: S should be used
+ )cpp",
+ "^S *x;");
+}
+
+} // namespace
+} // namespace include_cleaner
+} // namespace clang
diff --git a/clang/include/clang/Testing/TestAST.h b/clang/include/clang/Testing/TestAST.h
index b1ccf459e4b3b..9ed5fe0a0c038 100644
--- a/clang/include/clang/Testing/TestAST.h
+++ b/clang/include/clang/Testing/TestAST.h
@@ -45,6 +45,10 @@ struct TestInputs {
/// Extra argv to pass to clang -cc1.
std::vector<std::string> ExtraArgs = {};
+ /// Extra virtual files that are available to be #included.
+ /// Keys are plain filenames ("foo.h"), values are file content.
+ llvm::StringMap<std::string> ExtraFiles = {};
+
/// By default, error diagnostics during parsing are reported as gtest errors.
/// To suppress this, set ErrorOK or include "error-ok" in a comment in Code.
/// In either case, all diagnostics appear in TestAST::diagnostics().
diff --git a/clang/lib/Testing/TestAST.cpp b/clang/lib/Testing/TestAST.cpp
index 689d407b25537..5817230f4b098 100644
--- a/clang/lib/Testing/TestAST.cpp
+++ b/clang/lib/Testing/TestAST.cpp
@@ -105,6 +105,10 @@ TestAST::TestAST(const TestInputs &In) {
auto VFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
VFS->addFile(Filename, /*ModificationTime=*/0,
llvm::MemoryBuffer::getMemBufferCopy(In.Code, Filename));
+ for (const auto &Extra : In.ExtraFiles)
+ VFS->addFile(
+ Extra.getKey(), /*ModificationTime=*/0,
+ llvm::MemoryBuffer::getMemBufferCopy(Extra.getValue(), Extra.getKey()));
Clang->createFileManager(VFS);
// Running the FrontendAction creates the other components: SourceManager,
More information about the cfe-commits
mailing list