<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/63941>63941</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[clang][tooling] regression: clang tools no longer accept already-preprocessed input
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
stephenrkell
</td>
</tr>
</table>
<pre>
Until a couple of years ago, a `ClangTool` happily accepted already-preprocessed source (e.g. `.i` or `.ii` files) as input. This is no longer the case, and the change does not look intentional.
Here are some steps to reproduce.
Build a simple `ClangTool`, e.g. the example from the libTooling documentation (https://clang.llvm.org/docs/LibTooling.html). That example currently needs modifying, since it was broken by D94420 and the fix in D152771 has not yet landed; the following code should work.
```
$ cat >test.cc <<EOF
// Declares clang::SyntaxOnlyAction.
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
// Declares llvm::cl::extrahelp.
#include "llvm/Support/CommandLine.h"
using namespace clang::tooling;
using namespace llvm;
// Apply a custom category to all command-line options so that they are the
// only ones displayed.
static cl::OptionCategory MyToolCategory("my-tool options");
int main(int argc, const char **argv) {
  // CommonOptionsParser::create will parse arguments and create a
  // CompilationDatabase.
  auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
  if (!ExpectedParser) {
    // Fail gracefully for unsupported options.
    llvm::errs() << ExpectedParser.takeError();
    return 1;
  }
  CommonOptionsParser &OptionsParser = ExpectedParser.get();
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());
  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
}
EOF
```
Build it, e.g. as follows, assuming a recent build is installed under `/usr/local/lib` (better ways may be possible!).
```
$ c++ -fno-rtti -fno-exceptions -o test test.cc \
`llvm-config --cxxflags` \
-Wl,-\( /usr/local/lib/libclang*17* -Wl,-\) \
`llvm-config --libs`
```
Run it on some already-preprocessed code; it claims there is not "exactly one compiler job" in the command line, even though there obviously is.
```
$ echo 'int main(void) { return 0; }' > true.i
$ ./test true.i -- clang -c
error: unable to handle compilation, expected exactly one compiler job in ' "/usr/local/src/bug-report-20230718/clang-tool" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-fsyntax-only" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "true.i" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/usr/local/src/bug-report-20230718" "-resource-dir" "/usr/local/src/lib/clang/17" "-fdebug-compilation-dir=/usr/local/src/bug-report-20230718" "-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fcolor-diagnostics" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-x" "cpp-output" "/usr/local/src/bug-report-20230718/true.i"; '
```
The cause is D105695 "Accept Clang invocations with multiple jobs", which tries to identify a single compilation job in the compiler command line. Unfortunately it does not recognise jobs that take already-preprocessed source as input, because it uses `clang::driver::types::isSrcFile`, which explicitly excludes already-preprocessed files.
https://clang.llvm.org/doxygen/namespaceclang_1_1driver_1_1types.html#a9402144812ad57ff7ea95a716818da96
More generally, not all files that need compilation, or that might be input to a `ClangTool`, are "source" files in the sense of also needing preprocessing. The obvious fix is to replace the test in the `IsSrcFile` lambda with one for files needing compilation, not just "source" files. I don't see a neat way to do this using the definitions in `types.h`... but my quick fix does the following.
```
diff --git a/clang/lib/Tooling/Tooling.cpp b/clang/lib/Tooling/Tooling.cpp
index 5242134097da..00c0c4297bd9 100644
--- a/clang/lib/Tooling/Tooling.cpp
+++ b/clang/lib/Tooling/Tooling.cpp
@@ -138,7 +138,10 @@ getCC1Arguments(DiagnosticsEngine *Diagnostics,
};
auto IsSrcFile = [](const driver::InputInfo &II) {
- return isSrcFile(II.getType());
+ return isSrcFile(II.getType())
+ || II.getType() == driver::types::TY_PP_CXX
+ || II.getType() == driver::types::TY_PP_C
+ || II.getType() == driver::types::TY_PP_ObjC;
};
llvm::SmallVector<const driver::Command *, 1> CC1Jobs;
```
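To make the distinction concrete, here is a minimal standalone sketch of the two predicates. It is a toy model, not clang's code: the `FileType` enum and the functions `needsPreprocessing`, `isPreprocessed`, `needsCompilation` and `countJobs` are hypothetical names invented for illustration, and the model assumes a "source" check that accepts only inputs still awaiting preprocessing.

```cpp
// Toy model only: these names are hypothetical and are NOT the
// clang::driver::types API.
enum class FileType { C, PreprocessedC, Cxx, PreprocessedCxx, Object };

// Assumed shape of a "source file" check: the input still has a
// preprocessing step ahead of it.
bool needsPreprocessing(FileType T) {
  return T == FileType::C || T == FileType::Cxx;
}

// Already-preprocessed input skips the preprocessor but is still compiled.
bool isPreprocessed(FileType T) {
  return T == FileType::PreprocessedC || T == FileType::PreprocessedCxx;
}

// The broader predicate that the quick fix approximates.
bool needsCompilation(FileType T) {
  return needsPreprocessing(T) || isPreprocessed(T);
}

// Count the inputs that a given predicate recognises as compile jobs.
unsigned countJobs(const FileType *Inputs, unsigned N, bool (*Pred)(FileType)) {
  unsigned Count = 0;
  for (unsigned I = 0; I != N; ++I)
    if (Pred(Inputs[I]))
      ++Count;
  return Count;
}
```

In this model, a command line whose only input is `FileType::PreprocessedC` yields zero jobs under `needsPreprocessing` and one job under `needsCompilation`, which mirrors the "expected exactly one compiler job" failure above and why widening the predicate avoids it.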
</pre>