<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/130795>130795</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[clang] Race conditions while building modules
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang:modules
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
hahnjo
</td>
</tr>
</table>
<pre>
In order to test some upcoming changes to Clang Modules, I'm trying to build LLVM with `LLVM_ENABLE_MODULES=ON`. More concretely, I'm first building LLVM + Clang + libc++ with:
```
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
```
and then use the built `clang` with libc++:
```
-DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/path/to/clang -DCMAKE_CXX_COMPILER=/path/to/clang++ -DLLVM_ENABLE_LIBCXX=ON -DLLVM_ENABLE_MODULES=ON
```
This works sometimes, but I'm regularly hit with error messages about missing / outdated / corrupt modules. Most of the time, the errors go away when simply launching `ninja` again, usually a couple of times. This makes me suspect that there is some sort of race when building the same module from multiple `clang` processes.
With a bit of tinkering, I found that the following diff makes the problem very reproducible (95% of the time):
```diff
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index c11c857ea060..7a70341e9b55 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -1476,57 +1476,9 @@ static bool compileModuleAndReadASTBehindLock(
StringRef Dir = llvm::sys::path::parent_path(ModuleFileName);
llvm::sys::fs::create_directories(Dir);
- while (true) {
- llvm::LockFileManager Locked(ModuleFileName);
- switch (Locked) {
- case llvm::LockFileManager::LFS_Error:
- // ModuleCache takes care of correctness and locks are only necessary for
- // performance. Fallback to building the module in case of any lock
- // related errors.
- Diags.Report(ModuleNameLoc, diag::remark_module_lock_failure)
- << Module->Name << Locked.getErrorMessage();
- // Clear out any potential leftover.
- Locked.unsafeRemoveLockFile();
- [[fallthrough]];
- case llvm::LockFileManager::LFS_Owned:
- // We're responsible for building the module ourselves.
- return compileModuleAndReadASTImpl(ImportingInstance, ImportLoc,
- ModuleNameLoc, Module, ModuleFileName);
-
- case llvm::LockFileManager::LFS_Shared:
- break; // The interesting case.
- }
-
- // Someone else is responsible for building the module. Wait for them to
- // finish.
- switch (Locked.waitForUnlock()) {
- case llvm::LockFileManager::Res_Success:
- break; // The interesting case.
- case llvm::LockFileManager::Res_OwnerDied:
- continue; // try again to get the lock.
- case llvm::LockFileManager::Res_Timeout:
- // Since ModuleCache takes care of correctness, we try waiting for
- // another process to complete the build so clang does not do it done
- // twice. If case of timeout, build it ourselves.
- Diags.Report(ModuleNameLoc, diag::remark_module_lock_timeout)
- << Module->Name;
- // Clear the lock file so that future invocations can make progress.
- Locked.unsafeRemoveLockFile();
- continue;
- }
-
- // Read the module that was just written by someone else.
- bool OutOfDate = false;
- if (readASTAfterCompileModule(ImportingInstance, ImportLoc, ModuleNameLoc,
- Module, ModuleFileName, &OutOfDate))
- return true;
- if (!OutOfDate)
- return false;
-
- // The module may be out of date in the presence of file system races,
- // or if one of its imports depends on header search paths that are not
- // consistent with this ImportingInstance. Try again...
- }
+ // We're responsible for building the module ourselves.
+ return compileModuleAndReadASTImpl(ImportingInstance, ImportLoc,
+ ModuleNameLoc, Module, ModuleFileName);
}
/// Compile a module in a separate compiler instance and read the AST,
```
As noted in the comment, "locks are only necessary for performance" and `PCMCache` (now `ModuleCache`) should "[take] care of correctness", which doesn't seem to be the case. After a looong `git-bisect` session, I finally found that the problems are caused by commit 49092d13c217 from 2018! And indeed the following revert seems to resolve the situation (in my limited tests so far, both with and without the diff above that makes the issue more reproducible):
```diff
diff --git a/clang/lib/Basic/FileManager.cpp b/clang/lib/Basic/FileManager.cpp
index f0b6f7be6c84..f6d1bb4d58d3 100644
--- a/clang/lib/Basic/FileManager.cpp
+++ b/clang/lib/Basic/FileManager.cpp
@@ -554,7 +554,6 @@ FileManager::getBufferForFile(FileEntryRef FE, bool isVolatile,
if (Entry->File) {
auto Result = Entry->File->getBuffer(Filename, FileSize,
RequiresNullTerminator, isVolatile);
- Entry->closeFile();
return Result;
}
diff --git a/clang/lib/Serialization/ModuleManager.cpp b/clang/lib/Serialization/ModuleManager.cpp
index 4ecb776513d6..b64bc9fa35f6 100644
--- a/clang/lib/Serialization/ModuleManager.cpp
+++ b/clang/lib/Serialization/ModuleManager.cpp
@@ -183,15 +183,9 @@ ModuleManager::addModule(StringRef FileName, ModuleKind Type,
if (std::unique_ptr<llvm::MemoryBuffer> Buffer = lookupBuffer(FileName)) {
// The buffer was already provided for us.
NewModule->Buffer = &ModuleCache->addBuiltPCM(FileName, std::move(Buffer));
- // Since the cached buffer is reused, it is safe to close the file
- // descriptor that was opened while stat()ing the PCM in
- // lookupModuleFile() above, it won't be needed any longer.
- Entry->closeFile();
} else if (llvm::MemoryBuffer *Buffer =
getModuleCache().lookupPCM(FileName)) {
NewModule->Buffer = Buffer;
- // As above, the file descriptor is no longer needed.
- Entry->closeFile();
} else if (getModuleCache().shouldBuildPCM(FileName)) {
// Report that the module is out of date, since we tried (and failed) to
// import it earlier.
```
@ChuanqiXu9 @vgvassilev if you have any idea off the top of your head how to to fix this in `ModuleManager` (without keeping hundreds of files open), please let me know. If not I need to continue my investigations another day...
</pre>
<img width="1" height="1" alt="" src="http://email.email.llvm.org/o/eJysWVtz27YS_jXwy441FHV_8IMu1oxP7SRjO23OkwcklyJiEGABUIr6688sQEqUYsd2ejqpBQnE3vfbXZBbKzYK8YqNFmy0uuC1K7S5KnihvuuLRGf7qxsF2mRowGlwaB1YXSLUVapLoTaQFlxt0NLuUnK1gTud1RIti5dww-JJCc7s6UGnIamFzOD29s872AlXABtH9OXp-tN8cXv9dPd59fX2-oENVp8_sXHUgzttEFKtUoMO5f5IMhfGukCOSHuKLF40EtBKiiRl8YKWxIoN5iyas3HU_Ivml6vl3fyP66fF15vb1dPjf79cs8HqHiVyi3C56sr15f7zf66XjyRY6hmcbt9__fR4c-flZnFMjH_8YINFWPBEhHWtdkJlLI7P5OAqA1eggtoiLbxWjkzjebFxFGx1VOjDuoTN5dPy892Xm9vrey_ouuKuYPHaaRavW7WaJ799e-vZxrSnhri9WSy_ffPuO9s58eyp7I-FsLDT5tn6wHKiDLGT1K5xtsFNLbmReyiEC8ZAY7SBEq3lFHs80bWDUlgrvPvXoGuXcYeZ_5JqY-rKQRkik-LKOtC5NzcxJH609mQtbDTwHd_DjtxiRVnJPUheq7Tw5MeREuo7J8_wDReKTte25lLugUOq60qip06q9MArWPJntFAi2NpWmDpwBac_aBBE0BysNl4qw1MMvA8BTsJZXmKjAuRGl1DW0gni1Y2VyugUrUXbIztH87_IXBwSERQW6hmNIP8t4QZyXfvoC6JArqXUO-KXiTxvZKaNyuhEYglbNHswWBmd1alIiHU8nY1YPDq15uwsRIkci-ae6uXlRjjgx0BaS5GweL02WjmkDFkvdVkJieZGWcdVir20qiD56AkWzYXK8Aek_X46HU2QR-Oo15vwSTQY9nGWjEbQj6LxcEgZdHn5cZlIxZCR8eK35GPDiA0juOwPJ2MWL0cTgq7mywyaXeu4EykkWktIA50AsXOV3SPP5g-PCyyEym51-sziKYvmAPDgyM33mMNKGGCDFUi5Lckvg7nd27Dwad2sDCr3FPJ8GuivhcRPvPHngsi-QCJvPlOD3OFTJgymThtBSTxdCXM8TFYG2BUihI0zNVEGNln4nSNt0oN433HFN2iAvmP2uliXAAB2J1xaEOH28SPplGDwVfrNj-uHp2tK_xC6niYQehCABMZLnlKE-6xIufE5TtCCqVNoLRCSS50-W_CbSu5BISUjN3vItfmJbIUm16b0AQFrLmXC0-dDnWzzvkl5oYIeOgeu9p7RTwQNSg96Acd6fn8l-Mb27rHSxh1sSPa71SmhQCb4JpjAYMnN81Pg90QMnnIuZG28qQ-8PL_Bkg2WjV0u2eCaCLa_Bgf0Nui8Qe8CSlNknnjsIPVSIjeE2F6xSjtUTnAJEnOnt2h6xxMN6VpZnuM9lnqLrTdfok89zSLnUrrC6HpTsNGK_g0-Fhafdwqzl8LiL2TxxCAYtJVW1gNirs2L7tO1sSi3aDvaGHS1Ua8l9U1ZSRZPb0rynFCbFjs8dPsfgwc9vZ_cGn44rn7OmlaO95rhoeDmzA6JQf7MBovWII8FxalDg9b53pBbDPqyyarLs3n-QZeoFQJK68vgOwzZg7-4cH7XFViC02c0c6GELXqv4UJvx4Vba_NVyQYu49lHseIe7dNDnVJq_6Y53ml2YkTRZ1bizPKpVk6oGjvcnNmHfoQQZIOhpJOSH2b5KErUtWsZts4SKsX3QSGF3Q69RGRu0r3Fv4YYV5qan7ZdIZEpDSS6YxOcgdUQOtNMowWlHWQaBP1V2KXmdoIg9CY_QKRrVPCNJJGi_uc0Bf8VMB7ovwcYXwe91kWQU1W0OvRiee1q6gvVVqfcCa3IyMr3Y2SwjUFrfxsUO5Hzi8wkDOrCl5drxy18r62DnRHOUX-6951rm8JBJt-ofK7d53zFHfrWI-fSdq0gckpIE3Bunjs0yy4Gvgf1fkK8Dgy-hHpLYPH4IFVIen-kAWHfjpxLyOL-yZFz5O7qdWbAx6PtSr6HBH2B0znQWELFPPTVaJGySudNBOytw9JPALbV6UhTG5KLrK1zEM6C8AaxkGGFKrOgFRTIaVa3yE1aAPVzNjiPslRpd0YyJbi1VHHDYOVoWPnJ-D2AxxZder0m9ELgUOv7r-uhJ_J_q4ZE7eP1sNEH_P9eH5-mQRrgnTaMg8WKG_JjI6wB0Yjjm0DTJs_84bERqTPwsmg-92CGWRsHqS5LVC5EafyrHrLbM7I49uzYOPqyvPOQTBMgi6dK7-jXDlYT63gGttC1pKE4ZqMFwTcbrV4G8NhjeCHSwoOvYvHEgUVfcSmavdhUz8DnL3CQWuswHm-Eu0yExdSRPBatFVo1E6dQfk4-mzybATNonfLaYkbgQnYRDoazaBZn_UEa9ydh9I2jPiUnzBXZMEPMzgZYg1s0QWBfXAxaLbdBbCtc7WGVTCUUlHuQohTkD4fW0TQOOTe-dmhXhMQgQ9OCspiI-FmWJ3rbYONxWBbW1hTmPhGOo_LvTcULbkVKs-SxTr8yDb_y5GEKzqNknE8SHKfTYa-Xj7N-kgyz0TQbvDUFv075V9PvL041U-9oNGTx0s-8YTluR96fupINukWd52jW2jR1jT6ulTN7mnHX18FbWoKwf2rJnX9oGWbhAOb-YSrJgUDb8nk857XTcI-2ls4XrJNn6fMgQMNZNUWF1g_inyOvd_53j3_XwqD9VEv5iKYUijvtQ64rf7dyH0RKpbZ4Xty7VSnocfi9wek3Au0BjeBS_OMTg8XrgB2_Drk3zxyCb4hpMpmMR_1BNu71kvEwSWc5H4zy8VvB9x4evwrDd51vr2GmAxYv-yN_C-PXh0uYk2MhJnmWHdqV421Lt-MI238IlcHjvjqPR-uyQKhW4u8anypn2GB5bM_vsNRm30Td4BrCKlzlaP1cVycB2dayY1x3GpEkHKX-jUuqTntC3K3IMPNFpfZV2MfQJ9wdm9cOSxaPO-WENnmWLWoh3Zfl3YkMSzhoRq0oi6etoLPTiD6ZK0I9SQsC_sDVz4NUCXxWOH9NynP0owKlQEB8yoNTehna1IjK-Qmx6Vp1hQqz5vLJOu5C7rQ9yZflHQh1RicY-dgthCMB8xuRdjrUxQRBIZI1w9WM2nRuLd5MXDZZNROwj4uXIwBYPD-6gw5u0HULvKfZC0Kfu2R2BnevObkNtkV3uJrbo86tybtGFtTMNEo3Zgi6f1TxFxUKHQsFWvaWVofRhbrBY2_RNm6224H7MPWB5-dU4d8UTKnI51zIcH3orxa6lEO7TY5HbqQIPj5r7dgwWhY1V3-Lb7VHj-1my60VErek5l7XUPAt-kARGXLQeXNzrisSbq9r45t4KPTOv3LTkIsfoTEX6tjWtVgUOr62NXlGrCiqi1plBmkkCJNFSAFvsyVU4cWQRAclwrPSOz8703x94z0YxvEwJ1J7JNQWrRObZhhth_eM73u93kV2Nchmgxm_wKv-ZNif9KPpMLoorrIYcZykcTRKJylmPIkn02yM41E2jAbTwfhCXMVRPIoG_X5_PBiMZr1kMoiHOItn02Ha76c5G0ZYciF7lBM9bTYXvrm66g-iyWx0IXmC0vpXl3EcYJ8wp3n9GLPR6sJc0dHLpN5YNoyksM4eiTnhpH_zGc6OVnDPU__CMRNB0wAYh_mloX1RG3lVOFf56x8fGxvhijrppbqkykMZHD4uK6O_UyMcr73olsXrRvrtVfy_AAAA___0ZlAm">