r228604 - Diagnose timeouts in the LockFileManager and delete the dead lock file

Richard Smith richard at metafoo.co.uk
Wed Feb 25 18:20:19 PST 2015


On Tue, Feb 24, 2015 at 7:55 PM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Feb 24, 2015, at 6:57 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Feb 9, 2015 at 12:35 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>
>> Author: benlangmuir
>> Date: Mon Feb  9 14:35:13 2015
>> New Revision: 228604
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=228604&view=rev
>> Log:
>> Diagnose timeouts in the LockFileManager and delete the dead lock file
>>
>> If the lock file manager times out, we should give an error rather than
>> silently trying to load the existing module.  And delete the
>> (presumably) dead lock file, since it will otherwise prevent progress in
>> future invokations. This is unsound since we have no way to prove that
>> the lock file we are deleting is the same one we timed out on, but since
>> the lock is only to avoid excessive rebuilding anyway it should be okay.
>> Depends on llvm r228603.
>>
>
> This diagnostic fires a *lot* when, say, bootstrapping Clang with modules
> enabled at -j16. Is the timeout perhaps too short?
>
>
> I lowered the timeout to 1 minute in my llvm commit,
>

Well, technically, it went from 8m 44s to 65.5s due to the exponential
backoff... :-/

so if it helps, feel free to bump it back up.  I’m not particularly
> concerned about the actual number now that we will actually honour the
> limit, but the old value of 5 minutes seemed really long to me.
>

Really? Consider a Debug+Asserts clang, building a large module (such as
the Clang_AST module). Requiring that to succeed in 65.5 seconds doesn't
seem particularly reasonable (and, indeed, some such modules do not build
in that environment in 65.5s on my machine, at least when it's heavily
loaded).

But I'm not really sure that a timeout is the right approach here at all.
What we want to track is whether the creating process is still alive.
Perhaps we should have that process periodically bump the timestamp on the
lock file, and assume it's died if the timestamp is too far in the past?
This'd mean that in the (scary) case of sharing a module cache across a
build farm, you'd need your machines to not have too much clock skew, but
that's already necessary for build systems that use timestamps to determine
what to rebuild, so it doesn't seem like a ridiculous restriction.

Thoughts?

Ben
>
>
>
>> Modified:
>>     cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>     cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>
>> Modified: cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td?rev=228604&r1=228603&r2=228604&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td (original)
>> +++ cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td Mon Feb  9
>> 14:35:13 2015
>> @@ -83,6 +83,8 @@ def err_module_not_found : Error<"module
>>  def err_module_not_built : Error<"could not build module '%0'">,
>> DefaultFatal;
>>  def err_module_lock_failure : Error<
>>    "could not acquire lock file for module '%0'">, DefaultFatal;
>> +def err_module_lock_timeout : Error<
>> +  "timed out waiting to acquire lock file for module '%0'">,
>> DefaultFatal;
>>  def err_module_cycle : Error<"cyclic dependency in module '%0': %1">,
>>    DefaultFatal;
>>  def note_pragma_entered_here : Note<"#pragma entered here">;
>>
>> Modified: cfe/trunk/lib/Frontend/CompilerInstance.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInstance.cpp?rev=228604&r1=228603&r2=228604&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Frontend/CompilerInstance.cpp (original)
>> +++ cfe/trunk/lib/Frontend/CompilerInstance.cpp Mon Feb  9 14:35:13 2015
>> @@ -1022,9 +1022,19 @@ static bool compileAndLoadModule(Compile
>>      case llvm::LockFileManager::LFS_Shared:
>>        // Someone else is responsible for building the module. Wait for
>> them to
>>        // finish.
>> -      if (Locked.waitForUnlock() == llvm::LockFileManager::Res_OwnerDied)
>> +      switch (Locked.waitForUnlock()) {
>> +      case llvm::LockFileManager::Res_Success:
>> +        ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>> +        break;
>> +      case llvm::LockFileManager::Res_OwnerDied:
>>          continue; // try again to get the lock.
>> -      ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>> +      case llvm::LockFileManager::Res_Timeout:
>> +        Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>> +            << Module->Name;
>> +        // Clear the lock file so that future invokations can make
>> progress.
>> +        Locked.unsafeRemoveLockFile();
>> +        return false;
>> +      }
>>        break;
>>      }
>>
>>
>>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150225/5e685892/attachment.html>


More information about the cfe-commits mailing list