[PATCH] D79066: [Support] Class to facilitate file locking

Thu May 14 09:44:00 PDT 2020

sepavloff marked 2 inline comments as done.
sepavloff added inline comments.

================
Comment at: llvm/include/llvm/Support/FileSystem.h:1168
+/// RAII class that facilitates file locking.
+class FileLocker {
+  int FD;             ///< Locked file handle.
----------------
dblaikie wrote:
> labath wrote:
> > sepavloff wrote:
> > > labath wrote:
> > > > sepavloff wrote:
> > > > > labath wrote:
> > > > > > sepavloff wrote:
> > > > > > > dblaikie wrote:
> > > > > > > > sepavloff wrote:
> > > > > > > > > dblaikie wrote:
> > > > > > > > > > Perhaps this type should be boolean testable (for the error), or using a factory function to create one that returns llvm::Error (wrapped around the error_code) on failure:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > {
> > > > > > > > > > Expected<FileLocker> F = TryLock(FD, Timeout);
> > > > > > > > > > if (!F) {
> > > > > > > > > > }
> > > > > > > > > > ...
> > > > > > > > > > }
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...
> > > > > > > > > > Perhaps this type should be boolean testable (for the error)
> > > > > > > > > 
> > > > > > > > > The class has method `locked`, which can be used to check if lock was successfully acquired, and method `error` for those applications that need more information than just the fact of failure.
> > > > > > > > > 
> > > > > > > > > > Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...
> > > > > > > > > 
> > > > > > > > > If the failure to unlock needs to be handled, the user would use method `unlock` provided by this class. In most expected use cases however this failure need not be handled. User defines RAII object and do things that require exclusive access. Upon destruction lock is released and if something goes wrong at that moment, this fact is just ignored.
> > > > > > > > >> Perhaps this type should be boolean testable (for the error)
> > > > > > > > > The class has method locked, which can be used to check if lock was successfully acquired, and method error for those applications that need more information than just the fact of failure.
> > > > > > > > 
> > > > > > > > Yeah, I think it'd be better if this was only accessible via a free function that returns Expected<FileLocker> so that the Error must be checked for/can't be accidentally ignored.
> > > > > > > > 
> > > > > > > > >> Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...
> > > > > > > > > If the failure to unlock needs to be handled, the user would use method unlock provided by this class. In most expected use cases however this failure need not be handled. User defines RAII object and do things that require exclusive access. Upon destruction lock is released and if something goes wrong at that moment, this fact is just ignored.
> > > > > > > > 
> > > > > > > > Fair enough/that sounds OK then. But it should return llvm::Error to make sure errors are handled/not ignored
> > > > > > > Added function `tryLock` for using `Expected<FileLocker>`
> > > > > > Looking at the unit test I realized that `fs::tryLockFile` and `fs::tryLock` do quite different things, but that this difference is not obvious from the function names. The way I would deal with that is by making `tryLock` a static member function `fs::FileLocker::tryLock` (and then make the FileLocker constructor private), but I see that David has requested a free function. David, what are your thoughts on that?
> > > > > I renamed the function into `getLock`. It looks not so categorical as `lockFile` and is unlikely to be confused with `tryLockFile`. Using static member functions leads to long names, it decreases readability.
> > > > Having to guess what a short name does also hurts readability. So does inconsistency with other similar classes. `getLock` is less likely to be confused with `tryLockFile`, but its connection to the `FileLocker` class is still fairly vague for my taste, and it has lost the information that it only "tries" to get the lock, which I do find important.
> > > > 
> > > > As for consistency, I think this class is most similar to `TempFile` class in this very file. That class also uses a static factory function `TempFile::create` to return an `Expected<TempFile>`. Other uses of static factory functions in the support library that I could find by a quick search are `GlobPattern::create`, `MemoryBuffer::getFile***`, `SpecialCaseList::create`, `Error::success`. There are also uses of free functions to create an object, but these generally have a different purpose -- to enable automatic template argument deduction (e.g. `make_range`) and not to provide error handling in a -fno-exceptions world.
> > > The problem is not in long names only. The classes you mention (`TempFile`, `MemoryBuffer` and so on) represent some program entities. In contrast, `FileLocker` is a helper class, it does not designate any entity in a program, it exists only to execute some action at scope exit. Ideally user does not need to know about it. We can reach that by using `auto` specifier:
> > > ```
> > > if (auto L = fs::getLock(FD)) {
> > >     // ... do action that require file to be locked.
> > > } else {
> > >     // ... handle lock error.
> > > };
> > > ```
> > > That is why a free function is more convenient than a method.
> > I'll admit that there are some differences between this class and the others, but I don't believe they are sufficiently convincing to justify deviating from the existing practice. I think that the notion of a "program entity" is very vague. I could certainly think of a lock on a file as a "thing" that can be owned, passed around (it has move constructors), etc. And the existence of the `unlock` function means that the user may need to be aware of the class.
> > 
> > `scope_exit` is a class whose only function is to run an action at scope exit, and it is indeed normally used as `auto x = make_scope_exit(...)`, but I don't think that's because we don't want users to be aware of the classes existance. It's because it uses templates for efficiency, and spelling the template name out would be difficult or impossible (lambdas have no names). Given llvm's fairly cautious stance towards `auto`, if there was a way to achieve the same thing without templates, I'm pretty sure we would be spelling out that class everywhere.
> Sorry for the delay getting back to this review!
> 
> > Looking at the unit test I realized that fs::tryLockFile and fs::tryLock do quite different things, but that this difference is not obvious from the function names. The way I would deal with that is by making tryLock a static member function fs::FileLocker::tryLock (and then make the FileLocker constructor private), but I see that David has requested a free function. David, what are your thoughts on that?
> 
> Sorry, the main point was "not a public constructor" - some kind of named constructor so that the result is "Expected<FileLocker>" to enforce testing of failure to lock. I don't mind if it's a static member function of FileLocker, or perhaps a non-static member function of raw_ostream, instead than exposing the FD from the raw_ostream.
> 
> +1 to everything else @labath has said about this being perfectly reasonable to name & scope guards being a fairly special case of unnameable types and I wouldn't apply it in this case.
> I'll admit that there are some differences between this class and the others, but I don't believe they are sufficiently convincing to justify deviating from the existing practice.

The existing practice is to declare RAII object in a scope as a local variable. It was the initial variant, which was rejected because it didn't force a user to check if lock was successful, such check was supported but not enforced.

>  I think that the notion of a "program entity" is very vague. I could certainly think of a lock on a file as a "thing" that can be owned, passed around (it has move constructors), etc.

You are definitely right. A thing that is an auxiliary class can become a full-fledged entity in other design. A lock, which would allow ownership transfer, makes sense as a member of more elaborated class hierarchy. Such lock could support more flexible lock mechanism and provide checking facility. In this case it is an entity, not merely a RAII object.

The class `FileLocker` on the other hand is a simple class, that supports only the simplest kind of file locking. It has move constructor only because it is necessary to use it with `Expected`. The only function it has is lock release. Hardly it need a factory method, as this function is already implemented by `getLock`.

> And the existence of the unlock function means that the user may need to be aware of the class.

This is common practice, when RAII class has possibility to free resources early, for example `ContextRAII::pop` or `ScopeRAII::destroy`.

> scope_exit is a class whose only function is to run an action at scope exit, and it is indeed normally used as auto x = make_scope_exit(...), but I don't think that's because we don't want users to be aware of the classes existance.

Sure, there is no intention to hide the class from user. However if user can be unaware about existence of a class, the class is auxiliary.

IIUC the problem now is whether to create RAII object with a free function or a method of `FileLocker`? What is the advantage of this solution?

================
Comment at: llvm/include/llvm/Support/FileSystem.h:1218
+inline Expected<FileLocker>
+tryLock(int FD, std::chrono::milliseconds T = std::chrono::milliseconds(1000)) {
+  FileLocker Lock(FD, T);
----------------
dblaikie wrote:
> labath wrote:
> > sepavloff wrote:
> > > labath wrote:
> > > > sepavloff wrote:
> > > > > labath wrote:
> > > > > > sepavloff wrote:
> > > > > > > labath wrote:
> > > > > > > > I don't think there should be a default value for the timeout for two reasons:
> > > > > > > > - it makes it very unobvious that there is any timing out happening (`tryLock(FD)` looks a lot like `mutex.try_lock()` which only does an instantaneous lock check with no retries or time outs)
> > > > > > > > - the appropriate value of the timeout sounds like something that is going to be very dependent on the exact use case, and the person using it should make a deliberate choice about the value (and explain it). (Speaking of that, I am still not sure your use case really needs timeouts, though that's more of a discussion for the other patch.)
> > > > > > > You are right, removed default argument.
> > > > > > > 
> > > > > > > Timeout is a form of protection against deadlocks. For the particular task of writing to log file infinite wait time is also acceptable, but in other use cases (such as unit tests) timeouts may be convenient.
> > > > > > I don't think unit tests are a good motivating example. There are plenty of ways a test can not terminate without locking files (or anything else). The solution to that is to have a (fairly high) timeout at the level of the test harness. That we already have.
> > > > > > 
> > > > > > So if the unit tests are the main motivation for introducing these timeouts, I think we should remove them, as they add a fair amount of complexity (and flakyness) to the equation.
> > > > > > 
> > > > > > Even for the clang use case, whether one prefers a deadlock over the program silently doing the wrong thing (perhaps when the system is under heavy load, like it is when running the test suite), depends on what exactly is he trying to achieve. But that's something I don't feel qualified to judge...
> > > > > Unit tests are not a motivation for using timeouts, but they demonstrates the benefits. If users can specify timeouts, they have possibility to handle deadlocks. It is up to user to handle such case or not, but the possibility exists. Without timeouts user must behave as if deadlocks never happen. It limits usability of the solution and it is the motivation.
> > > > Well.. it's not that the user _can_ specify timeouts. In the current setup, the user _must_ specify a timeout -- there is no blocking equivalent of this function. Which is fine, if the only use case we have for it right now wants/needs/etc. a timeout (we are not writing a general purpose library here). However, I remain unconvinced that a timeout is called for in this situation (but you don't have to convince me here -- that's another review).
> > > Collection of build statistics usually is a side effect of build. If something goes wrong and lock cannot be released, it can cause build failure. It is often more acceptable to get incomplete statistics than to break a build. Timeouts can work as watchdog,  excluding log machinery from culprits.
> > > 
> > > However in some cases blocking operations may be more preferable. I think we can solve this problem by adding blocking function as well. It is done in this version.
> > (I think adding the blocking is good because it may prevent needless proliferation of calls to the timeouted versions. I just want to clarify that I did not consider that as a requirement for this patch.)
> +1 please don't include timeout support if there's no immediate planned use case.
The time out support is motivated by the following consideration.

At present compiler jobs executed in parallel build are independent. Each compiler process has exclusive access to any file it writes to. Allowing writes to the same log file creates dependency between compiler processes, which may affect build process. For example, the following hypothetical scenario. If some process having lock to the shared log file is suspended, other processes may be suspended waiting the lock. The memory is occupied with waiting processes, and the build can fail due to resource shortage. Using timeouts can alleviate this issue at expense of log completeness.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79066/new/

https://reviews.llvm.org/D79066