[PATCH] D30010: Improve the robustness of mmap

Peter Collingbourne via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 17 10:25:33 PST 2017


pcc added a comment.

In https://reviews.llvm.org/D30010#680035, @zturner wrote:

> In https://reviews.llvm.org/D30010#680030, @pcc wrote:
>
> > In https://reviews.llvm.org/D30010#680027, @zturner wrote:
> >
> > > In https://reviews.llvm.org/D30010#680022, @zturner wrote:
> > >
> > > > In https://reviews.llvm.org/D30010#680007, @pcc wrote:
> > > >
> > > > > In https://reviews.llvm.org/D30010#679201, @zturner wrote:
> > > > >
> > > > > > In https://reviews.llvm.org/D30010#679198, @ruiu wrote:
> > > > > >
> > > > > > > I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?
> > > > > >
> > > > > >
> > > > > > If you just `read()` the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory.  If you `mmap` the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash.  Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.
> > > > > >
> > > > > > The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's.  But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and `read` crashes all of Xcode.
> > > > >
> > > > >
> > > > > This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call `is_local` by itself and pass the result as the `IsVolatileSize` argument (which should probably have a better name).
> > > >
> > > >
> > > > I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea.  One name I had come up with is `bool Ephemeral`, and it defaults to `true`  (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner.  Then LLDB could pass `false`.  If we're worried that `Ephemeral` is a word people might not be familiar with, I could call it `ShortLived` or `LongLived` as well.
> > >
> > >
> > > In case it's not clear btw, you were proposing re-using an existing argument whereas my idea was to add a new argument.  The `IsVolatileSize` argument seems kind of orthogonal to what we're doing here, so it's hard to come up with an intuitive way to merge the two things into one variable.
> >
> >
> > I would rename the existing argument. I believe that the only thing that `IsVolatileSize` does is disable the use of mmap. Its original use case is pretty much the same I believe (libclang, which may be embedded in a text editor or some other interactive application, needed to arrange to prevent text editors from changing file contents beneath its feet -- NFS shares going away is just another instance of the same issue).
>
>
> The only concern I have is that it's easy for someone to answer the questions "Does the file have a volatile size?" and "Does the file reside locally?" without understanding the implications of specifying true or false.  It's harder for them to answer the question "Should I disable the use of mmap on this file?"  So I can call the argument `AllowMmap = true`, but I worry this might be a step  backwards from an API standpoint.


The semantic effect of this argument is that the application requires that the MemoryBuffer contains a snapshot of the file contents at this point in time (which implies that it is immune to changes to file size, file contents, file system failures, etc.) If we document the flag in those terms and give it a suitable name (`Snapshot` maybe?), I feel that that would be sufficient from an API perspective.


https://reviews.llvm.org/D30010





More information about the llvm-commits mailing list