[PATCH] D66471: [Support] Improve readNativeFile(Slice) interface

Wed Aug 21 11:40:04 PDT 2019

aganea accepted this revision.
aganea added a comment.
This revision is now accepted and ready to land.

LGTM, just a missing include (see below).

In D66471#1638966 <https://reviews.llvm.org/D66471#1638966>, @labath wrote:

> - add a partial read test for the streaming case (not tested on windows yet)

Works on Windows.

================
Comment at: lib/Support/MemoryBuffer.cpp:467
+    Expected<size_t> ReadBytes =
+        sys::fs::readNativeFileSlice(FD, Buf->getBuffer(), Offset);
+    if (!ReadBytes)
----------------
labath wrote:
> aganea wrote:
> > rnk wrote:
> > > labath wrote:
> > > > aganea wrote:
> > > > > labath wrote:
> > > > > > aganea wrote:
> > > > > > > labath wrote:
> > > > > > > > aganea wrote:
> > > > > > > > > Shouldn't we use `ToRead` here instead of `Buf->getBuffer()`? Could you please test two sequential reads in the unittests?
> > > > > > > > Woops, indeed it should. Thanks for catching that!
> > > > > > > > 
> > > > > > > > As for the test, I am not sure how I could force the read syscall to return partial data here. For the non-slice version, I suppose I could make this read from a pipe, and then make sure I write to the pipe very slowly. However, the slice version requires a seekable fd, and I don't know what to do there... Do you have any suggestion?
> > > > > > > You can probably call `MemoryBuffer::getOpenFile` with a smaller `FileSize` & `MapSize` and `IsVolatile = true`? And then a subsequent call by setting `Offset` > 0? Would that work?
> > > > > > I don't fully understand what you mean, but I don't see how could that work. Are you expecting that the second time around the OS will return the partial read first, because that data will already be cached and ready? If so, then I would be very surprised if that works, particularly for a file that has just been created because it will likely be present in the disk cache in its entirety.
> > > > > You're right. You would need to test it from the shell with something like: `cat foo | pv --quiet --line-mode --rate-limit 1 | clang -emit-llvm -o -`. I don't know if `pv` is available everywhere, or if there's a better solution to slow down `cat`? What do you think?
> > > > Ok, I think I am starting to understand what you mean. There's no problem with slowing down the input for a pipe. I can do that from a unit test relatively easily. The problem is that that the the pipe is not seekable (at least on unix OSs, I don't know what would windows do here). The reason that `clang -` works is because it invokes a different `MemoryBuffer` function, which is implemented in terms of `readNativeFile` and so it works fine with a non-seekable fd. However, calling `readNativeFileSlice` with a non-seekable fd would result in an error. And unfortunately, I don't know if there's a way to create a seekable fd, which will not satisfy all read requests immediately (short of some FUSE mounts or similarly heroic efforts).
> > > Right, testing a short read from a seekable file is... very hard. IIRC Linux goes out of its way to avoid surfacing short reads if it can. One way to do it would be to make a 6GB file on Windows, since it has the 2GB ReadFile limitation. Of course, I was happy to commit my code without such a test, so I can only apply the same standard to you. :)
> > What about this? This does exercise this loop, including the memset.
> > And it looks like the 2GB is no more (at least on my Windows 10 1803), calling `::ReadFile` with `(2^32)-1` works as expected. We should probably update the comment in `lib/Support/Windows/Path.inc, L1223` :)
> > ```
> > TEST_F(FileSystemTest, bigFile) {
> >   int FD;
> >   SmallString<64> TempPath;
> >   ASSERT_NO_ERROR(fs::createTemporaryFile("prefix", "temp", FD, TempPath));
> > 
> >   // Would cause two file reads on Windows, the limit for each read being
> >   // (2^32)-1
> >   size_t Size = (1ULL << 32);
> >   auto E = fs::resize_file(FD, Size);
> >   if (E) {
> >     ASSERT_NO_ERROR(fs::remove(Twine(TempPath)));
> >     return; // skip test is there's enough temp/ space
> >   }
> > 
> >   auto Buf = MemoryBuffer::getFile(TempPath, Size+1, true, true);
> >   EXPECT_TRUE((bool)Buf);
> >   EXPECT_EQ(Buf.get()->getBufferSize(), Size+1);
> > 
> >   ASSERT_NO_ERROR(fs::remove(Twine(TempPath)));
> > }
> > ```
> That would work, if we are OK with creating a 4GB file for the purpose of testing this *and* requiring 4GB of ram to be able to read it.
> 
> In fact, linux seems to have the 2GB limitation too (2GB-4KB, to be precise), so this would work there too. Also, most filesystems on linux support sparse files, so we would probably be able to avoid actually wasting that much disk space, if we're really careful. However, the need for a 4GB ram buffer worries me, particularly as detecting OOM conditions reliably is quite hard.
> 
> The standard approach to testing code which relies on external stuff which cannot be (easily) controlled in a test is to mock the external component. However, this requires changes to code being tested, such as parameterizing the MemoryBuffer class with a filesystem implementation, and I am not aware of this approach being used anywhere in llvm...
I am equally happy to commit that above test later on. I could possibly scope it to Windows and ensure, for example, available memory > 16 GB and  available free heap > 4 GB.

================
Comment at: unittests/Support/MemoryBufferTest.cpp:23
+#endif

 using namespace llvm;
----------------
Missing:
```
#ifdef _WIN32
#include <windows.h> 
#endif
```

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66471/new/

https://reviews.llvm.org/D66471