[PATCH] D63453: [Support] Move llvm::MemoryBuffer to sys::fs::file_t

Tue Jul 9 17:19:26 PDT 2019

rnk marked an inline comment as done.
rnk added a comment.

I got back to work and found some time, so now I have easy access to Linux and Windows, and should be able to finish this.

================
Comment at: llvm/lib/Support/Windows/Path.inc:1211
+
+std::error_code readNativeFile(file_t FileHandle, MutableArrayRef<char> Buf,
+                               size_t *BytesRead) {
----------------
aganea wrote:
> rnk wrote:
> > aganea wrote:
> > > rnk wrote:
> > > > aganea wrote:
> > > > > rnk wrote:
> > > > > > aganea wrote:
> > > > > > > Just keep the API below that "works" all the time? (`readNativeFileSlice`)
> > > > > > Are you suggesting simplifying the code by forwarding the call with offset 0, or reducing the API to one entry point?
> > > > > > 
> > > > > > Stream devices like pipes don't support seeking or pread, so to change the public API, I would need to make the offset optional, which I think would be less clear. Perhaps I can simplify this Windows specific code to delegate to the slice version, but that was my concern.
> > > > > I meant forward to a common implementation for both `readNativeFile` and `readNativeFileSlice`
> > > > Sure, done.
> > > Sorry for not being clearer. I was thinking more along the lines of:
> > > ```
> > > static std::error_code readNativeFileImpl(file_t FileHandle,
> > >                                           MutableArrayRef<char> Buf,
> > >                                           size_t *BytesRead, size_t Offset) {
> > >   char *BufPtr = Buf.data();
> > >   size_t BytesLeft = Buf.size();
> > > 
> > >   if (BytesRead)
> > >     *BytesRead = 0;
> > > 
> > >   while (BytesLeft) {
> > >     uint64_t CurOff = Buf.size() - BytesLeft + Offset;
> > >     OVERLAPPED Overlapped{};
> > >     Overlapped.Offset = uint32_t(CurOff);
> > >     Overlapped.OffsetHigh = uint32_t(uint64_t(CurOff) >> 32);
> > > 
> > >     // ReadFile can only read 2GB at a time.
> > >     DWORD BytesToRead32 = std::min(1 << 31, BytesToRead);
> > >     DWORD BytesRead32 = 0;
> > >     bool Success =
> > >         ::ReadFile(FileHandle, BufPtr, BytesToRead32, &BytesRead32, Overlap);
> > >     if (!Success) {
> > >       DWORD Err = ::GetLastError();
> > >       // Pipe EOF is not an error.
> > >       if (Err != ERROR_BROKEN_PIPE)
> > >         return mapWindowsError(Err);
> > >     }
> > >     if (BytesRead)
> > >       *BytesRead += BytesRead32;
> > > 
> > >     // Once we reach EOF, zero the remaining bytes in the buffer.
> > >     if (BytesRead32 == 0) {
> > >       memset(BufPtr, 0, BytesLeft);
> > >       break;
> > >     }
> > >     BytesLeft -= BytesRead32;
> > >     BufPtr += BytesRead32;
> > >   }
> > >   return std::error_code();
> > > }
> > > 
> > > std::error_code readNativeFile(file_t FileHandle, MutableArrayRef<char> Buf,
> > >                                size_t *BytesRead) {
> > >   return readNativeFileImpl(FileHandle, Buf, BytesRead, /*Offset*/ 0);
> > > }
> > > 
> > > std::error_code readNativeFileSlice(file_t FileHandle,
> > >                                     MutableArrayRef<char> Buf, size_t Offset) {
> > >   return readNativeFileImpl(FileHandle, Buf, /*BytesRead*/ nullptr, Offset);
> > > }
> > > ```
> > > 
> > > The [[ https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-readfile | doc ]] says:
> > > ```
> > > For an hFile that supports byte offsets, if you use this parameter you must specify a byte offset at which to start reading from the file or device. This offset is specified by setting the Offset and OffsetHigh members of the OVERLAPPED structure. For an hFile that does not support byte offsets, Offset and OffsetHigh are ignored.
> > > ```
> > > So I think we can use OVERLAPPED with pipes as long as we don't write anything in there.
> > > It'd be interesting to also test that 2 GB read limit :-) My whole point was, if that limit still exists, it should not leak outside of this implementation. Meaning that users should not have to loop to read files larger than 2 GB.
> > The Unix implementation doesn't loop, so it can theoretically return with a short read. I think if I make this one loop, I should make the Unix one loop, and then promise the caller to block until the requested number of bytes are read or EOF is hit. However, because this API doesn't prescribe any particular kind of buffer, the caller typically has to loop anyway to allocate more memory for the next read. I did it that way to make it as compatible with `read` as possible for easy migration.
> > 
> > So, given that short reads theoretically exist on Unix, do you think the code sharing (beyond what we have already) for Windows is worth adding the inconsistency in behavior between Windows and Unix?
> The API behavior should not diverge between Windows and Unix, I think we both agree on this. However I find a bit awkward to serve short reads, when we could serve complete reads, as the user requested in the first place (in the worst case, the caller expects a full read anyway for `readNativeFile` isn't it?). But perhaps this is OT? Let's leave it the way you did it, and discuss this later if you wish?
I guess it's OK if the Windows API guarantees that there are no short reads. If it ever becomes a problem, we can teach the Unix one to loop until EOF.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63453/new/

https://reviews.llvm.org/D63453