[llvm-commits] Proposal/patch: Enable bitcode streaming

Fri Feb 3 23:33:58 PST 2012

Derek Schuff wrote:
> Thanks,
> I'll rebase the patch and run it through a final round of tests.
> 1) By "commit it" I assume you mean all the combined code?

Yep!

> 2) I'm not actually a committer yet. Can you fix that (or commit for me
> if I send you the updated patch)

I'll commit it for you. Once you've had a number of smaller patches go 
through you can apply for commit access per 
http://llvm.org/docs/DeveloperPolicy.html#commitaccess

Nick

>
> -Derek
>
> On Wed, Feb 1, 2012 at 4:32 PM, Nick Lewycky <nlewycky at google.com
> <mailto:nlewycky at google.com>> wrote:
>
>     On 26 January 2012 11:19, Derek Schuff <dschuff at google.com
>     <mailto:dschuff at google.com>> wrote:
>
>         Thanks for the review. comments inline, and updated patches
>         attached.
>
>
>     Thanks.
>
>     As for StreamableMemoryObject vs. StreamingMemoryObject; please just
>     add pretty-much what you wrote in your email as a comment explaining
>     the difference, and then commit it. It seems to me that once we have
>     some experience with these APIs, we'll know what the right factoring
>     of this code really ought to be.
>
>     Nick
>
>         On Wed, Jan 25, 2012 at 2:04 PM, Nick Lewycky
>         <nlewycky at google.com <mailto:nlewycky at google.com>> wrote:
>
>             On 20 January 2012 11:55, Derek Schuff <dschuff at google.com
>             <mailto:dschuff at google.com>> wrote:
>
>                 And finally, the StreamingMemoryObject implementation,
>                 modified BitcodeReader, and modifed llvm-dis.cpp using
>                 the streaming interface.
>                 Please take a look
>
>
>             Overall this looks good. I'm especially happy with some of
>             the refactoring inside BitcodeReader! Comments:
>
>             --- a/include/llvm/Bitcode/ReaderWriter.h
>             +++ b/include/llvm/Bitcode/ReaderWriter.h
>             @@ -21,31 +21,41 @@ namespace llvm {
>                 class MemoryBuffer;
>                 class ModulePass;
>                 class BitstreamWriter;
>             +  class DataStreamer;
>                 class LLVMContext;
>                 class raw_ostream;
>
>             I realize these were unsorted when you got here, but please
>             alphabetize them.
>
>         Done.
>
>
>             +  /// If 'verify' is true, check that the file fits in the
>             buffer.
>             +  static inline bool SkipBitcodeWrapperHeader(const
>             unsigned char *&BufPtr,
>             +                                              const
>             unsigned char *&BufEnd,
>             +                                              bool Verify) {
>
>             I didn't really understand the comment. I think what you're
>             doing is disabling the check that the buffer contained the
>             whole header. Could you make it "bool SkipPastEnd" instead?
>
>         There 2 checks: the first checks that the buffer is large enough
>         to contain the whole header. That always runs. The check that's
>         conditionalized on 'verify' checks that the buffer is large
>         enough to contain the whole bitcode. This doesn't work with the
>         streaming implementation since it does not allocate a buffer
>         that fits the whole file ahead of time. It also always skips
>         past the end of the header if found (since this is how it
>         returns the size of the buffer), so SkipPastEnd would be a bad
>         name for the variable. I clarified the comment and renamed the
>         variable VerifyBufferSize. (Also on this pass I found and
>         deleted some trailing whitespace).
>
>
>             +DataStreamer *getDataFileStreamer(const std::string &Filename,
>             +                                        std::string *Err);
>
>             Please line up the argument to the (.
>
>         Done.
>
>
>             --- a/include/llvm/Support/StreamableMemoryObject.h
>             +++ b/include/llvm/Support/StreamableMemoryObject.h
>             @@ -12,6 +12,9 @@
>               #define STREAMABLEMEMORYOBJECT_H_
>               #include "llvm/Support/MemoryObject.h"
>             +#include <vector>
>             +#include "llvm/ADT/OwningPtr.h"
>             +#include "llvm/Support/DataStream.h"
>
>             ADT, Support, then <system> headers. See
>             http://llvm.org/docs/CodingStandards.html#scf_includes .
>
>             +/// StreamingMemoryObject - interface to data which is
>             actually streamed from
>             +/// at DataStreamer. In addition to inherited members, it
>             has the
>             +/// dropLeadingBytes and setKnownObjectSize methods which
>             are not applicable
>             +/// to non-streamed objects
>             +class StreamingMemoryObject : public StreamableMemoryObject {
>
>             I think that's a full sentence missing a period. It feels
>             *awfully weird* to have StreamableMemoryObject and
>             StreamingMemoryObject, and both of them are interfaces. The
>             comment doesn't seem to sufficiently explain what's going on
>             here. (The DataStreamer can stream from a Streaming but not
>             with a Streamable? What?)
>
>
>         Yeah, this was kind of a tough naming problem. But there really
>         does need to be these 2 different kinds of interfaces. The one I
>         called StreamableMemoryObject (in which the data may or may not
>         actually be streamed) needs to have extra methods over and above
>         MemoryObject, which are directly due to the streamability:
>         isValidAddress and isObjectEnd are needed because if we don't
>         know the length of the stream ahead of time, then calling
>         getExtent requires waiting until the entire stream is fetched.
>         (getPointer is basically just there to support BLOBs, avoiding
>         extra copies). Then you have RawMemoryObject, a non-streamed
>         StreamableMemoryObject, and StreamingMemoryObject is an
>         interface because there could be different implementations of
>         StreamableMemoryObject (getting data from different sources).
>         I'm open to ideas to simplify the situation.
>
>
>             +  // fetch enough bytes such that Pos can be read or EOF is
>             reached
>             +  // (i.e. BytesRead > Pos). Return true if Pos can be read.
>             +  // Unlike most of the functions in BitcodeReader, returns
>             true on success.
>             +  bool fetchToPos(size_t Pos) {
>
>             Comment should start with a capital.
>
>             +  bool fetchToPos(size_t Pos) {
>             +    if (EOFReached) return Pos < ObjectSize;
>             +    while (Pos >= BytesRead) {
>             +      Bytes.resize(BytesRead + kChunkSize);
>             +      size_t bytes = Streamer->GetBytes(&Bytes[BytesRead +
>             BytesSkipped],
>             +                                        kChunkSize);
>
>             Why is kChunkSize so special? Why not ask for all the bytes
>             up until Pos? The comment on DataStreamer::GetBytes doesn't
>             give any reason not to ask for as many bytes as you want?
>
>         The common case (actually the only case, currently) will
>         actually be that the requested size is much smaller than the
>         chunk size, and the chunk size just ensures that we batch them
>         together rather than making a lot of potentially expensive
>         requests into the streamer. The 'while' loop is just there
>         instead of an 'if' to cover the corner case of a large request.
>         I updated the comment
>
>
>             +bool BitcodeReader::SuspendModuleParse() {
>             +  // save our current position
>             +  NextUnreadBit = Stream.GetCurrentBitNo();
>             +  return false;
>             +}
>
>             What's up with that returning bool?
>
>         Originally it was going to check for error and use the same
>         convention as the rest of the functions, but it ended up being
>         simpler than I expected. I removed it entirely now.
>
>
>             +    // ParseModule will parse the next body in the stream
>             and set its
>             +    // position in the DeferredFunctionInfo map
>
>             Sentence needs period.
>
>         Done.
>
>
>             +  unsigned char buf[16];
>             +  if (Bytes->readBytes(0, 16, buf, NULL) == -1)
>             +    return Error("Bitcode stream must be at least 16 bytes
>             in length");
>             +
>             +  if (!isBitcode(buf, buf + 16)) {
>             +    return Error("Invalid bitcode signature");
>             +  }
>
>             So, uh, braces or no braces around one-line return
>             statements? :-)
>
>         LLVM style seems to be no braces, but Google style dies hard :)
>         Fixed.
>
>
>             +Module *llvm::getStreamedBitcodeModule(const std::string &name,
>             +                                   DataStreamer *streamer,
>             +                                   LLVMContext &Context,
>             +                                   std::string *ErrMsg) {
>
>             These args don't line up.
>
>         Done
>
>
>             --- /dev/null
>             +++ b/lib/Support/DataStream.cpp
>             @@ -0,0 +1,96 @@
>             +//===--- llvm/Support/DataStream.cpp - Lazy streamed Data
>             -*- C++ -*-===//
>
>             Don't include emacs mode markers (the -*- C++ -*- bit) on
>             .cpp files, only on .h files.
>
>         Done.
>
>
>             +// Very simple stream backed by a file. Mostly useful for
>             stdin and debugging;
>             +// actual file access is probably still best done with mmap
>             +class DataFileStreamer : public DataStreamer {
>             + int Fd;
>
>             Sentence seeking full stop.
>
>             +DataStreamer *getDataFileStreamer(const std::string &Filename,
>             +                                        std::string
>             *StrError) {
>
>             Line up to the ( again.
>
>         Done.
>
>
>             +  if (e != success) {
>             +    *StrError = std::string() + "Could not open " +
>             Filename + ": "
>             +        + e.message() + "\n";
>             +    return NULL;
>             +  }
>
>             Optional: std::string("Could not open ") + Filename, and
>             also putting the + on the previous line instead of starting
>             a line with the operator.
>
>         Done.
>
>
>             --- a/lib/Support/StreamableMemoryObject.cpp
>             +++ b/lib/Support/StreamableMemoryObject.cpp
>
>             In this file you added some spurious blank lines. Please
>             don't do that.
>
>         I think i got them all. Also fixed some argument alignment.
>
>
>             +bool StreamingMemoryObject::isObjectEnd(uint64_t address) {
>             +  if (ObjectSize) return address == ObjectSize;
>             +  fetchToPos(address);
>             +  return address == BytesRead;
>             +}
>
>             Shouldn't that end with "return address == ObjectSize"? If
>             the file is larger than 'address' bytes, won't this read up
>             to address bytes, then stop, leaving BytesRead == address?
>             ObjectSize on the other hand isn't set until EOF is reached.
>
>
>         Yes, it should. although BytesRead is in practice unlikely to
>         actually == address due to the chunk-fetching behavior. No doubt
>         that's why this slipped through all the testing. Fixed (and
>         covered the case where address == 0; it's never the end because
>         0 is an invalid stream size)
>
>
>             +int StreamingMemoryObject::readBytes(uint64_t address,
>             +                      uint64_t size,
>             +                      uint8_t* buf,
>             +                      uint64_t* copied) {
>
>             Misaligned.
>
>         Fixed.
>
>
>             +  //StreamableMemoryObject
>             +
>
>             Please remove.
>
>         Done.
>
>
>