[llvm-commits] Proposal/patch: Enable bitcode streaming

Fri Feb 3 10:56:25 PST 2012

Derek,

On Feb 3, 2012, at 10:44 AM, Derek Schuff <dschuff at google.com> wrote:

> Thanks,
> I'll rebase the patch and run it through a final round of tests.
> 1) By "commit it" I assume you mean all the combined code?
> 2) I'm not actually a committer yet. Can you fix that (or commit for me if I send you the updated patch)?

If you're interested in getting commit access, see http://llvm.org/docs/DeveloperPolicy.html
Specifically, the section on "Obtaining Commit Access."

  Chad

> -Derek
> 
> On Wed, Feb 1, 2012 at 4:32 PM, Nick Lewycky <nlewycky at google.com> wrote:
> On 26 January 2012 11:19, Derek Schuff <dschuff at google.com> wrote:
> Thanks for the review. comments inline, and updated patches attached.
> 
> Thanks.
> 
> As for StreamableMemoryObject vs. StreamingMemoryObject; please just add pretty-much what you wrote in your email as a comment explaining the difference, and then commit it. It seems to me that once we have some experience with these APIs, we'll know what the right factoring of this code really ought to be.
> 
> Nick
> 
> On Wed, Jan 25, 2012 at 2:04 PM, Nick Lewycky <nlewycky at google.com> wrote:
> On 20 January 2012 11:55, Derek Schuff <dschuff at google.com> wrote:
> And finally, the StreamingMemoryObject implementation, modified BitcodeReader, and modified llvm-dis.cpp using the streaming interface.
> Please take a look.
> 
> Overall this looks good. I'm especially happy with some of the refactoring inside BitcodeReader! Comments:
> 
> --- a/include/llvm/Bitcode/ReaderWriter.h
> +++ b/include/llvm/Bitcode/ReaderWriter.h
> @@ -21,31 +21,41 @@ namespace llvm {
>    class MemoryBuffer;
>    class ModulePass;
>    class BitstreamWriter;
> +  class DataStreamer;
>    class LLVMContext;
>    class raw_ostream;
> 
> I realize these were unsorted when you got here, but please alphabetize them.
> Done. 
> 
> +  /// If 'verify' is true, check that the file fits in the buffer.
> +  static inline bool SkipBitcodeWrapperHeader(const unsigned char *&BufPtr,
> +                                              const unsigned char *&BufEnd,
> +                                              bool Verify) {
> 
> I didn't really understand the comment. I think what you're doing is disabling the check that the buffer contained the whole header. Could you make it "bool SkipPastEnd" instead?
>  
> There are 2 checks: the first checks that the buffer is large enough to contain the whole header. That always runs. The check that's conditionalized on 'verify' checks that the buffer is large enough to contain the whole bitcode. This doesn't work with the streaming implementation since it does not allocate a buffer that fits the whole file ahead of time. It also always skips past the end of the header if found (since this is how it returns the size of the buffer), so SkipPastEnd would be a bad name for the variable. I clarified the comment and renamed the variable VerifyBufferSize. (Also on this pass I found and deleted some trailing whitespace.)
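
The two checks described above can be sketched roughly as follows. This is not the code from the patch: parseWrapperHeaderSketch, WrapperInfo, and readLE32 are hypothetical names, and the header layout (little-endian 32-bit fields with the offset at byte 8 and the size at byte 12) is an assumption made for illustration. Unlike the function in the diff, the sketch returns the offset and size instead of adjusting pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset and size of the bitcode payload, as recorded in the wrapper header.
struct WrapperInfo {
  uint32_t Offset;
  uint32_t Size;
};

// Reads a little-endian 32-bit field.
static uint32_t readLE32(const unsigned char *P) {
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) |
         (uint32_t(P[2]) << 16) | (uint32_t(P[3]) << 24);
}

// Returns true on error. Len is the number of bytes currently held in Buf.
// Field positions below are assumed for this sketch.
static bool parseWrapperHeaderSketch(const unsigned char *Buf, size_t Len,
                                     bool VerifyBufferSize, WrapperInfo &Out) {
  const size_t KnownHeaderSize = 16, OffsetField = 8, SizeField = 12;

  // Check 1 (always runs): the buffer must hold the whole header.
  if (Len < KnownHeaderSize)
    return true;

  Out.Offset = readLE32(Buf + OffsetField);
  Out.Size = readLE32(Buf + SizeField);

  // Check 2 (optional): the buffer must hold the whole bitcode payload.
  // A streaming caller passes false because only a prefix is in memory.
  if (VerifyBufferSize && uint64_t(Out.Offset) + Out.Size > Len)
    return true;

  return false;
}
```

A non-streaming caller would pass true for VerifyBufferSize; a streaming caller that has fetched only the first 16 bytes would pass false and still learn where the payload starts and how long it is.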
>  
> 
> +DataStreamer *getDataFileStreamer(const std::string &Filename,
> +                                        std::string *Err);
> 
> Please line up the argument to the (.
> Done. 
> 
> --- a/include/llvm/Support/StreamableMemoryObject.h
> +++ b/include/llvm/Support/StreamableMemoryObject.h
> @@ -12,6 +12,9 @@
>  #define STREAMABLEMEMORYOBJECT_H_
>  
>  #include "llvm/Support/MemoryObject.h"
> +#include <vector>
> +#include "llvm/ADT/OwningPtr.h"
> +#include "llvm/Support/DataStream.h"
> 
> ADT, Support, then <system> headers. See http://llvm.org/docs/CodingStandards.html#scf_includes .
> 
> +/// StreamingMemoryObject - interface to data which is actually streamed from
> +/// at DataStreamer. In addition to inherited members, it has the
> +/// dropLeadingBytes and setKnownObjectSize methods which are not applicable
> +/// to non-streamed objects
> +class StreamingMemoryObject : public StreamableMemoryObject {
> 
> I think that's a full sentence missing a period. It feels *awfully weird* to have StreamableMemoryObject and StreamingMemoryObject, and both of them are interfaces. The comment doesn't seem to sufficiently explain what's going on here. (The DataStreamer can stream from a Streaming but not with a Streamable? What?)
> 
> Yeah, this was kind of a tough naming problem. But there really do need to be these 2 different kinds of interfaces. The one I called StreamableMemoryObject (in which the data may or may not actually be streamed) needs to have extra methods over and above MemoryObject, which are directly due to the streamability: isValidAddress and isObjectEnd are needed because if we don't know the length of the stream ahead of time, then calling getExtent requires waiting until the entire stream is fetched. (getPointer is basically just there to support BLOBs, avoiding extra copies.) Then you have RawMemoryObject, a non-streamed StreamableMemoryObject, and StreamingMemoryObject is an interface because there could be different implementations of StreamableMemoryObject (getting data from different sources).
> I'm open to ideas to simplify the situation.
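
One way to picture the layering described above is the following sketch. All class names ending in Sketch are hypothetical, the method signatures are simplified assumptions rather than the ones in the patch, and getPointer is omitted; only the split of responsibilities is taken from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Base interface: random access to an object whose extent can be queried.
class MemoryObjectSketch {
public:
  virtual ~MemoryObjectSketch() {}
  virtual uint64_t getExtent() const = 0;
  virtual int readByte(uint64_t Address, uint8_t *Byte) const = 0;
};

// "Streamable": the data may or may not be streamed, so callers get queries
// that do not require knowing the total length up front.
class StreamableSketch : public MemoryObjectSketch {
public:
  virtual bool isValidAddress(uint64_t Address) const = 0;
  virtual bool isObjectEnd(uint64_t Address) const = 0;
};

// Non-streamed implementation: everything is already in memory.
class RawSketch : public StreamableSketch {
  std::vector<uint8_t> Data;
public:
  explicit RawSketch(const std::vector<uint8_t> &D) : Data(D) {}
  uint64_t getExtent() const override { return Data.size(); }
  int readByte(uint64_t Address, uint8_t *Byte) const override {
    if (Address >= Data.size()) return -1;
    *Byte = Data[Address];
    return 0;
  }
  bool isValidAddress(uint64_t A) const override { return A < Data.size(); }
  bool isObjectEnd(uint64_t A) const override { return A == Data.size(); }
};

// "Streaming": still an interface, adding the operations that only make
// sense when the data really is streamed.
class StreamingSketch : public StreamableSketch {
public:
  virtual bool dropLeadingBytes(size_t N) = 0;
  virtual void setKnownObjectSize(size_t Size) = 0;
};
```

Code that only reads can be written against StreamableSketch and work with either kind of object; only code that manages the stream itself needs StreamingSketch.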
>  
> 
> +  // fetch enough bytes such that Pos can be read or EOF is reached
> +  // (i.e. BytesRead > Pos). Return true if Pos can be read.
> +  // Unlike most of the functions in BitcodeReader, returns true on success.
> +  bool fetchToPos(size_t Pos) {
> 
> Comment should start with a capital.
> 
> +  bool fetchToPos(size_t Pos) {
> +    if (EOFReached) return Pos < ObjectSize;
> +    while (Pos >= BytesRead) {
> +      Bytes.resize(BytesRead + kChunkSize);
> +      size_t bytes = Streamer->GetBytes(&Bytes[BytesRead + BytesSkipped],
> +                                        kChunkSize);
> 
> Why is kChunkSize so special? Why not ask for all the bytes up until Pos? The comment on DataStreamer::GetBytes doesn't give any reason not to ask for as many bytes as you want?
>  
> The common case (actually the only case, currently) will actually be that the requested size is much smaller than the chunk size, and the chunk size just ensures that we batch them together rather than making a lot of potentially expensive requests into the streamer. The 'while' loop is just there instead of an 'if' to cover the corner case of a large request. I updated the comment.
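
The batching argument above can be illustrated with a small self-contained sketch. CountingStreamer and ChunkedReader are hypothetical stand-ins, not classes from the patch; the chunk size of 4096 and the choice to treat a short read as EOF are assumptions made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a DataStreamer: serves bytes from memory and
// counts how many requests it receives.
class CountingStreamer {
  std::vector<unsigned char> Source;
  size_t Cursor;
public:
  size_t Requests;
  explicit CountingStreamer(size_t N)
      : Source(N, 0xAB), Cursor(0), Requests(0) {}
  // Copies up to Len bytes into Buf; returns the number copied (0 at EOF).
  size_t GetBytes(unsigned char *Buf, size_t Len) {
    ++Requests;
    size_t N = std::min(Len, Source.size() - Cursor);
    if (N) std::memcpy(Buf, &Source[Cursor], N);
    Cursor += N;
    return N;
  }
};

class ChunkedReader {
  enum : size_t { kChunkSize = 4096 };  // assumed value
  std::vector<unsigned char> Bytes;
  CountingStreamer &Streamer;
  size_t BytesRead;
  size_t ObjectSize;
  bool EOFReached;
public:
  explicit ChunkedReader(CountingStreamer &S)
      : Streamer(S), BytesRead(0), ObjectSize(0), EOFReached(false) {}

  // Fetches whole chunks until Pos is readable or EOF is reached.
  // Returns true if Pos can be read.
  bool fetchToPos(size_t Pos) {
    if (EOFReached) return Pos < ObjectSize;
    while (Pos >= BytesRead) {  // loops more than once only for big jumps
      Bytes.resize(BytesRead + kChunkSize);
      size_t Got = Streamer.GetBytes(&Bytes[BytesRead], kChunkSize);
      BytesRead += Got;
      if (Got < kChunkSize) {   // short read: treated as EOF in this sketch
        ObjectSize = BytesRead;
        EOFReached = true;
        break;
      }
    }
    return Pos < BytesRead;
  }
};
```

With a 10000-byte source, any number of small reads inside the first 4096 bytes costs one request to the streamer, while a jump to position 9000 takes two more.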
>  
> 
> +bool BitcodeReader::SuspendModuleParse() {
> +  // save our current position
> +  NextUnreadBit = Stream.GetCurrentBitNo();
> +  return false;
> +}
> 
> What's up with that returning bool?
> Originally it was going to check for error and use the same convention as the rest of the functions, but it ended up being simpler than I expected. I removed it entirely now.
>  
> 
> +    // ParseModule will parse the next body in the stream and set its
> +    // position in the DeferredFunctionInfo map
> 
> Sentence needs period.
> Done.
>  
> 
> +  unsigned char buf[16];
> +  if (Bytes->readBytes(0, 16, buf, NULL) == -1)
> +    return Error("Bitcode stream must be at least 16 bytes in length");
> +
> +  if (!isBitcode(buf, buf + 16)) {
> +    return Error("Invalid bitcode signature");
> +  }
> 
> So, uh, braces or no braces around one-line return statements? :-)
> LLVM style seems to be no braces, but Google style dies hard :)
> Fixed.
>  
> 
> +Module *llvm::getStreamedBitcodeModule(const std::string &name,
> +                                   DataStreamer *streamer,
> +                                   LLVMContext &Context,
> +                                   std::string *ErrMsg) {
> 
> These args don't line up.
> Done
> 
> --- /dev/null
> +++ b/lib/Support/DataStream.cpp
> @@ -0,0 +1,96 @@
> +//===--- llvm/Support/DataStream.cpp - Lazy streamed Data -*- C++ -*-===//
> 
> Don't include emacs mode markers (the -*- C++ -*- bit) on .cpp files, only on .h files.
> Done. 
> 
> +// Very simple stream backed by a file. Mostly useful for stdin and debugging;
> +// actual file access is probably still best done with mmap
> +class DataFileStreamer : public DataStreamer {
> + int Fd;
> 
> Sentence seeking full stop.
> 
> +DataStreamer *getDataFileStreamer(const std::string &Filename,
> +                                        std::string *StrError) {
> 
> Line up to the ( again.
> Done. 
> 
> +  if (e != success) {
> +    *StrError = std::string() + "Could not open " + Filename + ": "
> +        + e.message() + "\n";
> +    return NULL;
> +  }
> 
> Optional: std::string("Could not open ") + Filename, and also putting the + on the previous line instead of starting a line with the operator.
> Done. 
> 
> --- a/lib/Support/StreamableMemoryObject.cpp
> +++ b/lib/Support/StreamableMemoryObject.cpp
> 
> In this file you added some spurious blank lines. Please don't do that.
> I think I got them all. Also fixed some argument alignment.
> 
> +bool StreamingMemoryObject::isObjectEnd(uint64_t address) {
> +  if (ObjectSize) return address == ObjectSize;
> +  fetchToPos(address);
> +  return address == BytesRead;
> +}
> 
> Shouldn't that end with "return address == ObjectSize"? If the file is larger than 'address' bytes, won't this read up to address bytes, then stop, leaving BytesRead == address? ObjectSize on the other hand isn't set until EOF is reached.
> 
> Yes, it should, although BytesRead is in practice unlikely to actually equal address due to the chunk-fetching behavior. No doubt that's why this slipped through all the testing. Fixed (and covered the case where address == 0; it's never the end because 0 is an invalid stream size).
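
A minimal sketch of the corrected check, under stated assumptions: EndProbe is a hypothetical toy class that simulates fetching from a source of fixed length, the chunk size of 16 is arbitrary, and a short read is taken to mean EOF. It compares against ObjectSize, which is only set once EOF is seen, and never reports address 0 as the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

class EndProbe {
  enum : uint64_t { kChunkSize = 16 };  // arbitrary chunk size for the sketch
  uint64_t SourceSize;  // true length, not known to callers until EOF
  uint64_t BytesRead;
  uint64_t ObjectSize;  // 0 until EOF has been reached
  bool EOFReached;

  // Simulates chunked fetching until Pos is covered or EOF is reached.
  void fetchToPos(uint64_t Pos) {
    while (!EOFReached && Pos >= BytesRead) {
      uint64_t Got = std::min<uint64_t>(kChunkSize, SourceSize - BytesRead);
      BytesRead += Got;
      if (Got < kChunkSize) {  // short read marks EOF
        ObjectSize = BytesRead;
        EOFReached = true;
      }
    }
  }

public:
  explicit EndProbe(uint64_t N)
      : SourceSize(N), BytesRead(0), ObjectSize(0), EOFReached(false) {}

  bool isObjectEnd(uint64_t Address) {
    if (ObjectSize) return Address == ObjectSize;
    fetchToPos(Address);
    // Compare against ObjectSize rather than BytesRead, and never treat
    // address 0 as the end of the object.
    return Address == ObjectSize && Address != 0;
  }
};
```

For a 40-byte source, isObjectEnd(39) returns false and isObjectEnd(40) returns true; for an empty source, isObjectEnd(0) still returns false.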
>  
> 
> +int StreamingMemoryObject::readBytes(uint64_t address,
> +                      uint64_t size,
> +                      uint8_t* buf,
> +                      uint64_t* copied) {
> 
> Misaligned.
> Fixed. 
> 
> +  //StreamableMemoryObject
> +
> 
> Please remove.
> Done. 
> 
> 