[llvm-commits] Proposal/patch: Enable bitcode streaming

Fri Feb 3 10:44:44 PST 2012

Thanks,
I'll rebase the patch and run it through a final round of tests.
1) By "commit it" I assume you mean all the combined code?
2) I'm not actually a committer yet. Can you fix that (or commit for me if
I send you the updated patch)

-Derek

On Wed, Feb 1, 2012 at 4:32 PM, Nick Lewycky <nlewycky at google.com> wrote:

> On 26 January 2012 11:19, Derek Schuff <dschuff at google.com> wrote:
>
>> Thanks for the review. comments inline, and updated patches attached.
>
>
> Thanks.
>
> As for StreamableMemoryObject vs. StreamingMemoryObject; please just add
> pretty-much what you wrote in your email as a comment explaining the
> difference, and then commit it. It seems to me that once we have some
> experience with these APIs, we'll know what the right factoring of this
> code really ought to be.
>
> Nick
>
> On Wed, Jan 25, 2012 at 2:04 PM, Nick Lewycky <nlewycky at google.com> wrote:
>>
>>> On 20 January 2012 11:55, Derek Schuff <dschuff at google.com> wrote:
>>>
>>>> And finally, the StreamingMemoryObject implementation, modified
>>>> BitcodeReader, and modifed llvm-dis.cpp using the streaming interface.
>>>> Please take a look
>>>>
>>>
>>> Overall this looks good. I'm especially happy with some of the
>>> refactoring inside BitcodeReader! Comments:
>>>
>>> --- a/include/llvm/Bitcode/ReaderWriter.h
>>> +++ b/include/llvm/Bitcode/ReaderWriter.h
>>> @@ -21,31 +21,41 @@ namespace llvm {
>>>    class MemoryBuffer;
>>>    class ModulePass;
>>>    class BitstreamWriter;
>>> +  class DataStreamer;
>>>    class LLVMContext;
>>>    class raw_ostream;
>>>
>>> I realize these were unsorted when you got here, but please alphabetize
>>> them.
>>>
>> Done.
>>
>>>
>>> +  /// If 'verify' is true, check that the file fits in the buffer.
>>> +  static inline bool SkipBitcodeWrapperHeader(const unsigned char
>>> *&BufPtr,
>>> +                                              const unsigned char
>>> *&BufEnd,
>>> +                                              bool Verify) {
>>>
>>> I didn't really understand the comment. I think what you're doing is
>>> disabling the check that the buffer contained the whole header. Could you
>>> make it "bool SkipPastEnd" instead?
>>>
>>
>> There 2 checks: the first checks that the buffer is large enough to
>> contain the whole header. That always runs. The check that's
>> conditionalized on 'verify' checks that the buffer is large enough to
>> contain the whole bitcode. This doesn't work with the streaming
>> implementation since it does not allocate a buffer that fits the whole file
>> ahead of time. It also always skips past the end of the header if found
>> (since this is how it returns the size of the buffer), so SkipPastEnd would
>> be a bad name for the variable. I clarified the comment and renamed the
>> variable VerifyBufferSize. (Also on this pass I found and deleted some
>> trailing whitespace).
>>
>>
>>>
>>> +DataStreamer *getDataFileStreamer(const std::string &Filename,
>>> +                                        std::string *Err);
>>>
>>> Please line up the argument to the (.
>>>
>> Done.
>>
>>>
>>> --- a/include/llvm/Support/StreamableMemoryObject.h
>>> +++ b/include/llvm/Support/StreamableMemoryObject.h
>>> @@ -12,6 +12,9 @@
>>>  #define STREAMABLEMEMORYOBJECT_H_
>>>
>>>  #include "llvm/Support/MemoryObject.h"
>>> +#include <vector>
>>> +#include "llvm/ADT/OwningPtr.h"
>>> +#include "llvm/Support/DataStream.h"
>>>
>>> ADT, Support, then <system> headers. See
>>> http://llvm.org/docs/CodingStandards.html#scf_includes .
>>>
>>> +/// StreamingMemoryObject - interface to data which is actually
>>> streamed from
>>> +/// at DataStreamer. In addition to inherited members, it has the
>>> +/// dropLeadingBytes and setKnownObjectSize methods which are not
>>> applicable
>>> +/// to non-streamed objects
>>> +class StreamingMemoryObject : public StreamableMemoryObject {
>>>
>>> I think that's a full sentence missing a period. It feels *awfully
>>> weird* to have StreamableMemoryObject and StreamingMemoryObject, and both
>>> of them are interfaces. The comment doesn't seem to sufficiently explain
>>> what's going on here. (The DataStreamer can stream from a Streaming but not
>>> with a Streamable? What?)
>>>
>>
>> Yeah, this was kind of a tough naming problem. But there really does need
>> to be these 2 different kinds of interfaces. The one I called
>> StreamableMemoryObject (in which the data may or may not actually be
>> streamed) needs to have extra methods over and above MemoryObject, which
>> are directly due to the streamability: isValidAddress and isObjectEnd are
>> needed because if we don't know the length of the stream ahead of time,
>> then calling getExtent requires waiting until the entire stream is fetched.
>> (getPointer is basically just there to support BLOBs, avoiding extra
>> copies). Then you have RawMemoryObject, a non-streamed
>> StreamableMemoryObject, and StreamingMemoryObject is an interface because
>> there could be different implementations of StreamableMemoryObject (getting
>> data from different sources).
>> I'm open to ideas to simplify the situation.
>>
>>
>>>
>>> +  // fetch enough bytes such that Pos can be read or EOF is reached
>>> +  // (i.e. BytesRead > Pos). Return true if Pos can be read.
>>> +  // Unlike most of the functions in BitcodeReader, returns true on
>>> success.
>>> +  bool fetchToPos(size_t Pos) {
>>>
>>> Comment should start with a capital.
>>>
>>> +  bool fetchToPos(size_t Pos) {
>>> +    if (EOFReached) return Pos < ObjectSize;
>>> +    while (Pos >= BytesRead) {
>>> +      Bytes.resize(BytesRead + kChunkSize);
>>> +      size_t bytes = Streamer->GetBytes(&Bytes[BytesRead +
>>> BytesSkipped],
>>> +                                        kChunkSize);
>>>
>>> Why is kChunkSize so special? Why not ask for all the bytes up until
>>> Pos? The comment on DataStreamer::GetBytes doesn't give any reason not to
>>> ask for as many bytes as you want?
>>>
>>
>> The common case (actually the only case, currently) will actually be that
>> the requested size is much smaller than the chunk size, and the chunk size
>> just ensures that we batch them together rather than making a lot of
>> potentially expensive requests into the streamer. The 'while' loop is just
>> there instead of an 'if' to cover the corner case of a large request. I
>> updated the comment
>>
>>
>>>
>>> +bool BitcodeReader::SuspendModuleParse() {
>>> +  // save our current position
>>> +  NextUnreadBit = Stream.GetCurrentBitNo();
>>> +  return false;
>>> +}
>>>
>>> What's up with that returning bool?
>>>
>> Originally it was going to check for error and use the same convention as
>> the rest of the functions, but it ended up being simpler than I expected. I
>> removed it entirely now.
>>
>>
>>>
>>> +    // ParseModule will parse the next body in the stream and set its
>>> +    // position in the DeferredFunctionInfo map
>>>
>>> Sentence needs period.
>>>
>> Done.
>>
>>
>>>
>>> +  unsigned char buf[16];
>>> +  if (Bytes->readBytes(0, 16, buf, NULL) == -1)
>>> +    return Error("Bitcode stream must be at least 16 bytes in length");
>>> +
>>> +  if (!isBitcode(buf, buf + 16)) {
>>> +    return Error("Invalid bitcode signature");
>>> +  }
>>>
>>> So, uh, braces or no braces around one-line return statements? :-)
>>>
>> LLVM style seems to be no braces, but Google style dies hard :)
>> Fixed.
>>
>>
>>>
>>> +Module *llvm::getStreamedBitcodeModule(const std::string &name,
>>> +                                   DataStreamer *streamer,
>>> +                                   LLVMContext &Context,
>>> +                                   std::string *ErrMsg) {
>>>
>>> These args don't line up.
>>>
>> Done
>>
>>>
>>> --- /dev/null
>>> +++ b/lib/Support/DataStream.cpp
>>> @@ -0,0 +1,96 @@
>>> +//===--- llvm/Support/DataStream.cpp - Lazy streamed Data -*- C++
>>> -*-===//
>>>
>>> Don't include emacs mode markers (the -*- C++ -*- bit) on .cpp files,
>>> only on .h files.
>>>
>> Done.
>>
>>>
>>> +// Very simple stream backed by a file. Mostly useful for stdin and
>>> debugging;
>>> +// actual file access is probably still best done with mmap
>>> +class DataFileStreamer : public DataStreamer {
>>> + int Fd;
>>>
>>> Sentence seeking full stop.
>>>
>>> +DataStreamer *getDataFileStreamer(const std::string &Filename,
>>> +                                        std::string *StrError) {
>>>
>>> Line up to the ( again.
>>>
>> Done.
>>
>>>
>>> +  if (e != success) {
>>> +    *StrError = std::string() + "Could not open " + Filename + ": "
>>> +        + e.message() + "\n";
>>> +    return NULL;
>>> +  }
>>>
>>> Optional: std::string("Could not open ") + Filename, and also putting
>>> the + on the previous line instead of starting a line with the operator.
>>>
>> Done.
>>
>>>
>>> --- a/lib/Support/StreamableMemoryObject.cpp
>>> +++ b/lib/Support/StreamableMemoryObject.cpp
>>>
>>> In this file you added some spurious blank lines. Please don't do that.
>>>
>> I think i got them all. Also fixed some argument alignment.
>>
>>>
>>> +bool StreamingMemoryObject::isObjectEnd(uint64_t address) {
>>> +  if (ObjectSize) return address == ObjectSize;
>>> +  fetchToPos(address);
>>> +  return address == BytesRead;
>>> +}
>>>
>>> Shouldn't that end with "return address == ObjectSize"? If the file is
>>> larger than 'address' bytes, won't this read up to address bytes, then
>>> stop, leaving BytesRead == address? ObjectSize on the other hand isn't set
>>> until EOF is reached.
>>>
>>
>> Yes, it should. although BytesRead is in practice unlikely to actually ==
>> address due to the chunk-fetching behavior. No doubt that's why this
>> slipped through all the testing. Fixed (and covered the case where address
>> == 0; it's never the end because 0 is an invalid stream size)
>>
>>
>>>
>>> +int StreamingMemoryObject::readBytes(uint64_t address,
>>> +                      uint64_t size,
>>> +                      uint8_t* buf,
>>> +                      uint64_t* copied) {
>>>
>>> Misaligned.
>>>
>> Fixed.
>>
>>>
>>> +  //StreamableMemoryObject
>>> +
>>>
>>> Please remove.
>>>
>> Done.
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120203/4ed2cc7f/attachment.html>