[llvm-commits] Proposal/patch: Enable bitcode streaming

Nick Lewycky nlewycky at google.com
Wed Jan 25 14:04:23 PST 2012


On 20 January 2012 11:55, Derek Schuff <dschuff at google.com> wrote:

> And finally, the StreamingMemoryObject implementation, modified
> BitcodeReader, and modifed llvm-dis.cpp using the streaming interface.
> Please take a look
>

Overall this looks good. I'm especially happy with some of the refactoring
inside BitcodeReader! Comments:

--- a/include/llvm/Bitcode/ReaderWriter.h
+++ b/include/llvm/Bitcode/ReaderWriter.h
@@ -21,31 +21,41 @@ namespace llvm {
   class MemoryBuffer;
   class ModulePass;
   class BitstreamWriter;
+  class DataStreamer;
   class LLVMContext;
   class raw_ostream;

I realize these were unsorted when you got here, but please alphabetize
them.

+  /// If 'verify' is true, check that the file fits in the buffer.
+  static inline bool SkipBitcodeWrapperHeader(const unsigned char *&BufPtr,
+                                              const unsigned char *&BufEnd,
+                                              bool Verify) {

I didn't really understand the comment. I think what you're doing is
disabling the check that the buffer contained the whole header. Could you
make it "bool SkipPastEnd" instead?

+DataStreamer *getDataFileStreamer(const std::string &Filename,
+                                        std::string *Err);

Please line up the argument to the (.

--- a/include/llvm/Support/StreamableMemoryObject.h
+++ b/include/llvm/Support/StreamableMemoryObject.h
@@ -12,6 +12,9 @@
 #define STREAMABLEMEMORYOBJECT_H_

 #include "llvm/Support/MemoryObject.h"
+#include <vector>
+#include "llvm/ADT/OwningPtr.h"
+#include "llvm/Support/DataStream.h"

ADT, Support, then <system> headers. See
http://llvm.org/docs/CodingStandards.html#scf_includes .

+/// StreamingMemoryObject - interface to data which is actually streamed
from
+/// at DataStreamer. In addition to inherited members, it has the
+/// dropLeadingBytes and setKnownObjectSize methods which are not
applicable
+/// to non-streamed objects
+class StreamingMemoryObject : public StreamableMemoryObject {

I think that's a full sentence missing a period. It feels *awfully weird*
to have StreamableMemoryObject and StreamingMemoryObject, and both of them
are interfaces. The comment doesn't seem to sufficiently explain what's
going on here. (The DataStreamer can stream from a Streaming but not with a
Streamable? What?)

+  // fetch enough bytes such that Pos can be read or EOF is reached
+  // (i.e. BytesRead > Pos). Return true if Pos can be read.
+  // Unlike most of the functions in BitcodeReader, returns true on
success.
+  bool fetchToPos(size_t Pos) {

Comment should start with a capital.

+  bool fetchToPos(size_t Pos) {
+    if (EOFReached) return Pos < ObjectSize;
+    while (Pos >= BytesRead) {
+      Bytes.resize(BytesRead + kChunkSize);
+      size_t bytes = Streamer->GetBytes(&Bytes[BytesRead + BytesSkipped],
+                                        kChunkSize);

Why is kChunkSize so special? Why not ask for all the bytes up until Pos?
The comment on DataStreamer::GetBytes doesn't give any reason not to ask
for as many bytes as you want?

+bool BitcodeReader::SuspendModuleParse() {
+  // save our current position
+  NextUnreadBit = Stream.GetCurrentBitNo();
+  return false;
+}

What's up with that returning bool?

+    // ParseModule will parse the next body in the stream and set its
+    // position in the DeferredFunctionInfo map

Sentence needs period.

+  unsigned char buf[16];
+  if (Bytes->readBytes(0, 16, buf, NULL) == -1)
+    return Error("Bitcode stream must be at least 16 bytes in length");
+
+  if (!isBitcode(buf, buf + 16)) {
+    return Error("Invalid bitcode signature");
+  }

So, uh, braces or no braces around one-line return statements? :-)

+Module *llvm::getStreamedBitcodeModule(const std::string &name,
+                                   DataStreamer *streamer,
+                                   LLVMContext &Context,
+                                   std::string *ErrMsg) {

These args don't line up.

--- /dev/null
+++ b/lib/Support/DataStream.cpp
@@ -0,0 +1,96 @@
+//===--- llvm/Support/DataStream.cpp - Lazy streamed Data -*- C++ -*-===//

Don't include emacs mode markers (the -*- C++ -*- bit) on .cpp files, only
on .h files.

+// Very simple stream backed by a file. Mostly useful for stdin and
debugging;
+// actual file access is probably still best done with mmap
+class DataFileStreamer : public DataStreamer {
+ int Fd;

Sentence seeking full stop.

+DataStreamer *getDataFileStreamer(const std::string &Filename,
+                                        std::string *StrError) {

Line up to the ( again.

+  if (e != success) {
+    *StrError = std::string() + "Could not open " + Filename + ": "
+        + e.message() + "\n";
+    return NULL;
+  }

Optional: std::string("Could not open ") + Filename, and also putting the +
on the previous line instead of starting a line with the operator.

--- a/lib/Support/StreamableMemoryObject.cpp
+++ b/lib/Support/StreamableMemoryObject.cpp

In this file you added some spurious blank lines. Please don't do that.

+bool StreamingMemoryObject::isObjectEnd(uint64_t address) {
+  if (ObjectSize) return address == ObjectSize;
+  fetchToPos(address);
+  return address == BytesRead;
+}

Shouldn't that end with "return address == ObjectSize"? If the file is
larger than 'address' bytes, won't this read up to address bytes, then
stop, leaving BytesRead == address? ObjectSize on the other hand isn't set
until EOF is reached.

+int StreamingMemoryObject::readBytes(uint64_t address,
+                      uint64_t size,
+                      uint8_t* buf,
+                      uint64_t* copied) {

Misaligned.

+  //StreamableMemoryObject
+

Please remove.

Nick


> thanks,
> -Derek
>
>
> On Wed, Jan 18, 2012 at 4:05 PM, Derek Schuff <dschuff at google.com> wrote:
>
>> Hi Chris & Nick,
>> Attached is a very slightly updated version of patch number 1, and patch
>> number 2, with a new StreamableMemoryObject (derived from MemoryObject),
>> suitable for streaming usage.
>> One consequence of deriving from MemoryObject is that I had to make the
>> getExtent and readByte/readBytes methods of MemoryObject no longer const,
>> since they are definitely not const in the StreamableMemoryObject. This
>> resulted in having to remove const in several usages of MemoryObject. It
>> seemed less bad than adding a bunch of mutable data members in
>> StreamableMemoryObject or creating a near-duplicate of MemoryObject.
>>
>> The third patch is nearly ready as well, please let me know what you
>> think.
>> thanks,
>> -Derek
>>
>>
>> On Thu, Jan 12, 2012 at 3:09 PM, Derek Schuff <dschuff at google.com> wrote:
>>
>>>
>>>> Overall, my recommendation would be to split this into three patches:
>>>> the first patch would just increase the abstraction level of the bitcode
>>>> reader, by adding various predicates that you need and tidy things up.  The
>>>> second would extend MemoryObject as needed to add the functionality that
>>>> you need.  The third would actually switch the meat of the bitcode reader
>>>> to be lazy-streaming, and switch a tool to use it.
>>>>
>>>>
>>>
>>> Attached is patch 1 as listed above: a refactor of BistreamCursor to use
>>> an offset rather than raw pointers, and abstract the relevant operations
>>> into functions. For now BitstreamReader is the same. I've tested it locally
>>> and it works in isolation and is hopefully ready to apply. Patch 2, a
>>> subclass of MemoryObject (StreamableMemoryObject) to replace the
>>> BitstreamBytes class here, is upcoming next.
>>>
>>> thanks,
>>> -Derek
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120125/bb5fe28b/attachment.html>


More information about the llvm-commits mailing list