[llvm-commits] Proposal/patch: Enable bitcode streaming

Nick Lewycky nlewycky at google.com
Mon Nov 21 15:38:54 PST 2011

I don't have many comments since not only does this patch work, I've also
reviewed it before. I didn't notice a single logic error. :) Regardless I
do have a few simple comments.

+class BitstreamBytes {
+  BitstreamBytes() { }
+  virtual ~BitstreamBytes() { }

This is in a header file. Please move the implementation of the virtual
destructor to BitcodeReader.cpp, or else copies will end up in every .o
file that includes this header.

Same with MemoryBitstreamBytes and LazyBitstreamBytes. More general
question, any reason to put MemoryBitstreamBytes and LazyBitstreamBytes
implementations inside the header? You could create

+  BitstreamBytes(const BitstreamBytes&);  // NOT IMPLEMENTED
+  void operator=(const BitstreamBytes&);  // NOT IMPLEMENTED

We use "DO NOT IMPLEMENT" elsewhere. It makes it clearer that it's not a
todo. Also on the subclasses.

+  // fetch enough bytes such that Pos can be read or EOF is reached
+  // (i.e. BytesRead > Pos). Return true if Pos can be read.
+  // Unlike most of the functions in BitcodeReader, returns true on success
+  bool fetchToPos(size_t Pos) {

Missing period on "success".

+  virtual size_t getEndPos() {
+    if (BitcodeSize) return BitcodeSize;
+    size_t pos = BytesRead + kChunkSize;
+    // keep fetching until we run out of bytes
+    while(fetchToPos(pos)) pos += kChunkSize;
+    return BitcodeSize;
+  }

Space after "while", like there is on "if".

+BitcodeStreamer* getBitcodeFileStreamer(const std::string &Filename,
+                                        std::string *Err);

Inconsistent placement of & and * (the rest of your patch put them before
the space).

+/// FindFunctionInStream - Find the function body in the bitcode stream
+bool BitcodeReader::FindFunctionInStream(Function *F,
+       DenseMap<Function*, uint64_t>::iterator
DeferredFunctionInfoIterator) {
+  while (DeferredFunctionInfoIterator->second == 0) {
+    if (Stream.AtEndOfStream())
+      return Error("Could not find Function in stream");
+    // ParseModule will parse the next body in the stream and set its
+    // position in the DeferredFunctionInfo map
+    if(ParseModule(true)) return true;

Space after "if".

+Module *llvm::getStreamedBitcodeModule(const std::string& name,
+                                   BitcodeStreamer* streamer,
+                                   LLVMContext& Context,
+                                   std::string *ErrMsg) {

This file is following space-then-sigil format. Please update three
occurrences there. "BitcodeStreamer* streamer" --> "BitcodeStreamer

+// This file implements BitcodeStreamer, which fetches bytes of bitcode
+// a stream source. It provides support for streaming (lazy reading) of
+// bitcode. TODO(dschuff) describe which final implementations are included

Did you intend to fix the TODO before committing or did you want to leave
that in?

+// * BitstreamBytes doesn't care about complexities like using
+//   threads/async callbacks to actuall overlap download+compile

Typo: actuall --> actually

+// Very simple stream backed by a file. Mostly useful for stdin and
+// actual file access is probably still best done with mmap
+class BitcodeFileStreamer : public BitcodeStreamer {
+ int Fd;
+ BitcodeFileStreamer() : Fd(0) {}

It looks this class has 1-space indent? Please use two.

+  if (LazyBitcode) {
+    std::string StrError;
+    BitcodeStreamer* streamer =

Star on the right.

+    if (LazyBitcode) {
+      FunctionPassManager* P = static_cast<FunctionPassManager*>(PM.get());

Star on the right!

One other thing. The behaviour when LazyBitcode is true could be really
different from when it off on the same .bc file on disk. It'd be good to
add comments pointing out why (module-level assembly comes to mind,
anything else?).

This looks really good, thanks for working on it! Because of the potential
for slowdown, I'd prefer if Chris could give it a thumbs-up before it lands.


On 9 November 2011 16:57, Derek Schuff <dschuff at google.com> wrote:

> Hello all,
> The following is a proposal (and a prototype patch) to enable bitcode
> streaming. The overall goal is to be able to overlap bitcode
> reading/download with compilation, a functionality useful obviously for
> pnacl and renderscript but also potentially for any situation where the
> interface between the frontend and backend is something other than a file.
> In the current state of the world, at a high level, there are 2 things
> keeping this from happening. The first is that BitcodeReader construction
> takes a MemoryBuffer which it expects to be filled with bitcode, and inside
> BitcodeReader, the BitstreamCursor (which is the primary interface to the
> bitcode itself) gets pointers to the bitcode in memory, and does all of its
> magic with pointer arithmetic. The second issue is that in
> BitcodeReader::ParseModule (which is run when right after the Module and
> BitcodeReader objects are created), the reader makes a pass over the entire
> bitcode file. This step does everything except read the function bodies,
> but it records the bit locations of each function for future
> materialization.
> High-level change description:
> This patch creates a class called BitcodeStream, which is a very simple
> interface with one method (GetBytes), which fetches the requested number of
> bytes from the stream, and writes them into the requested destination. This
> method may block the calling thread if there are not yet enough bytes
> available in the stream buffer (similarly to a stdin or socket read).
> The first issue above is addressed by introducing the BitstreamVector, an
> abstraction that wraps the bitcode in memory. Instead of using pointers,
> the BitstreamCursor uses indices and gets bitcode bytes by indexing (i.e.
> operator[] ) the BitstreamVector. When streaming is not used, the
> BitstreamVector itself keeps pointers to the start and end of the backing
> MemoryBuffer and the indexing operator is just a pointer dereference. For
> streaming use, the BitstreamVector has a BitcodeStream object. If a byte is
> requested that has not yet been fetched, it calls GetBytes to get more,
> until it has enough to return the requested byte.
> This model of allowing any byte to be requested and blocking the caller
> has the advantage that there is no structural/architectural change required
> at this lowest level, nor at the high level (A FunctionPassManager is used
> to iterate over all the functions and compile each one).
> The second issue is solved by 2 simple changes. The first is in
> ParseModule. Instead of a single pass over all the bitcode, ParseModule
> becomes resumable.  ParseModule will do its normal handling for top-level
> records, type table blocks, metadata, etc, but if streaming is in use, it
> will save its state and return as soon as a function subblock is
> encountered (rather than saving its location and skipping over it). Each
> subsequent time it is called, it bookmarks and skips one function block.
> Later, when a function needs to be materialized, if the function body has
> been seen already, then materialization is the same as before. Otherwise,
> Materialize will keep calling ParseModule (each time bookmarking and
> skipping one function body) until the requested function is found. The one
> other change required to make this work simply is that the bitcode writer
> writes function bodies as the last subblock (currently the attachment
> metadata and value symbol table are written after the function bodies).
> The prototype patch is attached and can also be viewed online at
> http://codereview.chromium.org/8393017/ . Feedback is welcome, as well as
> guidance from the relevant code owners/reviewers regarding what the next
> step needs to be toward committing this.
> Thanks,
> -Derek
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111121/5804dcf1/attachment.html>

More information about the llvm-commits mailing list