[llvm-commits] Proposal/patch: Enable bitcode streaming

Wed Jan 11 21:04:20 PST 2012

On Jan 9, 2012, at 2:12 PM, Derek Schuff wrote:

> Happy new year, and happy 2-month anniversary to this thread. Here it is again, rebased to r147787:
> 

Some comments on the patch:

--- a/tools/llc/llc.cpp
+++ b/tools/llc/llc.cpp

+static cl::opt<bool>
+LazyBitcode("streaming-bitcode",
+  cl::desc("Use lazy bitcode streaming for file inputs"),
+  cl::init(false));
 
Instead of making lazy bitstreaming an optional feature, you should just switch something to use it by default, otherwise it won't get tested. llvm-dis would be a reasonable one, because we really don't care about a performance hit there.

  // Build up all of the passes that we want to do to the module.
-  PassManager PM;
+  OwningPtr<PassManagerBase> PM;
+  if (LazyBitcode)
+    PM.reset(new FunctionPassManager(&mod));
+  else
+    PM.reset(new PassManager());

Why is this needed?


--- a/include/llvm/Bitcode/BitstreamReader.h
+++ b/include/llvm/Bitcode/BitstreamReader.h

+
+class BitstreamBytes {
+public:
+  BitstreamBytes() { }


How is this different than the code in llvm/Support/MemoryObject.h?  It seems that any additional functionality should be merged into memory object instead of making some new bitcode-specific streaming api.

New classes should have doxygen comments on them.  If MemoryBitstreamBytes needs to continue to exist, it should take a StringRef instead of start/end char positions.  In any case, the abstraction for streaming has nothing to do with bitcode, so it should be in MemoryObject or somewhere else that is not the bitcode reader.

+  // Returns the in memory address of Pos from the beginning of the stream.
+  // May block until Pos bytes have been read, or EOF is reached.
+  // Note that the first byte past the end may be skipped to, but may not have
+  // its address taken.
+  virtual const unsigned char *addressOf(size_t Pos) = 0;

This seems like the wrong level of API for this.  AFAICT, this is all to support BLOBs in bitcode files.  You should just add an API to ensure that N bytes from the stream are available in memory somewhere.


This sort of change:

           for (; NumElts; ++NextChar, --NumElts)
-            Vals.push_back(*NextChar);
+            Vals.push_back(BitStream->getBytes().getByte(NextChar));

Makes me think that you don't have the right abstraction here.  Why is NextChar not some new kind of iterator that handles this?


With this sort of change:
     // If we run out of data, stop at the end of the stream.
-    if (NextChar == BitStream->getLastChar()) {
+    if (BitStream->getBytes().isEndPos(NextChar)) {
 
you're definitely doing a good job raising the abstraction level of the bitcode reader, which is great.  However, why not (as a patch before the big reworking) add a "isEndPos()" method to the bitcode reader itself, and refactor code to use it?  Then in your big rework patch you only have to change the one method to indirect through bitstream, instead of changing every place that checks for the end of stream.

--- /dev/null
+++ b/include/llvm/Support/BitcodeStream.h

This file doesn't seem like it needs to exist, the BitcodeStreamer shouldn't be exported at all, so it is just one C-style entrypoint, which can be added to some pre-existing header.

+++ b/lib/Support/BitcodeStream.cpp
@@ -0,0 +1,97 @@

+namespace {
+
+const static error_code success;

This is odd, why the global variable?


Overall, my recommendation would be to split this into three patches: the first patch would just increase the abstraction level of the bitcode reader, by adding various predicates that you need and tidy things up.  The second would extend MemoryObject as needed to add the functionality that you need.  The third would actually switch the meat of the bitcode reader to be lazy-streaming, and switch a tool to use it.

-Chris