[LLVMdev] [Patches][RFC] What to do about bitcode streaming.

Thu Dec 18 20:15:02 PST 2014

+llvmdev

> On 2014 Dec 18, at 15:14, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> 
> Currently we support reading bitcode in 3 ways:
> 
> * Read everything upfront.
> * Be lazy about reading the function bodies.
> * Read the bitcode over a streaming interface.
> 
> The first two modes are commonly used and well tested. In fact the
> "read everything" mode is implemented by starting lazy and calling
> materializeAllPermanently.
> 
> The thirds mode is used in tree only by llvm-dis, which means that
> some bugs can hide for quite some time (222505 for example). To the
> best of my knowledge it is only real user is PNaCl.
> 
> It also has a disproportional code complexity. There are two virtual
> interfaces (DataStreamer and MemoryObject) and they are leaky: Not all
> bitcode  features are supported when streaming and there are a few "if
> (LazyStreamer)" int the reader.
> 
> I have attached two patches that solve the problem with different trade-offs.
> 
> One just deletes the feature. This would make PNaCl responsible for
> maintaining their own reader. I assume they will have to do it at some
> point since they are looked to a fixed format, but this would make it
> an immediate issue.
> 
> The other one gets most of the simplifications by adding just one
> assumption: the size is known. This should be true for http with
> Content-Length.
> 
> We go from 2 interfaces to 1 and that interface has a single virtual
> method. There are no ifs on the data being streamed or missing
> features.
> 
> It might be even possible to drop the requirement for the size to be
> known: Replace the call to AtEndOfStream by just trying to read more
> and checking if it failed, but that is a bit more than I wanted to do
> for this.
> 
> Cheers,
> Rafael

I CC'ed llvmdev to put a few more eyes on the "what's the right
direction?" question.

IMO these both look like huge improvements.  The streaming interface was
the strangest part of the bitcode reader when I was trying to figure out
how it all fit together for the use-list-order work.

Personally I favour the "just delete it" path, but maybe there's
something I'm missing, and the other patch looks great too.

I didn't read carefully yet (although I noticed two quirks in the first
patch that I've pointed out below) -- I'll have a closer look once
you've decided on a direction.

> diff --git a/lib/Bitcode/Reader/BitcodeReader.cpp b/lib/Bitcode/Reader/BitcodeReader.cpp
> index e228b1d..1322edd 100644
> --- a/lib/Bitcode/Reader/BitcodeReader.cpp
> +++ b/lib/Bitcode/Reader/BitcodeReader.cpp
> @@ -33,6 +32,26 @@ enum {
>    SWITCH_INST_MAGIC = 0x4B5 // May 2012 => 1205 => Hex
>  };
>  
> +BitcodeReader::BitcodeReader(MemoryBuffer * Buffer, LLVMContext & C)

clang-format?

> +    : Context(C), TheModule(nullptr), Buffer(Buffer),
> +      SeenValueSymbolTable(false), ValueList(C), MDValueList(C),
> +      SeenFirstFunctionBody(false), UseRelativeIDs(false),
> +      WillMaterializeAllForwardRefs(false) {
> diff --git a/tools/llvm-dis/llvm-dis.cpp b/tools/llvm-dis/llvm-dis.cpp
> index 072f636..3e21164 100644
> --- a/tools/llvm-dis/llvm-dis.cpp
> +++ b/tools/llvm-dis/llvm-dis.cpp
> @@ -135,12 +205,13 @@ int main(int argc, char **argv) {
>      else
>        DisplayFilename = InputFilename;
>      ErrorOr<std::unique_ptr<Module>> MOrErr =
> -        getStreamedBitcodeModule(DisplayFilename, Streamer, Context);
> +        getStreamedBitcodeModule(DisplayFilename, std::move(Streamer), Context);
>      if (std::error_code EC = MOrErr.getError())
>        ErrorMessage = EC.message();
>      else
>        M = std::move(*MOrErr);
>      if(M.get()) {
> +      errs() << "foobar " << Foo.getMaxPos() << '\n';

Debug output?

>        if (std::error_code EC = M->materializeAllPermanently()) {
>          ErrorMessage = EC.message();
>          M.reset();