[PATCH] D12536: Function bitcode index in Value Symbol Table and lazy reading support

Wed Sep 2 07:26:47 PDT 2015

tejohnson added a comment.



In http://reviews.llvm.org/D12536#237772, @joker.eph wrote:

> Hi Teresa,
>
> Thanks for this patch, it looks good to me overall. 
>  Please find some comments inline.


Thanks, responses inline.

> Also does it change the behavior of the LazyReader? 

>  I don't know it enough to know if the fact that you are populating DeferredFunctionInfo using the module VST means it no longer needs to visit all the blocks.


Previously the lazy reader needs to parse through all the function blocks to populate DeferredFunctionInfo: Each time a function block is encountered by a call to parseModule, it records the bitcode offset (via BitcodeReader::rememberAndSkipFunctionBody) and stops parsing. Then whenever a function is materialized, it invokes BitcodeReader::findFunctionInStream, and if the DeferredFunctionInfo entry isn't found, parseModule is resumed from where it left off (and again records and skips the next function block and stops). findFunctionInStream will repeat this process until the function to be materialized is found. In the end, all the function blocks up through the last one materialized need to be walked through once to populate the full DeferredFunctionInfo (although their contents are skipped so it should be fairly fast).

With my patch, because the DeferredFunctionInfo is populated from the VST when we encounter the first function block, we can avoid this walk and stop parsing the module. I have an assert in findFunctionInStream that it should always find an entry for a function when the VSTOffset is not 0 - so essentially parseModule is never called from here to walk through the function blocks.

> I'd expect `llvm-extract` to be //constant time// now, independently of the number of functions in the module.


Yes, that would be the case with this change.

Thanks,
Teresa

> Best,

> 

> Mehdi



================
Comment at: include/llvm/Bitcode/BitstreamReader.h:237
@@ -236,1 +236,3 @@
 
+  unsigned getCurCodeSize() const { return CurCodeSize; }
+
----------------
joker.eph wrote:
> This is identical to `getAbbrevIDWidth()` two lines above, do you really need to alias it?
I will remove this and switch to using getAbbrevIDWidth.

I originally didn't use the same name since I thought abbrev ids was a specific use case. But looking at the LLVM Bitcode File Format doc online I see that it does refer to this number as the abbrev id width, and block IDs are also referred to as (possibly builtin) abbrev IDs.

Will change the corresponding interface I added to the BitstreamWriter to getAbbrevIDWidth to be consistent.

================
Comment at: include/llvm/Bitcode/BitstreamWriter.h:126
@@ +125,3 @@
+      CurWord |= NewWord >> (32 - StartBit);
+      support::endian::write32le(&Out[ByteNo], CurWord);
+    }
----------------
joker.eph wrote:
> (Note: this shift/insert seems ok to me but I am not certain about if it is endian independent)
It should be - this mirrors the way the CurValue is set in Emit() via shifting.

================
Comment at: include/llvm/Bitcode/BitstreamWriter.h:129
@@ -106,1 +128,3 @@
+  }
+
   void Emit(uint32_t Val, unsigned NumBits) {
----------------
joker.eph wrote:
> It looks like the `BackpatchWord` change is independent, can it be in a separate commit?
It is only needed because of the new use case of being called to backpatch the VSTOffset, which is the first time a non-word aligned location is backpatched. But it is independent in that it could presumably be used in other contexts in the future. If you feel this is better as a separate commit I can split it out and commit it first.

================
Comment at: include/llvm/Bitcode/BitstreamWriter.h:521
@@ -496,1 +520,3 @@
+public:
+
   BlockInfo &getOrCreateBlockInfo(unsigned BlockID) {
----------------
joker.eph wrote:
> It seems this is not needed for this patch? I don't see any use of `getOrCreateBlockInfo` in the diff?
This was leftover from something else I tried and apparently forgot to clean up. Will move the public back down where it was.

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:1589
@@ -1585,2 +1588,3 @@
 
-std::error_code BitcodeReader::parseValueSymbolTable() {
+ErrorOr<Value *> BitcodeReader::recordValue(SmallVector<uint64_t, 64> &Record,
+                                            unsigned NameIndex,
----------------
joker.eph wrote:
> I believe usually function argument are using the base class `SmallVectorImpl<uint64_t>` to avoid specializing on the size. The size is only relevant when you allocate it.
Got it, will change.

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:1693
@@ +1692,3 @@
+        GO = GA->getBaseObject();
+      assert(GO);
+
----------------
joker.eph wrote:
> Minor comment: You can avoid the extra dyn_cast.
> 
> ```
>       GlobalObject *GO;
>       // If this is an alias, need to get the actual Function object
>       // it aliases, in order to set up the DeferredFunctionInfo entry below.
>       auto *GA = dyn_cast<GlobalAlias>(V);
>       if (GA)
>         GO = GA->getBaseObject();
>       else
>         GO = dyn_cast<GlobalObject>(V);
>       assert(GO);
> ```
Good idea, will do.

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:2987
@@ +2986,3 @@
+            // of the last function block recorded in the VST (set when
+            // parsing the VST function entries). Skip it.
+            if (Stream.SkipBlock())
----------------
joker.eph wrote:
> kudos for the comments everywhere! :)
Thanks!

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:3301
@@ +3300,3 @@
+    case bitc::MODULE_CODE_VSTOFFSET:
+      if (Record.size() < 1)
+        return error("Invalid record");
----------------
joker.eph wrote:
> Is `Record.size() > 1` a valid possibility here?
Not in the current record design. It will only contain one record with a 32-bit word offset.

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:4744
@@ -4625,1 +4743,3 @@
+    // Only should happen if we have old format bitcode.
+    assert(VSTOffset == 0);
     if (Stream.AtEndOfStream())
----------------
joker.eph wrote:
> Not sure from the comment: is there a possibility for the assert to fire on old format bitcode? LLVM is supposed to always be able to read //old// bitcode.
Right, I left in the support for the old bitcode. This comment is meant to ensure that if we fall into the old format handling (no DeferredFunctionInfo entry that was pre-populated) that we indeed have the old format (in which case there should not have been a VSTOFFSET record in the bitcode). So the assert will not fire with the old format. Will clarify this in the comment.

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:599
@@ +598,3 @@
+  Vals.push_back(0);
+  Stream.EmitRecord(bitc::MODULE_CODE_VSTOFFSET, Vals, VSTOffsetAbbrev);
+
----------------
joker.eph wrote:
> Could be directly:
> ```
>   SmallVector<unsigned, 2> Vals{llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>   Stream.EmitRecordWithAbbrev(VSTOffsetAbbrev, Rec);
> ```
True, hadn't seen that usage before but it looks like that would work. Will change.

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2132
@@ -2072,1 +2131,3 @@
+    DenseMap<const Function *, uint64_t> *FunctionIndex = nullptr) {
   if (VST.empty()) return;
+
----------------
joker.eph wrote:
> What if there has been a placeholder emitted in this case?
WriteValueSymbolTableForwardDecl has the same early return, so there won't be one. Will add an assert here to ensure that VSTOffsetPlaceholder is 0 so that the two routines are forced to stay in sync on that.

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2188
@@ -2080,1 +2187,3 @@
+
+    const ValueName &Name = *SI;
 
----------------
joker.eph wrote:
> 
> Why did you need to change it to use an iterator?
> I believe you can still use `Name` instead of `SI` to have `getValue()` below. 
> 
> 
> ```
> for (const ValueName &Name : VST) {
>    Value &Val = Name.getValue();
> ```
> 
> 
> (I may have missed a use-case).
Ah, looks like this loop was just rangified a few weeks back, and my merge went back to the old code since I was using the earlier code when I first implemented this handling. Will fix.

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2203
@@ -2095,1 +2202,3 @@
+    if (GA && GA->getBaseObject())
+      F = dyn_cast<Function>(GA->getBaseObject());
 
----------------
joker.eph wrote:
> (Minor comment: Same extra `dyn_cast` as previously)
Will fix.


http://reviews.llvm.org/D12536