[PATCH] D16277: Bitcode: use blob for string storage in the IR: trade a bit of space for faster reading

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 19 07:32:04 PST 2016


tejohnson added a comment.

Was the improvement in speed measured on top of your DecodeChar6 change? If not, do you still see a nice improvement?


================
Comment at: include/llvm/Support/StreamingMemoryObject.h:34
@@ -30,9 +33,3 @@
                      uint64_t Address) const override;
-  const uint8_t *getPointer(uint64_t address, uint64_t size) const override {
-    // FIXME: This could be fixed by ensuring the bytes are fetched and
-    // making a copy, requiring that the bitcode size be known, or
-    // otherwise ensuring that the memory doesn't go away/get reallocated,
-    // but it's not currently necessary. Users that need the pointer (any
-    // that need Blobs) don't stream.
-    report_fatal_error("getPointer in streaming memory objects not allowed");
-    return nullptr;
+  const uint8_t *getPointer(uint64_t Address, uint64_t Size) const override {
+    Buffer.resize(Size);
----------------
filcab wrote:
> joker.eph wrote:
> > filcab wrote:
> > > What happens if there's two BLOBs in the stream? Wouldn't you overwrite one with the other?
> > Yes the validity of the pointer returned last only till the next read from the stream. 
> > The model is that there will be a copy made by the client anyway. But with blob won't "unpack" 6 bits elements to an array of unsigned, and then decode the 6 bits encoding to char, and then do the copy.
> > 
> > Note also that I haven't find another place than llvm-dis that uses this code-path.
> Have you tried clang too?
> 
> Wouldn't this code (in clang) break (I added comments)?
>   case SM_SLOC_BUFFER_ENTRY: {
>     const char *Name = Blob.data();           // <- Getting a ref to the current blob
>     unsigned Offset = Record[0];
>     SrcMgr::CharacteristicKind
>       FileCharacter = (SrcMgr::CharacteristicKind)Record[2];
>     SourceLocation IncludeLoc = ReadSourceLocation(*F, Record[1]);
>     if (IncludeLoc.isInvalid() &&
>         (F->Kind == MK_ImplicitModule || F->Kind == MK_ExplicitModule)) {
>       IncludeLoc = getImportLocation(F);
>     }
>     unsigned Code = SLocEntryCursor.ReadCode();
>     Record.clear();
>     unsigned RecCode
>       = SLocEntryCursor.readRecord(Code, Record, &Blob);     // <- That old blob reference is now invalid
> 
>     if (RecCode != SM_SLOC_BUFFER_BLOB) {
>       Error("AST record has invalid code");
>       return true;
>     }
> 
>     std::unique_ptr<llvm::MemoryBuffer> Buffer =
>         llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name);    // <- Use ref to first blob
>     SourceMgr.createFileID(std::move(Buffer), FileCharacter, ID,
>                            BaseOffset + Offset, IncludeLoc);
>     break;
>   }
> 
> "Nothing" in "llvm only" uses Blobs, basically (llvm-bcanalyzer does get one and dump it :-) ). But clang uses them a lot.
The old streaming getPointer took a fatal error, so presumably the clang code wasn't invoking it during streaming, so wouldn't break with Mehdi's change. Or is the concern that other code like this may creep in for StreamingMemoryObject?

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:1732
@@ -1729,1 +1731,3 @@
 
+ErrorOr<Value *> BitcodeReader::recordValue(SmallVectorImpl<uint64_t> &Record, StringRef Name,
+                             unsigned NameIndex, Triple &TT) {
----------------
There's a lot of code duplication here with the other recordValue(). Looks like the other recordValue() could get the Name string and invoke this one. Or just keep this one and if Name is empty here, invoke convertToString.

================
Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:2675
@@ -2641,2 +2674,3 @@
     }
-    case bitc::CST_CODE_STRING:    // STRING: [values]
+    case bitc::CST_CODE_STRING:  {  // STRING: [values]
+      if (!Blob.empty()) {
----------------
Looks like these two cases could be collapsed as they were before. Is it clearer with them separated?

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:1514
@@ -1506,3 +1513,3 @@
     String8Abbrev = Stream.EmitAbbrev(Abbv);
     // Abbrev for CST_CODE_CSTRING.
     Abbv = new BitCodeAbbrev();
----------------
This should be CST_CODE_STRING. Also, change comment to distinguish from original CST_CODE_STRING case above?

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:1531
@@ -1518,1 +1530,3 @@
     CString6Abbrev = Stream.EmitAbbrev(Abbv);
+    // Abbrev for CST_CODE_CSTRING.
+    Abbv = new BitCodeAbbrev();
----------------
Change comment to distinguish from original CST_CODE_CSTRING case above?

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:1643
@@ -1625,1 +1642,3 @@
         AbbrevToUse = CString7Abbrev;
+      if (OptSpeed) {
+        if (Code == bitc::CST_CODE_STRING)
----------------
Move up and make other cases "else if"

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2300
@@ +2299,3 @@
+
+    Abbv = new BitCodeAbbrev();
+    Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
----------------
Needs comment.

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2337
@@ -2305,1 +2336,3 @@
         AbbrevToUse = VST_BBENTRY_6_ABBREV;
+      if (OptSpeed)
+        AbbrevToUse = VST_BBENTRY_BLOB_ABBREV;
----------------
Move this up and make the other case "else if"

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2360
@@ -2326,1 +2359,3 @@
         AbbrevToUse = FnEntry7BitAbbrev;
+      if (OptSpeed)
+        AbbrevToUse = FnEntryBlobAbbrev;
----------------
Move this up and make the other cases "else if"

================
Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2619
@@ -2581,1 +2618,3 @@
   }
+  {
+    BitCodeAbbrev *Abbv = new BitCodeAbbrev();
----------------
Needs comment.


http://reviews.llvm.org/D16277





More information about the llvm-commits mailing list