[PATCH] Bitcode: Collect all MDString records into a single blob

Thu Mar 24 18:46:30 PDT 2016

On Mar 24, 2016, at 6:12 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> 
>> On 2016-Mar-24, at 16:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>> +        // FIXME: Avoid this upgrade logic; it's costly in the common case, and
>> +        // incorrectly modifies all MDStrings.
>> +        String = Current.str();
>> +        llvm::UpgradeMDStringConstant(String);
>> 
>> Since we can't have old bitcode with METADATA_BULK_STRING_SIZES, and the only way to produce these deprecated "llvm.vectorizer.*" metadata is to write textual IR (for which we don't provide any guarantee). 
>> So I'd advocate to drop it here (any lit test that is using the old syntax should be updated, bitcode files may be checked in with the old names for auto-upgrade test purpose, but that won't exercise this code path any way).
> 
> Fixed in r264373.

Nice!

+        // Test for upgrading !llvm.loop.
+        HasSeenOldLoopTags |= mayBeOldLoopAttachmentTag(Current);

I still think this is not needed for the new block (you won't have old bitcode that can have this block). That is, unless Clang/LLVM is current producing these deprecated loop annotation (I hope not).

> 
>> +static unsigned createMDStringDataAbbrev(BitstreamWriter &Stream) {
>> 
>> There is one call site, for such a simple function are the other abbrev creation also using a separate function for creation?
> 
> Yes, the other metadata abbreviations are in functions.  I think it's
> a cleaner model because both the `unsigned` and the `BitCodeAbbrev`
> are rightly called some variation of "Abbrev".  With a separate
> function you don't have to worry about which name refers to which
> (since they are never in the same scope).
> 
>> +        return error("Invalid record");
>> 
>> The string could be more explicit (all of them).
>> 
>> Overall it is easy for me to understand because I have all the context. Someone that has no idea about what's going on would have hard time (OK admittedly the full state of the bitcode reader/writer is kind of like that, but that's not an excuse), so I think this could benefit some better comments here and there.
> 
> I was following the current error style.  You're right; no reason not
> to set a new standard.

Somehow, I'd rather see a comment header for this function:

+static void writeMDStrings(ArrayRef<const Metadata *> Strings,

Something in the lines of:

/// Emit the MDStrings altogether. For efficiency we emit MDStrings using a 
// pair of record, the first one is an array of integers indicating the size of 
// each MDString encoded by the pair. The second record contains all the
// MDStrings concatenated as a blob.
static void writeMDStrings(ArrayRef<const Metadata *> Strings,

And in the reader:

    case bitc::METADATA_BULK_STRING_SIZES: {
      /// See writeMDStrings for the description of MDString encoding.

If the reader/writer were documented everywhere this way I'd have found it a lot more friendly when I had to jump in a few months ago.

> Please have another look.

+  // Emit strings in large blocks to reduce record overhead.
+  const size_t NumStringsPerBlob = 1024;

How is this constant chosen? This looks quite arbitrary, especially since it does not say anything about the size of the blob itself.
Why not emitting everything in a single record? (I guess we have a limit, but it is not clear how you constant guarantee that we stay under the limit).

+  // Put the strings first.
+  std::stable_partition(MDs.begin(), MDs.end(),
+                        [](const Metadata *MD) { return isa<MDString>(MD); });

I assume the use of std::stable_partition instead of std::partition is on the same level as "-preserve-bc-uselistorder", i.e. paranoid reproducibility but not required for correctness?

   const SmallVectorImpl<const LocalAsMetadata *> &getFunctionLocalMDs() const {
     return FunctionLocalMDs;
   }
+
   const TypeList &getTypes() const { return Types; }

New line is a spurious change?

-- 
Mehdi