[PATCH] Bitcode: Collect all MDString records into a single blob

Duncan P. N. Exon Smith via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 24 19:22:40 PDT 2016


> On 2016-Mar-24, at 18:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
> 
> 
> On Mar 24, 2016, at 6:12 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>> 
>>> On 2016-Mar-24, at 16:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> +        // FIXME: Avoid this upgrade logic; it's costly in the common case, and
>>> +        // incorrectly modifies all MDStrings.
>>> +        String = Current.str();
>>> +        llvm::UpgradeMDStringConstant(String);
>>> 
>>> Since we can't have old bitcode with METADATA_BULK_STRING_SIZES, and the only way to produce these deprecated "llvm.vectorizer.*" metadata is to write textual IR (for which we don't provide any guarantee). 
>>> So I'd advocate to drop it here (any lit test that is using the old syntax should be updated, bitcode files may be checked in with the old names for auto-upgrade test purpose, but that won't exercise this code path any way).
>> 
>> Fixed in r264373.
> 
> Nice!
> 
> +        // Test for upgrading !llvm.loop.
> +        HasSeenOldLoopTags |= mayBeOldLoopAttachmentTag(Current);
> 
> I still think this is not needed for the new block (you won't have old bitcode that can have this block). That is, unless Clang/LLVM is current producing these deprecated loop annotation (I hope not).

I think it's a little too subtle to mix and match where the upgrade
code is put.  Do you think this will cause a slowdown in practice?

>>> +static unsigned createMDStringDataAbbrev(BitstreamWriter &Stream) {
>>> 
>>> There is one call site, for such a simple function are the other abbrev creation also using a separate function for creation?
>> 
>> Yes, the other metadata abbreviations are in functions.  I think it's
>> a cleaner model because both the `unsigned` and the `BitCodeAbbrev`
>> are rightly called some variation of "Abbrev".  With a separate
>> function you don't have to worry about which name refers to which
>> (since they are never in the same scope).
>> 
>>> +        return error("Invalid record");
>>> 
>>> The string could be more explicit (all of them).
>>> 
>>> Overall it is easy for me to understand because I have all the context. Someone that has no idea about what's going on would have hard time (OK admittedly the full state of the bitcode reader/writer is kind of like that, but that's not an excuse), so I think this could benefit some better comments here and there.
>> 
>> I was following the current error style.  You're right; no reason not
>> to set a new standard.
> 
> Somehow, I'd rather see a comment header for this function:
> 
> +static void writeMDStrings(ArrayRef<const Metadata *> Strings,
> 
> Something in the lines of:
> 
> /// Emit the MDStrings altogether. For efficiency we emit MDStrings using a 
> // pair of record, the first one is an array of integers indicating the size of 
> // each MDString encoded by the pair. The second record contains all the
> // MDStrings concatenated as a blob.
> static void writeMDStrings(ArrayRef<const Metadata *> Strings,
> 
> And in the reader:
> 
>    case bitc::METADATA_BULK_STRING_SIZES: {
>      /// See writeMDStrings for the description of MDString encoding.
> 
> 
> If the reader/writer were documented everywhere this way I'd have found it a lot more friendly when I had to jump in a few months ago.

Sure, I'll have a pass at updating the comments.

> 
> 
>> Please have another look.
> 
> 
> +  // Emit strings in large blocks to reduce record overhead.
> +  const size_t NumStringsPerBlob = 1024;
> 
> How is this constant chosen? This looks quite arbitrary, especially since it does not say anything about the size of the blob itself.

It's big enough to convince me that there isn't any overhead,
and small enough to avoid an unnecessarily large allocation
for the `Record` in the BitcodeReader.

Maybe 1024 is too big, even.  That'll be 8096 bytes.  In my
next patch I'll set it to 512 so that it's within a page on
most platforms.

Frankly, I'm happy with anything as big as 128; I'm not
interested in micro-optimizing this.

> Why not emitting everything in a single record? (I guess we have a limit, but it is not clear how you constant guarantee that we stay under the limit).

(as discussed above)

> 
> +  // Put the strings first.
> +  std::stable_partition(MDs.begin(), MDs.end(),
> +                        [](const Metadata *MD) { return isa<MDString>(MD); });
> 
> I assume the use of std::stable_partition instead of std::partition is on the same level as "-preserve-bc-uselistorder", i.e. paranoid reproducibility but not required for correctness?

Not paranoid at all.  This is reproducibility at different levels.

This makes it so that clang will produce the same output when run
twice on the same input.

Preserving use-list order let's you re-enter the compiler and get
the same result as if you hadn't left in the first place.

>   const SmallVectorImpl<const LocalAsMetadata *> &getFunctionLocalMDs() const {
>     return FunctionLocalMDs;
>   }
> +
>   const TypeList &getTypes() const { return Types; }
> 
> New line is a spurious change?

Not spurious.  I'm adding more metadata-related functions to the
group; for clarity, I separated the metadata functions from the
non-metadata functions with a newline.

> 
> 
> -- 
> Mehdi



More information about the llvm-commits mailing list