[PATCH] Bitcode: Collect all MDString records into a single blob

Duncan P. N. Exon Smith via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 25 07:51:46 PDT 2016


> On 2016-Mar-24, at 19:40, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
> 
> 
> Sent from my iPhone
> 
>> On Mar 24, 2016, at 7:22 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>> 
>>> On 2016-Mar-24, at 18:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> 
>>> 
>>>> On Mar 24, 2016, at 6:12 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>> 
>>>> 
>>>>> On 2016-Mar-24, at 16:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>> 
>>>>> +        // FIXME: Avoid this upgrade logic; it's costly in the common case, and
>>>>> +        // incorrectly modifies all MDStrings.
>>>>> +        String = Current.str();
>>>>> +        llvm::UpgradeMDStringConstant(String);
>>>>> 
>>>>> Since we can't have old bitcode with METADATA_BULK_STRING_SIZES, and the only way to produce these deprecated "llvm.vectorizer.*" metadata is to write textual IR (for which we don't provide any guarantee). 
>>>>> So I'd advocate to drop it here (any lit test that is using the old syntax should be updated, bitcode files may be checked in with the old names for auto-upgrade test purpose, but that won't exercise this code path any way).
>>>> 
>>>> Fixed in r264373.
>>> 
>>> Nice!
>>> 
>>> +        // Test for upgrading !llvm.loop.
>>> +        HasSeenOldLoopTags |= mayBeOldLoopAttachmentTag(Current);
>>> 
>>> I still think this is not needed for the new block (you won't have old bitcode that can have this block). That is, unless Clang/LLVM is current producing these deprecated loop annotation (I hope not).
>> 
>> I think it's a little too subtle to mix and match where the upgrade
>> code is put.  Do you think this will cause a slowdown in practice?
> 
> I don't expect it to show up on the profile.

After thinking overnight, I think you're right that it doesn't make
sense here.  I addressed my own concern by adding a comment to be
clear that the omission is intentional (and why).

>>>>> +static unsigned createMDStringDataAbbrev(BitstreamWriter &Stream) {
>>>>> 
>>>>> There is one call site, for such a simple function are the other abbrev creation also using a separate function for creation?
>>>> 
>>>> Yes, the other metadata abbreviations are in functions.  I think it's
>>>> a cleaner model because both the `unsigned` and the `BitCodeAbbrev`
>>>> are rightly called some variation of "Abbrev".  With a separate
>>>> function you don't have to worry about which name refers to which
>>>> (since they are never in the same scope).
>>>> 
>>>>> +        return error("Invalid record");
>>>>> 
>>>>> The string could be more explicit (all of them).
>>>>> 
>>>>> Overall it is easy for me to understand because I have all the context. Someone that has no idea about what's going on would have hard time (OK admittedly the full state of the bitcode reader/writer is kind of like that, but that's not an excuse), so I think this could benefit some better comments here and there.
>>>> 
>>>> I was following the current error style.  You're right; no reason not
>>>> to set a new standard.
>>> 
>>> Somehow, I'd rather see a comment header for this function:
>>> 
>>> +static void writeMDStrings(ArrayRef<const Metadata *> Strings,
>>> 
>>> Something in the lines of:
>>> 
>>> /// Emit the MDStrings altogether. For efficiency we emit MDStrings using a 
>>> // pair of record, the first one is an array of integers indicating the size of 
>>> // each MDString encoded by the pair. The second record contains all the
>>> // MDStrings concatenated as a blob.
>>> static void writeMDStrings(ArrayRef<const Metadata *> Strings,
>>> 
>>> And in the reader:
>>> 
>>>  case bitc::METADATA_BULK_STRING_SIZES: {
>>>    /// See writeMDStrings for the description of MDString encoding.
>>> 
>>> 
>>> If the reader/writer were documented everywhere this way I'd have found it a lot more friendly when I had to jump in a few months ago.
>> 
>> Sure, I'll have a pass at updating the comments.

The comments were short enough I decided to essentially duplicate
them in both the reader and the writer.

>>>> Please have another look.
>>> 
>>> 
>>> +  // Emit strings in large blocks to reduce record overhead.
>>> +  const size_t NumStringsPerBlob = 1024;
>>> 
>>> How is this constant chosen? This looks quite arbitrary, especially since it does not say anything about the size of the blob itself.
>> 
>> It's big enough to convince me that there isn't any overhead,
>> and small enough to avoid an unnecessarily large allocation
>> for the `Record` in the BitcodeReader.
>> 
>> Maybe 1024 is too big, even.  That'll be 8096 bytes.  In my
>> next patch I'll set it to 512 so that it's within a page on
>> most platforms.
>> 
>> Frankly, I'm happy with anything as big as 128; I'm not
>> interested in micro-optimizing this.

I went with 512, and documented my reasoning.  It shouldn't look
so magical anymore (and maybe someone will even improve it).

>> 
>>> Why not emitting everything in a single record? (I guess we have a limit, but it is not clear how you constant guarantee that we stay under the limit).
>> 
>> (as discussed above)
>> 
>>> 
>>> +  // Put the strings first.
>>> +  std::stable_partition(MDs.begin(), MDs.end(),
>>> +                        [](const Metadata *MD) { return isa<MDString>(MD); });
>>> 
>>> I assume the use of std::stable_partition instead of std::partition is on the same level as "-preserve-bc-uselistorder", i.e. paranoid reproducibility but not required for correctness?
>> 
>> Not paranoid at all.  This is reproducibility at different levels.
>> 
>> This makes it so that clang will produce the same output when run
>> twice on the same input.
>> 
>> Preserving use-list order let's you re-enter the compiler and get
>> the same result as if you hadn't left in the first place.
> 
> 
> But MDString don't have use list right?
> 
>> 
>>> const SmallVectorImpl<const LocalAsMetadata *> &getFunctionLocalMDs() const {
>>>   return FunctionLocalMDs;
>>> }
>>> +
>>> const TypeList &getTypes() const { return Types; }
>>> 
>>> New line is a spurious change?
>> 
>> Not spurious.  I'm adding more metadata-related functions to the
>> group; for clarity, I separated the metadata functions from the
>> non-metadata functions with a newline.
> 
> Oh ok.
> 
> LGTM then.

r264409.  Thanks for the detailed review!




More information about the llvm-commits mailing list