[PATCH] Bitcode: Collect all MDString records into a single blob
Duncan P. N. Exon Smith via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 24 19:27:14 PDT 2016
> On 2016-Mar-24, at 19:22, Duncan P. N. Exon Smith via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
>>
>> On 2016-Mar-24, at 18:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>
>>
>>
>> On Mar 24, 2016, at 6:12 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>
>>>
>>>> On 2016-Mar-24, at 16:46, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>
>>>> + // FIXME: Avoid this upgrade logic; it's costly in the common case, and
>>>> + // incorrectly modifies all MDStrings.
>>>> + String = Current.str();
>>>> + llvm::UpgradeMDStringConstant(String);
>>>>
>>>> Since we can't have old bitcode with METADATA_BULK_STRING_SIZES, and the only way to produce these deprecated "llvm.vectorizer.*" metadata is to write textual IR (for which we don't provide any guarantee).
>>>> So I'd advocate to drop it here (any lit test that is using the old syntax should be updated, bitcode files may be checked in with the old names for auto-upgrade test purpose, but that won't exercise this code path any way).
>>>
>>> Fixed in r264373.
>>
>> Nice!
>>
>> + // Test for upgrading !llvm.loop.
>> + HasSeenOldLoopTags |= mayBeOldLoopAttachmentTag(Current);
>>
>> I still think this is not needed for the new block (you won't have old bitcode that can have this block). That is, unless Clang/LLVM is current producing these deprecated loop annotation (I hope not).
>
> I think it's a little too subtle to mix and match where the upgrade
> code is put. Do you think this will cause a slowdown in practice?
>
>>>> +static unsigned createMDStringDataAbbrev(BitstreamWriter &Stream) {
>>>>
>>>> There is one call site, for such a simple function are the other abbrev creation also using a separate function for creation?
>>>
>>> Yes, the other metadata abbreviations are in functions. I think it's
>>> a cleaner model because both the `unsigned` and the `BitCodeAbbrev`
>>> are rightly called some variation of "Abbrev". With a separate
>>> function you don't have to worry about which name refers to which
>>> (since they are never in the same scope).
>>>
>>>> + return error("Invalid record");
>>>>
>>>> The string could be more explicit (all of them).
>>>>
>>>> Overall it is easy for me to understand because I have all the context. Someone that has no idea about what's going on would have hard time (OK admittedly the full state of the bitcode reader/writer is kind of like that, but that's not an excuse), so I think this could benefit some better comments here and there.
>>>
>>> I was following the current error style. You're right; no reason not
>>> to set a new standard.
>>
>> Somehow, I'd rather see a comment header for this function:
>>
>> +static void writeMDStrings(ArrayRef<const Metadata *> Strings,
>>
>> Something in the lines of:
>>
>> /// Emit the MDStrings altogether. For efficiency we emit MDStrings using a
>> // pair of record, the first one is an array of integers indicating the size of
>> // each MDString encoded by the pair. The second record contains all the
>> // MDStrings concatenated as a blob.
>> static void writeMDStrings(ArrayRef<const Metadata *> Strings,
>>
>> And in the reader:
>>
>> case bitc::METADATA_BULK_STRING_SIZES: {
>> /// See writeMDStrings for the description of MDString encoding.
>>
>>
>> If the reader/writer were documented everywhere this way I'd have found it a lot more friendly when I had to jump in a few months ago.
>
> Sure, I'll have a pass at updating the comments.
Also, please let me know if/when you're happy moving the remaining
review to post-commit.
>
>>
>>
>>> Please have another look.
>>
>>
>> + // Emit strings in large blocks to reduce record overhead.
>> + const size_t NumStringsPerBlob = 1024;
>>
>> How is this constant chosen? This looks quite arbitrary, especially since it does not say anything about the size of the blob itself.
>
> It's big enough to convince me that there isn't any overhead,
> and small enough to avoid an unnecessarily large allocation
> for the `Record` in the BitcodeReader.
>
> Maybe 1024 is too big, even. That'll be 8096 bytes. In my
> next patch I'll set it to 512 so that it's within a page on
> most platforms.
>
> Frankly, I'm happy with anything as big as 128; I'm not
> interested in micro-optimizing this.
>
>> Why not emitting everything in a single record? (I guess we have a limit, but it is not clear how you constant guarantee that we stay under the limit).
>
> (as discussed above)
>
>>
>> + // Put the strings first.
>> + std::stable_partition(MDs.begin(), MDs.end(),
>> + [](const Metadata *MD) { return isa<MDString>(MD); });
>>
>> I assume the use of std::stable_partition instead of std::partition is on the same level as "-preserve-bc-uselistorder", i.e. paranoid reproducibility but not required for correctness?
>
> Not paranoid at all. This is reproducibility at different levels.
>
> This makes it so that clang will produce the same output when run
> twice on the same input.
>
> Preserving use-list order let's you re-enter the compiler and get
> the same result as if you hadn't left in the first place.
>
>> const SmallVectorImpl<const LocalAsMetadata *> &getFunctionLocalMDs() const {
>> return FunctionLocalMDs;
>> }
>> +
>> const TypeList &getTypes() const { return Types; }
>>
>> New line is a spurious change?
>
> Not spurious. I'm adding more metadata-related functions to the
> group; for clarity, I separated the metadata functions from the
> non-metadata functions with a newline.
>
>>
>>
>> --
>> Mehdi
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list