[PATCH] D16277: Bitcode: use blob for string storage in the IR: trade a bit of space for faster reading

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 20 07:26:45 PST 2016


tejohnson added a comment.

In http://reviews.llvm.org/D16277#330142, @joker.eph wrote:

> Thanks for the comments, the measurements were made after the `decodeChar6` changes.


Do you know where the current hotspot is? Do the strings have such higher overhead than blobs because of the copying required in the BitstreamReader when decoding? I wouldn't think that your new version of decodeChar6 would be very expensive. If it is due to the copying, what about an alternate approach, where the string is returned as is without copying/decoding (like a blob), but the decoding is done in the BitcodeReader or whatever it calls that copies the blob (e.g. ConstantDataArray::getString for constants, and does Value::setName do the copying for VST entries in the blob case?).

That's more invasive, but 10% is a pretty big on-disk size increase.

Another question - do you know the breakdown between the different record types? This should be pretty easily extracted from the llvm-bcanalyzer record histograms. A couple thoughts:

- If it is dominated by the VST FNENTRY records, then I would imagine that reading in the combined function index file in the ThinLTO compile would also suffer from the same problem. Do we need to do something there too?
- If it is module-level constant strings and VST ENTRY records (for declarations), I wonder if there is some kind of lazy reading/decoding we could do for these (they presumably aren't all needed by imported functions.
- If it is function level BBENTRY and ENTRY records (presumably for the imported functions since we wouldn't parse the others), then I wonder if we can get some savings (both time and space) by using a string table. For a simple example I looked at, I had two functions each containing a call to printf that each had a BBENTRY record with the string "entry" and ENTRY record with the string "call". I would imagine the former in particular is duplicated quite frequently. I saw BBENTRY records in another simple function I looked at that had other very common sounding strings such as "return", "if.else", "if.then", "retval", etc. Ditto for ENTRY records for local values/parameters which might have the same name in multiple functions.


http://reviews.llvm.org/D16277





More information about the llvm-commits mailing list