[PATCH] D16277: Bitcode: use blob for string storage in the IR: trade a bit of space for faster reading
Mehdi AMINI via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 27 14:24:28 PST 2016
joker.eph added a comment.
In http://reviews.llvm.org/D16277#331129, @tejohnson wrote:
> In http://reviews.llvm.org/D16277#330142, @joker.eph wrote:
> > Thanks for the comments, the measurements were made after the `decodeChar6` changes.
> Do you know where the current hotspot is? Do the strings have such higher overhead than blobs because of the copying required in the BitstreamReader when decoding? I wouldn't think that your new version of decodeChar6 would be very expensive. If it is due to the copying, what about an alternate approach, where the string is returned as is without copying/decoding (like a blob), but the decoding is done in the BitcodeReader or whatever it calls that copies the blob (e.g. ConstantDataArray::getString for constants, and does Value::setName do the copying for VST entries in the blob case?).
The problem is that the bitcode is not byte-aligned. Are you suggesting that we store the strings as a "blob", correctly aligned, but potentially "char6 encoded"?
> Another question - do you know the breakdown between the different record types? This should be pretty easily extracted from the llvm-bcanalyzer record histograms. A couple thoughts:
I haven't looked; I have shifted to debug info in the meantime ;)
> - If it is dominated by the VST FNENTRY records, then I would imagine that reading in the combined function index file in the ThinLTO compile would also suffer from the same problem. Do we need to do something there too?
In my use case, the combined function index is only built in memory and never stored on disk.
> - If it is module-level constant strings and VST ENTRY records (for declarations), I wonder if there is some kind of lazy reading/decoding we could do for these (they presumably aren't all needed by imported functions.
Yes, we're not lazy enough in many places, but it is not trivial to implement. :(
> - If it is function level BBENTRY and ENTRY records (presumably for the imported functions since we wouldn't parse the others), then I wonder if we can get some savings (both time and space) by using a string table. For a simple example I looked at, I had two functions each containing a call to printf that each had a BBENTRY record with the string "entry" and ENTRY record with the string "call". I would imagine the former in particular is duplicated quite frequently. I saw BBENTRY records in another simple function I looked at that had other very common sounding strings such as "return", "if.else", "if.then", "retval", etc. Ditto for ENTRY records for local values/parameters which might have the same name in multiple functions.
Bitcode is lazily loaded, so the Functions were not parsed. I suspect the same problem would show up anywhere: reading elements one by one out of the bitcode stream is far too expensive, since the BitstreamCursor has to keep track of the bit-shifting state and so on.
I experimented with storing all the global metadata in a single blob record, serialized with a FlatBuffer (https://github.com/google/flatbuffers): my ThinLTO importing gets ~10% faster.