[llvm-dev] llvm-profdata determinism

Thu Jun 29 18:27:36 PDT 2017

I haven't tested it, but it looks to me like llvm-profdata merge (well,
InstrProfWriter specifically) would not have deterministic output.

Certainly the textual output iterates over FunctionData, which is a
StringMap of SmallDenseMaps, neither of which has a deterministic
iteration order.

The binary writing looks like it'd have similar issues - looping through
these unordered maps & writing output (e.g.
InstrProfRecordWriterTrait::EmitData loops through the data in the same
SmallDenseMap and writes content in that order, so far as I can tell).

Generally it's important that the compiler (& I believe related tools) have
deterministic output. Is there a reason that wouldn't be the case for
llvm-profdata? Or have I misunderstood how the output is determined?

Ensuring deterministic output may be expensive in terms of memory usage,
though perhaps not prohibitive. The usual approach is to use some of LLVM's
deterministic maps (like MapVector), though they're not exactly tuned for
memory usage. An alternative might be to take the data in each
SmallDenseMap and sort it by the hash as a key - it's unique after all, and
doing each map separately won't do crazy bad things to memory usage (a
small constant overhead).
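
As a rough sketch of the sort-by-hash idea (untested against the real
InstrProfWriter): std::unordered_map stands in for SmallDenseMap here, and
Record, RecordsByHash and sortedByHash are hypothetical names chosen for
illustration, not LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profile record; the real one carries more.
struct Record {
  std::vector<uint64_t> Counts;
};

// Stand-in for the per-function SmallDenseMap keyed by function hash.
using RecordsByHash = std::unordered_map<uint64_t, Record>;

// Copy one function's entries out of the unordered map and sort them by
// hash, so a writer can emit them in an order that does not depend on the
// map's internal layout.
std::vector<std::pair<uint64_t, Record>>
sortedByHash(const RecordsByHash &Records) {
  std::vector<std::pair<uint64_t, Record>> Sorted(Records.begin(),
                                                  Records.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [](const auto &A, const auto &B) { return A.first < B.first; });
  // The hashes are map keys, so no two entries compare equal and the
  // resulting order is fully determined.
  assert(std::adjacent_find(Sorted.begin(), Sorted.end(),
                            [](const auto &A, const auto &B) {
                              return A.first == B.first;
                            }) == Sorted.end());
  return Sorted;
}
```

Only one function's records are copied at a time, so the extra memory is
bounded by the largest per-function map rather than by the whole profile.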

Handling the StringMap, I'm not sure about - it might be cheap enough to
make a separate vector of StringMapEntry*s, sorting based on the strings
and iterating over that instead of the StringMap itself? (I guess the same
approach could be taken with the SmallDenseMaps, rather than duplicating
anything)
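
A sketch of the pointer-vector variant, again with std::unordered_map
standing in for StringMap (so the entry type is a std::pair rather than a
StringMapEntry) and with FunctionMap and sortedByName as hypothetical
names. It assumes the map is not modified while the pointers are in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the record and the per-function map.
struct Record {
  std::vector<uint64_t> Counts;
};
using RecordsByHash = std::unordered_map<uint64_t, Record>;

// Stand-in for the StringMap from function name to per-function records.
using FunctionMap = std::unordered_map<std::string, RecordsByHash>;

// Build a vector of pointers to the map's entries and sort it by function
// name. Nothing is duplicated; the cost is one pointer per function.
std::vector<const FunctionMap::value_type *>
sortedByName(const FunctionMap &Functions) {
  std::vector<const FunctionMap::value_type *> Order;
  Order.reserve(Functions.size());
  for (const auto &Entry : Functions)
    Order.push_back(&Entry);
  std::sort(Order.begin(), Order.end(),
            [](const auto *A, const auto *B) { return A->first < B->first; });
  assert(Order.size() == Functions.size());
  return Order;
}
```

A writer would then loop over the returned vector instead of the map
itself, and could do the same for each inner map to avoid copying records.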

How's all that sound?