[llvm-dev] (RFC) Encoding code duplication factor in discriminator

Tue Nov 1 12:15:38 PDT 2016

As illustrated in the above example, it is not that "vectorization has a distinct bit". All the different optimizations make clones of code, and each clone is labeled with a UID represented by N (e.g. 8) bits. In this way, the space is bounded by the number of clones that all optimizations have made, rather than by the number of optimizations that have been applied, and it is capped at 2^N-1. The downside of using a UID is that you will not know whether a clone came from vectorization, unrolling, or loop distribution.
Okay, but that kind of semantic mapping is important. How should we encode/recover that information? To be clear, I'm not saying that we need to implement that up front, but there needs to be a clear path to an implementation, because I don't want to have two disjoint schemes.

You mean that you want to know which optimization created the clone? How would you use that info? It looks to me like this would expose compiler implementation details in debug info.

This is still doable: assuming we have 15 interesting optimizations to track, we can use 4 bits to encode the type of optimization that created the clone. But this becomes nasty if a clone is created by more than one optimization. In that case, the discriminator may not be a good fit for this purpose.
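
To make the bit budget concrete, here is a minimal Python sketch of such a layout. The field widths (a 4-bit optimization type above an 8-bit clone UID), the names `pack` and `unpack`, and the `OptKind` values are all assumptions for illustration; none of this is an existing LLVM encoding.

```python
from enum import IntEnum
from typing import Tuple

UID_BITS = 8    # assumed width of the clone UID (the "N" above)
KIND_BITS = 4   # assumed width of the optimization-type field

UID_MAX = (1 << UID_BITS) - 1      # 2^N - 1 = 255 is the largest UID
KIND_MAX = (1 << KIND_BITS) - 1    # 15 trackable optimizations; 0 = unknown


class OptKind(IntEnum):
    """Hypothetical optimization types; only the count (<= 15) matters."""
    UNKNOWN = 0
    UNROLL = 1
    VECTORIZE = 2
    LOOP_DISTRIBUTE = 3


def pack(kind: OptKind, uid: int) -> int:
    """Pack one optimization type and one clone UID into a single value."""
    if not 0 <= uid <= UID_MAX:
        raise ValueError("clone UID does not fit in UID_BITS")
    if not 0 <= int(kind) <= KIND_MAX:
        raise ValueError("optimization type does not fit in KIND_BITS")
    return (int(kind) << UID_BITS) | uid


def unpack(value: int) -> Tuple[OptKind, int]:
    """Recover (optimization type, clone UID) from a packed value."""
    return OptKind((value >> UID_BITS) & KIND_MAX), value & UID_MAX


packed = pack(OptKind.UNROLL, 5)
print(hex(packed), unpack(packed))
```

The single type field is where the nastiness shows up: a clone produced by unrolling a loop that was already vectorized has two creators, and this layout can record only one of them.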

My understanding was that the encoding scheme would allow the profiling analysis to correctly map execution data back to the original source construct, while preserving the property that each distinct basic block would have its own discriminator value.  That is, the execution data would be attributed back to the original source construct, not whatever each individual optimization had done to it, and the data for the original source construct would correctly reflect the execution (e.g. profiling says you got 82 hits on the original loop, rather than reporting 20 hits on the unrolled-by-4 loop plus 1 each on 2 of the trailing copies).
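
The arithmetic in that example can be sketched in Python as follows. This is only an illustration under stated assumptions: the record shape (`count`, `dup_factor`) and the function name are hypothetical, and the duplication factor is taken to be the number of original iterations that one execution of the block stands for.

```python
from typing import Iterable, NamedTuple


class BlockSample(NamedTuple):
    """Execution data for one basic block derived from the same source loop."""
    label: str        # human-readable tag, used only in this example
    count: int        # hits the profiler recorded on this block
    dup_factor: int   # assumed: original iterations per execution of the block


def original_loop_count(samples: Iterable[BlockSample]) -> int:
    """Attribute every derived block's hits back to the original loop body."""
    return sum(s.count * s.dup_factor for s in samples)


# Loop unrolled by 4: the unrolled body ran 20 times, and 2 of the 3
# trailing copies ran once each.
samples = [
    BlockSample("unrolled body", count=20, dup_factor=4),
    BlockSample("trailing copy 1", count=1, dup_factor=1),
    BlockSample("trailing copy 2", count=1, dup_factor=1),
    BlockSample("trailing copy 3", count=0, dup_factor=1),
]
print(original_loop_count(samples))  # 20*4 + 1 + 1 + 0 = 82
```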

It sounds like Hal is thinking that the per-discriminator execution info would be preserved down to the point where an individual optimization could look at the profile for each piece, and make decisions on that basis.

I'm not clear how that would be possible, as the optimization would have to first do the transform (or predict how it would do the transform) in order to see which individual-discriminator counts mapped to which actual blocks, and then make some kind of decision about whether to do the transform differently based on that information.  If the optimization did choose to do the transform differently, that would leave the IR in a state where the individual discriminators *cannot* map back to it.  (Say you unroll by 2 instead of 4; then you have only 1 trailing copy, not 3, and a discriminator that maps to the second trailing copy now maps to nothing.  The individual-discriminator data becomes useless.)
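
A small Python sketch of that failure mode, with made-up block labels standing in for discriminator values (the labels, the counts, and the helper name are assumptions for illustration):

```python
def blocks_after_unroll(factor: int) -> set:
    """Blocks derived from one loop after unrolling by `factor`.

    Assumes the simple shape discussed above: one unrolled body plus
    (factor - 1) trailing copies that handle the leftover iterations.
    """
    return {"body"} | {f"trailing{i}" for i in range(1, factor)}


# Profile collected from a binary whose loop was unrolled by 4.
profile = {"body": 20, "trailing1": 1, "trailing2": 1, "trailing3": 0}

# The profile-guided build decides to unroll by 2 instead.
current_blocks = blocks_after_unroll(2)

# Profile entries whose block no longer exists in the new IR.
orphaned = sorted(set(profile) - current_blocks)
print(orphaned)  # ['trailing2', 'trailing3']
```

Even the surviving `body` entry is suspect, since 20 executions of a body unrolled by 4 do not mean the same thing as 20 executions of a body unrolled by 2.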

Am I expressing this well enough to show that what Hal is looking for is not feasible?
--paulr
