[llvm-dev] (RFC) Encoding code duplication factor in discriminator

Dehao Chen via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 1 09:33:08 PDT 2016


On Thu, Oct 27, 2016 at 2:53 PM, Robinson, Paul <paul.robinson at sony.com>
wrote:

> It looks like the example doesn't use the encoding described in the text?
>
>
>
> Assume that the discriminator is uint32. The traditional discriminator is
> less than 256, so let's take 8 bits for it. For the duplication factor (type 1
> duplication), we assume the maximum unroll_factor * vectorize_factor is
> less than 256, thus 8 bits for it. For the unique number (type 2 duplication),
> we assume code is duplicated at most 32 times, thus 5 bits for it. Overall, we
> still have 11 free bits left in the discriminator encoding.
>
>
>
> Let's take the original source as an example; after loop unrolling and
> peeling, the code may look like:
>
> for (i = 0; i < (N & ~3); i += 4) {
>   foo();  // discriminator: 0x40
>   foo();  // discriminator: 0x40
>   foo();  // discriminator: 0x40
>   foo();  // discriminator: 0x40
> }
> if (i++ < N) {
>   foo();   // discriminator: 0x100
>   if (i++ < N) {
>     foo(); // discriminator: 0x200
>     if (i++ < N) {
>       foo();  // discriminator: 0x300
>     }
>   }
> }
>
>
>
> If we allocate 8 bits to "traditional" discriminators, then 0x40 falls
> into that range, so I'd think the calls to foo() inside the loop should be
> using 0x400 to encode the unroll factor.
>

You are right, thanks for pointing this out.


> Note this requires 2 bytes for ULEB128 instead of 1.
>
> And if we allocate another 8 bits to the unroll factor, then the trailing
> calls should use 0x10000, 0x20000, 0x30000.  These will require 3 bytes for
> ULEB128 instead of 2.
>
>
>
> I think it would be fine to allocate only 4 bits to "traditional"
> discriminators (as you need them to be unique across the same source
> location, but not across different source locations, and 16 independent
> basic blocks for the same source location seems like plenty to me; but I
> haven't looked at a lot of cases with discriminators).  This would allow
> you to use 0x40 to encode the unroll factor in this example.  If you still
> want to allocate 8 bits for the unroll factor, then the trailing calls would
> use 0x1000, 0x2000, 0x3000, so as long as you have no more than 3 trailing
> calls you can still encode the discriminator in a 2-byte ULEB128.
>
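For reference, here is a minimal sketch of where the 1-, 2-, and 3-byte
thresholds above come from, assuming the standard DWARF ULEB128 encoding
(7 payload bits per byte); this is just illustrative C, not LLVM code:

#include <stdint.h>

/* Bytes needed to encode `value` as ULEB128: each byte carries 7 payload
   bits, so values up to 0x7F take 1 byte, up to 0x3FFF take 2 bytes, and
   up to 0x1FFFFF take 3 bytes. */
static unsigned uleb128_size(uint32_t value) {
  unsigned size = 0;
  do {
    value >>= 7;
    ++size;
  } while (value != 0);
  return size;
}

/* e.g. uleb128_size(0x40) == 1, uleb128_size(0x400) == 2,
        uleb128_size(0x10000) == 3 */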

I did some evaluation on the SPEC CPU2006 C/C++ benchmarks. Across all the
sources built, there are 268K non-zero discriminators emitted. The largest
discriminator is 779 (coming from 471.omnetpp, which has a significant amount
of EH code). The 99th percentile is 18 and the 85th percentile is 3. For the
duplication factor, the largest is 320, the 99th percentile is 40, and the
85th percentile is 4.

If we want to encode this into the discriminator, I would propose an encoding
like:

high bits   ---------->  low bits
EEEEEEEECCCCCFFDDD CCCFFFFFFDDDDD

D: normal discriminator
F: duplication factor
C: code cloning
E: empty bits

So the lower 14 bits should be able to cover the 99th percentile.
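To make the first layout concrete, here is a rough sketch of how the lower 14
bits could be unpacked (field widths taken from the layout string above; the
function names are purely illustrative, not existing LLVM APIs). Values above
the 99th percentile would spill into the corresponding high-order fields:

#include <stdint.h>

/* First layout, low 14-bit group: CCC FFFFFF DDDDD (assumed positions). */

/* D (base discriminator): bits 0-4 */
static unsigned getBaseDiscriminator(uint32_t d) { return d & 0x1F; }
/* F (duplication factor): bits 5-10 */
static unsigned getDuplicationFactor(uint32_t d) { return (d >> 5) & 0x3F; }
/* C (code-cloning copy number): bits 11-13 */
static unsigned getCloneNumber(uint32_t d) { return (d >> 11) & 0x7; }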

Alternatively, something like:
high bits   ---------->  low bits
EEEEEEEECCCCCFFDDD CFFFDDD CCFFFDD

So the lower 7 bits should be able to cover the 85th percentile, and the lower
14 bits should be able to cover the 99th percentile.
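As a rough sanity check (using the assumed field positions from the two layout
strings above, and assuming the factors are stored verbatim), the
85th-percentile values fit in the low 7-bit group of the second layout, i.e.
one ULEB128 byte, and the 99th-percentile values fit in the low 14-bit group
of the first layout, i.e. two ULEB128 bytes:

#include <assert.h>
#include <stdint.h>

int main(void) {
  /* Second layout, low 7-bit group: CC FFF DD (assumed positions). */
  uint32_t p85 = (0u << 5) | (4u << 2) | 3u;    /* cloning 0, factor 4, discriminator 3 */
  assert(p85 <= 0x7F);                          /* encodes in 1 ULEB128 byte */

  /* First layout, low 14-bit group: CCC FFFFFF DDDDD (assumed positions). */
  uint32_t p99 = (0u << 11) | (40u << 5) | 18u; /* cloning 0, factor 40, discriminator 18 */
  assert(p99 <= 0x3FFF);                        /* encodes in 2 ULEB128 bytes */
  return 0;
}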

Dehao



> --paulr
>