[PATCH] D26420: Encode duplication factor from loop vectorization and loop unrolling to discriminator.

Dehao Chen via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 22 18:11:13 PST 2016


danielcdh added a comment.

In https://reviews.llvm.org/D26420#601965, @hfinkel wrote:

> In https://reviews.llvm.org/D26420#601874, @danielcdh wrote:
>
> > Thanks for the review.
> >
> > Any update comments about the proposed encoding before it's ready to land?
>
>
> Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:
>
>   low: < duplication factor > < copy id > : high
>   
>
> where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:
>
>   bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )
>   
>
> In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?


Thanks for the suggestion. I spent some time studying prefix-based encoding, which is a great idea. But when applied here with Fibonacci coding, 3 bits can only represent numbers up to 3, 4 bits can represent up to 5, and 7 bits can represent up to 21. This does not seem enough, as the duplication factor will usually need 5+ bits, making it hard to fit multiple pieces into a 1-byte ULEB128.
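To make the bit-length arithmetic above concrete, here is a minimal sketch of Fibonacci coding as described in the Wikipedia article linked earlier (the helper name is mine, not anything in the patch):

```python
# Sketch of Fibonacci coding (https://en.wikipedia.org/wiki/Fibonacci_coding).
# Every positive integer is a sum of non-consecutive Fibonacci numbers
# (Zeckendorf's theorem); the code appends a final 1 so every codeword
# ends in "11", which makes the code prefix-free.

def fibonacci_encode(n):
    """Return the Fibonacci code of a positive integer as a bit string."""
    assert n >= 1
    fibs = [1, 2]                  # F(2), F(3), ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()                     # drop the first Fibonacci number > n
    bits = ['0'] * len(fibs)
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:           # greedy Zeckendorf decomposition
            bits[i] = '1'
            n -= fibs[i]
    return ''.join(bits) + '1'     # trailing 1 terminates the codeword

# Codeword length grows only logarithmically, but small values still
# cost several bits each, e.g.:
#   1 -> "11", 2 -> "011", 3 -> "0011", 4 -> "1011", 11 -> "001011"
```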

Another approach, as you mentioned, is to use the lower bits to encode which info the discriminator represents, so that when only one piece of info is present, we can encode it into one byte more efficiently. This is a great idea. I did some experiments to explore its potential: when assigning the duplication factor, I always remove the original discriminator and put the duplication factor in the lower 7 bits so that it always fits into 1 byte. The debug_line size increase is shown below:

benchmark	current patch	1-byte experiment
447.dealII	8.01%	6.16%
453.povray	6.50%	5.04%
482.sphinx3	7.54%	5.74%
470.lbm	0.00%	0.00%
444.namd	6.19%	4.89%
433.milc	23.12%	17.63%
450.soplex	2.66%	1.99%
445.gobmk	7.51%	5.02%
471.omnetpp	0.52%	0.36%
458.sjeng	10.31%	7.37%
473.astar	5.44%	4.26%
456.hmmer	9.74%	7.50%
401.bzip2	9.01%	6.33%
462.libquantum	10.79%	8.36%
403.gcc	2.74%	1.77%
464.h264ref	29.62%	21.14%
483.xalancbmk	1.42%	1.12%
429.mcf	9.55%	7.45%
400.perlbench	1.96%	1.21%
mean	7.81%	5.84%

The first data column is the current implementation. The second column is the experiment (fitting all duplication factors into one byte), which represents the upper bound of any encoding.
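For context on the one-byte bound used in the experiment: DWARF discriminators are ULEB128-encoded, which carries 7 payload bits per byte, so a discriminator fits in a single byte exactly when its value is at most 127. A quick sketch (helper name is mine):

```python
def uleb128_size(value):
    """Number of bytes needed to ULEB128-encode a non-negative integer."""
    assert value >= 0
    size = 1
    while value >= 0x80:   # each byte holds 7 payload bits plus a
        value >>= 7        # continuation bit, so values >= 128 spill
        size += 1          # into additional bytes
    return size

# 0..127 take one byte; 128..16383 take two; and so on.
```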

From the data, it looks like the improvement is marginal. So I'm wondering: does it justify the added complexity compared with fixed-width encoding?





https://reviews.llvm.org/D26420




