[PATCH] D26420: Encode duplication factor from loop vectorization and loop unrolling to discriminator.
Hal Finkel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 1 17:21:43 PST 2016
hfinkel added a comment.
In https://reviews.llvm.org/D26420#603775, @danielcdh wrote:
> In https://reviews.llvm.org/D26420#601965, @hfinkel wrote:
>
> > In https://reviews.llvm.org/D26420#601874, @danielcdh wrote:
> >
> > > Thanks for the review.
> > >
> > > Any further comments about the proposed encoding before it's ready to land?
> >
> >
> > Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:
> >
> > low: < duplication factor > < copy id > : high
> >
> >
> > where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:
> >
> > bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )
> >
> >
> > In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?
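> >
> > For concreteness, a minimal sketch of the per-field code (just an illustration with a hypothetical encodeFibonacci helper, not something that needs to land in this form):
> >
> >   #include <cassert>
> >   #include <cstdint>
> >   #include <vector>
> >
> >   // Fibonacci (Zeckendorf) prefix code for N >= 1; returns the code bits
> >   // lowest-order first, so two fields can simply be concatenated before
> >   // the usual ULEB128 emission.
> >   std::vector<bool> encodeFibonacci(uint64_t N) {
> >     assert(N >= 1 && "Fibonacci coding is defined for positive integers");
> >     // F = 1, 2, 3, 5, 8, ... up to N.
> >     std::vector<uint64_t> F = {1, 2};
> >     while (F.back() + F[F.size() - 2] <= N)
> >       F.push_back(F.back() + F[F.size() - 2]);
> >     while (F.back() > N) // Drop Fibonacci numbers larger than N.
> >       F.pop_back();
> >     // Greedy Zeckendorf decomposition: no two adjacent bits are both set.
> >     std::vector<bool> Bits(F.size(), false);
> >     uint64_t Rem = N;
> >     for (size_t I = F.size(); I-- > 0;)
> >       if (F[I] <= Rem) {
> >         Bits[I] = true;
> >         Rem -= F[I];
> >       }
> >     // Terminating 1: every code ends in "11" and never contains "11"
> >     // elsewhere, so concatenated fields decode unambiguously.
> >     Bits.push_back(true);
> >     return Bits;
> >   }
> >
> > With the +1 offsets above, an empty field (value 0) costs exactly the two-bit code "11"; e.g. encodeFibonacci(1) = 11 and encodeFibonacci(5) = 00011.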
>
>
> Thanks for the suggestion. I spent some time studying the prefix-based encoding, which is a great idea. But when applied here with Fibonacci encoding, 3 bits can only represent numbers up to 3, 4 bits up to 5, and 7 bits up to 21. That does not seem to be enough, as the duplication factor usually needs 5+ bits, making it hard to fit multiple pieces into a 1-byte ULEB128.
>
> Another approach, as you mentioned, is to use the lower bits to encode which kind of info the discriminator carries, so that when only one kind is present we can encode it more efficiently into one byte. This is a great idea. I did some experiments to explore its potential: when assigning the duplication factor, I always discard the original discriminator and put the duplication factor in the lower 7 bits so that it always fits into one byte (a rough sketch of this packing follows the table). The debug_line size increases compared with the current patch are shown below:
>
> benchmark         current patch   DP-in-1-byte experiment
> 447.dealII        8.01%           6.16%
> 453.povray        6.50%           5.04%
> 482.sphinx3       7.54%           5.74%
> 470.lbm           0.00%           0.00%
> 444.namd          6.19%           4.89%
> 433.milc          23.12%          17.63%
> 450.soplex        2.66%           1.99%
> 445.gobmk         7.51%           5.02%
> 471.omnetpp       0.52%           0.36%
> 458.sjeng         10.31%          7.37%
> 473.astar         5.44%           4.26%
> 456.hmmer         9.74%           7.50%
> 401.bzip2         9.01%           6.33%
> 462.libquantum    10.79%          8.36%
> 403.gcc           2.74%           1.77%
> 464.h264ref       29.62%          21.14%
> 483.xalancbmk     1.42%           1.12%
> 429.mcf           9.55%           7.45%
> 400.perlbench     1.96%           1.21%
> mean              7.81%           5.84%
>
> The first column of data represents the current implementation. The second column is the experiment (fitting the duplication factor into one byte), which represents the upper bound for any encoding.
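>
> For reference, the experiment's assignment is roughly the following (hypothetical names, not the code in the patch). One ULEB128 byte carries 7 bits of payload, which is why restricting the duplication factor to the low 7 bits guarantees a one-byte discriminator:
>
>   #include <cstdint>
>
>   // Number of bytes a value takes when ULEB128-encoded in .debug_line:
>   // one byte per 7 bits of payload.
>   unsigned getULEB128Size(uint64_t Value) {
>     unsigned Size = 0;
>     do {
>       Value >>= 7;
>       ++Size;
>     } while (Value != 0);
>     return Size;
>   }
>
>   // Experimental assignment: drop the original discriminator and keep only
>   // the duplication factor, restricted to the low 7 bits (larger factors
>   // lose their high bits in this sketch).
>   unsigned encodeExperimentDiscriminator(unsigned DupFactor) {
>     return DupFactor & 0x7f; // getULEB128Size(result) == 1
>   }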
>
> From the data, it looks like the improvement is marginal, so I'm wondering: does it justify the added complexity compared with fixed-width encoding?
Thanks for doing those experiments! Regarding the second form, many of those changes are on the order of a few percent -- that seems worthwhile. Can you post the patch?
https://reviews.llvm.org/D26420
More information about the llvm-commits mailing list