[PATCH] D26420: Encode duplication factor from loop vectorization and loop unrolling to discriminator.
Hal Finkel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 1 17:21:43 PST 2016
hfinkel added a comment.
In https://reviews.llvm.org/D26420#603775, @danielcdh wrote:
> In https://reviews.llvm.org/D26420#601965, @hfinkel wrote:
>
> > In https://reviews.llvm.org/D26420#601874, @danielcdh wrote:
> >
> > > Thanks for the review.
> > >
> > > Any further comments about the proposed encoding before it's ready to land?
> >
> >
> > Could we be smarter about the encoding so that the distinct-copies case did not take up so much space? It seems like the design right now gives a fixed number of low bits to the duplication factor. Could we use a variable number of bits? For example, because of the underlying LEB128 encoding, we could use some of the lowest-order bits as indicators, or better, use a variable-length code:
> >
> > low: < duplication factor > < copy id > : high
> >
> >
> > where each factor is a variable-length code (e.g. a prefix code: https://en.wikipedia.org/wiki/Prefix_code). For example, the Fibonacci code (https://en.wikipedia.org/wiki/Fibonacci_coding) is a well-known code which is good when the encoded numbers are likely to be small. In short, we might encode:
> >
> > bitreverse( Fibonacci(copy-id + 1) | Fibonacci(dup-factor + 1) )
> >
> >
> > In this case, if either field is empty, then we waste only 2 bits encoding that field. Small numbers take only a small number of bits. What do you think?
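> >
> > For concreteness, a minimal sketch of the per-field code (just an illustration with a hypothetical encodeFibonacci helper, not something that needs to land in this form):
> >
> >   #include <cassert>
> >   #include <cstdint>
> >   #include <vector>
> >
> >   // Fibonacci (Zeckendorf) prefix code for N >= 1; returns the code bits
> >   // lowest-order first, so two fields can simply be concatenated before
> >   // the usual ULEB128 emission.
> >   std::vector<bool> encodeFibonacci(uint64_t N) {
> >     assert(N >= 1 && "Fibonacci coding is defined for positive integers");
> >     // F = 1, 2, 3, 5, 8, ... up to N.
> >     std::vector<uint64_t> F = {1, 2};
> >     while (F.back() + F[F.size() - 2] <= N)
> >       F.push_back(F.back() + F[F.size() - 2]);
> >     while (F.back() > N) // Drop Fibonacci numbers larger than N.
> >       F.pop_back();
> >     // Greedy Zeckendorf decomposition: no two adjacent bits are both set.
> >     std::vector<bool> Bits(F.size(), false);
> >     uint64_t Rem = N;
> >     for (size_t I = F.size(); I-- > 0;)
> >       if (F[I] <= Rem) {
> >         Bits[I] = true;
> >         Rem -= F[I];
> >       }
> >     // Terminating 1: every code ends in "11" and never contains "11"
> >     // elsewhere, so concatenated fields decode unambiguously.
> >     Bits.push_back(true);
> >     return Bits;
> >   }
> >
> > With the +1 offsets above, an empty field (value 0) costs exactly the two-bit code "11"; e.g. encodeFibonacci(1) = 11 and encodeFibonacci(5) = 00011.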
>
>
> Thanks for the suggestion. I spent some time studying the prefix-based encoding, which is a great idea. But when applied here with Fibonacci encoding, 3 bits can only represent numbers up to 3, 4 bits up to 5, and 7 bits up to 21. That does not seem to be enough, as the duplication factor usually needs 5+ bits, making it hard to fit multiple pieces into a 1-byte ULEB128.
>
> Another approach, as you mentioned, is to use the lower bits to encode which kind of info the discriminator carries, so that when only one kind is present we can encode it more efficiently into one byte. This is a great idea. I did some experiments to explore its potential: when assigning the duplication factor, I always discard the original discriminator and put the duplication factor in the lower 7 bits so that it always fits into one byte (a rough sketch of this packing follows the table). The debug_line size increases compared with the current patch are shown below:
>
> benchmark         current patch   DP-in-1-byte experiment
> 447.dealII        8.01%           6.16%
> 453.povray        6.50%           5.04%
> 482.sphinx3       7.54%           5.74%
> 470.lbm           0.00%           0.00%
> 444.namd          6.19%           4.89%
> 433.milc          23.12%          17.63%
> 450.soplex        2.66%           1.99%
> 445.gobmk         7.51%           5.02%
> 471.omnetpp       0.52%           0.36%
> 458.sjeng         10.31%          7.37%
> 473.astar         5.44%           4.26%
> 456.hmmer         9.74%           7.50%
> 401.bzip2         9.01%           6.33%
> 462.libquantum    10.79%          8.36%
> 403.gcc           2.74%           1.77%
> 464.h264ref       29.62%          21.14%
> 483.xalancbmk     1.42%           1.12%
> 429.mcf           9.55%           7.45%
> 400.perlbench     1.96%           1.21%
> mean              7.81%           5.84%
>
> The first column of data represents the current implementation. The second column is the experiment (fitting the duplication factor into one byte), which represents the upper bound for any encoding.
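>
> For reference, the experiment's assignment is roughly the following (hypothetical names, not the code in the patch). One ULEB128 byte carries 7 bits of payload, which is why restricting the duplication factor to the low 7 bits guarantees a one-byte discriminator:
>
>   #include <cstdint>
>
>   // Number of bytes a value takes when ULEB128-encoded in .debug_line:
>   // one byte per 7 bits of payload.
>   unsigned getULEB128Size(uint64_t Value) {
>     unsigned Size = 0;
>     do {
>       Value >>= 7;
>       ++Size;
>     } while (Value != 0);
>     return Size;
>   }
>
>   // Experimental assignment: drop the original discriminator and keep only
>   // the duplication factor, restricted to the low 7 bits (larger factors
>   // lose their high bits in this sketch).
>   unsigned encodeExperimentDiscriminator(unsigned DupFactor) {
>     return DupFactor & 0x7f; // getULEB128Size(result) == 1
>   }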
>
> From the data, it looks like the improvement is marginal, so I'm wondering: does it justify the added complexity compared with fixed-width encoding?
Thanks for doing those experiments! Regarding the second form, many of those changes are on the order of a few percent -- that seems worthwhile. Can you post the patch?
https://reviews.llvm.org/D26420
More information about the llvm-commits mailing list