[llvm-dev] (RFC) Encoding code duplication factor in discriminator

Dehao Chen via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 27 13:11:50 PDT 2016


The impact to debug_line is actually not small. I only implemented the part
1 (encoding duplication factor) for loop unrolling and loop vectorization.
The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++
benchmarks:

433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc 2.67%
429.mcf 9.54%
445.gobmk 7.40%
456.hmmer 9.79%
458.sjeng 9.98%
462.libquantum 10.90%
464.h264ref 30.21%
471.omnetpp 0.52%
473.astar 5.67%
483.xalancbmk 1.46%
mean 7.86%
Dehao

On Thu, Oct 27, 2016 at 11:55 AM, Xinliang David Li <davidxl at google.com>
wrote:

> Do you have an estimate of the debug_line size increase? I guess it will
> be small.
>
> David
>
> On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote:
>
>> Motivation:
>> Many optimizations duplicate code. E.g. loop unroller duplicates the loop
>> body, GVN duplicates computation, etc. The duplicated code will share the
>> same debug info with the original code. For SamplePGO, the debug info is
>> used to present the profile. Code duplication will affect profile accuracy.
>> Taking loop unrolling for example:
>>
>> #1 foo();
>> #2 for (i = 0; i < N; i++) {
>> #3   bar();
>> #4 }
>>
>> If N is 8 during runtime, a reasonable profile will look like:
>>
>> #1: 10
>> #3: 80
>>
>> But if the compiler unrolls the loop by a factor of 4, the callsite to
>> bar() is duplicated 4 times and the profile will look like:
>>
>> #1: 10
>> #3: 20
>>
>> The sample count for #3 is 20 because all 4 callsites to bar() are
>> sampled 20 times each, and they shared the same debug loc (#3) so that 20
>> will be attributed to #3 (If one debugloc is mapped by multiple
>> instructions, the max sample count of these instructions is used as
>> debugloc's sample count).
>>
>> When loading this profile into compiler, it will think the loop trip
>> count is 2 instead of 8.
>>
>> Proposal:
>> When compiler duplicates code, it encodes the duplication info in the
>> debug info. As the duplication is not interesting to debugger, I propose to
>> encode this as part of the discriminator.
>>
>> There are 2 types of code duplication:
>>
>> 1. duplicated code are guaranteed to have the same execution count (e.g.
>> loop unroll and loop vectorize). We can record the duplication factor, for
>> the above example "4" is recorded in the discriminator.
>> 2. duplicated code that may have different execution count (e.g. loop
>> peel and gvn). For a same debugloc, a unique number is assigned to each
>> copy and encoded in the discriminator.
>>
>> Assume that the discriminator is uint32. The traditional discriminator is
>> less than 256, let's take 8 bit for it. For duplication factor (type 1
>> duplication), we assume the maximum unroll_factor * vectorize_factor is
>> less than 256, thus 8 bit for it. For unique number(type 2 duplication), we
>> assume code is at most duplicated 32 times, thus 5 bit for it. Overall, we
>> still have 11 free bits left in the discriminator encoding.
>>
>> Let's take the original source as an example, after loop unrolling and
>> peeling, the code may looks like:
>>
>> for (i = 0; i < N & 3; i+= 4) {
>>   foo();  // discriminator: 0x40
>>   foo();  // discriminator: 0x40
>>   foo();  // discriminator: 0x40
>>   foo();  // discriminator: 0x40
>> }
>> if (i++ < N) {
>>   foo();   // discriminator: 0x100
>>   if (i++ < N) {
>>     foo(); // discriminator: 0x200
>>     if (i++ < N) {
>>       foo();  // discriminator: 0x300
>>     }
>>   }
>> }
>>
>> The cost of this change would be increased debug_line size because: 1. we
>> are recording more discriminators 2. discriminators become larger and will
>> take more ULEB128 encoding.
>>
>> The benefit is that the sample pgo profile can accurately represent the
>> code execution frequency. And we do not need to introduce new building
>> blocks to debug info.
>>
>> Comments?
>>
>> Thanks,
>> Dehao
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161027/75e83e9c/attachment.html>


More information about the llvm-dev mailing list