[PATCH] D26420: Encode duplication factor from loop vectorization and loop unrolling to discriminator.

Dehao Chen via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 21 14:48:18 PST 2016


danielcdh added a comment.

Thanks for the review.

Any further comments on the proposed encoding before it's ready to land?

Thanks,
Dehao



================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:836
+    if (DIL && EncodeDuplicationInDiscriminators)
+      B.SetCurrentDebugLocation(DIL->cloneWithDuplicationFactor(UF * VF));
+    else
----------------
hfinkel wrote:
> danielcdh wrote:
> > hfinkel wrote:
> > > I apologize for rehashing this, but now I'm confused again. We have two situations:
> > > 
> > >  1. Loop is unrolled (or code is otherwise duplicated). In this case, the discriminator value must be different (so that the relevant counts are summed)
> > >  2. Instruction is vectorized. In this case we need a discriminator value with a duplication factor (so that the counts are multiplied by the duplication factor because each vector instruction represents DF scalar instructions).
> > > 
> > > And both situations can obviously be combined. It seems like, in general, we're trying to take a shortcut here: instead of giving each instruction from an unrolled loop a different discriminator (so that all of the relevant counts will be summed), we're giving them all the same discriminator with a duplication factor. This will work in some cases, but not if some of the loop iterations (or instructions therein) are executed conditionally. I don't see why this is worthwhile. Can you please explain?
> > You are right. For an unrolled loop, the most accurate way would be to assign a distinct discriminator to each cloned body and use "sum" to compute the source count. We are taking a short-cut here by assigning all copies one discriminator (with a duplication factor) and using max*dup_factor to compute the source count. This works fine when all cloned copies have uniform behavior and thus similar execution counts; otherwise it will overcount the source.
> > 
> > So the con of using a duplication factor for an unrolled loop is less accuracy. The pro is that it saves on line-table size.
> > 
> > Maybe a better solution would be: if there is control flow inside the unrolled loop, use a distinct discriminator for each clone; otherwise use a duplication factor.
> > 
> > In this patch, I only added the logic for the duplication factor. I will add the logic to "assign distinct discriminators to clones" in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?
> > So the con of using a duplication factor for an unrolled loop is less accuracy. The pro is that it saves on line-table size.
> 
> Can you explain again why this saves on line-table size? 
> 
> > In this patch, I only added the logic for the duplication factor. I will add the logic to "assign distinct discriminators to clones" in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?
> 
> That's fine by me.
> 
For the following case:

#1 for (i = 0; i < 4; i++)
#2   a[i] = b[i];

If we use a duplication factor, the line table for the expanded code will be:

a[0] = b[0]; // line 2, discriminator 0x400
a[1] = b[1]; // line 2, discriminator 0x400
a[2] = b[2]; // line 2, discriminator 0x400
a[3] = b[3]; // line 2, discriminator 0x400

They all share the same location, so there is no new line-table entry for the cloned code.

But with distinct discriminators, the expanded code will be:

a[0] = b[0]; // line 2, discriminator 0x10000
a[1] = b[1]; // line 2, discriminator 0x20000
a[2] = b[2]; // line 2, discriminator 0x30000
a[3] = b[3]; // line 2, discriminator 0x40000

There are 2 issues:
1. Each copy has a distinct location, so we need 4 entries in the debug line table.
2. The encoding for a distinct discriminator is longer than that for a duplication factor.
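
To make the trade-off concrete, here is an illustrative sketch (not part of the patch; the per-copy sample counts are made up) of how a profile reader could recover the source count for line 2 under each scheme:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical sample counts collected on the four expanded copies of
  // "a[i] = b[i];".
  std::vector<uint64_t> Samples = {100, 98, 101, 99};

  // Scheme 1 (this patch): all copies share one discriminator that carries a
  // duplication factor of 4. All four instructions map to the same
  // (line, discriminator), so the reader takes the max of their counts and
  // multiplies it by the duplication factor.
  uint64_t DupFactor = 4;
  uint64_t Max = *std::max_element(Samples.begin(), Samples.end());
  std::cout << "dup-factor estimate:    " << Max * DupFactor << "\n"; // 404

  // Scheme 2: each copy gets a distinct discriminator, so the reader sums
  // the per-discriminator counts.
  uint64_t Sum = 0;
  for (uint64_t C : Samples)
    Sum += C;
  std::cout << "distinct-disc estimate: " << Sum << "\n"; // 398
  return 0;
}

With uniform copies the two estimates are close; if some copies execute conditionally, max*dup_factor overcounts, which is the accuracy trade-off described above.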


https://reviews.llvm.org/D26420




