[PATCH] D26420: Encode duplication factor from loop vectorization and loop unrolling to discriminator.

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 21 14:40:54 PST 2016


hfinkel added inline comments.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:836
+    if (DIL && EncodeDuplicationInDiscriminators)
+      B.SetCurrentDebugLocation(DIL->cloneWithDuplicationFactor(UF * VF));
+    else
----------------
danielcdh wrote:
> hfinkel wrote:
> > I apologize for rehashing this, but now I'm confused again. We have two situations:
> > 
> >  1. Loop is unrolled (or code is otherwise duplicated). In this case, the discriminator value must be different (so that the relevant counts are summed)
> >  2. Instruction is vectorized. In this case we need a discriminator value with a duplication factor (so that the counts are multiplied by the duplication factor because each vector instruction represents DF scalar instructions).
> > 
> > And both situations can obviously be combined. It seems like, in general, we're trying to take a shortcut here: instead of giving each instruction from an unrolled loop a different discriminator (so that all of the relevant counts will be summed), we're giving them all the same discriminator with a duplication factor. This will work in some cases, but not if some of the loop iterations (or instructions therein) are executed conditionally. I don't see why this is worthwhile. Can you please explain?
> You are right. For unrolled loop, the most accurate way would be assign a distinct discriminator for each cloned body, and use "sum" to compute the source count. We are taking a short-cut here to assign all copies with one discriminator (with duplication factor), and use max*dup_factor to compute the source count. This works fine when all cloned copy has uniformed behavior and thus similar execution count, otherwise will overcount the source.
> 
> So the cons of using duplication factor for unrolled loop is less accuracy. The pros is that it saves on line table size.
> 
> Maybe a better solution would be: if there is control flow inside the unrolled loop, then use distinct discriminator for each clone, otherwise use duplication factor.
> 
> In this patch, I only added the logic for the duplication factor. I will have another patch to add the logic to "add distinct discriminator for clones" in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?
> So the cons of using duplication factor for unrolled loop is less accuracy. The pros is that it saves on line table size.

Can you explain again why this saves on line-table size? 

> In this patch, I only added the logic for the duplication factor. I will have another patch to add the logic to "add distinct discriminator for clones" in another patch. How about I add a FIXME here and address loop-unroll-with-control-flow in that patch?

That's fine by me.



https://reviews.llvm.org/D26420





More information about the llvm-commits mailing list