[PATCH] Propagation of profile samples through the CFG.

Fri Jan 10 13:38:07 PST 2014

On Fri, Jan 10, 2014 at 3:19 PM, Chandler Carruth <chandlerc at gmail.com> wrote:

>> > ================
>> > Comment at: lib/Transforms/Scalar/SampleProfile.cpp:578
>> > @@ +577,3 @@
>> > +      // members of its equivalence set.
>> > +      uint32_t &bb1_weight = BlockWeights[BB1];
>> > +      uint32_t &bb2_weight = BlockWeights[BB2];
>> > ----------------
>> > Chandler Carruth wrote:
>> >> bb1_weight = std::max(bb1_weight, bb2_weight);
>> > Naming convention here.
>>
>> OK, though I'm pretty confused. Your line_iterator patch uses this
>> convention. And, in fact, I've seen _ used as separator in several
>> places. Do we have different conventions for different parts of the
>> compiler? The coding guidelines don't seem to say anything about
>> having different conventions.
>
>
> The only exception is when the interface is mimicing a standard library
> interface. Then we try to follow the naming conventions of the standard
> library. So there are several iterators or similar standard-library-like
> interfaces that use a different naming convention. For general LLVM code,
> that doesn't apply.

Ah, OK. Thanks.

>
> Sorry for that confusion.
>
>>
>>
>> >
>> > ================
>> > Comment at: lib/Transforms/Scalar/SampleProfile.cpp:940
>> > @@ +939,3 @@
>> > +///
>> > +///    - If all the edges are known except one, and the weight of the
>> > +///      block is already known, the weight of the unknown edge will
>> > ----------------
>> > Incoming edges or outgoing edges? Or both?
>>
>> Incoming or outgoing. Clarified.
>>
>> > ================
>> > Comment at: lib/Transforms/Scalar/SampleProfile.cpp:937-938
>> > @@ +936,4 @@
>> > +///
>> > +///    - If B has a single predecessor/successor, then the weight
>> > +///      of that edge is the weight of the block.
>> > +///
>> > ----------------
>> > How do you handle contradictions?
>>
>> From user annotations? I would give a warning, if I could. Right now,
>> it's a silent override.
>
>
> No, contradiction in the samples. You can have random samples set up
> situations that are inexplicable. Such as zero count errors on loop headers.

Well, yes. But that's why we use the equivalence classes, every block
in the same equivalence class gets the same weight (the maximum of all
weights). Given the imprecise nature of the input, there is not a lot
of strict validation we can do.

Blocks in the same class translates to blocks that must execute the
same number of times, essentially. So, assigning the same weight to
all of them is reasonable. You could warn about this, but it would be
noisy.

>> > ================
>> > Comment at: lib/Transforms/Scalar/SampleProfile.cpp:951-953
>> > @@ +950,5 @@
>> > +///
>> > +/// Since this propagation is not guaranteed to finalize for every CFG,
>> > we
>> > +/// only allow it to proceed for a limited number of iterations
>> > (controlled
>> > +/// by -sample-profile-max-propagate-iterations).
>> > +///
>> > ----------------
>> > This is a really frustrating property. Can it be easily avoided?
>>
>> Not without changing the algorithm itself.
>>
>> > The first of the above steps is obviously deterministic.
>> >
>> > For the second and third, I feel like the following is guaranteed to
>> > make forward progress and terminate:
>> > 0) Add all blocks to a list to process.
>> > 1) Take the first block out of the list
>> > 2) if we can add a weight to an edge connected to this block (including
>> > a self referential edge) do so and add the other block the edge connects to
>> > to the list (if not already in the list)
>> > 3) repeat #2 until we cannot add a weight to any edge connected to this
>> > block
>> > 4) if the list is not empty, go to #1
>> > We only add a block to the list if we add a weight to an edge, so we can
>> > at most visit V+E blocks.
>>
>> Yes, I thought about a similar scheme (in fact, mimicking standard
>> value propagation). But this changes the algorithm. Which is something
>> I was trying to avoid at first.
>
>
> OK, definitely leave a FIXME about this. LLVM works very hard to avoid this
> style of "iterate enough times" algorithms, and it feels like the closed
> form algorithm accomplishes the same fundamental goal so I don't expect this
> to be a big source of different results.

Done.

Thanks. Diego.