[PATCH] D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context

Thu Jan 14 15:15:14 PST 2021

wmi added inline comments.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.h:108-110
+          // Populate the non-common-suffix part of the adjacent sequence.
+          std::copy(BeginIter + Right + 1, BeginIter + Left + I + 1,
+                    BeginIter + End);
----------------
wmi wrote:
> wlei wrote:
> > wmi wrote:
> > > wlei wrote:
> > > > wmi wrote:
> > > > > Could you give an example of what the sequence looks like after the population?
> > > > Sure, an example is added in the code comments
> > > Thanks for the example. I can see where the redundant comparisons are, but I haven't understood how the population (std::copy) lets the algorithm skip windows. I don't see the std::copy changing anything for the example below. Could you clarify?
> > > 
> > > Similarly, I don't see how the duplicated string is removed from the Context sequence when a duplication is found above (i.e. when Right - Left == I is true). Could you also clarify that?
> > Yeah, our original design used a new vector to store the compressed result; we then switched to an in-place algorithm to reduce the memcpy, which made readability worse.
> > I rewrote the code comments and added another example. Please check whether this version is clear.
> > 
> Thanks, now I understand it better. However, I am surprised that it doesn't seem to remove the redundant comparisons, which I thought you were trying to remove.
> 
> Suppose you have the following sequence:
> a b c d b c d
> 
> Consider I==3.
> In the first iteration, abc and dbc are compared, so Left==0 and Right==2. The algorithm finds that the first non-common place is Left==0. Then it updates Right to be Left + I = 3 at the end of the iteration. Note that in this iteration it has compared the common parts, so it already knows the 2th/5th char in the string are the same --> 'b', and 3th/6th chars in the string are the same --> 'd'.
> 
> In the next iteration, the algorithm executes the comparison loop again:
>        uint32_t Left = Right;
>        while (Right - Left < I && Context[Left] == Context[Left + I]) {
>           // Find the longest suffix inside the window. When stops, Left points
>           // at the diverging point in the current sequence.
>           Left--;
>        }
> With Right==3, it will compare 3 chars before it can confirm that a duplication is found, so it will compare 2th char and 5th char, 4th char and 6th char again which it already know they are the same in the first iteration.
> 
> Do I understand it correctly?
> 
> 
Sorry: some typos:

> 2th/5th char in the string are the same --> 'b', and 3th/6th chars in the string are the same --> 'd'.

'd' ==> 'c'

> so it will compare 2th char and 5th char, 4th char and 6th char again which it already know they are the same in the first iteration.

4th char and 6th char again ==> 3th char and 6th char again
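
To make the repeated work concrete, here is a minimal standalone sketch (not the llvm-profgen code) that traces which index pairs the quoted comparison loop touches for the example above. The names scanWindow, countRepeated and PairList are hypothetical, indices are 0-based and signed for simplicity, and the only step modeled between the two scans is the Right = Left + I update described above. The caller is assumed to guarantee that Right + I stays inside the sequence.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Each entry is one comparison: (index in current window, index in next window).
using PairList = std::vector<std::pair<int64_t, int64_t>>;

// Hypothetical helper mirroring the quoted loop: starting at Right, walk Left
// backwards while Context[Left] == Context[Left + I], for at most I elements.
// Every comparison performed is appended to Compared. Returns the final Left.
// Assumes Right + I < Context.size().
int64_t scanWindow(const std::string &Context, int64_t Right, int64_t I,
                   PairList &Compared) {
  int64_t Left = Right;
  while (Left >= 0 && Right - Left < I) {
    Compared.emplace_back(Left, Left + I);
    if (Context[Left] != Context[Left + I])
      break; // Diverging point found; Left stays on it, as in the quoted loop.
    --Left;
  }
  return Left;
}

// Counts the comparisons in A that are performed again in B.
std::size_t countRepeated(const PairList &A, const PairList &B) {
  std::size_t N = 0;
  for (const auto &P : A)
    if (std::find(B.begin(), B.end(), P) != B.end())
      ++N;
  return N;
}
```

For Context = "abcdbcd" and I = 3, calling scanWindow with Right = 2 and then with Right = 3 yields two traces that share the 0-based pairs (1, 4) and (2, 5), which are the 2nd/5th and 3rd/6th chars discussed above.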

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93556/new/

https://reviews.llvm.org/D93556