<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 29, 2021 at 11:39 AM David Li via Phabricator <<a href="mailto:reviews@reviews.llvm.org">reviews@reviews.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">davidxl added inline comments.<br>

<br>

<br>

================<br>

Comment at: llvm/include/llvm/Support/FSAFDODiscriminator.h:19<br>

+<br>

+#define PASS_1_DIS_BIT_BEG 8<br>

+#define PASS_1_DIS_BIT_END 13<br>

----------------<br>

snehasish wrote:<br>

> davidxl wrote:<br>

> > snehasish wrote:<br>

> > > A few questions about the discriminator bits:<br>

> > > <br>

> > > * Depending on the transformation in the target pass the requirement of bits may be different, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?<br>

> > > <br>

> > > * How do we alert authors of new target optimizations (or code refactoring) additional discriminator bits are needed to disambiguate? Would a late stage analysis only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?<br>

> > > <br>

> > > * If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme, max 5 bits where by enumeration you could identify 31 clones? Have you considered other coding schemes?<br>

> > > A few questions about the discriminator bits:<br>

> > > <br>

> > > * Depending on the transformation in the target pass the requirement of bits may be different, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?<br>

> > <br>

> > I assume most of the transformations produce few clones except for unrolling (which depends on unroll factor).<br>

> > <br>

> > > <br>

> > > * How do we alert authors of new target optimizations (or code refactoring) additional discriminator bits are needed to disambiguate? Would a late stage analysis only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?<br>

> > <br>

> > The problem with this is that the authors won't have any means to change anything.  Rong and I discussed about this. Longer term when this becomes and issue, increasing the size of the discriminator container type will be the way to go.<br>

> > <br>

> > > <br>

> > > * If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme, max 5 bits where by enumeration you could identify 31 clones? Have you considered other coding schemes?<br>

> > <br>

> > The biggest advantage of fixed width is simplicity,  I think.<br>

> > <br>

> > <br>

> > The biggest advantage of fixed width is simplicity, I think.<br>

> Not sure if I made the point clearly, so repeating - I'm trying to distinguish between using 5 values (one per bit) in a pass vs 2^5-1 values which can be enumerated by 5 bits. For example, with a loop unroll factor of >5, this coding scheme will not be able to assign unique ids for all clones. Is this understanding correct?<br>

> <br>

> I think redistributing the bits based on the pass transformations should be sufficient to avoid a more complex coding scheme. I agree using fixed width bits makes for a simple implementation and we should prefer this unless data shows otherwise.<br>

> <br>

> <br>

Rong can answer this question more thoroughly. By just looking at line 373 of FlowSensitiveSampleProfile.cpp file, it seems it is actually sequentially increasing FSD, instead of bumping the bit?<br></blockquote><div>I choose to have a fixed number of bits for each target pass mainly for simplicity. You are right that if the loop unrolls by a factor >32, the clones after 32th will not get the new discriminator. This does happen in the benchmark I used -- but the number occurrences are small. </div><div>I could use a more elaborate coding scheme,  but the coding itself will use some bits. </div><div>FYI, I did try to use more bits for each pass, but I did not see much difference in term of the performance (to the benchmarks I used)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

CHANGES SINCE LAST ACTION<br>

  <a href="https://reviews.llvm.org/D99123/new/" rel="noreferrer" target="_blank">https://reviews.llvm.org/D99123/new/</a><br>

<br>

<a href="https://reviews.llvm.org/D99123" rel="noreferrer" target="_blank">https://reviews.llvm.org/D99123</a><br>

<br>

</blockquote></div></div>