[PATCH] D99123: [SampleFDO] Flow Sensitive Sample FDO (FSAFDO)

Snehasish Kumar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 30 15:15:15 PDT 2021


snehasish added inline comments.


================
Comment at: llvm/include/llvm/Support/FSAFDODiscriminator.h:19
+
+#define PASS_1_DIS_BIT_BEG 8
+#define PASS_1_DIS_BIT_END 13
----------------
davidxl wrote:
> snehasish wrote:
> > davidxl wrote:
> > > snehasish wrote:
> > > > A few questions about the discriminator bits:
> > > > 
> > > > * Depending on the transformation in the target pass the requirement of bits may be different, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?
> > > 
> > > I assume most of the transformations produce few clones, except for unrolling (which depends on the unroll factor).
> > > 
> > > > 
> > > > * How do we alert authors of new target optimizations (or code refactorings) that additional discriminator bits are needed to disambiguate? Would a late-stage analysis-only pass which enumerates distinct instructions with the same debug+discriminator info be useful to commit?
> > > 
> > > The problem with this is that the authors won't have any means to change anything.  Rong and I discussed this. Longer term, when this becomes an issue, increasing the size of the discriminator container type will be the way to go.
> > > 
> > > > 
> > > > * If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme, max 5 bits where by enumeration you could identify 31 clones? Have you considered other coding schemes?
> > > 
> > > The biggest advantage of fixed width is simplicity,  I think.
> > > 
> > Not sure if I made the point clearly, so repeating - I'm trying to distinguish between using 5 values (one per bit) in a pass vs the 2^5 - 1 = 31 values that can be enumerated with 5 bits. For example, with a loop unroll factor of >5, this coding scheme would not be able to assign unique ids to all clones. Is this understanding correct?
> > 
> > I think redistributing the bits based on the pass transformations should be sufficient to avoid a more complex coding scheme. I agree using fixed width bits makes for a simple implementation and we should prefer this unless data shows otherwise.
> > 
> > 
> Rong can answer this question more thoroughly. Just looking at line 373 of the FlowSensitiveSampleProfile.cpp file, it seems it is actually sequentially increasing the FSD instead of bumping a bit?
Thanks for the pointer, and Rong clarified the usage. My understanding was incorrect and the encoding space is not as limited as I thought. Since there is no measurable performance difference, the current approach seems simplest.
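
For readers following the thread, the encoding scheme that was clarified above can be sketched in a few lines of C++: each pass owns a fixed-width slot in the discriminator, and clone IDs within that slot are enumerated sequentially rather than one bit per clone. The names below, and the assumption that the BEG/END bounds are inclusive, are illustrative only and not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Bit range for one flow-sensitive pass, mirroring the PASS_1_DIS_BIT_BEG/END
// macros in the patch. Treating the bounds as inclusive is an assumption made
// for this sketch, not something confirmed by the review.
constexpr uint32_t PassBitBeg = 8;
constexpr uint32_t PassBitEnd = 13;
constexpr uint32_t PassBitWidth = PassBitEnd - PassBitBeg + 1;
constexpr uint32_t PassMask = ((1u << PassBitWidth) - 1u) << PassBitBeg;

// Read the clone ID a pass stored in its discriminator slot.
uint32_t getPassCloneId(uint32_t Discriminator) {
  return (Discriminator & PassMask) >> PassBitBeg;
}

// Store a sequentially enumerated clone ID in the slot. As clarified in the
// thread, IDs increase sequentially (so a width-w slot distinguishes
// 2^w - 1 clones) rather than dedicating one bit per clone.
uint32_t setPassCloneId(uint32_t Discriminator, uint32_t CloneId) {
  assert(CloneId < (1u << PassBitWidth) && "clone id overflows the slot");
  return (Discriminator & ~PassMask) | (CloneId << PassBitBeg);
}
```

With sequential enumeration, a 6-bit slot covers unroll factors well beyond 5, which is why the fixed-width scheme does not run out of IDs as quickly as the earlier question suggested.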


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99123/new/

https://reviews.llvm.org/D99123


