[PATCH] D99123: [SampleFDO] Flow Sensitive Sample FDO (FSAFDO)

Snehasish Kumar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 29 11:10:34 PDT 2021


snehasish added inline comments.


================
Comment at: llvm/include/llvm/Support/FSAFDODiscriminator.h:19
+
+#define PASS_1_DIS_BIT_BEG 8
+#define PASS_1_DIS_BIT_END 13
----------------
davidxl wrote:
> snehasish wrote:
> > A few questions about the discriminator bits:
> > 
> > * Depending on the transformation in the target pass, the number of bits required may differ, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?
> > 
> > * How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?
> > 
> > * If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: at most 5 values from 5 bits, whereas by enumeration you could identify 31 clones. Have you considered other coding schemes?
> > A few questions about the discriminator bits:
> > 
> > * Depending on the transformation in the target pass, the number of bits required may differ, i.e. 5 bits for each may be too many or too few. Do you have any data to share about how many bits are used by each?
> 
> I assume most of the transformations produce few clones except for unrolling (which depends on unroll factor).
> 
> > 
> > * How do we alert authors of new target optimizations (or code refactoring) that additional discriminator bits are needed to disambiguate? Would a late-stage analysis-only pass which enumerates different instructions with the same debug+discriminator info be useful to commit?
> 
> The problem with this is that the authors won't have any means to change anything.  Rong and I discussed this. Longer term, when this becomes an issue, increasing the size of the discriminator container type will be the way to go.
> 
> > 
> > * If I understand correctly, we bump the bit for each level of cloning. This seems to be a less efficient coding scheme: at most 5 values from 5 bits, whereas by enumeration you could identify 31 clones. Have you considered other coding schemes?
> 
> The biggest advantage of fixed width is simplicity,  I think.
> 
Not sure if I made the point clearly, so repeating - I'm trying to distinguish between using 5 values (one per bit) in a pass vs 2^5-1 values which can be enumerated by 5 bits. For example, with a loop unroll factor of >5, this coding scheme will not be able to assign unique ids for all clones. Is this understanding correct?

I think redistributing the bits based on the pass transformations should be sufficient to avoid a more complex coding scheme. I agree using fixed width bits makes for a simple implementation and we should prefer this unless data shows otherwise.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99123/new/

https://reviews.llvm.org/D99123
