[PATCH] D99123: [SampleFDO] Flow Sensitive Sample FDO (FSAFDO)

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 23 13:19:42 PDT 2021


davidxl added a comment.

Regarding the staging of the patch, here is the  suggestion:

1. Replace DF with FSDescriminator.

In this patch, FSdescriminator needs to be added/inserted in the last phase (with ranges of discriminator sections reserved but not used), and there will be no new sample loading passes added. The functionality of current DF should be mostly preserved.

2. Add other AddDiscriminator passes

3. Add new sampleProfile loader passes one by one.

For 1),  the profile reading offline tool should be able to handle two versions of formats to allow co-existence of profiles.



================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1178
+  if (EnableFSDiscriminator && !FSNoFinalDiscrim)
+    addPass(createAddFSDiscriminatorsPass(PASS_LAST_DIS_BIT_BEG,
+                                          PASS_LAST_DIS_BIT_END));
----------------
xur wrote:
> davidxl wrote:
> > Is it necessary for this pass? BranchFolding does not create new clones, but merge them, so discriminator subsections can be reused (even though after the branch folding, some of the discriminator in that section gets removed)?
> I had this is mainly for the tail duplication in block placement pass. Conceptually it for all the new clones in the pipeline.
> I have the statistics of the counter for different rounds of discriminators. We do have plenty of this.
> 
> That said, more of the performance are from the round before block placement. If I disable this, the performance change is within the noise range (for the benchmark I used).
Can probably make use this discriminator range for other purposes (target optimizations)


================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1478
+    std::string FSFile = getFSProfileFile();
+    if (!FSFile.empty())
+      addPass(createFSProfileLoaderPass(FSFile, PASS_2_DIS_BIT_BEG,
----------------
xur wrote:
> davidxl wrote:
> > I can see the importance of adding pass to add discriminator after MBP due to tail dup in MBP, but how important (performance wise) it is to load sample profile again for branch folding pass?
> This is from my experiments. The default is off anyway.  I will probably remove this.
If it is not important for performance, suggest making it  (the sample profile loading part) off by default to avoid unnecessary compile time increase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99123/new/

https://reviews.llvm.org/D99123



More information about the llvm-commits mailing list