[PATCH] D125287: [SLP] Improve root steering by building actual trees instead of calling the look-ahead heuristic

Tue May 10 16:38:10 PDT 2022

vporpo added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2028-2036
+      SmallVector<Value *, 2> Roots(
+          {Candidates[I].first, Candidates[I].second});
+      buildTree(Roots, /*UserIgnoreList=*/None,
+                RootLookAheadMaxSize.getValue());
+      if (isTreeTinyAndNotFullyVectorizable())
+        continue;
+      reorderTopToBottom();
----------------
ABataev wrote:
> vporpo wrote:
> > vdmitrie wrote:
> > > ABataev wrote:
> > > > ABataev wrote:
> > > > > vdmitrie wrote:
> > > > > > vporpo wrote:
> > > > > > > ABataev wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > vdmitrie wrote:
> > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > vdmitrie wrote:
> > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > vdmitrie wrote:
> > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > vdmitrie wrote:
> > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > > > > > > > > > > > vdmitrie wrote:
> > > > > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > > > > I'm afraid of increasing compile time. All this stuff includes scheduling, which may take lots of time for large basic blocks.
> > > > > > > > > > > > > > > > > > > > > > > Yep, I agree. this will be more expensive for compile time.
> > > > > > > > > > > > > > > > > > > > > > > What about combining both worlds? I mean first try to use lookahead heuristics to get the single best. And if we can't narrow down to just one pair only then switch into probing via building trees.
> > > > > > > > > > > > > > > > > > > > > > > I believe it will not happen too frequently. We can also increase lookahead depth to make it even less frequent when we need to build vectorizable tree.
> > > > > > > > > > > > > > > > > > > > > > What if we set a flag to disable scheduling for these types of fast tree estimations? 
> > > > > > > > > > > > > > > > > > > > > There is a problem with this fix that it tries to avoid/mask the problem, not fix it. The fact that LookAhead.getScoreAtLevelRec does not work here means that we're doing something wrong there or missing something. Would be good to try to improve LookAhead.getScoreAtLevelRec
> > > > > > > > > > > > > > > > > > > > This problem does actually not have a perfect solution. This is a heuristics and it will always have something missed.
> > > > > > > > > > > > > > > > > > > > You can improve it to fix one particular case but there will be eventually another instance of the same problem.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Yes, sure. But if the heuristic misses something, better to try tweak the heuristic rather than using actual cost/vectorization attempt and just ignore the heuristic, which exists exactly for this purpose.
> > > > > > > > > > > > > > > > > > No, " just ignore" is not what I said.  We definitely should use the heuristics. But when it happens that we came to its limits then we could use more fine grained tools.
> > > > > > > > > > > > > > > > > I rather doubt that building a graph can be called a "fine grained tool". This is different tool, not intended for the analysis. We can extract some functionality out of there (to a separate function/member function) and make the heuristic more smart, but not use the build graph directly.
> > > > > > > > > > > > > > > > > Same problem with the heuristic may exist in some other places, we need to handle them too.
> > > > > > > > > > > > > > > > Cost modeling is the tool I referred to as a "fine grained tool". We have to build a graph to run it. So it's sort of necessary evil. In this sense trying to turn off scheduler for the purpose of using CM as finer grained heuristics does not sound like a crazy idea.
> > > > > > > > > > > > > > > That's what I don't agree here. Cost model is not the tool for the modelling. For modelling we have heuristic. If it is not good, need to tweak it.
> > > > > > > > > > > > > > One issue with the lookahead search is that it is trying both sides of commutative operations, so this doesn't scale if we need to increase the depth. So we need a different tool for testing deeper trees. I agree that there may be something that the lookahead heuristic is missing here, but I would argue that it is the wrong tool for the job. The buildTree() logic is a much more accurate for this. Reusing the existing buildTree logic with some compromises (e.g, limiting size and disabling scheduling) seems like a good compromise to me.
> > > > > > > > > > > > > I would oppose that. I would not use buildTree() for estimation. If there is a part, which can be used for better estimation, better to extract it to a separate function/class and the reuse it in the heuristic and actual graph building separately.
> > > > > > > > > > > > What is the reasoning for opposing it?
> > > > > > > > > > > > That's what I don't agree here. Cost model is not the tool for the modelling. For modelling we have heuristic. If it is not good, need to tweak it.
> > > > > > > > > > > 
> > > > > > > > > > > It would be nice if you explained why you are against using CM for selecting a candidate.
> > > > > > > > > > > Cost model as its  name suggests is supposed to be used for modeling.
> > > > > > > > > > Bad design decision. We have 4 stages.
> > > > > > > > > > 1. Analysis.
> > > > > > > > > > 2. Tree building.
> > > > > > > > > > 3. Cost estimation.
> > > > > > > > > > 4. Codegen.
> > > > > > > > > > You want to make a circular dependence between Analysis and Tree building/Cost estimation.
> > > > > > > > > > 
> > > > > > > > > > But I'm not against reusing some of the code from buildTree()/cost estimation for the analysis phase. I'm just saying that this functionality must be extracted and then reused for the analysis and for the tree building/cost estimation (if possible, to reduce maintenance burden).
> > > > > > > > > Because it is not modelling, it is analysis
> > > > > > > > I mean, you want to use it not for modelling but for the analysis
> > > > > > > I think the high level design that you showed is not very accurate. We are actually doing multiple "tree builds" and "cost estimations" before generating code even in the current design. I don't see the "circular dependency" issue being introduced by this.
> > > > > > Distinction between the two is moot.
> > > > > > 
> > > > > Thta's not because we mix analysis/modelling/estimation, but just because cost estimation shows that the tree is not profitable. The question is not about number of attempts, it is about the design.
> > > > Weak argument.
> > > What about alternative solution which is kind of step back but buildTree+CM is not used for analysis?
> > > 
> > > if lookahead heuristics cannot find single best findBestRootPair returns all indices that give the maximum score. Caller then uses approach it used before: tries to vectorize each until it finds the first which is profitable.
> > > 
> > I don't see any strong argument against using buildTree+CostModel as long as buildTree is fast enough. The argument that this is somehow changing the design makes little sense because the pass is already following this buildTree+CostModel design. The only exception is perhaps for the lookahead search which is actually an example of a design to avoid: it is using its own custom tree-building and cost modeling, and requires special maintenance.
> > 
> > Also the argument that we should extract some of the functionality and place it in a separate component is not very strong. Replicating similar functionality in multiple places is something that a good design should avoid. It just increases the maintenance overhead and will inevitably lead to divergence.
> > What about alternative solution which is kind of step back but buildTree+CM is not used for analysis?
> > 
> > if lookahead heuristics cannot find single best findBestRootPair returns all indices that give the maximum score. Caller then uses approach it used before: tries to vectorize each until it finds the first which is profitable.
> > 
> 
> Yes, it may work as a quick solution.
> if lookahead heuristics cannot find single best findBestRootPair returns all indices that give the maximum score. Caller then uses approach it used before: tries to vectorize each until it finds the first which is profitable.
That won't work because lookahead finds a single best, but it turns out to be the wrong one.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125287/new/

https://reviews.llvm.org/D125287