[PATCH] D125755: [AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 19 02:23:10 PDT 2022
dmgreen added subscribers: djtodoro, rahular-rrlogic, gsocshubham.
dmgreen added a comment.
Thanks for the comments.
In D125755#3523457 <https://reviews.llvm.org/D125755#3523457>, @efriedma wrote:
> There's a continual struggle between pushing towards canonical IR, vs, pushing towards things we think are cheap on some specific target. Normally, the way we've resolved that is by distinguishing "early" vs. "late" optimizations: we try to push towards a canonical form early on to make the code easier to analyze, then we start doing optimizations like vectorization etc. using target-specific heuristics. AggressiveInstCombine doesn't really have anything to do with "early" vs. "late"; if we want something that runs just before vectorization, we should probably add a dedicated pass.
It's not really about vectorization - although that is where the biggest gains will come from. It's really about getting all the other cost-based decisions correct throughout the LLVM pipeline. If we don't, there will always be performance left on the table.
The inliner runs early and has always considered costs. As @spatel / @craig.topper said above, most other passes are directed by the target through the datalayout; that is just a different form of cost modelling.
I had considered moving the cost calculation in this patch into a separate function in the TTI, so that AggressiveInstCombine can just call shouldChangeToFPSat and the target can override it. The actual details inside shouldChangeToFPSat needn't come from the cost model, but in the case of fptosi.sat that probably makes the most sense: it's not always obvious when these intrinsics will be profitable, and they are usually custom legalized. A rough sketch of what such a hook could look like is below.
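To make that concrete, here is a hypothetical sketch. shouldChangeToFPSat is not an existing TTI hook, and the particular cost-model calls (IntrinsicCostAttributes, getCastInstrCost, getIntrinsicInstrCost) are only meant to illustrate comparing the intrinsic against the fptosi + smin + smax pattern; a target could override the default to simply return false on, say, soft-float subtargets.

```cpp
// Hypothetical sketch only: shouldChangeToFPSat is not an existing TTI hook,
// and the exact calls may differ between LLVM versions. The idea is to
// compare the cost of the saturating intrinsic against the cost of the
// existing fptosi + smin + smax sequence, and let a target override the
// answer if it knows better.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Type.h"

using namespace llvm;

static bool shouldChangeToFPSat(const TargetTransformInfo &TTI, Type *FPTy,
                                Type *IntTy) {
  TargetTransformInfo::TargetCostKind CostKind =
      TargetTransformInfo::TCK_RecipThroughput;

  // Cost of the single llvm.fptosi.sat intrinsic.
  IntrinsicCostAttributes SatAttrs(Intrinsic::fptosi_sat, IntTy, {FPTy});
  InstructionCost SatCost = TTI.getIntrinsicInstrCost(SatAttrs, CostKind);

  // Cost of the pattern it would replace: fptosi plus integer smin and smax.
  InstructionCost PatCost = TTI.getCastInstrCost(
      Instruction::FPToSI, IntTy, FPTy,
      TargetTransformInfo::CastContextHint::None, CostKind);
  PatCost += TTI.getIntrinsicInstrCost(
      IntrinsicCostAttributes(Intrinsic::smin, IntTy, {IntTy, IntTy}),
      CostKind);
  PatCost += TTI.getIntrinsicInstrCost(
      IntrinsicCostAttributes(Intrinsic::smax, IntTy, {IntTy, IntTy}),
      CostKind);

  // Only fold when the intrinsic is not expected to be more expensive.
  return SatCost <= PatCost;
}
```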
> If vectorization is involved, we have to worry about cost difference between vector and scalar versions. For example, for vectors, we might want to use floating-point min, but for scalars we prefer integer min. Not sure if this is actually true for any target, but worth considering. If we need to deal with situations like that, we can't cleanly query the cost model, so we should prefer some unified approach. For example, attach some range metadata to fptosi.sat, or add some sort of "combiner" to VPlan.
There would be other ways to tackle this exact issue, but they wouldn't solve the problem in general. I think for operations where there is a vector form but not a scalar form, there should be combiners in VPlan. MULH, for example, has vector instructions but no scalar form. There was an example from a long time ago in D88152 <https://reviews.llvm.org/D88152> of how that could work, but VPlan is still missing a few pieces (and has changed a lot since then).
We would also need to get the cost of `smin(smax(fptosi))` correct, but I don't believe we can cost model multiple-instruction nodes like that well in general. Single instructions are usually easy enough to cost model. Two instructions like `zext(load)` are possible, but it starts to get ugly. Three or more just doesn't work.
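For reference, a rough illustration of the two-instruction case (the helper name is made up): the cast cost interface can at least be told which instruction a cast folds with via TargetTransformInfo::getCastContextHint, which is how patterns like `zext(load)` are approximated today; there is no equivalent way to ask for a three-instruction node like `smin(smax(fptosi))` as a single unit.

```cpp
// Illustration only: costing a cast in the context of its operand. The
// CastContextHint lets the target see, for example, that a zext is fed by a
// load and so may fold into an extending load.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

static InstructionCost costCastInContext(const TargetTransformInfo &TTI,
                                         CastInst *Cast) {
  TargetTransformInfo::CastContextHint CCH =
      TargetTransformInfo::getCastContextHint(Cast);
  return TTI.getCastInstrCost(Cast->getOpcode(), Cast->getDestTy(),
                              Cast->getSrcTy(), CCH,
                              TargetTransformInfo::TCK_RecipThroughput, Cast);
}
```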
And none of that gets D113291 <https://reviews.llvm.org/D113291> unstuck, which is more about doing the transform early enough to benefit from other optimizations in the pipeline.
> That said, what targets are we worried about here? I guess soft-float targets? For targets with native floats, it's hard for me to imagine "nnan fptosi.sat" is significantly more expensive than "fptosi+min+max". It looks like isel currently doesn't take advantage of nnan, but it probably should.
Yeah - soft-float Thumb1 targets are where I see significant losses from doing this unconditionally. You can always have targets where the integer smin/smax are cheaper than the fp min/max.
> So I'm not opposed, but I also don't have a good sense about potential downside of allowing target-specific IR transforms earlier in the optimization pipeline. Clearly, there's demand to do this because we get this kind of request fairly regularly.
Do we know what the downsides presented in the past were? I've heard before about an increased need for testing on all targets, but that has really always been true with datalayout-controlled combines. I believe that concern was more about the core of InstCombine, though, where more fundamental canonicalizations happen. This and D113291 <https://reviews.llvm.org/D113291>, I feel, are more about higher-level, irreversible transforms.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125755/new/
https://reviews.llvm.org/D125755