[PATCH] D37121: [DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028)

Wed Sep 6 17:52:56 PDT 2017

hfinkel added inline comments.

================
Comment at: lib/Transforms/Scalar/DivRemHoist.cpp:85
+      // because division is an implicit component of remainder. The backend
+      // can do whatever is optimal for the div/rem pair.
+      DivInst->moveAfter(RemInst);
----------------
spatel wrote:
> hfinkel wrote:
> > spatel wrote:
> > > hfinkel wrote:
> > > > I don't understand what's going on here? Can you please explain in the comment. Why is this way cheap but hoisting the rem near the div is not? I'm guessing this has something to do with the fact that DAGCombiner::useDivRem isn't called for divisions if TLI.isIntDivCheap returns true, but is always called on remainders? If that's it, should we mirror that logic more directly here (i.e. somehow directly incorporate the result of calling TLI.isIntDivCheap)?
> > > This is independent of anything in the backend. That's why this transform is frustrating - it's half target-independent. :)
> > > 
> > > Let's see if some more words or repeating the formula would make it better:
> > > "Hoist the div into the rem block. No cost calculation is needed because division is an implicit component of remainder:
> > >  X % Y --> X - ((X / Y) * Y)
> > > If the target has a unified instruction for div/rem, then this will occur in a single instruction. If the target does not have a unified instruction for div/rem, then it must calculate remainder as (sub X, (mul (div X, Y), Y). Either way, hoisting division will be free."
> > That is much better, thanks. However, are you making that substitution? It looks like you're just moving the division in this case (i.e. only doing `DivInst->moveAfter(RemInst);`)?
> We can just move the division in this pass because we can assume that the backend will be able to simplify the common division when the div and rem are in the same block.
> 
> I suppose at this point we could decompose the dominating rem into div+mul+sub if we know the backend does not have a divrem. I don't think there would be any downside to that (other than we're repeating functionality that is known to exist in the backend)?
> We can just move the division in this pass because we can assume that the backend will be able to simplify the common division when the div and rem are in the same block.

That's okay, but you should say that in the comment. However...

> I suppose at this point we could decompose the dominating rem into div+mul+sub if we know the backend does not have a divrem. I don't think there would be any downside to that (other than we're repeating functionality that is known to exist in the backend)?

If this pass were running early in the pipeline, we'd definitely need to do this (because it would affect inlining/unrolling costs and the like). Because this runs relatively late, that's not a huge concern. However, I'd prefer that you do the decomposition here at the IR level anyway because a) you already have the code here to do it (so there's little added complexity) and b) in case there are any transformations later that are still using IR-level costs, and there very well might be in the backend (even target-specific ones), we might as well be kind to them and give them more-accurate costs.

https://reviews.llvm.org/D37121