[PATCH] D37121: [DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028)

Thu Mar 1 16:34:19 PST 2018

jlebar added a comment.

After some sleuthing, I discovered that the addition of this new (morally good) pass introduced a moderate regression on NVPTX.

Previously, when given a div/rem pair operating over i64, we'd check if the operands fit in i32 in BypassSlowDivision (during codegenprepare).  If so, we'd replace the div/rem pair with i32 div/rem ops.  Then in the NVPTX selection dag, we'd compute the rem from this div using the same formulation as here.  But since the div was i32, the rem computation would also happen in i32.

After this change, the rem is replaced with i64 arithmetic early on in the pipeline.  BypassSlowDivision replaces the i64 div with an i32 div as before.  But because this happens during codegenprepare, nobody ever changes the rem computation to happen in 32 bits.

I'm not sure what is the right fix for this.  We could teach instcombine to strength-reduce i64 divides where the operands are known to fit into 32 bits into i32 divs?  It already does this for zext'ed operands, but not in general.  But I'm not sure what is the principled way to tell instcombine whether and how to do this strength reduction.  (Like, maybe we want to convert 64-bit divs into 32-bit divs, but do we want to convert 64-bit divs into 8-bit divs, if it fits?)

Repository:
  rL LLVM

https://reviews.llvm.org/D37121