[PATCH] D46760: [InstCombine] Enhance narrowUDivURem.

Roman Lebedev via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Sat Jun 9 04:14:13 PDT 2018


lebedev.ri added a subscriber: mkazantsev.
lebedev.ri added a comment.

In https://reviews.llvm.org/D46760#1123713, @spatel wrote:

> In https://reviews.llvm.org/D46760#1123181, @bixia wrote:
>
> > > What Sanjoy suggested doesn't (completely) solve the problem we have, where UDIV and UREM are used to calculate the dimensional indices from a linear index, and the dimensional indices are then used to access tensors of different shapes (for example, two tensors being accessed both require broadcasting, but along different dimensions).
> > >  With the current state of https://reviews.llvm.org/D47113, would it be acceptable if I revise my change here to only narrow i64 udiv/urem to i32 using the added patterns?
>
>
> We're missing the fundamental motivation. Judging by the IR example posted on May 12, this isn't really about div/rem: that code is canonicalized to use shifts. So we must account for the fact that the source code could have been written using shifts in the first place; otherwise, we're chasing an incomplete solution.
>
> So let's look at that example after -instcombine:
>
>   define void @narrow_long_chain_with_udiv_urem(i64* %result) {
>     %num1 = call i32 @get_number(), !range !0
>     %num2 = call i32 @get_number(), !range !0
>     %1 = shl nuw nsw i32 %num1, 6
>     %addconv = add nuw nsw i32 %1, %num2
>     %2 = urem i32 %addconv, 384
>     %tmp2 = zext i32 %2 to i64
>     %lane_id = and i64 %tmp2, 31
>     %3 = lshr i64 %tmp2, 2             <---
>     %tmp4 = shl nuw nsw i64 %3, 5      <---
>     %tmp5 = or i64 %tmp4, 224
>     %tmp6 = or i64 %lane_id, %tmp5
>     store i64 %tmp6, i64* %result, align 4
>     ret void
>   }
>  
>
>
> I've pointed out what I think is the critical sequence: shift right followed by shift left. Why is that a better canonicalization than:
>
>   %and = and i64 %x, -4
>   %left = shl i64 %and, 3
>   
>
> https://rise4fun.com/Alive/Gmc
>
> If we had that transform, then everything would be shrunken to i32 already as we want/expect:
>  ...
>  I suspect there's some codegen concern about having a large 'and' constant mask; if that's a problem, we can add a reverse canonicalization in the DAG that turns the 'and' back into shifts. The 'and' form seems better for analysis and for enabling other transforms (as seen here), so I think we should have this canonicalization in IR.
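
To make the canonicalization point from the quote concrete: udiv/urem by a power of two do not survive -instcombine, so a narrowing transform keyed only on udiv/urem will miss the equivalent shift/mask form. A minimal sketch with illustrative constants (the comments show the folds I would expect, not verbatim -instcombine output):

  define i64 @udiv_pow2(i64 %x) {
    %d = udiv i64 %x, 32   ; canonicalized to: lshr i64 %x, 5
    ret i64 %d
  }

  define i64 @urem_pow2(i64 %x) {
    %r = urem i64 %x, 32   ; canonicalized to: and i64 %x, 31
    ret i64 %r
  }

And if we had the and+shl canonicalization, the whole chain in the example above would indeed stay in i32, presumably with a single zext right before the store.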


As suggested in https://bugs.llvm.org/show_bug.cgi?id=37603, I have added such tests in https://reviews.llvm.org/rL334347.
But then I noticed this: https://github.com/llvm-mirror/llvm/blob/b155a7cf565c5af36bc98ff92917054ade29a2f0/lib/Transforms/InstCombine/InstCombineShifts.cpp#L625-L649

  // Be careful about hiding shl instructions behind bit masks. They are used
  // to represent multiplies by a constant, and it is important that simple
  // arithmetic expressions are still recognizable by scalar evolution.
  // The inexact versions are deferred to DAGCombine, so we don't hide shl
  // behind a bit mask.

That was added by @spatel in https://reviews.llvm.org/rL293570 / https://reviews.llvm.org/rL293435.
So it seems the current reason why we don't handle differing shift amounts in `(x >> C1) << C2` (but do handle them in `(x << C1) >> C2`!) is SCEV.
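
To spell out the asymmetry as I understand it (a sketch with illustrative shift amounts; the SCEV remarks are my reading of that comment, not verbatim -analyze -scalar-evolution output):

  ; Handled: (x << C1) >> C2 is folded even for differing shift amounts;
  ; with equal amounts it becomes a plain mask:
  define i64 @shl_then_lshr(i64 %x) {
    %a = shl i64 %x, 5
    %b = lshr i64 %a, 5   ; -instcombine: and i64 %x, 576460752303423487
    ret i64 %b
  }

  ; Not handled: (x >> 2) << 5 is deliberately left as two shifts.
  ; SCEV can model it as a multiply, roughly (32 * (%x /u 4)); rewriting
  ; to (x & -4) << 3 would hide that multiply behind the mask, and SCEV
  ; has no useful model for an 'and' with a high-bits mask.
  define i64 @lshr_then_shl(i64 %x) {
    %a = lshr i64 %x, 2
    %b = shl i64 %a, 5
    ret i64 %b
  }

If that reading is right, canonicalizing to the and+shl form in IR would need either SCEV to learn that pattern, or the reverse transform in the DAG that @spatel mentioned. Can anyone familiar with SCEV comment? @mkazantsev, perhaps?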


Repository:
  rL LLVM

https://reviews.llvm.org/D46760
