[PATCH] D46760: [InstCombine] Enhance narrowUDivURem.

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 6 08:50:54 PDT 2018


spatel added a comment.

In https://reviews.llvm.org/D46760#1123181, @bixia wrote:

> What Sanjoy suggested doesn't (completely) solve the problem we have, where UDIV and UREM are used to calculate the dimensional indices from a linear index, and the dimensional indices are then used to access tensors of different shapes (for example, two tensors being accessed both require broadcasting but along different dimensions).
>  With the current state of https://reviews.llvm.org/D47113, would it be acceptable if I revise my change here to only narrow i64 udiv/urem to i32 using the added patterns?


We're missing the fundamental motivation. Based on the IR example posted on May 12, this isn't really about div/rem: that code is canonicalized to use shifts. So we must account for the fact that the source code could have been written using shifts in the first place; otherwise, we're chasing an incomplete solution.

So let's look at that example after -instcombine:

  define void @narrow_long_chain_with_udiv_urem(i64* %result) {
    %num1 = call i32 @get_number(), !range !0
    %num2 = call i32 @get_number(), !range !0
    %1 = shl nuw nsw i32 %num1, 6
    %addconv = add nuw nsw i32 %1, %num2
    %2 = urem i32 %addconv, 384
    %tmp2 = zext i32 %2 to i64
    %lane_id = and i64 %tmp2, 31
    %3 = lshr i64 %tmp2, 2             <---
    %tmp4 = shl nuw nsw i64 %3, 5      <---
    %tmp5 = or i64 %tmp4, 224
    %tmp6 = or i64 %lane_id, %tmp5
    store i64 %tmp6, i64* %result, align 4
    ret void
  }

I've marked what I think is the critical sequence (see the arrows): a shift right followed by a shift left. Why is that a better canonicalization than:

  %and = and i64 %x, -4
  %left = shl i64 %and, 3

https://rise4fun.com/Alive/Gmc
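
For concreteness, here's roughly what that fold might look like in InstCombine-style C++. This is only an illustrative sketch, not patch code: the helper name is made up, and one-use, vector-splat, and out-of-range shift-amount checks (plus nuw/nsw flag preservation) are elided.

  // Sketch of:  shl (lshr X, C1), C2  ->  shl (and X, ~0 << C1), (C2 - C1)
  // when C2 >= C1. Illustrative only.
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/PatternMatch.h"
  using namespace llvm;
  using namespace llvm::PatternMatch;

  static Value *foldShrThenShl(BinaryOperator &I, IRBuilder<> &Builder) {
    Value *X;
    const APInt *C1, *C2;
    if (!match(&I, m_Shl(m_LShr(m_Value(X), m_APInt(C1)), m_APInt(C2))) ||
        C2->ult(*C1))
      return nullptr;
    Type *Ty = I.getType();
    unsigned BW = Ty->getScalarSizeInBits();
    // Keep only the bits that survive the right shift: ~0 << C1.
    APInt Mask = APInt::getHighBitsSet(BW, BW - C1->getZExtValue());
    Value *And = Builder.CreateAnd(X, ConstantInt::get(Ty, Mask));
    return Builder.CreateShl(And, ConstantInt::get(Ty, *C2 - *C1));
  }

With C1 = 2 and C2 = 5, that produces exactly the 'and'/'shl' pair shown above.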

If we had that transform, then everything would already be shrunk to i32, as we want and expect:

  define void @narrow_long_chain_with_shifts(i64* %result) {
    %num1 = call i32 @get_number(), !range !0
    %num2 = call i32 @get_number(), !range !0
    %1 = shl nuw nsw i32 %num1, 6
    %addconv = add nuw nsw i32 %1, %num2
    %2 = urem i32 %addconv, 384
    %3 = and i32 %2, 31
    %4 = shl nuw nsw i32 %2, 3
    %5 = and i32 %4, 3840
    %6 = or i32 %5, 224
    %tmp61 = or i32 %3, %6
    %tmp6 = zext i32 %tmp61 to i64
    store i64 %tmp6, i64* %result, align 4
    ret void
  }

I suspect there's some codegen concern about materializing a large 'and' constant mask; if that's a problem, we can add a reverse canonicalization in the DAG to transform the 'and' back into shifts. The 'and' form seems better for analysis and for enabling other transforms (as seen here), so I think we should have this canonicalization in IR.
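
If the wide mask does turn out to be a codegen problem, the DAG-side reversal could be a combine along these lines. Again, purely a hypothetical sketch with an invented helper name; a real version needs one-use, legality, and total-shift-range checks:

  // Sketch of the reverse fold:  shl (and X, ~0 << C1), C2
  //                           -> shl (srl X, C1), (C1 + C2)
  // for targets where materializing the wide 'and' mask is expensive.
  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  static SDValue reverseAndToShiftPair(SDNode *Shl, SelectionDAG &DAG) {
    SDValue And = Shl->getOperand(0);
    auto *C2 = dyn_cast<ConstantSDNode>(Shl->getOperand(1));
    if (!C2 || And.getOpcode() != ISD::AND)
      return SDValue();
    auto *MaskC = dyn_cast<ConstantSDNode>(And.getOperand(1));
    if (!MaskC)
      return SDValue();
    const APInt &Mask = MaskC->getAPIntValue();
    unsigned BW = Mask.getBitWidth();
    unsigned C1 = Mask.countTrailingZeros();
    // The mask must be exactly ~0 << C1, with a non-trivial shift amount.
    if (C1 == 0 || C1 == BW || Mask != APInt::getHighBitsSet(BW, BW - C1))
      return SDValue();
    SDLoc DL(Shl);
    EVT VT = Shl->getValueType(0);
    SDValue Srl = DAG.getNode(ISD::SRL, DL, VT, And.getOperand(0),
                              DAG.getConstant(C1, DL, VT));
    return DAG.getNode(ISD::SHL, DL, VT, Srl,
                       DAG.getConstant(C1 + C2->getZExtValue(), DL, VT));
  }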


Repository:
  rL LLVM

https://reviews.llvm.org/D46760
