[PATCH] D37121: [DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028)

Mon Mar 5 07:39:29 PST 2018

spatel added a comment.

In https://reviews.llvm.org/D37121#1026812, @jlebar wrote:

> In https://reviews.llvm.org/D37121#1025333, @spatel wrote:
>
> > Can you post an IR example or file a bug that shows the failure? If BypassSlowDivision can get it, but InstCombine cannot, then the difference comes down to using computeKnownBits?
>
>
> Sure, something like:
>
>   target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
>   target triple = "nvptx64-nvidia-cuda"
>  
>   define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
>     %b = and i64 %a, 65535
>     %div = udiv i64 %b, 42
>     %rem = urem i64 %b, 42
>     store i64 %div, i64* %ptr1
>     store i64 %rem, i64* %ptr2
>     ret void
>   }
>

Are you confident that the problem is limited to cases where the div/rem has a constant operand and the variable operand is masked in a way that could be replaced by a trunc? If so, we could add a narrow pattern-matching fix without using computeKnownBits:

  Name: udiv_shrink
  %b = and i32 %a, 65535
  %r = udiv i32 %b, 42

    =>

  %t = trunc i32 %a to i16
  %u = udiv i16 %t, 42
  %r = zext i16 %u to i32

https://rise4fun.com/Alive/EHK
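
For anyone who wants to sanity-check the narrowing without Alive, here is a rough brute-force sketch in plain C++. It is not InstCombine code: the helper names (udivWide, udivNarrow, uremWide, uremNarrow, narrowingHolds) are made up for illustration, and the urem variants are my assumption that the remainder case is analogous to the udiv pattern above. It also assumes the constant divisor fits in 16 bits, as 42 does.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: %b = and i32 %a, 65535 ; %r = udiv i32 %b, C
uint32_t udivWide(uint32_t a, uint16_t c) { return (a & 0xFFFFu) / c; }

// Narrow form: trunc i32 -> i16, udiv i16, zext i16 -> i32.
uint32_t udivNarrow(uint32_t a, uint16_t c) {
  uint16_t t = static_cast<uint16_t>(a);      // trunc keeps the low 16 bits
  uint16_t u = static_cast<uint16_t>(t / c);  // quotient never exceeds t
  return u;                                   // implicit widening acts as zext
}

// Assumed analogue for urem (not part of the Alive proof above).
uint32_t uremWide(uint32_t a, uint16_t c) { return (a & 0xFFFFu) % c; }

uint32_t uremNarrow(uint32_t a, uint16_t c) {
  uint16_t t = static_cast<uint16_t>(a);
  uint16_t u = static_cast<uint16_t>(t % c);  // remainder is below c
  return u;
}

// Returns true if the wide and narrow forms agree for every low-16-bit
// pattern, combined with a handful of high-half patterns that the mask
// or trunc is expected to discard. Returns false for a zero divisor,
// since there is nothing meaningful to compare in that case.
bool narrowingHolds(uint16_t c) {
  if (c == 0)
    return false;
  const uint32_t highs[] = {0x00000000u, 0x00010000u, 0x7FFF0000u, 0xFFFF0000u};
  for (uint32_t hi : highs) {
    for (uint32_t lo = 0; lo <= 0xFFFFu; ++lo) {
      uint32_t a = hi | lo;
      if (udivWide(a, c) != udivNarrow(a, c))
        return false;
      if (uremWide(a, c) != uremNarrow(a, c))
        return false;
    }
  }
  return true;
}
```

A caller could check the constant from the example with assert(narrowingHolds(42)). This only samples the high bits, so it is a smoke test and not a substitute for the Alive proof.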

I was worried that canEvaluateZExtd() would try to invert that transform, but, whether by oversight or intention, we don't widen udiv/urem there the way we do for most binops.

Repository:
  rL LLVM

https://reviews.llvm.org/D37121