[PATCH] D37121: [DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 5 07:39:29 PST 2018
spatel added a comment.
In https://reviews.llvm.org/D37121#1026812, @jlebar wrote:
> In https://reviews.llvm.org/D37121#1025333, @spatel wrote:
>
> > Can you post an IR example or file a bug that shows the failure? If BypassSlowDivision can get it, but InstCombine can not, then the difference comes down to using computeKnownBits?
>
>
> Sure, something like:
>
> target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
> target triple = "nvptx64-nvidia-cuda"
>
> define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
>   %b = and i64 %a, 65535
>   %div = udiv i64 %b, 42
>   %rem = urem i64 %b, 42
>   store i64 %div, i64* %ptr1
>   store i64 %rem, i64* %ptr2
>   ret void
> }
>
Are you confident that the problem is limited to cases with a constant div/rem operand and a mask of the variable operand that could be replaced by a trunc? If so, we could add a narrow pattern-match fix without using computeKnownBits:
  Name: udiv_shrink
  %b = and i32 %a, 65535
  %r = udiv i32 %b, 42
  =>
  %t = trunc i32 %a to i16
  %u = udiv i16 %t, 42
  %r = zext i16 %u to i32
https://rise4fun.com/Alive/EHK
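For a quick sanity check outside of Alive, here is a minimal sketch in Python that models both forms with plain integer arithmetic. This is not LLVM code, and the helper names (wide, narrow, forms_agree) are made up for illustration. It also checks urem, on the assumption that the same narrowing would be applied to the remainder. Only a few patterns of high bits are sampled rather than all 2^32 inputs, since neither form should depend on them.

```python
MASK16 = 0xFFFF

def wide(a, d=42):
    """Model 'and i32 %a, 65535' followed by udiv/urem at 32 bits."""
    b = a & MASK16
    return b // d, b % d

def narrow(a, d=42):
    """Model trunc to i16, udiv/urem at 16 bits, then zext back to i32."""
    t = a & MASK16            # trunc i32 -> i16 keeps the low 16 bits
    d16 = d & MASK16          # the divisor as it would appear as an i16 constant
    q, r = t // d16, t % d16  # both results already fit in 16 bits
    return q, r               # zext only adds zero high bits

def forms_agree(d=42):
    """Compare both forms over every low half and a few sampled high halves."""
    for hi in (0x0000, 0x0001, 0x8000, 0xFFFF):
        for lo in range(0x10000):
            a = (hi << 16) | lo
            if wide(a, d) != narrow(a, d):
                return False
    return True
```

Under this model forms_agree(42) holds, while a divisor that does not fit in 16 bits (for example 65536 + 42) makes the two forms disagree, so a pattern match along these lines would presumably need to check that the constant survives the truncation.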
I was worried that canEvaluateZExtd() would try to invert that transform, but either by oversight or intention, we don't widen udiv/urem there the way we do for most binops.
Repository:
rL LLVM
https://reviews.llvm.org/D37121