[llvm] r274098 - [ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.

Wed Jun 29 17:03:20 PDT 2016

That’s an interesting case ☺.

Sorry, that should be sdiv/srem -> udiv/urem.

Also I was wrong with the loop-rotate I metioned. Loop-rotate is actually in the pipeline before indvars. In D18867, adding loop-rotate before indvars just happened to work for those 2 test cases, but it doesn’t solve  the fundamental problem. The fundamental problem is that SCEV’s isKnownPredicate lacks the ability to recognize some NSW additions as non-negative.  The initial change in D18867 used isKnownNonNegative of ValueTracking to assist isKnownPredicate, and fixed the FE test. But this is sort of hacky, and Sanjoy suggested we should teach isKnownPredicate to recognize NSW additions. Unfortunately, I haven’t found a good solution. There are some discussions/analysis in the comments of D18867.

In D21773, I changed all indices to “i + 1” because isKnownPredicate could recognize “i +nsw 1” as non-negative, but not “i +nsw 2” or “i +nsw 3”, then this test passes with the change in D18867. We don’t need to change the test if we can solve the “trunc/zext” problem, though.

From: Craig Topper [mailto:craig.topper at gmail.com]
Sent: Wednesday, June 29, 2016 3:42 PM
To: Huang, Li1 <li1.huang at intel.com>
Cc: Sanjoy Das <sanjoy at playingwithpointers.com>; Adam Nemet <anemet at apple.com>; llvm-commits <llvm-commits at lists.llvm.org>
Subject: Re: [llvm] r274098 - [ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.

sdiv also prevents converting divide by power of 2 into a shift. That's what was happening in the loop I was looking at. For x86 we ended up with a multiple instruction abs idiom, a shift for the divide, and then undoing the abs.

On Wed, Jun 29, 2016 at 12:20 PM, Huang, Li1 <li1.huang at intel.com<mailto:li1.huang at intel.com>> wrote:
Actually, D18777 is doing the same thing as Craig's change, and is based on the same motivation: udiv/urem -> sdiv/srem. Udiv/urem is generally faster than sdiv/srem on most archs.

The reason it breaks the FE test is that it converts some sexts to zexts before indvars (during instcombine), but indvars cannot recognize that the zext is unnecessary and should be removed. Instead, indvars inserts truncs before zexts after widening the IV.

D18867 was proposed to fix this problem, this change teaches indvars that zext of a signed value which is known non-negative is unnecessary. However, this change itself is not enough, a loop-rotate pass is needed before indvars for this change to work.

The test uses O1 but O1 pipeline doesn't have a loop-rotate before indvars, neither does O2 or O3. So I proposed to change the test in D21773.

Thanks,
Li Huang

-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com<mailto:sanjoy at playingwithpointers.com>]
Sent: Wednesday, June 29, 2016 12:35 AM
To: Craig Topper <craig.topper at gmail.com<mailto:craig.topper at gmail.com>>
Cc: Adam Nemet <anemet at apple.com<mailto:anemet at apple.com>>; llvm-commits <llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>>; Huang, Li1 <li1.huang at intel.com<mailto:li1.huang at intel.com>>
Subject: Re: [llvm] r274098 - [ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.

I think I have to redact my "I suspect this isn't anything complicated" tone. :)

To do the right thing in cases like the one you pastebinned, we'll have to track `IsSigned` on individual `NarrowIVDefUse` instances.

Then for instance, in

   %i.050 = phi i32 [ %add15, %for.body ], [ 0, %for.body.preheader ]
   %idxprom45 = zext i32 %i.050 to i64
   %arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom45
   %0 = load i32, i32* %arrayidx, align 4
   %sub = add nsw i32 %i.050, -1
   %idxprom1 = sext i32 %sub to i64
   %arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %idxprom1
   %1 = load i32, i32* %arrayidx2, align 4

we'll first replace `%idxprom45` with the zero extended wide IV, and then when we get to `%sub`, we'll replace it with `add %wide.zexted.iv, -1` but the corresponding `NarrowIVDefUse` we push to the worklist will have to remember that this new wide add instruction is really a `sext` of `%sub` (and accordingly handle its users).

I'll be on vacation from 30th June to 10th of July, but I should be able to get to this once I'm back (unfortunately, I probably won't have time before I leave).  If you want to fix this and have someone else or me (once I'm back) review it that's fine too.

Thanks,
-- Sanjoy

--
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160630/6674153c/attachment.html>