[llvm] r274098 - [ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.

Huang, Li1 via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 29 12:20:16 PDT 2016


Actually, D18777 is doing the same thing as Craig's change, and is based on the same motivation: udiv/urem -> sdiv/srem. udiv/urem are generally faster than sdiv/srem on most architectures.
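The equivalence being exploited can be sketched in C (a hypothetical helper for illustration, not part of either patch): when both operands are known non-negative, signed and unsigned division agree, so the cheaper unsigned form can be used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: when the dividend is known non-negative
   and the divisor positive, sdiv and udiv produce the same quotient,
   so the compiler may rewrite the signed form as the unsigned one. */
int32_t div_known_nonneg(int32_t x, int32_t d) {
    assert(x >= 0 && d > 0);
    return (int32_t)((uint32_t)x / (uint32_t)d); /* same as x / d here */
}
```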

The reason it breaks the FE test is that it converts some sexts to zexts before indvars runs (during instcombine), but indvars cannot recognize that the zext is unnecessary and remove it. Instead, indvars inserts truncs before the zexts after widening the IV.

D18867 was proposed to fix this problem: it teaches indvars that a zext of a signed value known to be non-negative is unnecessary. However, this change alone is not enough; a loop-rotate pass needs to run before indvars for the change to work.
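The fact D18867 relies on can be checked with a minimal C sketch (hypothetical helpers, invented names): for a non-negative 32-bit value, zero extension and sign extension to 64 bits produce the same result, so the zext is redundant.

```c
#include <stdint.h>

/* Hypothetical helpers modeling the two extensions: for x >= 0 they
   agree, which is why indvars may treat zext(x) as sext(x) there. */
int64_t zext32(int32_t x) { return (int64_t)(uint32_t)x; }
int64_t sext32(int32_t x) { return (int64_t)x; }
```

For negative values the two differ, which is why the non-negativity fact has to be proven first.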

The test uses O1, but the O1 pipeline doesn't run loop-rotate before indvars, and neither do O2 or O3. So I proposed changing the test in D21773.

Thanks,
Li Huang

-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com] 
Sent: Wednesday, June 29, 2016 12:35 AM
To: Craig Topper <craig.topper at gmail.com>
Cc: Adam Nemet <anemet at apple.com>; llvm-commits <llvm-commits at lists.llvm.org>; Huang, Li1 <li1.huang at intel.com>
Subject: Re: [llvm] r274098 - [ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition.

I think I have to redact my "I suspect this isn't anything complicated" tone. :)

To do the right thing in cases like the one you pastebinned, we'll have to track `IsSigned` on individual `NarrowIVDefUse` instances.

Then for instance, in

   %i.050 = phi i32 [ %add15, %for.body ], [ 0, %for.body.preheader ]
   %idxprom45 = zext i32 %i.050 to i64
   %arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom45
   %0 = load i32, i32* %arrayidx, align 4
   %sub = add nsw i32 %i.050, -1
   %idxprom1 = sext i32 %sub to i64
   %arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %idxprom1
   %1 = load i32, i32* %arrayidx2, align 4

we'll first replace `%idxprom45` with the zero-extended wide IV. Then, when we get to `%sub`, we'll replace it with `add %wide.zexted.iv, -1`, but the corresponding `NarrowIVDefUse` we push to the worklist will have to remember that this new wide add instruction is really a `sext` of `%sub` (and handle its users accordingly).
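Concretely, the widened loop body might look something like this (a hand-written sketch of the intended result, not actual indvars output; the value names are invented):

```llvm
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  ; the zext user %idxprom45 folds into the wide IV directly
  %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  ; %sub becomes a wide add, but it stands for sext(%sub), so the
  ; NarrowIVDefUse pushed for it must carry an IsSigned flag
  %sub.wide = add nsw i64 %indvars.iv, -1
  %arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %sub.wide
  %1 = load i32, i32* %arrayidx2, align 4
```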

I'll be on vacation from 30th June to 10th of July, but I should be able to get to this once I'm back (unfortunately, I probably won't have time before I leave). If you want to fix this and have someone else or me (once I'm back) review it, that's fine too.

Thanks,
-- Sanjoy
