[LLVMdev] A question about everyone's favorite constructs: NSW and NUW

Tue Jan 7 12:16:03 PST 2014

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Chandler Carruth" <chandlerc at gmail.com>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Monday, January 6, 2014 7:44:30 PM
> Subject: Re: [LLVMdev] A question about everyone's favorite constructs: NSW and NUW
> 
> 
> 
> 
> On Sun, Jan 5, 2014 at 12:17 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> 
> 
> ----- Original Message -----
> > From: "Chandler Carruth" < chandlerc at gmail.com >
> > To: "LLVM Developers Mailing List" < llvmdev at cs.uiuc.edu >
> > Sent: Saturday, January 4, 2014 7:29:07 PM
> > Subject: [LLVMdev] A question about everyone's favorite constructs:
> > NSW and NUW
> > 
> > 
> > 
> > So, I know there are a lot of mixed feelings about NSW and NUW, but
> > I'm running into a common pattern where they really help:
> > 
> > 
> > void f(char *array, int i) {
> > // do some stuff...
> > g(array[i++]);
> > 
> > 
> > // do some more stuff...
> > g(array[i++]);
> > 
> > 
> > // more of the same
> > }
> > 
> > 
> > So, this kind of code comes up pretty frequently. Unrolled loops
> > with
> > an int IV[1], encoding, decoding, compression, etc. What we'd like
> > to do is take the sequence of N increments and turn them into N
> > addressing modes with a constant offset. Then add a small constant
> > at the end and test the flags if we're in a loop, etc. This is
> > *much
> > faster* than clobbering registers to do LEA or inc or add for each
> > one (and yea, we do LEA and use lots of registers currently).
> > 
> > 
> > The problem is that this is being compiled in 64-bit mode. So the
> > index is 32-bits and gets sign extended prior to the GEP. So we end
> > up with a lot of repetitions of:
> > 
> > 
> > %a = add nsw i32 %i, N # for some immediate small integer N
> > %b = sext i32 %i to i64
> > %ptr = gep inbounds i8* %base, i64 %b
> > 
> > 
> > When we try to match this pattern in the backend to an x86
> > addressing
> > mode we fail because we can't look through the sext. This is
> > particularly frustrating because we *should* be able to look
> > through
> > the sext due to the NSW bit on the add, but we don't preserve NSW
> > constraints into the DAG.
> > 
> > 
> > How can we fix this? I see the following options:
> > 
> > 
> > 1) Canonicalize the other way in the IR. This could take a few
> > different forms:
> > 1a) Widen math (where NSW or NUW allows) which is only used by an
> > *ext feeding a GEP index.
> > 1b) Widen all math which is only used by an *ext that targets a
> > legal
> > integer size.
> 
> I'm in favor of having NSW/NUW information available at the SDAG
> level (and even at the MI level), but for other reasons.
> 
> 
> Cool. Could you suggest how best to implement this (especially in the
> SDAG)? I don't like my ideas thus far.

I'm not sure that I have anything but the straightforward solution: Add a flags variable (and a few methods) to BinarySDNode; then go through all of the SDAG code with a fine-toothed comb, propagating the flags where appropriate.

If we're only concerned about operations that are directly represented in the IR, we could store a Value * to the original instruction (as we do for MMO). While this has the benefit of allowing for some global analysis with in the SDAG, it might be too limiting.

> 
> 
> For this problem, I vote for solution 1b. My rationale is that this
> would solve a problem that I currently have in the PowerPC backend
> for 64-bit targets in a general way:
> 
> On 64-bit PPC targets, return values are (zero or sign) extended to
> 64-bits. Here's a simple example that demonstrates the issue:
> 
> define signext i32 @foo(i1 zeroext %a) #0 {
> entry:
> %cond = select i1 %a, i32 7, i32 12
> ret i32 %cond
> }
> 
> compiles into something like this:
> 
> ...
> li r4, 7
> li r3, 12
> isel r3, r4, r3, 1
> rldicl r3, r3, 0, 32 <-- this is a completely unnecessary extension
> blr
> 
> Obviously, this affects cases other than when the extension is part
> of the return value (although the return value case is a pretty
> obvious code wart).
> 
> One possible solution to this that I've thought about (although not
> yet experimented with) is 'promoting' all 32-bit operations to
> 64-bit ones up front. That, however, might lead to a bunch of
> unnecessary extensions of its own. I think that the 1b solution
> would nicely solve this problem without introducing unnecessary
> extensions.
> 
> 
> 
> It feels like there are way better solutions to this. Why can't you
> just fix this by looking for a sext used by ret in a function which
> is already signext? It seems like you could match this in the SDAG
> and fix it very directly?
> 

I can do this, and will (I can test the # of sign bits or known masked bits and eliminate the extension in a fairly general way). Unfortunately, this only solves the problem in a single-BB sense (and, thus, can be defeated by an only-slightly-more-complicated example). A more general solution will also need to involved an IR-level transformation as well.

> 
> 
> One issue that might come up is that 64-bit operations might be more
> expensive (either in latency or in throughput) than the
> corresponding 32-bit ones (I'm thinking specifically about
> multiplication and division here, but if we're spilling a lot, there
> could also be stack-size and bandwidth effects), and we might want
> some target input on that.
> 
> 
> 
> Oh, its worse than this. I'm no longer liking #1 at all.
> 
> 
> 
> 
> If we hoist the extension over math which has flags allowing us to do
> so safely, we actually lose significant information. The wider the
> type on which we have the NSW flag, the less information we have
> about the known value range of the inputs. While this may not have a
> significant impact in many benchmarks, it seems like a very
> worrisome change to make when picking the canonical form because we
> essentially say that we will *never* be able to preserve this
> information even if some day it matters a lot. =/

Good point.

 -Hal

> 
> 
> I'm increasingly interested in getting NSW to reach the SDAG so that
> we can actually do the extension when we know we have a fast 64-bit
> operation available in the form of an addressing mode.

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory