[LLVMdev] A question about everyone's favorite constructs: NSW and NUW

Sun Jan 5 09:17:52 PST 2014

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Saturday, January 4, 2014 7:29:07 PM
> Subject: [LLVMdev] A question about everyone's favorite constructs: NSW and	NUW
> 
> 
> 
> So, I know there are a lot of mixed feelings about NSW and NUW, but
> I'm running into a common pattern where they really help:
> 
> 
> void f(char *array, int i) {
> // do some stuff...
> g(array[i++]);
> 
> 
> // do some more stuff...
> g(array[i++]);
> 
> 
> // more of the same
> }
> 
> 
> So, this kind of code comes up pretty frequently. Unrolled loops with
> an int IV[1], encoding, decoding, compression, etc. What we'd like
> to do is take the sequence of N increments and turn them into N
> addressing modes with a constant offset. Then add a small constant
> at the end and test the flags if we're in a loop, etc. This is *much
> faster* than clobbering registers to do LEA or inc or add for each
> one (and yea, we do LEA and use lots of registers currently).
> 
> 
> The problem is that this is being compiled in 64-bit mode. So the
> index is 32-bits and gets sign extended prior to the GEP. So we end
> up with a lot of repetitions of:
> 
> 
> %a = add nsw i32 %i, N # for some immediate small integer N
> %b = sext i32 %i to i64
> %ptr = gep inbounds i8* %base, i64 %b
> 
> 
> When we try to match this pattern in the backend to an x86 addressing
> mode we fail because we can't look through the sext. This is
> particularly frustrating because we *should* be able to look through
> the sext due to the NSW bit on the add, but we don't preserve NSW
> constraints into the DAG.
> 
> 
> How can we fix this? I see the following options:
> 
> 
> 1) Canonicalize the other way in the IR. This could take a few
> different forms:
> 1a) Widen math (where NSW or NUW allows) which is only used by an
> *ext feeding a GEP index.
> 1b) Widen all math which is only used by an *ext that targets a legal
> integer size.

I'm in favor of having NSW/NUW information available at the SDAG level (and even at the MI level), but for other reasons. For this problem, I vote for solution 1b. My rationale is that this would solve a problem that I currently have in the PowerPC backend for 64-bit targets in a general way:

On 64-bit PPC targets, return values are (zero or sign) extended to 64-bits. Here's a simple example that demonstrates the issue:

define signext i32 @foo(i1 zeroext %a) #0 {
entry:
  %cond = select i1 %a, i32 7, i32 12
  ret i32 %cond
}

compiles into something like this:

        ...
	li r4, 7
	li r3, 12
	isel r3, r4, r3, 1
	rldicl r3, r3, 0, 32 <-- this is a completely unnecessary extension
        blr

Obviously, this affects cases other than when the extension is part of the return value (although the return value case is a pretty obvious code wart).

One possible solution to this that I've thought about (although not yet experimented with) is 'promoting' all 32-bit operations to 64-bit ones up front. That, however, might lead to a bunch of unnecessary extensions of its own. I think that the 1b solution would nicely solve this problem without introducing unnecessary extensions.

One issue that might come up is that 64-bit operations might be more expensive (either in latency or in throughput) than the corresponding 32-bit ones (I'm thinking specifically about multiplication and division here, but if we're spilling a lot, there could also be stack-size and bandwidth effects), and we might want some target input on that.

In short, I would not be so hard on #3 ;) -- this is a legitimate problem that might not have a nice target-cost-independent solution, but I suspect that the benefits of using 1b as the new canonical form probably outweigh the negative side effects.

 -Hal

> 
> 
> 2) Add the NSW (and NUW) bits to the DAG, and query them in the
> address mode matching to match across (sext (op nsw ...)) and (zext
> (op nuw ...))
> 
> 
> 3) Add a complete hack to CodeGenPrepare (or similar) which does #1
> as a non-canonical step in the IR.
> 
> 
> 
> 
> I dislike #3 intensely -- we really want to be able to combine on
> this stuff in a canonical way, so I want the pattern to either
> always be (gep ... (sext (op nsw ...))) or (gep ... (op nsw (sext
> ...))). Also, it feels like quite a hack around the lack of fidelity
> in the backend.
> 
> 
> I'm perfectly fine with either #1 or #2. Doing #1 would require a lot
> of benchmarking to make sure the different canonicalization doesn't
> regress lots of things, and probably a few fixes where it does
> regress things. These are likely to be easy to find by looking for
> patterns that match sext.
> 
> 
> Doing #2 requires a plumbing these flags through the DAG. I don't yet
> know how hard that would be, but it doesn't sound too bad. The
> current approach I would probably take is to add an SDNodeProperty
> for NSW and NUW, and nodes for operations with those flags. But that
> means generalizing everything that currently only looks at ADD.
> There is probably a better way to express this in the DAG, so if
> this is the direction folks thing LLVM should go, point me at any
> ideas you have about how to best implement it.
> 
> 
> Currently, I would naively go after #1 just because I know exactly
> how to do that. But it doesn't feel quite right and I wonder if we
> should really invest the time in making sure the backend is aware of
> the NSW and NUW flags.
> 
> 
> -Chandler
> 
> 
> [1]: Note that we could fix this inside of loop bodies when its an
> induction variable by widening the induction variable. But that
> doesn't fix in the general case.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory