[LLVMdev] A question about everyone's favorite constructs: NSW and NUW

Mon Jan 6 17:44:30 PST 2014

On Sun, Jan 5, 2014 at 12:17 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Chandler Carruth" <chandlerc at gmail.com>
> > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> > Sent: Saturday, January 4, 2014 7:29:07 PM
> > Subject: [LLVMdev] A question about everyone's favorite constructs: NSW
> and   NUW
> >
> >
> >
> > So, I know there are a lot of mixed feelings about NSW and NUW, but
> > I'm running into a common pattern where they really help:
> >
> >
> > void f(char *array, int i) {
> > // do some stuff...
> > g(array[i++]);
> >
> >
> > // do some more stuff...
> > g(array[i++]);
> >
> >
> > // more of the same
> > }
> >
> >
> > So, this kind of code comes up pretty frequently. Unrolled loops with
> > an int IV[1], encoding, decoding, compression, etc. What we'd like
> > to do is take the sequence of N increments and turn them into N
> > addressing modes with a constant offset. Then add a small constant
> > at the end and test the flags if we're in a loop, etc. This is *much
> > faster* than clobbering registers to do LEA or inc or add for each
> > one (and yea, we do LEA and use lots of registers currently).
> >
> >
> > The problem is that this is being compiled in 64-bit mode. So the
> > index is 32-bits and gets sign extended prior to the GEP. So we end
> > up with a lot of repetitions of:
> >
> >
> > %a = add nsw i32 %i, N # for some immediate small integer N
> > %b = sext i32 %i to i64
> > %ptr = gep inbounds i8* %base, i64 %b
> >
> >
> > When we try to match this pattern in the backend to an x86 addressing
> > mode we fail because we can't look through the sext. This is
> > particularly frustrating because we *should* be able to look through
> > the sext due to the NSW bit on the add, but we don't preserve NSW
> > constraints into the DAG.
> >
> >
> > How can we fix this? I see the following options:
> >
> >
> > 1) Canonicalize the other way in the IR. This could take a few
> > different forms:
> > 1a) Widen math (where NSW or NUW allows) which is only used by an
> > *ext feeding a GEP index.
> > 1b) Widen all math which is only used by an *ext that targets a legal
> > integer size.
>
> I'm in favor of having NSW/NUW information available at the SDAG level
> (and even at the MI level), but for other reasons.

Cool. Could you suggest how best to implement this (especially in the
SDAG)? I don't like my ideas thus far.

> For this problem, I vote for solution 1b. My rationale is that this would
> solve a problem that I currently have in the PowerPC backend for 64-bit
> targets in a general way:
>
> On 64-bit PPC targets, return values are (zero or sign) extended to
> 64-bits. Here's a simple example that demonstrates the issue:
>
> define signext i32 @foo(i1 zeroext %a) #0 {
> entry:
>   %cond = select i1 %a, i32 7, i32 12
>   ret i32 %cond
> }
>
> compiles into something like this:
>
>         ...
>         li r4, 7
>         li r3, 12
>         isel r3, r4, r3, 1
>         rldicl r3, r3, 0, 32 <-- this is a completely unnecessary extension
>         blr
>
> Obviously, this affects cases other than when the extension is part of the
> return value (although the return value case is a pretty obvious code wart).
>
> One possible solution to this that I've thought about (although not yet
> experimented with) is 'promoting' all 32-bit operations to 64-bit ones up
> front. That, however, might lead to a bunch of unnecessary extensions of
> its own. I think that the 1b solution would nicely solve this problem
> without introducing unnecessary extensions.
>

It feels like there are way better solutions to this. Why can't you just
fix this by looking for a sext used by ret in a function which is already
signext? It seems like you could match this in the SDAG and fix it very
directly?

>
> One issue that might come up is that 64-bit operations might be more
> expensive (either in latency or in throughput) than the corresponding
> 32-bit ones (I'm thinking specifically about multiplication and division
> here, but if we're spilling a lot, there could also be stack-size and
> bandwidth effects), and we might want some target input on that.
>

Oh, its worse than this. I'm no longer liking #1 at all.

If we hoist the extension over math which has flags allowing us to do so
safely, we actually lose significant information. The wider the type on
which we have the NSW flag, the less information we have about the known
value range of the inputs. While this may not have a significant impact in
many benchmarks, it seems like a very worrisome change to make when picking
the canonical form because we essentially say that we will *never* be able
to preserve this information even if some day it matters a lot. =/

I'm increasingly interested in getting NSW to reach the SDAG so that we can
actually do the extension when we know we have a fast 64-bit operation
available in the form of an addressing mode.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140106/cb153e0e/attachment.html>