[LLVMdev] A question about everyone's favorite constructs: NSW and NUW

Sat Jan 4 17:29:07 PST 2014

So, I know there are a lot of mixed feelings about NSW and NUW, but I'm
running into a common pattern where they really help:

void f(char *array, int i) {
  // do some stuff...
  g(array[i++]);

  // do some more stuff...
  g(array[i++]);

  // more of the same
}

So, this kind of code comes up pretty frequently. Unrolled loops with an
int IV[1], encoding, decoding, compression, etc. What we'd like to do is
take the sequence of N increments and turn them into N addressing modes
with a constant offset. Then add a small constant at the end and test the
flags if we're in a loop, etc. This is *much faster* than clobbering
registers to do LEA or inc or add for each one (and yea, we do LEA and use
lots of registers currently).

The problem is that this is being compiled in 64-bit mode. So the index is
32-bits and gets sign extended prior to the GEP. So we end up with a lot of
repetitions of:

  %a = add nsw i32 %i, N  # for some immediate small integer N
  %b = sext i32 %i to i64
  %ptr = gep inbounds i8* %base, i64 %b

When we try to match this pattern in the backend to an x86 addressing mode
we fail because we can't look through the sext. This is particularly
frustrating because we *should* be able to look through the sext due to the
NSW bit on the add, but we don't preserve NSW constraints into the DAG.

How can we fix this? I see the following options:

1) Canonicalize the other way in the IR. This could take a few different
forms:
1a) Widen math (where NSW or NUW allows) which is only used by an *ext
feeding a GEP index.
1b) Widen all math which is only used by an *ext that targets a legal
integer size.

2) Add the NSW (and NUW) bits to the DAG, and query them in the address
mode matching to match across (sext (op nsw ...)) and (zext (op nuw ...))

3) Add a complete hack to CodeGenPrepare (or similar) which does #1 as a
non-canonical step in the IR.

I dislike #3 intensely -- we really want to be able to combine on this
stuff in a canonical way, so I want the pattern to either always be (gep
... (sext (op nsw ...))) or (gep ... (op nsw (sext ...))). Also, it feels
like quite a hack around the lack of fidelity in the backend.

I'm perfectly fine with either #1 or #2. Doing #1 would require a lot of
benchmarking to make sure the different canonicalization doesn't regress
lots of things, and probably a few fixes where it does regress things.
These are likely to be easy to find by looking for patterns that match
sext.

Doing #2 requires a plumbing these flags through the DAG. I don't yet know
how hard that would be, but it doesn't sound too bad. The current approach
I would probably take is to add an SDNodeProperty for NSW and NUW, and
nodes for operations with those flags. But that means generalizing
everything that currently only looks at ADD. There is probably a better way
to express this in the DAG, so if this is the direction folks thing LLVM
should go, point me at any ideas you have about how to best implement it.

Currently, I would naively go after #1 just because I know exactly how to
do that. But it doesn't feel quite right and I wonder if we should really
invest the time in making sure the backend is aware of the NSW and NUW
flags.

-Chandler

[1]: Note that we could fix this inside of loop bodies when its an
induction variable by widening the induction variable. But that doesn't fix
in the general case.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140104/81c75a9f/attachment.html>