<div dir="ltr">So, I know there are a lot of mixed feelings about NSW and NUW, but I'm running into a common pattern where they really help:<div><br></div><div>void f(char *array, int i) {</div><div> // do some stuff...</div>
<div> g(array[i++]);</div><div><br></div><div> // do some more stuff...</div><div> g(array[i++]);</div><div><br></div><div> // more of the same</div><div>}</div><div><br></div><div>So, this kind of code comes up pretty frequently. Unrolled loops with an int IV[1], encoding, decoding, compression, etc. What we'd like to do is take the sequence of N increments and turn them into N addressing modes with a constant offset. Then add a small constant at the end and test the flags if we're in a loop, etc. This is *much faster* than clobbering registers to do LEA or inc or add for each one (and yea, we do LEA and use lots of registers currently).</div>
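To make the target concrete, here is a hand-written sketch (value names are mine) of the IR shape that would let the backend fold each access into an addressing mode with a constant displacement, leaving a single increment at the end:

  %idx = sext i32 %i to i64
  %p0 = gep inbounds i8* %array, i64 %idx    ; folds to [array + idx]
  %o1 = add nsw i64 %idx, 1
  %p1 = gep inbounds i8* %array, i64 %o1     ; folds to [array + idx + 1]
  %i.next = add nsw i32 %i, 2                ; one small add at the end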
The problem is that this is being compiled in 64-bit mode, so the index is 32 bits and gets sign extended prior to the GEP. We end up with a lot of repetitions of:

  %a = add nsw i32 %i, N              ; for some small immediate integer N
  %b = sext i32 %a to i64
  %ptr = gep inbounds i8* %base, i64 %b

When we try to match this pattern in the backend to an x86 addressing mode, we fail because we can't look through the sext. This is particularly frustrating because we *should* be able to look through the sext thanks to the NSW bit on the add, but we don't preserve the NSW constraint into the DAG.
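For what it's worth, the rewrite the NSW bit licenses is easy to state in IR: sign extension distributes over an add that cannot wrap. A minimal sketch of the equivalence (N is the same small immediate as above):

  ; what we get today:
  %a = add nsw i32 %i, N
  %b = sext i32 %a to i64
  ; what NSW lets us treat it as:
  %i.64 = sext i32 %i to i64
  %b.eq = add nsw i64 %i.64, N

And (add nsw (sext %i), N) is exactly the shape the x86 addressing mode matcher can fold as [%base + %i.64 + N].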
How can we fix this? I see the following options:

1) Canonicalize the other way in the IR. This could take a few different forms:
1a) Widen math (where NSW or NUW allows) which is only used by an *ext feeding a GEP index.
1b) Widen all math which is only used by an *ext that targets a legal integer size. (There is a sketch of this rewrite after the list.)

2) Add the NSW (and NUW) bits to the DAG, and query them in the address mode matching to match across (sext (op nsw ...)) and (zext (op nuw ...)).
3) Add a complete hack to CodeGenPrepare (or similar) which does #1 as a non-canonical step in the IR.

I dislike #3 intensely -- we really want to be able to combine on this stuff in a canonical way, so I want the pattern to either always be (gep ... (sext (op nsw ...))) or (gep ... (op nsw (sext ...))). It also feels like quite a hack around the lack of fidelity in the backend.
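To illustrate the widening in 1a/1b, here is a minimal hand-written sketch (the values and constants are made up for illustration): narrow math whose only user is an *ext gets rewritten so the extension happens first.

  ; before: the zext is the only user of the narrow add
  %t = add nuw i8 %x, 7
  %w = zext i8 %t to i32
  ; after: i32 is a legal type, and NUW makes this equivalent
  %x.w = zext i8 %x to i32
  %w.eq = add nuw i32 %x.w, 7

1a is the same rewrite restricted to exts that feed a GEP index.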
I'm perfectly fine with either #1 or #2. Doing #1 would require a lot of benchmarking to make sure the different canonicalization doesn't regress lots of things, and probably a few fixes where it does regress things. Those regressions are likely to be easy to find by looking for patterns that match sext.
Doing #2 requires plumbing these flags through the DAG. I don't yet know how hard that would be, but it doesn't sound too bad. The approach I would probably take is to add an SDNodeProperty for NSW and NUW, and nodes for operations with those flags. But that means generalizing everything that currently only looks at ADD. There is probably a better way to express this in the DAG, so if this is the direction folks think LLVM should go, point me at any ideas you have about how best to implement it.
Currently, I would naively go after #1 just because I know exactly how to do that. But it doesn't feel quite right, and I wonder if we should really invest the time in making sure the backend is aware of the NSW and NUW flags.
-Chandler

[1]: Note that we could fix this inside loop bodies when it's an induction variable by widening the induction variable. But that doesn't fix the general case.
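(For [1], the loop-only fix I mean is roughly what induction variable widening does -- a hand-written sketch, names illustrative:

  loop:
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]  ; widened to i64 up front
    %p = gep inbounds i8* %base, i64 %iv              ; no sext needed in the loop
    ...
    %iv.next = add nuw nsw i64 %iv, 1

Straight-line code like f() above gets no such help.)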