<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Jan 5, 2014 at 12:17 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">----- Original Message -----<br>

> From: "Chandler Carruth" <<a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>><br>

> To: "LLVM Developers Mailing List" <<a href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a>><br>

> Sent: Saturday, January 4, 2014 7:29:07 PM<br>

> Subject: [LLVMdev] A question about everyone's favorite constructs: NSW and   NUW<br>

><br>

><br>

><br>

> So, I know there are a lot of mixed feelings about NSW and NUW, but<br>

> I'm running into a common pattern where they really help:<br>

><br>

><br>

> void f(char *array, int i) {<br>

> // do some stuff...<br>

> g(array[i++]);<br>

><br>

><br>

> // do some more stuff...<br>

> g(array[i++]);<br>

><br>

><br>

> // more of the same<br>

> }<br>

><br>

><br>

> So, this kind of code comes up pretty frequently. Unrolled loops with<br>

> an int IV[1], encoding, decoding, compression, etc. What we'd like<br>

> to do is take the sequence of N increments and turn them into N<br>

> addressing modes with a constant offset. Then add a small constant<br>

> at the end and test the flags if we're in a loop, etc. This is *much<br>

> faster* than clobbering registers to do LEA or inc or add for each<br>

> one (and yea, we do LEA and use lots of registers currently).<br>

><br>

><br>

> The problem is that this is being compiled in 64-bit mode. So the<br>

> index is 32-bits and gets sign extended prior to the GEP. So we end<br>

> up with a lot of repetitions of:<br>

><br>

><br>

> %a = add nsw i32 %i, N # for some immediate small integer N<br>

> %b = sext i32 %i to i64<br>

> %ptr = gep inbounds i8* %base, i64 %b<br>

><br>

><br>

> When we try to match this pattern in the backend to an x86 addressing<br>

> mode we fail because we can't look through the sext. This is<br>

> particularly frustrating because we *should* be able to look through<br>

> the sext due to the NSW bit on the add, but we don't preserve NSW<br>

> constraints into the DAG.<br>

><br>

><br>

> How can we fix this? I see the following options:<br>

><br>

><br>

> 1) Canonicalize the other way in the IR. This could take a few<br>

> different forms:<br>

> 1a) Widen math (where NSW or NUW allows) which is only used by an<br>

> *ext feeding a GEP index.<br>

> 1b) Widen all math which is only used by an *ext that targets a legal<br>

> integer size.<br>

<br>

</div></div>I'm in favor of having NSW/NUW information available at the SDAG level (and even at the MI level), but for other reasons.</blockquote><div><br></div><div>Cool. Could you suggest how best to implement this (especially in the SDAG)? I don't like my ideas thus far.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> For this problem, I vote for solution 1b. My rationale is that this would solve a problem that I currently have in the PowerPC backend for 64-bit targets in a general way:<br>


<br>

On 64-bit PPC targets, return values are (zero or sign) extended to 64-bits. Here's a simple example that demonstrates the issue:<br>

<br>

define signext i32 @foo(i1 zeroext %a) #0 {<br>

entry:<br>

  %cond = select i1 %a, i32 7, i32 12<br>

  ret i32 %cond<br>

}<br>

<br>

compiles into something like this:<br>

<br>

        ...<br>

        li r4, 7<br>

        li r3, 12<br>

        isel r3, r4, r3, 1<br>

        rldicl r3, r3, 0, 32 <-- this is a completely unnecessary extension<br>

        blr<br>

<br>

Obviously, this affects cases other than when the extension is part of the return value (although the return value case is a pretty obvious code wart).<br>

<br>

One possible solution to this that I've thought about (although not yet experimented with) is 'promoting' all 32-bit operations to 64-bit ones up front. That, however, might lead to a bunch of unnecessary extensions of its own. I think that the 1b solution would nicely solve this problem without introducing unnecessary extensions.<br>

</blockquote><div><br></div><div>It feels like there are way better solutions to this. Why can't you just fix this by looking for a sext used by ret in a function which is already signext? It seems like you could match this in the SDAG and fix it very directly?</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

One issue that might come up is that 64-bit operations might be more expensive (either in latency or in throughput) than the corresponding 32-bit ones (I'm thinking specifically about multiplication and division here, but if we're spilling a lot, there could also be stack-size and bandwidth effects), and we might want some target input on that.<br>

</blockquote><div><br></div><div>Oh, its worse than this. I'm no longer liking #1 at all.</div><div><br></div><div><br></div><div>If we hoist the extension over math which has flags allowing us to do so safely, we actually lose significant information. The wider the type on which we have the NSW flag, the less information we have about the known value range of the inputs. While this may not have a significant impact in many benchmarks, it seems like a very worrisome change to make when picking the canonical form because we essentially say that we will *never* be able to preserve this information even if some day it matters a lot. =/</div>

<div><br></div><div>I'm increasingly interested in getting NSW to reach the SDAG so that we can actually do the extension when we know we have a fast 64-bit operation available in the form of an addressing mode.</div>

</div></div></div>