<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Jan 16, 2013, at 11:26 PM, Dimitri Tcaciuc &lt;<a href="mailto:dtcaciuc@gmail.com">dtcaciuc@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Hello everyone,<div><br></div><div style="">For context, I have a small loop written in a custom front-end that can be expressed fairly accurately with the following C program:</div><div style="">
<br></div><div style=""><div> struct Array {</div><div> double * data;</div><div> long n;</div><div> };</div><div><br></div><div> #define X 0</div><div> #define Y 1</div><div> #define Z 2</div>
<div><br></div><div><div> void f(struct Array * restrict d, struct Array * restrict out, const long n)</div><div> {</div><div> for (long i = 0; i &lt; n; ++i) {</div><div> for (long j = i + 1; j &lt; n; ++j) {</div>
<div> out->data[X] = d->data[i * 3 + X] * d->data[j * 3 + X];</div><div> out->data[Y] = d->data[i * 3 + Y] * d->data[j * 3 + Y];</div><div> out->data[Z] = d->data[i * 3 + Z] * d->data[j * 3 + Z];</div>
</div><div><div> }</div><div> }</div><div> }</div></div><div><br></div><div><br></div><div style="">I'm looking through the IR transformations performed by the passes that LLVMTargetMachine::addPassesToEmitFile adds, and I'm seeing something I could use some help understanding. The point of interest is between the 'unreachableblockelim' and 'codegenprepare' passes. Here is a paste of the IR after each pass:</div>
<div style=""><br></div><div style=""> <a href="http://pastebin.com/42xLT4ZN">http://pastebin.com/42xLT4ZN</a></div><div style=""><br></div><div style=""><br></div><div style="">I've annotated three spots in the code with stars. In (1), after unreachableblockelim, addr89 is computed once outside the loop and is used by the store in (2). However, in (3), after codegenprepare, a bunch of math is now done on every loop iteration to compute the address for the same store. It looks like the same thing is happening for several of the addresses above as well.</div>
<div style=""><br></div><div style="">Does this look right? Why would those calculations be moved back into the loop?</div></div></div></blockquote><br></div><div>Basically, codegenprepare thinks that the memory accesses within the loop are using a free(ish) addressing mode that your target will be able to fold into the load or store itself. So it sinks the address computations into the loop on the assumption that addressing-mode matching in ISel will kick in, for the same net performance with less register pressure.</div><div><br></div><div>--Owen</div><br></body></html>