<div dir="ltr"><span style="font-size:13px">Hi Matt, All, </span><div style="font-size:13px"><br></div><div style="font-size:13px">Thanks for looking into this and for your suggestions.</div><div style="font-size:13px"><br></div><div style="font-size:13px">I compiled the program at the -O3 optimization level (clang -O3) for an x86_64 target.</div><div style="font-size:13px">Following your suggestion, I ran the program for 10000 iterations and found the following average runtimes (I am attaching the collected data as well):</div><div style="font-size:13px"><br></div><div style="font-size:13px">Forward loop traversal: 2.006 milliseconds</div><div style="font-size:13px">Reverse loop traversal: 1.531 milliseconds</div><div style="font-size:13px"><br></div><div style="font-size:13px">Yes, I agree that this sample program is not sufficient to show that reversed loop traversal will always be faster; however, the difference in runtime is visible. That is where the profitability calculation comes into the picture. </div><div style="font-size:13px"><br></div><div style="font-size:13px">I found mentions of reversed loop traversal in various papers (links in my previous mail), and hence thought of implementing it.</div><div style="font-size:13px"><br></div><div style="font-size:13px">I am not sure, though, whether it is really helpful. Those papers mention that it might not be beneficial in itself, but that it opens up opportunities for other optimizations.</div><div style="font-size:13px"><br></div><div style="font-size:13px">Suggestions on this are most welcome. 
Waiting for others to pitch in too.</div><div style="font-size:13px"><br></div><div style="font-size:13px">Regards,</div><div style="font-size:13px">Vishal Sarda,</div><div style="font-size:13px">3rd Year Undergraduate,</div><div style="font-size:13px">Department of Computer Engineering,</div><div style="font-size:13px">College of Engineering, Pune</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 18, 2015 at 1:20 AM, Matt Godbolt <span dir="ltr">&lt;<a href="mailto:matt@godbolt.org" target="_blank">matt@godbolt.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 17, 2015 at 2:00 PM, vishal sarda <span dir="ltr">&lt;<a href="mailto:vishalksarda@gmail.com" target="_blank">vishalksarda@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">

        
<p style="margin-bottom:0.35cm;line-height:115%">[snip]</p><span class="">

<p style="margin-bottom:0.35cm;line-height:115%">Loop counting up: 0.15 ms</p>

<p style="margin-bottom:0.35cm;line-height:115%">Loop counting down: 0.08 ms</p></span></div></blockquote><div>I'm no LLVM expert, but as an interested bystander: I suspect you compiled the source without any optimizations applied. I tried to replicate this behaviour and found that the optimizer happily replaces both of the inner loops you had with a constant, and thus I got the same time on both loops (e.g. see <a href="http://goo.gl/aXFkVb" target="_blank">http://goo.gl/aXFkVb</a>). </div><div><br></div><div>Benchmarks of this nature, where the run time is so small, are notoriously prone to measurement errors, so I'd be a little careful drawing conclusions from the sample you listed. Also, what architecture did you measure on, and what spec is the machine?</div><div><br></div><div>Not at all to dissuade you from investigating this optimization! I'm just a little sceptical of the benchmark you posted!</div><div><br></div><div>Best regards, </div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Matt</div><div><br></div></font></span></div>

</div></div>

</blockquote></div><br></div>