<div dir="ltr">Hi Vishal,<div><br></div><div>I am still a little confused: can you share your new code? Be aware that the "unsigned int" you are adding to is overflowing, and clang will take advantage of what it knows about that arithmetic. (Strictly speaking, unsigned overflow wraps around and is well defined; it is signed overflow that is undefined behaviour.)</div><div><br></div><div>I was able to replicate your initial results, but I found that when I ran the counting code twice, both kinds of loop took about the same time on the second run:</div><div><br></div><div><div>$ make; and ./loop </div><div>make: Nothing to be done for `all'.</div><div>First run:</div><div>0:834</div><div>0:405</div><div>Second run:</div><div>0:381</div><div>0:384</div></div><div><br></div><div>(on a 3.5GHz Haswell).</div><div><br></div><div>I hacked together a very simplistic version that uses a memory barrier instead of an addition to stop the optimizer from removing the loop, so the loop actually gets compiled (<a href="http://url.godbolt.org/hqwkx">http://url.godbolt.org/hqwkx</a>). I also used the x86 CPU cycle counter (again very simplistically; there's a huge art to this, and I can share other resources if you're interested). Counting to 2.4 billion now takes an appreciable amount of time :)</div><div><br></div><div>The results from that:</div><div><br></div><div><div>/t/lop $ make; and ./loop2</div><div>First run:</div><div>Counting up:   2528555165 cycles</div><div>Counting down: 2524361064 cycles</div><div>Second run:</div><div>Counting up:   2495025808 cycles</div><div>Counting down: 2511668857 cycles</div></div><div><br></div><div>Source and makefile attached.</div><div><br></div><div>If you look at the code clang produces in both cases (see the URL), it has already transformed both loops into effectively the same code: adding to or subtracting from a counter until it reaches zero. 
This holds at least in this case, where the loop counter value itself isn't used.</div><div><br></div><div>Sorry I haven't added much to the discussion, but I hope this email is useful in gauging the relevance of the feature.</div><div><br></div><div>Best regards, Matt</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 18, 2015 at 10:33 AM, vishal sarda <span dir="ltr"><<a href="mailto:vishalksarda@gmail.com" target="_blank">vishalksarda@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span style="font-size:13px">Hi Matt, All, </span><div style="font-size:13px"><br></div><div style="font-size:13px">Thanks for looking into this and for your suggestions.</div><div style="font-size:13px"><br></div><div style="font-size:13px">I compiled the program at the -O3 optimization level (clang -O3) for an x86_64 target.</div><div style="font-size:13px">Following your suggestion, I ran the program 10,000 times and found the following average runtimes (the collected data is attached):</div><div style="font-size:13px"><br></div><div style="font-size:13px">Forward loop traversal: 2.006 ms</div><div style="font-size:13px">Reverse loop traversal: 1.531 ms</div><div style="font-size:13px"><br></div><div style="font-size:13px">Yes, I agree that this sample program may not be enough to show that reversed loop traversal is always faster; however, the difference in runtime is visible, and that is where the profitability calculation comes into the picture. </div><div style="font-size:13px"><br></div><div style="font-size:13px">I found loop reversal mentioned in various papers (linked in my previous mail), and hence thought of implementing it.</div><div style="font-size:13px"><br></div><div style="font-size:13px">I am not sure, though, whether it's really helpful. 
The papers above mention that it may not be beneficial in itself, but that it opens up opportunities for other optimizations.</div><div style="font-size:13px"><br></div><div style="font-size:13px">Suggestions on this are most welcome. I'm waiting for others to pitch in too.</div><span class=""><div style="font-size:13px"><br></div><div style="font-size:13px">Regards,</div><div style="font-size:13px">Vishal Sarda,</div><div style="font-size:13px">3rd Year Undergraduate,</div><div style="font-size:13px">Department of Computer Engineering,</div><div style="font-size:13px">College of Engineering, Pune</div></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 18, 2015 at 1:20 AM, Matt Godbolt <span dir="ltr"><<a href="mailto:matt@godbolt.org" target="_blank">matt@godbolt.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 17, 2015 at 2:00 PM, vishal sarda <span dir="ltr"><<a href="mailto:vishalksarda@gmail.com" target="_blank">vishalksarda@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">

        
<p style="margin-bottom:0.35cm;line-height:115%">[snip]</p><span>

<p style="margin-bottom:0.35cm;line-height:115%">Loop counting up: 0.15 ms</p>

<p style="margin-bottom:0.35cm;line-height:115%">Loop counting down: 0.08 ms</p></span></div></blockquote><div>I'm no LLVM expert, but as an interested bystander: I suspect you compiled the source without any optimizations applied. When I tried to replicate this behaviour, I found that the optimizer happily replaces both of your inner loops with a constant, so I got the same time for both loops (e.g. see <a href="http://goo.gl/aXFkVb" target="_blank">http://goo.gl/aXFkVb</a>).</div><div><br></div><div>Benchmarks of this nature, where the run time is so small, are notoriously prone to measurement error, so I'd be a little careful about drawing conclusions from the sample you listed. Also, what architecture did you measure on, and what spec was the machine?</div><div><br></div><div>This is not at all meant to dissuade you from investigating this optimization! I'm just a little sceptical of the benchmark you posted.</div><div><br></div><div>Best regards, </div><span><font color="#888888"><div><br></div><div>Matt</div><div><br></div></font></span></div>
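<div>To make the point above about the optimizer replacing both loops with a constant more concrete, here is a minimal C sketch. The original benchmark code is snipped from this thread, so this assumes the loops accumulated a running sum into an unsigned int and that unsigned int is 32 bits; the function names sum_up, sum_down and sum_closed_form are hypothetical. Because unsigned addition wraps around (well defined in C) and is associative and commutative, counting up and counting down give the same result, and both equal the closed form n(n-1)/2 reduced modulo 2^32. That is the kind of rewrite that can let clang drop such a loop entirely, and the same property is what makes reversing this particular loop legal.</div>

```c
#include <stdint.h>

/* Sum 0 + 1 + ... + (n-1), counting up. Assumes a 32-bit unsigned int;
   unsigned arithmetic wraps modulo 2^32, which is well defined in C. */
unsigned int sum_up(unsigned int n) {
    unsigned int s = 0;
    for (unsigned int i = 0; i < n; ++i)
        s += i;
    return s;
}

/* The same reduction with the loop reversed: i runs n-1, n-2, ..., 0.
   The post-decrement test also handles n == 0 without wrapping into a
   huge loop. */
unsigned int sum_down(unsigned int n) {
    unsigned int s = 0;
    for (unsigned int i = n; i-- > 0;)
        s += i;
    return s;
}

/* Closed form n(n-1)/2: the product is computed in 64 bits so it cannot
   overflow, then truncated to unsigned int, which reduces it modulo 2^32
   exactly as the wrapping loops do. */
unsigned int sum_closed_form(unsigned int n) {
    return (unsigned int)(((uint64_t)n * (n - 1u)) / 2u);
}
```

<div>With a signed int accumulator, the sum for n = 100000 (4999950000) would exceed INT_MAX, which is undefined behaviour; the wrapped result is only guaranteed for unsigned arithmetic.</div>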

</div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Matt</div>

</div>
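<div>As a rough illustration of the measurement approach described at the top of this thread (a memory barrier inside otherwise empty loops so the optimizer cannot remove them, plus the x86 cycle counter for timing), here is a minimal C sketch. It is not the attached source, which is not reproduced here. It assumes GCC or Clang, uses an empty asm statement with a "memory" clobber as the barrier, and falls back to clock_gettime on non-x86 targets; the names now_ticks, count_up, count_down and run_benchmark are hypothetical.</div>

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on non-x86 builds */
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Time-stamp counter. On modern x86 it ticks at a constant reference
   rate, so it approximates rather than equals core clock cycles. */
static uint64_t now_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for other architectures: monotonic clock in nanoseconds. */
static uint64_t now_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Compiler-only barrier (GCC/Clang syntax). It emits no instruction, but
   "volatile" keeps the optimizer from deleting it, so the loop must really
   execute n times, and the "memory" clobber makes the compiler assume
   memory may change here. It is not a CPU memory fence. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Run n empty iterations counting up; return elapsed ticks. */
uint64_t count_up(uint64_t n) {
    uint64_t start = now_ticks();
    for (uint64_t i = 0; i < n; ++i)
        COMPILER_BARRIER();
    return now_ticks() - start;
}

/* Run n empty iterations counting down to zero; return elapsed ticks. */
uint64_t count_down(uint64_t n) {
    uint64_t start = now_ticks();
    for (uint64_t i = n; i != 0; --i)
        COMPILER_BARRIER();
    return now_ticks() - start;
}

/* Time both loops twice, mirroring the "First run / Second run" output. */
void run_benchmark(uint64_t n) {
    for (int run = 1; run <= 2; ++run) {
        printf("Run %d:\n", run);
        printf("Counting up:   %llu ticks\n", (unsigned long long)count_up(n));
        printf("Counting down: %llu ticks\n", (unsigned long long)count_down(n));
    }
}
```

<div>Calling run_benchmark(2400000000u) roughly corresponds to the 2.4 billion iterations mentioned above. As the thread notes, there is a huge art to accurate cycle measurement; a serious harness would also need warm-up, many repeated runs, and care over CPU frequency scaling.</div>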