<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 17, 2016 at 11:12 AM, Michael Kuperstein <span dir="ltr">&lt;<a href="mailto:mkuper@google.com" target="_blank">mkuper@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">mkuper added a comment.<br>
<br>
Right, I'll add an explicit test (the test in induction_plus is *sort of* that, but not quite), thanks David!<br>
<br>
Regarding the extra movdqa - I think the copy may be necessary.<br>
The problem is that the loop body needs both to modify the current IV in place (because two-address instructions overwrite one of their source operands) and to keep its original value so that it can generate the new IV.<br>
<br></blockquote><div><br></div><div>You are right :)</div><div><br></div><div>David </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
GCC does something similar:<br>
<br>
jmp .L3<br>
[...]<br>
.L5:<br>
movdqa %xmm4, %xmm1<br>
.L3:<br>
movdqa %xmm1, %xmm4<br>
pxor %xmm6, %xmm1<br>
movdqa %xmm5, %xmm2<br>
addl $1, %eax<br>
cmpl $250, %eax<br>
paddd %xmm7, %xmm4<br>
pcmpgtd %xmm1, %xmm2<br>
movdqa %xmm1, %xmm3<br>
punpckhdq %xmm2, %xmm1<br>
punpckldq %xmm2, %xmm3<br>
paddq %xmm3, %xmm0<br>
paddq %xmm1, %xmm0<br>
jne .L5<br>
<br>
The difference is that the loop is rotated, so there is no copy on the first iteration, but other than that, we still have the same two movdqas per iteration.<br>
<br>
In any case, the extra mov disappears once we have AVX and three-address instructions, so we no longer update the current IV destructively.<br>
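<br>
As a rough, hand-written illustration (not actual compiler output; the register assignment is an assumption chosen to mirror the GCC listing above), the SSE2 form needs the copy because pxor overwrites %xmm1:<br>
<br>
movdqa %xmm1, %xmm4 # save the current IV before it is clobbered<br>
pxor %xmm6, %xmm1 # destructive: %xmm1 = %xmm1 ^ %xmm6<br>
paddd %xmm7, %xmm4 # new IV computed from the saved copy<br>
<br>
With the VEX-encoded three-operand forms available under AVX, something like the following would leave the IV intact, so no movdqa is needed:<br>
<br>
vpxor %xmm6, %xmm1, %xmm2 # %xmm2 = %xmm1 ^ %xmm6, %xmm1 untouched<br>
vpaddd %xmm7, %xmm1, %xmm1 # next IV, written only after the last use of the old one<br>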
<br>
<br>
<a href="http://reviews.llvm.org/D20315" rel="noreferrer" target="_blank">http://reviews.llvm.org/D20315</a><br>
<br>
<br>
<br>
</blockquote></div><br></div></div>