<div dir="ltr">Right, the intrinsic issue is mostly orthogonal (and hard: so far, nobody has come up with a really good proposal).<div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 4, 2016 at 8:46 AM, Alexey Bataev <span dir="ltr"><<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div bgcolor="#FFFFFF" text="#000000">

<p>Hi Suyog,</p>

<p>thanks for your comments.</p>

<p>1. I believe the intrinsic is a separate problem and should be implemented in a different patch.</p>

<p>2. I checked it: it works only for fast-math ops.<br>

</p>
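<p>To illustrate why the floating-point case needs fast-math, here is a minimal sketch that is not taken from the patch. The function names and the input values are made up for the example. It compares a strict left-to-right sum with the pairwise order used by the vectorized reduction; because floating-point addition is not associative, the two orders can give different results, so the reordering is only legal when reassociation is allowed.</p>

```python
def sequential_sum(p):
    """Strict left-to-right sum, as the scalar unrolled code would compute it."""
    total = 0.0
    for x in p:
        total += x
    return total


def tree_sum(p):
    """Pairwise reduction: fold the upper half of the lanes onto the lower half."""
    lanes = list(p)
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
    return lanes[0]


# Hypothetical input chosen so that rounding differs between the two orders:
# 1e16 + 1.0 rounds back to 1e16 in IEEE-754 double precision.
p = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]

print(sequential_sum(p))  # 3.0
print(tree_sum(p))        # 6.0
```

<p>With these inputs the sequential order loses the three small addends that follow 1e16, while the pairwise order cancels the large values first and keeps all six.</p>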

<pre class="m_5336677415844803805moz-signature" cols="72">Best regards,

Alexey Bataev</pre><div><div class="h5">

<div class="m_5336677415844803805moz-cite-prefix">On 10/04/2016 05:41 PM, suyog sarda wrote:<br>

</div>

<blockquote type="cite">

<div dir="ltr">As far as I understand, this patch tries to vectorize the horizontal sum in an unrolled loop in the following manner:

<div><br>

</div>

<div>vec0 = shuffle<p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7]></div>

<div>vec1 = shuffle vec0 <p[4], p[5], p[6], p[7],undef, undef, undef, undef></div>

<div>vec2 = add vec0, vec1                ---------> this will result in <p[0]+p[4], p[1]+p[5], p[2]+p[6], p[3]+p[7], undef, undef, undef, undef></div>

<div>vec3 = shuffle vec2 <p[2]+p[6], p[3]+p[7], undef, undef, undef, undef, undef, undef></div>

<div>vec4 = add vec2, vec3                ---------> this will result in <p[0]+p[4]+p[2]+p[6], p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef> </div>

<div>vec5 = shuffle vec4<p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef, undef></div>

<div>vec6 = add vec4, vec5                ---------> this will result in <p[0]+p[4]+p[2]+p[6] +p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef, undef></div>

<div>sum = extractelement vec6, 0</div>
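<div>The shuffle/add steps above can be sketched in plain Python. This is only a model of the lane arithmetic, not LLVM IR and not the patch's code; the name horizontal_sum is hypothetical. Undefined lanes are simply dropped instead of being carried as undef.</div>

```python
def horizontal_sum(p):
    """Log-step reduction over a power-of-two number of lanes."""
    lanes = list(p)
    width = len(lanes)
    assert width > 0 and width & (width - 1) == 0, "lane count must be a power of two"
    while width > 1:
        half = width // 2
        # "shuffle": bring lanes [half, width) down to positions [0, half)
        shuffled = lanes[half:width]
        # "add": only the low `half` lanes hold defined values afterwards
        lanes = [a + b for a, b in zip(lanes[:half], shuffled)]
        width = half
    # "extractelement ..., 0"
    return lanes[0]


print(horizontal_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

<div>For 8 lanes this takes three shuffle/add rounds, matching vec1 through vec6 above.</div>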

<div><br>

</div>

<div>This was discussed earlier too (<a href="https://marc.info/?l=llvm-dev&m=141106671810521&w=4" target="_blank">https://marc.info/?l=llvm-<wbr>dev&m=141106671810521&w=4</a>) </div>

<div>in a similar manner, and it was suggested to generate an intrinsic for the horizontal sum so that it can be lowered to target-specific code. </div>

<div><br>

</div>

<div>Also, does this patch take care of floating-point ops as described in <a href="https://marc.info/?l=llvm-commits&m=141892087031143&w=3" target="_blank">

https://marc.info/?l=llvm-<wbr>commits&m=141892087031143&w=3</a>? </div>

<div>I haven't checked the patch; I'm just pitching in with some relevant data from past discussions.</div>

<div><br>

</div>

<div>Regards,</div>

<div>Suyog      <br>

</div>

</div>

<div class="gmail_extra"><br>

<div class="gmail_quote">On Tue, Oct 4, 2016 at 5:31 PM, Alexey Bataev <span dir="ltr">

<<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

ABataev updated this revision to Diff 73458.<br>

ABataev added a comment.<br>

<br>

Added a comment + updated<br>

<span><br>

<br>

<a href="https://reviews.llvm.org/D24796" rel="noreferrer" target="_blank">https://reviews.llvm.org/D2479<wbr>6</a><br>

<br>

Files:<br>

  lib/Transforms/Vectorize/SLPVe<wbr>ctorizer.cpp<br>

  test/Transforms/SLPVectorizer/<wbr>X86/reduction_unrolled.ll<br>

</span>  test/Transforms/SLPVectorizer/<wbr>X86/scheduling.ll<br>

<br>

</blockquote>

</div>

<br>

</div>

</blockquote>

<br>

</div></div></div>


</blockquote></div><br></div></div></div>