<div dir="ltr">Right, the intrinsic issue is mostly orthogonal (and hard: so far, nobody has come up with a really good proposal).<div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 4, 2016 at 8:46 AM, Alexey Bataev <span dir="ltr"><<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Hi Suyog,</p>
<p>Thanks for your comments.</p>
<p>1. I believe the intrinsic is a separate problem and should be implemented in a different patch.</p>
<p>2. I checked it: for floating-point operations, it works only with fast-math.<br>
</p>
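<div>As a rough illustration of why floating-point reductions are restricted to fast-math: reassociating a sum into the tree order used by the shuffle reduction can change the rounded result. This is a hypothetical Python model, not code from the patch; <code>sequential_sum</code> and <code>tree_sum_4</code> are illustrative names, and the 4-lane pairing (lane i with lane i+2, then lane 0 with lane 1) is assumed to mirror the shuffle-and-add pattern described in this thread.</div>
<pre>
```python
# Hypothetical model: the source-order sum versus the tree order a
# shuffle-based reduction would use. IEEE doubles round differently
# depending on the order of the additions.

def sequential_sum(p):
    # Left-to-right accumulation, as written in the unrolled scalar loop.
    total = 0.0
    for x in p:
        total += x
    return total


def tree_sum_4(p):
    # Assumed 4-lane reduction order: lane i + lane i+2, then lane 0 + lane 1.
    a0, a1, a2, a3 = p
    return (a0 + a2) + (a1 + a3)


p = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(p))  # 1.0 (1e16 + 1.0 rounds back to 1e16)
print(tree_sum_4(p))      # 2.0
```
</pre>
<div>Because the two orders give different answers, the vectorizer may only reorder the additions when reassociation is permitted, which is what the fast-math flags allow.</div>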
<pre class="m_5336677415844803805moz-signature" cols="72">Best regards,
Alexey Bataev</pre><div><div class="h5">
<div class="m_5336677415844803805moz-cite-prefix">On 10/04/2016 05:41 PM, suyog sarda wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">As far as I understand, this patch tries to vectorize the horizontal sum in an unrolled loop in the following manner:
<div><br>
</div>
<div>vec0 = shuffle <p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7]></div>
<div>vec1 = shuffle vec0 <p[4], p[5], p[6], p[7], undef, undef, undef, undef></div>
<div>vec2 = add vec0, vec1 ---------> this will result in <p[0]+p[4], p[1]+p[5], p[2]+p[6], p[3]+p[7], undef, undef, undef, undef></div>
<div>vec3 = shuffle vec2 <p[2]+p[6], p[3]+p[7], undef, undef, undef, undef, undef, undef></div>
<div>vec4 = add vec2, vec3 ---------> this will result in <p[0]+p[4]+p[2]+p[6], p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef></div>
<div>vec5 = shuffle vec4 <p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef, undef></div>
<div>vec6 = add vec4, vec5 ---------> this will result in <p[0]+p[4]+p[2]+p[6]+p[1]+p[5]+p[3]+p[7], undef, undef, undef, undef, undef, undef, undef></div>
<div>sum = extractelement vec6, 0</div>
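<div>The lane bookkeeping in the sequence above can be checked with a small scalar model: each step shuffles the upper half of the live lanes down and adds, so an 8-lane sum takes log2(8) = 3 shuffle/add pairs followed by one extract. This is a hypothetical Python sketch, not the SLP vectorizer's implementation; a list stands in for a vector register, <code>None</code> stands in for an undef lane, and the helper names are illustrative.</div>
<pre>
```python
# Hypothetical scalar model of the log2 shuffle-and-add horizontal reduction.
# A Python list plays the role of a vector register; None marks an undef lane.

def shuffle_upper_half(vec, width):
    """Move lanes [width/2, width) down to lanes [0, width/2); the rest become undef."""
    half = width // 2
    return vec[half:width] + [None] * (len(vec) - half)


def add_lanes(a, b):
    """Lane-wise add; a lane that involves undef stays undef."""
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def horizontal_sum(p):
    width = len(p)
    if width == 0 or bin(width).count("1") != 1:
        raise ValueError("vector width must be a power of two")
    vec = list(p)  # vec0: lanes p[0] .. p[width-1]
    while width != 1:
        shuffled = shuffle_upper_half(vec, width)  # vec1, vec3, vec5, ...
        vec = add_lanes(vec, shuffled)             # vec2, vec4, vec6, ...
        width //= 2
    return vec[0]  # extractelement vec, 0


print(horizontal_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```
</pre>
<div>After the first step the model holds <code>[6, 8, 10, 12, None, None, None, None]</code>, matching the <code>vec2</code> pattern p[0]+p[4], ..., p[3]+p[7] followed by undef lanes.</div>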
<div><br>
</div>
<div>This was discussed earlier too (<a href="https://marc.info/?l=llvm-dev&m=141106671810521&w=4" target="_blank">https://marc.info/?l=llvm-<wbr>dev&m=141106671810521&w=4</a>) </div>
<div>in a similar manner, and it was suggested to generate an intrinsic for the horizontal sum so that it can be lowered to target-specific code.</div>
<div><br>
</div>
<div>Also, does this patch take care of floating-point ops as described in <a href="https://marc.info/?l=llvm-commits&m=141892087031143&w=3" target="_blank">
https://marc.info/?l=llvm-<wbr>commits&m=141892087031143&w=3</a>?</div>
<div>I haven't checked the patch; I'm just pitching in with some relevant past discussion.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Suyog <br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Oct 4, 2016 at 5:31 PM, Alexey Bataev <span dir="ltr">
<<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
ABataev updated this revision to Diff 73458.<br>
ABataev added a comment.<br>
<br>
Added a comment + updated<br>
<span><br>
<br>
<a href="https://reviews.llvm.org/D24796" rel="noreferrer" target="_blank">https://reviews.llvm.org/D2479<wbr>6</a><br>
<br>
Files:<br>
lib/Transforms/Vectorize/SLPVe<wbr>ctorizer.cpp<br>
test/Transforms/SLPVectorizer/<wbr>X86/reduction_unrolled.ll<br>
</span> test/Transforms/SLPVectorizer/<wbr>X86/scheduling.ll<br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div></div></div>