<p dir="ltr">+LLVMdev (sorry for not broadcasting earlier)</p>

<div class="gmail_quote">On 5 May 2015 12:40, "suyog sarda" <<a href="mailto:sardask01@gmail.com">sardask01@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Nadav,<div><br></div><div>I stumbled upon one more question (sorry for not raising it earlier). </div><div>The query below applies when -mavx2 is specified as the target feature. <br><br>As I understand it, AVX1 is a subset of AVX2. In SLP, we get the scalar reduction cost in the getReduction() function, which queries the TTI (TargetTransformInfo) via getArithmeticInstrCost().<br><br>Now for integer ADD, since AVX2 added support for integer arithmetic, the entries for ADD (SUB/MUL) are missing from AVX2CostTable (which is what you also pointed out earlier).</div><div>The lookup fails to find an entry and moves on to the subsequent checks. When it reaches the AVX1 check, it specifically requires that AVX2 is not enabled. </div><div><br></div><div>(ST->hasAVX() && !ST->hasAVX2())</div><div><br></div><div>Since we have specified -mavx2, this check also fails and the lookup falls back to BaseTTI. </div><div><br></div><div>Shouldn't it just check for hasAVX(), since AVX1 is a subset of AVX2?<br></div><div><div><br></div><div>(ST->hasAVX())</div></div><div><br></div><div>I have a case with integer ADD as the reduction op. When I specify AVX2, the scalar cost is much lower than with AVX1, and hence the code is not vectorized at all.</div><div> If AVX2 vector instructions are costly, shouldn't it fall back to AVX1 and generate AVX1 vector instructions? </div><div><br></div><div>Correct me if I am wrong somewhere. 
Awaiting your comments :) </div><div><br></div><div>Thanks.</div><div><br></div><div>Regards,</div><div>Suyog  <br></div><div> </div><div><br></div><div><br></div><div> </div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 4, 2015 at 11:20 PM, Nadav Rotem <span dir="ltr"><<a href="mailto:nrotem@apple.com" target="_blank">nrotem@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span><blockquote type="cite"><div>On May 4, 2015, at 10:23 AM, suyog sarda <<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>> wrote:</div><br><div><p dir="ltr">Thanks Nadav for the info. It clears up my query :)</p><p dir="ltr">Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is lower than with AVX1.</p><p dir="ltr">One query though - shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (which is generic). IMO, it would make things clearer. </p><p dir="ltr">Your thoughts on this?</p><div><br></div></div></blockquote><div><br></div></span><div>I prefer that we continue to rely on TargetLowering in order to avoid duplicating the cost information. </div></div><span><div><blockquote type="cite"><div><p dir="ltr">Regards,<br>

Suyog Sarda</p>

<div class="gmail_quote">On 4 May 2015 21:57, "Nadav Rotem" <<a href="mailto:nrotem@apple.com" target="_blank">nrotem@apple.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

> On May 4, 2015, at 2:36 AM, suyog sarda <<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>> wrote:<br>

><br>

> Hi all,<br>

><br>

> I have a query regarding the cost table for AVX2 in TargetTransformInfo.<br>

><br>

> The table consists of entries for shift and div operations only. There are no entries for ADD, SUB and MUL in the AVX2 cost table. Those entries are present in the cost table for AVX.<br>

<br>

Most of the cost information is inferred from the TargetLowering tables (where operations are marked as Legal, Custom, etc.). Only exceptional instructions need to be recorded in the TargetTransformInfo cost tables.<br>
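
<br>

A minimal sketch of that two-tier lookup, in case it helps to see it spelled out. Everything here is hypothetical: ToyTarget, Action, OpKey, makeToyAVX2 and the cost numbers are made up for illustration and are not LLVM's real types or values. It only shows the shape of the idea: an explicit table for the exceptions, and a legality-derived default for everything else.<br>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not LLVM's real types.
enum class Action { Legal, Custom, Expand };
using OpKey = std::pair<std::string, std::string>; // (opcode, vector type)

struct ToyTarget {
    std::map<OpKey, Action> lowering;     // plays the role of the TargetLowering tables
    std::map<OpKey, unsigned> exceptions; // plays the role of a TTI cost table

    unsigned cost(const std::string &op, const std::string &ty) const {
        const OpKey key{op, ty};
        // 1. Exceptional instructions carry an explicit cost.
        auto ex = exceptions.find(key);
        if (ex != exceptions.end())
            return ex->second;
        // 2. Everything else is inferred from how the operation is lowered.
        auto lo = lowering.find(key);
        const Action action = (lo == lowering.end()) ? Action::Expand : lo->second;
        switch (action) {
        case Action::Legal:  return 1;  // assumed: one native instruction
        case Action::Custom: return 2;  // assumed: a short custom sequence
        case Action::Expand: return 10; // assumed: expensive expansion
        }
        return 10; // not reached; keeps compilers quiet
    }
};

// Builds a toy AVX2-like target: integer add is legal, division is an exception.
inline ToyTarget makeToyAVX2() {
    ToyTarget t;
    t.lowering[OpKey{"add", "v8i32"}] = Action::Legal;
    t.exceptions[OpKey{"sdiv", "v8i32"}] = 20; // made-up exceptional cost
    return t;
}
```

With this toy target, an "add" on "v8i32" needs no table entry at all: it is marked Legal, so it gets the cheap default, while "sdiv" is picked up from the exception table.<br>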

<br>

><br>

> The reason for the query is this: when my subtarget feature is AVX2, SLP vectorization does not find an entry in the cost table while calculating the scalar cost of ADD, and falls back to the default implementation, which returns a cost of 1. For AVX, it finds ADD in the cost table and returns 4 as the scalar cost.<br>

<br>

><br>

> I suspect this comes from an architectural difference between AVX and AVX2. I am not familiar with the architecture specifics in this case.<br>

<br>

I assume that this is integer ADD, because AVX1 only supported floating-point arithmetic on 256-bit vectors, while AVX2 added support for 256-bit integer arithmetic. So, it makes sense that the cost that AVX1 gives this operation is much higher.<br>
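
<br>

To make the consequence for the table lookup concrete, here is a small hedged sketch. ToySubtarget and both functions are hypothetical and heavily simplified (in particular, the non-AVX case is lumped into the generic fallback); the 4 and 1 are just the figures quoted in this thread. It compares a lookup that only consults the AVX1 entry when AVX2 is absent with one that consults it whenever AVX is present.<br>

```cpp
#include <cassert>

// Hypothetical feature flags; not LLVM's real subtarget class.
struct ToySubtarget {
    bool avx;
    bool avx2;
};

// Figures quoted in the thread for integer ADD; treated here as illustrative.
constexpr unsigned kAVX1TableCost = 4; // AVX1 has no 256-bit integer add
constexpr unsigned kFallbackCost = 1;  // generic default when no entry matches

// AVX1 entry is consulted only when AVX2 is absent.
inline unsigned addCostGuarded(ToySubtarget st) {
    if (st.avx && !st.avx2)
        return kAVX1TableCost;
    return kFallbackCost; // AVX2 has no ADD entry, so the default applies
}

// AVX1 entry is consulted whenever AVX is present, including on AVX2.
inline unsigned addCostUnguarded(ToySubtarget st) {
    if (st.avx)
        return kAVX1TableCost;
    return kFallbackCost;
}
```

Under these assumptions, dropping the AVX2 exclusion would make an AVX2 target report the AVX1 figure for an operation that AVX2 handles natively on 256-bit vectors.<br>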

<br>

<br>

><br>

> I would be glad if someone could clarify this.<br>

><br>

> Thanks.<br>

><br>

> Regards,<br>

> Suyog Sarda<br>

<br>

</blockquote></div>

</div></blockquote></div><br></span></div></blockquote></div><br></div>

</blockquote></div>