[llvm-dev] X86 TRUNCATE cost for AVX & AVX2 mode

Mon Apr 11 08:35:05 PDT 2016

Hi,

Some time ago I put a lot of work into refactoring the cost calculation for all X86 targets:
http://reviews.llvm.org/D15604
However, that revision was not accepted.

I fixed the conversions there, and I assume that truncation suffers from the same problem.
I used a "SplitFactor" to handle wide types.
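
The patch itself is not reproduced in this message, so the following is only a hypothetical illustration of what a split factor could mean, not the code from D15604. It assumes a wide vector is costed as the number of legal-width registers it occupies times the cost of the cast on one legal-width piece, and it ignores any cost of recombining the pieces. The names splitFactor and splitCost are invented for this sketch.

```cpp
#include <cassert>

// Hypothetical helper: number of RegBits-wide registers needed to hold a
// vector of NumElts elements that are EltBits wide each.
unsigned splitFactor(unsigned NumElts, unsigned EltBits, unsigned RegBits) {
  unsigned TotalBits = NumElts * EltBits;
  return (TotalBits + RegBits - 1) / RegBits; // round up
}

// Hypothetical estimate: cost of the cast on one legal-width piece times the
// number of pieces. Recombining the pieces is deliberately not modelled.
unsigned splitCost(unsigned LegalCost, unsigned Factor) {
  return LegalCost * Factor;
}

// v16i32 is 512 bits, so it occupies two 256-bit registers. With the AVX2
// cost of 2 for TRUNCATE v8i32 to v8i8 quoted later in this thread, the
// estimate under these assumptions is 2 * 2.
void checkSplitExample() {
  unsigned Factor = splitFactor(16, 32, 256);
  assert(Factor == 2);
  assert(splitCost(2, Factor) == 4);
}
```

Under these assumptions the estimate for TRUNCATE v16i32 to v16i8 on AVX2 is far below the 30 found through the table fallback, although a real model would also have to account for merging the two halves.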

I'd be happy if you tried to revive this work, or part of it, because the huge cost numbers cause a non-optimal vectorization factor to be chosen.

- Elena

From: Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
Sent: Monday, April 11, 2016 16:51
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: X86 TRUNCATE cost for AVX & AVX2 mode

Hi,

I was going through X86TTIImpl::getCastInstrCost and have a question about the cost
calculation for the TRUNCATE instruction in AVX mode.

Neither AVX2ConversionTbl nor AVXConversionTbl defines a cost for
TRUNCATE v16i32 to v16i8, so the lookup falls back to SSE41ConversionTbl, where
it finds a cost of 30. A cost of 30 looks very high for this operation.

I am wondering why such a high cost was chosen; any pointers that help explain it would be appreciated.
In a few cases it blocks better vectorization opportunities.

Other observations:
The cost for TRUNCATE v16i32 to v16i8 in SSE2ConversionTbl is 7.
The cost for TRUNCATE v8i32 to v8i8 is 2 in AVX2 mode and 4 in AVX mode.
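
To make the fallback concrete, here is a minimal sketch of the lookup order as I understand it. It is a simplified model, not the actual code in X86TTIImpl::getCastInstrCost: the CostEntry type, the Level enum, and the lookup and truncateCost functions are invented for illustration, and the tables contain only the entries quoted in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for one row of a conversion cost table.
// Hypothetical type; not LLVM's actual table entry.
struct CostEntry {
  const char *Dst;
  const char *Src;
  int Cost;
};

// Only the TRUNCATE entries quoted in this message are modelled.
static const CostEntry AVX2Tbl[] = {{"v8i8", "v8i32", 2}};
static const CostEntry AVXTbl[] = {{"v8i8", "v8i32", 4}};
static const CostEntry SSE41Tbl[] = {{"v16i8", "v16i32", 30}};
static const CostEntry SSE2Tbl[] = {{"v16i8", "v16i32", 7}};

// Returns the cost of the matching row, or -1 if the table has no such row.
template <std::size_t N>
int lookup(const CostEntry (&Tbl)[N], const char *Dst, const char *Src) {
  for (std::size_t I = 0; I != N; ++I)
    if (!std::strcmp(Tbl[I].Dst, Dst) && !std::strcmp(Tbl[I].Src, Src))
      return Tbl[I].Cost;
  return -1;
}

// Feature levels, ordered oldest to newest.
enum Level { SSE2, SSE41, AVX, AVX2 };

// Consult the tables from the newest supported level down; the first hit wins.
int truncateCost(Level L, const char *Dst, const char *Src) {
  int C;
  if (L >= AVX2 && (C = lookup(AVX2Tbl, Dst, Src)) >= 0)
    return C;
  if (L >= AVX && (C = lookup(AVXTbl, Dst, Src)) >= 0)
    return C;
  if (L >= SSE41 && (C = lookup(SSE41Tbl, Dst, Src)) >= 0)
    return C;
  return lookup(SSE2Tbl, Dst, Src); // -1 if no table has an entry
}

// Reproduces the numbers quoted above.
void checkQuotedCosts() {
  assert(truncateCost(AVX2, "v16i8", "v16i32") == 30); // falls through to SSE4.1
  assert(truncateCost(SSE2, "v16i8", "v16i32") == 7);
  assert(truncateCost(AVX2, "v8i8", "v8i32") == 2);
  assert(truncateCost(AVX, "v8i8", "v8i32") == 4);
}
```

In this model an SSE2-only target reports 7 for TRUNCATE v16i32 to v16i8, while AVX and AVX2 targets report 30 because the SSE4.1 entry is hit first.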

Thanks,
Ashutosh
