[PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

Mon May 11 05:37:16 PDT 2015

Hi Elena,

My hope with this patch is that the vectorizer recognizes that the backend will
not need to generate the truncate instruction.

In my case, where scalar operations are only supported on i32, the IR could
look like this:

%a = load i16, ...
; ... (load 7 more data points)
%b = load i16, ...
; ... (load 7 more data points)
%c = sext i16 %a to i32
; ... (repeat for rest of the elements)
%d = sext i16 %b to i32
; ... (repeat for rest of the elements)
%e = add i32 %c, %d
; ... (again, repeat)
%f = trunc i32 %e to i16
; ...
store i16 %f, ...
; ...

The vectorizer doesn't touch this code because it considers the truncations
and extensions too expensive, but that is only because it doesn't know that
the smaller types are supported by the vector registers. With the patch, the
vectorizer is able to produce this IR:

%a = load <8 x i16>, ...
%b = load <8 x i16>, ...
%c = add <8 x i16> %a, %b
store <8 x i16> %c, ...
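Why the extensions and the truncate can be dropped: the low 16 bits of a sum
depend only on the low 16 bits of its operands, so sign-extending to i32,
adding and truncating back to 16 bits gives the same bits as a wrapping 16-bit
add. A small C++ check of that identity (the function names are illustrative
only):

```cpp
#include <cassert>
#include <cstdint>

// Scalar form from the example: sext i16 -> add i32 -> trunc back to 16 bits.
uint16_t addViaI32(int16_t a, int16_t b) {
  // No overflow: the sum lies in [-65536, 65534].
  int32_t wide = static_cast<int32_t>(a) + static_cast<int32_t>(b);
  // Truncate: keep the low 16 bits (conversion to unsigned is modulo 2^16).
  return static_cast<uint16_t>(wide);
}

// Narrow form: add directly in 16 bits with wrap-around, no extensions.
uint16_t addInI16(int16_t a, int16_t b) {
  uint16_t ua = static_cast<uint16_t>(a);  // same bit pattern as a
  uint16_t ub = static_cast<uint16_t>(b);  // same bit pattern as b
  return static_cast<uint16_t>(ua + ub);   // wraps modulo 2^16
}
```

The same argument holds lane by lane for `add <8 x i16>`, and for sub, mul and
the bitwise ops; it does not hold for operations such as division or right
shifts, whose low result bits depend on the high bits of the operands.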

Could you elaborate on whether this would still not be the case for AVX-512,
or whether it is just not legal?

Cheers,

Sam

From: Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com] 
Sent: 11 May 2015 12:13
To: Sam Parker
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: [PATCH] improve vectorizers by removing cost of unnecessary
truncs and exts.

+        if (Opcode == Instruction::Trunc) {
+          if (TTI->isTypeLegal(DstVecTy)) {
+            VecCost = 0;
+          }

On AVX-512 the "truncate" is usually one instruction, so VecCost should be 1.

On AVX the type may be legal, but the "truncate" takes more than one
instruction.
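The point above is that a legal destination type does not, on its own, make
the truncate free. A hypothetical sketch that keeps the two questions apart:
charge zero only when the truncate is known to fold away entirely, and
otherwise charge whatever the target needs. `TargetCosts`,
`vectorTruncCost` and the flag are made up for illustration; this is not
LLVM's TargetTransformInfo interface.

```cpp
#include <cassert>

// Hypothetical per-target summary; not LLVM's TargetTransformInfo.
struct TargetCosts {
  bool dstVecTypeLegal;      // is the narrow destination vector type legal?
  unsigned truncInstrCount;  // instructions needed to emit a vector trunc
};

// Hypothetical cost rule for a vector trunc.
// truncFoldsAway: the trunc only undoes an extension the vectorizer can drop
// (e.g. sext i16 -> add i32 -> trunc i16 becomes a plain add <8 x i16>).
unsigned vectorTruncCost(const TargetCosts &target, bool truncFoldsAway) {
  if (truncFoldsAway && target.dstVecTypeLegal)
    return 0;                     // no truncate instruction is emitted
  return target.truncInstrCount;  // otherwise pay what the target needs
}
```

With `truncInstrCount` set to 1 this matches the AVX-512 figure mentioned
above; a target that needs a multi-instruction sequence would supply a
larger count.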

-           Elena

From: llvm-commits-bounces at cs.uiuc.edu
[mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Sam Parker
Sent: Monday, May 11, 2015 13:57
To: llvm-commits at cs.uiuc.edu
Subject: [PATCH] improve vectorizers by removing cost of unnecessary truncs
and exts.

Hi,

I've attached a patch to both the loop vectorizer and the slp-vectorizer which
checks whether truncs and extensions would actually be required if the code
were vectorized. This lets the vectorizers understand that the cost of these
instructions is effectively zero once vectorization happens. This is helpful
when working on smaller data types, such as i8 and i16, that do not have
native support in general-purpose registers but are supported in vector
register files.
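A hypothetical sketch of the kind of check described above, on a toy IR
representation. `Node`, `extIsRedundant` and the op-name strings are made up
for illustration; they are not LLVM classes and not the patch's actual code.
An extension is treated as unnecessary when every user is an operation whose
low result bits depend only on the low bits of its operands, all of that
user's operands are extended from the same narrow width, and the user's
result is only ever truncated back to that width.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal IR node for illustration (not llvm::Instruction).
struct Node {
  std::string op;               // "load", "sext", "zext", "add", "trunc", ...
  unsigned bits;                // result width in bits
  std::vector<Node *> operands;
  std::vector<Node *> users;
};

static bool isExt(const Node &n) { return n.op == "sext" || n.op == "zext"; }

// Ops whose low N result bits depend only on the low N operand bits.
static bool lowBitsOnly(const std::string &op) {
  return op == "add" || op == "sub" || op == "mul" ||
         op == "and" || op == "or" || op == "xor";
}

// True if the extension can be dropped and its users computed narrow.
bool extIsRedundant(const Node &ext) {
  if (!isExt(ext) || ext.operands.size() != 1 || ext.users.empty())
    return false;
  unsigned narrow = ext.operands[0]->bits;
  for (const Node *u : ext.users) {
    if (!lowBitsOnly(u->op) || u->users.empty())
      return false;
    // Every operand must also be extended from the same narrow width.
    for (const Node *o : u->operands)
      if (!isExt(*o) || o->operands.size() != 1 ||
          o->operands[0]->bits != narrow)
        return false;
    // The wide result may only be truncated back to the narrow width.
    for (const Node *t : u->users)
      if (t->op != "trunc" || t->bits != narrow)
        return false;
  }
  return true;
}
```

For the scalar example earlier in the thread (two i16 loads, two sexts to
i32, an add, and a trunc back to i16) both extensions are reported as
redundant; if the i32 add also had another user, such as a store, they would
not be.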

Regards,

Sam

