[PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

Mon May 11 05:59:39 PDT 2015

I did not realize that the "truncate" would disappear.

%a = load <8 x i16>
%b = load <8 x i16>
%c = add <8 x i16>
store <8 x i16>, %c <== do you need to truncate before the store? Your original is "store i8".

-           Elena

From: Sam Parker [mailto:sam.parker at arm.com]
Sent: Monday, May 11, 2015 15:37
To: Demikhovsky, Elena
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: [PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

Hi Elena,

My hope with this patch is that the vectorizer recognizes that the backend will not need to generate the truncate instruction.
In my case, where only i32 types are supported for scalar operations, the IR could look like this:

%a = load i16,
... (load 7 more data points)
%b = load i16,
... (load 7 more data points)
%c = sext i8 %a to i32
...  (repeat for rest of the elements)
%d = sext i8 %b to i32
...  (repeat for rest of the elements)
%e = add %c, %d
... (again, repeat)
%f = trunc i32 %e to i8
...
store i8 %f,
...
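
For illustration only: scalar IR of this shape typically comes from a source loop over narrow integers, because C and C++ integer promotion widens the operands to int before the add and narrows the result on the store. The sketch below uses 16-bit elements to match the <8 x i16> vector form; the function name add_i16 is hypothetical and is not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example loop, not part of the patch.
// Each a[i] and b[i] is promoted to int (the sext in the IR), added at
// 32 bits, and narrowed back to 16 bits on assignment (the trunc).
void add_i16(const std::int16_t *a, const std::int16_t *b,
             std::int16_t *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // For sums outside the int16_t range the narrowing is
    // implementation-defined before C++20 and modular from C++20 on.
    c[i] = static_cast<std::int16_t>(a[i] + b[i]);
  }
}
```

With n = 8 this is the kind of loop where one would hope to see two vector loads, one 16-bit vector add and one vector store.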

The vectorizer doesn't touch it because performing the truncations and extensions looks too expensive, but that is only because it doesn't know that the vector registers support the smaller types. With the patch, the vectorizer is able to produce this IR:

%a = load <8 x i16>
%b = load <8 x i16>
%c = add <8 x i16>
store <8 x i16>, %c

Could you elaborate on whether this would still not be the case for AVX-512, or whether this is simply not legal?

Cheers,
Sam

From: Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com]
Sent: 11 May 2015 12:13
To: Sam Parker
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: [PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

+        if (Opcode == Instruction::Trunc) {
+          if (TTI->isTypeLegal(DstVecTy)) {
+            VecCost = 0;
+          }

On AVX-512 the "truncate" is usually one instruction, so the VecCost should be 1.
On AVX the type may be legal, but the "truncate" takes more than one instruction.
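
A toy sketch of the distinction being drawn here, under stated assumptions: ToyTarget and both cost functions are hypothetical, they are not the LLVM TargetTransformInfo interface, and the fallback when the type is not legal is a guess because the quoted hunk does not show it.

```cpp
// Toy model only; none of these names exist in LLVM.
struct ToyTarget {
  bool narrowVecTypeLegal;   // is the truncated vector type legal?
  unsigned truncInstrCount;  // instructions emitted for one vector trunc
};

// Rule in the quoted hunk: a legal destination type makes the trunc free.
// Falling back to the instruction count otherwise is an assumption.
unsigned truncCostLegalMeansFree(const ToyTarget &t) {
  return t.narrowVecTypeLegal ? 0u : t.truncInstrCount;
}

// Alternative raised in the review: charge what the backend would emit,
// whether or not the destination type is legal.
unsigned truncCostByInstrCount(const ToyTarget &t) {
  return t.truncInstrCount;
}
```

With truncInstrCount set to 1 for an AVX-512-like target and to some placeholder value above 1 for an AVX-like target, both with a legal narrow type, the first rule reports 0 for both targets while the second keeps them apart.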

-           Elena

From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Sam Parker
Sent: Monday, May 11, 2015 13:57
To: llvm-commits at cs.uiuc.edu
Subject: [PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

Hi,

I've attached a patch to both the loop vectorizer and the SLP vectorizer which checks whether truncs and extensions would actually be required if the code were vectorized. This lets the vectorizers understand that the cost of these instructions is effectively zero once vectorization happens. This is helpful when working on smaller data types, such as i8 and i16, that do not have native support in general-purpose registers but are supported in vector register files.
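
As a small worked check of why the extend and truncate pair can be dropped for an add (a sketch, not code from the patch; both function names are hypothetical): the low 16 bits of a 32-bit sum of sign-extended operands equal the 16-bit wrapping sum of the original bit patterns, so a 16-bit vector lane can compute the same result directly.

```cpp
#include <cstdint>

// Wide path: sign-extend to 32 bits, add, then keep the low 16 bits.
std::uint16_t addViaSextAndTrunc(std::int16_t a, std::int16_t b) {
  std::int32_t wide = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(wide));
}

// Narrow path: wrapping add on the raw 16-bit patterns, as a 16-bit
// vector lane would perform it.
std::uint16_t addNarrow(std::int16_t a, std::int16_t b) {
  std::uint16_t ua = static_cast<std::uint16_t>(a);
  std::uint16_t ub = static_cast<std::uint16_t>(b);
  return static_cast<std::uint16_t>(ua + ub);
}
```

The two functions agree for every pair of inputs, including the overflow cases such as 32767 + 1 and -32768 + -1.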

Regards,
Sam

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------