[llvm-dev] Vector trunc code generation difference between llvm-3.9 and 4.0

Fri Feb 17 08:21:25 PST 2017

Hello,

We are investigating a difference in code generation for vector splat
instructions between llvm-3.9 and llvm-4.0, which could lead to a
performance regression for our target. Here is the C snippet:

typedef signed short v8i16_t __attribute__((ext_vector_type(8)));

v8i16_t foo (v8i16_t a, int n)
{
   return a >> n;
}

With llvm-3.9, the generated sequence truncates the scalar shift amount to
i16 and then splats it. With llvm-4.0 the order is reversed: the i32 is
splatted into a wider <8 x i32> vector, which is then truncated to
<8 x i16>. Is this by design? The earlier code sequence is definitely
better for our target, but are there known scenarios where the new
sequence would lead to better code?

Here are the instruction sequences generated in the two cases:

With llvm 3.9:

define <8 x i16> @foo(<8 x i16>, i32) #0 {
  %3 = trunc i32 %1 to i16
  %4 = insertelement <8 x i16> undef, i16 %3, i32 0
  %5 = shufflevector <8 x i16> %4, <8 x i16> undef, <8 x i32> zeroinitializer
  %6 = ashr <8 x i16> %0, %5
  ret <8 x i16> %6
}

With llvm 4.0:

define <8 x i16> @foo(<8 x i16>, i32) #0 {
  %3 = insertelement <8 x i32> undef, i32 %1, i32 0
  %4 = shufflevector <8 x i32> %3, <8 x i32> undef, <8 x i32> zeroinitializer
  %5 = trunc <8 x i32> %4 to <8 x i16>
  %6 = ashr <8 x i16> %0, %5
  ret <8 x i16> %6
}
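
To make the comparison concrete, here is a minimal scalar model of the two
orderings in plain C. It is only a sketch: the helper names
(splat_after_trunc, trunc_after_splat, sequences_agree) and the LANES
constant are made up for illustration and are not LLVM APIs, and unsigned
lane types are used so that the narrowing conversion is well defined in C.

```c
#include <stdint.h>

#define LANES 8

/* llvm-3.9 order: truncate the scalar shift amount, then splat it. */
void splat_after_trunc(uint32_t n, uint16_t out[LANES])
{
    uint16_t narrow = (uint16_t)n;          /* trunc i32 -> i16 */
    for (int i = 0; i < LANES; ++i)
        out[i] = narrow;                    /* insertelement + shufflevector */
}

/* llvm-4.0 order: splat the i32 first, then truncate every lane. */
void trunc_after_splat(uint32_t n, uint16_t out[LANES])
{
    uint32_t wide[LANES];                   /* models the <8 x i32> splat */
    for (int i = 0; i < LANES; ++i)
        wide[i] = n;
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint16_t)wide[i];         /* trunc <8 x i32> -> <8 x i16> */
}

/* Returns 1 when both orders yield the same shift-amount vector. */
int sequences_agree(uint32_t n)
{
    uint16_t a[LANES], b[LANES];
    splat_after_trunc(n, a);
    trunc_after_splat(n, b);
    for (int i = 0; i < LANES; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both helpers produce the same eight lanes for any n, so the two IR forms
differ only in cost, not in result. The second form needs an intermediate
vector twice as wide (256 bits for <8 x i32>) plus a vector truncate, which
is where the extra work would come from on a target with, for example,
128-bit vector registers.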

Best regards,
Saurabh Verma