[llvm-dev] Vector trunc code generation difference between llvm-3.9 and 4.0
Saurabh Verma via llvm-dev
llvm-dev at lists.llvm.org
Fri Feb 17 08:38:42 PST 2017
Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n)
{
    return a >> n;
}
Best regards
Saurabh
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com>
wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9 and llvm-4.0, which could lead to a
> performance regression for our target. Here is the C snippet:
>
> typedef signed v8i16_t __attribute__((ext_vector_type(8)))
>
> v8i16_t foo (v8i16 a, int n)
> {
> return result = a >> n;
> }
>
> With llvm-3.9, the generated sequence does a trunc followed by splat, but
> with llvm-4.0 it is reversed to a splat to a bigger vector followed by a
> v8i32->v8i16 trunc. Is this by design? The earlier code sequence is
> definitely better for our target, but are there known scenarios where the
> new sequence would lead to better code?
>
> Here are the instruction sequences generated in the two cases:
>
> With llvm 3.9:
>
> define <8 x i16> @foo(<8 x i16>, i32) #0 {
> %3 = trunc i32 %1 to i16
> %4 = insertelement <8 x i16> undef, i16 %3, i32 0
> %5 = shufflevector <8 x i16> %4, <8 x i16> undef, <8 x i32> zeroinitializer
> %6 = ashr <8 x i16> %0, %5
> ret <8 x i16> %6
> }
>
>
> With llvm 4.0:
>
> define <8 x i16> @foo(<8 x i16>, i32) #0 {
> %3 = insertelement <8 x i32> undef, i32 %1, i32 0
> %4 = shufflevector <8 x i32> %3, <8 x i32> undef, <8 x i32> zeroinitializer
> %5 = trunc <8 x i32> %4 to <8 x i16>
> %6 = ashr <8 x i16> %0, %5
> ret <8 x i16> %6
> }
>
> Best regards
> Saurabh Verma
>
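
For readers comparing the two quoted IR sequences, here is a rough scalar
model in plain C of what each ordering computes. It is only a sketch: the
function names, the LANES constant and the use of arrays in place of real
vector types are assumptions made for illustration, not anything taken from
LLVM or from the target in question. It also assumes 0 <= n < 16 (a larger
shift amount is poison for the <8 x i16> ashr) and a compiler that implements
>> on negative signed values as an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* hypothetical lane count, matching <8 x i16> */

/* Models the llvm-3.9 order: one scalar trunc, then a splat of the i16. */
void shift_trunc_then_splat(const int16_t a[LANES], int32_t n,
                            int16_t out[LANES])
{
    int16_t amt = (int16_t)n;              /* trunc i32 -> i16, done once */
    int16_t splat[LANES];
    for (int i = 0; i < LANES; ++i)
        splat[i] = amt;                    /* insertelement + shufflevector */
    for (int i = 0; i < LANES; ++i)
        out[i] = (int16_t)(a[i] >> splat[i]);  /* ashr <8 x i16> */
}

/* Models the llvm-4.0 order: splat the i32, then truncate every lane. */
void shift_splat_then_trunc(const int16_t a[LANES], int32_t n,
                            int16_t out[LANES])
{
    int32_t wide[LANES];
    int16_t narrow[LANES];
    for (int i = 0; i < LANES; ++i)
        wide[i] = n;                       /* splat into <8 x i32> */
    for (int i = 0; i < LANES; ++i)
        narrow[i] = (int16_t)wide[i];      /* trunc <8 x i32> to <8 x i16> */
    for (int i = 0; i < LANES; ++i)
        out[i] = (int16_t)(a[i] >> narrow[i]); /* ashr <8 x i16> */
}

/* Asserts that both orderings agree for every in-range shift amount. */
void check_orders_agree(const int16_t a[LANES])
{
    int16_t x[LANES], y[LANES];
    for (int32_t n = 0; n < 16; ++n) {
        shift_trunc_then_splat(a, n, x);
        shift_splat_then_trunc(a, n, y);
        assert(memcmp(x, y, sizeof x) == 0);
    }
}
```

Both functions produce the same lanes, so the two sequences differ only in
cost: the first performs one scalar truncation, while the second materialises
an 8 x i32 vector and truncates all eight lanes, which is presumably why the
3.9 form is cheaper on a target without a fast v8i32->v8i16 truncate.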