[cfe-dev] [llvm-dev] Vector trunc code generation difference between llvm-3.9 and 4.0
Akira Hatanaka via cfe-dev
cfe-dev at lists.llvm.org
Wed Mar 8 19:28:56 PST 2017
There were several patches (r278501 was the first) that fixed vector shift bugs. I don’t think the IR changes were intentional.
I’m not sure if it’s the right solution, but inserting an integral cast before the CK_VectorSplat cast in checkVectorShift makes IRGen emit the trunc before the splat.
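As a rough illustration of why moving the cast should be semantically safe, here is a small C sketch that models the two orderings from the thread below with plain arrays. It is not clang or LLVM code; the names trunc_then_splat, splat_then_trunc, orders_agree and LANES are made up for this example. It only models how the shift-amount vector is built, not the shift itself.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8  /* hypothetical: matches the 8-lane v8i16_t in the thread */

/* llvm-3.9 order: one scalar trunc (i32 -> i16), then splat into 8 lanes. */
static void trunc_then_splat(int32_t n, uint16_t out[LANES]) {
    uint16_t t = (uint16_t)n;           /* single scalar truncation */
    for (int i = 0; i < LANES; i++)
        out[i] = t;
}

/* llvm-4.0 order: splat the i32 into 8 wide lanes, then trunc every lane. */
static void splat_then_trunc(int32_t n, uint16_t out[LANES]) {
    int32_t wide[LANES];
    for (int i = 0; i < LANES; i++)
        wide[i] = n;                    /* <8 x i32> splat */
    for (int i = 0; i < LANES; i++)
        out[i] = (uint16_t)wide[i];     /* lane-wise <8 x i32> -> <8 x i16> */
}

/* Returns 1 if both orders produce the same shift-amount vector for n. */
int orders_agree(int32_t n) {
    uint16_t a[LANES], b[LANES];
    trunc_then_splat(n, a);
    splat_then_trunc(n, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions should yield identical lanes for any n, since truncation is applied to the same value in every lane. The difference is only in cost: the first form needs one scalar truncation, while the second works on a vector twice as wide before narrowing it.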
> On Mar 8, 2017, at 7:21 AM, Sanjay Patel via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> The regression for the reported case should be avoided after:
> https://reviews.llvm.org/rL297232
> https://reviews.llvm.org/rL297242
> https://reviews.llvm.org/rL297280
>
> It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited.
>
> On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> Yes, there is an IR difference between clang 3.9.1 and clang trunk before any IR transforms are done:
> https://godbolt.org/g/FuBqIb
>
> We can't solve this problem (moving a trunc ahead of other vector ops) in general in IR because we take a conservative approach to vector transforms in IR. That means the burden for solving the general problem falls on the front-end or the back-end. If you can bisect to find the clang commit where this changed, that would be very helpful.
>
> However, I think we can handle a very specific case (a too fat splat) in IR in instcombine, and it will resolve this exact example. This will take a couple of patches to restore your example. Here's a proposal for the first one:
> https://reviews.llvm.org/D30123
>
>
> On Sat, Feb 18, 2017 at 12:33 AM, Saurabh Verma <saurabh.verma at movidius.com> wrote:
> Thanks Sanjay. Interestingly, for me -disable-llvm-optzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed?
>
> Best regards
> Saurabh
>
>
> On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote:
> I think this is caused by a front-end change (cc'ing clang-dev) because the IR with "-Xclang -disable-llvm-optzns" shows the difference.
>
> But independently of that, there's a missing IR canonicalization - instcombine doesn't currently do anything with either version.
>
> And the version where we trunc later survives through the backend and produces worse code even for x86 with AVX2:
> before:
> vmovd %edi, %xmm1
> vpmovzxwq %xmm1, %xmm1
> vpsraw %xmm1, %xmm0, %xmm0
> retq
>
> after:
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vmovdqa LCPI1_0(%rip), %ymm2
> vpshufb %ymm2, %ymm1, %ymm1
> vpermq $232, %ymm1, %ymm1
> vpmovzxwd %xmm1, %ymm1
> vpmovsxwd %xmm0, %ymm0
> vpsravd %ymm1, %ymm0, %ymm0
> vpshufb %ymm2, %ymm0, %ymm0
> vpermq $232, %ymm0, %ymm0
> vzeroupper
>
>
> So this example may have won the bug lottery by exposing front-, middle-, and back-end bugs all at once. :)
>
>
>
> On Fri, Feb 17, 2017 at 9:38 AM, Saurabh Verma via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Correction in the C snippet:
>
> typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
>
> v8i16_t foo (v8i16_t a, int n)
> {
>     return a >> n;
> }
>
> Best regards
> Saurabh
>
>
>
> On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat instructions between llvm-3.9 and llvm-4.0, which could lead to a performance regression for our target. Here is the C snippet:
>
> typedef signed v8i16_t __attribute__((ext_vector_type(8)))
>
> v8i16_t foo (v8i16 a, int n)
> {
> return result = a >> n;
> }
>
> With llvm-3.9, the generated sequence does a trunc followed by splat, but with llvm-4.0 it is reversed to a splat to a bigger vector followed by a v8i32->v8i16 trunc. Is this by design? The earlier code sequence is definitely better for our target, but are there known scenarios where the new sequence would lead to better code?
>
> Here are the instruction sequences generated in the two cases:
>
> With llvm 3.9:
>
> define <8 x i16> @foo(<8 x i16>, i32) #0 {
>   %3 = trunc i32 %1 to i16
>   %4 = insertelement <8 x i16> undef, i16 %3, i32 0
>   %5 = shufflevector <8 x i16> %4, <8 x i16> undef, <8 x i32> zeroinitializer
>   %6 = ashr <8 x i16> %0, %5
>   ret <8 x i16> %6
> }
>
>
> With llvm 4.0:
>
> define <8 x i16> @foo(<8 x i16>, i32) #0 {
>   %3 = insertelement <8 x i32> undef, i32 %1, i32 0
>   %4 = shufflevector <8 x i32> %3, <8 x i32> undef, <8 x i32> zeroinitializer
>   %5 = trunc <8 x i32> %4 to <8 x i16>
>   %6 = ashr <8 x i16> %0, %5
>   ret <8 x i16> %6
> }
>
> Best regards
> Saurabh Verma
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev