<div dir="ltr">Hi Chandler,<br><br>I've been looking at the regressions Quentin mentioned, and filed a PR<br>for the most egregious one: <a href="http://llvm.org/bugs/show_bug.cgi?id=22377">http://llvm.org/bugs/show_bug.cgi?id=22377</a><br><br>As for the others, I'm working on reducing them, but for now, here are<br>some raw observations, in case any of it rings a bell:<br><br><br>Another problem I'm seeing is that in some cases we can't fold memory anymore:<br>    vpermilps     $-0x6d, -0xXX(%rdx), %xmm2 ## xmm2 = mem[3,0,1,2]<br>    vblendps      $0x1, %xmm2, %xmm0, %xmm0<br>becomes:<br>    vmovaps       -0xXX(%rdx), %xmm2<br>    vshufps       $0x3, %xmm0, %xmm2, %xmm3 ## xmm3 = xmm2[3,0],xmm0[0,0]<br>    vshufps       $-0x68, %xmm0, %xmm3, %xmm0 ## xmm0 = xmm3[0,2],xmm0[1,2]<br><br><br>Also, I see differences in how some loads are shuffled that I'm a bit<br>conflicted about:<br>    vmovaps       -0xXX(%rbp), %xmm3<br>    ...<br>    vinsertps     $0xc0, %xmm4, %xmm3, %xmm5 ## xmm5 = xmm4[3],xmm3[1,2,3]<br>becomes:<br>    vpermilps     $-0x6d, -0xXX(%rbp), %xmm2 ## xmm2 = mem[3,0,1,2]<br>    ...<br>    vinsertps     $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]<br><br>Note that the second version does the shuffle in-place, in xmm2.<br><br><br>Some are blends (har har) of those two:<br>    vpermilps     $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]<br>    vpermilps     $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]<br>    vblendps      $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]<br>becomes:<br>    vmovaps       -0xXX(%rax), %xmm0 ## xmm0 = mem_2[0,1,2,3]<br>    vpermilps     $-0x6d, %xmm0, %xmm1 ## xmm1 = xmm0[3,0,1,2]<br>    vshufps       $0x3, %xmm_mem_1, %xmm0, %xmm0 ## xmm0 = xmm0[3,0],xmm_mem_1[0,0]<br>    vshufps       $-0x68, %xmm_mem_1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]<br><br><br>I also see a lot of somewhat neutral (focusing on Haswell for now)<br>domain changes such as (xmm5 and xmm0 are initially integers, 
and are<br>dead after the store):<br>    vpshufd       $-0x5c, %xmm0, %xmm0    ## xmm0 = xmm0[0,1,2,2]<br>    vpalignr      $0xc, %xmm0, %xmm5, %xmm0 ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]<br>    vmovdqu       %xmm0, 0x20(%rax)<br>turning into:<br>    vshufps       $0x2, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[2,0],xmm5[0,0]<br>    vshufps       $-0x68, %xmm5, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm5[1,2]<br>    vmovups       %xmm0, 0x20(%rax)<br><br><br>-Ahmed<br><br><br>On Fri, Jan 23, 2015 at 12:15 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>> wrote:<br>> Greetings LLVM hackers and x86 vector shufflers!<br>><br>> I would like to flip on another chunk of the new vector shuffling,<br>> specifically the logic to mark ~all shuffles as "legal".<br>><br>> This can be tested today with the flag<br>> "-x86-experimental-vector-shuffle-legality". I would essentially like to<br>> make this the default (by removing the "false" path). Doing this will allow<br>> me to completely delete the old vector shuffle lowering.<br>><br>> I've got the patches prepped and ready to go, but it will likely have a<br>> significant impact on performance. Notably, a bunch of the remaining domain<br>> crossing bugs I'm seeing are due to this. The key thing to realize is that<br>> vector shuffle combining is *much* more powerful when we think all of these<br>> are legal, and so we combine away bad shuffles that would trigger domain<br>> crosses.<br>><br>> All of my benchmarks have come back performance neutral overall with a few<br>> benchmarks improving. However, there may be some regressions that folks want<br>> to track down first. I'd really like to get those reported and prioritize<br>> among the vector shuffle work so we can nuke several *thousand* lines of<br>> code from X86ISelLowering.cpp. 
=D<br>><br>> Thanks!<br>> -Chandler<br>><br>><br>> PS: If you're feeling adventurous, the next big mode flip flag I want to see<br>> changed is -x86-experimental-vector-widening-legalization, but this is a<br>> much more deep change to the entire vector legalization strategy, so I want<br>> to do it second and separately.<br>><br>> _______________________________________________<br>> LLVM Developers mailing list<br>> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>><br></div>