[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Andrea Di Biagio andrea.dibiagio at gmail.com
Mon Sep 15 09:03:08 PDT 2014


On Mon, Sep 15, 2014 at 1:57 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Andrea, Quentin:
>
> Ok, everything for blendps, insertps, movddup, movsldup, movshdup, unpcklps,
> and unpckhps is committed and should generally be working. I've not tested
> it *super* thoroughly (will do this ASAP) so if you run into something
> fishy, don't burn lots of time on it.

Ok.

>
> I've also fixed a number of issues I found in the nightly test suite and
> things like gcc-loops. I think there are still a couple of regressions I
> spotted in the nightly test suite, but haven't gotten to them yet.
>
> I've got very rudimentary support for pblendw finished and committed. There
> is a much more fundamental change that is really needed for pblendw support
> though -- currently, the blend lowering strategy assumes this instruction
> doesn't exist and thus picks a deeply wrong strategy in some cases... Not
> sure how much this is even relevant though.
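>
> For reference, this is the kind of lane-preserving v8i16 mask that a single
> pblendw covers (a hypothetical example, just for illustration, not taken
> from the test suite):
>
> ;;;
> define <8 x i16> @blend16(<8 x i16> %A, <8 x i16> %B) {
>   %1 = shufflevector <8 x i16> %A, <8 x i16> %B, <8 x i32> <i32 0, i32 9,
> i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
>   ret <8 x i16> %1
> }
> ;;;
>
> With SSE4.1 this should be expressible as a single
> "pblendw $0xAA, %xmm1, %xmm0" (taking the odd words from %xmm1).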
>
>
> Anyways, it's almost certainly useful to look into any non-test-suite
> benchmarks you have, or to run the benchmarks on non-Intel hardware. Let me
> know how it goes! So far, with the fixes I've landed recently, I'm seeing
> more improvements than regressions on the nightly test suite. =]

Cool!
I'll have a look at it. I will let you know how it goes.
Thanks for working on this :-).

-Andrea

>
> -Chandler
>
> On Wed, Sep 10, 2014 at 3:36 AM, Andrea Di Biagio
> <andrea.dibiagio at gmail.com> wrote:
>>
>> On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>> > Awesome, thanks for all the information!
>> >
>> > See below:
>> >
>> > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio
>> > <andrea.dibiagio at gmail.com>
>> > wrote:
>> >>
>> >> You have already mentioned how the new shuffle lowering is missing
>> >> some features; for example, you explicitly said that we currently lack
>> >> SSE4.1 blend support. Unfortunately, this seems to be one of the
>> >> main reasons for the slowdown we are seeing.
>> >>
>> >> Here is a list of what we found so far that we think is causing most
>> >> of the slowdown:
>> >> 1) shufps is always emitted in cases where we could emit a single
>> >> blendps; in these cases, blendps is preferable because it has better
>> >> reciprocal throughput (this is true on all modern Intel and AMD cpus).
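>> >> For instance, a shuffle where every element stays in its lane (a
>> >> hypothetical case, just to illustrate):
>> >> ;;;
>> >> define <4 x float> @blend(<4 x float> %A, <4 x float> %B) {
>> >>   %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,
>> >> i32 1, i32 6, i32 7>
>> >>   ret <4 x float> %1
>> >> }
>> >> ;;;
>> >> can be done either with "shufps $0xe4, %xmm1, %xmm0" or with
>> >> "blendps $12, %xmm1, %xmm0"; the latter has the better reciprocal
>> >> throughput.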
>> >
>> >
>> > Yep. I think this is actually super easy. I'll add support for blendps
>> > shortly.
>>
>> Thanks Chandler!
>>
>> >
>> >> 3) When a shuffle performs an insert at index 0, we always generate an
>> >> insertps, while a movss would do a better job.
>> >> ;;;
>> >> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {
>> >>   %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,
>> >> i32 1, i32 2, i32 3>
>> >>   ret <4 x float> %1
>> >> }
>> >> ;;;
>> >>
>> >> llc (-mcpu=corei7-avx):
>> >>   vmovss %xmm1, %xmm0, %xmm0
>> >>
>> >> llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):
>> >>   vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]
>> >
>> >
>> > So, this is hard. I think we should do this in MC after register
>> > allocation, because movss is the worst instruction ever: it switches from
>> > blending with the destination to zeroing the destination when the source
>> > switches from a register to a memory operand. =[ I would like to not emit
>> > movss in the DAG *ever*, and teach the MC combine pass to run after
>> > register allocation (and thus after spills have been emitted). This way we
>> > can match both patterns: when insertps is zeroing the other lanes and the
>> > operand is from memory, and when insertps is blending into the other
>> > lanes and the operand is in a register.
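>> >
>> > To spell out the movss asymmetry (a sketch of the ISA behaviour, not of
>> > any particular codegen output):
>> >
>> >   movss %xmm1, %xmm0   # xmm0 = xmm1[0],xmm0[1,2,3]   (blends into dest)
>> >   movss (%rdi), %xmm0  # xmm0 = mem[0],zero,zero,zero  (zeroes the rest)
>> >
>> > Both forms correspond to an insertps variant (with or without the zero
>> > mask), so a late pass that already knows whether the operand is a register
>> > or a spill slot can safely rewrite insertps into movss.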
>> >
>> > Does that make sense? If so, would you be up for looking at this side of
>> > things? It seems nicely separable.
>>
>> I think it is a good idea and it makes sense to me.
>> I will start investigating on this and see what can be done.
>>
>> Cheers,
>> Andrea
>
>



More information about the llvm-dev mailing list