[llvm] r211679 - [X86] Add target combine rule to select ADDSUB instructions from a build_vector

Wed Jun 25 08:09:57 PDT 2014

On Wed, Jun 25, 2014 at 3:54 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Andrea Di Biagio" <Andrea_DiBiagio at sn.scee.net>, "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>
>> Sent: Wednesday, June 25, 2014 6:57:11 AM
>> Subject: Re: [llvm] r211679 - [X86] Add target combine rule to select ADDSUB instructions from a build_vector
>>
>> Hi Hal,
>>
>> I can confirm that after r211427 we correctly match a 'subadd' into an
>> x86 ADDSUB instruction.
>> To verify this I piped into 'llc -mcpu=corei7-avx' the output of
>> test 'addsub.ll' added at revision r211339.
>
> Good; thanks for looking at this...
>
>>
>> So, if you run
>>
>> opt < addsub.ll -basicaa -slp-vectorizer -S | llc -march=x86-64
>> -mcpu=corei7-avx | less
>>
>> you can see how we correctly emit an 'addsub' instruction in the case
>> of function @fsubfadd.
>> We don't emit an 'addsub' for the other functions but that's fine for
>> the following reasons:
>>  1- 'addsub' on x86 is only for packed float vectors (therefore we
>> cannot match an 'addsub' in case of functions @addsub and @subadd);
>
> Yes, this is a different issue.
>
>>  2- the semantics of 'addsub' on x86 require that even-numbered
>> elements are subtracted (not added). That means we cannot match an
>> 'addsub' in function @faddfsub.
>
> You're right; however, I think that we can still do this in combination with another instruction: subadd(x, y) == addsub(x, (-1, -1, -1, -1, ...)*y), i.e. addsub(x, -y) [and the multiply can be lowered as an xor on the sign bits, IIRC, which is more efficient than actually using a floating-point multiply]. In intrinsics, the negation would be something like _mm256_xor_ps(y, _mm256_set_ps(-0.0f, -0.0f, ...)). Is this reasonable?

True. It makes sense.
I will do some experiments and, if they work out, I will post a patch for review.
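
For reference, here is a minimal scalar sketch of that identity in C. It
assumes the ADDSUBPS lane semantics described in this thread (even lanes
subtract, odd lanes add) and models the xor with -0.0f as a flip of the
IEEE-754 sign bit. The helper names (addsub, flip_sign, subadd_via_addsub)
are illustrative only; they are not LLVM code or intrinsics. Note that the
sign flip is applied to every lane of y, matching an all -0.0f mask.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Scalar model of x86 ADDSUBPS: even lanes subtract, odd lanes add. */
static void addsub(const float *x, const float *y, float *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (i % 2 == 0) ? x[i] - y[i] : x[i] + y[i];
}

/* Negate a float by flipping its sign bit (an XOR with the bits of -0.0f). */
static float flip_sign(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits ^= UINT32_C(0x80000000);
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* "subadd" pattern (even lanes add, odd lanes subtract) built from addsub. */
static void subadd_via_addsub(const float *x, const float *y, float *out) {
    float neg_y[LANES];
    for (int i = 0; i < LANES; ++i)
        neg_y[i] = flip_sign(y[i]);   /* one xor per lane, no multiply */
    addsub(x, neg_y, out);
}
```

With x = (1, 2, 3, 4) and y = (4, 3, 2, 1), subadd_via_addsub yields
(1+4, 2-3, 3+2, 4-1), which is the add-even/subtract-odd pattern that
'addsub' alone cannot express.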

-Andrea

>
> Thanks again,
> Hal
>
>>
>> Instead, we correctly match an 'addsub' in the case of function
>> @fsubfadd.
>>
>> On x86, apart from function @fsubfadd, the other functions cannot be
>> translated using 'addsubps/addsubpd' (and their AVX variants).
>>
>> I hope this helps :-)
>>
>> Andrea
>>
>> On Wed, Jun 25, 2014 at 12:01 PM, Andrea Di Biagio
>> <andrea.dibiagio at gmail.com> wrote:
>> > Hi Hal,
>> >
>> >
>> >
>> > On Wed, Jun 25, 2014 at 11:35 AM, Hal Finkel <hfinkel at anl.gov>
>> > wrote:
>> >> Does this mean that we now match the form of these produced by the
>> >> SLP vectorizer? (see r211339)
>> >>
>> >>  -Hal
>> >>
>> >
>> > Interesting, apparently I missed that commit...
>> > I'll have a look at it now.
>> >
>> > -Andrea
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory