[llvm] r211679 - [X86] Add target combine rule to select ADDSUB instructions from a build_vector

Andrea Di Biagio andrea.dibiagio at gmail.com
Thu Jun 26 04:00:20 PDT 2014


Hi Hal,

It turns out it was very easy to improve the folding so that we also
select an 'addsub' for the flipped-sign variant, with the help of a
'xorps'.
The fix is now committed at revision 211771.
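
For reference, the identity the fix relies on can be illustrated with
SSE3 intrinsics (a minimal sketch only, not the actual DAG-combine
code; the helper name 'subadd_ps' is made up for the example):

  #include <pmmintrin.h>  /* SSE3: _mm_addsub_ps */

  /* Add the even-numbered lanes and subtract the odd-numbered lanes.
   * ADDSUBPS does the opposite (subtract even, add odd), so we first
   * flip the sign bit of every lane of y -- the 'xorps' -- and then
   * addsub(x, -y) yields exactly the add/sub pattern we want. */
  static __m128 subadd_ps(__m128 x, __m128 y) {
    __m128 neg_y = _mm_xor_ps(y, _mm_set1_ps(-0.0f)); /* xorps, sign mask */
    return _mm_addsub_ps(x, neg_y);                   /* addsubps        */
  }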

I also verified that we correctly select an ADDSUBPS in the case of
function @faddfsub in test 'Transforms/SLPVectorizer/X86/addsub.ll'.
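
For readers unfamiliar with the instruction, this scalar reference
model (just a sketch of the lane semantics from the SSE3 documentation,
not code taken from the tree) shows why @faddfsub needs the sign flip
described above:

  /* ADDSUBPS: even-numbered lanes are subtracted, odd-numbered lanes
   * are added. */
  void addsubps_model(const float a[4], const float b[4], float r[4]) {
    r[0] = a[0] - b[0];  /* even lane: subtract */
    r[1] = a[1] + b[1];  /* odd lane:  add      */
    r[2] = a[2] - b[2];
    r[3] = a[3] + b[3];
  }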

Cheers,
Andrea

On Wed, Jun 25, 2014 at 4:09 PM, Andrea Di Biagio
<andrea.dibiagio at gmail.com> wrote:
> On Wed, Jun 25, 2014 at 3:54 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> ----- Original Message -----
>>> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>> Cc: "Andrea Di Biagio" <Andrea_DiBiagio at sn.scee.net>, "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>
>>> Sent: Wednesday, June 25, 2014 6:57:11 AM
>>> Subject: Re: [llvm] r211679 - [X86] Add target combine rule to select ADDSUB instructions from a build_vector
>>>
>>> Hi Hal,
>>>
>>> I can confirm that after r211427 we correctly match a 'subadd' into an
>>> x86 ADDSUB instruction.
>>> To verify this I piped the output of test 'addsub.ll' (added at revision
>>> r211339) into 'llc -mcpu=corei7-avx'.
>>
>> Good; thanks for looking at this...
>>
>>>
>>> So, if you run
>>>
>>> opt < addsub.ll -basicaa -slp-vectorizer -S | llc -march=x86-64
>>> -mcpu=corei7-avx | less
>>>
>>> you can see how we correctly emit an 'addsub' instruction in the case
>>> of function @fsubfadd.
>>> We don't emit an 'addsub' for the other functions but that's fine for
>>> the following reasons:
>>>  1- 'addsub' on X86 exists only for packed floating-point vectors
>>> (therefore we cannot match an 'addsub' in the case of functions @addsub
>>> and @subadd);
>>
>> Yes, this is a different issue.
>>
>>>  2- the semantics of 'addsub' on x86 require that even-numbered
>>> elements are subtracted (not added). That means we cannot match an
>>> 'addsub' in function @faddfsub.
>>
>> You're right, however, I think that we can still do this in combination with another instruction: subadd(x, y) == addsub(x, (-1, -1, -1, -1, ...)*y) [and the multiply by -1 can be lowered as an xor on the sign bits, IIRC, which is more efficient than actually using a floating-point multiply]. In intrinsics, this would be something like _mm256_xor_ps(y, _mm256_set_ps(-0.0f, -0.0f, ...)). Is this reasonable?
>
> True. It makes sense.
> I will do some experiments and, if it works out, I will post a patch for review.
>
> -Andrea
>
>>
>> Thanks again,
>> Hal
>>
>>>
>>> Instead, we correctly match an 'addsub' in the case of function
>>> @fsubfadd.
>>>
>>> In summary, on x86 none of the functions other than @fsubfadd can be
>>> translated using 'addsubps/addsubpd' (or their AVX variants).
>>>
>>> I hope this helps :-)
>>>
>>> Andrea
>>>
>>> On Wed, Jun 25, 2014 at 12:01 PM, Andrea Di Biagio
>>> <andrea.dibiagio at gmail.com> wrote:
>>> > Hi Hal,
>>> >
>>> >
>>> >
>>> > On Wed, Jun 25, 2014 at 11:35 AM, Hal Finkel <hfinkel at anl.gov>
>>> > wrote:
>>> >> Does this mean that we now match the form of these produced by the
>>> >> SLP vectorizer? (see r211339)
>>> >>
>>> >>  -Hal
>>> >>
>>> >
>>> > Interesting, apparently I missed that commit...
>>> > I'll have a look at it now.
>>> >
>>> > -Andrea
>>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory



