[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Thu Sep 22 12:48:09 PDT 2011

> The output of the avx-hadd program is
>
> 3 11 7 15

Thanks Preston.  That means that the 256 bit AVX floating point horizontal
add works the same as the integer 256 bits ones, i.e. inconsistently with
the 128 bit ones.

Ciao, Duncan.

>
> Preston
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
> Sent: Thursday, September 22, 2011 3:14 PM
> To: Bruno Cardoso Lopes
> Cc: LLVMdev
> Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
>
> Hi Bruno,
>
>> Some comments:
>>
>> +  // Try to synthesize horizontal adds from adds of shuffles.
>> +  if (((Subtarget->hasSSE3()&&   (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
>> +       (Subtarget->hasAVX()&&   (VT == MVT::v8f32 || VT == MVT::v4f64)))&&
>> +      isHorizontalBinOp(LHS, RHS, true))
>>
>> 1) You probably want to do something like:
>>
>> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
>> Subtarget->hasAVX()" and check it for the first condition, because
>> when AVX is on, the SSE levels are all turned off (as to consider AVX
>> a reimplementation of all SSE levels).
>>
>> For the second condition: Does this logic works for 256-bit vectors?
>> I'm asking that because although the 128-bit HADDPS and the 256-bit
>> HADDPD have the same number of elements, their horizontal operation
>> behavior is different (look at AVX manual for details)! If it doesn't,
>> just remove the 256-bit handling for now.
>
> it's not clear whether it is correct for 256 bit operations.  The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently.  If someone has a machine with AVX to test on, I've attached avx-hadd.s.  It should be possible to do:
>     gcc -o avx-hadd avx-hadd.s
>     ./avx-hadd
> and the result should make it clear.
>
> In the meantime I'm removed the 256 bit logic.
>
>> 2) Rename horizontal.ll to sse3-haddsub.ll
>
> Done!
>
>> 3) Can you duplicate the testcase file to something like
>> avx-haddsub.ll, and check for the AVX 128-bit versions too?
>
> I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).
>
>> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>>
>> let Predicates = [HasSSE3] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>             (HADDPSrr VR128:$src1, VR128:$src2)>; def :
>> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>>             (HADDPSrm VR128:$src1, addr:$src2)>; ...
>>
>> and
>>
>> let Predicates = [HasAVX] in {
>> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>>             (VHADDPSrr VR128:$src1, VR128:$src2)>; def :
>> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>>             (VHADDPSrm VR128:$src1, addr:$src2)>; ...
>
> I came up with a vim macro that added them for me (see attached patch).
> Probably there is a way to compress this using tablegen magic, but I don't know how.
>
> OK to apply?
>
> Ciao, Duncan.