[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Thu Sep 22 12:45:28 PDT 2011

The output of the avx-hadd program is

3 11 7 15

Preston

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: Thursday, September 22, 2011 3:14 PM
To: Bruno Cardoso Lopes
Cc: LLVMdev
Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Hi Bruno,

> Some comments:
>
> +  // Try to synthesize horizontal adds from adds of shuffles.
> +  if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> +       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
> +      isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX 
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic work for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit 
> HADDPD have the same number of elements, their horizontal operation 
> behavior is different (look at AVX manual for details)! If it doesn't, 
> just remove the 256-bit handling for now.

it's not clear whether it is correct for 256 bit operations.  The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently.  If someone has a machine with AVX to test on, I've attached avx-hadd.s.  It should be possible to do:
   gcc -o avx-hadd avx-hadd.s
   ./avx-hadd
and the result should make it clear.
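
To spell out what the test is meant to distinguish, here is a rough scalar model of the two candidate behaviours for a 4-element horizontal add. This is only a sketch: the function names are made up for illustration, and the inputs {1,2,3,4} and {5,6,7,8} in the check are an assumption, since the contents of avx-hadd.s are not shown in this mail. The "flat" form is how the 128-bit HADDPS combines its four elements; the "per lane" form treats each 128-bit half separately, which is how the 256-bit VHADDPD is documented to behave.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;

// "Flat" layout, as for 128-bit HADDPS on 4 elements:
// all pair sums of a first, then all pair sums of b.
Vec4 hadd4Flat(const Vec4 &a, const Vec4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// "Per lane" layout, as for 256-bit VHADDPD on 4 elements:
// each 128-bit lane holds one pair sum from a, then one from b.
Vec4 hadd4PerLane(const Vec4 &a, const Vec4 &b) {
  return {a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]};
}

// Sanity check with assumed inputs (not taken from avx-hadd.s).
void checkLaneModels() {
  const Vec4 a{1.0, 2.0, 3.0, 4.0};
  const Vec4 b{5.0, 6.0, 7.0, 8.0};
  const Vec4 flat{3.0, 7.0, 11.0, 15.0};
  const Vec4 perLane{3.0, 11.0, 7.0, 15.0};
  assert(hadd4Flat(a, b) == flat);
  assert(hadd4PerLane(a, b) == perLane);
}
```

With those assumed inputs the two models differ only in the order of the middle two elements, so a single run on real hardware is enough to tell them apart. If the test program did use those inputs, the per-lane model is the one that yields 3 11 7 15.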

In the meantime I've removed the 256 bit logic.
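
For reference, the guard with the 256 bit case dropped and the HasHorizontalArith suggestion folded in would look roughly like the sketch below. MockSubtarget, SimpleVT and mayFormHorizontalOp are hypothetical stand-ins so that the snippet compiles on its own; they are not the real LLVM classes, and the isHorizontalBinOp check is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for the MVT values used in the patch.
enum class SimpleVT { v4f32, v2f64, v8f32, v4f64 };

// Hypothetical stand-in for the X86 subtarget. Per the review comment,
// enabling AVX turns the SSE levels off, so hasSSE3() can be false
// while hasAVX() is true.
struct MockSubtarget {
  bool SSE3 = false;
  bool AVX = false;
  bool hasSSE3() const { return SSE3; }
  bool hasAVX() const { return AVX; }
};

// Only the 128-bit floating point vector types are handled for now.
bool is128BitFPVector(SimpleVT VT) {
  return VT == SimpleVT::v4f32 || VT == SimpleVT::v2f64;
}

// Guard for trying to form a horizontal op: either SSE3 or AVX is
// enough for the 128-bit forms.
bool mayFormHorizontalOp(const MockSubtarget &ST, SimpleVT VT) {
  bool HasHorizontalArith = ST.hasSSE3() || ST.hasAVX();
  return HasHorizontalArith && is128BitFPVector(VT);
}

// Sanity check: an AVX-only subtarget still qualifies for v4f32,
// but the 256-bit types are rejected.
void checkGuard() {
  MockSubtarget AVXOnly;
  AVXOnly.AVX = true;
  assert(mayFormHorizontalOp(AVXOnly, SimpleVT::v4f32));
  assert(!mayFormHorizontalOp(AVXOnly, SimpleVT::v8f32));
}
```

The point of the combined flag is the AVX-only case: a check on hasSSE3() alone would skip the 128-bit types there.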

> 2) Rename horizontal.ll to sse3-haddsub.ll

Done!

> 3) Can you duplicate the testcase file to something like 
> avx-haddsub.ll, and check for the AVX 128-bit versions too?

I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).

> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (HADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (HADDPSrm VR128:$src1, addr:$src2)>;
> ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (VHADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (VHADDPSrm VR128:$src1, addr:$src2)>;
> ...

I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't know how.

OK to apply?

Ciao, Duncan.