[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Thu Sep 22 12:14:25 PDT 2011

Hi Bruno,

> Some comments:
>
> +  // Try to synthesize horizontal adds from adds of shuffles.
> +  if (((Subtarget->hasSSE3()&&  (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> +       (Subtarget->hasAVX()&&  (VT == MVT::v8f32 || VT == MVT::v4f64)))&&
> +      isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic works for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit
> HADDPD have the same number of elements, their horizontal operation
> behavior is different (look at AVX manual for details)! If it doesn't,
> just remove the 256-bit handling for now.

it's not clear whether it is correct for 256 bit operations.  The AVX docs
only describe the integer horizontal operations, not the floating point ones;
for the integer ones the 256 bit ones work differently.  If someone has a
machine with AVX to test on, I've attached avx-hadd.s.  It should be possible
to do:
   gcc -o avx-hadd avx-hadd.s
   ./avx-hadd
and the result should make it clear.

In the meantime I'm removed the 256 bit logic.

> 2) Rename horizontal.ll to sse3-haddsub.ll

Done!

> 3) Can you duplicate the testcase file to something like
> avx-haddsub.ll, and check for the AVX 128-bit versions too?

I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).

> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (HADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (HADDPSrm VR128:$src1, addr:$src2)>;
> ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
>            (VHADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
>            (VHADDPSrm VR128:$src1, addr:$src2)>;
> ...

I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't
know how.

OK to apply?

Ciao, Duncan.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: avx-hadd.s
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadd.diff
Type: text/x-patch
Size: 23340 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>