[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Gurd, Preston
preston.gurd at intel.com
Thu Sep 22 12:45:28 PDT 2011
The output of the avx-hadd program is
3 11 7 15
Preston
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: Thursday, September 22, 2011 3:14 PM
To: Bruno Cardoso Lopes
Cc: LLVMdev
Subject: Re: [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Hi Bruno,
> Some comments:
>
> + // Try to synthesize horizontal adds from adds of shuffles.
> + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&&
> + isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic works for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit
> HADDPD have the same number of elements, their horizontal operation
> behavior is different (look at AVX manual for details)! If it doesn't,
> just remove the 256-bit handling for now.
it's not clear whether it is correct for 256 bit operations. The AVX docs only describe the integer horizontal operations, not the floating point ones; for the integer ones the 256 bit ones work differently. If someone has a machine with AVX to test on, I've attached avx-hadd.s. It should be possible to do:
gcc -o avx-hadd avx-hadd.s
./avx-hadd
and the result should make it clear.
In the meantime I'm removed the 256 bit logic.
> 2) Rename horizontal.ll to sse3-haddsub.ll
Done!
> 3) Can you duplicate the testcase file to something like
> avx-haddsub.ll, and check for the AVX 128-bit versions too?
I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great).
> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
> (HADDPSrr VR128:$src1, VR128:$src2)>; def :
> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
> (HADDPSrm VR128:$src1, addr:$src2)>; ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
> (VHADDPSrr VR128:$src1, VR128:$src2)>; def :
> Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
> (VHADDPSrm VR128:$src1, addr:$src2)>; ...
I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't know how.
OK to apply?
Ciao, Duncan.
More information about the llvm-dev
mailing list