[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Duncan Sands
baldrick at free.fr
Thu Sep 22 12:14:25 PDT 2011
Hi Bruno,
> Some comments:
>
> + // Try to synthesize horizontal adds from adds of shuffles.
> + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&&
> + isHorizontalBinOp(LHS, RHS, true))
>
> 1) You probably want to do something like:
>
> "bool HasHorizontalArith = Subtarget->hasSSE3() ||
> Subtarget->hasAVX()" and check it for the first condition, because
> when AVX is on, the SSE levels are all turned off (as to consider AVX
> a reimplementation of all SSE levels).
>
> For the second condition: Does this logic works for 256-bit vectors?
> I'm asking that because although the 128-bit HADDPS and the 256-bit
> HADDPD have the same number of elements, their horizontal operation
> behavior is different (look at AVX manual for details)! If it doesn't,
> just remove the 256-bit handling for now.
it's not clear whether it is correct for 256 bit operations. The AVX docs
only describe the integer horizontal operations, not the floating point ones;
for the integer ones the 256 bit ones work differently. If someone has a
machine with AVX to test on, I've attached avx-hadd.s. It should be possible
to do:
gcc -o avx-hadd avx-hadd.s
./avx-hadd
and the result should make it clear.
In the meantime I'm removed the 256 bit logic.
> 2) Rename horizontal.ll to sse3-haddsub.ll
Done!
> 3) Can you duplicate the testcase file to something like
> avx-haddsub.ll, and check for the AVX 128-bit versions too?
I added the avx checks to the same file (in which case calling it
sse3-haddsub.ll is not so great).
> 4) Your tablegen modifications are totally fine, for the intrinsics just do:
>
> let Predicates = [HasSSE3] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
> (HADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
> (HADDPSrm VR128:$src1, addr:$src2)>;
> ...
>
> and
>
> let Predicates = [HasAVX] in {
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
> (VHADDPSrr VR128:$src1, VR128:$src2)>;
> def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
> (VHADDPSrm VR128:$src1, addr:$src2)>;
> ...
I came up with a vim macro that added them for me (see attached patch).
Probably there is a way to compress this using tablegen magic, but I don't
know how.
OK to apply?
Ciao, Duncan.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: avx-hadd.s
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hadd.diff
Type: text/x-patch
Size: 23340 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110922/a8cbe343/attachment.bin>
More information about the llvm-dev
mailing list