[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

Bruno Cardoso Lopes bruno.cardoso at gmail.com
Wed Sep 21 14:00:36 PDT 2011


Hi Duncan,

On Wed, Sep 21, 2011 at 1:24 PM, Duncan Sands <baldrick at free.fr> wrote:
> This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating
> point additions and subtractions of appropriate vector shuffles.  To do this I
> introduced new x86 FHADD and FHSUB opcodes.  These need to be wired up somehow
> in the .td file to the appropriate instructions.  Since I have no idea how
> tablegen works I just hacked it in horribly.  It works, but breaks support for
> the hadd etc intrinsics (if you take a look at how I did it you will see why!).
> I'm sending the patch for comments, and in the hope that someone will explain
> how I should be doing the tablegen bits.

This is awesome :D

Some comments:

+  // Try to synthesize horizontal adds from adds of shuffles.
+  if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
+       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
+      isHorizontalBinOp(LHS, RHS, true))

1) You probably want to do something like:

"bool HasHorizontalArith = Subtarget->hasSSE3() ||
Subtarget->hasAVX()" and check it in the first condition, because
when AVX is on, the SSE levels are all turned off (AVX is treated as
a reimplementation of all the SSE levels).

For the second condition: does this logic work for 256-bit vectors?
I'm asking because although the 128-bit HADDPS and the 256-bit
HADDPD have the same number of elements, their horizontal operation
behavior is different (see the AVX manual for details)! If it doesn't,
just remove the 256-bit handling for now.

2) Rename horizontal.ll to sse3-haddsub.ll
3) Can you duplicate the testcase file to something like
avx-haddsub.ll, and check for the AVX 128-bit versions too?
4) Your tablegen modifications are totally fine; for the intrinsics just do:

let Predicates = [HasSSE3] in {
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
          (HADDPSrr VR128:$src1, VR128:$src2)>;
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
          (HADDPSrm VR128:$src1, addr:$src2)>;
...

and

let Predicates = [HasAVX] in {
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
          (VHADDPSrr VR128:$src1, VR128:$src2)>;
def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
          (VHADDPSrm VR128:$src1, addr:$src2)>;
...
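
Going back to the 256-bit question in (1), here is a minimal sketch of the
difference I mean. It is plain C++ that simulates the element layout as I
read it in the instruction reference, not code from the patch; the names
haddps128 and vhaddpd256 are made up for illustration. Both results have
four elements, but the 256-bit form works on each 128-bit lane separately,
so sums from the two operands end up interleaved.

```cpp
#include <array>

// 128-bit HADDPS on v4f32 (illustrative simulation, hypothetical name):
// the low half of the result comes from a, the high half from b.
//   result = { a0+a1, a2+a3, b0+b1, b2+b3 }
std::array<float, 4> haddps128(const std::array<float, 4> &a,
                               const std::array<float, 4> &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// 256-bit VHADDPD on v4f64 (illustrative simulation, hypothetical name):
// each 128-bit lane is handled independently, so a and b alternate.
//   result = { a0+a1, b0+b1, a2+a3, b2+b3 }
std::array<double, 4> vhaddpd256(const std::array<double, 4> &a,
                                 const std::array<double, 4> &b) {
  return {a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]};
}
```

So a shuffle mask check written for the first layout would presumably
accept the wrong shuffles for the second one, which is why I'd leave the
256-bit case out unless isHorizontalBinOp already accounts for the lanes.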

Thanks Duncan,

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc



