[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Wed Sep 21 14:00:36 PDT 2011
Hi Duncan,
On Wed, Sep 21, 2011 at 1:24 PM, Duncan Sands <baldrick at free.fr> wrote:
> This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating
> point additions and subtractions of appropriate vector shuffles. To do this I
> introduced new x86 FHADD and FHSUB opcodes. These need to be wired up somehow
> in the .td file to the appropriate instructions. Since I have no idea how
> tablegen works I just hacked it in horribly. It works, but breaks support for
> the hadd etc intrinsics (if you take a look at how I did it you will see why!).
> I'm sending the patch for comments, and in the hope that someone will explain
> how I should be doing the tablegen bits.
This is awesome :D
Some comments:
+ // Try to synthesize horizontal adds from adds of shuffles.
+ if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
+ (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
+ isHorizontalBinOp(LHS, RHS, true))
1) You probably want to do something like:
"bool HasHorizontalArith = Subtarget->hasSSE3() ||
Subtarget->hasAVX()" and check it in the first condition, because
when AVX is on, the SSE levels are all turned off (AVX is treated as
a reimplementation of all SSE levels).
For the second condition: does this logic work for 256-bit vectors?
I'm asking because although the 128-bit HADDPS and the 256-bit
HADDPD have the same number of elements, their horizontal operation
behavior is different (see the AVX manual for details)! If it doesn't,
just remove the 256-bit handling for now.
2) Rename horizontal.ll to sse3-haddsub.ll
3) Can you duplicate the testcase file to something like
avx-haddsub.ll, and check for the AVX 128-bit versions too?
4) Your tablegen modifications are totally fine; for the intrinsics, just do:
let Predicates = [HasSSE3] in {
  def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
            (HADDPSrr VR128:$src1, VR128:$src2)>;
  def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
            (HADDPSrm VR128:$src1, addr:$src2)>;
  ...
and
let Predicates = [HasAVX] in {
  def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2),
            (VHADDPSrr VR128:$src1, VR128:$src2)>;
  def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)),
            (VHADDPSrm VR128:$src1, addr:$src2)>;
  ...
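
To make the 256-bit question in point 1 concrete, here is a minimal C++
sketch of the horizontal-add element ordering, written as scalar reference
models. The function names (hadd_ps128, hadd_pd256, hadd_ps256) are
hypothetical and are not part of the patch or of LLVM; they only model the
ordering documented in the Intel manuals, where the 256-bit forms operate
independently within each 128-bit lane.

```cpp
#include <array>
#include <cstddef>

// Hypothetical reference model of 128-bit HADDPS:
// result = [a0+a1, a2+a3, b0+b1, b2+b3].
std::array<float, 4> hadd_ps128(const std::array<float, 4> &a,
                                const std::array<float, 4> &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Hypothetical reference model of 256-bit VHADDPD. Each 128-bit lane holds
// two doubles and is processed on its own, so the sums from a and b
// interleave: result = [a0+a1, b0+b1, a2+a3, b2+b3].
std::array<double, 4> hadd_pd256(const std::array<double, 4> &a,
                                 const std::array<double, 4> &b) {
  return {a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]};
}

// Hypothetical reference model of 256-bit VHADDPS. Each 128-bit lane holds
// four floats and behaves like an independent 128-bit HADDPS.
std::array<float, 8> hadd_ps256(const std::array<float, 8> &a,
                                const std::array<float, 8> &b) {
  std::array<float, 8> r{};
  for (std::size_t lane = 0; lane < 2; ++lane) {
    const std::size_t o = lane * 4; // first element of this 128-bit lane
    r[o + 0] = a[o + 0] + a[o + 1];
    r[o + 1] = a[o + 2] + a[o + 3];
    r[o + 2] = b[o + 0] + b[o + 1];
    r[o + 3] = b[o + 2] + b[o + 3];
  }
  return r;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the 128-bit float model
yields {3, 7, 30, 70} while the 256-bit double model yields {3, 30, 7, 70},
so a shuffle matcher written for the 128-bit layout would presumably not
carry over to four-element 256-bit vectors unchanged.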
Thanks Duncan,
--
Bruno Cardoso Lopes
http://www.brunocardoso.cc