[PATCH][x86] Add more patterns for SSE/AVX scalar single/double-precision fp arithmetic instructions.

Andrea Di Biagio andrea.dibiagio at gmail.com
Wed Dec 11 12:44:45 PST 2013


Hi Nadav,

Thanks for the feedback!

You are right. It looks like my patterns only work if the initial DAG
contains a vector_shuffle which is then lowered into a MOVSS.
If I pass the following IR to llc:

define <4 x float> @foo(<4 x float> %a, <4 x float> %b) {
  %1 = fadd <4 x float> %a, %b
  %2 = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x float> %1, <4 x float> %a
  ret <4 x float> %2
}

The initial selection DAG will contain the following sequence:

    N0: i1 = Constant<0>
    N1: i1 = Constant<-1>
    N2: v4i1 = BUILD_VECTOR N1, N0, N0, N0
    N3: v4f32 = fadd op1, op2
    N4: v4f32 = vselect N2, N3, OtherNode

Unfortunately, the target-specific combine for 'vselect' is unable to
simplify the 'vselect' into a MOVSS.
In theory it would be easy to add a target combine that folds the
'vselect' according to the rule:

   vselect MASK, SRC1, SRC2    -->    X86ISD::MOVSS  SRC2, SRC1

where MASK is < -1, 0, 0, 0>
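
To make the rule concrete, here is a minimal scalar model of it in C++.
It is only a sketch of the lane semantics, not code from the patch or
from the X86 backend: the names V4F, V4Mask, vselect, movss, isMovssMask
and foldToMovss are made up for illustration, and movss() assumes the
operand order used in the rule above (lane 0 from the second operand,
lanes 1..3 from the first).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative stand-ins for v4f32 and v4i1; not LLVM types.
using V4F = std::array<float, 4>;
using V4Mask = std::array<bool, 4>;

// Lane-wise select: lane i comes from src1 when mask[i] is set,
// otherwise from src2.
V4F vselect(const V4Mask &mask, const V4F &src1, const V4F &src2) {
  V4F result{};
  for (std::size_t i = 0; i < 4; ++i)
    result[i] = mask[i] ? src1[i] : src2[i];
  return result;
}

// Assumed model of MOVSS a, b: lane 0 from b, lanes 1..3 from a.
V4F movss(const V4F &a, const V4F &b) {
  return V4F{{b[0], a[1], a[2], a[3]}};
}

// Hypothetical predicate: true only for the mask <-1, 0, 0, 0>.
bool isMovssMask(const V4Mask &mask) {
  return mask[0] && !mask[1] && !mask[2] && !mask[3];
}

// The proposed fold: vselect MASK, SRC1, SRC2 --> MOVSS SRC2, SRC1.
// Only valid when MASK is <-1, 0, 0, 0>.
V4F foldToMovss(const V4Mask &mask, const V4F &src1, const V4F &src2) {
  assert(isMovssMask(mask) && "fold only applies to mask <-1, 0, 0, 0>");
  return movss(src2, src1);
}
```

For any two inputs, vselect() with the mask {true, false, false, false}
and foldToMovss() should return the same vector, which is the property
the combine would rely on.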

With that combine, the new patterns would work even when a 'vselect' is used.
What is your opinion about this?
If you think that this is the correct way to fix it, would it
be acceptable to commit the current patch first and then continue
working on the missing combine?

At the moment the vselect is perfectly legal and requires no custom
lowering, so ISEL eventually selects a BLENDVPSrr0 from the 'vselect'
instead of a MOVSS.

Thanks,
Andrea Di Biagio

On Wed, Dec 11, 2013 at 4:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Andrea,
>
> The patterns look okay but I think that they may be fragile. Small changes in the IR passes or legalization may generate sequences that are not matched.  I mention this because it looks like this shuffle should be canonicalized into a select instruction (in InstCombine), because what it does is to blend two vectors:
>
> %2 = shufflevector <4 x float> %1, <4 x float> %a, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
>
> Can you please make sure that after canonicalizing the shuffle->blend your patterns still work?
>
> Thanks,
> Nadav
>
>
>
> On Dec 11, 2013, at 5:11 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
>> <patch.diff>
>



