[PATCH] D20598: [X86] Detect SAD patterns and emit psadbw instructions on X86 redux

Fri May 27 09:39:21 PDT 2016

On Fri, May 27, 2016 at 9:20 AM, Michael Kuperstein <mkuper at google.com> wrote:
> mkuper added a comment.
>
> I agree the spill code is complete nonsense, but I don't think it should block this patch, for two reasons:
>
> 1. We seem to have a real problem with inserts into oversize vectors (e.g. <16 x i32> on SSE2), but I'm not entirely sure it's a high priority. I'm not sure how much of those the vectorizer actually generates - although, for reductions, that may be more common, I haven't checked.
>

I don't think it is a general problem. No such stack saves/restores
are generated if we turn off detectSADPattern for sad16_i8. I think
the problem is because we use INSERT_SUBVECTOR. I tried the following
code then the stack saves/restores were gone.

  if (VT.getSizeInBits() > ResVT.getSizeInBits()) {

    // Update part of elements of the reduction vector. This is done by first
    // extracting a sub-vector from it, updating this sub-vector, and inserting
    // it back.
    SDValue SubPhi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResVT, Phi,
                                 DAG.getIntPtrConstant(0, DL));
    SDValue Res = DAG.getNode(ISD::ADD, DL, ResVT, Sad, SubPhi);
    SmallVector<SDValue, 16> Ops(VT.getSizeInBits() /
ResVT.getSizeInBits(), DAG.getConstant(0, DL, ResVT));
    Ops[0] = Res;
    return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Ops);
  }

> 2. More importantly, even with the spills, it still looks like a large net improvement. E.g. the SSE2 code for the sad16_i8 loop before this patch is:
> ...
> Are you ok with me committing as is, and filing a PR on the insert_subvector issue?
>

I think it is fine. Thanks.