[PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation

Wed Apr 15 23:21:46 PDT 2015

Hi Ahmed,

Thanks for looking into it.

> -----Original Message-----
> From: Ahmed Bougacha [mailto:ahmed.bougacha at gmail.com]
> Sent: Thursday, April 16, 2015 12:27 AM
> To: Shahid, Asghar-ahmad; hfinkel at anl.gov; spatel at rotateright.com;
> aschwaighofer at apple.com; james.molloy at arm.com;
> ahmed.bougacha at gmail.com
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and
> corresponding ISD::SAD node for "Sum Of Absolute Differences" operation
> 
> A few general notes:
> 
> - A LangRef description of the intrinsic would be useful for reviewing (and of
> course necessary for committing).
Sure, I will do that.
> - Also, the TTI parts are independent of the intrinsic/building/legalization
> parts; perhaps submit those with the patch that introduces their usage?  (the
> vectorizer I guess)
I will submit a follow up patch which enhances the SLP to identify the SAD operation pattern
and contains the usage of this patch.

> - I'm a bit confused about the lowering:  if you only implement the v16i8 legal
> case, I don't expect to see anything other than a .td pattern and the
> SelectionDAGBuilder code.
The custom lowering has been done for V8i8 which is not legal for x86 AVX psad instruction.
For V16i8, the SAD node was rather expanded to meet the semantics  of x86 SAD operation for this type.

> - In general, I think this will eventually need much stronger legalization:  once
> something is in IR, people *will* use it, and IMO, if there isn't a good reason,
> it shouldn't fail to select or crash the compiler.
> 
> Thanks for working on this, I'm excited about the vectorizer parts!
Our Loop vectorizer patch, which is under review, is already using these codegen changes.
http://reviews.llvm.org/differential/diff/23038/

However, I will be sending another patch which utilizes these changes soon.

> -Ahmed
> 
> 
> REPOSITORY
>   rL LLVM
> 
> ================
> Comment at: include/llvm/IR/Intrinsics.td:589 @@ +588,3 @@
> +// vectors.
> +def int_sad : Intrinsic<[llvm_anyint_ty],
> +                        [ llvm_anyvector_ty, llvm_anyvector_ty ],
> +[IntrNoMem]>;
> ----------------
> The opposite question was asked on the thread about i32, but can we
> constraint the return type more than this? (open question, I don't think so)
> 
> ================
> Comment at: include/llvm/IR/Intrinsics.td:590 @@ +589,3 @@
> +def int_sad : Intrinsic<[llvm_anyint_ty],
> +                        [ llvm_anyvector_ty, llvm_anyvector_ty ],
> +[IntrNoMem]>;
> +
> ----------------
> No spaces after '[' and before ']'.
> 
> ================
> Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1291-1295
> @@ -1289,3 +1290,7 @@
>      break;
> -
> +  case ISD::SAD:
> +    Action = TLI.getOperationAction(Node->getOpcode(),
> +                                    Node->getOperand(0).getValueType());
> +    SimpleFinishLegalizing = false;
> +    break;
>    default:
> ----------------
> This basically bypasses the legalizer for SAD, for no good reason.  It should
> just be handled by the default case, and the legalization below.
> 
> ================
> Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2741-2760
> @@ -2735,1 +2740,22 @@
> 
> +SDValue SelectionDAGLegalize::ExpandSAD(EVT DestVT, SDValue LHS,
> SDValue RHS,
> +                                        SDLoc DL) {
> +  SDValue result;
> +  switch (LHS.getValueType().getSimpleVT().SimpleTy) {
> +  default:
> +    llvm_unreachable("Unhandled Expand type in SAD");
> +  case MVT::i16:
> +    SDValue sad = DAG.getNode(ISD::SAD, DL, MVT::v2i64, LHS, RHS);
> +    EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();
> +    SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL,
> MVT::i64, sad,
> +                                     DAG.getConstant(0, VecIdxTy));
> +    SDValue TopHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL,
> MVT::i64, sad,
> +                                  DAG.getConstant(1, VecIdxTy));
> +
> +    BottomHalf = DAG.getNode(ISD::TRUNCATE, DL, DestVT, BottomHalf);
> +    TopHalf = DAG.getNode(ISD::TRUNCATE, DL, DestVT, TopHalf);
> +    result = DAG.getNode(ISD::ADD, DL, DestVT, TopHalf, BottomHalf);
> +  }
> +  return result;
> +}
> +
> ----------------
> I'm confused:  what does this achieve?
> 
> ================
> Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:5096-
> 5100
> @@ -5088,2 +5095,7 @@
>      return nullptr;
> +  case Intrinsic::sad: {
> +    setValue(&I, ExpandSad(sdl, TLI.getValueType(I.getType()),
> getValue(I.getArgOperand(0)),
> +                           getValue(I.getArgOperand(1)), DAG));
> +    return nullptr;
> +  }
>    case Intrinsic::cttz: {
> ----------------
> I would inline ExpandSad here, like bswap right above.
> 
> ================
> Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2087-2097
> @@ -2086,2 +2086,13 @@
>    default: break;
> +  case ISD::SAD: {
> +    SDNode *New;
> +    SDValue Ops[] = {Node->getOperand(0), Node->getOperand(1)};
> +
> +    if (Subtarget->hasAVX() || Subtarget->hasSSE1() || Subtarget-
> >hasSSE2())
> +      New =
> +          CurDAG->getMachineNode(X86::PSADBWrr, dl, Node-
> >getValueType(0), Ops);
> +    else
> +      New = Node;
> +    return New;
> +  }
>    case ISD::INTRINSIC_W_CHAIN: {
> ----------------
> Why not a .td pattern?  Should be a one-liner (+ the SDNode in
> TargetSelectionDAG.td).
> 
> ================
> Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2089
> @@ +2088,3 @@
> +    SDNode *New;
> +    SDValue Ops[] = {Node->getOperand(0), Node->getOperand(1)};
> +
> ----------------
> I think getMachineNode has an overload for two operands you can use.
> 
> ================
> Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2091
> @@ +2090,3 @@
> +
> +    if (Subtarget->hasAVX() || Subtarget->hasSSE1() || Subtarget-
> >hasSSE2())
> +      New =
> ----------------
> Two things:
> - SDM says SSE1 supports this on v8i8, which isn't really legal.
> - why not a single check? For instance, is hasAVX() useful here?
> 
> ================
> Comment at: lib/Target/X86/X86ISelLowering.cpp:915
> @@ +914,3 @@
> +    setOperationAction(ISD::SAD, MVT::v8i8, Custom);
> +    setOperationAction(ISD::SAD, MVT::v16i8, Expand);
> +
> ----------------
> Why not Legal?
> 
> ================
> Comment at: lib/Target/X86/X86ISelLowering.cpp:16928-16929
> @@ +16927,4 @@
> +    EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();
> +    SDValue V0 = DAG.getUNDEF(MVT::v8i8);
> +    SDValue V1 = DAG.getUNDEF(MVT::v8i8);
> +    SDValue Op0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V0,
> + N0);
> ----------------
> Why not a single SDValue?
> 
> ================
> Comment at: lib/Target/X86/X86ISelLowering.cpp:16930-16936
> @@ +16929,9 @@
> +    SDValue V1 = DAG.getUNDEF(MVT::v8i8);
> +    SDValue Op0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V0,
> N0);
> +    SDValue Op1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V1,
> N1);
> +    SDValue Ops[] = {Op0, Op1};
> +    SDValue SAD = DAG.getNode(ISD::SAD, dl, MVT::v2i64, Ops);
> +    SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl,
> MVT::i64, SAD,
> +                                     DAG.getConstant(0, VecIdxTy));
> +    result = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, BottomHalf);  }
> + else
> ----------------
> The original ISD::SAD building returns an integer (matching the intrinsic).
> Why v2i64 + extract + trunc here, instead of just doing
> DAG.getNode(ISD::SAD, dl, Op.getValueType(), ...) ?
> 
> ================
> Comment at: lib/Target/X86/X86ISelLowering.cpp:16932-16933
> @@ +16931,4 @@
> +    SDValue Op1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V1,
> N1);
> +    SDValue Ops[] = {Op0, Op1};
> +    SDValue SAD = DAG.getNode(ISD::SAD, dl, MVT::v2i64, Ops);
> +    SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl,
> + MVT::i64, SAD,
> ----------------
> There's a getNode overload you can use for two operands:  getNode(..., Op0,
> Op1), like you do right below.
> 
> ================
> Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:1143-1144
> @@ +1142,4 @@
> +
> +  assert(DataTy->isVectorTy() && "Must be a vector");
> +  assert(DataTy->getScalarType()->isIntegerTy() && "Elem must be an
> + integer");
> +
> ----------------
> Why not "return false"?  It doesn't make much sense to call it on any other
> type, but I think that's the point of the function.
> 
> ================
> Comment at: test/CodeGen/X86/sad_intrinsic.ll:1
> @@ +1,2 @@
> +; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx < %s |
> +FileCheck %s -check-prefix=AVX
> +
> ----------------
> Why AVX?  This is all SSE2, no?
> 
> ================
> Comment at: test/CodeGen/X86/sad_intrinsic.ll:5
> @@ +4,3 @@
> +
> +; AVX: psadbw     %xmm1, %xmm0
> +
> ----------------
> Explicitly checking the movd would be useful, I think
> 
> http://reviews.llvm.org/D9029
> 
> EMAIL PREFERENCES
>   http://reviews.llvm.org/settings/panel/emailpreferences/
>