[PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation
Ahmed Bougacha
ahmed.bougacha at gmail.com
Wed Apr 15 11:56:37 PDT 2015
A few general notes:
- A LangRef description of the intrinsic would be useful for reviewing (and of course necessary for committing).
- Also, the TTI parts are independent of the intrinsic/building/legalization parts; perhaps submit those with the patch that introduces their usage? (the vectorizer I guess)
- I'm a bit confused about the lowering: if you only implement the v16i8 legal case, I don't expect to see anything other than a .td pattern and the SelectionDAGBuilder code.
- In general, I think this will eventually need much stronger legalization: once something is in IR, people *will* use it, and IMO, if there isn't a good reason, it shouldn't fail to select or crash the compiler.
Thanks for working on this, I'm excited about the vectorizer parts!
-Ahmed
REPOSITORY
rL LLVM
================
Comment at: include/llvm/IR/Intrinsics.td:589
@@ +588,3 @@
+// vectors.
+def int_sad : Intrinsic<[llvm_anyint_ty],
+ [ llvm_anyvector_ty, llvm_anyvector_ty ], [IntrNoMem]>;
----------------
The opposite question was asked on the thread about i32, but can we constraint the return type more than this? (open question, I don't think so)
================
Comment at: include/llvm/IR/Intrinsics.td:590
@@ +589,3 @@
+def int_sad : Intrinsic<[llvm_anyint_ty],
+ [ llvm_anyvector_ty, llvm_anyvector_ty ], [IntrNoMem]>;
+
----------------
No spaces after '[' and before ']'.
================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1291-1295
@@ -1289,3 +1290,7 @@
break;
-
+ case ISD::SAD:
+ Action = TLI.getOperationAction(Node->getOpcode(),
+ Node->getOperand(0).getValueType());
+ SimpleFinishLegalizing = false;
+ break;
default:
----------------
This basically bypasses the legalizer for SAD, for no good reason. It should just be handled by the default case, and the legalization below.
================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2741-2760
@@ -2735,1 +2740,22 @@
+SDValue SelectionDAGLegalize::ExpandSAD(EVT DestVT, SDValue LHS, SDValue RHS,
+ SDLoc DL) {
+ SDValue result;
+ switch (LHS.getValueType().getSimpleVT().SimpleTy) {
+ default:
+ llvm_unreachable("Unhandled Expand type in SAD");
+ case MVT::i16:
+ SDValue sad = DAG.getNode(ISD::SAD, DL, MVT::v2i64, LHS, RHS);
+ EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();
+ SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, sad,
+ DAG.getConstant(0, VecIdxTy));
+ SDValue TopHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, sad,
+ DAG.getConstant(1, VecIdxTy));
+
+ BottomHalf = DAG.getNode(ISD::TRUNCATE, DL, DestVT, BottomHalf);
+ TopHalf = DAG.getNode(ISD::TRUNCATE, DL, DestVT, TopHalf);
+ result = DAG.getNode(ISD::ADD, DL, DestVT, TopHalf, BottomHalf);
+ }
+ return result;
+}
+
----------------
I'm confused: what does this achieve?
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:5096-5100
@@ -5088,2 +5095,7 @@
return nullptr;
+ case Intrinsic::sad: {
+ setValue(&I, ExpandSad(sdl, TLI.getValueType(I.getType()), getValue(I.getArgOperand(0)),
+ getValue(I.getArgOperand(1)), DAG));
+ return nullptr;
+ }
case Intrinsic::cttz: {
----------------
I would inline ExpandSad here, like bswap right above.
================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2087-2097
@@ -2086,2 +2086,13 @@
default: break;
+ case ISD::SAD: {
+ SDNode *New;
+ SDValue Ops[] = {Node->getOperand(0), Node->getOperand(1)};
+
+ if (Subtarget->hasAVX() || Subtarget->hasSSE1() || Subtarget->hasSSE2())
+ New =
+ CurDAG->getMachineNode(X86::PSADBWrr, dl, Node->getValueType(0), Ops);
+ else
+ New = Node;
+ return New;
+ }
case ISD::INTRINSIC_W_CHAIN: {
----------------
Why not a .td pattern? Should be a one-liner (+ the SDNode in TargetSelectionDAG.td).
================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2089
@@ +2088,3 @@
+ SDNode *New;
+ SDValue Ops[] = {Node->getOperand(0), Node->getOperand(1)};
+
----------------
I think getMachineNode has an overload for two operands you can use.
================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2091
@@ +2090,3 @@
+
+ if (Subtarget->hasAVX() || Subtarget->hasSSE1() || Subtarget->hasSSE2())
+ New =
----------------
Two things:
- SDM says SSE1 supports this on v8i8, which isn't really legal.
- why not a single check? For instance, is hasAVX() useful here?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:915
@@ +914,3 @@
+ setOperationAction(ISD::SAD, MVT::v8i8, Custom);
+ setOperationAction(ISD::SAD, MVT::v16i8, Expand);
+
----------------
Why not Legal?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:16928-16929
@@ +16927,4 @@
+ EVT VecIdxTy = DAG.getTargetLoweringInfo().getVectorIdxTy();
+ SDValue V0 = DAG.getUNDEF(MVT::v8i8);
+ SDValue V1 = DAG.getUNDEF(MVT::v8i8);
+ SDValue Op0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V0, N0);
----------------
Why not a single SDValue?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:16930-16936
@@ +16929,9 @@
+ SDValue V1 = DAG.getUNDEF(MVT::v8i8);
+ SDValue Op0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V0, N0);
+ SDValue Op1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V1, N1);
+ SDValue Ops[] = {Op0, Op1};
+ SDValue SAD = DAG.getNode(ISD::SAD, dl, MVT::v2i64, Ops);
+ SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, SAD,
+ DAG.getConstant(0, VecIdxTy));
+ result = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, BottomHalf);
+ } else
----------------
The original ISD::SAD building returns an integer (matching the intrinsic). Why v2i64 + extract + trunc here, instead of just doing DAG.getNode(ISD::SAD, dl, Op.getValueType(), ...) ?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:16932-16933
@@ +16931,4 @@
+ SDValue Op1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v16i8, V1, N1);
+ SDValue Ops[] = {Op0, Op1};
+ SDValue SAD = DAG.getNode(ISD::SAD, dl, MVT::v2i64, Ops);
+ SDValue BottomHalf = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, SAD,
----------------
There's a getNode overload you can use for two operands: getNode(..., Op0, Op1), like you do right below.
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:1143-1144
@@ +1142,4 @@
+
+ assert(DataTy->isVectorTy() && "Must be a vector");
+ assert(DataTy->getScalarType()->isIntegerTy() && "Elem must be an integer");
+
----------------
Why not "return false"? It doesn't make much sense to call it on any other type, but I think that's the point of the function.
================
Comment at: test/CodeGen/X86/sad_intrinsic.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx < %s | FileCheck %s -check-prefix=AVX
+
----------------
Why AVX? This is all SSE2, no?
================
Comment at: test/CodeGen/X86/sad_intrinsic.ll:5
@@ +4,3 @@
+
+; AVX: psadbw %xmm1, %xmm0
+
----------------
Explicitly checking the movd would be useful, I think
http://reviews.llvm.org/D9029
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list