[PATCH] D104630: [AArch64][CostModel] Add cost model for experimental.vector.splice
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 22 01:43:22 PDT 2021
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1900
{ TTI::SK_Reverse, MVT::nxv2i1, 1 },
+ // Handle the cases for vector.splice with scalable vectors
+ { TTI::SK_Splice, MVT::nxv16i8, 1 },
----------------
Separately from the type, I think we'll need to distinguish the costs based on the value of the index as well.
Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>:
* For a positive offset we can use SVE's EXT instruction. E.g. when splicing at offset #1, the result will be <x1, x2, x3, y0>.
* For a negative offset we can't use EXT, but we can instead use SPLICE, which requires (generating) a predicate. For a negative offset of 1 we need a predicate of <0, 0, 0, 1>. This means the operation can be done using whilelt+not+splice, so negative offsets would be more expensive.
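To make the two cases concrete, here is a minimal scalar sketch of the splice semantics described above. It models a fixed number of lanes with std::vector, and the names spliceModel and trailingPredicate are hypothetical helpers for illustration only; nothing here is LLVM or SVE API.

```cpp
#include <cassert>
#include <vector>

// Scalar model of splice(a, b, imm) on two equally sized vectors.
// imm >= 0: take n elements of the concatenation a ++ b, starting at imm.
// imm <  0: take the last -imm elements of a, then leading elements of b.
std::vector<int> spliceModel(const std::vector<int> &a,
                             const std::vector<int> &b, int imm) {
  const int n = static_cast<int>(a.size());
  assert(b.size() == a.size() && "operands must have the same length");
  assert(imm >= -n && imm < n && "index out of range");
  const int start = imm >= 0 ? imm : n + imm;
  std::vector<int> out;
  out.reserve(n);
  for (int i = 0; i < n; ++i) {
    const int pos = start + i;
    out.push_back(pos < n ? a[pos] : b[pos - n]);
  }
  return out;
}

// Predicate for the negative-offset case: only the last k lanes are active.
// Modelled as "first n - k lanes active" (the whilelt step) followed by an
// inversion (the not step).
std::vector<bool> trailingPredicate(int n, int k) {
  assert(k >= 0 && k <= n && "k must be in [0, n]");
  std::vector<bool> pred(n);
  for (int i = 0; i < n; ++i)
    pred[i] = i < n - k;  // leading lanes active
  pred.flip();            // invert: trailing k lanes active
  return pred;
}
```

With x = {0, 1, 2, 3} and y = {10, 11, 12, 13}, spliceModel(x, y, 1) yields {1, 2, 3, 10} and spliceModel(x, y, -1) yields {3, 10, 11, 12}; trailingPredicate(4, 1) yields <0, 0, 0, 1>.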
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1901-1913
+ { TTI::SK_Splice, MVT::nxv16i8, 1 },
+ { TTI::SK_Splice, MVT::nxv8i16, 1 },
+ { TTI::SK_Splice, MVT::nxv4i32, 1 },
+ { TTI::SK_Splice, MVT::nxv2i64, 1 },
+ { TTI::SK_Splice, MVT::nxv2f16, 1 },
+ { TTI::SK_Splice, MVT::nxv4f16, 1 },
+ { TTI::SK_Splice, MVT::nxv8f16, 1 },
----------------
At the moment, the cost for these is actually quite high because they're expanded to two stores and one reload. That said, I'd prefer not to reflect that in the cost-model because this is not the desired code-gen and we should favour getting more scalable vectorization to get more testing coverage.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1914-1917
+ { TTI::SK_Splice, MVT::nxv16i1, 1 },
+ { TTI::SK_Splice, MVT::nxv8i1, 1 },
+ { TTI::SK_Splice, MVT::nxv4i1, 1 },
+ { TTI::SK_Splice, MVT::nxv2i1, 1 },
----------------
The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated instruction, it should be fair to model the cost as that of two stores, a reload and a compare.
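As a rough illustration of how the suggestions in these comments could combine, the sketch below counts one cost unit per instruction. That unit cost, the SpliceOperand enum and sketchSpliceCost are all assumptions made up for this example; they are not the values or the interface used in AArch64TargetTransformInfo.cpp.

```cpp
// Hypothetical operand classes for the sketch.
enum class SpliceOperand { Data, Predicate };

// Assumed instruction counts, one cost unit each (an assumption, not
// measured data):
//   data, index >= 0 : EXT                            -> 1
//   data, index <  0 : whilelt + not + splice         -> 3
//   predicate        : 2 stores + reload + compare    -> 4
constexpr int kExtCost = 1;
constexpr int kNegativeIndexCost = 3;
constexpr int kPredicateCost = 4;

constexpr int sketchSpliceCost(SpliceOperand kind, int index) {
  return kind == SpliceOperand::Predicate
             ? kPredicateCost
             : (index >= 0 ? kExtCost : kNegativeIndexCost);
}

// Sanity check of the ordering argued for in the review comments.
static_assert(sketchSpliceCost(SpliceOperand::Data, -1) >
                  sketchSpliceCost(SpliceOperand::Data, 1),
              "negative offsets should cost more than positive ones");
```

The point is only the relative ordering: a positive index is cheapest, a negative index costs more, and predicates are modelled by the expansion they currently get.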
================
Comment at: llvm/test/Analysis/CostModel/AArch64/splice.ll:34
+
+ %splice.nv16i8 = call < 16 x i8> @llvm.experimental.vector.splice.nv16i8(< 16 x i8> zeroinitializer, < 16 x i8> zeroinitializer, i32 -1)
+ %splice.nv32i8 = call < 32 x i8> @llvm.experimental.vector.splice.nv32i8(< 32 x i8> zeroinitializer, < 32 x i8> zeroinitializer, i32 -1)
----------------
nv?
================
Comment at: llvm/test/Analysis/CostModel/AArch64/splice.ll:59
+ %splice.nv8i1 = call < 8 x i1> @llvm.experimental.vector.splice.nv8i1(< 8 x i1> zeroinitializer, < 8 x i1> zeroinitializer, i32 -1)
+ %splice.nv4i1 = call < 4 x i1> @llvm.experimental.vector.splice.nv4i1(< 4 x i1> zeroinitializer, < 4 x i1> zeroinitializer, i32 -1)
+ %splice.nv2i1 = call < 2 x i1> @llvm.experimental.vector.splice.nv2i1(< 2 x i1> zeroinitializer, < 2 x i1> zeroinitializer, i32 -1)
----------------
odd spaces.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104630/new/
https://reviews.llvm.org/D104630