[PATCH] D146747: [RISCV] Model select and insertsubvector shuffle kinds

ShihPo Hung via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 1 11:06:02 PDT 2023


arcbbb added inline comments.


================
Comment at: llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp:299
+    // vslideup.vi  v8, v9, 2
+    return LT.first * getLMULCost(LT.second);
+  case TTI::SK_Select: {
----------------
luke wrote:
> luke wrote:
> > arcbbb wrote:
> > > Thanks for implementing this!
> > > I have a question: LT is from Tp, is it supposed to use SubTp instead of Tp here?
> > Good point, I'm not sure. AArch64 seems to use SubTp for costing their subvector inserts. Should we be using both legalisation costs?
> Using `SubTp` for the legalisation cost here gives us this diff for this test case
> 
> ```
>  define <8 x i64> @insert_subvector_offset_1_v8i64(<8 x i64> %v, <8 x i64> %w) {
>  ; CHECK-LABEL: 'insert_subvector_offset_1_v8i64'
> -; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %res = shufflevector <8 x i64> %v, <8 x i64> %w, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
> +; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %res = shufflevector <8 x i64> %v, <8 x i64> %w, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
>  ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret <8 x i64> %res
>  ;
>    %res = shufflevector <8 x i64> %v, <8 x i64> %w, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
>    ret <8 x i64> %res
>  }
> ```
> 
> This is what's actually generated:
> 
> ```
> insert_subvector_offset_1_v8i64:        # @insert_subvector_offset_1_v8i64
> 	.cfi_startproc
> # %bb.0:
> 	vsetivli	zero, 5, e64, m4, tu, ma
> 	vslideup.vi	v8, v12, 1
> 	ret
> ```
> 
> It's using LMUL=4 here so I would presume we still want to cost it as 4 * one vslideup.
I had thought the operation would be done once 2*VLEN had been written, even though LMUL is 4, but on second thought it really depends on the HW implementation. So this is fine with me now.
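
For reference, the two costs being compared above can be reproduced with a toy model. This is only a sketch, not the code in RISCVTargetTransformInfo.cpp: it assumes VLEN = 128, assumes one LMUL=1 vslideup costs 1, assumes the type legalises to a single register group (so the `LT.first` factor is 1), and only handles vector sizes that are a power-of-two multiple of VLEN. The names `kAssumedVLenBits`, `toyLMUL` and `toySlideCost` are hypothetical.

```cpp
// Toy model of "cost = LMUL of the legalised type" for a vslideup-based
// subvector insert. Hypothetical helper names; NOT LLVM's implementation.

// Assumption: minimum VLEN of 128 bits.
constexpr unsigned kAssumedVLenBits = 128;

// Register-group size (LMUL) needed to hold a fixed-length vector.
// Fractional LMULs are rounded up to 1 in this sketch.
constexpr unsigned toyLMUL(unsigned NumElts, unsigned EltBits) {
  unsigned Bits = NumElts * EltBits;
  return Bits <= kAssumedVLenBits
             ? 1
             : (Bits + kAssumedVLenBits - 1) / kAssumedVLenBits;
}

// Assumption: a slide over an LMUL=N register group costs N.
constexpr unsigned toySlideCost(unsigned NumElts, unsigned EltBits) {
  return toyLMUL(NumElts, EltBits);
}

// Costing on Tp (<8 x i64>) matches the m4 vsetivli in the generated code.
static_assert(toySlideCost(8, 64) == 4, "Tp-based cost");
// Costing on SubTp (<4 x i64>, the four elements inserted by the mask).
static_assert(toySlideCost(4, 64) == 2, "SubTp-based cost");
```

Under these assumptions the Tp-based cost is 4 and the SubTp-based cost is 2, which is the difference shown in the test diff. Which one is closer to reality depends on whether the hardware stops early once the active elements are written.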


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146747/new/

https://reviews.llvm.org/D146747


