[PATCH] D142359: [TTI][AArch64] Cost model vector INS instructions

Mon Jan 23 07:04:24 PST 2023

SjoerdMeijer added a comment.

> I see a 2.5% perf uplift for x264 with this on the V1.

I haven't analysed the reasons for this, but it's a nice bonus on top of making INS a bit cheaper, which seems more accurate.

================
Comment at: llvm/test/Analysis/CostModel/AArch64/insert-extract.ll:167
 ; NEO-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i64, ptr %i, align 8
-; NEO-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0
+; NEO-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0
 ; NEO-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %v2
----------------
In D141602, we made the indexed LD1 a bit more expensive by costing it as `ST->getVectorInsertExtractBaseCost() + 1`, which resulted in the cost here going up from 3 to 4. But because this patch lowers the value returned by `getVectorInsertExtractBaseCost()` to 2, the cost of the indexed LD1 is back to 3.
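
For reference, the arithmetic above can be sketched as follows. This is only an illustration, not the actual TTI code: `indexedLd1Cost`, `kBaseCostBefore` and `kBaseCostAfter` are hypothetical names, and the previous base cost of 3 is an assumption inferred from the 3 to 4 change described above.

```cpp
// Illustrative sketch only; these are hypothetical stand-ins, not LLVM APIs.

// Since D141602 the indexed LD1 is costed as the insert/extract base cost + 1.
constexpr int indexedLd1Cost(int insertExtractBaseCost) {
  return insertExtractBaseCost + 1;
}

// Assumption: the base cost was 3 before this patch (inferred from the
// "3 to 4" change); this patch lowers it to 2.
constexpr int kBaseCostBefore = 3;
constexpr int kBaseCostAfter = 2;

// After D141602, before this patch: 3 + 1 = 4.
static_assert(indexedLd1Cost(kBaseCostBefore) == 4, "cost after D141602");
// With this patch: 2 + 1 = 3, matching the updated CHECK line.
static_assert(indexedLd1Cost(kBaseCostAfter) == 3, "cost with this patch");
```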

I don't think I am too unhappy with all of this: INS is a bit cheaper, which I think it is or should be, and LD1 is a bit more expensive.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142359/new/

https://reviews.llvm.org/D142359