[PATCH] D103952: [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic

Thu Jun 10 02:20:15 PDT 2021

RosieSumpter added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:280
+    // CTPOP costs should match the codegen from
+    // llvm/test/CodeGen/AArch64/arm64-vpopcnt.ll
+    static const CostTblEntry CtpopCostTbl[] = {
----------------
dmgreen wrote:
> It looks like this file doesn't contain all of the cases. As far as I understand, this is how it works:
>  - v8i8 and v16i8 are legal, so 1 instruction. Fantastic!
>  - v4i16 and v8i16 are converted to "v16i8 ctpop + addp". So cost 2
>  - v2i32 and v4i32 are converted to "v16i8 ctpop + addp + addp". So cost 3
>  - v1i64 and v2i64 are converted to "v16i8 ctpop + addp + addp + addp". So cost 4
> Those are all good. For scalars, as opposed to vectors, there is no good instruction though. The codegen also looks pretty terrible at the moment.
> So an i8 becomes "and 0xff; expensive-mov; v8i8 cnt; addlv; expensive-mov". The others look equally expensive too. I would guess a cost of 5 would make sense?
> 
> Everything else would be legalized to one of those types. So you can probably use "auto LT = TLI->getTypeLegalizationCost(DL, RetTy);", use LT.second for the type in the table lookup, and return Entry->Cost * LT.first to include the cost of legalization (how many vectors the type is split into).
Thanks for the help with this, I've had another go at it. Only thing I'm not sure about now is that e.g. v2i16 is promoted to v2i32 so with LT.first * Entry->Cost it gets a cost of 3, but checking codegen it seems there are 5 instructions. Something like (2 * Entry->Cost - 1) would give the right numbers, but is this the correct thing to do?
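For reference, the lookup being discussed can be sketched as standalone C++. This is only an illustration, not the code in the patch: CtpopCostEntry, LegalizationCost, lookupCtpop and ctpopCost are hypothetical stand-ins for LLVM's CostTblEntry, the pair returned by getTypeLegalizationCost, and CostTableLookup, and types are plain strings rather than MVTs. The vector costs are the ones listed in the quoted comment; the scalar cost of 5 is the guess from that comment, applied here to all four scalar widths as an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for LLVM's CostTblEntry (type name instead of an MVT).
struct CtpopCostEntry {
  const char *Type;
  int Cost;
};

// Vector costs from the review comment; scalar cost 5 is the reviewer's guess.
static const CtpopCostEntry CtpopCostTbl[] = {
    {"v16i8", 1}, {"v8i8", 1},  // legal: a single cnt
    {"v8i16", 2}, {"v4i16", 2}, // v16i8 ctpop + addp
    {"v4i32", 3}, {"v2i32", 3}, // v16i8 ctpop + 2 x addp
    {"v2i64", 4}, {"v1i64", 4}, // v16i8 ctpop + 3 x addp
    {"i8", 5},    {"i16", 5},   {"i32", 5}, {"i64", 5},
};

// Hypothetical model of the legalization result:
// first = number of legal-typed pieces, second = the legal type.
using LegalizationCost = std::pair<int, std::string>;

// Linear table search, returning nullptr when the type has no entry.
const CtpopCostEntry *lookupCtpop(const std::string &Ty) {
  for (const CtpopCostEntry &E : CtpopCostTbl)
    if (Ty == E.Type)
      return &E;
  return nullptr;
}

// Cost = per-type table cost * number of pieces; -1 if the type is unknown.
int ctpopCost(const LegalizationCost &LT) {
  assert(LT.first >= 1 && "a type legalizes to at least one piece");
  if (const CtpopCostEntry *Entry = lookupCtpop(LT.second))
    return Entry->Cost * LT.first;
  return -1;
}
```

With this model, a v2i16 that is promoted to v2i32 is looked up as {1, "v2i32"} and gets cost 3, which is exactly the mismatch with the 5 instructions seen in codegen: the multiplication accounts for splitting, not for the extra work introduced by promotion.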

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103952/new/

https://reviews.llvm.org/D103952