[llvm] [AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts (PR #130756)
Graham Hunter via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 13 06:21:33 PDT 2025
================
@@ -2968,6 +3010,16 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
{ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 4},
{ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 2},
+ // SVE: to nxv2f64
+ {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i8, 7},
+ {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i16, 5},
+ {ISD::SINT_TO_FP, MVT::nxv2f64, MVT::nxv2i32, 3},
----------------
huntergr-arm wrote:
I based the entries on how the corresponding NEON entries are used. `<vscale x 2 x i32>` is not a legal type, and here it's treated as a packed vector: rather than being interleaved across lanes, the data sits in the first half of the vector. The cost represents the required unpack operations on top of the fcvts themselves.
This isn't great, but it does mostly work with the multiply-by-legalization-factor approach discussed above with Sander.
This work addresses a particular regression in SPEC that appears when max vector bandwidth is enabled: a VPlan with VF `vscale x 8` is considered cheaper than one with a fixed VF of `8` because the SVE converts are costed too low.
In the NEON case, a `v8i16` is converted to a `v8f64`. TTI reaches this function, hits the call to `BasicTTIImplBase::getCastInstrCost` at the bottom, retries with `v4i16` to `v4f64`, calls the base again, and finally finds a match when called with `v2i16` (an illegal type) to `v2f64`. That cost (4) is then multiplied by 4 for the two rounds of splitting (each round doubles the number of parts) to give 16, and an extra penalty of 3 on top gives a total of 19.
For SVE, it was costed as 1 * 4 (for 2 rounds of splitting) + 3, giving 7. But NEON can use 4 tbl instructions, whereas SVE currently uses 6 unpack instructions. So the new entry with a cost of 5 for `nxv2i16` to `nxv2f64` gives a total cost of 5 * 4 + 3 = 23, and we now pick the fixed-length VF instead, preventing the regression.
The same applies to the `nxv2i32` to `nxv2f64` case – we're expecting this to come from a conversion of `nxv4i32` to `nxv4f64`, so the cost of the unpacks is bundled in the same way it is for NEON.
I don't particularly like it, but I don't want to overhaul all the existing NEON code here – some of the numbers in the table date back to when the backend was first merged upstream, and I'm not sure how they were derived.
One possible alternative for this patch would be an SVE-specific helper that works out the legalization itself. It would cost the unpacks separately from the fcvt (which would let us change that cost in future if we decide to use tbl instructions for SVE as well), and would then only ask for the cost of a fully legalized fcvt, multiplied by the number of registers required. I initially decided against that because I didn't want to reimplement a chunk of the type-legalization logic, but it would be more accurate.
Would that be preferable?
https://github.com/llvm/llvm-project/pull/130756