[PATCH] D101924: [X86] Improve costmodel for scalar byte swaps
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 5 10:46:09 PDT 2021
craig.topper added a comment.
> Currently we don't model i16 bswap as very high cost (10),
> which doesn't seem right, with all other being at 1.
Was that supposed to say "we model i16 bswap as very high cost (10)", i.e. without the "don't"?
> i8 reg-reg
Should that be i16 reg-reg?
It looks like i64 BSWAP on Intel Core is 2 uops: possibly one uop to swap the bytes within the upper and lower 32-bit halves separately, followed by a rotate by 32 to exchange the halves.
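To make that decomposition concrete, here is a minimal C++ sketch of the suspected two-uop split (the function name bswap64_via_halves and the use of __builtin_bswap32 are purely illustrative, not anything the patch or the hardware documentation specifies):

  #include <cstdint>

  // Sketch of the suspected two-uop split for a 64-bit byte swap:
  // first swap the bytes within each 32-bit half, then rotate by 32
  // to exchange the halves. The result matches __builtin_bswap64(x).
  uint64_t bswap64_via_halves(uint64_t x) {
    // "uop 1": byte-swap the upper and lower 32-bit halves in place
    uint64_t halves = ((uint64_t)__builtin_bswap32((uint32_t)(x >> 32)) << 32) |
                      (uint64_t)__builtin_bswap32((uint32_t)x);
    // "uop 2": rotate by 32 to exchange the halves
    return (halves << 32) | (halves >> 32);
  }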
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:2927
+ if (const Instruction *II = ICA.getInst()) {
+ if (II->hasOneUse() && isa<StoreInst>(II->user_back()))
+ return TTI::TCC_Free;
----------------
At least on Intel Core CPUs, MOVBE isn't optimized: it's a load or store operation plus a bswap operation. Maybe it's optimized on Atom/Silvermont/Goldmont? The instruction was added to that line of CPUs first, possibly because those CPUs have been used in networking equipment.
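For context, the pattern the quoted check treats as free is a bswap whose only user is a store, which can fold into a MOVBE store when the target supports it. A hedged C++ illustration of that pattern (store_be64 is a made-up name for this note, not something from the patch):

  #include <cstdint>

  // Example of the "bswap with a single store user" pattern the check above
  // returns TTI::TCC_Free for; with MOVBE available this can lower to a single
  // MOVBE store instead of BSWAP followed by MOV.
  void store_be64(uint64_t v, uint64_t *dst) {
    *dst = __builtin_bswap64(v);
  }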
================
Comment at: llvm/test/Analysis/CostModel/X86/bswap-store.ll:76
define void @var_bswap_store_i64(i64 %a, i64* %dst) {
-; X64-LABEL: 'var_bswap_store_i64'
-; X64-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bswap = call i64 @llvm.bswap.i64(i64 %a)
----------------
These check lines vanished and were not replaced
================
Comment at: llvm/test/Analysis/CostModel/X86/bswap-store.ll:113
define void @var_bswap_store_i128(i128 %a, i128* %dst) {
-; X64-LABEL: 'var_bswap_store_i128'
-; X64-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bswap = call i128 @llvm.bswap.i128(i128 %a)
----------------
Same here
================
Comment at: llvm/test/Analysis/CostModel/X86/load-bswap.ll:121
define i128 @var_load_bswap_i128(i128* %src) {
-; X64-LABEL: 'var_load_bswap_i128'
-; X64-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %a = load i128, i128* %src, align 1
----------------
And here
================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll:29
define void @add_v8i64() {
-; SSE-LABEL: @add_v8i64(
-; SSE-NEXT: [[A0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
----------------
And here
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101924/new/
https://reviews.llvm.org/D101924