[PATCH] D101924: [X86] Improve costmodel for scalar byte swaps
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 5 14:15:18 PDT 2021
lebedev.ri added inline comments.
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:2927
+ if (const Instruction *II = ICA.getInst()) {
+ if (II->hasOneUse() && isa<StoreInst>(II->user_back()))
+ return TTI::TCC_Free;
----------------
craig.topper wrote:
> At least on Intel Core CPUs, MOVBE isn't optimized. It's a load or store plus a bswap operation. Maybe it's optimized on Atom/Silvermont/Goldmont? It was added to that line of CPUs first, possibly because those CPUs have been used in networking equipment.
Looking at actual AMD Zen3 measurements, `movbe r<-m` is `1` uop, while `movbe m<-r` is `2` uops,
which is actually a regression from Zen1/Zen2, as per https://www.agner.org/optimize/instruction_tables.pdf.
Per that table, both forms are really slow on Haswell/Broadwell/Skylake*,
but fast on Silvermont/Goldmont*/KNL.
So I think we could mark `movbe r<-m` as free on AMD CPUs at least.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101924/new/
https://reviews.llvm.org/D101924