[PATCH] D101924: [X86] Improve costmodel for scalar byte swaps
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 5 10:46:09 PDT 2021
craig.topper added a comment.
> Currently we don't model i16 bswap as very high cost (10),
> which doesn't seem right, with all other being at 1.
Was that supposed to say "we model i16 bswap as very high cost (10)", i.e. without the "don't"?
> i8 reg-reg
Should that be i16 reg-reg?
It looks like i64 BSWAP on Intel Core is 2 uops: possibly one uop to swap the bytes within the upper and lower 32-bit halves separately, followed by a rotate by 32 to exchange the halves.
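To make that decomposition concrete, here is a minimal C++ sketch of the suspected two-uop split (the function name bswap64_via_halves and the use of __builtin_bswap32 are purely illustrative, not anything the patch or the hardware documentation specifies):

  #include <cstdint>

  // Sketch of the suspected two-uop split for a 64-bit byte swap:
  // first swap the bytes within each 32-bit half, then rotate by 32
  // to exchange the halves. The result matches __builtin_bswap64(x).
  uint64_t bswap64_via_halves(uint64_t x) {
    // "uop 1": byte-swap the upper and lower 32-bit halves in place
    uint64_t halves = ((uint64_t)__builtin_bswap32((uint32_t)(x >> 32)) << 32) |
                      (uint64_t)__builtin_bswap32((uint32_t)x);
    // "uop 2": rotate by 32 to exchange the halves
    return (halves << 32) | (halves >> 32);
  }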
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:2927
+ if (const Instruction *II = ICA.getInst()) {
+ if (II->hasOneUse() && isa<StoreInst>(II->user_back()))
+ return TTI::TCC_Free;
----------------
At least on Intel Core CPUs, MOVBE isn't optimized: it's a load or store operation plus a bswap operation. Maybe it's optimized on Atom/Silvermont/Goldmont? The instruction was added to that line of CPUs first, possibly because those CPUs have been used in networking equipment.
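For context, the pattern the quoted check treats as free is a bswap whose only user is a store, which can fold into a MOVBE store when the target supports it. A hedged C++ illustration of that pattern (store_be64 is a made-up name for this note, not something from the patch):

  #include <cstdint>

  // Example of the "bswap with a single store user" pattern the check above
  // returns TTI::TCC_Free for; with MOVBE available this can lower to a single
  // MOVBE store instead of BSWAP followed by MOV.
  void store_be64(uint64_t v, uint64_t *dst) {
    *dst = __builtin_bswap64(v);
  }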
================
Comment at: llvm/test/Analysis/CostModel/X86/bswap-store.ll:76
define void @var_bswap_store_i64(i64 %a, i64* %dst) {
-; X64-LABEL: 'var_bswap_store_i64'
-; X64-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bswap = call i64 @llvm.bswap.i64(i64 %a)
----------------
These check lines vanished and were not replaced
================
Comment at: llvm/test/Analysis/CostModel/X86/bswap-store.ll:113
define void @var_bswap_store_i128(i128 %a, i128* %dst) {
-; X64-LABEL: 'var_bswap_store_i128'
-; X64-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bswap = call i128 @llvm.bswap.i128(i128 %a)
----------------
Same here
================
Comment at: llvm/test/Analysis/CostModel/X86/load-bswap.ll:121
define i128 @var_load_bswap_i128(i128* %src) {
-; X64-LABEL: 'var_load_bswap_i128'
-; X64-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %a = load i128, i128* %src, align 1
----------------
And here
================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll:29
define void @add_v8i64() {
-; SSE-LABEL: @add_v8i64(
-; SSE-NEXT: [[A0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
----------------
And here
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101924/new/
https://reviews.llvm.org/D101924