[all-commits] [llvm/llvm-project] e2d74a: [X86] EmitCmp - always use cmpw with foldable load...
Simon Pilgrim via All-commits
all-commits at lists.llvm.org
Wed May 15 09:47:12 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e2d74a25eb562b117974add098ba2b9dd4cfc7f5
https://github.com/llvm/llvm-project/commit/e2d74a25eb562b117974add098ba2b9dd4cfc7f5
Author: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: 2024-05-15 (Wed, 15 May 2024)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/cmp16.ll
M llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
M llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll
M llvm/test/CodeGen/X86/memcmp-optsize-x32.ll
M llvm/test/CodeGen/X86/memcmp-optsize.ll
M llvm/test/CodeGen/X86/memcmp-pgso-x32.ll
M llvm/test/CodeGen/X86/memcmp-pgso.ll
M llvm/test/CodeGen/X86/memcmp-x32.ll
M llvm/test/CodeGen/X86/memcmp.ll
Log Message:
-----------
[X86] EmitCmp - always use cmpw with foldable loads (#92251)
By default, EmitCmp avoids cmpw with i16 immediates because the 66/67h length-changing prefixes cause decode stalls; instead it extends the value to i32 and uses a cmpl with an i32 immediate. The exceptions are targets with the TuningFastImm16 flag and builds for optsize/minsize.
However, if we're loading the value for comparison, the performance cost of the decode stall is likely to be outweighed by the load latency of the folded load, the shorter encoding, and not needing an extra register to hold the ext-load.
This matches the behaviour of gcc and msvc.
Fixes #90355
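
The decision described in the log message can be sketched roughly as follows. This is a simplified standalone model, not the code in X86ISelLowering.cpp: the CmpContext struct, its field names, and chooseCmpWidth are hypothetical names introduced only for illustration, and the real EmitCmp works on SelectionDAG nodes and considers more conditions than shown here.

```cpp
// Hypothetical model of the inputs to the decision. These names are
// illustrative only and do not correspond to LLVM data structures.
struct CmpContext {
  bool hasFastImm16;       // stands in for the TuningFastImm16 flag
  bool optForSize;         // stands in for optsize/minsize
  bool lhsIsFoldableLoad;  // compared value comes from a load that can be folded
};

enum class CmpWidth { Cmp16, Cmp32 };

// Picks the compare width for an i16 value compared against an i16 immediate.
inline CmpWidth chooseCmpWidth(const CmpContext &ctx) {
  // Behaviour described as pre-existing: keep the 16-bit compare when
  // 16-bit immediates are cheap or when code size is the priority.
  if (ctx.hasFastImm16 || ctx.optForSize)
    return CmpWidth::Cmp16;

  // Behaviour described as new: a foldable load also keeps the 16-bit
  // compare, since the load latency is expected to hide the decode stall.
  if (ctx.lhsIsFoldableLoad)
    return CmpWidth::Cmp16;

  // Otherwise extend to i32 and compare against an i32 immediate.
  return CmpWidth::Cmp32;
}
```

Under this model, a register operand on a target without fast 16-bit immediates still takes the 32-bit path, while the same comparison against a value loaded from memory takes the 16-bit path.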