[all-commits] [llvm/llvm-project] e2d74a: [X86] EmitCmp - always use cmpw with foldable load...

Wed May 15 09:47:12 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e2d74a25eb562b117974add098ba2b9dd4cfc7f5
      https://github.com/llvm/llvm-project/commit/e2d74a25eb562b117974add098ba2b9dd4cfc7f5
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2024-05-15 (Wed, 15 May 2024)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/cmp16.ll
    M llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
    M llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll
    M llvm/test/CodeGen/X86/memcmp-optsize-x32.ll
    M llvm/test/CodeGen/X86/memcmp-optsize.ll
    M llvm/test/CodeGen/X86/memcmp-pgso-x32.ll
    M llvm/test/CodeGen/X86/memcmp-pgso.ll
    M llvm/test/CodeGen/X86/memcmp-x32.ll
    M llvm/test/CodeGen/X86/memcmp.ll

  Log Message:
  -----------
  [X86] EmitCmp - always use cmpw with foldable loads (#92251)

By default, EmitCmp avoids cmpw with i16 immediates because the 66/67h length-changing prefixes cause decode stalls; instead it extends the value to i32 and uses a cmpl with an i32 immediate, unless the target has the TuningFastImm16 flag or we're building for optsize/minsize.

However, if we're loading the value for comparison, the cost of the decode stalls is likely to be outweighed by the load latency of the folded load, together with the benefits of the shorter encoding and of not needing an extra register to hold the ext-load.

This matches the behaviour of gcc and msvc.

Fixes #90355
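
The selection rule described in the log message can be sketched roughly as below. This is an illustrative model only, not the code in X86ISelLowering.cpp: the struct CmpContext, its fields and the function useCmpw16 are hypothetical names invented for this sketch, and the real EmitCmp may check further conditions not mentioned in the message.

```cpp
// Hypothetical model of the inputs the log message says drive the choice.
struct CmpContext {
  bool hasFastImm16;    // stands in for the TuningFastImm16 flag
  bool optForSize;      // building for optsize/minsize
  bool hasFoldableLoad; // the compared value is a load that can be folded
};

// Returns true if the 16-bit compare (cmpw with an i16 immediate) is kept,
// false if the value would be extended to i32 and compared with cmpl.
bool useCmpw16(const CmpContext &C) {
  // Behaviour before this change: only these two cases kept cmpw.
  if (C.hasFastImm16 || C.optForSize)
    return true;
  // Behaviour added by this change: a foldable load also keeps cmpw,
  // on the assumption that load latency dominates the decode stall.
  return C.hasFoldableLoad;
}
```

Under this model, a target without the tuning flag and without optsize now keeps cmpw only when the operand is a foldable load; comparisons on register operands would still be widened to cmpl.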
