[all-commits] [llvm/llvm-project] d838e5: [X86] Add FastImm16 tuning flag to Intel Atom + AM...

Tue May 7 02:00:23 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d838e5b3e86e7b3b4b2f75ee9c2854e23782888e
      https://github.com/llvm/llvm-project/commit/d838e5b3e86e7b3b4b2f75ee9c2854e23782888e
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2024-05-07 (Tue, 07 May 2024)

  Changed paths:
    M llvm/lib/Target/X86/X86.td
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/cmp16.ll

  Log Message:
  -----------
  [X86] Add FastImm16 tuning flag to Intel Atom + AMD Bobcat/Ryzen Families (#90635)

This patch limits the icmp_i16(x,c) -> icmp_i32(ext(x),ext(c)) fold to CPUs that aren't known to have fast handling for length-changing prefixes for imm16 operands.

We are always assuming that 66/67h length-changing prefixes cause severe stalls and we should always extend imm16 operands and use a i32 icmp instead, the only exception being Intel Bonnell CPUs.

Agner makes this clear (see microarchitecture.pdf) that there are no stalls for any of the Intel Atom family (at least as far as Tremont - not sure about Gracemont or later). This is also true for AMD Bobcat/Jaguar and Ryzen families.

Recent performance Intel CPUs are trickier - Core2/Nehalem and earlier could have a 6-11cy stall, while SandyBridge onwards this is reduced to 3cy or less. I'm not sure if we should accept this as fast or not, we only use this flag for the icmp_i16 case, so that might be acceptable? If so, we should add this to x86-64-v3/v4 tuning as well.

Part of #90355 + #62952

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications