[all-commits] [llvm/llvm-project] d7043e: [X86] Add support for "light" AVX

Ilya Tocar via All-commits all-commits at lists.llvm.org
Tue Jan 24 14:03:25 PST 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d7043e8c41bb74a31c9790616c1536596814567b
      https://github.com/llvm/llvm-project/commit/d7043e8c41bb74a31c9790616c1536596814567b
  Author: Ilya Tokar <tokarip at google.com>
  Date:   2023-01-24 (Tue, 24 Jan 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86.td
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/lib/Target/X86/X86Subtarget.h
    M llvm/lib/Target/X86/X86TargetTransformInfo.h
    A llvm/test/CodeGen/X86/memcpy-light-avx.ll
    M llvm/test/CodeGen/X86/vector-width-store-merge.ll

  Log Message:
  -----------
  [X86] Add support for "light" AVX

AVX/AVX512 instructions may cause a frequency drop on, e.g., Skylake.
The magnitude of the frequency/performance drop depends on the
instruction (multiplication vs. load/store) and the vector width.
Currently, users who want to avoid this drop can specify
-mprefer-vector-width=128. However, this also prevents generation of
256-bit wide instructions that have no associated frequency drop
(mainly loads/stores).

Add a tuning flag that allows generation of 256-bit AVX loads/stores
even when -mprefer-vector-width=128 is set, to speed up memcpy & co.
Verified that running a memcpy loop on all cores has no frequency
impact and yields zero CORE_POWER:LVL[12]_TURBO_LICENSE perf counters.
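
As a rough illustration of the kind of code this affects, the sketch
below copies fixed-size 256-byte blocks with memcpy. Because the size
is a compile-time constant, the compiler may expand the call inline
into vector loads/stores instead of calling the library routine. The
names copy_block, copy_blocks and BLOCK_BYTES are hypothetical and not
part of this commit; whether 256-bit moves are actually emitted under
-mprefer-vector-width=128 depends on the target CPU's tuning and is
best checked by inspecting the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block size, chosen to match the 256-byte benchmark case. */
enum { BLOCK_BYTES = 256 };

/* Copy one fixed-size block. The constant size lets the compiler
 * lower the memcpy inline; the width of the loads/stores it picks
 * is a codegen decision, not something this source controls. */
void copy_block(unsigned char *dst, const unsigned char *src) {
    memcpy(dst, src, BLOCK_BYTES);
}

/* Copy n_blocks consecutive, non-overlapping blocks.
 * Returns the total number of bytes copied. */
size_t copy_blocks(unsigned char *dst, const unsigned char *src,
                   size_t n_blocks) {
    for (size_t i = 0; i < n_blocks; ++i)
        copy_block(dst + i * BLOCK_BYTES, src + i * BLOCK_BYTES);
    return n_blocks * BLOCK_BYTES;
}
```

One way to see what was generated is to compile with something like
"clang -O2 -mavx2 -mprefer-vector-width=128 -S" and look for ymm
versus xmm register moves in copy_block.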

Makes copying memory faster, e.g.:
BM_memcpy_aligned/256 80.7GB/s ± 3% 96.3GB/s ± 9% +19.33% (p=0.000 n=9+9)

Differential Revision: https://reviews.llvm.org/D134982



