[all-commits] [llvm/llvm-project] d7043e: [X86] Add support for "light" AVX
Ilya Tocar via All-commits
all-commits at lists.llvm.org
Tue Jan 24 14:03:25 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d7043e8c41bb74a31c9790616c1536596814567b
https://github.com/llvm/llvm-project/commit/d7043e8c41bb74a31c9790616c1536596814567b
Author: Ilya Tokar <tokarip at google.com>
Date: 2023-01-24 (Tue, 24 Jan 2023)
Changed paths:
M llvm/lib/Target/X86/X86.td
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/lib/Target/X86/X86Subtarget.h
M llvm/lib/Target/X86/X86TargetTransformInfo.h
A llvm/test/CodeGen/X86/memcpy-light-avx.ll
M llvm/test/CodeGen/X86/vector-width-store-merge.ll
Log Message:
-----------
[X86] Add support for "light" AVX
AVX/AVX512 instructions may cause a frequency drop on, e.g., Skylake.
The magnitude of the frequency/performance drop depends on the instruction
(multiplication vs. load/store) and the vector width. Currently, users
who want to avoid this drop can specify -mprefer-vector-width=128.
However, this also prevents generation of 256-bit-wide instructions
that have no associated frequency drop (mainly loads/stores).
Add a tuning flag that allows generation of 256-bit AVX loads/stores,
even when -mprefer-vector-width=128 is set, to speed up memcpy & co.
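To make the flag's effect concrete, here is a minimal C++ sketch of the width decision it changes, assuming a simplified subtarget model. `SubtargetModel`, `allowLight256`, and `maxMemOpWidthBits` are hypothetical names chosen for illustration; they are not LLVM's actual classes, and this is not the exact logic in X86ISelLowering.cpp.

```cpp
// Hypothetical model of the subtarget state relevant to this change.
// The names below are illustrative only; they are NOT LLVM's actual API.
struct SubtargetModel {
  bool hasAVX;                 // 256-bit YMM registers available
  bool hasAVX512;              // 512-bit ZMM registers available
  unsigned preferVectorWidth;  // value of -mprefer-vector-width, in bits
  bool allowLight256;          // the "light AVX" tuning flag (assumed name)
};

// Widest vector width, in bits, that this sketch would use for the
// load/store sequence of an inlined memcpy/memset.
constexpr unsigned maxMemOpWidthBits(const SubtargetModel &st) {
  if (st.hasAVX512 && st.preferVectorWidth >= 512)
    return 512;
  // 256-bit loads/stores are used when the preferred width allows them, or
  // when the tuning flag says they cause no frequency drop on this CPU.
  if (st.hasAVX && (st.preferVectorWidth >= 256 || st.allowLight256))
    return 256;
  return 128;  // 128-bit (SSE) fallback in this sketch
}
```

With `preferVectorWidth` set to 128, the sketch returns 256 only when `allowLight256` is set: arithmetic would stay at 128 bits while memcpy-style copies may still use 256-bit loads/stores.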
Verified that running a memcpy loop on all cores has no frequency impact
and leaves the CORE_POWER:LVL[12]_TURBO_LICENSE perf counters at zero.
Makes copying memory faster, e.g.:
name                   old speed      new speed      delta
BM_memcpy_aligned/256  80.7GB/s ± 3%  96.3GB/s ± 9%  +19.33%  (p=0.000 n=9+9)
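As a quick sanity check of the delta column, the relative change can be recomputed from the two mean throughputs. The helper name `percentChange` is hypothetical and only illustrates the arithmetic.

```cpp
// Relative change, in percent, from an old to a new throughput figure.
constexpr double percentChange(double oldVal, double newVal) {
  return (newVal - oldVal) / oldVal * 100.0;
}
```

For the memcpy benchmark, (96.3 - 80.7) / 80.7 is about 0.1933, i.e. roughly +19.33%, consistent with the reported delta.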
Differential Revision: https://reviews.llvm.org/D134982