[all-commits] [llvm/llvm-project] 579812: [X86] LowerRotate: prefer unpack-based algorithm
Ivan via All-commits
all-commits at lists.llvm.org
Mon May 15 03:25:58 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 579812c081a23f70804b7214d59776f59d056914
https://github.com/llvm/llvm-project/commit/579812c081a23f70804b7214d59776f59d056914
Author: Ivan Chikish <nekotekina at gmail.com>
Date: 2023-05-15 (Mon, 15 May 2023)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/min-legal-vector-width.ll
M llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
M llvm/test/CodeGen/X86/vector-fshl-rot-256.ll
M llvm/test/CodeGen/X86/vector-fshl-rot-512.ll
M llvm/test/CodeGen/X86/vector-fshr-rot-128.ll
M llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
M llvm/test/CodeGen/X86/vector-fshr-rot-512.ll
M llvm/test/CodeGen/X86/vector-rotate-128.ll
M llvm/test/CodeGen/X86/vector-rotate-256.ll
M llvm/test/CodeGen/X86/vector-rotate-512.ll
Log Message:
-----------
[X86] LowerRotate: prefer unpack-based algorithm
Split out from, and improving on, https://reviews.llvm.org/D146357
When running tests for LowerShift, I discovered some poor codegen in the rotate and funnel shift tests. This patch attempts to address some of it.
Splitting with unpack and using double-bitwidth shifts may improve performance, according to tests with https://uica.uops.info:
- No cross-lane shuffles
- No dirtying double-width registers
- Massive improvement for AVX2 rotates in some cases (var_funnnel_v8i16, var_funnnel_v16i16), because unpack is currently only used for vXi8 vectors.
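
As a rough illustration of the idea (not the code in X86ISelLowering.cpp), the following scalar C++ sketch models one 16-bit lane: the element is duplicated into a double-width value, as unpacking a vector with itself would do, shifted once at double width, and the high half is kept. The function names are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Reference rotate-left on a single 16-bit lane (amount taken modulo 16).
inline uint16_t rotl16_ref(uint16_t x, unsigned amt) {
  amt &= 15;
  return static_cast<uint16_t>((x << amt) | (x >> ((16 - amt) & 15)));
}

// Scalar model of the unpack-based approach (hypothetical helper):
// 1. "unpack" the lane with itself to form the 32-bit value x:x,
// 2. do one double-bitwidth left shift,
// 3. keep the high 16 bits, which now hold the rotated lane.
inline uint16_t rotl16_unpack_model(uint16_t x, unsigned amt) {
  amt &= 15;
  uint32_t wide = (static_cast<uint32_t>(x) << 16) | x;
  return static_cast<uint16_t>((wide << amt) >> 16);
}
```

In this model a rotate needs one wide shift instead of two narrow shifts plus an OR. The in-lane unpack instructions would supply the duplication step in a vector lowering, which is consistent with the "no cross-lane shuffles" point above.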
Differential Revision: https://reviews.llvm.org/D149071