[all-commits] [llvm/llvm-project] 31eaf8: [SLP]Improve minbitwidth analysis.
Alexey Bataev via All-commits
all-commits at lists.llvm.org
Tue Mar 19 08:23:32 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 31eaf86a1e8f1870e6ee4c42088a5213bde294b8
https://github.com/llvm/llvm-project/commit/31eaf86a1e8f1870e6ee4c42088a5213bde294b8
Author: Alexey Bataev <5361294+alexey-bataev at users.noreply.github.com>
Date: 2024-03-19 (Tue, 19 Mar 2024)
Changed paths:
M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
M llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/gather-buildvector-with-minbitwidth-user.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/gather-with-minbith-user.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr2.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/reduce-add-i64.ll
M llvm/test/Transforms/SLPVectorizer/RISCV/reductions.ll
M llvm/test/Transforms/SLPVectorizer/X86/PR35777.ll
M llvm/test/Transforms/SLPVectorizer/X86/int-bitcast-minbitwidth.ll
M llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-icmp-to-trunc.ll
M llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-multiuse-with-insertelement.ll
M llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-transformed-operand.ll
M llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-user-not-min.ll
M llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll
M llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll
M llvm/test/Transforms/SLPVectorizer/X86/resched.ll
M llvm/test/Transforms/SLPVectorizer/X86/reused-reductions-with-minbitwidth.ll
M llvm/test/Transforms/SLPVectorizer/X86/same-scalar-in-same-phi-extract.ll
M llvm/test/Transforms/SLPVectorizer/X86/store-insertelement-minbitwidth.ll
M llvm/test/Transforms/SLPVectorizer/alt-cmp-vectorize.ll
Log Message:
-----------
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.
Metric: size..text
Program size..text
results results0 diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0%
SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.
Original Pull Request: https://github.com/llvm/llvm-project/pull/84334
The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/84536
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list