[all-commits] [llvm/llvm-project] cc8392: [CVP] Canonicalize signed minmax into unsigned (#8...
Alexey Bataev via All-commits
all-commits at lists.llvm.org
Thu Feb 22 11:37:10 PST 2024
Branch: refs/heads/users/alexey-bataev/spr/slpimprove-minbitwidth-analysis
Home: https://github.com/llvm/llvm-project
Commit: cc839275164a7768451531af868fa70eb9e71cbd
https://github.com/llvm/llvm-project/commit/cc839275164a7768451531af868fa70eb9e71cbd
Author: Yingwei Zheng <dtcxzyw2333 at gmail.com>
Date: 2024-02-23 (Fri, 23 Feb 2024)
Changed paths:
M llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
M llvm/test/Transforms/CorrelatedValuePropagation/min-max.ll
Log Message:
-----------
[CVP] Canonicalize signed minmax into unsigned (#82478)
This patch turns signed minmax into unsigned minmax to match the behavior
for signed icmps.
Alive2: https://alive2.llvm.org/ce/z/UAAM42
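A minimal sketch of why such a canonicalization can be sound, assuming the precondition is that both operands are known to have the same sign (the commit message does not spell out the exact condition, so this is an assumption). The helpers `sameSign`, `sminViaUmin` and `countMismatches` are hypothetical names for illustration, not LLVM APIs. The check runs exhaustively over all i8 pairs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when both values are negative or both are non-negative.
inline bool sameSign(int8_t a, int8_t b) { return (a < 0) == (b < 0); }

// Computes a minimum with an unsigned comparison on the same bit patterns,
// then reinterprets the result as a signed value.
inline int8_t sminViaUmin(int8_t a, int8_t b) {
  uint8_t ua = static_cast<uint8_t>(a);
  uint8_t ub = static_cast<uint8_t>(b);
  return static_cast<int8_t>(std::min(ua, ub));
}

// Counts same-sign i8 pairs for which the unsigned form disagrees with smin.
inline int countMismatches() {
  int mismatches = 0;
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      int8_t sa = static_cast<int8_t>(a);
      int8_t sb = static_cast<int8_t>(b);
      if (sameSign(sa, sb) && std::min(sa, sb) != sminViaUmin(sa, sb))
        ++mismatches;
    }
  }
  return mismatches;
}
```

With mixed signs the two forms can disagree (for example -1 and 1), which is why some range information is needed before rewriting.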
Commit: 33a6ce18373ffd1457ebd54e930b6f02fe4c39c1
https://github.com/llvm/llvm-project/commit/33a6ce18373ffd1457ebd54e930b6f02fe4c39c1
Author: Yaxun (Sam) Liu <yaxun.liu at amd.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M clang/lib/CodeGen/CGCUDANV.cpp
M clang/lib/CodeGen/CodeGenModule.cpp
M clang/lib/Driver/OffloadBundler.cpp
M clang/lib/Driver/ToolChains/HIPUtility.cpp
M clang/test/CMakeLists.txt
M clang/test/CodeGenCUDA/device-stub.cu
M clang/test/CodeGenCUDA/host-used-device-var.cu
A clang/test/Driver/Inputs/hip.h
M clang/test/Driver/clang-offload-bundler.c
A clang/test/Driver/hip-partial-link.hip
M clang/test/Driver/hip-toolchain-rdc.hip
Log Message:
-----------
[HIP] Allow partial linking for `-fgpu-rdc` (#81700)
`-fgpu-rdc` mode allows device functions to call device functions in a
different TU. However, currently all device objects have to be linked
together since only one fat binary is supported. This is time-consuming
for the AMDGPU backend since it only supports LTO.
There are use cases where objects can be divided into groups in which
device functions are self-contained but host functions are not. It is
desirable to link/optimize/codegen the device code and generate a fatbin
for each group, while partially linking the host code with `ld -r` or
generating a static library with the `--emit-static-lib` option of
clang. This avoids linking all device code together and therefore
decreases the linking time for `-fgpu-rdc`.
Previously, clang emitted an external symbol `__hip_fatbin` for all
objects for `-fgpu-rdc`. With this patch, clang emits a unique external
symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a
group of objects is linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary. Each group has its
own fat binary. One executable or shared library can have multiple fat
binaries. Device linking is done for undefined fat binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also uniquified and
merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is
introduced to facilitate debugging and tooling.
Fixes: https://github.com/llvm/llvm-project/issues/77018
Commit: 1069823ce7d154aa8ef87ae5a0fd34b527eca2a0
https://github.com/llvm/llvm-project/commit/1069823ce7d154aa8ef87ae5a0fd34b527eca2a0
Author: Alexander Shaposhnikov <6532716+alexander-shaposhnikov at users.noreply.github.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M llvm/lib/Passes/PassBuilderPipelines.cpp
M llvm/test/Other/new-pm-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
Log Message:
-----------
Enable JumpTableToSwitch pass by default (#82546)
Enable JumpTableToSwitch pass by default.
Test plan: ninja check-all
Commit: 4f7ab789bf43b49914815bdf4e4c3703f92e781d
https://github.com/llvm/llvm-project/commit/4f7ab789bf43b49914815bdf4e4c3703f92e781d
Author: Boian Petkantchin <boian.petkantchin at amd.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M mlir/lib/Dialect/Mesh/Transforms/Spmdization.cpp
M mlir/test/Dialect/Mesh/spmdization.mlir
Log Message:
-----------
[mlir][mesh] add support in spmdization for incomplete sharding annotations (#82442)
Don't require that `mesh.shard` operations come in pairs. If there is
only a single `mesh.shard` operation we assume that the producer result
and consumer operand have the same sharding.
Commit: 744c0057e7dc0d1d046a4867cece2f31fee9bb23
https://github.com/llvm/llvm-project/commit/744c0057e7dc0d1d046a4867cece2f31fee9bb23
Author: Nashe Mncube <nashe.mncube at arm.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
A llvm/test/CodeGen/AArch64/16bit-float-promotion-with-nofp.ll
M llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll
Log Message:
-----------
[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (#81724)
When lowering the fptrunc opcode returning fp16 with the +nofp flag
enabled, we could trigger a compiler crash. This is because we had no
custom lowering implemented. This patch handles the case in which we
need to promote an fp16 return type for fptrunc when the +nofp attr is
enabled.
Commit: 6ddb25ed9ca2cb0f4ad8f402d7411ac3328f598d
https://github.com/llvm/llvm-project/commit/6ddb25ed9ca2cb0f4ad8f402d7411ac3328f598d
Author: Florian Mayer <fmayer at google.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M compiler-rt/lib/scudo/standalone/combined.h
Log Message:
-----------
[scudo] increase frames per stack to 16 for stack depot (#82427)
8 was very low, and it is likely that real workloads have more than an
average of 8 frames per stack, given that on Android we have three at
the bottom: __start_main, __libc_init, main, and three at the top:
malloc, scudo_malloc and Allocator::allocate. That leaves 2 frames for
application code, which is clearly unreasonable.
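The frame arithmetic above can be sketched as follows. This is only an illustration: the constant and function names are hypothetical, and the average frames-per-stack figure is treated as a fixed per-stack budget for simplicity.

```cpp
#include <cassert>

// Fixed frames named in the message (Android):
constexpr int kBottomFrames = 3; // __start_main, __libc_init, main
constexpr int kTopFrames = 3;    // malloc, scudo_malloc, Allocator::allocate

// Hypothetical helper: frames left for application code out of a given budget.
constexpr int applicationFrames(int framesPerStack) {
  return framesPerStack - kBottomFrames - kTopFrames;
}

static_assert(applicationFrames(8) == 2, "old budget leaves 2 frames");
static_assert(applicationFrames(16) == 10, "new budget leaves 10 frames");
```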
Commit: 242f98c7ab7c100d76cac29b555db20205619b38
https://github.com/llvm/llvm-project/commit/242f98c7ab7c100d76cac29b555db20205619b38
Author: Benjamin Kramer <benny.kra at googlemail.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M clang/test/CodeGen/aarch64-sme-inline-streaming-attrs.c
Log Message:
-----------
[Clang][SME] Skip writing output files to the source directory
Commit: 3168af56bcb827360c26957ef579b7871dad8e17
https://github.com/llvm/llvm-project/commit/3168af56bcb827360c26957ef579b7871dad8e17
Author: Benjamin Kramer <benny.kra at googlemail.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M llvm/test/Transforms/LoopVectorize/X86/pr72969.ll
Log Message:
-----------
LoopVectorize: Mark crash test as requiring assertions
Commit: 32994cc0d63513f77223c64148faeeb50aebb702
https://github.com/llvm/llvm-project/commit/32994cc0d63513f77223c64148faeeb50aebb702
Author: Alexey Bataev <5361294+alexey-bataev at users.noreply.github.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
M llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/reorder-fmuladd-crash.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/vec3-reorder-reshuffle.ll
M llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
M llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-clustered-node.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-vf-to-resize.ll
M llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll
M llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll
M llvm/test/Transforms/SLPVectorizer/X86/vec3-reorder-reshuffle.ll
Log Message:
-----------
[SLP]Improve findReusedOrderedScalars and graph rotation.
The patch syncs the code in findReusedOrderedScalars with cost
estimation/codegen. It tries to use similar logic to better determine
the best order.
Before, it just tried to find a previously vectorized node without
checking whether it is possible to use the vectorized value in the
shuffle. Now it relies on the more generalized version. If it determines
that a single vector must be reordered (using the same mechanism as
codegen and cost estimation), it generates a better order.
The comparison between new/ref ordering:
Metric: SLP.NumVectorInstructions
Program                                                                      SLP.NumVectorInstructions
                                                                             results  results0    diff
test-suite :: MultiSource/Benchmarks/nbench/nbench.test                       139.00    140.00    0.7%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test     344.00    346.00    0.6%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test              1293.00   1292.00   -0.1%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test       5176.00   5170.00   -0.1%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test               5173.00   5167.00   -0.1%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test      11692.00  11660.00   -0.3%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test            1621.00   1615.00   -0.4%
test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test                     795.00    792.00   -0.4%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test    26499.00  26338.00   -0.6%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                      7343.00   7281.00   -0.8%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                 1104.00   1094.00   -0.9%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test                 2216.00   2180.00   -1.6%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test                    787.00    637.00  -19.1%
Less than 0% is better.
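The diff column appears consistent with the relative change of the results0 column with respect to the results column. A small sketch of that computation, where `percentDiff` is a hypothetical helper name and the formula is inferred from the listed numbers rather than stated in the message:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change of `results0` with respect to
// `results`, expressed in percent. Negative values mean results0 is smaller.
inline double percentDiff(double results, double results0) {
  return (results0 - results) / results * 100.0;
}
```

For example, the nbench row (139.00 and 140.00) gives roughly 0.7, and the 433.milc row (787.00 and 637.00) gives roughly -19.1.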
Most of the benchmarks see more vectorized code. The first ones just
have shuffles removed.
The ordering analysis may still require some improvements (e.g. for
alternate nodes), but this one should produce better results.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/77529
Commit: 7f00130aa6151a9cba3cad2f85516ccb5171d7b0
https://github.com/llvm/llvm-project/commit/7f00130aa6151a9cba3cad2f85516ccb5171d7b0
Author: Alexey Bataev <a.bataev at outlook.com>
Date: 2024-02-22 (Thu, 22 Feb 2024)
Changed paths:
M clang/lib/CodeGen/CGCUDANV.cpp
M clang/lib/CodeGen/CodeGenModule.cpp
M clang/lib/Driver/OffloadBundler.cpp
M clang/lib/Driver/ToolChains/HIPUtility.cpp
M clang/test/CMakeLists.txt
M clang/test/CodeGen/aarch64-sme-inline-streaming-attrs.c
M clang/test/CodeGenCUDA/device-stub.cu
M clang/test/CodeGenCUDA/host-used-device-var.cu
A clang/test/Driver/Inputs/hip.h
M clang/test/Driver/clang-offload-bundler.c
A clang/test/Driver/hip-partial-link.hip
M clang/test/Driver/hip-toolchain-rdc.hip
M compiler-rt/lib/scudo/standalone/combined.h
M llvm/lib/Passes/PassBuilderPipelines.cpp
M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
M llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
A llvm/test/CodeGen/AArch64/16bit-float-promotion-with-nofp.ll
M llvm/test/CodeGen/AArch64/strictfp_f16_abi_promote.ll
M llvm/test/Other/new-pm-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
M llvm/test/Transforms/CorrelatedValuePropagation/min-max.ll
M llvm/test/Transforms/LoopVectorize/X86/pr72969.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/reorder-fmuladd-crash.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
M llvm/test/Transforms/SLPVectorizer/AArch64/vec3-reorder-reshuffle.ll
M llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll
M llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-clustered-node.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-reused-masked-gather.ll
M llvm/test/Transforms/SLPVectorizer/X86/reorder-vf-to-resize.ll
M llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reorder.ll
M llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll
M llvm/test/Transforms/SLPVectorizer/X86/vec3-reorder-reshuffle.ll
M mlir/lib/Dialect/Mesh/Transforms/Spmdization.cpp
M mlir/test/Dialect/Mesh/spmdization.mlir
Log Message:
-----------
Rebase
Created using spr 1.3.5
Compare: https://github.com/llvm/llvm-project/compare/89754078a3fd...7f00130aa615
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications