[PATCH] D18586: Allow DeadStoreElimination to track combinations of partial later wrties
Daniel Berlin via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 31 09:55:12 PDT 2016
This looks reasonable, assuming creating a bunch of intervalmaps is not
that expensive.
On Tue, Mar 29, 2016 at 5:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> hfinkel created this revision.
> hfinkel added reviewers: dberlin, chandlerc, echristo, igor-laevsky,
> mcrosier, eeckstein.
> hfinkel added a subscriber: llvm-commits.
> Herald added a subscriber: mcrosier.
>
> DeadStoreElimination can currently remove a small store rendered
> unnecessary by a later larger one, but cannot remove a larger store
> rendered unnecessary by a series of later smaller ones. This patch aims to
> rectify that.
>
> It works by keeping an IntervalMap for each store later overwritten only
> partially, and filling in that interval map as more such stores are
> discovered. No additional walking or aliasing queries are added. If the
> IntervalMap forms an interval covering the the entire earlier store, then
> it is dead and can be removed.
>
> I discovered this problem when investigating a performance issue with code
> like this on PowerPC:
>
> #include <complex>
> using namespace std;
>
> complex<float> bar(complex<float> C);
> complex<float> foo(complex<float> C) {
> return bar(C)*C;
> }
>
> which produces this:
>
> define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias
> nocapture sret %agg.result, i64 %c.coerce) {
> entry:
> %ref.tmp = alloca i64, align 8
> %tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
> %c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
> %c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
> %0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
> %c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
> %1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
> call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret
> %tmpcast, i64 %c.coerce)
> %2 = bitcast %"struct.std::complex"* %agg.result to i64*
> %3 = load i64, i64* %ref.tmp, align 8
> store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE
> ****
> %_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex",
> %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
> %4 = lshr i64 %3, 32
> %5 = trunc i64 %4 to i32
> %6 = bitcast i32 %5 to float
> %_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex",
> %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
> %7 = trunc i64 %3 to i32
> %8 = bitcast i32 %7 to float
> %mul_ad.i.i = fmul fast float %6, %1
> %mul_bc.i.i = fmul fast float %8, %0
> %mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
> %mul_ac.i.i = fmul fast float %6, %0
> %mul_bd.i.i = fmul fast float %8, %1
> %mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
> store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
> store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
> ret void
> }
>
> the problem here is not just that the i64 store is unnecessary, but also
> that it blocks further backend optimizations of the other uses of that i64
> value in the backend.
>
> For the value of the interval map, I'm currently using std::tuple<>
> (because it is empty).
>
>
> http://reviews.llvm.org/D18586
>
> Files:
> lib/Transforms/Scalar/DeadStoreElimination.cpp
> test/Transforms/DeadStoreElimination/combined-partial-overwrites.ll
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160331/eb65f677/attachment.html>
More information about the llvm-commits
mailing list