[PATCH] D18586: Allow DeadStoreElimination to track combinations of partial later writes

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 31 09:55:12 PDT 2016


This looks reasonable, assuming creating a bunch of IntervalMaps is not
that expensive.

On Tue, Mar 29, 2016 at 5:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> hfinkel created this revision.
> hfinkel added reviewers: dberlin, chandlerc, echristo, igor-laevsky,
> mcrosier, eeckstein.
> hfinkel added a subscriber: llvm-commits.
> Herald added a subscriber: mcrosier.
>
> DeadStoreElimination can currently remove a small store rendered
> unnecessary by a later larger one, but cannot remove a larger store
> rendered unnecessary by a series of later smaller ones. This patch aims to
> rectify that.
>
> It works by keeping an IntervalMap for each store later overwritten only
> partially, and filling in that interval map as more such stores are
> discovered. No additional walking or aliasing queries are added. If the
> IntervalMap forms an interval covering the entire earlier store, then
> it is dead and can be removed.
>
> I discovered this problem when investigating a performance issue with code
> like this on PowerPC:
>
>   #include <complex>
>   using namespace std;
>
>   complex<float> bar(complex<float> C);
>   complex<float> foo(complex<float> C) {
>     return bar(C)*C;
>   }
>
> which produces this:
>
>   define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias
> nocapture sret %agg.result, i64 %c.coerce) {
>   entry:
>     %ref.tmp = alloca i64, align 8
>     %tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
>     %c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
>     %c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
>     %0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
>     %c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
>     %1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
>     call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret
> %tmpcast, i64 %c.coerce)
>     %2 = bitcast %"struct.std::complex"* %agg.result to i64*
>     %3 = load i64, i64* %ref.tmp, align 8
>     store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE *****
>     %_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex",
> %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
>     %4 = lshr i64 %3, 32
>     %5 = trunc i64 %4 to i32
>     %6 = bitcast i32 %5 to float
>     %_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex",
> %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
>     %7 = trunc i64 %3 to i32
>     %8 = bitcast i32 %7 to float
>     %mul_ad.i.i = fmul fast float %6, %1
>     %mul_bc.i.i = fmul fast float %8, %0
>     %mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
>     %mul_ac.i.i = fmul fast float %6, %0
>     %mul_bd.i.i = fmul fast float %8, %1
>     %mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
>     store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
>     store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
>     ret void
>   }
>
> The problem here is not just that the i64 store is unnecessary, but also
> that it blocks further backend optimizations of the other uses of that i64
> value.
>
> For the value type of the interval map, I'm currently using std::tuple<>
> (because it is an empty type).
>
>
> http://reviews.llvm.org/D18586
>
> Files:
>   lib/Transforms/Scalar/DeadStoreElimination.cpp
>   test/Transforms/DeadStoreElimination/combined-partial-overwrites.ll
>
>

