[PATCH] D15537: limit the number of instructions per block examined by dead store elimination

Tue Aug 23 12:07:11 PDT 2016

On 08/23/2016 11:26 AM, Xinliang David Li wrote:
> On Tue, Aug 23, 2016 at 9:38 AM, Philip Reames
> <listmail at philipreames.com> wrote:
>> reames added a comment.
>>
>> I just want to make sure I understand the big picture view of this change.  I'm going to try to summarize, please correct me if I'm wrong.
>>
>> Today, we will walk back through the list of defs/clobbers provided by MDA (within a single block) without limit in DSE.  Internally, MDA will only find defs/clobbers which are within a limited distance of each other.  As a result, a series of adjacent clobbers will be scanned, but the same series of adjacent clobbers with a single long break of non-memory related instructions will not be.  Right?
>>
> See the small example below, which demonstrates the issue:
> opt -S -dse -memdep-block-scan-limit=4 < dse.ll
I can't tell if this is meant as "yes, your summary was correct, here's 
an example", or "no, you misunderstood, let me clarify". Which was it?
>
> Suppose we look at the g1 = 0 store and walk backwards to look for a
> dependent access. Since the scan limit is set to 4, which is a little
> higher than the distance it needs to walk to find a dependent memory
> access, a valid memLoc is returned. DSE then finds out that the dependent
> access is not removable, so it continues walking up the dep-chain
> starting from where it stopped. In other words, it essentially
> bypasses the scan limit.
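
The bypass described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not LLVM code: `Block`, `scanBack`, and `totalVisitedWithRestarts` are hypothetical names, and a block is reduced to a list of flags saying whether each instruction would be reported as a def/clobber.

```cpp
#include <cassert>
#include <vector>

// Toy model (hypothetical, not the LLVM API): true marks an instruction the
// scan reports as a def/clobber, false marks one the scan steps over.
using Block = std::vector<bool>;

// Walk backwards from `start` (exclusive), looking at no more than `limit`
// instructions. Returns the index of the first clobber found, or -1 if the
// window is exhausted. `visited` counts every instruction looked at.
int scanBack(const Block &bb, int start, int limit, int &visited) {
  assert(limit > 0 && start <= static_cast<int>(bb.size()));
  for (int i = start - 1, steps = 0; i >= 0 && steps < limit; --i, ++steps) {
    ++visited;
    if (bb[i])
      return i;
  }
  return -1;
}

// A caller that restarts the scan from each clobber it cannot remove.
// Every restart gets a fresh `limit`, so the total work is not bounded by it.
int totalVisitedWithRestarts(const Block &bb, int start, int limit) {
  int visited = 0;
  for (int pos = start; pos >= 0;)
    pos = scanBack(bb, pos, limit, visited);
  return visited;
}
```

In this model, a block with a clobber every third instruction and a limit of 4 is walked all the way to its start, because no single scan ever needs more than three steps.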
>
>
>
>
> // C code:
>
> extern int g1, g2[];
>
> extern int *p[];
>
> void test() {
>     *(p[2]) = 0;
>     g2[2] = 0;
>     *(p[1]) = 0;
>     g2[1] = 0;
>     *(p[0]) = 0;
>     g2[0] = 0;
>     g1 = 0;
> }
>
> // LLVM IR:
>
> source_filename = "dse.c"
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
>
> @p = external global [0 x i32*], align 8
> @g2 = external global [0 x i32], align 4
> @g1 = external global i32, align 4
>
> ; Function Attrs: nounwind uwtable
> define void @test() #0 {
>    %1 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 2), align 8, !tbaa !1
>    store i32 0, i32* %1, align 4, !tbaa !5
>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 2), align 4, !tbaa !5
>    %2 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 1), align 8, !tbaa !1
>    store i32 0, i32* %2, align 4, !tbaa !5
>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 1), align 4, !tbaa !5
>    %3 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 0), align 8, !tbaa !1
>    store i32 0, i32* %3, align 4, !tbaa !5
>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 0), align 4, !tbaa !5
>    store i32 0, i32* @g1, align 4, !tbaa !5
>    ret void
> }
>
> attributes #0 = { nounwind uwtable
> "correctly-rounded-divide-sqrt-fp-math"="false"
> "disable-tail-calls"="false" "less-precise-fpmad"="false"
> "no-frame-pointer-elim"="false" "no-infs-fp-math"="false"
> "no-jump-tables"="false" "no-nans-fp-math"="false"
> "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8"
> "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87"
> "unsafe-fp-math"="false" "use-soft-float"="false" }
>
> !llvm.ident = !{!0}
>
> !0 = !{!"clang version 4.0.0 (trunk 279504)"}
> !1 = !{!2, !2, i64 0}
> !2 = !{!"any pointer", !3, i64 0}
> !3 = !{!"omnipotent char", !4, i64 0}
> !4 = !{!"Simple C/C++ TBAA"}
> !5 = !{!6, !6, i64 0}
> !6 = !{!"int", !3, i64 0}
>
>
>> With the patch, we will walk backwards a fixed distance (in number of instructions) considering any def/clobber we see in that window.
> With this patch, we will remember the previous distance that we have
> walked and make sure the total distance is limited.
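
As a rough illustration of "remember the previous distance", here is a toy sketch in which the scan draws from one shared budget for the whole walk. It is not taken from the patch: `Block`, `scanBackBudgeted`, and `totalVisitedWithBudget` are hypothetical names, and a block is reduced to a list of flags saying whether each instruction would be reported as a def/clobber.

```cpp
#include <cassert>
#include <vector>

// Toy model (hypothetical, not the LLVM API): true marks an instruction the
// scan reports as a def/clobber, false marks one the scan steps over.
using Block = std::vector<bool>;

// Walk backwards from `start` (exclusive). Each instruction looked at
// consumes one unit of the shared `budget`; the scan stops when it runs out.
int scanBackBudgeted(const Block &bb, int start, int &budget) {
  for (int i = start - 1; i >= 0 && budget > 0; --i) {
    --budget;
    if (bb[i])
      return i;
  }
  return -1;
}

// The caller still restarts from each clobber it cannot remove, but the
// budget carries over between restarts, so total work is at most `limit`.
int totalVisitedWithBudget(const Block &bb, int start, int limit) {
  assert(limit >= 0 && start <= static_cast<int>(bb.size()));
  int budget = limit;
  for (int pos = start; pos >= 0;)
    pos = scanBackBudgeted(bb, pos, budget);
  return limit - budget;
}
```

With a clobber every third instruction and a limit of 4, this model stops after four instructions in total, however the clobbers are spaced.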
>
>> Is that a correct summary?
>>
>> (p.s. Using the same default value from the original implementation with the new one seems highly suspect since the old implementation would have been much more aggressive in practice..)
> This is of course tunable, but IMO, scanning backward over 100
> clobbering/aliased instructions in one basic block for every store
> instruction that is examined is already pretty generous.
>
> thanks,
>
> David
>
>
>>
>> https://reviews.llvm.org/D15537
>>
>>
>>