[PATCH] D15537: limit the number of instructions per block examined by dead store elimination

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 23 12:30:26 PDT 2016


On Tue, Aug 23, 2016 at 12:07 PM, Philip Reames
<listmail at philipreames.com> wrote:
> On 08/23/2016 11:26 AM, Xinliang David Li wrote:
>>
>> On Tue, Aug 23, 2016 at 9:38 AM, Philip Reames
>> <listmail at philipreames.com> wrote:
>>>
>>> reames added a comment.
>>>
>>> I just want to make sure I understand the big picture view of this
>>> change.  I'm going to try to summarize, please correct me if I'm wrong.
>>>
>>> Today, we will walk back through the list of defs/clobbers provided by
>>> MDA (within a single block) without limit in DSE.  Internally, MDA will only
>>> find defs/clobbers which are within a limited distance of each other.  As a
>>> result, a series of adjacent clobbers will be scanned, but the same series
>>> of adjacent clobbers with a single long break of non-memory related
>>> instructions will not be.  Right?
>>>
>> See the small example below, which demonstrates the issue:
>> opt -S -dse -memdep-block-scan-limit=4 < dse.ll
>
> I can't tell if this is meant as "yes, your summary was correct, here's an
> example", or "no, you misunderstood, let me clarify". Which was it?
>

Sorry about that :). I think what you summarized is basically correct,
except that I could not parse your last sentence: "but the same
series of adjacent clobbers with a single long break of non-memory
related instructions will not be"

David



>>
>> Suppose we look at the g1 = 0 store and walk backwards to look for a
>> dependent access. Since the scan limit is set to 4, which is a little
>> higher than the distance the walk needs to cover to find a dependent
>> memory access, a valid memLoc is returned. DSE then finds that the
>> dependent access is not removable, so it continues walking up the
>> dep-chain, starting from where it stopped. In other words, it
>> essentially bypasses the scan limit.
>>
>>
>>
>>
>> // C code:
>>
>> extern int g1, g2[];
>>
>> extern int *p[];
>>
>> void test() {
>>     *(p[2]) = 0;
>>     g2[2] = 0;
>>     *(p[1]) = 0;
>>     g2[1] = 0;
>>     *(p[0]) = 0;
>>     g2[0] = 0;
>>     g1 = 0;
>> }
>>
>> // LLVM IR:
>>
>> source_filename = "dse.c"
>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> target triple = "x86_64-unknown-linux-gnu"
>>
>> @p = external global [0 x i32*], align 8
>> @g2 = external global [0 x i32], align 4
>> @g1 = external global i32, align 4
>>
>> ; Function Attrs: nounwind uwtable
>> define void @test() #0 {
>>    %1 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 2), align 8, !tbaa !1
>>    store i32 0, i32* %1, align 4, !tbaa !5
>>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 2), align 4, !tbaa !5
>>    %2 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 1), align 8, !tbaa !1
>>    store i32 0, i32* %2, align 4, !tbaa !5
>>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 1), align 4, !tbaa !5
>>    %3 = load i32*, i32** getelementptr inbounds ([0 x i32*], [0 x i32*]* @p, i64 0, i64 0), align 8, !tbaa !1
>>    store i32 0, i32* %3, align 4, !tbaa !5
>>    store i32 0, i32* getelementptr inbounds ([0 x i32], [0 x i32]* @g2, i64 0, i64 0), align 4, !tbaa !5
>>    store i32 0, i32* @g1, align 4, !tbaa !5
>>    ret void
>> }
>>
>> attributes #0 = { nounwind uwtable
>> "correctly-rounded-divide-sqrt-fp-math"="false"
>> "disable-tail-calls"="false" "less-precise-fpmad"="false"
>> "no-frame-pointer-elim"="false" "no-infs-fp-math"="false"
>> "no-jump-tables"="false" "no-nans-fp-math"="false"
>> "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8"
>> "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87"
>> "unsafe-fp-math"="false" "use-soft-float"="false" }
>>
>> !llvm.ident = !{!0}
>>
>> !0 = !{!"clang version 4.0.0 (trunk 279504)"}
>> !1 = !{!2, !2, i64 0}
>> !2 = !{!"any pointer", !3, i64 0}
>> !3 = !{!"omnipotent char", !4, i64 0}
>> !4 = !{!"Simple C/C++ TBAA"}
>> !5 = !{!6, !6, i64 0}
>> !6 = !{!"int", !3, i64 0}
>>
>>
>>> With the patch, we will walk backwards a fixed distance (in number of
>>> instructions) considering any def/clobber we see in that window.
>>
>> With this patch, we will remember how far we have already walked
>> and make sure the total distance is limited.
>>
>>> Is that a correct summary?
>>>
>>> (P.S. Reusing the original implementation's default value for the new
>>> one seems highly suspect, since the old implementation would have
>>> been much more aggressive in practice.)
>>
>> This is of course tunable, but IMO, scanning backward over 100
>> clobbering/aliased instructions in one basic block for every store
>> instruction examined is already pretty generous.
>>
>> thanks,
>>
>> David
>>
>>
>>>
>>> https://reviews.llvm.org/D15537
>>>
>>>
>>>
>

