[LLVMdev] GlobalsModRef (and thus LTO) is completely broken
Xinliang David Li
xinliangli at gmail.com
Tue Jul 21 15:50:59 PDT 2015
reg_values is a file static variable in cselib.c -- so you might be able to
reproduce the issue with a smaller reproducible.
David
On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:
>
> > On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> >
> > Based on function names and structures, this is some version of GCC :)
> Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I
> only did that on IRC).
>
> > Any way you can post the entire .ll file?
> It’s an LTO build, so it’d be troublesome.. I tried to print the module in
> lldb, but after a several minutes it hasn’t even finished printing globals
> (which I assume is the very beginning).
>
> >
> > Because it's globalsmodref, it's hard to debug without the other
> > functions, since it goes over all the functions to determine address
> > takenness, etc :)
> Yep, I understand that - I’m still in debugger though, so if you’re
> interested in some particular data, I can try collecting it. I can try to
> dump the module too, but it might be not-practical in the end:)
>
> Michael
> >
> >
> > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin
> > <mzolotukhin at apple.com> wrote:
> >> Hi Chandler,
> >>
> >> We observed some regressions in our regular testing (despite I saw
> nothing
> >> suspicious in my runs). I did accurate investigation and was able to
> >> reproduce and track down the regression.
> >> I found the exact request to GlobalsModRef that results in the
> performance
> >> loss (I added a limit on number of requests into the implementation and
> >> bisected the number to find the interesting request).
> >>
> >> Here are the details:
> >>
> >> We’re checking following two locations:
> >>
> >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump()
> >> %arrayidx.i = getelementptr inbounds [1 x %struct.elt_list*], [1 x
> >> %struct.elt_list*]* %te.i, i64 0, i64 %indvars.iv.i
> >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump()
> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag*
> null,
> >> align 8
> >>
> >> and the function in question is “cselib_init”:
> >> (lldb) p
> >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName()
> >> (llvm::StringRef) $3 = (Data = "cselib_init", Length = 11)
> >>
> >> Corresponding underlying values:
> >> (lldb) p UV2->dump()
> >> @reg_values = internal unnamed_addr global %struct.varray_head_tag*
> null,
> >> align 8
> >> (lldb) p UV1->dump()
> >> %32 = load %struct.varray_head_tag*, %struct.varray_head_tag**
> @reg_values,
> >> align 8, !tbaa !2
> >>
> >> Backtrace:
> >> (lldb) bt
> >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib`(anonymous
> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10,
> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at
> >> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason =
> step
> >> over
> >> * frame #0: 0x00000001038b752a libLTO.dylib`(anonymous
> >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10,
> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at
> >> GlobalsModRef.cpp:519
> >> frame #1: 0x00000001038b82f7 libLTO.dylib`non-virtual thunk to
> >> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30,
> >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at
> >> GlobalsModRef.cpp:562
> >> frame #2: 0x00000001038d6aa8
> >> libLTO.dylib`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30,
> >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at
> AliasAnalysis.cpp:288
> >> frame #3: 0x0000000103a0a814
> >>
> libLTO.dylib`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0,
> >> MemLoc=0x00007fff5fbf6268, isLoad=true,
> ScanIt=llvm::BasicBlock::iterator at
> >> 0x00007fff5fbf4390, BB=0x000000010a19ffa0,
> QueryInst=0x000000010a1a20c8) +
> >> 1908 at MemoryDependenceAnalysis.cpp:570
> >> frame #4: 0x0000000103a0ffa5
> >>
> libLTO.dylib`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0,
> >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true,
> >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) +
> 2165
> >> at MemoryDependenceAnalysis.cpp:965
> >> frame #5: 0x0000000103a0e3a9
> >>
> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0,
> >> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8,
> >> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0,
> >> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208,
> SkipFirstBlock=false)
> >> + 5897 at MemoryDependenceAnalysis.cpp:1200
> >> frame #6: 0x0000000103a0cb3b
> >>
> libLTO.dylib`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0,
> >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at
> >> MemoryDependenceAnalysis.cpp:911
> >> frame #7: 0x000000010340c5b5 libLTO.dylib`(anonymous
> >> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680,
> >> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706
> >> frame #8: 0x0000000103408eef libLTO.dylib`(anonymous
> >> namespace)::GVN::processLoad(this=0x000000010e6ce680,
> L=0x000000010a1a20c8)
> >> + 1551 at GVN.cpp:1905
> >> frame #9: 0x00000001034080fd libLTO.dylib`(anonymous
> >> namespace)::GVN::processInstruction(this=0x000000010e6ce680,
> >> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220
> >> frame #10: 0x0000000103407d1b libLTO.dylib`(anonymous
> >> namespace)::GVN::processBlock(this=0x000000010e6ce680,
> >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394
> >> frame #11: 0x0000000103401755 libLTO.dylib`(anonymous
> >> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680,
> >> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677
> >> frame #12: 0x0000000103400fef libLTO.dylib`(anonymous
> >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680,
> >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352
> >> frame #13: 0x00000001027cd05b
> >> libLTO.dylib`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810,
> >> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520
> >> frame #14: 0x00000001027cd375
> >> libLTO.dylib`llvm::FPPassManager::runOnModule(this=0x000000010eba6810,
> >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540
> >> frame #15: 0x00000001027cdda1 libLTO.dylib`(anonymous
> >> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0,
> >> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596
> >> frame #16: 0x00000001027cd636
> >> libLTO.dylib`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740,
> >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698
> >> frame #17: 0x00000001027ce521
> >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8,
> >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729
> >>
> >>
> >> The function body is in the attached file.
> >>
> >>
> >>
> >> GlobalsModRef reports NoAlias for this pair, here:
> >> if (GV1 || GV2) {
> >> // If the global's address is taken, pretend we don't know it's a
> >> pointer to
> >> // the global.
> >> if (GV1 && !NonAddressTakenGlobals.count(GV1))
> >> GV1 = nullptr;
> >> if (GV2 && !NonAddressTakenGlobals.count(GV2))
> >> GV2 = nullptr;
> >>
> >> // If the two pointers are derived from two different non-addr-taken
> >> // globals, or if one is and the other isn't, we know these can't
> alias.
> >> if ((GV1 || GV2) && GV1 != GV2)
> >> return NoAlias;
> >>
> >> // Otherwise if they are both derived from the same addr-taken
> global,
> >> we
> >> // can't know the two accesses don't overlap.
> >> }
> >>
> >>
> >> Thanks,
> >> Michael
> >>
> >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
> >>
> >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich
> >> <evgeny.astigeevich at arm.com> wrote:
> >>>
> >>> It’s Dhrystone.
> >>
> >> Dhrystone has historically not been a good indicator of real-world
> >> performance fluctuations, especially at this small of a shift.
> >>
> >> I'd like to see if we see any fluctuation on larger and more realistic
> >> application benchmarks. One advantage of the flag being set is that we
> >> should get runs from folks who have automatic builds and runs
> periodically
> >> from trunk. Those should help give an accurate picture.
> >>
> >>>
> >>>
> >>>
> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com]
> >>> Sent: 17 July 2015 16:10
> >>>
> >>>
> >>> To: Evgeny Astigeevich; Chandler Carruth
> >>> Cc: LLVM Developers Mailing List
> >>>
> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely
> broken
> >>>
> >>>
> >>>
> >>> Can you say what Benchmark or give a test case so we understand the
> nature
> >>> of the regression? As Gerolf said, that will be important to
> understand what
> >>> is best to do.
> >>>
> >>>
> >>>
> >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich
> >>> <Evgeny.Astigeevich at arm.com> wrote:
> >>>
> >>> Yes, the regression is stable. I double checked this. A full benchmark
> run
> >>> consists of at least 10 sub-runs to validate the score.
> >>>
> >>> I also checked if there were regressions of this benchmark across
> >>> different ARM hardware versions. I found all regressions of this
> benchmark
> >>> were in range 1.6%-2%.
> >>>
> >>>
> >>>
> >>> Kind regards,
> >>>
> >>> Evgeny Astigeevich
> >>>
> >>>
> >>>
> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com]
> >>> Sent: 17 July 2015 07:52
> >>> To: Evgeny Astigeevich; Chandler Carruth
> >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin
> >>>
> >>>
> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely
> broken
> >>>
> >>>
> >>>
> >>> Hey, thanks for benchmarking.
> >>>
> >>>
> >>>
> >>> How stable is the 2% regression?
> >>>
> >>>
> >>>
> >>> Michael ran some benchmarks with GlobalsModRef completely disabled and
> the
> >>> only differences were in the noise. This was a complete spec2k6 run
> along
> >>> with some others. Based on the number of benchmarks run there, I'm
> going to
> >>> go ahead and submit these patches, but if you can clarify the impact
> here,
> >>> we can look at potentially some other tradeoff. I'm not particularly
> set on
> >>> one set of defaults, etc, I just don't want to keep patches held up
> based on
> >>> that. We can flip the default back and forth as new data arrives.
> >>>
> >>>
> >>>
> >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich
> >>> <evgeny.astigeevich at arm.com> wrote:
> >>>
> >>> Hi Chandler,
> >>>
> >>>
> >>>
> >>> I ran couple benchmarks with LTO turned on and your patches on ARM
> >>> hardware.
> >>>
> >>> There were no performance degradation of one benchmark and 2% slowdown
> of
> >>> another benchmark.
> >>>
> >>>
> >>>
> >>> Kind regards,
> >>>
> >>> Evgeny Astigeevich
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On
> >>> Behalf Of Evgeny Astigeevich
> >>> Sent: 15 July 2015 15:12
> >>>
> >>>
> >>> To: 'Chandler Carruth'; Gerolf Hoflehner
> >>> Cc: LLVM Developers Mailing List
> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely
> broken
> >>>
> >>>
> >>>
> >>> Hi Chandler,
> >>>
> >>>
> >>>
> >>> I would like to run some benchmarks on ARM hardware and to look at
> impact
> >>> of your patches on LTO.
> >>>
> >>>
> >>>
> >>> Kind regards,
> >>>
> >>> Evgeny Astigeevich
> >>>
> >>>
> >>>
> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On
> >>> Behalf Of Chandler Carruth
> >>>
> >>>
> >>> Sent: 15 July 2015 10:45
> >>> To: Chandler Carruth; Gerolf Hoflehner
> >>> Cc: LLVM Developers Mailing List
> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely
> broken
> >>>
> >>>
> >>>
> >>> I've fixed the obvious bugs I spotted in r242281. These should be pure
> >>> correctness improvements.
> >>>
> >>>
> >>>
> >>> I've sent the two patches I'm imagining to address the core issue here:
> >>>
> >>> http://reviews.llvm.org/D11213
> >>>
> >>> http://reviews.llvm.org/D11214
> >>>
> >>>
> >>>
> >>> Currently, I have the unsafe alias results disabled by default, but
> with a
> >>> flag that can re-enable them if needed. I don't feel really strongly
> about
> >>> which way the default is set -- but that may be because I don't have
> lots of
> >>> users relying on LTO. I'll let others indicate which way they would be
> most
> >>> comfortable.
> >>>
> >>>
> >>>
> >>> Some IRC conversations indicated that early benchmark results with GMR
> >>> completely disabled weren't showing really significant swings, so
> maybe this
> >>> relatively small reduction in power of GMR won't be too problematic for
> >>> folks. Either way, I'm open to the different approaches. It's D11214
> that I
> >>> care a lot about. =]
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Thanks for all the thoughts here!
> >>>
> >>> -Chandler
> >>>
> >>>
> >>>
> >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com
> >
> >>> wrote:
> >>>
> >>> Replying here, but several of the questions raised boil down to
> "couldn't
> >>> you make the usage of GetUnderlyingObject conservatively correct?".
> I'll try
> >>> and address that.
> >>>
> >>>
> >>>
> >>> I think this *is* the right approach, but I think it is very hard to do
> >>> without effectively disabling this part of GlobalsModRef. That is, the
> easy
> >>> ways are likely to fire very frequently IMO.
> >>>
> >>>
> >>>
> >>> The core idea is to detect a "no information" state coming out of
> >>> GetUnderlyingObject (likely be providing a custom version just for
> >>> GlobalsModRef and tailored to its use cases). This is particularly
> effective
> >>> at avoiding the problems with the recursion limit. But let's look at
> what
> >>> cases we *wouldn't* return that. Here are the cases I see when I
> thought
> >>> about this last night with Hal, roughly in descending likelihood I
> would
> >>> guess:
> >>>
> >>>
> >>>
> >>> 1) We detect some global or an alloca. In that case, even BasicAA
> would be
> >>> sufficient to provide no-alias. GMR shouldn't be relevant.
> >>>
> >>>
> >>>
> >>> 2) We detect a phi, select, or inttoptr, and stop there.
> >>>
> >>>
> >>>
> >>> 3) We detect a load and stop there.
> >>>
> >>>
> >>>
> >>> 4) We detect a return from a function.
> >>>
> >>>
> >>>
> >>> 5) We detect an argument to the function.
> >>>
> >>>
> >>>
> >>> I strongly suspect the vast majority of queries hit #1. That's why
> BasicAA
> >>> is *so* effective. Both #4 and #5 I think are actually reasonable
> places for
> >>> GMR to potentially say "no-alias" and provide useful definitive
> information.
> >>> But I also suspect these are the least common.
> >>>
> >>>
> >>>
> >>> So let's look at #2 and #3 because I think they're interesting. For
> these,
> >>> I think it is extremely hard to return "no-alias". It seems extremely
> easy
> >>> for a reasonable and innocuous change to the IR to introduce a phi or a
> >>> select into one side of the GetUnderlyingObject but not the other. If
> that
> >>> ever happens, we can't return "no-alias" for #2, or we need to add
> really
> >>> expensive updates. It also seems reasonable (if not as likely) to want
> >>> adding a store and load to the IR to not trigger a miscompile. If it is
> >>> possible for a valid optimization pass to do reg2mem on an SSA value,
> then
> >>> that could happen to only one side of the paired GetUnderlyingObject
> and
> >>> break GMR with #3. If that seems like an unreasonable thing to do,
> consider
> >>> loop re-rolling or other transformations which may need to take things
> in
> >>> SSA form at move them out of SSA form. Even if we then try immediately
> to
> >>> put it back *into* SSA form, before we do that we create a point where
> GMR
> >>> cannot correctly return no-alias.
> >>>
> >>>
> >>>
> >>> So ultimately, I don't think we want to rely on GMR returning
> "no-alias"
> >>> for either #2 or #3 because of the challenge of actually updating it
> in all
> >>> of the circumstances that could break them. That means that *only* #4
> and #5
> >>> are going to return "no-alias" usefully. And even then, function
> inlining
> >>> and function outlining both break #4 and #5, so you have to preclude
> those
> >>> transforms while GMR is active. And I have serious doubts about these
> >>> providing enough value to be worth the cost.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> I think the better way to approach this is the other way around. Rather
> >>> than doing a backwards analysis to see if one location reaches and
> global
> >>> and the other location doesn't reach a global, I think it would be
> much more
> >>> effective to re-cast this as a forward analysis that determines all the
> >>> memory locations in a function that come from outside the function,
> and use
> >>> that to drive the no-alias responses.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <
> ghoflehner at apple.com>
> >>> wrote:
> >>>
> >>> I wouldn’t be willing to give up performance for hypothetical issues.
> >>> Please protect all your changes with options. For some of your
> concerns it
> >>> is probably hard to provide a test case that shows an/the actual issue.
> >>>
> >>>
> >>>
> >>> I certainly agree that it will be very hard to provide a test case and
> >>> extremely rare to see this in the wild for most of these issues. As
> long as
> >>> I can remove the problematic update API we currently have (which as
> its an
> >>> API change can't really be put behind flags), I'm happy to have flags
> >>> control whether or not GMR uses the unsound / stale information to try
> to
> >>> answer alias queries. Do you have any opinion about what the default
> value
> >>> of the flags should be?
> >>>
> >>>
> >>>
> >>> I'll go ahead and prepare the patches, as it seems like we're all
> ending
> >>> up in the same position, and just wondering about the precise
> tradeoffs we
> >>> want to settle on.
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>>
> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>>
> >>> ______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/0520ab1e/attachment.html>
More information about the llvm-dev
mailing list