[llvm-dev] LoopAccessAnalysis cache problem with new PM

Björn Pettersson A via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 2 15:29:23 PDT 2021


Hi!

Spent lots of time today trying to debug a problem where I end up with sporadic crashes, asserts, etc.
It happens occasionally if I run the same opt test case over and over again, and I see the failures when LoopDistribution is running.

In the IR there are 2 functions, "f" and "g". Function "f" has several loops, and function "g" has one loop.

My command line is basically "opt -O3 -enable-loop-distribute -debug-pass-manager=verbose -debug-only=loop-distribute,loop-accesses", but it seems to fail more often (or more seldom) if adding some other parameters as well.

I've added some debug printouts in LoopDistributeForLoop::processLoop to print the Loop* L variable, and in LoopBase c'tor/d'tor, as that seemed to be relevant.


What I've found out is that when it goes bad the output is like this:

......................
<cut>
Running pass: LoopSimplifyPass on f
Running analysis: LoopAnalysis on f
*ADDED-DEBUG* Loop is created: 0x891cc60

<cut>
Running pass: LoopDistributePass on f

LDist: In "f" checking Loop at depth 3 containing: %for.body10.us.us<header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cd08
Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body10.us.us<header><latch><exiting>

LDist: In "f" checking Loop at depth 3 containing: %for.body89<header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cc60
Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body89<header><latch><exiting>

<cut>
Running pass: SimpleLoopUnswitchPass on Loop at depth 2 containing: %for.cond1.preheader<header><exiting>,%for.body10.us.us,%cleanup.thread.split.us.us,%for.body89,%crit_edge,%for.inc100<latch><exiting>,%for.body89.preheader,%for.body10.us.us.preheader
*ADDED-DEBUG* Loop is deleted: 0x891cc60

<cut>
Running pass: LoopSimplifyPass on g
Running analysis: LoopAnalysis on g
*ADDED-DEBUG* Loop is created: 0x891cc60

<cut>
Running pass: LoopDistributePass on g

LDist: In "g" checking Loop at depth 1 containing: %for.cond<header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cc60

<cut>
......................

And then the pass manager will return the LoopAccessInfo cached for the Loop in function "f".
(Notice that the Loop object happened to reuse the same address for the Loop object used earlier.)

I don't know if the using object pointers as key to the cached analysis like this is safe in general.
But I guess the problem here is that the cache should have been cleared earlier (when the original Loop was deleted), right?

Trying to track when LoopBase objects are created and deleted kind of points at that the first Loop object at 0x891cc60 is deleted when doing loop unswitch on a loop at lower depth in function "f". But I'm not sure exactly how the LoopAnalysisManagerFunctionProxy holding the cached LoopAccessAnalysis based on that pointer is supposed to be informed about that. But I suspect that maybe SimpleLoopUnswitchPass is to blame here? (or something in the pass manager framework)


I guess I'll need to write a PR for this, but kind of messy since getting the crashes is kind of sporadic. So I might need to find a better way to detect that there is a stale analysis (but a few hours ago I did not know much about how these analyses are cached in the new PM at all, so I haven't really known what to look for).

If anyone got some ideas how to debug this further, or what to look for, I'm interested to get some guidance how to continue debugging this (or I'll just have to keep banging my head on the wall trying to understand analysis caches and proxies in new PM tomorrow).

Regards,
Björn





More information about the llvm-dev mailing list