[llvm-dev] LoopAccessAnalysis cache problem with new PM

Björn Pettersson A via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 3 13:23:12 PDT 2021


An attempt to solve the problem I was discussing here:  https://reviews.llvm.org/D109257

Kind of nasty bug (at least to debug). Maybe this should be merged to LLVM 13 if the patch is ok?

/Björn

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Björn Pettersson A via llvm-dev
Sent: den 3 september 2021 01:07
To: Arthur Eubanks <aeubanks at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] LoopAccessAnalysis cache problem with new PM

A quick look at SimpleLoopUnswitch tells me that it is doing some markLoopAsDeleted calls for the base loop.
But when unswitching it may clone/destroy other sub loops (?) as well (there are some calls to LI.destroy(…) but I can’t really see that there is any LPMUpdater calls related to those).

I’ll see if I can reduce and find a better reproducer tomorrow (given the knowledge that I probably should be able to run LoopDistribute -> SimpleLoopUnswitch  and then somehow detect that there should is a stale analysis it might be a lot easier compared to my earlier attempts).

/Björn

From: Arthur Eubanks <aeubanks at google.com<mailto:aeubanks at google.com>>
Sent: den 3 september 2021 00:49
To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com<mailto:bjorn.a.pettersson at ericsson.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] LoopAccessAnalysis cache problem with new PM

A bug and a repro would be good, even if it's not consistent. I can take a look, having worked with new PM caches/proxies a bit.
llvm-reduce'ing the IR with a test that runs the pipeline multiple times would be nice.

Perhaps wherever the loop is being deleted, it's not calling LPMUpdater::markLoopAsDeleted()? LPMUpdater::markLoopAsDeleted() deletes analysis cache entries for the deleted loop.

On Thu, Sep 2, 2021 at 3:29 PM Björn Pettersson A via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hi!

Spent lots of time today trying to debug a problem where I end up with sporadic crashes, asserts, etc.
It happens occasionally if I run the same opt test case over and over again, and I see the failures when LoopDistribution is running.

In the IR there are 2 functions, "f" and "g". Function "f" has several loops, and function "g" has one loop.

My command line is basically "opt -O3 -enable-loop-distribute -debug-pass-manager=verbose -debug-only=loop-distribute,loop-accesses", but it seems to fail more often (or more seldom) if adding some other parameters as well.

I've added some debug printouts in LoopDistributeForLoop::processLoop to print the Loop* L variable, and in LoopBase c'tor/d'tor, as that seemed to be relevant.


What I've found out is that when it goes bad the output is like this:

......................
<cut>
Running pass: LoopSimplifyPass on f
Running analysis: LoopAnalysis on f
*ADDED-DEBUG* Loop is created: 0x891cc60

<cut>
Running pass: LoopDistributePass on f

LDist: In "f" checking Loop at depth 3 containing: %for.body10.us.us<https://protect2.fireeye.com/v1/url?k=ca8e8b78-9515b23b-ca8ecbe3-861fcb972bfc-5cb88c39c1f21069&q=1&e=4b838500-8bfc-4bc3-acae-134737c08377&u=http%3A%2F%2Ffor.body10.us.us%2F><header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cd08
Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body10.us.us<https://protect2.fireeye.com/v1/url?k=f1705b7b-aeeb6238-f1701be0-861fcb972bfc-72aa0c7d47dedbae&q=1&e=4b838500-8bfc-4bc3-acae-134737c08377&u=http%3A%2F%2Ffor.body10.us.us%2F><header><latch><exiting>

LDist: In "f" checking Loop at depth 3 containing: %for.body89<header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cc60
Running analysis: LoopAccessAnalysis on Loop at depth 3 containing: %for.body89<header><latch><exiting>

<cut>
Running pass: SimpleLoopUnswitchPass on Loop at depth 2 containing: %for.cond1.preheader<header><exiting>,%for.body10.us.us<https://protect2.fireeye.com/v1/url?k=7b502b6b-24cb1228-7b506bf0-861fcb972bfc-2e4bd48f8f2ba4d5&q=1&e=4b838500-8bfc-4bc3-acae-134737c08377&u=http%3A%2F%2Ffor.body10.us.us%2F>,%cleanup.thread.split.us.us<https://protect2.fireeye.com/v1/url?k=a35872cd-fcc34b8e-a3583256-861fcb972bfc-cce5dce10235ee85&q=1&e=4b838500-8bfc-4bc3-acae-134737c08377&u=http%3A%2F%2Fcleanup.thread.split.us.us%2F>,%for.body89,%crit_edge,%for.inc100<latch><exiting>,%for.body89.preheader,%for.body10.us.us<https://protect2.fireeye.com/v1/url?k=f603136b-a9982a28-f60353f0-861fcb972bfc-8e3c555b18dabc09&q=1&e=4b838500-8bfc-4bc3-acae-134737c08377&u=http%3A%2F%2Ffor.body10.us.us%2F>.preheader
*ADDED-DEBUG* Loop is deleted: 0x891cc60

<cut>
Running pass: LoopSimplifyPass on g
Running analysis: LoopAnalysis on g
*ADDED-DEBUG* Loop is created: 0x891cc60

<cut>
Running pass: LoopDistributePass on g

LDist: In "g" checking Loop at depth 1 containing: %for.cond<header><latch><exiting>
*ADDED-DEBUG* L used when requesting LAA: 0x891cc60

<cut>
......................

And then the pass manager will return the LoopAccessInfo cached for the Loop in function "f".
(Notice that the Loop object happened to reuse the same address for the Loop object used earlier.)

I don't know if the using object pointers as key to the cached analysis like this is safe in general.
But I guess the problem here is that the cache should have been cleared earlier (when the original Loop was deleted), right?

Trying to track when LoopBase objects are created and deleted kind of points at that the first Loop object at 0x891cc60 is deleted when doing loop unswitch on a loop at lower depth in function "f". But I'm not sure exactly how the LoopAnalysisManagerFunctionProxy holding the cached LoopAccessAnalysis based on that pointer is supposed to be informed about that. But I suspect that maybe SimpleLoopUnswitchPass is to blame here? (or something in the pass manager framework)


I guess I'll need to write a PR for this, but kind of messy since getting the crashes is kind of sporadic. So I might need to find a better way to detect that there is a stale analysis (but a few hours ago I did not know much about how these analyses are cached in the new PM at all, so I haven't really known what to look for).

If anyone got some ideas how to debug this further, or what to look for, I'm interested to get some guidance how to continue debugging this (or I'll just have to keep banging my head on the wall trying to understand analysis caches and proxies in new PM tomorrow).

Regards,
Björn



_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210903/ca461ab5/attachment.html>


More information about the llvm-dev mailing list