[PATCH] [CaptureTracking] Avoid long compilation time on large basic blocks

Tue May 5 14:17:44 PDT 2015

----- Original Message -----
> From: "Bruno Cardoso Lopes" <bruno.cardoso at gmail.com>
> To: reviews+D7010+public+bde9f49f8dcc4a63 at reviews.llvm.org
> Cc: nicholas at mxc.ca, "Hal Finkel" <hfinkel at anl.gov>, "Bob Wilson" <bob.wilson at apple.com>, "Rafael Ávila de Espíndola"
> <rafael.espindola at gmail.com>, "Philip Reames" <listmail at philipreames.com>, "Commit Messages and Patches for LLVM"
> <llvm-commits at cs.uiuc.edu>
> Sent: Tuesday, May 5, 2015 1:57:10 PM
> Subject: Re: [PATCH] [CaptureTracking] Avoid long compilation time on large basic blocks
> 
> Hi,
> 
> Sorry for the delay folks. Resuming this thread with more information
> and patches...
> 
> > I don't disagree that it might be difficult to re-use the cache
> > inbetween calls to PointerMayBeCaptured (especially as the IR is
> > being mutated -- we'd need to think about how this was laid out
> > and what needed to actively invalidate it), but I thought the
> > expensive part was really the repeated BB scans *within*
> > PointerMayBeCaptured. That could be solved by having a local
> > cache, right?
> 
> Currently, PointerMayBeCaptured scans the BB the number of times
> equal
> to the number of uses of 'BeforeHere', which is currently capped at
> 20
> and bails out with Tracker->tooManyUses(). The main issue here is the
> number of calls *to* PointerMayBeCaptured times the basic block scan.
> In the testcase with 82k instructions, PointerMayBeCaptured is called
> 130k times, leading to 'shouldExplore' taking 527k runs, this
> currently takes ~12min.
> 
> I tried the approach where I locally (within PointerMayBeCaptured )
> number the instructions in the basic block using a DenseMap to cache
> instruction positions/numbers. I experimented with two approaches:
> 
> (1) Build the local cache once in the beginning and consult the cache
> to gather position of instructions => Takes ~4min.
> (2) Build the cache incrementally every time we need to scan an
> unexplored part of the BB => Takes ~2min.
> 
> I've attached the implementation for (2) in case there's any
> interest.
> 
> Considering the reduction from 12min to ~2min, this is a good gain,
> but still looks to me like a long time to have a user waiting for the
> compiler to finish, specially under -O1/-O2. I still believe the
> limited scan approach is a better fix.

Well, it's faster anyway, but also less precise.

> 
> Let me know what you think.

I think that this is really nice.

+bool isPotentiallyReachableInner(SmallVectorImpl<BasicBlock *> &Worklist,

I'd prefer to rename this function if it will now be externally accessible: 'Inner' is not particularly descriptive.

+      // Start the search with the instruction found in the last lookup round.

How do you know that subsequent calls to find specify instructions that always come later in the BB than those from previous calls?

+    /// It also does not consider the scenario in 'isPotentiallyReachable'
+    /// where 'A' comes after 'B' but may dominate through 'BB' successors.

What does this mean exactly? Are you saying that the caller isPotentiallyReachable handles this case itself?

+        if (isa<InvokeInst>(BeforeHere) || isa<PHINode>(I) || I == BeforeHere)
+          return false;

Why do we return false when I == BeforeHere?

Thanks again,
Hal

> 
> Cheers,
> 
> 
> 
> --
> Bruno Cardoso Lopes
> http://www.brunocardoso.cc
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory