[llvm-commits] [PATCH] Make dependency DAG construction to use AA

Tue May 8 19:12:48 PDT 2012

On Tue, 08 May 2012 15:04:56 -0700
Andrew Trick <atrick at apple.com> wrote:

> Hi Sergei, I finally put together a real review for this... plenty of
> comments follow, but overall it looks pretty good.
> 
> On Apr 30, 2012, at 11:34 AM, Sergei Larin <slarin at codeaurora.org>
> wrote:
> >   Presented is a non-heroic optimization intended to introduce alias
> > analysis usage into dependency graph construction used for MI
> > scheduling (pre- post- RA, bundling etc.).
> > Currently any memory objects are conservatively considered
> > dependent on each other (aliasing) and as such are serialized via
> > chain dependencies. Attached patch is an attempt to introduce alias
> > analysis into the process to eliminate chain dependencies known to
> > be false. 
> 
> +static inline bool isGlobalMemoryObject(AliasAnalysis *AA,
> MachineInstr *MI) {
> +  if (MI->isCall() || MI->hasUnmodeledSideEffects() ||
> +      (MI->hasVolatileMemoryRef() &&
> +       (!MI->mayLoad() || !MI->isInvariantLoad(AA))))
> 
> I don't understand why !MI->mayLoad() makes a volatile reference safe
> to move, when MI->mayStore() could still be true. But you're correct
> to leave the existing logic in tact. Maybe someone else cares to
> comment.
> 
> +  if (!getUnderlyingObjectForInstr(MIa, MFI, MayAlias) ||
> +      !getUnderlyingObjectForInstr(MIb, MFI, MayAlias))
> +    return true;
> 
> I'm not sure why you're doing this. What are the actual preconditions
> that we need before calling AliasAnalysis?
> 
> +  if (MIa->mayLoad() && MIb->mayLoad())
> 
> As Hal mentioned, wouldn't it be safer to check !mayStore.
> 
> +  MachineMemOperand *MMOa = *MIa->memoperands_begin();
> 
> Please either check MI->hasOneMemOperand() or better yet, loop
> through all memory operands. I know getUnderlyingObjectForInstr
> currently covers this but its bad to have such implicit assumptions.
> 
> +  int64_t MinOffset = std::min(MMOa->getOffset(), MMOb->getOffset());
> +  int64_t Overlapa = MMOa->getSize() + MMOa->getOffset() - MinOffset;
> +  int64_t Overlapb = MMOb->getSize() + MMOb->getOffset() - MinOffset;
> 
> It took me a while to figure out what you're doing and why. We can
> hide this behind an API eventually (I can do it post checkin), but
> for now can you add comments with any cases you could think of?
> Please also comment that it's taken from DAGCombiner. Apparently
> DAGCombiner thinks it's a good idea to normalize offsets relative to
> each other by subtracting MinOffset. I don't understand why this is a
> good idea and can't prove to myself that it is safe. For example,
> what if we have an IR-level offset from the same base pointer smaller
> than the MachineOperand offset? We would seem to miss the overlap.
> Maybe Dan can comment on this.
> 
> Hal also mentioned negative offsets, but it looks to me like that
> would work assuming positive offsets work!

I was actually concerned about one positive offset and one negative.
Based on your second e-mail, I'll assume that this is fine too.

For example:
A: p + 2 (size = 4)
B: p - 8 (size = 2)
Here, MinOffset will be -6, and so Overlapa will be 4 + 2 - 8 = -2.
That does not seem right, I thought that the size in AA::Location would
need to be positive.

 -Hal

> 
> Before calling IR-level AliasAnalysis, we have an opportunity to
> detect that memory access at different MachineMemOperand offsets with
> non-overlapping offset+size do not alias. DAGCombiner::isAlias does
> this. If you don't want to implement that, please add a FIXME.
> 
> +  // Remember visited nodes.
> +  for (SmallPtrSet<const SUnit*, 16>::const_iterator I =
> Visited.begin(),
> +       E = Visited.end(); I != E; ++I)
> +    if (SUb == *I)
> +      return *Depth;
> +  Visited.insert(SUb);
> 
> Please remove the for loop. Just check the return value of
> Visisted.insert().
> 
> + /// This function assumes that "downward" from SUb there exist
> 
> Stale comment, as Hal said.
> 
> +          addChainDependency(AA, MFI, SU, I->second, RejectMemNodes);
> 
> There are a few cases where we used to create a chain dependency with
> STORE_LOAD_LATENCY, where now you create a zero latency edge.
> 
> A couple style/comments issues that I could easily fix myself
> post-checkin...
> 
> Why does the unit test need to be disabled?
> 
> +    cl::desc("Enable use of AA during MI GAD construction"));
>                                           ^^^
> 
> +  AliasAnalysis::AliasResult AAResult = AA->alias(
> +  AliasAnalysis::Location(MMOa->getValue(), Overlapa,
> +                          MMOa->getTBAAInfo()),
> +  AliasAnalysis::Location(MMOb->getValue(), Overlapb,
> +                          MMOb->getTBAAInfo()));
> 
> Needs a bit of indentation.
> 
> > -	Theoretically, the whole task of chain edge detection
> > could become quadratic to the number of memory objects in a basic
> > block with this patch, but in practice it never does. It is further
> > kept in check by max depth search budget that prevents even
> > theoretical possibility of any corner cases.
> 
> As we've already discussed, I'm concerned about the worst case cost
> of this approach. But to be fair, building the memory dependence DAG
> was already quadratic, so this isn't terrible. Since you haven't
> found any issues in your test suite, I think it's fine to go in under
> a flag. When we decide to make it default, we'll have to consider
> whether we can really handle the worst situations (aggressively
> unrolled loops) without blowing up. An alternative approach would be
> to precompute alias sets and have the scheduler maintain a chain for
> each set, but that's conceptually more complicated.
> 
> >  During preliminary review it has been suggested that worklist could
> > perform better, but I was trying to do all processing in place and
> > with minimal reiteration, so I will leave this point up to
> > collective reviewer opinion.
> 
> +  // Iterate over chain dependencies only.
> +  for (SUnit::const_succ_iterator I = SUb->Succs.begin(), E =
> SUb->Succs.end();
> +       I != E; ++I)
> +    if (I->isCtrl())
> +      iterateChainSucc (AA, MFI, SUa, I->getSUnit(), ExitSU, Depth,
> Visited);
> 
> We're currently trying to remove recursive DAG walking code to ensure
> that LLVM does not trigger stack overflow. You've added the Depth
> check, which should be sufficient. But for the record, a pre-order
> DAG walk like this is trivial to implement with an iterative worklist.
> 
> >  The feature is currently under a general flag, off by default. It
> > is up to individual back-end maintainers whether it is desirable
> > feature for them. One side effect of this patch is a massive
> > release of ILP in some cases, which on occasion could clogs current
> > schedulers, and can increase reg pressure on some targets. My quick
> > test of test-suite on x86 showed fractional _decrease_ of average
> > compile time (0.5%). There is also a spectrum of performance
> > differences. Different targets might see different results.
> > 
> >  Test cases for instruction scheduling are notoriously tricky, so I
> > took a somewhat direct shortcut. I am not sure how it would be
> > accepted by the community, but as I see it, this is the most
> > reliable way to verify that check actually takes place. The test is
> > written for the Hexagon back end, and since Hexagon is sort of in
> > flux right now, I disable it for the time being. Obviously regular
> > check-all passes (with and without optimization enabled) on X86 and
> > Hexagon.
> 
> It would be good to have at least one unit test that is enabled for
> most developers running make check on their builds. How about a
> simple example compiled for X86? You just need a few memory access
> that cannot be reordered, and CHECK that they aren't reordered. It's
> not a great test, but better than nothing.
> 
> -Andy
> 
> >  As always, comments are very welcome, and thank you for your time
> > and effort.
> > 
> > Sergei Larin
> > 
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> > 
> > <patch_aa_in_dag.patch>
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory