[llvm-commits] [PATCH] Make dependency DAG construction to use AA

Thu May 10 13:01:02 PDT 2012

Andy, Hal, 

  Here is an updated patch. I hope I got all the comments. It does include
dep latency set now as well.
If I missed something - please let me know.
I also enabled the test for march=arm for now (they actually call that
code), but placed the test under CodeGen/Hexagon. 
My intent is to change that to march=hexagon once Hexagon switches to MI
scheduling.

  Thanks.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Sergei Larin
> Sent: Wednesday, May 09, 2012 12:38 PM
> To: 'Andrew Trick'
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Make dependency DAG construction to
> use AA
> 
> This is a great example of community at work. With the rest of the
> thread, I see that the most concerns are expressed about AA invocation,
> and not about actual dependency filtering (besides the worst case
> scenario complexity concern).
> 
> Below is my quick reply to Andy's original review.
> 
> Sergei
> 
> 
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> 
> 
> > -----Original Message-----
> > From: Andrew Trick [mailto:atrick at apple.com]
> > Sent: Tuesday, May 08, 2012 5:05 PM
> > To: Sergei Larin
> > Cc: 'Hal Finkel'; 'Evan Cheng'; llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH] Make dependency DAG construction
> > to use AA
> >
> > Hi Sergei, I finally put together a real review for this... plenty of
> > comments follow, but overall it looks pretty good.
> >
> > On Apr 30, 2012, at 11:34 AM, Sergei Larin <slarin at codeaurora.org>
> > wrote:
> > >   Presented is a non-heroic optimization intended to introduce
> alias
> > > analysis usage into dependency graph construction used for MI
> > > scheduling
> > > (pre- post- RA, bundling etc.).
> > > Currently any memory objects are conservatively considered
> dependent
> > > on each other (aliasing) and as such are serialized via chain
> > dependencies.
> > >  Attached patch is an attempt to introduce alias analysis into the
> > > process to eliminate chain dependencies known to be false.
> >
> > +static inline bool isGlobalMemoryObject(AliasAnalysis *AA,
> > MachineInstr
> > +*MI) {
> > +  if (MI->isCall() || MI->hasUnmodeledSideEffects() ||
> > +      (MI->hasVolatileMemoryRef() &&
> > +       (!MI->mayLoad() || !MI->isInvariantLoad(AA))))
> >
> > I don't understand why !MI->mayLoad() makes a volatile reference safe
> > to move, when MI->mayStore() could still be true. But you're correct
> > to leave the existing logic in tact. Maybe someone else cares to
> comment.
> 
> [Larin, Sergei] Indeed, this is the original logic I have not touched,
> and it is currently used to guard insertion of ch edges. I separated it
> to a function to avoid code duplication. Now for the logic itself - the
> way I interpret it, this is an early check whether a memory object
> should not be reasoned about. At the same time, this is _not_ the only
> check, and (I am
> guessing) it allows some volatile object to go to the next level in
> hope some could be proven independent. I wonder if the original intent
> had anything to do with this passage
> (http://llvm.org/docs/LangRef.html#volatile):
> 
> "... The optimizers may change the order of volatile operations
> relative to non-volatile operations. This is not Java's "volatile" and
> has no cross-thread synchronization behavior."
> 
>   The next check for volatility is at the
> getUnderlyingObjectForInstr():
> 
>   if (!MI->hasOneMemOperand() ||
>       !(*MI->memoperands_begin())->getValue() ||
>       (*MI->memoperands_begin())->isVolatile())
>     return 0;
> 
>   This will actually cause conservative treatment to _all_ volatile
> objects, but not before some of them were allowed to be reasoned about
> elsewhere.
> 
> As action item - I intent to keep this logic intact for now unless
> someone has strong opinion about it.
> 
> 
> >
> > +  if (!getUnderlyingObjectForInstr(MIa, MFI, MayAlias) ||
> > +      !getUnderlyingObjectForInstr(MIb, MFI, MayAlias))
> > +    return true;
> >
> > I'm not sure why you're doing this. What are the actual preconditions
> > that we need before calling AliasAnalysis?
> 
> 
> [Larin, Sergei] I was trying to be as unadventurous as possible here,
> and yet again err on the side of conservatism. Previously here was a
> complex set of checks for volatility, presence of just one memory
> objects (I convinced myself to start with the simplest case of
> hasOneMemOperand() to facilitate review and comprehension of the patch,
> and extend if needed later. Also my back end never produces multiple
> mem operands, so I do not yet feel comfortable to proceed without
> testing. I'll put a FIXME here). There were also more checks for
> analyzability of the memory operand.
> Anyway, once I compared all these checks to the functionality of
> getUnderlyingObjectForInstr(), it was 99% match, so I reduced code
> duplication by reusing existing logic.
> 
> If you feel like this is unjustified here, I can reinstantiate original
> checks.
> 
> 
> >
> > +  if (MIa->mayLoad() && MIb->mayLoad())
> >
> > As Hal mentioned, wouldn't it be safer to check !mayStore.
> 
> [Larin, Sergei] Totally agree. Will change in the next patch.
> 
> >
> > +  MachineMemOperand *MMOa = *MIa->memoperands_begin();
> >
> > Please either check MI->hasOneMemOperand() or better yet, loop
> through
> > all memory operands. I know getUnderlyingObjectForInstr currently
> > covers this but its bad to have such implicit assumptions.
> 
> 
> [Larin, Sergei] As I eluded above, I already have a check for
> hasOneMemOperand() via getUnderlyingObjectForInstr(), and will extend
> this in the (near?) future. I just need to get a test case for it
> somehow. I will also add comments to explicitly state this assumption.
> 
> >
> > +  int64_t MinOffset = std::min(MMOa->getOffset(), MMOb-
> >getOffset());
> > + int64_t Overlapa = MMOa->getSize() + MMOa->getOffset() - MinOffset;
> > + int64_t Overlapb = MMOb->getSize() + MMOb->getOffset() - MinOffset;
> >
> > It took me a while to figure out what you're doing and why. We can
> > hide this behind an API eventually (I can do it post checkin), but
> for
> > now can you add comments with any cases you could think of? Please
> > also comment that it's taken from DAGCombiner. Apparently DAGCombiner
> > thinks it's a good idea to normalize offsets relative to each other
> by
> > subtracting MinOffset. I don't understand why this is a good idea and
> > can't prove to myself that it is safe. For example, what if we have
> an
> > IR-level offset from the same base pointer smaller than the
> > MachineOperand offset? We would seem to miss the overlap. Maybe Dan
> > can comment on this.
> 
> [Larin, Sergei] This received the most comments so far, but
> interestingly enough, it looks like it is still rather acceptable
> solution, so besides commenting extensively on that, I am going to keep
> the core logic the same.
> FIXME is due for API changes and for preliminary (local) checks.
> 
> >
> > Hal also mentioned negative offsets, but it looks to me like that
> > would work assuming positive offsets work!
> >
> 
> [Larin, Sergei] Again, according to Dan it should be totally fine...
> 
> > Before calling IR-level AliasAnalysis, we have an opportunity to
> > detect that memory access at different MachineMemOperand offsets with
> > non- overlapping offset+size do not alias. DAGCombiner::isAlias does
> this.
> > If you don't want to implement that, please add a FIXME.
> 
> [Larin, Sergei] Agree. Will add FIXME and will look at this logic more
> in the near future. I have strong suspicion it will only save some
> compile time, but unlikely to catch any "missing" cases, but let me try
> it first.
> 
> 
> >
> > +  // Remember visited nodes.
> > +  for (SmallPtrSet<const SUnit*, 16>::const_iterator I =
> > Visited.begin(),
> > +       E = Visited.end(); I != E; ++I)
> > +    if (SUb == *I)
> > +      return *Depth;
> > +  Visited.insert(SUb);
> >
> > Please remove the for loop. Just check the return value of
> > Visisted.insert().
> >
> > + /// This function assumes that "downward" from SUb there exist
> >
> > Stale comment, as Hal said.
> 
> [Larin, Sergei] Done.
> 
> >
> > +          addChainDependency(AA, MFI, SU, I->second,
> RejectMemNodes);
> >
> > There are a few cases where we used to create a chain dependency with
> > STORE_LOAD_LATENCY, where now you create a zero latency edge.
> >
> > A couple style/comments issues that I could easily fix myself post-
> > checkin...
> >
> > Why does the unit test need to be disabled?
> 
> [Larin, Sergei] When I submitted the patch the most of Hexagon code was
> reverted, but I think now it is safe to reenable. I'll turn it on.
> 
> >
> > +    cl::desc("Enable use of AA during MI GAD construction"));
> >                                           ^^^
> >
> > +  AliasAnalysis::AliasResult AAResult = AA->alias(
> > + AliasAnalysis::Location(MMOa->getValue(), Overlapa,
> > +                          MMOa->getTBAAInfo()),
> > + AliasAnalysis::Location(MMOb->getValue(), Overlapb,
> > +                          MMOb->getTBAAInfo()));
> >
> > Needs a bit of indentation.
> [Larin, Sergei] Sure.
> >
> > > -	Theoretically, the whole task of chain edge detection could
> > become
> > > quadratic to the number of memory objects in a basic block with
> this
> > > patch, but in practice it never does. It is further kept in check
> by
> > > max depth search budget that prevents even theoretical possibility
> > > of any corner cases.
> >
> > As we've already discussed, I'm concerned about the worst case cost
> of
> > this approach. But to be fair, building the memory dependence DAG was
> > already quadratic, so this isn't terrible. Since you haven't found
> any
> > issues in your test suite, I think it's fine to go in under a flag.
> > When we decide to make it default, we'll have to consider whether we
> > can really handle the worst situations (aggressively unrolled loops)
> > without blowing up. An alternative approach would be to precompute
> > alias sets and have the scheduler maintain a chain for each set, but
> > that's conceptually more complicated.
> 
> [Larin, Sergei] Totally agree, and it should be done. What is the best
> way to ensure that it will not be forgotten?
> 
> >
> > >  During preliminary review it has been suggested that worklist
> could
> > > perform better, but I was trying to do all processing in place and
> > > with minimal reiteration, so I will leave this point up to
> > > collective reviewer opinion.
> >
> > +  // Iterate over chain dependencies only.
> > +  for (SUnit::const_succ_iterator I = SUb->Succs.begin(), E = SUb-
> > >Succs.end();
> > +       I != E; ++I)
> > +    if (I->isCtrl())
> > +      iterateChainSucc (AA, MFI, SUa, I->getSUnit(), ExitSU, Depth,
> > + Visited);
> >
> > We're currently trying to remove recursive DAG walking code to ensure
> > that LLVM does not trigger stack overflow. You've added the Depth
> > check, which should be sufficient. But for the record, a pre-order
> DAG
> > walk like this is trivial to implement with an iterative worklist.
> 
> 
> [Larin, Sergei] I thought a lot about it, and it still escapes me on
> how to actually implement it. I am sure it is entirely my failing, but
> the way I see this problem might be different. The issue is that the
> topology walk is _different_ every time you do it, depending on what
> you are currently comparing with, so precalculating a set becomes a
> superset construction with a limited degree of reuse... Recursive
> descend seems to be "in place" and optimal without any revisits... and
> worklist itself does not (again in my limited understanding) solves
> worst case scenario - you can limit number of nodes to check on this
> list, but what makes latency to explode is the _path_ through them, not
> there number, so a depth check will still be needed...
> Once again, I am sure I miss something important here, and would love
> to discuss it more.
> 
> >
> > >  The feature is currently under a general flag, off by default. It
> > > is up to individual back-end maintainers whether it is desirable
> > > feature
> > for them.
> > > One side effect of this patch is a massive release of ILP in some
> > > cases, which on occasion could clogs current schedulers, and can
> > > increase reg pressure on some targets. My quick test of test-suite
> > > on
> > > x86 showed fractional _decrease_ of average compile time (0.5%).
> > There
> > > is also a spectrum of performance differences. Different targets
> > might
> > > see different results.
> > >
> > >  Test cases for instruction scheduling are notoriously tricky, so I
> > > took a somewhat direct shortcut. I am not sure how it would be
> > > accepted by the community, but as I see it, this is the most
> > > reliable way to verify that check actually takes place. The test is
> > > written
> > for
> > > the Hexagon back end, and since Hexagon is sort of in flux right
> > > now, I disable it for the time being. Obviously regular check-all
> > > passes (with and without optimization
> > > enabled) on X86 and Hexagon.
> >
> > It would be good to have at least one unit test that is enabled for
> > most developers running make check on their builds. How about a
> simple
> > example compiled for X86? You just need a few memory access that
> > cannot be reordered, and CHECK that they aren't reordered. It's not a
> > great test, but better than nothing.
> 
> [Larin, Sergei] I will think about it. Originally I did not feel
> authorized to suggest tests outside my back end domain. But I think it
> would be easier for me to suggest an ARM based test - but let me try
> first.
> 
> >
> > -Andy
> 
> [Larin, Sergei] Thank you all for the review. It will take me a little
> time to revise and test the new patch, so please stay tuned.
> 
> >
> > >  As always, comments are very welcome, and thank you for your time
> > and
> > > effort.
> > >
> > > Sergei Larin
> > >
> > > --
> > > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> > >
> > > <patch_aa_in_dag.patch>
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_aa_in_dag_v2.patch
Type: application/octet-stream
Size: 16747 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120510/70e7cb83/attachment.obj>