[llvm-commits] [PATCH] Make dependency DAG construction to use AA

Tue May 8 14:15:58 PDT 2012

Hal,

  Thanks for the early heads-up... Quick reply below...

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Tuesday, May 08, 2012 3:16 PM
> To: Sergei Larin
> Cc: 'Andrew Trick'; 'Evan Cheng'; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Make dependency DAG construction to
> use AA
> 
> Sergei,
> 
> I've not yet tried the patch, but a few comments from looking though
> the code:
> 
> > +/// This returns true if the two MIs may alias each other.
> > +/// If these are not even memory operations, we still may need ///
> > +chain deps between them. The question really is - could /// these
> two
> > +MIs be reordered during scheduling from memory
> > dependency +/// point of view.
> > +static bool MIsNeedChainEdge(AliasAnalysis *AA, const
> > MachineFrameInfo *MFI,
> > +                             MachineInstr *MIa,
> > +                             MachineInstr *MIb) {
> > +
> > +  if (!MIa || !MIb || (MIa == MIb))
> > +    return false;
> 
> Based on the comment above, I'd think that MIa == MIb would return true
> (they "alias" each other). I can obviously understand why you might
> want to invert the degenerate case, but that should be specifically
> noted.

[Larin, Sergei] The logic of this function should go beyond just aliasing.
The question really is "do we need a chain edge between these MIs", and we
obviously do not want one for MIa == MIb... I will definitely update
comments to describe this intent.

> 
> > +  // If we are dealing with two "normal" loads, we do not need an
> > edge
> > +  // between them - they could be reordered.
> > +  if (MIa->mayLoad() && MIb->mayLoad())
> > +    return false;
> 
> As far as I know, an instruction can both load and store (PPC actually
> has several of these). Should this be?
> 
>   if (!MIa->mayStore() && !MIb->mayStore())
>     return false;

[Larin, Sergei] Good point. This is definitely cleaner that way.

> 
> > +  int64_t MinOffset = std::min(MMOa->getOffset(), MMOb-
> >getOffset());
> > + int64_t Overlapa = MMOa->getSize() + MMOa->getOffset() - MinOffset;
> > + int64_t Overlapb = MMOb->getSize() + MMOb->getOffset() - MinOffset;
> > +
> > +  AliasAnalysis::AliasResult AAResult = AA->alias(
> > + AliasAnalysis::Location(MMOa->getValue(), Overlapa,
> > +                          MMOa->getTBAAInfo()),
> > + AliasAnalysis::Location(MMOb->getValue(), Overlapb,
> > +                          MMOb->getTBAAInfo()));
> > +
> > +  return (AAResult != AliasAnalysis::NoAlias);
> 
> Can you please comment on this. Does it work if one of the offsets is
> negative? 

[Larin, Sergei] Excellent question. I guess I have not seen them negative,
but I guess I do not know if they never are. Interesting side effect of this
is that this is a "borrowed" approach from the DAG combiner
(DAGCombiner::isAlias). This would be an issue there as well, but this is a
lousy argument. I would definitely welcome a test case where this assumption
is not true, but even if someone could comment with certainty on this - it
would be great... I can easily adjust the logic.

>Are you assuming that the value pointers into any given
> memory region will be unique?

[Larin, Sergei] I guess I am not exactly sure what do you mean here... could
you please elaborate a bit?

> 
> FWIW, this is similar to a check done in the PPC 970 hazard recognizer
> in lib/Target/PowerPC/PPCHazardRecognizers.cpp:107. There, however, a
> false negative is not fatal ;)
> 
> > +/// This function assumes that "downward" from SUb there exist ///
> > +tail/leaf of already constructed DAG. It iterates downward and ///
> > +checks whether SUa can be aliasing any node dominated /// by SUb.
> > +static void adjustChainDeps(AliasAnalysis *AA, const
> > MachineFrameInfo *MFI,
> > +            SUnit *SUa, SUnit *ExitSU, std::set<SUnit *> &CheckList)
> > {
> 
> This function does not take an SUb, please update the comment.
> 
> This looks promising. Thanks again for working on this!

 [Larin, Sergei] My pleasure. Thank you for looking at this. I'll accumulate
more feedback and will post an updated patch, but if in the mean time you
can take a deeper look at the algorithmic side, I would highly appreciate
it. Thanks.

Sergei

> 
>  -Hal
> 
> On Mon, 7 May 2012 13:31:33 -0500
> "Sergei Larin" <slarin at codeaurora.org> wrote:
> 
> > Traditional ping :)
> > Thanks.
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> >
> >
> > -----Original Message-----
> > From: llvm-commits-bounces at cs.uiuc.edu
> > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Sergei Larin
> > Sent: Monday, April 30, 2012 1:35 PM
> > To: 'Andrew Trick'; 'Hal Finkel'; 'Evan Cheng'
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: [llvm-commits] [PATCH] Make dependency DAG construction to
> > use AA
> >
> > Hello everyone,
> >
> >    Presented is a non-heroic optimization intended to introduce alias
> > analysis usage into dependency graph construction used for MI
> > scheduling (pre- post- RA, bundling etc.).
> > Currently any memory objects are conservatively considered dependent
> > on each other (aliasing) and as such are serialized via chain
> > dependencies. Attached patch is an attempt to introduce alias
> analysis
> > into the process to eliminate chain dependencies known to be false.
> >
> >   Some design assumptions:
> > -	AA is treated as a "black box" to provide clean and
> > minimalistic interface.
> > -	Only likely cases are checked. Certain objects like calls or
> > volatiles are treated conservatively in the original algorithm, and
> > could not be reasoned about easily. Proposed algorithm extension
> > avoids checking against such objects. This saves compile time without
> > any (detectable) loss of performance.
> > -	Theoretically, the whole task of chain edge detection could
> > become quadratic to the number of memory objects in a basic block
> with
> > this patch, but in practice it never does. It is further kept in
> check
> > by max depth search budget that prevents even theoretical possibility
> > of any corner cases. During preliminary review it has been suggested
> > that worklist could perform better, but I was trying to do all
> > processing in place and with minimal reiteration, so I will leave
> this
> > point up to collective reviewer opinion.
> >
> >   The feature is currently under a general flag, off by default. It
> is
> > up to individual back-end maintainers whether it is desirable feature
> > for them. One side effect of this patch is a massive release of ILP
> in
> > some cases, which on occasion could clogs current schedulers, and can
> > increase reg pressure on some targets. My quick test of test-suite on
> > x86 showed fractional _decrease_ of average compile time (0.5%).
> There
> > is also a spectrum of performance differences. Different targets
> might
> > see different results.
> >
> >   Test cases for instruction scheduling are notoriously tricky, so I
> > took a somewhat direct shortcut. I am not sure how it would be
> > accepted by the community, but as I see it, this is the most reliable
> > way to verify that check actually takes place. The test is written
> for
> > the Hexagon back end, and since Hexagon is sort of in flux right now,
> > I disable it for the time being. Obviously regular check-all passes
> > (with and without optimization enabled) on X86 and Hexagon.
> >
> >   As always, comments are very welcome, and thank you for your time
> > and effort.
> >
> > Sergei Larin
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> >
> 
> 
> 
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory