[PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().

Hal Finkel hfinkel at anl.gov
Tue Feb 10 09:06:08 PST 2015


[-llvmdev for patch review]

----- Original Message -----
> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom Stellard"
> <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>, "Patrik Hägglund H"
> <patrik.h.hagglund at ericsson.com>, "Sanjin Sijaric" <ssijaric at codeaurora.org>, "Andrew Trick (atrick at apple.com)"
> <atrick at apple.com>, llvm-commits at cs.uiuc.edu
> Sent: Tuesday, February 10, 2015 8:54:28 AM
> Subject: RE: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
> 
> Hi,
> 
> I have committed the patch now (svn id 228686).
> 
> Regarding the commenting you requested, I attach a patch. Feel free
> to make changes.
> 
> I found it difficult to explain what the code does in isolated
> places, and thus kept my
> commenting quite short.

+      // Call adjustChainDeps() regardless of MayAlias, since NonAliasMemUses
+      // lists also get cleared.
       adjustChainDeps(AA, MFI, SU, &ExitSU, RejectMemNodes,
                       TrueMemOrderLatency);

I'd like the comment to explain why this matters. Maybe we can say something like, "All entries associated with loads of base a object that were in AliasMemUses or NonAliasMemUses are cleared upon encountering the first store, with entries not aliasing with that store placed into RejectMemNodes. When we have multiple stores to the same base object, we need the subsequent stores to check RejectMemNodes for potentially aliasing loads to which we need edges." What do you think?

> This makes me feel like the code needs a bit
> of refactorization
> to make it more simple and understandable.

I think this is certainly true.

> 
> Looking at the possibility of refactorization / redesign, I wonder
> what are the main strong
> points of this implementation right now, in your opinion? The mapping
> to underlying objects
> looks nice to me, but I am not sure I understand why to go through
> the trouble of clearing
> lists and using a set of RejectMemNodes with adjustChainDeps(). It is
> very complex code, and
> I wonder what is gained in the end here. Does anyone have any
> thoughts about it?  Is it to handle
> huge regions with tons of memory accesses well?

It is supposed to do this.

It matters to me that we can make good use of AA to break unnecessary memory-induced dependency edges.

> 
> Regarding the other two patches:
> 
> The bugfix I just commited makes sure that adjustChainDeps() is
> called on an SU regardless of
> MayAlias. It occoured to me that RejectMemNodes contains a mix of SUs
> from both Alias and
> NonAlias sets, and that the separation of the two domains is lost. I
> became worried
> that some performance might therefore be lost because of my bugfix.

Ah, alright. Yes, this makes sense.

> 
> Since I was not sure how big this impact might be, I made a fix for
> it quickly just to illustrate
> what the problem is (one of the previously attached patches).
> Basically, when an SU's underlying
> objects are analyzed, the results are remembered by putting the SU
> into one or both of the (added)
> sets AliasMemUseDefMIs and NonAliasMemUseDefMIs. Later,
> MIsNeedChainEdge() can return
> false, if they were originally in different domains:
> 
>   // A NonAliasing node cannot alias an AliasingNode. This is the
>   case
>   // where MIa had only non-aliasing underlying objects and MIb had
>   // only aliasing underlying objects, or vice versa.
>   if ((AliasMemUseDefMIs.count(MIa) &&
>   !NonAliasMemUseDefMIs.count(MIa) &&
>        !AliasMemUseDefMIs.count(MIb) &&
>        NonAliasMemUseDefMIs.count(MIb)) ||
>       (!AliasMemUseDefMIs.count(MIa) &&
>       NonAliasMemUseDefMIs.count(MIa) &&
>        AliasMemUseDefMIs.count(MIb) &&
>        !NonAliasMemUseDefMIs.count(MIb)))
>     return false;
> 
> This should then be an improvement to MIsNeedChainEdge(), although I
> am not sure if there is
> a more elegant solution, which I invite you to suggest.

Something like this should work. We'd not use a std::set, but a SmallPtrSet, and they could not be global, we'd need to thread them though the function call stack. You'd need to clear the sets once we encounter a scheduling barrier (which is where we clear RejectMemNodes anyway -- which will also limit growth).

Another solution might be to have AliasRejectMemNodes and NonAliasRejectMemNodes instead of just one set (I'm not sure the associated complexity is worthwhile, however).

Thanks for working on this!

 -Hal

> 
> I also supplied a patch with a lot of ugly code that had the original
> purpose of finding a test case
> on an in-tree-target for the bug I had discovered. You could ignore
> that patch now, I think.
> ( An assert would trigger if there was - with the bugfix in place -
> an edge inserted in the NonAlias case,
> which would not have been done before the bugfix.)
> 
> I attach the patch for MIsNeedChainEdge() again,  and would
> appreciate some feedback.
> 
> /Jonas Paulsson
> 
> 
> 
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: den 7 februari 2015 02:07
> To: Jonas Paulsson
> Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei
> Larin; Patrik Hägglund H; Sanjin Sijaric; Andrew Trick
> (atrick at apple.com); llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] Bugfix for missed dependency from store to load
> in buildSchedGraph().
> 
> ----- Original Message -----
> > From: "Jonas Paulsson" <jonas.paulsson at ericsson.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>, "Sanjin Sijaric"
> > <ssijaric at codeaurora.org>, "Andrew Trick (atrick at apple.com)"
> > <atrick at apple.com>, llvm-commits at cs.uiuc.edu
> > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com>,
> > llvmdev at cs.uiuc.edu, "Tom Stellard"
> > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>,
> > "Patrik Hägglund H"
> > <patrik.h.hagglund at ericsson.com>
> > Sent: Friday, January 30, 2015 4:23:50 AM
> > Subject: [PATCH] Bugfix for missed dependency from store to load in
> > buildSchedGraph().
> > 
> > Hi,
> > 
> > I have revisited the issue in buildSchedGraph() I talked about
> > previously, and attached a few patches. The first tries to fix the
> > issue, and the other two try to illustrate associated issues,
> > emerged
> > from applying it.
> > 
> > Is it OK to commit the first patch?
> > 
> >     [PATCH] Bugfix for missed dependency from store to load in
> >     buildSchedGraph().
> > 
> >     Bugfix for missed dependency from store to load in
> >     buildSchedGraph().
> > 
> >     Background: When handling underlying objects for a store, the
> >     vector
> >     of previous mem uses, mapped to the same Value, is afterwards
> >     cleared
> >     (regardless of ThisMayAlias). This means that during handling
> >     of
> >     the
> >     next store using the same Value, adjustChainDeps() must be
> >     called,
> >     otherwise a dependency might be missed.
> 
> Yes, LGTM. However, I'd like an extensive comment added to that part
> of the code explaining what is going on. That function has far too
> little in the way of commentary (RejectMemNodes itself even has no
> description). It is crucial that we change this. As far as I'm
> concerned, a comment that is as long and explanatory as this patch
> description is perfectly appropriate. [I say this admitting to
> having touched this function myself].
> 
> Also, I'd really prefer you use reviews.llvm.org for these patches in
> the future, and upload them with full context
> (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface),
> it makes it much easier to see what is going on.
> 
> Regarding the other two patches, can you please explain what is going
> on? It is not obvious to me from the patches.
> 
> Thanks again,
> Hal
> 
> > 
> >     For example, three spill/reload (NonAliasing) memory accesses
> >     using
> >     the same Value 'a', with different offsets:
> > 
> >         SU(2): store  @a
> >         SU(1): store  @a, Offset:1
> >         SU(0): load   @a
> > 
> >     In this case we have:
> > 
> >     * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends
> >     up in
> >       RejectMemNodes and is removed from the mem-uses list
> >       (AliasMemUses
> >       or NonAliasMemUses), as this list is cleared.
> > 
> >     * SU(2) needs a dep against SU(0). Therefore, SU(2) must check
> >       RejectMemNodes by calling adjustChainDeps().
> > 
> >     Previously, for store SUs, adjustChainDeps() was only called if
> >     MayAlias was true, missing the S(2) to S(0) dependency in the
> >     case
> >     above. The fix is to always call adjustChainDeps(), regardless
> >     of
> >     MayAlias, since this applies both for AliasMemUses and
> >     NonAliasMemUses.
> > 
> >     No test case found for any in-tree target.
> > 
> > Jonas Paulsson
> > 
> > -----Original Message-----
> > From: Jonas Paulsson
> > Sent: den 8 januari 2015 16:01
> > To: 'Hal Finkel'; Sanjin Sijaric
> > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei
> > Larin; Andrew Trick; llvm-commits at cs.uiuc.edu
> > Subject: [PATCH] Call adjustChainDeps() always when handling a
> > store.
> > 
> > Hi,
> > 
> > here is a patch on a problem I described in my previous mail
> > (number
> > 1) for your review. In short, I think adjustChainDeps() must be
> > called
> > regardless of MayAlias while handling a store.
> > 
> > This problem was found on an out-of-tree target, and no test case
> > can
> > be provided for an official target.
> > 
> > Let me know if I can commit this patch, or if you have any
> > comments.
> > 
> > /Jonas
> > 
> > PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my
> > target does the same things as you mentioned: register / offset
> > based
> > analysis, both pre/post RA. It is adding very little improvement
> > over
> > AA, it seems. I don't have any failing test cases right now, so it
> > may
> > be that this patch actually fixed my problem.
> > However, right now it is confusing to have the whole algorithm
> > written
> > around memory operands and at the same time allow register /
> > offsets
> > analyzis in the midst of things. It would be good to rewrite that
> > part
> > like you explained. What are your current thoughts about this?
> > 
> > -----Original Message-----
> > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > Sent: den 23 december 2014 18:11
> > To: Sanjin Sijaric
> > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei
> > Larin; Jonas Paulsson; Andrew Trick
> > Subject: Re: ScheduleDAGInstrs.cpp
> > 
> > ----- Original Message -----
> > > From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> > > To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>, "Andrew
> > > Trick"
> > > <atrick at apple.com>
> > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias Eriksson V"
> > > <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom
> > > Stellard"
> > > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>
> > > Sent: Friday, December 19, 2014 1:51:44 PM
> > > Subject: RE: ScheduleDAGInstrs.cpp
> > > 
> > > Hi Jonas,
> > > 
> > > How is your target implementing areMemAccessesTriviallyDisjoint?
> > >  The
> > > callback is there so that we don't get into the situation where
> > > the
> > > call to isIdentifiedObject (which is called from
> > > isUnsafeMemoryObject from MIsNeedChainEdge) results in the edge
> > > being added between two memory locations that the target can
> > > easily
> > > prove are different (e.g based on base register + index + offset,
> > > etc).
> > > 
> > > Are you seeing the problem during pre-RA scheduling or post-RA
> > > scheduling?
> > > 
> > > I think we may be able to get rid of
> > > areMemAccessesTriviallyDisjoint, and let the AA code in
> > > MIsNeedChainEdgee handle it.  This works for me if I don't call
> > > isIdentifiedObject from isUnsafeMemoryObject if we are using AA.
> > > Currently, the aliasing code won't get a chance to run in some
> > > cases
> > > as isUnsafeMemoryObject returns true, resulting in
> > > the edge being added.   Maybe someone can shed some light as to
> > > why
> > > the call to isIdentifedMemoryObject in isUnsafeMemoryObject is
> > > needed if we are using AA.
> > 
> > I think that it might not be needed. Here's my thinking:
> > 
> > There is a call to getUnderlyingObjectsForInstr, and that needs to
> > be
> > there because of how the calling code groups the resulting SUs by
> > underlying object. Specifically, ScheduleDAGInstrs::buildSchedGraph
> > first gets the underlying objects for a value, and then only calls
> > addChainDependency on SUs that share one of these underlying
> > objects.
> > The reason why we insist on having an identified underlying object
> > is
> > so that we know we're not being non-uniformly limited by the depth
> > constraint. I mean the following. Assume I have two globals (@a and
> > @b), and I have a third value formed using a combination of them:
> > %c =
> > select i1 %something, @a, @b -- so I have
> > 
> >  SU(0): underlying objects: @a
> >  SU(1): underlying objects: @b
> >  SU(2): underlying objects: @a and @b
> > 
> > now imagine that our getUnderlyingObjectsForInstr had a trivial
> > depth
> > limit, and we did not check that the underlying objects returned
> > were
> > identified objects, then we might have:
> > 
> > 
> >  SU(0): underlying objects: @a
> >  SU(1): underlying objects: @b
> >  SU(2): underlying objects: %c
> > 
> > and we might never give AA the chance to conclude that %c and @a or
> > @b
> > might alias (because addChainDependency might never be called on
> > SU(0) <-> SU(2) because @a != %c).
> > 
> > To prevent this problem, we clear the underlying objects list in
> > getUnderlyingObjectsForInstr if any of the underlying objects are
> > not
> > identified (so that we can fall back to the conservative behavior).
> > 
> > Now in isUnsafeMemoryObject we have the same check, so that when
> > addChainDependency is called on an SU with a cleared underlying
> > objects list (which we did specifically to get conservative
> > behavior),
> > we always get that conservative behavior. However, this check seems
> > unnecessary (except, perhaps, as noted in the next paragraph). When
> > not using AA, by the time we get to MIsNeedChainEdge, we're already
> > destined to return true from MIsNeedChainEdge (except for load/load
> > edges). When using AA, we might not return true (depending on what
> > AA
> > returns), but the AA check is specific to the two instructions
> > being
> > queried, and AA understands how to handle selects/phis/etc. in a
> > reasonable way.
> > 
> > Nevertheless, I'm somewhat worried about a subtlety here: I believe
> > that we cannot fail to addChainDependency on the AliasChain because
> > AA
> > is smarter about underlying objects because the AliasChain actually
> > can represent many other dependencies. So maybe when we call
> > addChainDependency with SU on the AliasChain, we need to always
> > pass
> > nullptr for AA.
> > 
> >  -Hal
> > 
> > > 
> > > Thanks,
> > > Sanjin
> > > 
> > > -----Original Message-----
> > > From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com]
> > > Sent: Friday, December 19, 2014 6:51 AM
> > > To: Andrew Trick
> > > Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin
> > > Sijaric; Tom Stellard; Sergei Larin
> > > Subject: ScheduleDAGInstrs.cpp
> > > 
> > > Hi,
> > > 
> > > I write again regarding buildSchedGraph(), as I am still not
> > > happy
> > > about things there.
> > > 
> > > I have found at least two examples which do not work out:
> > > 
> > > 1)
> > > 
> > > SU(2) Store "Value A"
> > > 
> > > SU(1) Store "Value A"
> > > 
> > > SU(0) Load  "Value A"
> > > 
> > > If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is
> > > inserted into RejectedMemNodes and removed from its MemUses SU
> > > list,
> > > as this list is cleared. Therefore SU(2) must be handled with
> > > adjustChainDeps(), because it needs an edge from SU(0).
> > > For some reason adjustChainDeps() was only called for
> > > may-aliasing
> > > stores. I think this is wrong, as a store will clear the MemUses
> > > SU
> > > list also in the non-aliasing case.
> > > 
> > > If MIsNeedChainEdge() returns true for SU(0) and SU(1), what
> > > happens
> > > if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The
> > > dependency against SU(0) will not be checked, because it is not
> > > in
> > > RejectMemNodes, nor in the MemUses SU list.
> > > 
> > > 2)
> > > 
> > > SU(2) Load  "Value A"
> > > 
> > > SU(1) Store "Value A"
> > > 
> > > SU(0) Store "Value A"
> > > 
> > > If not using alias analysis, then the MemDefs list is cleared
> > > after
> > > *maybe* having inserted an edge from SU(0) to SU(1) with
> > > addChainDependency(). If there was not an edge inserted, then
> > > SU(2)
> > > must get a chance to check its deps against SU(0) by use of
> > > adjustChainDeps(). Again, this is only done if MayAlias is true.
> > > 
> > > I suspect therefore that areMemAccessesTriviallyDisjoint() is
> > > currently used in a very dangerous way, since it might get lucky
> > > in
> > > just one of two (equivalent) queries, and a dependency might be
> > > missed later on as seen above.
> > > 
> > > Or am I missing something here?
> > > 
> > > Senjin, I am working on an out-of-tree target. The problem I had
> > > was
> > > the first scenario in #1, and I fixed it by moving the
> > > adjustChainDeps() call to be called also in the non-aliasing
> > > case.
> > > 
> > > Another question I have is regarding adjustChainDeps(). If
> > > MIsNeedChainEdge() returns true, is it still then always
> > > necessary
> > > to continue recursively with successors? If this is a latency
> > > issue,
> > > perhaps it would be ok to do "continue" after adding the pred if
> > > the
> > > latency is non-zero? Right now this seems to be always zero, or
> > > one
> > > in the store-load case per the comment on line 856.
> > > 
> > > On line 947: "A store to a specific PseudoSourceValue" should
> > > probably say "... a specific Value / PseudoSourceValue", right?
> > > The
> > > same goes for line 1029.
> > > 
> > > /Jonas
> > > 
> > > 
> > > 
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-commits mailing list