[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
Hal Finkel
hfinkel at anl.gov
Fri Feb 6 17:06:54 PST 2015
----- Original Message -----
> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Sanjin Sijaric" <ssijaric at codeaurora.org>, "Andrew Trick (atrick at apple.com)"
> <atrick at apple.com>, llvm-commits at cs.uiuc.edu
> Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom Stellard"
> <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>, "Patrik Hägglund H"
> <patrik.h.hagglund at ericsson.com>
> Sent: Friday, January 30, 2015 4:23:50 AM
> Subject: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
>
> Hi,
>
> I have revisited the issue in buildSchedGraph() I talked about
> previously, and attached a few patches. The first tries to fix the
> issue, and the other two try to illustrate associated issues,
> emerged from applying it.
>
> Is it OK to commit the first patch?
>
> [PATCH] Bugfix for missed dependency from store to load in
> buildSchedGraph().
>
> Bugfix for missed dependency from store to load in
> buildSchedGraph().
>
> Background: When handling underlying objects for a store, the
> vector
> of previous mem uses, mapped to the same Value, is afterwards
> cleared
> (regardless of ThisMayAlias). This means that during handling of
> the
> next store using the same Value, adjustChainDeps() must be
> called,
> otherwise a dependency might be missed.
Yes, LGTM. However, I'd like an extensive comment added to that part of the code explaining what is going on. That function has far too little in the way of commentary (RejectMemNodes itself even has no description). It is crucial that we change this. As far as I'm concerned, a comment that is as long and explanatory as this patch description is perfectly appropriate. [I say this admitting to having touched this function myself].
Also, I'd really prefer you use reviews.llvm.org for these patches in the future, and upload them with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface), it makes it much easier to see what is going on.
Regarding the other two patches, can you please explain what is going on? It is not obvious to me from the patches.
Thanks again,
Hal
>
> For example, three spill/reload (NonAliasing) memory accesses
> using
> the same Value 'a', with different offsets:
>
> SU(2): store @a
> SU(1): store @a, Offset:1
> SU(0): load @a
>
> In this case we have:
>
> * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends
> up in
> RejectMemNodes and is removed from the mem-uses list
> (AliasMemUses
> or NonAliasMemUses), as this list is cleared.
>
> * SU(2) needs a dep against SU(0). Therefore, SU(2) must check
> RejectMemNodes by calling adjustChainDeps().
>
> Previously, for store SUs, adjustChainDeps() was only called if
> MayAlias was true, missing the S(2) to S(0) dependency in the
> case
> above. The fix is to always call adjustChainDeps(), regardless of
> MayAlias, since this applies both for AliasMemUses and
> NonAliasMemUses.
>
> No test case found for any in-tree target.
>
> Jonas Paulsson
>
> -----Original Message-----
> From: Jonas Paulsson
> Sent: den 8 januari 2015 16:01
> To: 'Hal Finkel'; Sanjin Sijaric
> Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei
> Larin; Andrew Trick; llvm-commits at cs.uiuc.edu
> Subject: [PATCH] Call adjustChainDeps() always when handling a store.
>
> Hi,
>
> here is a patch on a problem I described in my previous mail (number
> 1) for your review. In short, I think adjustChainDeps() must be
> called regardless of MayAlias while handling a store.
>
> This problem was found on an out-of-tree target, and no test case can
> be provided for an official target.
>
> Let me know if I can commit this patch, or if you have any comments.
>
> /Jonas
>
> PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my
> target does the same things as you mentioned: register / offset
> based analysis, both pre/post RA. It is adding very little
> improvement over AA, it seems. I don't have any failing test cases
> right now, so it may be that this patch actually fixed my problem.
> However, right now it is confusing to have the whole algorithm
> written around memory operands and at the same time allow register /
> offsets analyzis in the midst of things. It would be good to rewrite
> that part like you explained. What are your current thoughts about
> this?
>
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: den 23 december 2014 18:11
> To: Sanjin Sijaric
> Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei
> Larin; Jonas Paulsson; Andrew Trick
> Subject: Re: ScheduleDAGInstrs.cpp
>
> ----- Original Message -----
> > From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> > To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>, "Andrew Trick"
> > <atrick at apple.com>
> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias Eriksson V"
> > <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom
> > Stellard"
> > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>
> > Sent: Friday, December 19, 2014 1:51:44 PM
> > Subject: RE: ScheduleDAGInstrs.cpp
> >
> > Hi Jonas,
> >
> > How is your target implementing areMemAccessesTriviallyDisjoint?
> > The
> > callback is there so that we don't get into the situation where the
> > call to isIdentifiedObject (which is called from
> > isUnsafeMemoryObject
> > from MIsNeedChainEdge) results in the edge being added between two
> > memory locations that the target can easily prove are different
> > (e.g
> > based on base register + index + offset, etc).
> >
> > Are you seeing the problem during pre-RA scheduling or post-RA
> > scheduling?
> >
> > I think we may be able to get rid of
> > areMemAccessesTriviallyDisjoint,
> > and let the AA code in MIsNeedChainEdgee handle it. This works for
> > me
> > if I don't call isIdentifiedObject from isUnsafeMemoryObject if we
> > are
> > using AA. Currently, the aliasing code won't get a chance to run
> > in
> > some cases as isUnsafeMemoryObject returns true, resulting in
> > the edge being added. Maybe someone can shed some light as to why
> > the call to isIdentifedMemoryObject in isUnsafeMemoryObject is
> > needed
> > if we are using AA.
>
> I think that it might not be needed. Here's my thinking:
>
> There is a call to getUnderlyingObjectsForInstr, and that needs to be
> there because of how the calling code groups the resulting SUs by
> underlying object. Specifically, ScheduleDAGInstrs::buildSchedGraph
> first gets the underlying objects for a value, and then only calls
> addChainDependency on SUs that share one of these underlying
> objects. The reason why we insist on having an identified underlying
> object is so that we know we're not being non-uniformly limited by
> the depth constraint. I mean the following. Assume I have two
> globals (@a and @b), and I have a third value formed using a
> combination of them: %c = select i1 %something, @a, @b -- so I have
>
> SU(0): underlying objects: @a
> SU(1): underlying objects: @b
> SU(2): underlying objects: @a and @b
>
> now imagine that our getUnderlyingObjectsForInstr had a trivial depth
> limit, and we did not check that the underlying objects returned
> were identified objects, then we might have:
>
>
> SU(0): underlying objects: @a
> SU(1): underlying objects: @b
> SU(2): underlying objects: %c
>
> and we might never give AA the chance to conclude that %c and @a or
> @b might alias (because addChainDependency might never be called on
> SU(0) <-> SU(2) because @a != %c).
>
> To prevent this problem, we clear the underlying objects list in
> getUnderlyingObjectsForInstr if any of the underlying objects are
> not identified (so that we can fall back to the conservative
> behavior).
>
> Now in isUnsafeMemoryObject we have the same check, so that when
> addChainDependency is called on an SU with a cleared underlying
> objects list (which we did specifically to get conservative
> behavior), we always get that conservative behavior. However, this
> check seems unnecessary (except, perhaps, as noted in the next
> paragraph). When not using AA, by the time we get to
> MIsNeedChainEdge, we're already destined to return true from
> MIsNeedChainEdge (except for load/load edges). When using AA, we
> might not return true (depending on what AA returns), but the AA
> check is specific to the two instructions being queried, and AA
> understands how to handle selects/phis/etc. in a reasonable way.
>
> Nevertheless, I'm somewhat worried about a subtlety here: I believe
> that we cannot fail to addChainDependency on the AliasChain because
> AA is smarter about underlying objects because the AliasChain
> actually can represent many other dependencies. So maybe when we
> call addChainDependency with SU on the AliasChain, we need to always
> pass nullptr for AA.
>
> -Hal
>
> >
> > Thanks,
> > Sanjin
> >
> > -----Original Message-----
> > From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com]
> > Sent: Friday, December 19, 2014 6:51 AM
> > To: Andrew Trick
> > Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin
> > Sijaric; Tom Stellard; Sergei Larin
> > Subject: ScheduleDAGInstrs.cpp
> >
> > Hi,
> >
> > I write again regarding buildSchedGraph(), as I am still not happy
> > about things there.
> >
> > I have found at least two examples which do not work out:
> >
> > 1)
> >
> > SU(2) Store "Value A"
> >
> > SU(1) Store "Value A"
> >
> > SU(0) Load "Value A"
> >
> > If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is
> > inserted into RejectedMemNodes and removed from its MemUses SU
> > list,
> > as this list is cleared. Therefore SU(2) must be handled with
> > adjustChainDeps(), because it needs an edge from SU(0).
> > For some reason adjustChainDeps() was only called for may-aliasing
> > stores. I think this is wrong, as a store will clear the MemUses SU
> > list also in the non-aliasing case.
> >
> > If MIsNeedChainEdge() returns true for SU(0) and SU(1), what
> > happens
> > if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The
> > dependency against SU(0) will not be checked, because it is not in
> > RejectMemNodes, nor in the MemUses SU list.
> >
> > 2)
> >
> > SU(2) Load "Value A"
> >
> > SU(1) Store "Value A"
> >
> > SU(0) Store "Value A"
> >
> > If not using alias analysis, then the MemDefs list is cleared after
> > *maybe* having inserted an edge from SU(0) to SU(1) with
> > addChainDependency(). If there was not an edge inserted, then SU(2)
> > must get a chance to check its deps against SU(0) by use of
> > adjustChainDeps(). Again, this is only done if MayAlias is true.
> >
> > I suspect therefore that areMemAccessesTriviallyDisjoint() is
> > currently used in a very dangerous way, since it might get lucky in
> > just one of two (equivalent) queries, and a dependency might be
> > missed
> > later on as seen above.
> >
> > Or am I missing something here?
> >
> > Senjin, I am working on an out-of-tree target. The problem I had
> > was
> > the first scenario in #1, and I fixed it by moving the
> > adjustChainDeps() call to be called also in the non-aliasing case.
> >
> > Another question I have is regarding adjustChainDeps(). If
> > MIsNeedChainEdge() returns true, is it still then always necessary
> > to
> > continue recursively with successors? If this is a latency issue,
> > perhaps it would be ok to do "continue" after adding the pred if
> > the
> > latency is non-zero? Right now this seems to be always zero, or one
> > in
> > the store-load case per the comment on line 856.
> >
> > On line 947: "A store to a specific PseudoSourceValue" should
> > probably
> > say "... a specific Value / PseudoSourceValue", right? The same
> > goes
> > for line 1029.
> >
> > /Jonas
> >
> >
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list