[PATCH] D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking

Fri Nov 9 09:25:42 PST 2018

t-tye added a comment.

In https://reviews.llvm.org/D54228#1292786, @nhaehnle wrote:

> In https://reviews.llvm.org/D54228#1290997, @t-tye wrote:
>
> > > This is sufficient, because whenever only one event of a count type is
> > > pending, its last time point is naturally the upper bound of all time
> > > points of this count type, and when multiple event types are pending,
> > > the count type has gone out of order and an s_waitcnt to 0 is required
> > > to clear any pending event type (and will then clear all pending event
> > > types for that count type).
> >
> > Just wondered whether we can do better than using 0. Could the lowest count be used instead, since that should be sufficient to ensure all out-of-order events for this count type have happened? I had discussed this with Bob at one time.
>
>
> Hmm, how would that work? What lowest count are you referring to? For example, if lgkm has both in-flight SMEM read, and in-flight LDS, we could either have all SMEM read finish first or all LDS finish first.
>
> Something that we //could// do is a more finely-grained tracking of in-order events. For example, if we have both in-flight SMEM and in-flight LDS, and we need to wait for the second-to-last LDS, then in fact we could do an lgkmcnt(1) wait -- because if the counter reaches 1 or less, the second-to-last LDS must have returned. After the lgkmcnt(1), we still need to conservatively assume that any event type that was previously in-flight may still be in-flight, so this patch here is compatible with such a more finely-grained tracking.
>
> I think the finer-grained tracking could be achieved by introducing separate timelines for each event type: currently we only have timelines by counter. Anyway, it'd be a separate change, mainly for the benefit of mixing LDS and SMEM I think.

Right, that is what I meant. Even though some waitcnt counters (such as lgkm) are an amalgam of several internal counters, some of those internal counters are still in order. So tracking the internal counters separately avoids the overly conservative wait to 0.
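
To make the difference concrete, here is a rough sketch (not code from the patch or from the pass). It assumes a simplified model in which events of one type return in issue order while events of different types may return in any order relative to each other. The names `Event` and `requiredWaitcnt` are made up for illustration. Under that model, the target event must have returned once the shared counter is no larger than the number of same-type events issued after it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event kinds that share the lgkm counter.
enum class Event { SMEM, LDS };

// InFlight lists the events that may still be outstanding, oldest first.
// Returns the largest counter value that still guarantees that
// InFlight[Target] has completed, under the simplified model above.
unsigned requiredWaitcnt(const std::vector<Event> &InFlight,
                         std::size_t Target, bool PerTypeTimelines) {
  assert(Target < InFlight.size() && "target must be an in-flight event");
  unsigned LaterSameType = 0;
  bool Mixed = false;
  for (std::size_t I = 0; I < InFlight.size(); ++I) {
    if (InFlight[I] != InFlight[Target])
      Mixed = true; // another event type shares the counter
    else if (I > Target)
      ++LaterSameType; // issued after the target, returns after it
  }
  // Per-counter tracking only: mixed event types force a wait to 0.
  if (Mixed && !PerTypeTimelines)
    return 0;
  // If the target were still outstanding, it and every later event of its
  // type would be too, so the counter would exceed LaterSameType.
  return LaterSameType;
}
```

With LDS, SMEM, LDS, SMEM, LDS in flight and the second-to-last LDS as the target, this gives 1 with per-type timelines (the lgkmcnt(1) case from the quoted comment) and 0 with per-counter tracking only.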

It can also mean that some waitcnts can be eliminated: a previous waitcnt, together with the fact that no operations for an internal counter have been issued since, may make it known that the internal counter has no outstanding operations.
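
A minimal sketch of that bookkeeping, again under simplifying assumptions and with hypothetical names (`EventType`, `PendingBounds`), not the pass's actual data structures: keep an upper bound on the outstanding operations per internal counter, and clamp every bound when a wait on the shared counter is emitted. A later wait is redundant if the bound is already low enough.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical internal counters behind the shared lgkm counter.
enum EventType : std::size_t { SMEM = 0, LDS = 1, NumEventTypes = 2 };

class PendingBounds {
  // Upper bound on outstanding operations per internal counter.
  std::array<unsigned, NumEventTypes> Bound{};

public:
  // A new operation of type T has been issued.
  void issue(EventType T) {
    assert(T < NumEventTypes && "not a real event type");
    ++Bound[T];
  }

  // A wait for the shared counter to drop to N or below has been emitted:
  // afterwards no internal counter can have more than N outstanding.
  void wait(unsigned N) {
    for (unsigned &B : Bound)
      B = std::min(B, N);
  }

  // Is a wait needed to guarantee at most Allowed operations of type T
  // are still outstanding?
  bool needsWait(EventType T, unsigned Allowed) const {
    assert(T < NumEventTypes && "not a real event type");
    return Bound[T] > Allowed;
  }
};
```

For example, after an LDS and an SMEM issue followed by `wait(0)` and another SMEM issue, `needsWait(LDS, 0)` is false, so no further wait would be needed before consuming an LDS result, while `needsWait(SMEM, 0)` is still true.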

Repository:
  rL LLVM

https://reviews.llvm.org/D54228