[PATCH] D54228: AMDGPU/InsertWaitcnts: Simplify pending events tracking

Fri Nov 9 06:30:06 PST 2018

nhaehnle added a comment.

In https://reviews.llvm.org/D54228#1290997, @t-tye wrote:

> > This is sufficient, because whenever only one event of a count type is
> > pending, its last time point is naturally the upper bound of all time
> > points of this count type, and when multiple event types are pending,
> > the count type has gone out of order and an s_waitcnt to 0 is required
> > to clear any pending event type (and will then clear all pending event
> > types for that count type).
>
> I just wondered whether we can do better than using 0. Could the lowest count be used instead, since this should be sufficient to ensure that all out-of-order events in this count type have happened? I had discussed this with Bob at one time.

Hmm, how would that work? Which lowest count are you referring to? For example, if lgkm has both in-flight SMEM reads and in-flight LDS operations, either all the SMEM reads could finish first or all the LDS operations could finish first.

Something that we //could// do is finer-grained tracking of in-order events. For example, if we have both in-flight SMEM and in-flight LDS, and we need to wait for the second-to-last LDS, then we could in fact do an lgkmcnt(1) wait, because once the counter reaches 1 or less, the second-to-last LDS must have returned. After the lgkmcnt(1), we still need to conservatively assume that any event type that was previously in-flight may still be in-flight, so this patch is compatible with such finer-grained tracking.
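
To make the lgkmcnt(1) reasoning above concrete, here is a minimal sketch. It is not code from this patch or from SIInsertWaitcnts; `EventType`, `safeWaitCount`, and the flat `Pending` vector are hypothetical names for illustration. It assumes that events of the queried type complete in order relative to each other (as for LDS in the example above), which may not hold for every event type. Under that assumption, the largest safe counter value for a given event is the number of later-issued events of the same type: if the target were still outstanding, all of those later events would be outstanding too, so the counter would be strictly larger.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event types that share one counter (e.g. lgkm).
enum class EventType { SMEM, LDS };

// Pending holds the in-flight events of one counter in issue order.
// Returns the largest counter value that still guarantees that the event at
// index Target has completed, ASSUMING events of Target's type complete in
// order among themselves. Events of other types are ignored on purpose: they
// may complete before or after the target, so they cannot be relied upon.
unsigned safeWaitCount(const std::vector<EventType> &Pending,
                       std::size_t Target) {
  assert(Target < Pending.size() && "Target must index a pending event");
  unsigned LaterSameType = 0;
  for (std::size_t I = Target + 1; I < Pending.size(); ++I)
    if (Pending[I] == Pending[Target])
      ++LaterSameType;
  return LaterSameType;
}
```

For a pending sequence LDS, SMEM, LDS, asking about the first LDS (the second-to-last one) yields 1, which matches the lgkmcnt(1) case described above. Asking about the last LDS yields 0, which is the same wait the coarser tracking would emit.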

I think the finer-grained tracking could be achieved by introducing separate timelines for each event type; currently we only have timelines per counter. Anyway, it would be a separate change, mainly benefiting code that mixes LDS and SMEM, I think.

Repository:
  rL LLVM

https://reviews.llvm.org/D54228