[PATCH] D154482: [AMDGPU] Flush vmcnt in preheader for loops with loads

Wed Jul 5 01:41:22 PDT 2023

kerbowa planned changes to this revision.
kerbowa added a comment.

This patch is meant to explore flipping the default: assume that, in the average case, it is profitable to hoist the waitcnt to the loop preheader. It is mutually exclusive with D154480 <https://reviews.llvm.org/D154480>. It needs a round of performance testing to confirm that it is actually profitable in aggregate.

It would probably also need an improvement that verifies the hoisted waitcnt actually improves the placement of waitcnt inside the loop.

E.g., in cases like the one below, we don't want to do any hoisting.

  v0 = load(...)
  loop {
    v1 = load(...)
    ...
    use(v1)
    use(v0)
  }
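
One way to read the example above: use(v1) already forces a wait inside the loop, and if the counter retires loads in issue order, that wait also covers v0, so flushing in the preheader would remove nothing from the loop body. The toy model below sketches such a check. It is not the SIInsertWaitcnts logic; the names wait_positions, static_waits and hoisting_helps are hypothetical, and in-order retirement of all loads is a simplifying assumption.

```python
# Toy model only: hypothetical helpers, not LLVM code.
# Assumption: outstanding loads retire strictly in issue order.

def wait_positions(body, entry_pending):
    """Return (body indices that need a wait, loads still pending at the end)."""
    pending = list(entry_pending)  # outstanding loads, oldest first
    waits = set()
    for idx, (op, reg) in enumerate(body):
        if op == "load":
            pending.append(reg)
        elif op == "use" and reg in pending:
            waits.add(idx)
            # Waiting for the newest load of reg also retires every older load.
            last = len(pending) - 1 - pending[::-1].index(reg)
            del pending[: last + 1]
    return waits, pending


def static_waits(body, entry_pending):
    """Waits needed on entry from the preheader or via the backedge."""
    first, backedge = wait_positions(body, entry_pending)
    again, _ = wait_positions(body, backedge)
    return first | again


def hoisting_helps(preloop_loads, body):
    """True if flushing in the preheader removes at least one wait in the loop."""
    without_flush = static_waits(body, preloop_loads)
    with_flush = static_waits(body, [])
    return len(with_flush) < len(without_flush)


# The loop from the example: v0 is loaded before the loop, v1 inside it.
example = [("load", "v1"), ("use", "v1"), ("use", "v0")]
# A variant where v0 is used before the in-loop load is issued.
variant = [("use", "v0"), ("load", "v1"), ("use", "v1")]
```

Under these assumptions, hoisting_helps(["v0"], example) is False, because the wait before use(v1) is needed either way, while hoisting_helps(["v0"], variant) is True, because the flush removes the wait before use(v0).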

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154482/new/

https://reviews.llvm.org/D154482