[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

Thu May 9 10:02:44 PDT 2024

jrbyrnes wrote:

> We should spend more energy making the scheduler sensible by default, instead of creating all of this complexity.

I would also prefer a more sensible default scheduler, but the driving use case for this is global scheduling. The scheduler is doing inefficient things because it is unaware of loop-carried dependencies. A generalized solution, then, is not feasible due to the timeline for that feature. We could try adding some sort of ad-hoc heuristic to the scheduler for cases like this, but I don't see how that would reduce complexity relative to this, and it will likely not produce the results users expect.

https://github.com/llvm/llvm-project/pull/85304