[llvm] llvm-mca: Disentangle `MemoryGroup` from `LSUnitBase` (PR #114159)

Wed Oct 30 12:58:48 PDT 2024

andrej wrote:

Thank you for replying to this so quickly.

> In my eyes, using a DAG for scheduling memory groups is not necessarily "just" an implementation detail.
> The general idea is that, no matter which consistency model you want to implement, you can always represent memory dependencies using edges of a DAG, where nodes are (groups of) instructions.

I agree that DAGs make sense for tracking data dependencies. Our implementation does in fact also use a DAG, but ...

> I am mainly curious about why using the existing framework would not work in your particular case. What is your solution using for tracking execution of dependent memory operations?

 ... the problem is that subclasses of `LSUnitBase` are forced to use the `MemoryGroup` class to represent the DAG. That robs them of the opportunity of attaching more detailed information or custom behavior for the nodes of the DAG. In our case, we wanted to attach the addresses at which instructions access memory (obtained from a trace) to the `MemoryGroup`, but this isn't possible without patching.

Note that I deliberately made `MemoryGroup` a _protected_ class defined in `LSUnit`. That way, one can still subclass `LSUnit` (instead of `LSUnitBase`) and make use of `MemoryGroup`. For example, if a subclass only wanted to overwrite the `dispatch` method to implement a different memory consistency model, they could still re-use the other code of `LSUnit`.

Subclassing `LSUnitBase` (instead of `LSUnit`) would then be used for more broadly different approaches to modeling the load/store unit. The overarching principle being that `LSUnitBase` would only specify the interface that other hardware units would need to interact with the load/store unit, e.g. the scheduler.

[**Alternative Approach**](https://github.com/llvm/llvm-project/compare/main...andrej:llvm-project:abstractize)
I can see that there could be arguments for keeping the DAG as part of the `LSUnitBase` interface. My suggestion in that case would be to expose `MemoryGroup`s at a more abstract level, so that subclasses still can customize the DAG implementation. I've implemented this as the "alternative approach" linked above, and am happy to instead make a PR for that if that approach if it is preferred.

This alternative implementation represents the DAG in `LSUnitBase` using a new class `AbstractMemoryGroup`, which specifies an interface for inspecting the DAG using virtual methods (such as `getNumPredecessors()` or `isWaiting()` for each group). `MemoryGroup` becomes a subclass of `AbstractMemoryGroup` -- a specific implementation of a DAG as used by the default `LSUnit` implementation, which contains the methods for how that load/store unit implementation might modify the DAG (such as adding nodes in `dispatch`). This still gives other `LSUnitBase` subclasses options to implement the DAG in an alternative way, but makes it clear that it is expected of all subclasses to provide some DAG in some way.

On another note, I realized my performance comparison above was flawed because that test does not contain any load/store instruction and hence does not exercise the load/store units. However I just ran `time ./bin/llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3000 ../llvm/test/tools/llvm-mca/X86/BtVer2/load-store-alias.s` and can confirm that the virtual dispatch still did not cause a significant slowdown (in neither this PR nor the "alternative approach" linked). All run in 0.60s.

https://github.com/llvm/llvm-project/pull/114159