[llvm] [MachineCopyPropagation, Scheduler] Detect and fix suboptimal instruction order to enable optimizations (PR #98087)

Thu Jul 18 12:10:22 PDT 2024

spaits wrote:

> There are two concerns I have with using the post-ra scheduler to enable copy propagation:
> 
> 1. Scheduling for latency, resource consumption, and reducing spills/reloads may not be effective if scheduling for MCP takes precedence.
>
> 2. Scheduling for MCP may not be effective if scheduling for latency, resource consumption, and reducing spills/reloads takes precedence.
>

Yes. For the C code shown in the issue description, the instruction scheduler would essentially "fight" the mechanism that would do the reordering in the anti-dependency breaker:
- The scheduler would want to prioritize the load.
- The pre-copy-propagation reordering mechanism I implemented would want to keep the load in its place, because it wants to enable that optimization later.

> On the other hand, I agree with others when they raise the worry about recreating the dependency graph in MCP. 

In my current implementation I have basically created a simplified ("dumber") scheduling graph that only contains what is needed to create "must precede" relations between instructions. It doesn't care which kind of dependency causes the relation. I know that duplicating this is not good practice, and I also agree that, regardless of the direction we take, the scheduler DAG would be the best thing to reuse. I think the scheduler DAG has everything I need for this optimization, and more. (Basically, the optimization needs two things: recognizing data dependencies, and updating the graph when I move instructions around and their order changes.)
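To illustrate how little such a reduced graph needs, here is a minimal C++ sketch of a "must precede" relation that ignores the kind of dependency. It is not the PR's code and not LLVM's `ScheduleDAG` API: `Instr`, `mustPrecede`, and `buildGraph` are hypothetical names, registers are plain integers, and memory dependencies and other side effects are assumed away.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: an instruction is a name plus the sets of physical
// registers it defines and uses. Memory and side effects are ignored.
struct Instr {
  std::string name;
  std::set<int> defs;
  std::set<int> uses;
};

static bool intersects(const std::set<int> &a, const std::set<int> &b) {
  for (int r : a)
    if (b.count(r))
      return true;
  return false;
}

// Earlier instruction `a` must precede later instruction `b` if there is any
// register dependency between them; which kind it is does not matter.
bool mustPrecede(const Instr &a, const Instr &b) {
  return intersects(a.defs, b.uses) || // true dependency (read after write)
         intersects(a.uses, b.defs) || // anti-dependency (write after read)
         intersects(a.defs, b.defs);   // output dependency (write after write)
}

// Builds the relation for the current order: edges[i] lists every j > i that
// instruction i must stay in front of. Rebuilding from scratch after each move
// is the simplest way to keep it in sync (a real DAG would update incrementally).
std::vector<std::vector<std::size_t>> buildGraph(const std::vector<Instr> &block) {
  std::vector<std::vector<std::size_t>> edges(block.size());
  for (std::size_t i = 0; i < block.size(); ++i)
    for (std::size_t j = i + 1; j < block.size(); ++j)
      if (mustPrecede(block[i], block[j]))
        edges[i].push_back(j);
  return edges;
}
```

For the optimal AArch64 sequence from this comment (modeling `w0`/`w1` as registers 0 and 1, each `bl` as using `w0`, `w1` and defining `w0`), `mov w1, w0` gets edges to the `ldr` (the anti-dependency on `w0`) and to the second `bl`.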

> It sounds like the dependency breaker might be able to solve the problem of reordering in the scheduler to enable MCP, if we decide to take the route that reordering should be done in the scheduler.

@s-barannikov, you were the first to mention the anti-dependency breaker. Thinking about this more deeply, I realized that even if we decide not to do this in MCP, it may not be trivial to find a new place for it in the scheduler:

We are not breaking any anti-dependency here. The anti-dependencies stay, and they must stay, because in some cases they are unavoidable. An anti-dependency is unavoidable when a value must be placed in a specific register (for example, because the calling convention requires it) or must come from a specific register (for example, the register the calling convention specifies for the return value).

For example, in the AArch64 case, even the optimal code must contain an anti-dependency: `ldr w0, [sp, 28]` redefines `w0` after `mov w1, w0` has read it, so the load has to stay after the copy:
```asm
bl      _Z5chainii
mov     w1, w0
ldr     w0, [sp, 28]
bl      _Z5chainii
```

For MCP to work optimally, we must move instructions that are in an anti-dependency relation with a copy-propagation "destination" but have no dependency with any of the instructions between themselves and the "source" of the propagated copy. (It is pretty odd to use the preposition "with" for a dependency, but for some reason it describes the situation best :) .)

So basically we just move the anti-dependency around so that it does not spoil a possible copy propagation.
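One possible reading of this condition, as a minimal C++ sketch (not the PR's implementation): for a copy chain `tmp = COPY src` ... `dst = COPY tmp`, an intervening instruction that redefines `src` blocks propagation; if that instruction has no dependency with anything up to and including the second copy, it can be sunk just past it. `Instr`, `mustPrecede`, `canPropagate`, and `sinkClobberPastCopy` are hypothetical names, only register dependencies are modeled, only the first clobber is handled, and sinking the clobber (rather than hoisting the copy) is an arbitrary choice for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: registers are ints; memory/side effects are ignored.
struct Instr {
  std::string name;
  std::set<int> defs;
  std::set<int> uses;
};

static bool intersects(const std::set<int> &a, const std::set<int> &b) {
  for (int r : a)
    if (b.count(r))
      return true;
  return false;
}

// Any true, anti, or output dependency from earlier `a` to later `b`.
static bool mustPrecede(const Instr &a, const Instr &b) {
  return intersects(a.defs, b.uses) || intersects(a.uses, b.defs) ||
         intersects(a.defs, b.defs);
}

// `tmp = COPY src` at c1 and `dst = COPY tmp` at c2 can become
// `dst = COPY src` only if neither src nor tmp is redefined in between.
bool canPropagate(const std::vector<Instr> &block, std::size_t c1, std::size_t c2) {
  int src = *block[c1].uses.begin();
  int tmp = *block[c1].defs.begin();
  for (std::size_t k = c1 + 1; k < c2; ++k)
    if (block[k].defs.count(src) || block[k].defs.count(tmp))
      return false;
  return true;
}

// Finds the first instruction between the copies that redefines src (an
// anti-dependency with the first copy). If it has no dependency with any
// instruction up to and including the second copy, moves it right after the
// second copy. Returns true if the block was reordered.
bool sinkClobberPastCopy(std::vector<Instr> &block, std::size_t c1, std::size_t c2) {
  assert(c1 < c2 && c2 < block.size());
  int src = *block[c1].uses.begin();
  for (std::size_t k = c1 + 1; k < c2; ++k) {
    if (!block[k].defs.count(src))
      continue;
    for (std::size_t j = k + 1; j <= c2; ++j)
      if (mustPrecede(block[k], block[j]))
        return false; // moving it would violate a dependency
    // Rotate [k, c2] left by one: the clobber lands just after the second copy.
    std::rotate(block.begin() + k, block.begin() + k + 1, block.begin() + c2 + 1);
    return true;
  }
  return false;
}
```

On a hypothetical input (not taken from the PR) such as `bl; mov w8, w0; ldr w0, [sp, 28]; mov w1, w8; bl`, the `ldr` is sunk past `mov w1, w8`; the chain can then be rewritten to `mov w1, w0`, leaving `mov w8, w0` dead if `w8` has no other uses, which matches the optimal sequence shown earlier.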

> It sounds like we would need consensus around this idea though.

I totally agree.

@michaelmaitland, thank you very much for checking out this PR/discussion.

https://github.com/llvm/llvm-project/pull/98087