[llvm] [MachinePipeliner] Add an abstract layer to manipulate Data Dependenc… (PR #109918)

Tue Oct 22 04:09:08 PDT 2024

dpenry wrote:

> 
> https://godbolt.org/z/3v5rnPG3o
> 
> I think that the loop-carried dependence SU(5) -> SU(2) is not handled correctly in the current implementation. If we're just talking about this case, it doesn't matter because instructions that read/write `$nzcv` (which I think is equivalent to `$cpsr` in ARM) are ultimately scheduled to stage 0. However, the problem can occur if similar dependencies are created by any other different physical registers. As fixed in [dcb7764](https://github.com/llvm/llvm-project/commit/dcb77643e3440e948010ed8ecb4c2f8fe4fadb93), dependencies created by the status register are handled correctly, but those created by other physical registers seem not to be. I'm not sure if such dependencies exist, but I think FFR register of Arm SVE could create such one. This is what I meant in the previous comment "should be fixed to treat other loop-carried dependencies correctly in the future".

That godbolt makes more sense!   I do note that in this particular instance, an SU(5)->SU(2) dependence carried across the loop is not seen because SU(7) kills the liveness of SU(5)'s CPSR write in the output direction and SU(1) kills the liveness of whatever entered the basic block in the input direction.   Is that a problem in general?  I'd have to think more on it.

Note that CPSR isn't called out specially in the code; all physical registers are treated alike.  Are all instructions annotated properly with implicit physical register def/use/kill information?  That's up to the instruction set definitions to get right.

SVE's FFR is an interesting beast because it's a cumulative fault register, so there isn't really a sensible notion of an output dependence, kind of like FPSR flags.  Getting the order of writes to it wrong generally doesn't matter until it's read.  However, RDFFR/RDFFRS/SETFFR/WRFFR probably should be marked as unpipelineable instructions so that we don't try to pipeline some odd code that actually does care about the order.

I do think we've gotten a bit far afield here.  We don't (yet) have an example of a failure case due to loop-carried physical registers or memory dependencies with the old code, and we're leaving in most of the bits of logic that purport to handle it.  The only question is whether the removal of the reversal code around createAdjacencyStructure is going to create a new failure case in the presence of loop-carried physical register or memory dependencies.  That removal results in fewer computed circuits.  Circuits are used for MII calculation and for checkValidNodeOrder.  MII calculation is not a correctness issue.  As for validity, being in a circuit makes it legal to be after both successors and predecessors, so fewer circuits means fewer legal orders, so I don't think we can introduce new failure cases, just reduced opportunities for pipelining.  So, I think we're good for now.  I would like to see what others have to say.

https://github.com/llvm/llvm-project/pull/109918