[llvm] [RegAllocFast] fold foldable inline asm (PR #74344)

Mon Dec 4 10:27:28 PST 2023

nickdesaulniers wrote:

> What's a good example that motivates this change?

As part of solving #20571, I will need to adjust both instruction selection frameworks to choose "r" instead of "m" when the inline asm constraint "rm" (i.e. "r" or "m") is observed.

But doing so may lead to register exhaustion later during register allocation.

So we need a way for the register allocation frameworks to "undo" the decision made earlier by instruction selection.

The way to undo this is for the instruction selection frameworks to mark the operands as "foldable" (i.e. register was chosen, but it is permitted to transform this to memory later if necessary) so that the register allocators may convert these back to memory locations.

This has already been done for greedy, but not yet for fastregalloc.

Because the instruction selectors don't (and shouldn't) know which register allocator will run later, and because of the implicit requirement that all register allocators support codegen from all isel frameworks, I need to implement this logic for fastregalloc in addition to greedy.
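For reference, here is a minimal sketch of the kind of source that carries "rm" operands. The function name `sum_through_asm` is hypothetical, and the asm template is left empty so the snippet does not depend on any particular target; only the constraints matter here.

```cpp
// Hypothetical helper, for illustration only.
// Each operand uses "+rm": the compiler may satisfy it with either a
// register ("r") or a memory location ("m"), and the operand is read-write.
int sum_through_asm(int a, int b, int c) {
    // Empty template: no instructions are emitted, but every operand must
    // still be placed somewhere that satisfies its constraint.
    asm volatile("" : "+rm"(a), "+rm"(b), "+rm"(c));
    return a + b + c;
}
```

With only three operands there is no pressure either way; the concern above is about asm statements with enough "rm" operands that picking "r" for all of them leaves too few registers on a given target.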

(Please let me know which of the above I should add to the commit message, if you found it helpful.)

> (doing a 2nd pass over the basic block is not cheap and this allocator aims to be primarily fast)

OK, but if the additional pass is optimized for the "no work" scenario, does it really add meaningful cost? If I moved this work to some other pass, we would still potentially be adding an iteration over all instructions.

> Would it be possible to add a special case InlineAsm instruction and basically replace the "`// Allocate virtreg uses and insert reloads as necessary.`" loop with a 2-pass algorithm that first assign non-foldable operands and in a 2nd pass over the operands the foldable ones?

Is that something you'd imagine occurring within fastregalloc?

This PR essentially _is_ a 2-pass algorithm; it does one pass for foldable operands first, then defers to existing machinery for the non-foldable operands. You suggest doing the passes in the opposite order, but it's not clear to me what the improvement would be (or whether it's even feasible); can you provide more thoughts on what you have in mind?
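To make the ordering concrete, here is a toy model of "foldable operands first, then the existing machinery". It is a sketch only, not the code in this PR: `Operand`, `Inst`, `foldPass`, and `assignRegs` are hypothetical names that do not exist in LLVM, and the fold policy (fold only while the operands outnumber the available registers) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: none of these types or functions exist in LLVM.
enum class Loc { Unassigned, Reg, Mem };

struct Operand {
    bool foldable = false;  // isel chose "r", but "m" would also be acceptable
    Loc loc = Loc::Unassigned;
};

struct Inst {
    bool isInlineAsm = false;
    std::vector<Operand> ops;
};

// Pass 1: look only at inline asm, and turn foldable operands into memory
// operands until the remaining ones fit in numRegs registers.
// Returns the number of operands folded.
std::size_t foldPass(std::vector<Inst> &block, std::size_t numRegs) {
    std::size_t folded = 0;
    for (Inst &inst : block) {
        if (!inst.isInlineAsm) continue;  // "no work" case: one cheap check
        std::size_t needRegs = inst.ops.size();
        for (Operand &op : inst.ops) {
            if (needRegs <= numRegs) break;  // enough registers, stop folding
            if (!op.foldable) continue;
            op.loc = Loc::Mem;
            --needRegs;
            ++folded;
        }
    }
    return folded;
}

// Pass 2: stand-in for the existing machinery. Every operand that was not
// folded gets a register; returns false if an instruction needs more than
// numRegs registers.
bool assignRegs(std::vector<Inst> &block, std::size_t numRegs) {
    for (Inst &inst : block) {
        std::size_t used = 0;
        for (Operand &op : inst.ops) {
            if (op.loc == Loc::Mem) continue;
            if (used == numRegs) return false;
            op.loc = Loc::Reg;
            ++used;
        }
    }
    return true;
}
```

In this model, a block with no inline asm costs one branch per instruction in the first pass, which is the "no work" scenario mentioned above.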

https://github.com/llvm/llvm-project/pull/74344