[llvm] [Clang][inlineasm] Add special support for "rm" output constraints (PR #92040)

Tue Jul 23 13:00:55 PDT 2024

================
@@ -4962,6 +4962,11 @@ class TargetLowering : public TargetLoweringBase {
     /// Memory, Other, Unknown.
     TargetLowering::ConstraintType ConstraintType = TargetLowering::C_Unknown;
 
+    /// The register may be folded. This is used if the constraint is "rm",
+    /// where we prefer using a register, but can fall back to a memory slot
+    /// under register pressure.
+    bool MayFoldRegister = false;
+
----------------
bwendling wrote:

Sure, done.

We already encode the 'MayBeFolded' in the `InlineAsm::Flag` value. If we go with a `C_RegisterOrMemory` type we can reclaim that bit. However, it amounts to the same thing, and there's not a lot more we can do with it beyond "select one constraint and fall back to the other if the first one fails." I've toyed with the idea of having a completely new way of representing multi-constraints, but it's trickier than it first appears. For instance, we could encode all constraints in the `INLINEASM` machine instruction, selecting the "best" one during register allocation and discarding the rest. This runs into some issues though. First off, cleaning up the instructions left over for the discarded constraints might be tricky because it relies upon memory analysis, etc. The back end does DCE, but I'm worried that it wouldn't remove *all* of them.
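
For illustration only, the "select one constraint and fall back to the other" behaviour can be modelled as below. This is a toy sketch, not LLVM code: `OperandKind`, `AsmOperand`, `pickOperandKind`, and `checkExamples` are hypothetical names, and the count of free registers is a stand-in for whatever the register allocator actually tracks.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are LLVM types.
enum class OperandKind { Register, Memory, Unallocatable };

struct AsmOperand {
  // True for an "rm"-style constraint: a register is preferred,
  // but a memory slot is acceptable.
  bool MayFoldRegister;
};

// Prefer a register; fall back to memory only when the operand allows it.
OperandKind pickOperandKind(const AsmOperand &Op, unsigned FreeRegs) {
  if (FreeRegs > 0)
    return OperandKind::Register;
  if (Op.MayFoldRegister)
    return OperandKind::Memory;
  // A plain "r" operand with no free register cannot be satisfied here.
  return OperandKind::Unallocatable;
}

// Small usage example covering the three outcomes.
void checkExamples() {
  AsmOperand RM{true};
  AsmOperand R{false};
  assert(pickOperandKind(RM, 2) == OperandKind::Register);
  assert(pickOperandKind(RM, 0) == OperandKind::Memory);
  assert(pickOperandKind(R, 0) == OperandKind::Unallocatable);
}
```

The point of the sketch is only that a single boolean is enough to express the fallback; it says nothing about how the memory operand would actually be materialized.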

So I've been going down the rabbit hole that's been suggested in this PR: forcing the fast register allocator to spill all "foldable" registers. As you can imagine, this is easier said than done. There are some situations where the code necessary to support a memory constraint simply doesn't exist and needs to be replicated. Unfortunately, there doesn't appear to be a single way to do that. (If there is and I just haven't found it, please let me know.) So I've resorted to searching for said instructions, and if I don't find them I cobble together the best versions I can and cross my fingers. This works for several cases, but not for all.

I'm planning on severely limiting the scope of this feature to "simple" inputs/outputs (no tied constraints, etc.) on X86 and calling it a day. It's a half-step forward, and hacky like nothing else, but the chewing gum, duct tape, and Flex TAPE should hold for the instances we actually care about...

https://github.com/llvm/llvm-project/pull/92040