[llvm] [RISCV] 'Zalrsc' may permit non-base instructions (PR #165042)

Mon Oct 27 17:31:02 PDT 2025

================
@@ -1906,6 +1906,25 @@ def FeatureForcedAtomics : SubtargetFeature<
 def HasAtomicLdSt
     : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;
 
+// The RISC-V Unprivileged Architecture - ISA Volume 1 (Version: 20250508)
+// [https://docs.riscv.org/reference/isa/_attachments/riscv-unprivileged.pdf]
+// in section 13.3. Eventual Success of Store-Conditional Instructions, defines
+// _constrained_ LR/SC loops:
+//   The dynamic code executed between the LR and SC instructions can only
+//   contain instructions from the base ''I'' instruction set, excluding loads,
+//   stores, backward jumps, taken backward branches, JALR, FENCE, and SYSTEM
+//   instructions. Compressed forms of the aforementioned ''I'' instructions in
+//   the Zca and Zcb extensions are also permitted.
+// LR/SC loops that do not adhere to the above are _unconstrained_ LR/SC loops,
+// and success is implementation specific. For implementations which know that
+// non-base instructions (such as the ''B'' extension) will not violate any
+// forward progress guarantees, using these instructions to reduce the LR/SC
+// sequence length is desirable.
+def FeaturePermissiveZalrsc
+    : SubtargetFeature<
+          "permissive-zalrsc", "HasPermissiveZalrsc", "true",
+          "Implementation permits non-base instructions between LR/SC pairs">;
----------------
slachowsky wrote:

Certainly a reasonable ask.

This feature is written from the point of view of a minimal RISC-V core with LR/SC and a global monitor that is external to the core. In such a configuration the global monitor is aware only of the load/store transactions to the memory system, and is completely unaware of what instructions or control flow occurred on the CPU(s) (or non-CPU devices) to generate those transactions. Any instruction mix is permissible in this style of system (ignoring higher-order concerns of guaranteed forward progress / eventual success), as long as the same memory transactions are presented to the monitor.

It is necessary to have some `FeaturePermissiveZalrsc` control to enable 'unconstrained' LR/SC loops, and the proposal here is that there are _no constraints_ on what is permissible. The idea is to admit shorter sequences by checking whether an existing secondary extension feature is also available:

```
if (STI->hasPermissiveZalrsc() && STI->hasVendorExtABC())
  // build short vendor ABC instruction sequence
else if (STI->hasPermissiveZalrsc() && STI->hasStdExtXYZ())
  // build short standard XYZ instruction sequence
else
  // build original constrained sequence with only 'I' instructions
```

This avoids an explosion of features to cover the cross-products of permitted Zalrsc x {XYZ, ABC, etc.}. If a core has no constraints on what is permitted, and it also has an instruction extension that gives a shorter sequence, go ahead and use it.

Realistically though, there is a tiny vocabulary of `atomicrmw <ops>`, and the existing pseudo expansions for these are very tightly coded, so there is very limited opportunity for improvement here. Other than the Zbb MIN/MAX instructions in this patch, the only other kind of instruction extension that I can think of that has utility is some sort of bit-field insertion / bit-select instruction that could shorten the `xor` + `and` + `xor` sequence used in the masked atomics.
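
For reference, a minimal sketch of the `xor` + `and` + `xor` masked merge mentioned above, written as plain C++ rather than as the actual pseudo expansion. The function name `maskedMerge` and its parameter names are made up for illustration; the assumption is that the sequence computes `old ^ ((old ^ new) & mask)`, which is the usual three-instruction bit-select idiom:

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): take the bits of `newVal` selected by
// `mask` and the remaining bits from `oldVal`, using three ALU operations.
constexpr uint32_t maskedMerge(uint32_t oldVal, uint32_t newVal, uint32_t mask) {
  uint32_t diff = oldVal ^ newVal; // xor: bits where old and new differ
  diff &= mask;                    // and: keep only differences inside the field
  return oldVal ^ diff;            // xor: flip exactly those bits in old
}

// The result matches the more obvious four-operation form
// (oldVal & ~mask) | (newVal & mask).
static_assert(maskedMerge(0xAABBCCDDu, 0x00001100u, 0x0000FF00u) == 0xAABB11DDu,
              "byte 1 comes from newVal, the rest from oldVal");
```

A single bit-select or bit-field-insert instruction would replace all three operations, which is the only saving available in that part of the loop.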

https://github.com/llvm/llvm-project/pull/165042