[PATCH] D47882: [RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A

Fri Jun 8 05:29:34 PDT 2018

asb added a comment.

In https://reviews.llvm.org/D47882#1125698, @jyknight wrote:

> The problem with min and max is that you must not have a branch within the ll/sc loop. But, they can be done branchless with not too many instructions, so ISTM we should just implement them that way.
>
> I think the following will work -- it computes `b ^ ((a^b) & -(a < b))`. The `& -(a < b)` either returns the LHS (if comparison is true) or returns 0 (if comparison is false), so you either get b^a^b, or b.

I did consider the branchless approach. However forward branches _are_ part of the architectural guarantee of LR/SC forward progress:

> In the standard A extension, certain constrained LR/SC sequences are guaranteed to succeed
>  eventually. The static code for the LR/SC sequence plus the code to retry the sequence in case
>  of failure must comprise at most 16 integer instructions placed sequentially in memory. For the
>  sequence to be guaranteed to eventually succeed, the dynamic code executed between the LR and
>  SC instructions can only contain other instructions from the base “I” subset, excluding loads, stores,
>  backward jumps or taken backward branches, FENCE, FENCE.I, and SYSTEM instructions. The
>  code to retry a failing LR/SC sequence can contain backward jumps and/or branches to repeat the
>  LR/SC sequence, but otherwise has the same constraints. The SC must be to the same address
>  and of the same data size as the latest LR executed. LR/SC sequences that do not meet these
>  constraints might complete on some attempts on some implementations, but there is no guarantee
>  of eventual success.

================
Comment at: lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:179
+
+  // andi offset, addr, 3
+  // andi alignedaddr, addr, -4
----------------
jyknight wrote:
> I'd note that all of the code which is OUTSIDE the ll/sc loop doesn't really need to be implemented within this pseudo. I'm not sure if it's worthwhile extracting that or not, but it could theoretically have benefit, due to being able to potentially reuse e.g. the address math.
> 
> Have you considered making such a split?
Yes, I've thought about this quite a bit. The ideal is to expand everything other than the LL+SC+branch at the IR level. I had been thinking about a new inlineasm-like 'black box' construct that would be expanded by target-dependent code only at the very last moment, when lowering to MCInst. But when I started sketching it out in more detail with the aim of incorporating it as an option in an RFC on atomics lowering I started thinking that there's not much benefit in this versus expanding pseudo-instructions in preEmitPass2. Targets can guarantee that runs after everything else that might cause problems (such as the machine outliner). I'll play with this some more.

================
Comment at: lib/Target/RISCV/RISCVTargetMachine.cpp:104
+
+void RISCVPassConfig::addPreSched2() { addPass(createRISCVExpandPseudoPass()); }
----------------
jyknight wrote:
> Is it okay to do this pre-scheduling? Are we guaranteed that the scheduler won't do something invalid here? (I'm not sure the set of transforms it's capable of at the moment.)
> 
preEmitPass2 seems the ideal point. Although the machine outliner could be taught to leave instructions between LLC/SC alone, it's probably better to ensure that this lowering happens at the latest possible point.

https://reviews.llvm.org/D47882