[PATCH] D47882: [RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A

Thu Jun 7 14:40:32 PDT 2018

jyknight added a comment.

The problem with min and max is that you must not have a branch within the ll/sc loop. But, they can be done branchless with not too many instructions, so ISTM we should just implement them that way.

I think the following will work -- it computes `b ^ ((a^b) & -(a < b))`. The `& -(a < b)` either returns the LHS (if comparison is true) or returns 0 (if comparison is false), so you either get b^a^b, or b.

min(a0, a1) -> a0:

  slt     a2, a0, a1
  neg     a2, a2
  xor     a0, a0, a1
  and     a0, a0, a2
  xor     a0, a1, a0

And I think just use sltu instead of slt for umin/umax, and reverse the args of the slt for max/umax.

================
Comment at: lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:179
+
+  // andi offset, addr, 3
+  // andi alignedaddr, addr, -4
----------------
I'd note that all of the code which is OUTSIDE the ll/sc loop doesn't really need to be implemented within this pseudo. I'm not sure if it's worthwhile extracting that or not, but it could theoretically have benefit, due to being able to potentially reuse e.g. the address math.

Have you considered making such a split?

================
Comment at: lib/Target/RISCV/RISCVTargetMachine.cpp:104
+
+void RISCVPassConfig::addPreSched2() { addPass(createRISCVExpandPseudoPass()); }
----------------
Is it okay to do this pre-scheduling? Are we guaranteed that the scheduler won't do something invalid here? (I'm not sure the set of transforms it's capable of at the moment.)

https://reviews.llvm.org/D47882