[PATCH] D49195: [WebAssembly] Support for a ternary atomic RMW instruction

Thu Jul 12 01:22:20 PDT 2018

aheejin added a comment.

I think this CL can be reviewed as is. I will add an optimization for the success flag in a separate CL, because otherwise this one would get too long. The problem I mentioned is this: say we have these two test cases:

  define i64 @cmpxchg_i8_i64_loaded_value(i8* %p, i64 %exp, i64 %new) {
    %exp_t = trunc i64 %exp to i8
    %new_t = trunc i64 %new to i8
    %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
    %old = extractvalue { i8, i1 } %pair, 0
    %e = zext i8 %old to i64
    ret i64 %e
  }

  define i1 @cmpxchg_i8_i64_success(i8* %p, i64 %exp, i64 %new) {
    %exp_t = trunc i64 %exp to i8
    %new_t = trunc i64 %new to i8
    %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
    %succ = extractvalue { i8, i1 } %pair, 1
    ret i1 %succ
  }

In LLVM IR (not wasm), unlike the atomicrmw <http://llvm.org/docs/LangRef.html#atomicrmw-instruction> instruction, the cmpxchg <http://llvm.org/docs/LangRef.html#cmpxchg-instruction> instruction returns a pair { loaded value, success flag }. The additional success flag indicates whether the loaded value matched the expected value, i.e. whether the new value was stored. With this CL, the first function compiles to

  cmpxchg_i8_i64_loaded_value:
    .param    i32, i64, i64
    .result   i64
    i64.atomic.rmw8_u.cmpxchg  $push0=, 0($0), $1, $2
    return    $pop0
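For reference, here is a minimal sequential Python model of the { loaded value, success flag } pair that `cmpxchg i8` produces. The first function uses only field 0 of that pair, which is exactly what the wasm instruction's (zero-extended) result provides. The helper name `cmpxchg_i8` is hypothetical, and atomicity and memory ordering are not modeled:

```python
from typing import Tuple


def cmpxchg_i8(mem: bytearray, addr: int, expected: int, new: int) -> Tuple[int, bool]:
    """Sequential model of LLVM IR `cmpxchg i8`: returns (loaded value, success flag).

    Atomicity and the seq_cst orderings are not modeled; `expected` and `new`
    are the already-truncated i8 operands, taken here as unsigned 0..255.
    """
    old = mem[addr]              # the loaded value (field 0 of the pair)
    success = old == expected    # the success flag (field 1 of the pair)
    if success:
        mem[addr] = new          # store only when the comparison succeeds
    return old, success


mem = bytearray([5])
print(cmpxchg_i8(mem, 0, 5, 7))  # (5, True); mem[0] is now 7
print(cmpxchg_i8(mem, 0, 5, 9))  # (7, False); mem[0] stays 7
```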

But for the second function (which is a little contrived, because the success flag is usually not returned from a function but used in a loop condition), this fails to make use of the `i64.atomic.rmw8_u.cmpxchg` instruction. The output is something like

  cmpxchg_i8_i64_success:
    .param    i32, i64, i64
    .result   i32
    i32.wrap/i64  $push6=, $1
    tee_local  $push5=, $3=, $pop6
    i32.wrap/i64  $push0=, $2
    i32.atomic.rmw8_u.cmpxchg  $push1=, 0($0), $pop5, $pop0
    i32.const  $push2=, 255
    i32.and   $push3=, $3, $pop2
    i32.eq    $push4=, $pop1, $pop3
    return    $pop4

which is suboptimal. (This only happens when truncation and extension are involved.)
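The extra instructions exist because `i32.atomic.rmw8_u.cmpxchg` returns the old byte zero-extended to i32, while the wrapped `%exp` can still carry bits above bit 7, so it has to be masked with 255 before the `i32.eq`. Below is a minimal Python model of that arithmetic only (no memory or atomicity); the function names are hypothetical:

```python
MASK8 = 0xFF
MASK32 = 0xFFFF_FFFF


def reference_success(old_byte: int, exp_i64: int) -> bool:
    """LLVM IR semantics: compare the loaded i8 against `trunc i64 %exp to i8`."""
    return old_byte == (exp_i64 & MASK8)


def emitted_success(loaded_zext: int, exp_i64: int) -> bool:
    """Model of the emitted sequence; `loaded_zext` is the i32 result of
    i32.atomic.rmw8_u.cmpxchg, i.e. the old byte zero-extended to 32 bits."""
    exp_i32 = exp_i64 & MASK32               # i32.wrap/i64
    return loaded_zext == (exp_i32 & MASK8)  # i32.and 255, then i32.eq


def unmasked_success(loaded_zext: int, exp_i64: int) -> bool:
    """Same comparison without the i32.and: wrong if %exp has bits above bit 7."""
    return loaded_zext == (exp_i64 & MASK32)


old = 0x2A
exp = 0x12A  # trunc to i8 gives 0x2A, so the cmpxchg should succeed
print(reference_success(old, exp), emitted_success(old, exp), unmasked_success(old, exp))
# True True False
```

Folding this mask-and-compare back into a single `i64.atomic.rmw8_u.cmpxchg` is what the additional patterns would need to recognize.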

I think we need another set of patterns to optimize this. For the case where both the loaded value and the success flag are used, which I guess is the most common one, we need yet another set of patterns. I'll add those in a separate CL.

Repository:
  rL LLVM

https://reviews.llvm.org/D49195