[PATCH] D49195: [WebAssembly] Support for a ternary atomic RMW instruction
Heejin Ahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 12 01:22:20 PDT 2018
aheejin added a comment.
I think this CL can be reviewed as is. I will add the optimization for the success flag in a separate CL, because this one would otherwise get too long. The problem I was talking about is this. Say we have these two test cases:
define i64 @cmpxchg_i8_i64_loaded_value(i8* %p, i64 %exp, i64 %new) {
  %exp_t = trunc i64 %exp to i8
  %new_t = trunc i64 %new to i8
  %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
  %old = extractvalue { i8, i1 } %pair, 0
  %e = zext i8 %old to i64
  ret i64 %e
}

define i1 @cmpxchg_i8_i64_success(i8* %p, i64 %exp, i64 %new) {
  %exp_t = trunc i64 %exp to i8
  %new_t = trunc i64 %new to i8
  %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
  %succ = extractvalue { i8, i1 } %pair, 1
  ret i1 %succ
}
In LLVM IR (not wasm), unlike the atomicrmw <http://llvm.org/docs/LangRef.html#atomicrmw-instruction> instruction, the cmpxchg <http://llvm.org/docs/LangRef.html#cmpxchg-instruction> instruction returns a pair of { loaded value, success flag }. The additional 'success flag' indicates whether the loaded value and the expected value match. With this CL, the first function compiles to
cmpxchg_i8_i64_loaded_value:
  .param i32, i64, i64
  .result i64
  i64.atomic.rmw8_u.cmpxchg $push0=, 0($0), $1, $2
  return $pop0
But the second function (which is a little contrived, because the success flag is usually not returned from a function but used in a loop condition) fails to make use of the `i64.atomic.rmw8_u.cmpxchg` instruction. The result is something like
cmpxchg_i8_i64_success:
  .param i32, i64, i64
  .result i32
  i32.wrap/i64 $push6=, $1
  tee_local $push5=, $3=, $pop6
  i32.wrap/i64 $push0=, $2
  i32.atomic.rmw8_u.cmpxchg $push1=, 0($0), $pop5, $pop0
  i32.const $push2=, 255
  i32.and $push3=, $3, $pop2
  i32.eq $push4=, $pop1, $pop3
  return $pop4
which is suboptimal. (This only happens when truncation and extension are involved.)
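To make the relationship between the two results concrete, here is a rough C++ model. It is only a sketch, not code from this CL: `rmw8_u_cmpxchg_i64` and `success_flag` are hypothetical names, and the model assumes the narrow wasm cmpxchg wraps its expected and replacement operands to 8 bits and zero-extends the loaded byte. Under that assumption the success flag is just the loaded value compared against the expected value masked to 8 bits, which is what the `i32.and` / `i32.eq` pair above computes.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of i64.atomic.rmw8_u.cmpxchg (assumption: operands are
// wrapped to 8 bits and the loaded byte is zero-extended to 64 bits).
uint64_t rmw8_u_cmpxchg_i64(std::atomic<uint8_t>& mem, uint64_t expected,
                            uint64_t replacement) {
    uint8_t exp8 = static_cast<uint8_t>(expected);
    // On failure, compare_exchange_strong writes the loaded byte into exp8.
    // On success, exp8 already equals the loaded byte.
    mem.compare_exchange_strong(exp8, static_cast<uint8_t>(replacement));
    return exp8; // implicit zero-extension to 64 bits
}

// The success flag recovered from the loaded value alone: the exchange
// succeeded exactly when the loaded byte equals the low 8 bits of expected.
bool success_flag(uint64_t loaded, uint64_t expected) {
    return loaded == (expected & 0xff);
}
```

The point of the sketch is that the wrap of the expected operand does not have to be a separate instruction in front of the cmpxchg; only the mask for the comparison is needed.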
I think we need another set of patterns to optimize this. And in case we want to use both the loaded value and the success flag, which I guess is the most common case, we need another set of patterns for that as well. I'll add those in a separate CL.
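For illustration, the kind of source code that uses both results looks roughly like the retry loop below. This is a sketch and not part of the CL: `atomic_fetch_max_u8` is a hypothetical helper, and the assumption that a front end expresses `compare_exchange_strong` as a cmpxchg whose loaded value and success flag are both used is not verified here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically stores max(current, candidate) into the
// cell and returns the value the cell held before the call.
uint8_t atomic_fetch_max_u8(std::atomic<uint8_t>& cell, uint8_t candidate) {
    uint8_t observed = cell.load();
    // The success flag drives the loop condition; the loaded value (written
    // back into `observed` on failure) feeds the next iteration.
    while (observed < candidate &&
           !cell.compare_exchange_strong(observed, candidate)) {
        // Retry with the freshly observed value.
    }
    return observed;
}
```

Here the success flag ends the loop and the loaded value is carried into the next comparison, so a pattern that only handles one of the two results would not cover it.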
Repository:
rL LLVM
https://reviews.llvm.org/D49195