[cfe-commits] [PATCH] atomic operation builtins, part 1
Eli Friedman
eli.friedman at gmail.com
Fri Oct 7 17:03:07 PDT 2011
On Fri, Oct 7, 2011 at 3:53 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:
> On Thu, Oct 6, 2011 at 6:36 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> Atomic ops; similar to the patch I submitted earlier, but this version
>> adds the restriction that the first operand to the __atomic_*
>> operations must be an _Atomic(T)*. I'm sending this to double-check
>> that I haven't missed any serious issues.
>>
>> After this is committed, I'll put together a patch for
>> __atomic_is_lock_free, including the necessary machinery to come up
>> with the right answers.
>>
>
> Comments/questions inline:
>
>> Index: include/clang/Basic/Builtins.def
>> ===================================================================
>> --- include/clang/Basic/Builtins.def (revision 141335)
>> +++ include/clang/Basic/Builtins.def (working copy)
>> @@ -585,8 +585,19 @@
>> BUILTIN(__sync_swap_8, "LLiLLiD*LLi.", "n")
>> BUILTIN(__sync_swap_16, "LLLiLLLiD*LLLi.", "n")
>>
>> +BUILTIN(__atomic_load, "v.", "t")
>> +BUILTIN(__atomic_store, "v.", "t")
>> +BUILTIN(__atomic_exchange, "v.", "t")
>> +BUILTIN(__atomic_compare_exchange_strong, "v.", "t")
>> +BUILTIN(__atomic_compare_exchange_weak, "v.", "t")
>> +BUILTIN(__atomic_fetch_add, "v.", "t")
>> +BUILTIN(__atomic_fetch_sub, "v.", "t")
>> +BUILTIN(__atomic_fetch_and, "v.", "t")
>> +BUILTIN(__atomic_fetch_or, "v.", "t")
>> +BUILTIN(__atomic_fetch_xor, "v.", "t")
>> +BUILTIN(__atomic_thread_fence, "vi", "t")
>> +BUILTIN(__atomic_signal_fence, "vi", "t")
>
> I believe gcc is using __sync_mem_* instead of __atomic_* for their
> builtins, although they're using __atomic_* for the library that
> handles maybe-not-lock-free calls:
> http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary. I've cc'ed Andrew
> MacLeod who's actually deciding this, in the hope that both compilers
> can use the same intrinsics.
Okay, I can wait to hear back.
> There's also some chance we'll eventually want to provide builtins to
> access the singlethread synchronization scope (or other
> synchronization scopes we haven't thought of a need for yet).
> (http://llvm.org/docs/LangRef.html#ordering for Andrew.) Do you want
> to include that parameter in these builtins, or wait, and maybe need
> to add another set?
The C++ standard doesn't actually expose anything corresponding to
those instructions, for reasons I don't entirely understand.
>> -
>> // Non-overloaded atomic builtins.
>> BUILTIN(__sync_synchronize, "v.", "n")
>> // GCC does not support these, they are a Clang extension.
>> Index: include/clang/Basic/DiagnosticSemaKinds.td
>> ===================================================================
>> --- include/clang/Basic/DiagnosticSemaKinds.td (revision 141335)
>> +++ include/clang/Basic/DiagnosticSemaKinds.td (working copy)
>> @@ -4000,6 +4000,15 @@
>> def err_atomic_builtin_pointer_size : Error<
>> "first argument to atomic builtin must be a pointer to 1,2,4,8 or 16 byte "
>> "type (%0 invalid)">;
>> +def err_atomic_builtin_needs_atomic : Error<
>> + "first argument to atomic builtin must be a pointer to atomic "
>> + " type (%0 invalid)">;
>
> Maybe s/atomic/_Atomic/. Otherwise C++ programmers who see this are
> likely to try passing std::atomic<T> instances.
K.
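For reference, a rough sketch of what the reworded diagnostic is steering
people toward (names here are purely illustrative):

  _Atomic(int) ai;
  int plain;
  __atomic_fetch_add(&ai, 1, 5);     // OK: first argument is an _Atomic(int)*
  __atomic_fetch_add(&plain, 1, 5);  // error: pointer to _Atomic type required
  // A std::atomic<int>* isn't an _Atomic(T)* either, hence saying _Atomic
  // explicitly in the wording.

(5 being memory_order_seq_cst in the numbering the codegen below assumes.)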
>> +def err_atomic_builtin_needs_atomic_int_or_ptr : Error<
>> + "first argument to atomic builtin must be a pointer to atomic "
>> + " integer or pointer (%0 invalid)">;
>> +def err_atomic_builtin_logical_needs_atomic_int : Error<
>> + "first argument to logical atomic builtin must be a pointer to atomic "
>> + " integer (%0 invalid)">;
>>
>> def err_deleted_function_use : Error<"attempt to use a deleted function">;
>>
>> Index: include/clang/AST/Expr.h
>> ===================================================================
>> --- include/clang/AST/Expr.h (revision 141335)
>> +++ include/clang/AST/Expr.h (working copy)
>> @@ -4157,6 +4157,128 @@
>> // Iterators
>> child_range children() { return child_range(&SrcExpr, &SrcExpr+1); }
>> };
>> +
>> +/// AtomicExpr - Variadic atomic builtins: __atomic_exchange, __atomic_fetch_*,
>> +/// __atomic_load, __atomic_store, and __atomic_compare_exchange_*, for the
>> +/// similarly-named C++0x instructions. All of these instructions take one
>> +/// primary pointer and at least one memory order.
>> +class AtomicExpr : public Expr {
>> +public:
>> + enum AtomicOp { Load, Store, CmpXchgStrong, CmpXchgWeak, Xchg,
>> + Add, Sub, And, Or, Xor };
>> +private:
>> + enum { PTR, ORDER, VAL1, ORDER_FAIL, VAL2, END_EXPR };
>> + Stmt* SubExprs[END_EXPR];
>> + unsigned NumSubExprs;
>> + SourceLocation BuiltinLoc, RParenLoc;
>> + AtomicOp Op;
>> +
>> +public:
>> + // Constructor for Load
>> + AtomicExpr(SourceLocation BLoc, Expr *ptr, Expr *order, QualType t,
>> + AtomicOp op, SourceLocation RP,
>> + bool TypeDependent, bool ValueDependent)
>> + : Expr(AtomicExprClass, t, VK_RValue, OK_Ordinary,
>> + TypeDependent, ValueDependent,
>> + ptr->isInstantiationDependent(),
>> + ptr->containsUnexpandedParameterPack()),
>> + BuiltinLoc(BLoc), RParenLoc(RP), Op(op) {
>> + SubExprs[PTR] = ptr;
>> + SubExprs[ORDER] = order;
>> + NumSubExprs = 2;
>> + }
>
> Hmm, normally I'd ask for a CreateLoad() static method that forwarded
> to an appropriate private constructor, since it may be hard to
> remember which constructor goes with which operation, but it looks
> like there's exactly one call site for each of these constructors, so
> it's probably not worth it. Maybe add an assert on 'op' instead?
An assert seems reasonable.
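E.g. for the Load constructor, something like:

  assert(op == AtomicExpr::Load && "this constructor is only for __atomic_load");

and the analogous checks in the other two constructors.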
>> +
>> + // Constructor for Store, Xchg, Add, Sub, And, Or, Xor
>> + AtomicExpr(SourceLocation BLoc, Expr *ptr, Expr *val, Expr *order,
>> + QualType t, AtomicOp op, SourceLocation RP,
>> + bool TypeDependent, bool ValueDependent)
>> + : Expr(AtomicExprClass, t, VK_RValue, OK_Ordinary,
>> + TypeDependent, ValueDependent,
>> + (ptr->isInstantiationDependent() ||
>> + val->isInstantiationDependent()),
>> + (ptr->containsUnexpandedParameterPack() ||
>> + val->containsUnexpandedParameterPack())),
>> + BuiltinLoc(BLoc), RParenLoc(RP), Op(op) {
>> + SubExprs[PTR] = ptr;
>> + SubExprs[ORDER] = order;
>> + SubExprs[VAL1] = val;
>> + NumSubExprs = 3;
>> + }
>> +
>> + // Constructor for CmpXchgStrong, CmpXchgWeak
>> + AtomicExpr(SourceLocation BLoc, Expr *ptr, Expr *val1, Expr *val2,
>> + Expr *order, Expr *order_fail, QualType t, AtomicOp op,
>> + SourceLocation RP, bool TypeDependent, bool ValueDependent)
>> + : Expr(AtomicExprClass, t, VK_RValue, OK_Ordinary,
>> + TypeDependent, ValueDependent,
>> + (ptr->isInstantiationDependent() ||
>> + val1->isInstantiationDependent() ||
>> + val2->isInstantiationDependent()),
>> + (ptr->containsUnexpandedParameterPack() ||
>> + val1->containsUnexpandedParameterPack() ||
>> + val2->containsUnexpandedParameterPack())),
>> + BuiltinLoc(BLoc), RParenLoc(RP), Op(op) {
>> + SubExprs[PTR] = ptr;
>> + SubExprs[VAL1] = val1;
>> + SubExprs[ORDER] = order;
>> + SubExprs[VAL2] = val2;
>> + SubExprs[ORDER_FAIL] = order_fail;
>> + NumSubExprs = 5;
>> + }
>> +
>> + /// \brief Build an empty __builtin_choose_expr.
>
> Is this a mis-copy? If not, I don't understand the comment.
mis-copy.
>> + explicit AtomicExpr(EmptyShell Empty) : Expr(AtomicExprClass, Empty) { }
>> +
>> + Expr *getPtr() const { return cast<Expr>(SubExprs[PTR]); }
>> + void setPtr(Expr *E) { SubExprs[PTR] = E; }
>> + Expr *getOrder() const { return cast<Expr>(SubExprs[ORDER]); }
>> + void setOrder(Expr *E) { SubExprs[ORDER] = E; }
>> + Expr *getVal1() const { return cast<Expr>(SubExprs[VAL1]); }
>> + void setVal1(Expr *E) { SubExprs[VAL1] = E; }
>> + Expr *getOrderFail() const { return cast<Expr>(SubExprs[ORDER_FAIL]); }
>> + void setOrderFail(Expr *E) { SubExprs[ORDER_FAIL] = E; }
>> + Expr *getVal2() const { return cast<Expr>(SubExprs[VAL2]); }
>> + void setVal2(Expr *E) { SubExprs[VAL2] = E; }
>> +
>> + AtomicOp getOp() const { return Op; }
>> + void setOp(AtomicOp op) { Op = op; }
>> + unsigned getNumSubExprs() { return NumSubExprs; }
>> + void setNumSubExprs(unsigned num) { NumSubExprs = num; }
>> +
>> + int getOrderVal(ASTContext &Ctx) const {
>> + return getOrder()->EvaluateAsInt(Ctx).getZExtValue();
>> + }
>> + int getOrderFailVal(ASTContext &Ctx) const {
>
> Want to assert that this is a cmpxchg call?
Sure.
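Roughly:

  int getOrderFailVal(ASTContext &Ctx) const {
    assert(isCmpXChg() && "failure order only exists for compare-exchange");
    return getOrderFail()->EvaluateAsInt(Ctx).getZExtValue();
  }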
>> + return getOrderFail()->EvaluateAsInt(Ctx).getZExtValue();
>> + }
>> + bool isVolatile() const {
>> + return getPtr()->getType()->getPointeeType().isVolatileQualified();
>> + }
>> +
>> + bool isCmpXChg() const {
>> + return getOp() == AtomicExpr::CmpXchgStrong ||
>> + getOp() == AtomicExpr::CmpXchgWeak;
>> + }
>> +
>> + SourceLocation getBuiltinLoc() const { return BuiltinLoc; }
>> + void setBuiltinLoc(SourceLocation L) { BuiltinLoc = L; }
>> +
>> + SourceLocation getRParenLoc() const { return RParenLoc; }
>> + void setRParenLoc(SourceLocation L) { RParenLoc = L; }
>> +
>> + SourceRange getSourceRange() const {
>> + return SourceRange(BuiltinLoc, RParenLoc);
>> + }
>> + static bool classof(const Stmt *T) {
>> + return T->getStmtClass() == AtomicExprClass;
>> + }
>> + static bool classof(const AtomicExpr *) { return true; }
>> +
>> + // Iterators
>> + child_range children() {
>> + return child_range(SubExprs, SubExprs+NumSubExprs);
>> + }
>> +};
>> } // end namespace clang
>>
>> #endif
>> Index: lib/Sema/SemaChecking.cpp
>> ===================================================================
>> --- lib/Sema/SemaChecking.cpp (revision 141335)
>> +++ lib/Sema/SemaChecking.cpp (working copy)
>> @@ -414,6 +437,141 @@
>> return false;
>> }
>>
>> +ExprResult
>> +Sema::SemaAtomicOpsOverloaded(ExprResult TheCallResult, AtomicExpr::AtomicOp Op) {
>> + CallExpr *TheCall = (CallExpr *)TheCallResult.get();
>
> Shouldn't you use cast<CallExpr>?
Copy-paste code; I don't recall if ExprResult.get() actually returns
an Expr, but I'll change it if I can.
>> + DeclRefExpr *DRE =cast<DeclRefExpr>(TheCall->getCallee()->IgnoreParenCasts());
>> + Expr *Ptr, *Order, *Val1, *Val2, *OrderFail;
>> +
>> + unsigned NumVals = 1;
>> + unsigned NumOrders = 1;
>
> These probably deserve a comment about the layout of the builtin call. Like:
>
> // __atomic_call(_Atomic(T)* [, U]*NumVals [, int]*NumOrders);
> // where the 'U's are determined by the call and T, and the ints
> // represent memory_order parameters.
K.
>> + if (Op == AtomicExpr::Load) {
>> + NumVals = 0;
>> + } else if (Op == AtomicExpr::CmpXchgWeak || Op == AtomicExpr::CmpXchgStrong) {
>> + NumVals = 2;
>> + NumOrders = 2;
>> + }
>> +
>> + if (TheCall->getNumArgs() < NumVals+NumOrders+1) {
>> + Diag(TheCall->getLocEnd(), diag::err_typecheck_call_too_few_args)
>> + << 0 << NumVals+NumOrders+1 << TheCall->getNumArgs()
>> + << TheCall->getCallee()->getSourceRange();
>> + return ExprError();
>> + } else if (TheCall->getNumArgs() > NumVals+NumOrders+1) {
>> + Diag(TheCall->getArg(2)->getLocStart(),
>
> Why "getArg(2)"? Should this be getArg(NumVals+NumOrders+1)?
I guess? I'll take another look.
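Presumably just:

  Diag(TheCall->getArg(NumVals+NumOrders+1)->getLocStart(),
       diag::err_typecheck_call_too_many_args)
    << 0 << NumVals+NumOrders+1 << TheCall->getNumArgs()
    << TheCall->getCallee()->getSourceRange();

i.e. point the diagnostic at the first surplus argument instead of
hard-coding index 2.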
>> + diag::err_typecheck_call_too_many_args)
>> + << 0 << NumVals+NumOrders+1 << TheCall->getNumArgs()
>> + << TheCall->getCallee()->getSourceRange();
>> + return ExprError();
>> + }
>> +
>> + // Inspect the first argument of the atomic builtin. This should always be
>> + // a pointer type, whose element is an integral scalar or pointer type.
>
> The pointer's element should be an atomic type whose element may be
> constrained by the operation, right?
Outdated comment, yes; will fix.
>> + Ptr = TheCall->getArg(0);
>> + Ptr = DefaultFunctionArrayLvalueConversion(Ptr).get();
>> + const PointerType *pointerType = Ptr->getType()->getAs<PointerType>();
>> + if (!pointerType) {
>> + Diag(DRE->getLocStart(), diag::err_atomic_builtin_must_be_pointer)
>> + << Ptr->getType() << Ptr->getSourceRange();
>> + return ExprError();
>> + }
>> +
>> + QualType AtomTy = pointerType->getPointeeType();
>> + if (!AtomTy->isAtomicType()) {
>> + Diag(DRE->getLocStart(), diag::err_atomic_builtin_needs_atomic)
>> + << Ptr->getType() << Ptr->getSourceRange();
>> + return ExprError();
>> + }
>> + QualType ValType = cast<AtomicType>(AtomTy)->getValueType();
>> +
>> + if ((Op == AtomicExpr::Add || Op == AtomicExpr::Sub) &&
>> + !ValType->isIntegerType() && !ValType->isAnyPointerType() &&
>> + !ValType->isBlockPointerType()) {
>
> Your indentation's weird here.
K.
>> + // The first argument --- the pointer --- has a fixed type; we
>> + // deduce the types of the rest of the arguments accordingly. Walk
>> + // the remaining arguments, converting them to the deduced value type.
>> + for (unsigned i = 0; i != NumVals+NumOrders; ++i) {
>> + ExprResult Arg = TheCall->getArg(i+1);
>> + QualType Ty;
>> + if (i < NumVals) {
>> + if (i == 0 && (Op == AtomicExpr::CmpXchgWeak ||
>
> It's a little confusing for 'i' to not represent the i'th parameter.
K, I can change that.
>> + Op == AtomicExpr::CmpXchgStrong))
>> + Ty = Context.getPointerType(ValType.getUnqualifiedType());
>> + else if (!ValType->isIntegerType() &&
>> + (Op == AtomicExpr::Add || Op == AtomicExpr::Sub))
>> + Ty = Context.getPointerDiffType();
>> + else
>> + Ty = ValType;
>> + } else {
>
> Could you add a comment like "// The remaining parameters are
> memory_orders, which are represented by ints."?
>
> Something should check the constraints on memory orders, like that
> cmpxchg's failure order has to be weaker than its success order, or
> that you can't have an acquire store. Is that this code, the <atomic>
> header, or codegen? You might also comment here that it's codegen
> that maps from consume to acquire.
We can't check it in general: the order arguments can be variables. We can't
even check it in the common case, because <atomic> is forced to wrap these
builtins in functions that take the order as a runtime parameter. I could add
code to try to catch some cases, but I'm not sure it's worth bothering.
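To make the constraint concrete, this is the kind of caller Sema has to
accept (purely illustrative):

  // <atomic> ends up forwarding a runtime memory_order, so the order
  // operand isn't a constant at the builtin call site.
  void store_wrapper(_Atomic(int) *p, int v, int order) {
    __atomic_store(p, v, order);
  }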
>> + Ty = Context.IntTy;
>> + }
>> + InitializedEntity Entity =
>> + InitializedEntity::InitializeParameter(Context, Ty, false);
>> + Arg = PerformCopyInitialization(Entity, SourceLocation(), Arg);
>> + if (Arg.isInvalid())
>> + return true;
>> + TheCall->setArg(i+1, Arg.get());
>> + }
>> +
>> + if (Op == AtomicExpr::Load) {
>> + Order = TheCall->getArg(1);
>> + return Owned(new (Context) AtomicExpr(TheCall->getCallee()->getLocStart(),
>> + Ptr, Order, ResultType, Op,
>> + TheCall->getRParenLoc(), false,
>> + false));
>
> If AtomicExprs are never type- or value-dependent, why have those two
> parameters?
Good point; I'll change that.
>> +
>> + llvm::AtomicRMWInst::BinOp Op = llvm::AtomicRMWInst::Add;
>
> This is an odd default.
I think I need to set it to something to suppress uninitialized warnings.
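One alternative (just a sketch, assuming llvm_unreachable is acceptable
here) is to let the compiler see that the already-handled cases never fall
through, which makes the arbitrary default unnecessary:

  case AtomicExpr::Store:
  case AtomicExpr::Load:
    llvm_unreachable("Already handled!");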
>> + switch (E->getOp()) {
>> + case AtomicExpr::CmpXchgWeak:
>> + case AtomicExpr::CmpXchgStrong:
>> + case AtomicExpr::Store:
>> + case AtomicExpr::Load: assert(0 && "Already handled!");
>> + case AtomicExpr::Add: Op = llvm::AtomicRMWInst::Add; break;
>> + case AtomicExpr::Sub: Op = llvm::AtomicRMWInst::Sub; break;
>> + case AtomicExpr::And: Op = llvm::AtomicRMWInst::And; break;
>> + case AtomicExpr::Or: Op = llvm::AtomicRMWInst::Or; break;
>> + case AtomicExpr::Xor: Op = llvm::AtomicRMWInst::Xor; break;
>> + case AtomicExpr::Xchg: Op = llvm::AtomicRMWInst::Xchg; break;
>> + }
>> + llvm::LoadInst *LoadVal1 = CGF.Builder.CreateLoad(Val1);
>> + LoadVal1->setAlignment(Align);
>> + llvm::AtomicRMWInst *RMWI =
>> + CGF.Builder.CreateAtomicRMW(Op, Ptr, LoadVal1, Order);
>> + RMWI->setVolatile(E->isVolatile());
>> + llvm::StoreInst *StoreDest = CGF.Builder.CreateStore(RMWI, Dest);
>> + StoreDest->setAlignment(Align);
>> +}
>> +
>> +static llvm::Value *
>> +EmitValToTemp(CodeGenFunction &CGF, Expr *E) {
>
> Please comment how this is different from EmitScalarExpr.
K.
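Roughly this sort of thing, say:

  // Unlike EmitScalarExpr, this forces the operand into a fresh memory
  // temporary, so every operand (scalar, complex, or aggregate) ends up
  // with an address that can be handed to the atomic operation.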
>> + llvm::Value *DeclPtr = CGF.CreateMemTemp(E->getType(), ".atomictmp");
>> + CGF.EmitAnyExprToMem(E, DeclPtr, E->getType().getQualifiers(),
>> + /*Init*/ true);
>> + return DeclPtr;
>> +}
>> +
>> +static RValue ConvertTempToRValue(CodeGenFunction &CGF, QualType Ty,
>> + llvm::Value *Dest) {
>> + if (Ty->isAnyComplexType())
>> + return RValue::getComplex(CGF.LoadComplexFromAddr(Dest, false));
>> + if (CGF.hasAggregateLLVMType(Ty))
>> + return RValue::getAggregate(Dest);
>> + return RValue::get(CGF.EmitLoadOfScalar(CGF.MakeAddrLValue(Dest, Ty)));
>> +}
>> +
>> +RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
>> + QualType PointeeTy = E->getPtr()->getType()->getPointeeType();
>> + CharUnits sizeChars = getContext().getTypeSizeInChars(PointeeTy);
>> + uint64_t Size = sizeChars.getQuantity();
>> + CharUnits alignChars = getContext().getTypeSizeInChars(PointeeTy);
>
> Why call this twice?
The second one is actually supposed to be the alignment... I'll fix that.
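Presumably just:

  CharUnits alignChars = getContext().getTypeAlignInChars(PointeeTy);

for the second query.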
>> + unsigned Align = alignChars.getQuantity();
>> + // FIXME: Bound on Size should not be hardcoded.
>> + bool UseLibcall = (!llvm::isPowerOf2_64(Size) || Size > 8);
>> +
>> + llvm::Value *Ptr, *Order, *OrderFail = 0, *Val1 = 0, *Val2 = 0;
>> + Ptr = EmitScalarExpr(E->getPtr());
>> + Order = EmitScalarExpr(E->getOrder());
>> + if (E->isCmpXChg()) {
>> + Val1 = EmitScalarExpr(E->getVal1());
>> + Val2 = EmitValToTemp(*this, E->getVal2());
>> + OrderFail = EmitScalarExpr(E->getOrderFail());
>> + (void)OrderFail; // OrderFail is unused at the moment
>> + } else if (E->getOp() != AtomicExpr::Load) {
>> + Val1 = EmitValToTemp(*this, E->getVal1());
>> + }
>> +
>> + llvm::Type *PointeeLLVMTy =
>> + cast<llvm::PointerType>(Ptr->getType())->getElementType();
>> +
>> + llvm::Type *InstrReturnTy = PointeeLLVMTy;
>
> s/InstrReturnTy/CallReturnTy/ or something similar, since this is not
> the return type of the cmpxchg instructions.
Sure.
>> + if (E->isCmpXChg())
>> + InstrReturnTy = Builder.getInt1Ty();
>> + else if (E->getOp() == AtomicExpr::Store)
>> + InstrReturnTy = 0;
>> +
>> + if (InstrReturnTy && !Dest)
>> + Dest = CreateMemTemp(E->getType(), ".atomicdst");
>> +
>> + if (UseLibcall) {
>> + // FIXME: Finalize what the libcalls are actually supposed to look like.
>
> Possibly link to http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary here.
K.
>> + if (isa<llvm::ConstantInt>(Order)) {
>> + int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
>> + switch (ord) {
>> + case 0: // memory_order_relaxed
>> + EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
>> + llvm::Monotonic);
>> + break;
>> + case 1: // memory_order_consume
>> + case 2: // memory_order_acquire
>> + EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
>> + llvm::Acquire);
>> + break;
>> + case 3: // memory_order_release
>> + EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
>> + llvm::Release);
>> + break;
>> + case 4: // memory_order_acq_rel
>> + EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
>> + llvm::AcquireRelease);
>> + break;
>> + case 5: // memory_order_seq_cst
>> + EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
>> + llvm::SequentiallyConsistent);
>> + break;
>> + default: // invalid order
>
> This should probably have triggered an error earlier in the frontend.
> If that's impossible, comment why?
We don't actually check the value of the order argument in Sema
because it can be a variable... so an assert here would be fragile. I
can add a comment.
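Something along these lines, perhaps:

  default: // invalid order
    // The order can be a non-constant expression, and Sema doesn't reject
    // out-of-range constants either (since <atomic> forwards the order as
    // a function parameter), so don't assert here.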
>> + // Create all the relevant BB's
>> + llvm::BasicBlock *MonotonicBB, *AcquireBB, *ReleaseBB, *AcqRelBB, *SeqCstBB;
>> + MonotonicBB = createBasicBlock("monotonic", CurFn);
>> + if (E->getOp() != AtomicExpr::Store)
>> + AcquireBB = createBasicBlock("acquire", CurFn);
>> + if (E->getOp() != AtomicExpr::Load)
>> + ReleaseBB = createBasicBlock("release", CurFn);
>> + if (E->getOp() != AtomicExpr::Load && E->getOp() != AtomicExpr::Store)
>> + AcqRelBB = createBasicBlock("acqrel", CurFn);
>> + SeqCstBB = createBasicBlock("seqcst", CurFn);
>> + llvm::BasicBlock *ContBB = createBasicBlock("atomic.continue", CurFn);
>> +
>> + // Create the switch and PHI for the split
>
> I don't see a PHI.
Leftover comment; I'll zap it.
>> + Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
>> + llvm::SwitchInst *SI = Builder.CreateSwitch(Order, MonotonicBB);
>
> Is it better to make 0 be the default, or is this just the simplest
> thing that works, and it's never likely to make a difference because
> nearly all uses will inline to a constant anyway? (Comment so the
> next person to come along doesn't have to ask that question.)
It shouldn't make a difference normally because it will constant-fold.
I'll add a comment.
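E.g. something like:

  // The choice of default edge (monotonic) is arbitrary: the order is
  // nearly always a constant, in which case this switch constant-folds
  // away and the default never fires.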
-Eli