[llvm-commits] [PATCH] Atomic NAND options

Cameron McInally cameron.mcinally at nyu.edu
Fri Jul 15 16:49:08 PDT 2011


Hey guys,

Attached are patches implementing the post-GCC v4.4 Atomic NAND
semantics for the ARM, PPC, and X86 targets. The MBlaze and Mips
targets also implement the Atomic NAND intrinsic, but their
implementations already appear to be current and correct; that is,
unless I made a mistake interpreting the code, which is new to me.
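
To make the semantic difference concrete, here is a rough sketch of
the two behaviors written as compare-and-swap loops over the GCC
__sync builtins. This sketch is mine, not part of the patches, and
the helper names are made up for illustration:

    typedef unsigned int u32;

    /* GCC v4.4 and later (and these patches): *p becomes ~(*p & v). */
    static u32 fetch_nand_new(volatile u32 *p, u32 v) {
      u32 old, repl;
      do {
        old = *p;
        repl = ~(old & v);              /* NOT the result of the AND */
      } while (__sync_val_compare_and_swap(p, old, repl) != old);
      return old;
    }

    /* Pre-v4.4 GCC (LLVM's current behavior): *p becomes ~*p & v. */
    static u32 fetch_nand_old(volatile u32 *p, u32 v) {
      u32 old, repl;
      do {
        old = *p;
        repl = ~old & v;                /* NOT the old value, then AND */
      } while (__sync_val_compare_and_swap(p, old, repl) != old);
      return old;
    }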

Could an ARM expert check out the ARM patch in particular? I do not
have much experience with ARM, so it is possible that this change
could be implemented in a better way. My solution was to use the MVN
instruction to NOT the result of the AND, since ARM has no single
NAND instruction; perhaps this could be done more efficiently or
more correctly.
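
For reference, the expansion the ARM patch should produce for the
32-bit case is roughly the loop below. This is a hand-written sketch:
the names mirror the virtual registers in the patch, and the register
allocator picks the real ones.

    loop:
      ldrex   dest, [ptr]              @ load-exclusive the old value
      and     scratch2, dest, incr     @ scratch2 = dest & incr
      mvn     scratch3, scratch2       @ scratch3 = ~(dest & incr)
      strex   scratch, scratch3, [ptr] @ try to store the new value
      cmp     scratch, #0              @ 0 means the store succeeded
      bne     loop

The PPC patch needs no extra inversion, since PPC's nand instruction
already computes ~(rS & rB); the X86 patch keeps the existing NOT but
applies it to the result of the AND rather than to the loaded value.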

Assuming these patches are acceptable, we'll still need to:

1) update Clang to issue a warning about the Atomic NAND builtin's
behavior change; and
2) update the tests for the Atomic NAND feature, or write them where
none exist (a starting point is sketched below).
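
For (2), a minimal sanity check of the new semantics might look like
the following. The expected values are my own arithmetic, not output
from an existing test:

    unsigned x = 0xFF;
    unsigned old = __sync_fetch_and_nand(&x, 0x0F);
    /* old == 0xFF under either semantics, but afterwards:
       GCC >= v4.4 (these patches): x == ~(0xFF & 0x0F) == 0xFFFFFFF0
       pre-v4.4 (current LLVM):     x == (~0xFF) & 0x0F == 0x00000000 */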

-Cameron

On Thu, Jul 14, 2011 at 11:41 AM, Jim Grosbach <grosbach at apple.com> wrote:

>
> On Jul 14, 2011, at 9:29 AM, Chris Lattner wrote:
>
> >
> > On Jul 14, 2011, at 8:03 AM, Cameron McInally wrote:
> >
> >> Hey Guys,
> >>
> >> I have been working on Atomic NAND. Here is a patch that allows the
> >> compiler writer to select a [NOT AND] implementation, while retaining
> >> [NEGATE and AND] as the default implementation for those that do not
> >> want the current GCC implementation (i.e. GCC v4.4 and later).
> >
> > Hi Cameron,
> >
> > Is there a reason to support the broken pre-gcc-4.4 implementation at
> all?
> >
>
> If we change (either optionally or unconditionally), we should also make
> sure to change all targets that implement the intrinsics, not just X86. It
> would be really nasty for the behaviour to be target specific.
>
> -Jim
-------------- next part --------------
upstream/llvm> svn diff lib/Target/ARM/
Index: lib/Target/ARM/ARMISelLowering.h
===================================================================
--- lib/Target/ARM/ARMISelLowering.h	(revision 135279)
+++ lib/Target/ARM/ARMISelLowering.h	(working copy)
@@ -484,7 +484,8 @@
     MachineBasicBlock *EmitAtomicBinary(MachineInstr *MI,
                                         MachineBasicBlock *BB,
                                         unsigned Size,
-                                        unsigned BinOpcode) const;
+                                        unsigned BinOpcode,
+                                        bool invRes = false) const;
     MachineBasicBlock * EmitAtomicBinaryMinMax(MachineInstr *MI,
                                                MachineBasicBlock *BB,
                                                unsigned Size,
Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp	(revision 135279)
+++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
@@ -4996,7 +4996,8 @@
 
 MachineBasicBlock *
 ARMTargetLowering::EmitAtomicBinary(MachineInstr *MI, MachineBasicBlock *BB,
-                                    unsigned Size, unsigned BinOpcode) const {
+                                    unsigned Size, unsigned BinOpcode,
+                                    bool invRes) const {
   // This also handles ATOMIC_SWAP, indicated by BinOpcode==0.
   const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
 
@@ -5049,6 +5050,7 @@
     isThumb2 ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass;
   unsigned scratch = MRI.createVirtualRegister(TRC);
   unsigned scratch2 = (!BinOpcode) ? incr : MRI.createVirtualRegister(TRC);
+  unsigned scratch3 = invRes ? MRI.createVirtualRegister(TRC) : scratch2;
 
   //  thisMBB:
   //   ...
@@ -5065,16 +5067,17 @@
   BB = loopMBB;
   AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), dest).addReg(ptr));
   if (BinOpcode) {
-    // operand order needs to go the other way for NAND
-    if (BinOpcode == ARM::BICrr || BinOpcode == ARM::t2BICrr)
-      AddDefaultPred(BuildMI(BB, dl, TII->get(BinOpcode), scratch2).
-                     addReg(incr).addReg(dest)).addReg(0);
-    else
-      AddDefaultPred(BuildMI(BB, dl, TII->get(BinOpcode), scratch2).
-                     addReg(dest).addReg(incr)).addReg(0);
+    AddDefaultPred(BuildMI(BB, dl, TII->get(BinOpcode), scratch2).
+                   addReg(dest).addReg(incr)).addReg(0);
+
+    // For NAND, invert the result of the AND.
+    if (invRes)
+      AddDefaultPred(BuildMI(BB, dl,
+                     TII->get(isThumb2 ? ARM::t2MVNr : ARM::MVNr),
+                     scratch3).addReg(scratch2));
   }
 
-  AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), scratch).addReg(scratch2)
+  AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), scratch).addReg(scratch3)
                  .addReg(ptr));
   AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri))
                  .addReg(scratch).addImm(0));
@@ -5321,11 +5324,14 @@
      return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2EORrr : ARM::EORrr);
 
   case ARM::ATOMIC_LOAD_NAND_I8:
-     return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2BICrr : ARM::BICrr);
+     return EmitAtomicBinary(MI, BB, 1,
+                             isThumb2 ? ARM::t2ANDrr : ARM::ANDrr, true);
   case ARM::ATOMIC_LOAD_NAND_I16:
-     return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2BICrr : ARM::BICrr);
+     return EmitAtomicBinary(MI, BB, 2,
+                             isThumb2 ? ARM::t2ANDrr : ARM::ANDrr, true);
   case ARM::ATOMIC_LOAD_NAND_I32:
-     return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2BICrr : ARM::BICrr);
+     return EmitAtomicBinary(MI, BB, 4,
+                             isThumb2 ? ARM::t2ANDrr : ARM::ANDrr, true);
 
   case ARM::ATOMIC_LOAD_SUB_I8:
      return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2SUBrr : ARM::SUBrr);
-------------- next part --------------
upstream/llvm> svn diff lib/Target/PowerPC/
Index: lib/Target/PowerPC/PPCISelLowering.cpp
===================================================================
--- lib/Target/PowerPC/PPCISelLowering.cpp	(revision 135279)
+++ lib/Target/PowerPC/PPCISelLowering.cpp	(working copy)
@@ -4919,13 +4919,13 @@
     BB = EmitAtomicBinary(MI, BB, true, PPC::XOR8);
 
   else if (MI->getOpcode() == PPC::ATOMIC_LOAD_NAND_I8)
-    BB = EmitPartwordAtomicBinary(MI, BB, true, PPC::ANDC);
+    BB = EmitPartwordAtomicBinary(MI, BB, true, PPC::NAND);
   else if (MI->getOpcode() == PPC::ATOMIC_LOAD_NAND_I16)
-    BB = EmitPartwordAtomicBinary(MI, BB, false, PPC::ANDC);
+    BB = EmitPartwordAtomicBinary(MI, BB, false, PPC::NAND);
   else if (MI->getOpcode() == PPC::ATOMIC_LOAD_NAND_I32)
-    BB = EmitAtomicBinary(MI, BB, false, PPC::ANDC);
+    BB = EmitAtomicBinary(MI, BB, false, PPC::NAND);
   else if (MI->getOpcode() == PPC::ATOMIC_LOAD_NAND_I64)
-    BB = EmitAtomicBinary(MI, BB, true, PPC::ANDC8);
+    BB = EmitAtomicBinary(MI, BB, true, PPC::NAND8);
 
   else if (MI->getOpcode() == PPC::ATOMIC_LOAD_SUB_I8)
     BB = EmitPartwordAtomicBinary(MI, BB, true, PPC::SUBF);
-------------- next part --------------
upstream/llvm> svn diff lib/Target/X86/
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp	(revision 135294)
+++ lib/Target/X86/X86ISelLowering.cpp	(working copy)
@@ -9772,7 +9772,7 @@
                                                        unsigned notOpc,
                                                        unsigned EAXreg,
                                                        TargetRegisterClass *RC,
-                                                       bool invSrc) const {
+                                                       bool invRes) const {
   // For the atomic bitwise operator, we generate
   //   thisMBB:
   //   newMBB:
@@ -9827,13 +9827,6 @@
   for (int i=0; i <= lastAddrIndx; ++i)
     (*MIB).addOperand(*argOpers[i]);
 
-  unsigned tt = F->getRegInfo().createVirtualRegister(RC);
-  if (invSrc) {
-    MIB = BuildMI(newMBB, dl, TII->get(notOpc), tt).addReg(t1);
-  }
-  else
-    tt = t1;
-
   unsigned t2 = F->getRegInfo().createVirtualRegister(RC);
   assert((argOpers[valArgIndx]->isReg() ||
           argOpers[valArgIndx]->isImm()) &&
@@ -9842,16 +9835,23 @@
     MIB = BuildMI(newMBB, dl, TII->get(regOpc), t2);
   else
     MIB = BuildMI(newMBB, dl, TII->get(immOpc), t2);
-  MIB.addReg(tt);
+  MIB.addReg(t1);
   (*MIB).addOperand(*argOpers[valArgIndx]);
 
+  unsigned tr = F->getRegInfo().createVirtualRegister(RC);
+  if (invRes) {
+    MIB = BuildMI(newMBB, dl, TII->get(notOpc), tr).addReg(t2);
+  } else {
+    tr = t2;
+  }
+
   MIB = BuildMI(newMBB, dl, TII->get(TargetOpcode::COPY), EAXreg);
   MIB.addReg(t1);
 
   MIB = BuildMI(newMBB, dl, TII->get(CXchgOpc));
   for (int i=0; i <= lastAddrIndx; ++i)
     (*MIB).addOperand(*argOpers[i]);
-  MIB.addReg(t2);
+  MIB.addReg(tr);
   assert(bInstr->hasOneMemOperand() && "Unexpected number of memoperand");
   (*MIB).setMemRefs(bInstr->memoperands_begin(),
                     bInstr->memoperands_end());
@@ -9874,7 +9874,7 @@
                                                        unsigned regOpcH,
                                                        unsigned immOpcL,
                                                        unsigned immOpcH,
-                                                       bool invSrc) const {
+                                                       bool invRes) const {
   // For the atomic bitwise operator, we generate
   //   thisMBB (instructions are in pairs, except cmpxchg8b)
   //     ld t1,t2 = [bitinstr.addr]
@@ -9966,15 +9966,8 @@
 
   // The subsequent operations should be using the destination registers of
   //the PHI instructions.
-  if (invSrc) {
-    t1 = F->getRegInfo().createVirtualRegister(RC);
-    t2 = F->getRegInfo().createVirtualRegister(RC);
-    MIB = BuildMI(newMBB, dl, TII->get(NotOpc), t1).addReg(dest1Oper.getReg());
-    MIB = BuildMI(newMBB, dl, TII->get(NotOpc), t2).addReg(dest2Oper.getReg());
-  } else {
-    t1 = dest1Oper.getReg();
-    t2 = dest2Oper.getReg();
-  }
+  t1 = dest1Oper.getReg();
+  t2 = dest2Oper.getReg();
 
   int valArgIndx = lastAddrIndx + 1;
   assert((argOpers[valArgIndx]->isReg() ||
@@ -10001,15 +9994,26 @@
     MIB.addReg(t2);
   (*MIB).addOperand(*argOpers[valArgIndx + 1]);
 
+  // Invert the result of the operation if requested (NAND).
+  unsigned trl = F->getRegInfo().createVirtualRegister(RC);
+  unsigned trh = F->getRegInfo().createVirtualRegister(RC);
+  if (invRes) {
+    MIB = BuildMI(newMBB, dl, TII->get(NotOpc), trl).addReg(t5);
+    MIB = BuildMI(newMBB, dl, TII->get(NotOpc), trh).addReg(t6);
+  } else {
+    trl = t5;
+    trh = t6;
+  }
+
   MIB = BuildMI(newMBB, dl, TII->get(TargetOpcode::COPY), X86::EAX);
   MIB.addReg(t1);
   MIB = BuildMI(newMBB, dl, TII->get(TargetOpcode::COPY), X86::EDX);
   MIB.addReg(t2);
 
   MIB = BuildMI(newMBB, dl, TII->get(TargetOpcode::COPY), X86::EBX);
-  MIB.addReg(t5);
+  MIB.addReg(trl);
   MIB = BuildMI(newMBB, dl, TII->get(TargetOpcode::COPY), X86::ECX);
-  MIB.addReg(t6);
+  MIB.addReg(trh);
 
   MIB = BuildMI(newMBB, dl, TII->get(X86::LCMPXCHG8B));
   for (int i=0; i <= lastAddrIndx; ++i)
@@ -11003,18 +11007,15 @@
   case X86::ATOMAND6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::AND32rr, X86::AND32rr,
-                                               X86::AND32ri, X86::AND32ri,
-                                               false);
+                                               X86::AND32ri, X86::AND32ri);
   case X86::ATOMOR6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::OR32rr, X86::OR32rr,
-                                               X86::OR32ri, X86::OR32ri,
-                                               false);
+                                               X86::OR32ri, X86::OR32ri);
   case X86::ATOMXOR6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::XOR32rr, X86::XOR32rr,
-                                               X86::XOR32ri, X86::XOR32ri,
-                                               false);
+                                               X86::XOR32ri, X86::XOR32ri);
   case X86::ATOMNAND6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::AND32rr, X86::AND32rr,
@@ -11023,18 +11024,15 @@
   case X86::ATOMADD6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::ADD32rr, X86::ADC32rr,
-                                               X86::ADD32ri, X86::ADC32ri,
-                                               false);
+                                               X86::ADD32ri, X86::ADC32ri);
   case X86::ATOMSUB6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::SUB32rr, X86::SBB32rr,
-                                               X86::SUB32ri, X86::SBB32ri,
-                                               false);
+                                               X86::SUB32ri, X86::SBB32ri);
   case X86::ATOMSWAP6432:
     return EmitAtomicBit6432WithCustomInserter(MI, BB,
                                                X86::MOV32rr, X86::MOV32rr,
-                                               X86::MOV32ri, X86::MOV32ri,
-                                               false);
+                                               X86::MOV32ri, X86::MOV32ri);
   case X86::VASTART_SAVE_XMM_REGS:
     return EmitVAStartSaveXMMRegsWithCustomInserter(MI, BB);
 
Index: lib/Target/X86/X86ISelLowering.h
===================================================================
--- lib/Target/X86/X86ISelLowering.h	(revision 135279)
+++ lib/Target/X86/X86ISelLowering.h	(working copy)
@@ -897,7 +897,7 @@
                                                     unsigned notOpc,
                                                     unsigned EAXreg,
                                                     TargetRegisterClass *RC,
-                                                    bool invSrc = false) const;
+                                                    bool invRes = false) const;
 
     MachineBasicBlock *EmitAtomicBit6432WithCustomInserter(
                                                     MachineInstr *BInstr,
@@ -906,7 +906,7 @@
                                                     unsigned regOpcH,
                                                     unsigned immOpcL,
                                                     unsigned immOpcH,
-                                                    bool invSrc = false) const;
+                                                    bool invRes = false) const;
 
     /// Utility function to emit atomic min and max.  It takes the min/max
     /// instruction to expand, the associated basic block, and the associated


