[llvm] r229462 - [BDCE] Add a bit-tracking DCE pass

Tue Feb 17 14:40:17 PST 2015

Hi Hal,

This change breaks a couple of sanitizer unit tests, probably because of a
bug in codegen:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/637
You can probably reproduce it by running "make check-sanitizer" on
Linux/x86-64 machine. Let me know if you need a better/shorter reproducer.

On Mon, Feb 16, 2015 at 5:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> Author: hfinkel
> Date: Mon Feb 16 19:36:59 2015
> New Revision: 229462
>
> URL: http://llvm.org/viewvc/llvm-project?rev=229462&view=rev
> Log:
> [BDCE] Add a bit-tracking DCE pass
>
> BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
> "aggressive DCE" pass), with the added capability to track dead bits of
> integer
> valued instructions and remove those instructions when all of the bits are
> dead.
>
> Currently, it does not actually do this all-bits-dead removal, but rather
> replaces the instruction's uses with a constant zero, and lets instcombine
> (and
> the later run of ADCE) do the rest. Because we essentially get a run of
> ADCE
> "for free" while tracking the dead bits, we also do what ADCE does and
> removes
> actually-dead instructions as well (this includes instructions newly
> trivially
> dead because all bits were dead, but not all such instructions can be
> removed).
>
> The motivation for this is a case like:
>
> int __attribute__((const)) foo(int i);
> int bar(int x) {
>   x |= (4 & foo(5));
>   x |= (8 & foo(3));
>   x |= (16 & foo(2));
>   x |= (32 & foo(1));
>   x |= (64 & foo(0));
>   x |= (128& foo(4));
>   return x >> 4;
> }
>
> As it turns out, if you order the bit-field insertions so that all of the
> dead
> ones come last, then instcombine will remove them. However, if you pick
> some
> other order (such as the one above), the fact that some of the calls to
> foo()
> are useless is not locally obvious, and we don't remove them (without this
> pass).
>
> I did a quick compile-time overhead check using sqlite from the test suite
> (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
> twice as expensive as ADCE).
>
> I've not looked at why yet, but we eliminate instructions due to having
> all-dead bits in:
> External/SPEC/CFP2006/447.dealII/447.dealII
> External/SPEC/CINT2006/400.perlbench/400.perlbench
> External/SPEC/CINT2006/403.gcc/403.gcc
> MultiSource/Applications/ClamAV/clamscan
> MultiSource/Benchmarks/7zip/7zip-benchmark
>
> Added:
>     llvm/trunk/lib/Transforms/Scalar/BDCE.cpp
>     llvm/trunk/test/Transforms/BDCE/
>     llvm/trunk/test/Transforms/BDCE/basic.ll
>     llvm/trunk/test/Transforms/BDCE/dce-pure.ll
> Modified:
>     llvm/trunk/include/llvm-c/Transforms/Scalar.h
>     llvm/trunk/include/llvm/InitializePasses.h
>     llvm/trunk/include/llvm/LinkAllPasses.h
>     llvm/trunk/include/llvm/Transforms/Scalar.h
>     llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp
>     llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt
>     llvm/trunk/lib/Transforms/Scalar/Scalar.cpp
>
> Modified: llvm/trunk/include/llvm-c/Transforms/Scalar.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/Transforms/Scalar.h?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm-c/Transforms/Scalar.h (original)
> +++ llvm/trunk/include/llvm-c/Transforms/Scalar.h Mon Feb 16 19:36:59 2015
> @@ -35,6 +35,9 @@ extern "C" {
>  /** See llvm::createAggressiveDCEPass function. */
>  void LLVMAddAggressiveDCEPass(LLVMPassManagerRef PM);
>
> +/** See llvm::createBitTrackingDCEPass function. */
> +void LLVMAddBitTrackingDCEPass(LLVMPassManagerRef PM);
> +
>  /** See llvm::createAlignmentFromAssumptionsPass function. */
>  void LLVMAddAlignmentFromAssumptionsPass(LLVMPassManagerRef PM);
>
>
> Modified: llvm/trunk/include/llvm/InitializePasses.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm/InitializePasses.h (original)
> +++ llvm/trunk/include/llvm/InitializePasses.h Mon Feb 16 19:36:59 2015
> @@ -65,6 +65,7 @@ void initializeTarget(PassRegistry&);
>  void initializeAAEvalPass(PassRegistry&);
>  void initializeAddDiscriminatorsPass(PassRegistry&);
>  void initializeADCEPass(PassRegistry&);
> +void initializeBDCEPass(PassRegistry&);
>  void initializeAliasAnalysisAnalysisGroup(PassRegistry&);
>  void initializeAliasAnalysisCounterPass(PassRegistry&);
>  void initializeAliasDebuggerPass(PassRegistry&);
>
> Modified: llvm/trunk/include/llvm/LinkAllPasses.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/LinkAllPasses.h?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm/LinkAllPasses.h (original)
> +++ llvm/trunk/include/llvm/LinkAllPasses.h Mon Feb 16 19:36:59 2015
> @@ -49,6 +49,7 @@ namespace {
>
>        (void) llvm::createAAEvalPass();
>        (void) llvm::createAggressiveDCEPass();
> +      (void) llvm::createBitTrackingDCEPass();
>        (void) llvm::createAliasAnalysisCounterPass();
>        (void) llvm::createAliasDebugger();
>        (void) llvm::createArgumentPromotionPass();
>
> Modified: llvm/trunk/include/llvm/Transforms/Scalar.h
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar.h?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/include/llvm/Transforms/Scalar.h (original)
> +++ llvm/trunk/include/llvm/Transforms/Scalar.h Mon Feb 16 19:36:59 2015
> @@ -82,6 +82,13 @@ FunctionPass *createAggressiveDCEPass();
>
>
>  //===----------------------------------------------------------------------===//
>  //
> +// BitTrackingDCE - This pass uses a bit-tracking DCE algorithm in order
> to
> +// remove computations of dead bits.
> +//
> +FunctionPass *createBitTrackingDCEPass();
> +
>
> +//===----------------------------------------------------------------------===//
> +//
>  // SROA - Replace aggregates or pieces of aggregates with scalar SSA
> values.
>  //
>  FunctionPass *createSROAPass(bool RequiresDomTree = true);
>
> Modified: llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp (original)
> +++ llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp Mon Feb 16
> 19:36:59 2015
> @@ -252,6 +252,11 @@ void PassManagerBuilder::populateModuleP
>    MPM.add(createMemCpyOptPass());             // Remove memcpy / form
> memset
>    MPM.add(createSCCPPass());                  // Constant prop with SCCP
>
> +  // Delete dead bit computations (instcombine runs after to fold away
> the dead
> +  // computations, and then ADCE will run later to exploit any new DCE
> +  // opportunities that creates).
> +  MPM.add(createBitTrackingDCEPass());        // Delete dead bit
> computations
> +
>    // Run instcombine after redundancy elimination to exploit opportunities
>    // opened up by them.
>    MPM.add(createInstructionCombiningPass());
>
> Added: llvm/trunk/lib/Transforms/Scalar/BDCE.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/BDCE.cpp?rev=229462&view=auto
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/BDCE.cpp (added)
> +++ llvm/trunk/lib/Transforms/Scalar/BDCE.cpp Mon Feb 16 19:36:59 2015
> @@ -0,0 +1,408 @@
> +//===---- BDCE.cpp - Bit-tracking dead code elimination
> -------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
>
> +//===----------------------------------------------------------------------===//
> +//
> +// This file implements the Bit-Tracking Dead Code Elimination pass. Some
> +// instructions (shifts, some ands, ors, etc.) kill some of their input
> bits.
> +// We track these dead bits and remove instructions that compute only
> these
> +// dead bits.
> +//
>
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/Transforms/Scalar.h"
> +#include "llvm/ADT/DenseMap.h"
> +#include "llvm/ADT/DepthFirstIterator.h"
> +#include "llvm/ADT/SmallPtrSet.h"
> +#include "llvm/ADT/SmallVector.h"
> +#include "llvm/ADT/Statistic.h"
> +#include "llvm/Analysis/AssumptionCache.h"
> +#include "llvm/Analysis/ValueTracking.h"
> +#include "llvm/IR/BasicBlock.h"
> +#include "llvm/IR/CFG.h"
> +#include "llvm/IR/DataLayout.h"
> +#include "llvm/IR/Dominators.h"
> +#include "llvm/IR/InstIterator.h"
> +#include "llvm/IR/Instructions.h"
> +#include "llvm/IR/IntrinsicInst.h"
> +#include "llvm/IR/Module.h"
> +#include "llvm/IR/Operator.h"
> +#include "llvm/Pass.h"
> +#include "llvm/Support/Debug.h"
> +#include "llvm/Support/raw_ostream.h"
> +
> +using namespace llvm;
> +
> +#define DEBUG_TYPE "bdce"
> +
> +STATISTIC(NumRemoved, "Number of instructions removed (unused)");
> +STATISTIC(NumSimplified, "Number of instructions trivialized (dead
> bits)");
> +
> +namespace {
> +struct BDCE : public FunctionPass {
> +  static char ID; // Pass identification, replacement for typeid
> +  BDCE() : FunctionPass(ID) {
> +    initializeBDCEPass(*PassRegistry::getPassRegistry());
> +  }
> +
> +  bool runOnFunction(Function& F) override;
> +
> +  void getAnalysisUsage(AnalysisUsage& AU) const override {
> +    AU.setPreservesCFG();
> +    AU.addRequired<AssumptionCacheTracker>();
> +    AU.addRequired<DominatorTreeWrapperPass>();
> +  }
> +
> +  void determineLiveOperandBits(const Instruction *UserI,
> +                                const Instruction *I, unsigned OperandNo,
> +                                const APInt &AOut, APInt &AB,
> +                                APInt &KnownZero, APInt &KnownOne,
> +                                APInt &KnownZero2, APInt &KnownOne2);
> +
> +  AssumptionCache *AC;
> +  const DataLayout *DL;
> +  DominatorTree *DT;
> +};
> +}
> +
> +char BDCE::ID = 0;
> +INITIALIZE_PASS_BEGIN(BDCE, "bdce", "Bit-Tracking Dead Code Elimination",
> +                      false, false)
> +INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
> +INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
> +INITIALIZE_PASS_END(BDCE, "bdce", "Bit-Tracking Dead Code Elimination",
> +                    false, false)
> +
> +static bool isAlwaysLive(Instruction *I) {
> +  return isa<TerminatorInst>(I) || isa<DbgInfoIntrinsic>(I) ||
> +         isa<LandingPadInst>(I) || I->mayHaveSideEffects();
> +}
> +
> +void BDCE::determineLiveOperandBits(const Instruction *UserI,
> +                                    const Instruction *I, unsigned
> OperandNo,
> +                                    const APInt &AOut, APInt &AB,
> +                                    APInt &KnownZero, APInt &KnownOne,
> +                                    APInt &KnownZero2, APInt &KnownOne2) {
> +  unsigned BitWidth = AB.getBitWidth();
> +
> +  // We're called once per operand, but for some instructions, we need to
> +  // compute known bits of both operands in order to determine the live
> bits of
> +  // either (when both operands are instructions themselves). We don't,
> +  // however, want to do this twice, so we cache the result in APInts
> that live
> +  // in the caller. For the two-relevant-operands case, both operand
> values are
> +  // provided here.
> +  auto ComputeKnownBits = [&](unsigned BitWidth, const Value *V1,
> +                              const Value *V2) {
> +    KnownZero = APInt(BitWidth, 0);
> +    KnownOne =  APInt(BitWidth, 0);
> +    computeKnownBits(const_cast<Value*>(V1), KnownZero, KnownOne, DL, 0,
> AC,
> +                     UserI, DT);
> +
> +    if (V2) {
> +      KnownZero2 = APInt(BitWidth, 0);
> +      KnownOne2 =  APInt(BitWidth, 0);
> +      computeKnownBits(const_cast<Value*>(V2), KnownZero2, KnownOne2, DL,
> 0, AC,
> +                       UserI, DT);
> +    }
> +  };
> +
> +  switch (UserI->getOpcode()) {
> +  default: break;
> +  case Instruction::Call:
> +  case Instruction::Invoke:
> +    if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(UserI))
> +      switch (II->getIntrinsicID()) {
> +      default: break;
> +      case Intrinsic::bswap:
> +        // The alive bits of the input are the swapped alive bits of
> +        // the output.
> +        AB = AOut.byteSwap();
> +        break;
> +      case Intrinsic::ctlz:
> +        if (OperandNo == 0) {
> +          // We need some output bits, so we need all bits of the
> +          // input to the left of, and including, the leftmost bit
> +          // known to be one.
> +          ComputeKnownBits(BitWidth, I, nullptr);
> +          AB = APInt::getHighBitsSet(BitWidth,
> +                 std::min(BitWidth, KnownOne.countLeadingZeros()+1));
> +        }
> +        break;
> +      case Intrinsic::cttz:
> +        if (OperandNo == 0) {
> +          // We need some output bits, so we need all bits of the
> +          // input to the right of, and including, the rightmost bit
> +          // known to be one.
> +          ComputeKnownBits(BitWidth, I, nullptr);
> +          AB = APInt::getLowBitsSet(BitWidth,
> +                 std::min(BitWidth, KnownOne.countTrailingZeros()+1));
> +        }
> +        break;
> +      }
> +    break;
> +  case Instruction::Add:
> +  case Instruction::Sub:
> +    // Find the highest live output bit. We don't need any more input
> +    // bits than that (adds, and thus subtracts, ripple only to the
> +    // left).
> +    AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
> +    break;
> +  case Instruction::Shl:
> +    if (OperandNo == 0)
> +      if (ConstantInt *CI =
> +            dyn_cast<ConstantInt>(UserI->getOperand(1))) {
> +        uint64_t ShiftAmt = CI->getLimitedValue(BitWidth-1);
> +        AB = AOut.lshr(ShiftAmt);
> +
> +        // If the shift is nuw/nsw, then the high bits are not dead
> +        // (because we've promised that they *must* be zero).
> +        const ShlOperator *S = cast<ShlOperator>(UserI);
> +        if (S->hasNoSignedWrap())
> +          AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
> +        else if (S->hasNoUnsignedWrap())
> +          AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
> +      }
> +    break;
> +  case Instruction::LShr:
> +    if (OperandNo == 0)
> +      if (ConstantInt *CI =
> +            dyn_cast<ConstantInt>(UserI->getOperand(1))) {
> +        uint64_t ShiftAmt = CI->getLimitedValue(BitWidth-1);
> +        AB = AOut.shl(ShiftAmt);
> +
> +        // If the shift is exact, then the low bits are not dead
> +        // (they must be zero).
> +        if (cast<LShrOperator>(UserI)->isExact())
> +          AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
> +      }
> +    break;
> +  case Instruction::AShr:
> +    if (OperandNo == 0)
> +      if (ConstantInt *CI =
> +            dyn_cast<ConstantInt>(UserI->getOperand(1))) {
> +        uint64_t ShiftAmt = CI->getLimitedValue(BitWidth-1);
> +        AB = AOut.shl(ShiftAmt);
> +        // Because the high input bit is replicated into the
> +        // high-order bits of the result, if we need any of those
> +        // bits, then we must keep the highest input bit.
> +        if ((AOut & APInt::getHighBitsSet(BitWidth, ShiftAmt))
> +            .getBoolValue())
> +          AB.setBit(BitWidth-1);
> +
> +        // If the shift is exact, then the low bits are not dead
> +        // (they must be zero).
> +        if (cast<AShrOperator>(UserI)->isExact())
> +          AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
> +      }
> +    break;
> +  case Instruction::And:
> +    AB = AOut;
> +
> +    // For bits that are known zero, the corresponding bits in the
> +    // other operand are dead (unless they're both zero, in which
> +    // case they can't both be dead, so just mark the LHS bits as
> +    // dead).
> +    if (OperandNo == 0) {
> +      ComputeKnownBits(BitWidth, I, UserI->getOperand(1));
> +      AB &= ~KnownZero2;
> +    } else {
> +      if (!isa<Instruction>(UserI->getOperand(0)))
> +        ComputeKnownBits(BitWidth, UserI->getOperand(0), I);
> +      AB &= ~(KnownZero & ~KnownZero2);
> +    }
> +    break;
> +  case Instruction::Or:
> +    AB = AOut;
> +
> +    // For bits that are known one, the corresponding bits in the
> +    // other operand are dead (unless they're both one, in which
> +    // case they can't both be dead, so just mark the LHS bits as
> +    // dead).
> +    if (OperandNo == 0) {
> +      ComputeKnownBits(BitWidth, I, UserI->getOperand(1));
> +      AB &= ~KnownOne2;
> +    } else {
> +      if (!isa<Instruction>(UserI->getOperand(0)))
> +        ComputeKnownBits(BitWidth, UserI->getOperand(0), I);
> +      AB &= ~(KnownOne & ~KnownOne2);
> +    }
> +    break;
> +  case Instruction::Xor:
> +  case Instruction::PHI:
> +    AB = AOut;
> +    break;
> +  case Instruction::Trunc:
> +    AB = AOut.zext(BitWidth);
> +    break;
> +  case Instruction::ZExt:
> +    AB = AOut.trunc(BitWidth);
> +    break;
> +  case Instruction::SExt:
> +    AB = AOut.trunc(BitWidth);
> +    // Because the high input bit is replicated into the
> +    // high-order bits of the result, if we need any of those
> +    // bits, then we must keep the highest input bit.
> +    if ((AOut & APInt::getHighBitsSet(AOut.getBitWidth(),
> +                                      AOut.getBitWidth() - BitWidth))
> +        .getBoolValue())
> +      AB.setBit(BitWidth-1);
> +    break;
> +  case Instruction::Select:
> +    if (OperandNo != 0)
> +      AB = AOut;
> +    break;
> +  }
> +}
> +
> +bool BDCE::runOnFunction(Function& F) {
> +  if (skipOptnoneFunction(F))
> +    return false;
> +
> +  AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
> +  DL = F.getParent()->getDataLayout();
> +  DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
> +
> +  DenseMap<Instruction *, APInt> AliveBits;
> +  SmallVector<Instruction*, 128> Worklist;
> +
> +  // The set of visited instructions (non-integer-typed only).
> +  SmallPtrSet<Instruction*, 128> Visited;
> +
> +  // Collect the set of "root" instructions that are known live.
> +  for (Instruction &I : inst_range(F)) {
> +    if (!isAlwaysLive(&I))
> +      continue;
> +
> +    // For integer-valued instructions, set up an initial empty set of
> alive
> +    // bits and add the instruction to the work list. For other
> instructions
> +    // add their operands to the work list (for integer values operands,
> mark
> +    // all bits as live).
> +    if (IntegerType *IT = dyn_cast<IntegerType>(I.getType())) {
> +      AliveBits[&I] = APInt(IT->getBitWidth(), 0);
> +      Worklist.push_back(&I);
> +      continue;
> +    }
> +
> +    // Non-integer-typed instructions...
> +    for (Use &OI : I.operands()) {
> +      if (Instruction *J = dyn_cast<Instruction>(OI)) {
> +        if (IntegerType *IT = dyn_cast<IntegerType>(J->getType()))
> +          AliveBits[J] = APInt::getAllOnesValue(IT->getBitWidth());
> +        Worklist.push_back(J);
> +      }
> +    }
> +    // To save memory, we don't add I to the Visited set here. Instead, we
> +    // check isAlwaysLive on every instruction when searching for dead
> +    // instructions later (we need to check isAlwaysLive for the
> +    // integer-typed instructions anyway).
> +  }
> +
> +  // Propagate liveness backwards to operands.
> +  while (!Worklist.empty()) {
> +    Instruction *UserI = Worklist.pop_back_val();
> +
> +    DEBUG(dbgs() << "BDCE: Visiting: " << *UserI);
> +    APInt AOut;
> +    if (UserI->getType()->isIntegerTy()) {
> +      AOut = AliveBits[UserI];
> +      DEBUG(dbgs() << " Alive Out: " << AOut);
> +    }
> +    DEBUG(dbgs() << "\n");
> +
> +    if (!UserI->getType()->isIntegerTy())
> +      Visited.insert(UserI);
> +
> +    APInt KnownZero, KnownOne, KnownZero2, KnownOne2;
> +    // Compute the set of alive bits for each operand. These are anded
> into the
> +    // existing set, if any, and if that changes the set of alive bits,
> the
> +    // operand is added to the work-list.
> +    for (Use &OI : UserI->operands()) {
> +      if (Instruction *I = dyn_cast<Instruction>(OI)) {
> +        if (IntegerType *IT = dyn_cast<IntegerType>(I->getType())) {
> +          unsigned BitWidth = IT->getBitWidth();
> +          APInt AB = APInt::getAllOnesValue(BitWidth);
> +          if (UserI->getType()->isIntegerTy() && !AOut &&
> +              !isAlwaysLive(UserI)) {
> +            AB = APInt(BitWidth, 0);
> +          } else {
> +            // If all bits of the output are dead, then all bits of the
> input
> +            // Bits of each operand that are used to compute alive bits
> of the
> +            // output are alive, all others are dead.
> +            determineLiveOperandBits(UserI, I, OI.getOperandNo(), AOut,
> AB,
> +                                     KnownZero, KnownOne,
> +                                     KnownZero2, KnownOne2);
> +          }
> +
> +          // If we've added to the set of alive bits (or the operand has
> not
> +          // been previously visited), then re-queue the operand to be
> visited
> +          // again.
> +          APInt ABPrev(BitWidth, 0);
> +          auto ABI = AliveBits.find(I);
> +          if (ABI != AliveBits.end())
> +            ABPrev = ABI->second;
> +
> +          APInt ABNew = AB | ABPrev;
> +          if (ABNew != ABPrev || ABI == AliveBits.end()) {
> +            AliveBits[I] = std::move(ABNew);
> +            Worklist.push_back(I);
> +          }
> +        } else if (!Visited.count(I)) {
> +          Worklist.push_back(I);
> +        }
> +      }
> +    }
> +  }
> +
> +  bool Changed = false;
> +  // The inverse of the live set is the dead set.  These are those
> instructions
> +  // which have no side effects and do not influence the control flow or
> return
> +  // value of the function, and may therefore be deleted safely.
> +  // NOTE: We reuse the Worklist vector here for memory efficiency.
> +  for (Instruction &I : inst_range(F)) {
> +    // For live instructions that have all dead bits, first make them
> dead by
> +    // replacing all uses with something else. Then, if they don't need to
> +    // remain live (because they have side effects, etc.) we can remove
> them.
> +    if (I.getType()->isIntegerTy()) {
> +      auto ABI = AliveBits.find(&I);
> +      if (ABI != AliveBits.end()) {
> +        if (ABI->second.getBoolValue())
> +          continue;
> +
> +        DEBUG(dbgs() << "BDCE: Trivializing: " << I << " (all bits
> dead)\n");
> +        // FIXME: In theory we could substitute undef here instead of
> zero.
> +        // This should be reconsidered once we settle on the semantics of
> +        // undef, poison, etc.
> +        Value *Zero = ConstantInt::get(I.getType(), 0);
> +        ++NumSimplified;
> +        I.replaceAllUsesWith(Zero);
> +        Changed = true;
> +      }
> +    } else if (Visited.count(&I)) {
> +      continue;
> +    }
> +
> +    if (isAlwaysLive(&I))
> +      continue;
> +
> +    DEBUG(dbgs() << "BDCE: Removing: " << I << " (unused)\n");
> +    Worklist.push_back(&I);
> +    I.dropAllReferences();
> +    Changed = true;
> +  }
> +
> +  for (Instruction *&I : Worklist) {
> +    ++NumRemoved;
> +    I->eraseFromParent();
> +  }
> +
> +  return Changed;
> +}
> +
> +FunctionPass *llvm::createBitTrackingDCEPass() {
> +  return new BDCE();
> +}
> +
>
> Modified: llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt (original)
> +++ llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt Mon Feb 16 19:36:59
> 2015
> @@ -1,6 +1,7 @@
>  add_llvm_library(LLVMScalarOpts
>    ADCE.cpp
>    AlignmentFromAssumptions.cpp
> +  BDCE.cpp
>    ConstantHoisting.cpp
>    ConstantProp.cpp
>    CorrelatedValuePropagation.cpp
>
> Modified: llvm/trunk/lib/Transforms/Scalar/Scalar.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/Scalar.cpp?rev=229462&r1=229461&r2=229462&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/Scalar.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/Scalar.cpp Mon Feb 16 19:36:59 2015
> @@ -28,6 +28,7 @@ using namespace llvm;
>  /// ScalarOpts library.
>  void llvm::initializeScalarOpts(PassRegistry &Registry) {
>    initializeADCEPass(Registry);
> +  initializeBDCEPass(Registry);
>    initializeAlignmentFromAssumptionsPass(Registry);
>    initializeSampleProfileLoaderPass(Registry);
>    initializeConstantHoistingPass(Registry);
> @@ -83,6 +84,10 @@ void LLVMAddAggressiveDCEPass(LLVMPassMa
>    unwrap(PM)->add(createAggressiveDCEPass());
>  }
>
> +void LLVMAddBitTrackingDCEPass(LLVMPassManagerRef PM) {
> +  unwrap(PM)->add(createBitTrackingDCEPass());
> +}
> +
>  void LLVMAddAlignmentFromAssumptionsPass(LLVMPassManagerRef PM) {
>    unwrap(PM)->add(createAlignmentFromAssumptionsPass());
>  }
>
> Added: llvm/trunk/test/Transforms/BDCE/basic.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BDCE/basic.ll?rev=229462&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/BDCE/basic.ll (added)
> +++ llvm/trunk/test/Transforms/BDCE/basic.ll Mon Feb 16 19:36:59 2015
> @@ -0,0 +1,348 @@
> +; RUN: opt -S -bdce -instsimplify < %s | FileCheck %s
> +; RUN: opt -S -instsimplify < %s | FileCheck %s -check-prefix=CHECK-IO
> +target datalayout = "E-m:e-i64:64-n32:64"
> +target triple = "powerpc64-unknown-linux-gnu"
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @bar(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 4
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 8
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %shr = ashr i32 %or15, 4
> +  ret i32 %shr
> +
> +; CHECK-LABEL: @bar
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +
> +; Check that instsimplify is not doing this all on its own.
> +; CHECK-IO-LABEL: @bar
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 3)
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 2)
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 1)
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 0)
> +; CHECK-IO: tail call signext i32 @foo(i32 signext 4)
> +; CHECK-IO: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +declare signext i32 @foo(i32 signext) #0
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @far(i32 signext %x) #1 {
> +entry:
> +  %call = tail call signext i32 @goo(i32 signext 5) #1
> +  %and = and i32 %call, 4
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @goo(i32 signext 3) #1
> +  %and2 = and i32 %call1, 8
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @goo(i32 signext 2) #1
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @goo(i32 signext 1) #1
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @goo(i32 signext 0) #1
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @goo(i32 signext 4) #1
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %shr = ashr i32 %or15, 4
> +  ret i32 %shr
> +
> +; CHECK-LABEL: @far
> +; Calls to foo(5) and foo(3) are still there, but their results are not
> used.
> +; CHECK: tail call signext i32 @goo(i32 signext 5)
> +; CHECK-NEXT: tail call signext i32 @goo(i32 signext 3)
> +; CHECK-NEXT: tail call signext i32 @goo(i32 signext 2)
> +; CHECK: tail call signext i32 @goo(i32 signext 1)
> +; CHECK: tail call signext i32 @goo(i32 signext 0)
> +; CHECK: tail call signext i32 @goo(i32 signext 4)
> +; CHECK: ret i32
> +
> +; Check that instsimplify is not doing this all on its own.
> +; CHECK-IO-LABEL: @far
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 5)
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 3)
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 2)
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 1)
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 0)
> +; CHECK-IO: tail call signext i32 @goo(i32 signext 4)
> +; CHECK-IO: ret i32
> +}
> +
> +declare signext i32 @goo(i32 signext) #1
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar1(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %bs = tail call i32 @llvm.bswap.i32(i32 %or15) #0
> +  %shr = ashr i32 %bs, 4
> +  ret i32 %shr
> +
> +; CHECK-LABEL: @tar1
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +declare i32 @llvm.bswap.i32(i32) #0
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar2(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %shl = shl i32 %or15, 10
> +  ret i32 %shl
> +
> +; CHECK-LABEL: @tar2
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar3(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %add = add i32 %or15, 5
> +  %shl = shl i32 %add, 10
> +  ret i32 %shl
> +
> +; CHECK-LABEL: @tar3
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar4(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %sub = sub i32 %or15, 5
> +  %shl = shl i32 %sub, 10
> +  ret i32 %shl
> +
> +; CHECK-LABEL: @tar4
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar5(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %xor = xor i32 %or15, 5
> +  %shl = shl i32 %xor, 10
> +  ret i32 %shl
> +
> +; CHECK-LABEL: @tar5
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +define signext i32 @tar7(i32 signext %x, i1 %b) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %v = select i1 %b, i32 %or15, i32 5
> +  %shl = shl i32 %v, 10
> +  ret i32 %shl
> +
> +; CHECK-LABEL: @tar7
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i32
> +}
> +
> +; Function Attrs: nounwind readnone
> +define signext i16 @tar8(i32 signext %x) #0 {
> +entry:
> +  %call = tail call signext i32 @foo(i32 signext 5) #0
> +  %and = and i32 %call, 33554432
> +  %or = or i32 %and, %x
> +  %call1 = tail call signext i32 @foo(i32 signext 3) #0
> +  %and2 = and i32 %call1, 67108864
> +  %or3 = or i32 %or, %and2
> +  %call4 = tail call signext i32 @foo(i32 signext 2) #0
> +  %and5 = and i32 %call4, 16
> +  %or6 = or i32 %or3, %and5
> +  %call7 = tail call signext i32 @foo(i32 signext 1) #0
> +  %and8 = and i32 %call7, 32
> +  %or9 = or i32 %or6, %and8
> +  %call10 = tail call signext i32 @foo(i32 signext 0) #0
> +  %and11 = and i32 %call10, 64
> +  %or12 = or i32 %or9, %and11
> +  %call13 = tail call signext i32 @foo(i32 signext 4) #0
> +  %and14 = and i32 %call13, 128
> +  %or15 = or i32 %or12, %and14
> +  %tr = trunc i32 %or15 to i16
> +  ret i16 %tr
> +
> +; CHECK-LABEL: @tar8
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 5)
> +; CHECK-NOT: tail call signext i32 @foo(i32 signext 3)
> +; CHECK: tail call signext i32 @foo(i32 signext 2)
> +; CHECK: tail call signext i32 @foo(i32 signext 1)
> +; CHECK: tail call signext i32 @foo(i32 signext 0)
> +; CHECK: tail call signext i32 @foo(i32 signext 4)
> +; CHECK: ret i16
> +}
> +
> +attributes #0 = { nounwind readnone }
> +attributes #1 = { nounwind }
> +
>
> Added: llvm/trunk/test/Transforms/BDCE/dce-pure.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BDCE/dce-pure.ll?rev=229462&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/BDCE/dce-pure.ll (added)
> +++ llvm/trunk/test/Transforms/BDCE/dce-pure.ll Mon Feb 16 19:36:59 2015
> @@ -0,0 +1,33 @@
> +; RUN: opt -bdce -S < %s | FileCheck %s
> +
> +declare i32 @strlen(i8*) readonly nounwind
> +
> +define void @test1() {
> +  call i32 @strlen( i8* null )
> +  ret void
> +
> +; CHECK-LABEL: @test1
> +; CHECK-NOT: call
> +; CHECK: ret void
> +}
> +
> +define i32 @test2() {
> +  ; invoke of pure function should not be deleted!
> +  invoke i32 @strlen( i8* null ) readnone
> +                  to label %Cont unwind label %Other
> +
> +Cont:           ; preds = %0
> +  ret i32 0
> +
> +Other:          ; preds = %0
> +   %exn = landingpad {i8*, i32} personality i32 (...)*
> @__gxx_personality_v0
> +            cleanup
> +  ret i32 1
> +
> +; CHECK-LABEL: @test2
> +; CHECK: invoke
> +; CHECK: ret i32 1
> +}
> +
> +declare i32 @__gxx_personality_v0(...)
> +
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>

-- 
Alexey Samsonov
vonosmas at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150217/8c58cf5b/attachment.html>