[llvm] r245140 - Accelerate MergeFunctions with hashing

NAKAMURA Takumi via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 15 21:07:58 PDT 2015


It had been failing also there;
http://bb.pgr.jp/builders/ninja-clang-i686-msc18-R/builds/2679
http://bb.pgr.jp/builders/ninja-clang-x64-mingw64-RA/builds/7660

Fixed in r245168.

On Sun, Aug 16, 2015 at 4:48 AM Pete Cooper via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Hey JF, Jason
>
> There’s an ASan/UBSan failure with a mergefunc test on the bots.  The
> blame list range is r245138-r245153.
>
> Mind taking a look to see if this commit might be the cause.  Its the only
> MergeFunc related one I could see in the range.
>
>
> http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/94/consoleFull#-121115394049ba4694-19c4-4d7e-bec5-911270d8a58c
>
> Cheers,
> Pete
>
> On Aug 14, 2015, at 7:06 PM, JF Bastien via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
> I'll defer the answer to Jason, who authored the patch.
> On Aug 14, 2015 6:39 PM, "David Blaikie" <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Fri, Aug 14, 2015 at 6:18 PM, JF Bastien via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: jfb
>>> Date: Fri Aug 14 20:18:18 2015
>>> New Revision: 245140
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=245140&view=rev
>>> Log:
>>> Accelerate MergeFunctions with hashing
>>>
>>> This patch makes the Merge Functions pass faster by calculating and
>>> comparing
>>> a hash value which captures the essential structure of a function before
>>> performing a full function comparison.
>>>
>>> The hash is calculated by hashing the function signature, then walking
>>> the basic
>>> blocks of the function in the same order as the main comparison
>>> function. The
>>> opcode of each instruction is hashed in sequence, which means that
>>> different
>>> functions according to the existing total order cannot have the same
>>> hash, as
>>> the comparison requires the opcodes of the two functions to be the same
>>> order.
>>>
>>> The hash function is a static member of the FunctionComparator class
>>> because it
>>> is tightly coupled to the exact comparison function used. For example,
>>> functions
>>> which are equivalent modulo a single variant callsite might be merged by
>>> a more
>>> aggressive MergeFunctions, and the hash function would need to be
>>> insensitive to
>>> these differences in order to exploit this.
>>>
>>> The hashing function uses a utility class which accumulates the values
>>> into an
>>> internal state using a standard bit-mixing function. Note that this is a
>>> different interface
>>> than a regular hashing routine, because the values to be hashed are
>>> scattered
>>> amongst the properties of a llvm::Function, not linear in memory. This
>>> scheme is
>>> fast because only one word of state needs to be kept, and the mixing
>>> function is
>>> a few instructions.
>>>
>>> The main runOnModule function first computes the hash of each function,
>>> and only
>>> further processes functions which do not have a unique function hash.
>>> The hash
>>> is also used to order the sorted function set. If the hashes differ,
>>> their
>>> values are used to order the functions, otherwise the full comparison is
>>> done.
>>>
>>> Both of these are helpful in speeding up MergeFunctions. Together they
>>> result in
>>> speedups of 9% for mysqld (a mostly C application with little
>>> redundancy), 46%
>>> for libxul in Firefox, and 117% for Chromium. (These are all LTO
>>> builds.) In all
>>> three cases, the new speed of MergeFunctions is about half that of the
>>> module
>>> verifier, making it relatively inexpensive even for large LTO builds with
>>> hundreds of thousands of functions. The same functions are merged, so
>>> this
>>> change is free performance.
>>>
>>> Author: jrkoenig
>>>
>>> Reviewers: nlewycky, dschuff, jfb
>>>
>>> Subscribers: llvm-commits, aemerson
>>>
>>> Differential revision: http://reviews.llvm.org/D11923
>>>
>>> Modified:
>>>     llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp
>>>     llvm/trunk/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
>>>
>>> Modified: llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp?rev=245140&r1=245139&r2=245140&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp Fri Aug 14 20:18:18
>>> 2015
>>> @@ -27,6 +27,14 @@
>>>  // -- We define Function* container class with custom "operator<"
>>> (FunctionPtr).
>>>  // -- "FunctionPtr" instances are stored in std::set collection, so
>>> every
>>>  //    std::set::insert operation will give you result in log(N) time.
>>> +//
>>> +// As an optimization, a hash of the function structure is calculated
>>> first, and
>>> +// two functions are only compared if they have the same hash. This
>>> hash is
>>> +// cheap to compute, and has the property that if function F == G
>>> according to
>>> +// the comparison function, then hash(F) == hash(G). This consistency
>>> property
>>> +// is critical to ensuring all possible merging opportunities are
>>> exploited.
>>> +// Collisions in the hash affect the speed of the pass but not the
>>> correctness
>>> +// or determinism of the resulting transformation.
>>>  //
>>>  // When a match is found the functions are folded. If both functions are
>>>  // overridable, we move the functionality into a new internal function
>>> and
>>> @@ -87,6 +95,7 @@
>>>  #include "llvm/ADT/STLExtras.h"
>>>  #include "llvm/ADT/SmallSet.h"
>>>  #include "llvm/ADT/Statistic.h"
>>> +#include "llvm/ADT/Hashing.h"
>>>  #include "llvm/IR/CallSite.h"
>>>  #include "llvm/IR/Constants.h"
>>>  #include "llvm/IR/DataLayout.h"
>>> @@ -132,6 +141,10 @@ public:
>>>
>>>    /// Test whether the two functions have equivalent behaviour.
>>>    int compare();
>>> +  /// Hash a function. Equivalent functions will have the same hash,
>>> and unequal
>>> +  /// functions will have different hashes with high probability.
>>> +  typedef uint64_t FunctionHash;
>>> +  static FunctionHash functionHash(Function &);
>>>
>>>  private:
>>>    /// Test whether two basic blocks have equivalent behaviour.
>>> @@ -390,9 +403,11 @@ private:
>>>
>>>  class FunctionNode {
>>>    mutable AssertingVH<Function> F;
>>> +  FunctionComparator::FunctionHash Hash;
>>>
>>>  public:
>>> -  FunctionNode(Function *F) : F(F) {}
>>> +  // Note the hash is recalculated potentially multiple times, but it
>>> is cheap.
>>> +  FunctionNode(Function *F) : F(F),
>>> Hash(FunctionComparator::functionHash(*F)){}
>>>    Function *getFunc() const { return F; }
>>>
>>>    /// Replace the reference to the function F by the function G,
>>> assuming their
>>> @@ -406,6 +421,9 @@ public:
>>>
>>>    void release() { F = 0; }
>>>    bool operator<(const FunctionNode &RHS) const {
>>> +    // Order first by hashes, then full function comparison.
>>> +    if (Hash != RHS.Hash)
>>> +      return Hash < RHS.Hash;
>>>      return (FunctionComparator(F, RHS.getFunc()).compare()) == -1;
>>>    }
>>>  };
>>> @@ -1074,6 +1092,66 @@ int FunctionComparator::compare() {
>>>    return 0;
>>>  }
>>>
>>> +// Accumulate the hash of a sequence of 64-bit integers. This is
>>> similar to a
>>> +// hash of a sequence of 64bit ints, but the entire input does not need
>>> to be
>>> +// available at once. This interface is necessary for functionHash
>>> because it
>>> +// needs to accumulate the hash as the structure of the function is
>>> traversed
>>> +// without saving these values to an intermediate buffer. This form of
>>> hashing
>>> +// is not often needed, as usually the object to hash is just read from
>>> a
>>> +// buffer.
>>> +class HashAccumulator64 {
>>>
>>
>> Any chance of using LLVM's Hashing.h for hash aggregation? (it's already
>> got accumulator support, if I'm not mistaken?)
>>
>>
>>> +  uint64_t Hash;
>>> +public:
>>> +  // Initialize to random constant, so the state isn't zero.
>>> +  HashAccumulator64() { Hash = 0x6acaa36bef8325c5ULL; }
>>> +  void add(uint64_t V) {
>>> +     Hash = llvm::hashing::detail::hash_16_bytes(Hash, V);
>>> +  }
>>> +  // No finishing is required, because the entire hash value is used.
>>> +  uint64_t getHash() { return Hash; }
>>> +};
>>> +
>>> +// A function hash is calculated by considering only the number of
>>> arguments and
>>> +// whether a function is varargs, the order of basic blocks (given by
>>> the
>>> +// successors of each basic block in depth first order), and the order
>>> of
>>> +// opcodes of each instruction within each of these basic blocks. This
>>> mirrors
>>> +// the strategy compare() uses to compare functions by walking the BBs
>>> in depth
>>> +// first order and comparing each instruction in sequence. Because this
>>> hash
>>> +// does not look at the operands, it is insensitive to things such as
>>> the
>>> +// target of calls and the constants used in the function, which makes
>>> it useful
>>> +// when possibly merging functions which are the same modulo constants
>>> and call
>>> +// targets.
>>> +FunctionComparator::FunctionHash
>>> FunctionComparator::functionHash(Function &F) {
>>> +  HashAccumulator64 H;
>>> +  H.add(F.isVarArg());
>>> +  H.add(F.arg_size());
>>> +
>>> +  SmallVector<const BasicBlock *, 8> BBs;
>>> +  SmallSet<const BasicBlock *, 16> VisitedBBs;
>>> +
>>> +  // Walk the blocks in the same order as FunctionComparator::compare(),
>>> +  // accumulating the hash of the function "structure." (BB and opcode
>>> sequence)
>>> +  BBs.push_back(&F.getEntryBlock());
>>> +  VisitedBBs.insert(BBs[0]);
>>> +  while (!BBs.empty()) {
>>> +    const BasicBlock *BB = BBs.pop_back_val();
>>> +    // This random value acts as a block header, as otherwise the
>>> partition of
>>> +    // opcodes into BBs wouldn't affect the hash, only the order of the
>>> opcodes
>>> +    H.add(45798);
>>> +    for (auto &Inst : *BB) {
>>> +      H.add(Inst.getOpcode());
>>> +    }
>>> +    const TerminatorInst *Term = BB->getTerminator();
>>> +    for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
>>> +      if (!VisitedBBs.insert(Term->getSuccessor(i)).second)
>>> +        continue;
>>> +      BBs.push_back(Term->getSuccessor(i));
>>> +    }
>>> +  }
>>> +  return H.getHash();
>>> +}
>>> +
>>> +
>>>  namespace {
>>>
>>>  /// MergeFunctions finds functions which will generate identical
>>> machine code,
>>> @@ -1228,11 +1306,28 @@ bool MergeFunctions::doSanityCheck(std::
>>>  bool MergeFunctions::runOnModule(Module &M) {
>>>    bool Changed = false;
>>>
>>> -  for (Module::iterator I = M.begin(), E = M.end(); I != E; ++I) {
>>> -    if (!I->isDeclaration() && !I->hasAvailableExternallyLinkage())
>>> -      Deferred.push_back(WeakVH(I));
>>> +  // All functions in the module, ordered by hash. Functions with a
>>> unique
>>> +  // hash value are easily eliminated.
>>> +  std::vector<std::pair<FunctionComparator::FunctionHash, Function *>>
>>> +    HashedFuncs;
>>> +  for (Function &Func : M) {
>>> +    if (!Func.isDeclaration() && !Func.hasAvailableExternallyLinkage())
>>> {
>>> +      HashedFuncs.push_back({FunctionComparator::functionHash(Func),
>>> &Func});
>>> +    }
>>> +  }
>>> +
>>> +  std::sort(HashedFuncs.begin(), HashedFuncs.end());
>>> +
>>> +  auto S = HashedFuncs.begin();
>>> +  for (auto I = HashedFuncs.begin(), IE = HashedFuncs.end(); I != IE;
>>> ++I) {
>>> +    // If the hash value matches the previous value or the next one, we
>>> must
>>> +    // consider merging it. Otherwise it is dropped and never
>>> considered again.
>>> +    if ((I != S && std::prev(I)->first == I->first) ||
>>> +        (std::next(I) != IE && std::next(I)->first == I->first) ) {
>>> +      Deferred.push_back(WeakVH(I->second));
>>> +    }
>>>    }
>>> -
>>> +
>>>    do {
>>>      std::vector<WeakVH> Worklist;
>>>      Deferred.swap(Worklist);
>>>
>>> Modified:
>>> llvm/trunk/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll?rev=245140&r1=245139&r2=245140&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
>>> (original)
>>> +++ llvm/trunk/test/Transforms/MergeFunc/call-and-invoke-with-ranges.ll
>>> Fri Aug 14 20:18:18 2015
>>> @@ -63,14 +63,6 @@ lpad:
>>>    resume { i8*, i32 } zeroinitializer
>>>  }
>>>
>>> -define i8 @call_with_same_range() {
>>> -; CHECK-LABEL: @call_with_same_range
>>> -; CHECK: tail call i8 @call_with_range
>>> -  bitcast i8 0 to i8
>>> -  %out = call i8 @dummy(), !range !0
>>> -  ret i8 %out
>>> -}
>>> -
>>>  define i8 @invoke_with_same_range() personality i8* undef {
>>>  ; CHECK-LABEL: @invoke_with_same_range()
>>>  ; CHECK: tail call i8 @invoke_with_range()
>>> @@ -84,6 +76,16 @@ lpad:
>>>    resume { i8*, i32 } zeroinitializer
>>>  }
>>>
>>> +define i8 @call_with_same_range() {
>>> +; CHECK-LABEL: @call_with_same_range
>>> +; CHECK: tail call i8 @call_with_range
>>> +  bitcast i8 0 to i8
>>> +  %out = call i8 @dummy(), !range !0
>>> +  ret i8 %out
>>> +}
>>> +
>>> +
>>> +
>>>  declare i8 @dummy();
>>>  declare i32 @__gxx_personality_v0(...)
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150816/245d13cf/attachment.html>


More information about the llvm-commits mailing list