[PATCH] InstrProf: Calculate a better function hash

Chandler Carruth chandlerc at google.com
Tue Apr 15 23:28:30 PDT 2014


I don't really understand the use of RAV here, or the logic of having a
secondary switch, but at this point, there are too many patches flying and
I feel like there is some more extensive discussion that took place
somewhere other than this thread. =/ No fun when you get bounced back and
forth in code review.

I'm happy to go with the implementation strategy as you have it. I may even
write up a patch to show how I'm imagining you might simplify this.
Anyways...

Some more microscopic nits. Feel free to commit whenever. I like where this
design has ended up, and thanks for seeing it through. =]

+/// \brief Stable hasher for PGO region counters.
+///
+/// PGOHash produces a stable hash of a given function's control flow.
+///
+/// Changing the output of this hash will invalidate all previously
generated
+/// profiles -- i.e., don't do it.
+///
+/// \note  When this hash does eventually change (years?), we still need to
+/// support old hashes.  We'll need to pull in the version number from the
+/// profile data format and use the matching hash function.
+class PGOHash {
+  constexpr static int NumBitsPerType = 6;
+  constexpr static unsigned NumTypesPerWord =
+      sizeof(uint64_t) * 8 / NumBitsPerType;
+  constexpr static unsigned TooBig = 1u << NumBitsPerType;

I would sink these to live down with the static_assert, and sink all of it
into the existing private section with member variables.

+
+public:
+  /// \brief Hash values for AST nodes.
+  ///
+  /// Distinct values for AST nodes that have region counters attached.
+  ///
+  /// These values must be stable.  All new members must be added at the
end,
+  /// and no members should be removed.  Changing the enumeration value
for an
+  /// AST node will affect the hash of every function that contains that
node.
+  enum HashType : unsigned char {
+    None = 0,
+    LabelStmt = 1,
+    WhileStmt,
+    DoStmt,
+    ForStmt,
+    CXXForRangeStmt,
+    ObjCForCollectionStmt,
+    SwitchStmt,
+    CaseStmt,
+    DefaultStmt,
+    IfStmt,
+    CXXTryStmt,
+    CXXCatchStmt,
+    ConditionalOperator,
+    BinaryOperatorLAnd,
+    BinaryOperatorLOr,
+    BinaryConditionalOperator,
+
+    // Keep this last.  It's for the static assert that follows.
+    LastHashType
+  };
+  static_assert(LastHashType <= TooBig, "Too many types in HashType");
+
+private:
+  uint64_t Working;
+  unsigned Count;
+  llvm::MD5 MD5;
+
+public:
+  // TODO: When this format changes, take in a version number here, and
use the
+  // old hash calculation for file formats that used the old hash.
+  PGOHash() : Working(0), Count(0) {}
+  void combine(HashType Type);
+  uint64_t finalize();

I would just put these up with the public enum. Excepting private types or
enums which are necessary to define prior to the public interface (because
it is defined in terms of them), I generally prefer the public interface at
the top followed by the private interface. Mild preference though.

+};




On Tue, Apr 15, 2014 at 2:46 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> On 2014-Apr-11, at 14:23, Duncan P. N. Exon Smith <dexonsmith at apple.com>
> wrote:
>
> >
> > On Apr 10, 2014, at 17:21, Duncan P. N. Exon Smith <dexonsmith at apple.com>
> wrote:
> >
> >> On Apr 10, 2014, at 4:13, Chandler Carruth <chandlerc at google.com>
> wrote:
> >>
> >>> And comments on the proper change...
> >>
> >> Thanks for the review!  Updated patch attached.
> >>
> >>> +void MapRegionCounters::assignCounter(const Decl *D) {
> >>> +  CounterMap[D->getBody()] = NextCounter++;
> >>> +  Hash.combine(hashDecl(*D));
> >>> +}
> >>>
> >>> Why hash the declaration at all? There shouldn't be any cases where a
> declaration impacts control flow, and you have complexity enum and in the
> hasher to handle this.
> >>
> >> Good point.
> >>
> >>> +static PGOHash::HashType hashStmt(const Stmt &S) {
> >>>
> >>> While I see the appeal of this abstraction, I don't think it is
> ultimately buying you much of anything. We already do a full switch over
> the StmtClass in the visitor, we shouldn't re-do it here (for simplicity,
> I'm moderately confident the optimizer will "fix" this). I would probably
> leave the counter management as it was, and just add a hash combine step
> for the statements. Or you could pass the enum through and share the
> counter growth and hash combination much like you have in the current
> patch. The complexity I would avoid is yet another switch over all of the
> CFG-impacting statement kinds.
> >>>
> >>> Does that make sense to you?
> >>
> >> I'm mainly concerned about minimizing boilerplate.  Bob's working on a
> >> patch to switch to RecursiveASTVisitor so we can gut the traversal
> >> logic, and I don't want to move in the opposite direction.
> >>
> >> Your idea to pass the enum through accomplishes the same goal (and is
> >> obviously cleaner), so I went with that.
> >
> > I've rebased after Bob's commit in r206039.  New patch attached.
> >
> > I re-added a getHashStmt() function since it made for the cleanest
> > diff.  I think this is the best way here after all.
>
> Rebased for Justin's recent changes.  New patch attached.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140415/66f583cc/attachment.html>


More information about the cfe-commits mailing list