[all-commits] [llvm/llvm-project] 435782: [llvm-profdata] ProfileReader cleanup - preparatio...

William Junda Huang via All-commits all-commits at lists.llvm.org
Fri May 5 17:21:26 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 4357824c63e8bbdb7748255389ea0bc104b4a799
      https://github.com/llvm/llvm-project/commit/4357824c63e8bbdb7748255389ea0bc104b4a799
  Author: William Huang <williamjhuang at google.com>
  Date:   2023-05-06 (Sat, 06 May 2023)

  Changed paths:
    M llvm/include/llvm/ProfileData/SampleProfReader.h
    M llvm/lib/ProfileData/SampleProfReader.cpp
    A llvm/test/tools/llvm-profdata/Inputs/sample-multiple-nametables.profdata
    A llvm/test/tools/llvm-profdata/Inputs/sample-nametable-after-samples.profdata
    A llvm/test/tools/llvm-profdata/Inputs/sample-nametable-empty-string.profdata
    A llvm/test/tools/llvm-profdata/sample-nametable.test

  Log Message:
  -----------
  [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring

Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740 (Use MD5 as key for profile map). Change is too complicated so I am cleaning up the reader implementation first with these goals.
-  Reduce duplicated/similar logic
-  Reduce virtual functions, changing them to non-virtual
-  Reduce unnecessry checks, indirections, and dead writes.

This is patch 1/n. This patch refactors NameTable

Explaining several decisions here

1. useMD5() means whether  names of the profiles (the ProfileMap) are represented as MD5. It is NOT whether the input profile format is MD5. This function is an interface for IPO passes to decide whether to match function names or function MD5. There are two motives here:
(a) Eventually we want to use MD5 to represent all function contexts because it is much faster to use it as a key for lookup tables (prototype implementation D147740), so in compilation mode we call setProfileUseMD5() to force use MD5. While in tools mode (llvm-profdata) we want to keep the function name info if it's in the input profile.
(b) We also propose to allow multiple name tables and profile sections in ExtBinary format, and it could consist of name tables with or without using MD5, in this case MD5 prevails and other name tables are converted to MD5.

2. MD5 handling logic is pushed up to BinaryReader base class, because this trades a non-devirtualized virtual function call with a predictable branch. ReadStringFromTable() accounts for >5% time when loading a full 1 GB profile, it should not be virtual.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D148868




More information about the All-commits mailing list