[all-commits] [llvm/llvm-project] f3f283: [StaticDataLayout][PGO] Add profile format for sta...

Mingming Liu via All-commits all-commits at lists.llvm.org
Thu May 15 18:32:11 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: f3f28323adbb9d01372d81b4c78ed94683e58757
      https://github.com/llvm/llvm-project/commit/f3f28323adbb9d01372d81b4c78ed94683e58757
  Author: Mingming Liu <mingmingl at google.com>
  Date:   2025-05-15 (Thu, 15 May 2025)

  Changed paths:
    A llvm/include/llvm/ProfileData/DataAccessProf.h
    M llvm/include/llvm/ProfileData/InstrProf.h
    M llvm/lib/ProfileData/CMakeLists.txt
    A llvm/lib/ProfileData/DataAccessProf.cpp
    M llvm/lib/ProfileData/InstrProf.cpp
    M llvm/unittests/ProfileData/CMakeLists.txt
    A llvm/unittests/ProfileData/DataAccessProfTest.cpp

  Log Message:
  -----------
  [StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. (#138170)

Context: For
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744#p-336543-background-3,
we propose to profile memory loads and stores via hardware events,
symbolize the addresses of binary static data sections, and feed the
profile back into the compiler for data partitioning.

This change adds the profile format for static data layout, and the
classes to operate on it.

The profile and its format
1. Conceptually, a piece of data (call it a symbol) is represented by
its symbol name or its content hash. The former applies to the majority
of data, whose mangled names remain relatively stable across binary
releases, and the latter applies to string literals (with name patterns
like `.str.<N>[.llvm.<hash>]`).
- The symbols with samples are hot data. The number of hot symbols is
small relative to all symbols. For each hot symbol, the profile tracks
its sampled count and locations. Sampled counts come from hardware
events, and locations come from debug information in the profiled
binary. The symbols without samples are cold data. The number of such
cold symbols is large. For each cold symbol, the profile tracks only its
representation (the name or content hash).
- Based on a preliminary study, debug information coverage for data
symbols is partial and best-effort. In the LLVM IR, global variables
with source code correspondence may or may not have debug information.
Therefore the location information is optional in the profiles.
2. The profile-and-compile cycle is similar to SamplePGO. Profiles are
sampled from production binaries and used in the next binary releases.
Known cold symbols and new hot symbols can both have zero sampled
counts, so the profile records known cold symbols to tell the two apart
in the next compile.
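
The data model in the list above can be sketched as follows. This is a minimal illustration only: `SymbolId`, `HotRecord`, `DataProfileSketch` and every other name here are hypothetical and are not the classes added in `DataAccessProf.h`. It assumes a symbol is keyed by either a name or a 64-bit content hash, and that locations are an optional file/line list.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <variant>
#include <vector>

// Hypothetical: a symbol is identified by its name, or by a content hash
// (the case described above for string literals).
using SymbolId = std::variant<std::string, uint64_t>;

// Hypothetical: a best-effort location taken from debug information.
struct SourceLocation {
  std::string File;
  uint32_t Line;
};

// A hot symbol carries a sampled count and, optionally, locations.
struct HotRecord {
  uint64_t SampleCount = 0;
  std::vector<SourceLocation> Locations; // empty when debug info is missing
};

enum class Hotness { Hot, KnownCold, Unseen };

class DataProfileSketch {
  std::map<SymbolId, HotRecord> Hot;
  std::set<SymbolId> KnownCold; // representation only, no counts

public:
  void addHot(const SymbolId &Id, uint64_t Count,
              const std::vector<SourceLocation> &Locs = {}) {
    assert(Count > 0 && "hot symbols are the ones with samples");
    HotRecord &R = Hot[Id];
    R.SampleCount += Count;
    R.Locations.insert(R.Locations.end(), Locs.begin(), Locs.end());
  }

  void addKnownCold(const SymbolId &Id) { KnownCold.insert(Id); }

  // Zero for both known-cold and unseen symbols.
  uint64_t sampleCount(const SymbolId &Id) const {
    auto It = Hot.find(Id);
    return It == Hot.end() ? 0 : It->second.SampleCount;
  }

  // Recording cold symbols is what separates "known cold" from "new".
  Hotness classify(const SymbolId &Id) const {
    if (Hot.count(Id))
      return Hotness::Hot;
    if (KnownCold.count(Id))
      return Hotness::KnownCold;
    return Hotness::Unseen;
  }
};
```

In this sketch a known cold symbol and a symbol that first appears in the next release both report a sample count of zero; only `classify` distinguishes them, which is the reason given above for recording cold symbols at all.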

In the profile's serialization format, strings are concatenated together
and compressed. Individual records store the index of each string
rather than the string itself.
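
The string-by-index idea can be illustrated with a small, hypothetical string table. This is a sketch under assumptions, not the on-disk format of this change: the class name `StringTableSketch`, the NUL separator, and the 32-bit index width are all made up for illustration, and the compression step is left out (in the real format the concatenated blob is compressed).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table: each distinct string is stored once, and
// records refer to it by index.
class StringTableSketch {
  std::vector<std::string> Strings;
  std::unordered_map<std::string, uint32_t> IndexOf;

public:
  // Returns the index of S, adding it to the table on first use.
  uint32_t intern(const std::string &S) {
    assert(S.find('\0') == std::string::npos &&
           "the assumed separator must not occur inside a string");
    auto It = IndexOf.find(S);
    if (It != IndexOf.end())
      return It->second;
    uint32_t Idx = static_cast<uint32_t>(Strings.size());
    Strings.push_back(S);
    IndexOf.emplace(S, Idx);
    return Idx;
  }

  // Concatenates all strings, each followed by a NUL separator.
  // A compressor would be applied to this blob; that step is omitted.
  std::string serialize() const {
    std::string Blob;
    for (const std::string &S : Strings) {
      Blob += S;
      Blob.push_back('\0');
    }
    return Blob;
  }

  // Splits a blob back into strings; position in the result is the index.
  static std::vector<std::string> deserialize(const std::string &Blob) {
    std::vector<std::string> Out;
    std::string Cur;
    for (char C : Blob) {
      if (C == '\0') {
        Out.push_back(Cur);
        Cur.clear();
      } else {
        Cur.push_back(C);
      }
    }
    return Out;
  }
};
```

A record that needs a symbol name or file name would then hold only the integer returned by `intern`, so repeated strings cost one index each instead of a full copy.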

A separate PR will connect this class to InstrProfReader/Writer via
MemProfReader/Writer.

---------

Co-authored-by: Kazu Hirata <kazu at google.com>


