[llvm] [llvm-driver] Remove llvm-profdata from the driver (PR #162191)

Prabhu Rajasekaran via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 6 20:53:58 PDT 2025


Prabhuk wrote:

> > llvm-profdata uses cl tool for command line parsing which declares global options which can clash in a multicall driver build.
> 
> Would you mind giving a small example of this?
> 
> IIUC command option names are _registered_ such that the same option string can only show up once under a namespace, and `cl::opt` options are registered under a global namespace by default.

Example: you can build llvm-profdata as part of the multicall driver (LLVM_TOOL_LLVM_DRIVER_BUILD) to observe the problem readily. In a multicall build, `llvm-profdata --help` prints options that are unrelated to llvm-profdata. Output from my desktop is pasted below.

The reason is that cl::opt registers its options globally, so the multicall binary sees the cl::opt options registered by all tools, which results in problems such as this --help example. This is a known limitation of cl::opt. We are trying to port all tools that need to be part of the multicall driver to llvm::opt::OptTable. OptTable originally had no support for subcommands, which prevented us from using it in llvm-profdata, but it now supports them (https://github.com/llvm/llvm-project/pull/155026). Hopefully we can port llvm-profdata to OptTable and then re-enable multicall driver mode for it soon. Any help in porting this tool to OptTable would be much appreciated.


```
OVERVIEW: LLVM profile data

USAGE: llvm-profdata [subcommand] [options]

SUBCOMMANDS:

  merge   - Takes several profiles and merge them together. See detailed documentation in https://llvm.org/docs/CommandGuide/llvm-profdata.html#profdata-merge
  order   - Reads temporal profiling traces from a profile and outputs a function order that reduces the number of page faults for those traces. See detailed documentation in https://llvm.org/docs/CommandGuide/llvm-profdata.html#profdata-order
  overlap - Computes and displays the overlap between two profiles. See detailed documentation in https://llvm.org/docs/CommandGuide/llvm-profdata.html#profdata-overlap
  show    - Takes a profile data file and displays the profiles. See detailed documentation in https://llvm.org/docs/CommandGuide/llvm-profdata.html#profdata-show

  Type "llvm-profdata <subcommand> --help" to get more help on a specific subcommand

OPTIONS:

Color Options:

  --color                                                    - Use colors in output (default=autodetect)

General options:

  --aarch64-neon-syntax=<value>                              - Choose style of NEON code to emit from AArch64 backend:
    =generic                                                 -   Emit generic NEON assembly
    =apple                                                   -   Emit Apple-style NEON assembly
  --aarch64-use-aa                                           - Enable the use of AA during codegen.
  --abort-on-max-devirt-iterations-reached                   - Abort when the max iterations for devirtualization CGSCC repeat pass is reached
  --allow-ginsert-as-artifact                                - Allow G_INSERT to be considered an artifact. Hack around AMDGPU test infinite loops.
  --arc-contract-use-objc-claim-rv                           - Enable generation of calls to objc_claimAutoreleasedReturnValue
  --arm-add-build-attributes                                 - 
  --arm-implicit-it=<value>                                  - Allow conditional instructions outside of an IT block
    =always                                                  -   Accept in both ISAs, emit implicit ITs in Thumb
    =never                                                   -   Warn in ARM, reject in Thumb
    =arm                                                     -   Accept in ARM, reject in Thumb
    =thumb                                                   -   Warn in ARM, emit implicit ITs in Thumb
  --asm-show-inst                                            - Emit internal instruction representation to assembly file
  --atomic-counter-update-promoted                           - Do counter update using atomic fetch add  for promoted counters only
  --atomic-first-counter                                     - Use atomic fetch add for first counter in a function (usually the entry counter)
  --bounds-checking-single-trap                              - Use one trap block per function
  --cfg-hide-cold-paths=<number>                             - Hide blocks with relative frequency below the given value
  --cfg-hide-deoptimize-paths                                - 
  --cfg-hide-unreachable-paths                               - 
  --check-functions-filter=<regex>                           - Only emit checks for arguments of functions whose names match the given regular expression
  --conditional-counter-update                               - Do conditional counter updates in single byte counters mode)
  --correlate=<value>                                        - Use debug-info or binary correlation to correlate profiles with build id fetcher
    =<empty>                                                 -   No profile correlation
    =debug-info                                              -   Use debug info to correlate
    =binary                                                  -   Use binary to correlate
  --cost-kind=<value>                                        - Target cost kind
    =throughput                                              -   Reciprocal throughput
    =latency                                                 -   Instruction latency
    =code-size                                               -   Code size
    =size-latency                                            -   Code size and latency
    =all                                                     -   Print all cost kinds
  --crel                                                     - Use CREL relocation format for ELF
  --ctx-profile-force-is-specialized                         - Treat the given module as-if it were containing the post-thinlink module containing the root
  --debug-file-directory=<string>                            - Directories to search for object files by build ID
  --debug-info-correlate                                     - Use debug info to correlate profiles. (Deprecated, use -profile-correlate=debug-info)
  --debugify-atoms                                           - 
  --debugify-func-limit=<ulong>                              - Set max number of processed functions per pass.
  --debugify-level=<value>                                   - Kind of debug info to add
    =locations                                               -   Locations only
    =location+variables                                      -   Locations and Variables
  --debugify-quiet                                           - Suppress verbose debugify output
  --disable-auto-upgrade-debug-info                          - Disable autoupgrade of debug info
  --disable-i2p-p2i-opt                                      - Disables inttoptr/ptrtoint roundtrip optimization
  --do-counter-promotion                                     - Do counter register promotion
  --dot-cfg-mssa=<file name for generated dot file>          - file name for generated dot file
  --dwarf-version=<int>                                      - Dwarf version
  --dwarf64                                                  - Generate debugging info in the 64-bit DWARF format
  --emit-compact-unwind-non-canonical                        - Whether to try to emit Compact Unwind for non canonical entries.
  --emit-dwarf-unwind=<value>                                - Whether to emit DWARF EH frame entries.
    =always                                                  -   Always emit EH frame entries
    =no-compact-unwind                                       -   Only emit EH frame entries when compact unwind is not available
    =default                                                 -   Use target platform default
  --enable-cse-in-irtranslator                               - Should enable CSE in irtranslator
  --enable-cse-in-legalizer                                  - Should enable CSE in Legalizer
  --enable-gvn-hoist                                         - Enable the GVN hoisting pass (default = off)
  --enable-gvn-memdep                                        - 
  --enable-gvn-memoryssa                                     - 
  --enable-gvn-sink                                          - Enable the GVN sinking pass (default = off)
  --enable-jump-table-to-switch                              - Enable JumpTableToSwitch pass (default = off)
  --enable-load-in-loop-pre                                  - 
  --enable-load-pre                                          - 
  --enable-loop-simplifycfg-term-folding                     - 
  --enable-name-compression                                  - Enable name/filename string compression
  --enable-split-backedge-in-load-pre                        - 
  --enable-split-loopiv-heuristic                            - Enable loop iv regalloc heuristic
  --enable-vtable-profile-use                                - If ThinLTO and WPD is enabled and this option is true, vtable profiles will be used by ICP pass for more efficient indirect call sequence. If false, type profiles won't be used.
  --enable-vtable-value-profiling                            - If true, the virtual table address will be instrumented to know the types of a C++ pointer. The information is used in indirect call promotion to do selective vtable-based comparison.
  --expand-variadics-override=<value>                        - Override the behaviour of expand-variadics
    =unspecified                                             -   Use the implementation defaults
    =disable                                                 -   Disable the pass entirely
    =optimize                                                -   Optimise without changing ABI
    =lowering                                                -   Change variadic calling convention
  --experimental-debug-variable-locations                    - Use experimental new value-tracking variable locations
  --fatal-warnings                                           - Treat warnings as errors
  --fdpic                                                    - Use the FDPIC ABI
  --force-tail-folding-style=<value>                         - Force the tail folding style
    =none                                                    -   Disable tail folding
    =data                                                    -   Create lane mask for data only, using active.lane.mask intrinsic
    =data-without-lane-mask                                  -   Create lane mask with compare/stepvector
    =data-and-control                                        -   Create lane mask using active.lane.mask intrinsic, and use it for both data and control flow
    =data-and-control-without-rt-check                       -   Similar to data-and-control, but remove the runtime check
    =data-with-evl                                           -   Use predicated EVL instructions for tail folding. If EVL is unsupported, fallback to data-without-lane-mask.
  --fs-profile-debug-bw-threshold=<uint>                     - Only show debug message if the source branch weight is greater  than this value.
  --fs-profile-debug-prob-diff-threshold=<uint>              - Only show debug message if the branch probability is greater than this value (in percentage).
  --generate-merged-base-profiles                            - When generating nested context-sensitive profiles, always generate extra base profile for function with all its context profiles merged into it.
  --gsframe                                                  - Whether to emit .sframe unwind sections.
  --hash-based-counter-split                                 - Rename counter variable of a comdat function based on cfg hash
  --hot-cold-split                                           - Enable hot-cold splitting pass
  --hwasan-percentile-cutoff-hot=<int>                       - Hot percentile cutoff.
  --hwasan-random-rate=<number>                              - Probability value in the range [0.0, 1.0] to keep instrumentation of a function. Note: instrumentation can be skipped randomly OR because of the hot percentile cutoff, if both are supplied.
  --implicit-mapsyms                                         - Allow mapping symbol at section beginning to be implicit, lowering number of mapping symbols at the expense of some portability. Recommended for projects that can build all their object files using this option
  --import-all-index                                         - Import all external functions in index.
  --incremental-linker-compatible                            - When used with filetype=obj, emit an object file which can be used with an incremental linker
  --instcombine-code-sinking                                 - Enable code sinking
  --instcombine-guard-widening-window=<uint>                 - How wide an instruction window to bypass looking for another guard
  --instcombine-max-num-phis=<uint>                          - Maximum number phis to handle in intptr/ptrint folding
  --instcombine-max-sink-users=<uint>                        - Maximum number of undroppable users for instruction sinking
  --instcombine-maxarray-size=<uint>                         - Maximum array size considered when doing a combine
  --instcombine-negator-enabled                              - Should we attempt to sink negations?
  --instcombine-negator-max-depth=<uint>                     - What is the maximal lookup depth when trying to check for viability of negation sinking.
  --instrprof-atomic-counter-update-all                      - Make all profile counter updates atomic (for testing only)
  --internalize-public-api-file=<filename>                   - A file containing list of symbol names to preserve
  --internalize-public-api-list=<list>                       - A list of symbol names to preserve
  --intrinsic-cost-strategy=<value>                          - Costing strategy for intrinsic instructions
    =instruction-cost                                        -   Use TargetTransformInfo::getInstructionCost
    =intrinsic-cost                                          -   Use TargetTransformInfo::getIntrinsicInstrCost
    =type-based-intrinsic-cost                               -   Calculate the intrinsic cost based only on argument types
  --iterative-counter-promotion                              - Allow counter promotion across the whole loop nest.
  --load=<pluginfilename>                                    - Load the specified plugin
  --lower-allow-check-percentile-cutoff-hot=<int>            - Hot percentile cutoff.
  --lower-allow-check-random-rate=<number>                   - Probability value in the range [0.0, 1.0] of unconditional pseudo-random checks.
  --lto-embed-bitcode=<value>                                - Embed LLVM bitcode in object files produced by LTO
    =none                                                    -   Do not embed
    =optimized                                               -   Embed after all optimization passes
    =post-merge-pre-opt                                      -   Embed post merge, but before optimizations
  --matrix-default-layout=<value>                            - Sets the default matrix layout
    =column-major                                            -   Use column-major layout
    =row-major                                               -   Use row-major layout
  --matrix-print-after-transpose-opt                         - 
  --max-counter-promotions=<int>                             - Max number of allowed counter promotions
  --max-counter-promotions-per-loop=<uint>                   - Max number counter promotions per loop to avoid increasing register pressure too much
  --mc-relax-all                                             - When used with filetype=obj, relax all fixups in the emitted object file
  --mir-strip-debugify-only                                  - Should mir-strip-debug only strip debug info from debugified modules by default
  --misexpect-tolerance=<uint>                               - Prevents emitting diagnostics when profile counts are within N% of the threshold..
  --ms-secure-hotpatch-functions-file=<filename>             - A file containing list of mangled function names to mark for Windows Secure Hot-Patching
  --ms-secure-hotpatch-functions-list=<list>                 - A list of mangled function names to mark for Windows Secure Hot-Patching
  --no-deprecated-warn                                       - Suppress all deprecated warnings
  --no-discriminators                                        - Disable generation of discriminator information.
  --no-type-check                                            - Suppress type errors (Wasm)
  --no-warn                                                  - Suppress all warnings
  --object-size-offset-visitor-max-visit-instructions=<uint> - Maximum number of instructions for ObjectSizeOffsetVisitor to look at
  --pgo-block-coverage                                       - Use this option to enable basic block coverage instrumentation
  --pgo-temporal-instrumentation                             - Use this option to enable temporal instrumentation
  --pgo-view-block-coverage-graph                            - Create a dot file of CFGs with block coverage inference information
  --print-pipeline-passes                                    - Print a '-passes' compatible string describing the pipeline (best-effort only).
  --profcheck-annotate-select                                - Also inject (if missing) and verify MD_prof for `select` instructions
  --profcheck-default-function-entry-count=<long>            - 
  --profcheck-default-select-false-weight=<uint>             - When annotating `select` instructions, this value will be used for the second ('false') case.
  --profcheck-default-select-true-weight=<uint>              - When annotating `select` instructions, this value will be used for the first ('true') case.
  --profcheck-weights-for-test                               - Generate weights with small values for tests.
  --profile-correlate=<value>                                - Use debug info or binary file to correlate profiles.
    =<empty>                                                 -   No profile correlation
    =debug-info                                              -   Use debug info to correlate
    =binary                                                  -   Use binary to correlate
  --riscv-add-build-attributes                               - 
  --riscv-use-aa                                             - Enable the use of AA during codegen.
  --runtime-counter-relocation                               - Enable relocating counters at runtime.
  --safepoint-ir-verifier-print-only                         - 
  --sample-profile-check-record-coverage=<N>                 - Emit a warning if less than N% of records in the input profile are matched to the IR.
  --sample-profile-check-sample-coverage=<N>                 - Emit a warning if less than N% of samples in the input profile are matched to the IR.
  --sample-profile-max-propagate-iterations=<uint>           - Maximum number of iterations to go through when propagating sample block/edge weights through the CFG.
  --sampled-instr-burst-duration=<uint>                      - Set the profile instrumentation burst duration, which can range from 1 to the value of 'sampled-instr-period' (0 is invalid). This number of samples will be recorded for each 'sampled-instr-period' count update. Setting to 1 enables simple sampling, in which case it is recommended to set 'sampled-instr-period' to a prime number.
  --sampled-instr-period=<uint>                              - Set the profile instrumentation sample period. A sample period of 0 is invalid. For each sample period, a fixed number of consecutive samples will be recorded. The number is controlled by 'sampled-instr-burst-duration' flag. The default sample period of 65536 is optimized for generating efficient code that leverages unsigned short integer wrapping in overflow, but this is disabled under simple sampling (burst duration = 1).
  --sampled-instrumentation                                  - Do PGO instrumentation sampling
  --sanitizer-early-opt-ep                                   - Insert sanitizers on OptimizerEarlyEP.
  --save-temp-labels                                         - Don't discard temporary labels
  --skip-ret-exit-block                                      - Suppress counter promotion if exit blocks contain ret.
  --speculative-counter-promotion-max-exiting=<uint>         - The max number of exiting blocks of a loop to allow  speculative counter promotion
  --speculative-counter-promotion-to-loop                    - When the option is false, if the target block is in a loop, the promotion will be disallowed unless the promoted counter  update can be further/iteratively promoted into an acyclic  region.
  --summary-file=<string>                                    - The summary file to use for function importing.
  --sve-tail-folding=<string>                                - Control the use of vectorisation using tail-folding for SVE where the option is specified in the form (Initial)[+(Flag1|Flag2|...)]:
                                                               disabled      (Initial) No loop types will vectorize using tail-folding
                                                               default       (Initial) Uses the default tail-folding settings for the target CPU
                                                               all           (Initial) All legal loop types will vectorize using tail-folding
                                                               simple        (Initial) Use tail-folding for simple loops (not reductions or recurrences)
                                                               reductions    Use tail-folding for loops containing reductions
                                                               noreductions  Inverse of above
                                                               recurrences   Use tail-folding for loops containing fixed order recurrences
                                                               norecurrences Inverse of above
                                                               reverse       Use tail-folding for loops requiring reversed predicates
                                                               noreverse     Inverse of above
  --tail-predication=<value>                                 - MVE tail-predication pass options
    =disabled                                                -   Don't tail-predicate loops
    =enabled-no-reductions                                   -   Enable tail-predication, but not for reduction loops
    =enabled                                                 -   Enable tail-predication, including reduction loops
    =force-enabled-no-reductions                             -   Enable tail-predication, but not for reduction loops, and force this which might be unsafe
    =force-enabled                                           -   Enable tail-predication, including reduction loops, and force this which might be unsafe
  --target-abi=<string>                                      - The name of the ABI to be targeted from the backend.
  --thinlto-assume-merged                                    - Assume the input has already undergone ThinLTO function importing and the other pre-optimization pipeline changes.
  --ubsan-guard-checks                                       - Guard UBSAN checks with `llvm.allow.ubsan.check()`.
  --verify-legalizer-debug-locs=<value>                      - Verify that debug locations are handled
    =none                                                    -   No verification
    =legalizations                                           -   Verify legalizations
    =legalizations+artifactcombiners                         -   Verify legalizations and artifact combines
  --verify-region-info                                       - Verify region info (time consuming)
  --vp-counters-per-site=<number>                            - The average number of profile counters allocated per value profiling site.
  --vp-static-alloc                                          - Do static counter allocation for value profiler
  --wholeprogramdevirt-cutoff=<uint>                         - Max number of devirtualizations for devirt module pass
  --x86-align-branch=<string>                                - Specify types of branches to align (plus separated list of types):
                                                               jcc      indicates conditional jumps
                                                               fused    indicates fused conditional jumps
                                                               jmp      indicates direct unconditional jumps
                                                               call     indicates direct and indirect calls
                                                               ret      indicates rets
                                                               indirect indicates indirect unconditional jumps
  --x86-align-branch-boundary=<uint>                         - Control how the assembler should align branches with NOP. If the boundary's size is not 0, it should be a power of 2 and no less than 32. Branches will be aligned to prevent from being across or against the boundary of specified size. The default value 0 does not align branches.
  --x86-branches-within-32B-boundaries                       - Align selected instructions to mitigate negative performance impact of Intel's micro code update for errata skx102.  May break assumptions about labels corresponding to particular instructions, and should be used with caution.
  --x86-enable-apx-for-relocation                            - Enable APX features (EGPR, NDD and NF) for instructions with relocations on x86-64 ELF
  --x86-pad-max-prefix-size=<uint>                           - Maximum number of prefixes to use for padding
  --x86-relax-relocations                                    - Emit GOTPCRELX/REX_GOTPCRELX/CODE_4_GOTPCRELX instead of GOTPCREL on x86-64 ELF
  --x86-sse2avx                                              - Specify that the assembler should encode SSE instructions with VEX prefix

Generic Options:

  --help                                                     - Display available options (--help-hidden for more)
  --help-list                                                - Display list of available options (--help-list-hidden for more)
  --version                                                  - Display the version of this program

IR2Vec Options:

  --ir2vec-arg-weight=<number>                               - Weight for argument embeddings
  --ir2vec-kind=<value>                                      - IR2Vec embedding kind
    =symbolic                                                -   Generate symbolic embeddings
    =flow-aware                                              -   Generate flow-aware embeddings
  --ir2vec-opc-weight=<number>                               - Weight for opcode embeddings
  --ir2vec-type-weight=<number>                              - Weight for type embeddings
  --ir2vec-vocab-path=<string>                               - Path to the vocabulary file for IR2Vec

Polly Options:
Configure the polly loop optimizer

  --polly                                                    - Enable the polly optimizer (with -O1, -O2 or -O3)
  --polly-2nd-level-tiling                                   - Enable a 2nd level loop of loop tiling
  --polly-annotate-metadata-vectorize                        - Append vectorize enable/disable metadata from polly
  --polly-ast-print-accesses                                 - Print memory access functions
  --polly-context=<isl parameter set>                        - Provide additional constraints on the context parameters
  --polly-dce-precise-steps=<int>                            - The number of precise steps between two approximating iterations. (A value of -1 schedules another approximation stage before the actual dead code elimination.
  --polly-delicm-max-ops=<int>                               - Maximum number of isl operations to invest for lifetime analysis; 0=no limit
  --polly-detect-full-functions                              - Allow the detection of full functions
  --polly-dump-after                                         - Dump module after Polly transformations into a file suffixed with "-after"
  --polly-dump-after-file=<string>                           - Dump module after Polly transformations to the given file
  --polly-dump-before                                        - Dump module before Polly transformations into a file suffixed with "-before"
  --polly-dump-before-file=<string>                          - Dump module before Polly transformations to the given file
  --polly-enable-simplify                                    - Simplify SCoP after optimizations
  --polly-ignore-func=<string>                               - Ignore functions that match a regex. Multiple regexes can be comma separated. Scop detection will ignore all functions that match ANY of the regexes provided.
  --polly-isl-arg=<argument>                                 - Option passed to ISL
  --polly-matmul-opt                                         - Perform optimizations of matrix multiplications based on pattern matching
  --polly-on-isl-error-abort                                 - Abort if an isl error is encountered
  --polly-only-func=<string>                                 - Only run on functions that match a regex. Multiple regexes can be comma separated. Scop detection will run on all functions that match ANY of the regexes provided.
  --polly-only-region=<identifier>                           - Only run on certain regions (The provided identifier must appear in the name of the region's entry block
  --polly-only-scop-detection                                - Only run scop detection, but no other optimizations
  --polly-optimized-scops                                    - Polly - Dump polyhedral description of Scops optimized with the isl scheduling optimizer and the set of post-scheduling transformations is applied on the schedule tree
  --polly-parallel                                           - Generate thread parallel code (isl codegen only)
  --polly-parallel-force                                     - Force generation of thread parallel code ignoring any cost model
  --polly-pattern-matching-based-opts                        - Perform optimizations based on pattern matching
  --polly-postopts                                           - Apply post-rescheduling optimizations such as tiling (requires -polly-reschedule)
  --polly-pragma-based-opts                                  - Apply user-directed transformation from metadata
  --polly-pragma-ignore-depcheck                             - Skip the dependency check for pragma-based transformations
  --polly-process-unprofitable                               - Process scops that are unlikely to benefit from Polly optimizations.
  --polly-register-tiling                                    - Enable register tiling
  --polly-report                                             - Print information about the activities of Polly
  --polly-reschedule                                         - Optimize SCoPs using ISL
  --polly-show                                               - Highlight the code regions that will be optimized in a (CFG BBs and LLVM-IR instructions)
  --polly-show-only                                          - Highlight the code regions that will be optimized in a (CFG only BBs)
  --polly-stmt-granularity=<value>                           - Algorithm to use for splitting basic blocks into multiple statements
    =bb                                                      -   One statement per basic block
    =scalar-indep                                            -   Scalar independence heuristic
    =store                                                   -   Store-level granularity
  --polly-tc-opt                                             - Perform optimizations of tensor contractions based on pattern matching
  --polly-tiling                                             - Enable loop tiling
  --polly-vectorizer=<value>                                 - Select the vectorization strategy
    =none                                                    -   No Vectorization
    =stripmine                                               -   Strip-mine outer loops for the loop-vectorizer to trigger
```
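
To illustrate why the listing above mixes in options from unrelated components, here is a minimal, self-contained sketch of the global-registration pattern. It does not use the real cl::opt API: `ToyOption`, `globalRegistry`, and `helpOptionNames` are hypothetical names made up for this example, and `profdata-own-flag` is a placeholder rather than a real llvm-profdata option. The other two option names are taken from the output above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct ToyOption;

// Hypothetical stand-in for a process-wide option registry.
// A function-local static avoids static-initialization-order problems.
std::vector<const ToyOption *> &globalRegistry() {
  static std::vector<const ToyOption *> Registry;
  return Registry;
}

// Toy option type: constructing one registers it globally, no matter
// which tool or library the declaration lives in.
struct ToyOption {
  std::string Name;
  std::string Owner;
  ToyOption(std::string N, std::string O)
      : Name(std::move(N)), Owner(std::move(O)) {
    assert(!Name.empty() && "option needs a name");
    globalRegistry().push_back(this);
  }
};

// File-scope options, as different components might declare them.
// In a multicall binary all of these end up linked into one executable.
static ToyOption NeonSyntax("aarch64-neon-syntax", "backend");
static ToyOption PollyTiling("polly-tiling", "polly");
static ToyOption ProfdataFlag("profdata-own-flag", "profdata"); // placeholder

// A help printer that walks the global registry cannot tell which
// options belong to the tool that was actually invoked.
std::vector<std::string> helpOptionNames() {
  std::vector<std::string> Names;
  for (const ToyOption *Opt : globalRegistry())
    Names.push_back(Opt->Name);
  return Names;
}
```

In this sketch, `helpOptionNames()` returns all three names regardless of which tool's entry point calls it, which mirrors the `--help` behavior shown above. A per-tool table, the approach OptTable takes, avoids this because each tool only consults its own table.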

https://github.com/llvm/llvm-project/pull/162191

