[llvm] [BOLT][NFC] Add documentation on BOLT options (PR #92117)

via llvm-commits llvm-commits at lists.llvm.org
Tue May 14 06:37:45 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-bolt

Author: Elvina Yakubova (ElvinaYakubova)

<details>
<summary>Changes</summary>

Add a Markdown (.md) documentation file listing all BOLT options so that they can be displayed more conveniently.

---

Patch is 29.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92117.diff


1 Files Affected:

- (added) bolt/docs/CommandLineArgumentReference.md (+1205) 


``````````diff
diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
new file mode 100644
index 0000000000000..6624c2a28f007
--- /dev/null
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -0,0 +1,1205 @@
+# BOLT - a post-link optimizer developed to speed up large applications
+
+## SYNOPSIS
+
+`llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata [options]`
+
+## OPTIONS
+
+### Generic options
+
+- `-h`
+
+  Alias for `--help`
+
+- `--help`
+
+  Display available options (`--help-hidden` for more).
+
+- `--help-hidden`
+
+  Display all available options.
+
+- `--help-list`
+
+  Display list of available options (`--help-list-hidden` for more).
+
+- `--help-list-hidden`
+
+  Display list of all available options.
+
+- `--print-all-options`
+
+  Print all option values after command line parsing.
+
+- `--print-options`
+
+  Print non-default options after command line parsing.
+
+- `--version`
+
+  Display the version of this program.
+
+### Output options
+
+- `-o <string>`
+
+  Output file
+
+- `-w <string>`
+
+  Save recorded profile to a file
+
+### BOLT generic options
+
+- `--align-text=<uint>`
+
+  Alignment of .text section
+
+- `--allow-stripped`
+
+  Allow processing of stripped binaries
+
+- `--asm-dump[=<dump folder>]`
+
+  Dump function into assembly
+
+- `-b`
+
+  Alias for --data
+
+- `--bolt-id=<string>`
+
+  Add any string to tag this execution in the output binary via bolt info section
+
+- `--break-funcs=<func1,func2,func3,...>`
+
+  List of functions to core dump on (debugging)
+
+- `--check-encoding`
+
+  Perform verification of LLVM instruction encoding/decoding. Every instruction
+  in the input is decoded and re-encoded. If the resulting bytes do not match
+  the input, a warning message is printed.
+
+- `--cu-processing-batch-size=<uint>`
+
+  Specifies the size of batches for processing CUs. Higher number has better
+  performance, but more memory usage. Default value is 1.
+
+- `--data=<string>`
+
+  Profile data file (`<data file>`)
+
+- `--debug-skeleton-cu`
+
+  Prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.
+
+- `--deterministic-debuginfo`
+
+  Disables parallel execution of tasks that may produce nondeterministic debug info
+
+- `--dot-tooltip-code`
+
+  Add basic block instructions as tool tips on nodes
+
+- `--dump-cg=<string>`
+
+  Dump callgraph to the given file
+
+- `--dump-data`
+
+  Dump parsed bolt data for debugging
+
+- `--dump-dot-all`
+
+  Dump function CFGs to graphviz format after each stage; enable `--print-loops`
+  for color-coded blocks
+
+- `--dump-orc`
+
+  Dump raw ORC unwind information (sorted)
+
+- `--dwarf-output-path=<string>`
+
+  Path where .dwo files or the dwp file will be written.
+
+- `--dwp=<string>`
+
+  Path and name of the DWP file.
+
+- `--dyno-stats`
+
+  Print execution info based on profile
+
+- `--dyno-stats-all`
+
+  Print dyno stats after each stage
+
+- `--dyno-stats-scale=<uint>`
+
+  Scale to be applied while reporting dyno stats
+
+- `--enable-bat`
+
+  Write BOLT Address Translation tables
+
+- `--force-data-relocations`
+
+  Force relocations to data sections to always be processed
+
+- `--force-patch`
+
+  Force patching of original entry points
+
+- `--funcs=<func1,func2,func3,...>`
+
+  Limit optimizations to functions from the list
+
+- `--funcs-file=<string>`
+
+  File with list of functions to optimize
+
+- `--funcs-file-no-regex=<string>`
+
+  File with list of functions to optimize (non-regex)
+
+- `--funcs-no-regex=<func1,func2,func3,...>`
+
+  Limit optimizations to functions from the list (non-regex)
+
+- `--hot-data`
+
+  Hot data symbols support (relocation mode)
+
+- `--hot-functions-at-end`
+
+  If `--reorder-functions` is used, order functions with the hottest last
+
+- `--hot-text`
+
+  Generate hot text symbols. Apply this option to a precompiled binary that
+  manually calls into hugify, so that at runtime the hugify call puts hot
+  code into 2M pages. This requires relocation.
+
+- `--hot-text-move-sections=<sec1,sec2,sec3,...>`
+
+  List of sections containing functions used for hugifying hot text. BOLT makes
+  sure these functions are not placed on the same page as the hot text.
+  (default='.stub,.mover').
+
+- `--insert-retpolines`
+
+  Run retpoline insertion pass
+
+- `--keep-aranges`
+
+  Keep or generate .debug_aranges section if .gdb_index is written
+
+- `--keep-tmp`
+
+  Preserve intermediate .o file
+
+- `--lite`
+
+  Skip processing of cold functions
+
+- `--max-data-relocations=<uint>`
+
+  Maximum number of data relocations to process
+
+- `--max-funcs=<uint>`
+
+  Maximum number of functions to process
+
+- `--no-huge-pages`
+
+  Use regular size pages for code alignment
+
+- `--no-threads`
+
+  Disable multithreading
+
+- `--pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>`
+
+  List of functions to pad, each with the given number of bytes
+
+- `--print-aliases`
+
+  Print aliases when printing objects
+
+- `--print-all`
+
+  Print functions after each stage
+
+- `--print-cfg`
+
+  Print functions after CFG construction
+
+- `--print-debug-info`
+
+  Print debug info when printing functions
+
+- `--print-disasm`
+
+  Print function after disassembly
+
+- `--print-dyno-opcode-stats=<uint>`
+
+  Print per-instruction opcode dyno stats and the function names:BB offsets of
+  the nth highest execution counts
+
+- `--print-dyno-stats-only`
+
+  While printing functions, output dyno-stats and skip instructions
+
+- `--print-exceptions`
+
+  Print exception handling data
+
+- `--print-globals`
+
+  Print global symbols after disassembly
+
+- `--print-jump-tables`
+
+  Print jump tables
+
+- `--print-loops`
+
+  Print loop related information
+
+- `--print-mem-data`
+
+  Print memory data annotations when printing functions
+
+- `--print-normalized`
+
+  Print functions after CFG is normalized
+
+- `--print-only=<func1,func2,func3,...>`
+
+  List of functions to print
+
+- `--print-orc`
+
+  Print ORC unwind information for instructions
+
+- `--print-profile`
+
+  Print functions after attaching profile
+
+- `--print-profile-stats`
+
+  Print profile quality/bias analysis
+
+- `--print-pseudo-probes=<value>`
+
+  Print pseudo probe info
+  - `=decode`: decode probes section from binary
+  - `=address_conversion`: update address2ProbesMap with output block address
+  - `=encoded_probes`: display the encoded probes in binary section
+  - `=all`: enable all debugging printout
+
+- `--print-relocations`
+
+  Print relocations when printing functions/objects
+
+- `--print-reordered-data`
+
+  Print section contents after reordering
+
+- `--print-retpoline-insertion`
+
+  Print functions after retpoline insertion pass
+
+- `--print-sdt`
+
+  Print all SDT markers
+
+- `--print-sections`
+
+  Print all registered sections
+
+- `--print-unknown`
+
+  Print names of functions with unknown control flow
+
+- `--profile-format=<value>`
+
+  Format to dump profile output in aggregation mode, default is fdata
+  - `=fdata`: offset-based plaintext format
+  - `=yaml`: dense YAML representation
+
+- `--r11-availability=<value>`
+
+  Determine the availability of r11 before indirect branches
+  - `=never`: r11 not available
+  - `=always`: r11 available before calls and jumps
+  - `=abi`: r11 available before calls but not before jumps
+
+- `--relocs`
+
+  Use relocations in the binary (default=autodetect)
+
+- `--remove-symtab`
+
+  Remove .symtab section
+
+- `--reorder-skip-symbols=<symbol1,symbol2,symbol3,...>`
+
+  List of symbol names that cannot be reordered
+
+- `--reorder-symbols=<symbol1,symbol2,symbol3,...>`
+
+  List of symbol names that can be reordered
+
+- `--retpoline-lfence`
+
+  Determine if lfence instruction should exist in the retpoline
+
+- `--skip-funcs=<func1,func2,func3,...>`
+
+  List of functions to skip
+
+- `--skip-funcs-file=<string>`
+
+  File with list of functions to skip
+
+- `--strict`
+
+  Trust the input to be from a well-formed source
+
+- `--tasks-per-thread=<uint>`
+
+  Number of tasks to be created per thread
+
+- `--thread-count=<uint>`
+
+  Number of threads
+
+- `--time-build`
+
+  Print time spent constructing binary functions
+
+- `--time-rewrite`
+
+  Print time spent in rewriting passes
+
+- `--top-called-limit=<uint>`
+
+  Maximum number of functions to print in top called functions section
+
+- `--trap-avx512`
+
+  In relocation mode trap upon entry to any function that uses AVX-512 instructions
+
+- `--trap-old-code`
+
+  Insert traps in old function bodies (relocation mode)
+
+- `--update-debug-sections`
+
+  Update DWARF debug sections of the executable
+
+- `--use-gnu-stack`
+
+  Use GNU_STACK program header for new segment (workaround for issues with
+  strip/objcopy)
+
+- `--use-old-text`
+
+  Re-use space in old .text if possible (relocation mode)
+
+- `-v <uint>`
+
+  Set verbosity level for diagnostic output
+
+- `--write-dwp`
+
+  Output a single dwarf package file (dwp) instead of multiple non-relocatable
+  dwarf object files (dwo).
+
+### BOLT optimization options
+
+- `--align-blocks`
+
+  Align basic blocks
+
+- `--align-blocks-min-size=<uint>`
+
+  Minimal size of the basic block that should be aligned
+
+- `--align-blocks-threshold=<uint>`
+
+  Align only blocks whose frequency exceeds the given percentage of the
+  containing function's execution frequency. E.g., 1000 means aligning blocks
+  that are executed 10 times more frequently than the containing function.
+
+- `--align-functions=<uint>`
+
+  Align functions at a given value (relocation mode)
+
+- `--align-functions-max-bytes=<uint>`
+
+  Maximum number of bytes to use to align functions
+
+- `--assume-abi`
+
+  Assume the ABI is never violated
+
+- `--block-alignment=<uint>`
+
+  Boundary to use for alignment of basic blocks
+
+- `--bolt-seed=<uint>`
+
+  Seed for randomization
+
+- `--cg-from-perf-data`
+
+  Use perf data directly when constructing the call graph for stale functions
+
+- `--cg-ignore-recursive-calls`
+
+  Ignore recursive calls when constructing the call graph
+
+- `--cg-use-split-hot-size`
+
+  Use hot/cold data on basic blocks to determine hot sizes for call graph functions
+
+- `--cold-threshold=<uint>`
+
+  Tenths of a percent of the main entry frequency to use as a threshold when
+  evaluating whether a basic block is cold (0 means it is only considered
+  cold if the block has zero samples). Default: 0
+
+- `--elim-link-veneers`
+
+  Run veneer elimination pass
+
+- `--eliminate-unreachable`
+
+  Eliminate unreachable code
+
+- `--equalize-bb-counts`
+
+  Use same count for BBs that should have equivalent count (used in non-LBR
+  and shrink wrapping)
+
+- `--execution-count-threshold=<uint>`
+
+  Perform profiling accuracy-sensitive optimizations only if function execution
+  count >= the threshold (default: 0)
+
+- `--fix-block-counts`
+
+  Adjust block counts based on outgoing branch counts
+
+- `--fix-func-counts`
+
+  Adjust function counts based on basic blocks execution count
+
+- `--force-inline=<func1,func2,func3,...>`
+
+  List of functions to always consider for inlining
+
+- `--frame-opt=<value>`
+
+  Optimize stack frame accesses
+  - `none`: do not perform frame optimization
+  - `hot`: perform FOP on hot functions
+  - `all`: perform FOP on all functions
+
+- `--frame-opt-rm-stores`
+
+  Apply additional analysis to remove stores (experimental)
+
+- `--function-order=<string>`
+
+  File containing an ordered list of functions to use for function reordering
+
+- `--generate-function-order=<string>`
+
+  File to dump the ordered list of functions to use for function reordering
+
+- `--generate-link-sections=<string>`
+
+  Generate a list of function sections in a format suitable for inclusion in a
+  linker script
+
+- `--group-stubs`
+
+  Share stubs across functions
+
+- `--hugify`
+
+  Automatically put hot code on 2MB page(s) (hugify) at runtime. No manual call
+  to hugify is needed in the binary (which is what --hot-text relies on).
+
+- `--icf`
+
+  Fold functions with identical code
+
+- `--icp`
+
+  Alias for --indirect-call-promotion
+
+- `--icp-calls-remaining-percent-threshold=<uint>`
+
+  The percentage threshold against remaining unpromoted indirect call count
+  for the promotion for calls
+
+- `--icp-calls-topn`
+
+  Alias for --indirect-call-promotion-calls-topn
+
+- `--icp-calls-total-percent-threshold=<uint>`
+
+  The percentage threshold against total count for the promotion for calls
+
+- `--icp-eliminate-loads`
+
+  Enable load elimination using memory profiling data when performing ICP
+
+- `--icp-funcs=<func1,func2,func3,...>`
+
+  List of functions to enable ICP for
+
+- `--icp-inline`
+
+  Only promote call targets eligible for inlining
+
+- `--icp-jt-remaining-percent-threshold=<uint>`
+
+  The percentage threshold against remaining unpromoted indirect call count for
+  the promotion for jump tables
+
+- `--icp-jt-targets`
+
+  Alias for --icp-jump-tables-targets
+
+- `--icp-jt-topn`
+
+  Alias for --indirect-call-promotion-jump-tables-topn
+
+- `--icp-jt-total-percent-threshold=<uint>`
+
+  The percentage threshold against total count for the promotion for jump tables
+
+- `--icp-jump-tables-targets`
+
+  For jump tables, optimize indirect jmp targets instead of indices
+
+- `--icp-mp-threshold`
+
+  Alias for --indirect-call-promotion-mispredict-threshold
+
+- `--icp-old-code-sequence`
+
+  Use old code sequence for promoted calls
+
+- `--icp-top-callsites=<uint>`
+
+  Optimize hottest calls until at least this percentage of all indirect calls
+  frequency is covered. 0 = all callsites
+
+- `--icp-topn`
+
+  Alias for --indirect-call-promotion-topn
+
+- `--icp-use-mp`
+
+  Alias for --indirect-call-promotion-use-mispredicts
+
+- `--indirect-call-promotion=<value>`
+
+  Indirect call promotion
+  - `none`: do not perform indirect call promotion
+  - `calls`: perform ICP on indirect calls
+  - `jump-tables`: perform ICP on jump tables
+  - `all`: perform ICP on calls and jump tables
+
+- `--indirect-call-promotion-calls-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion on
+  calls. 0 = no limit
+
+- `--indirect-call-promotion-jump-tables-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion on
+  jump tables. 0 = no limit
+
+- `--indirect-call-promotion-mispredict-threshold=<uint>`
+
+  Misprediction threshold for skipping ICP on an indirect call
+
+- `--indirect-call-promotion-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion.
+  0 = no limit
+
+- `--indirect-call-promotion-use-mispredicts`
+
+  Use misprediction frequency for determining whether or not ICP should be
+  applied at a callsite. The `--indirect-call-promotion-mispredict-threshold`
+  value will be used by this heuristic
+
+- `--infer-fall-throughs`
+
+  Infer execution count for fall-through blocks
+
+- `--infer-stale-profile`
+
+  Infer counts from stale profile data.
+
+- `--inline-all`
+
+  Inline all functions
+
+- `--inline-ap`
+
+  Adjust function profile after inlining
+
+- `--inline-limit=<uint>`
+
+  Maximum number of call sites to inline
+
+- `--inline-max-iters=<uint>`
+
+  Maximum number of inline iterations
+
+- `--inline-memcpy`
+
+  Inline memcpy using 'rep movsb' instruction (X86-only)
+
+- `--inline-small-functions`
+
+  Inline functions if the increase in size is less than defined by `--inline-small-functions-bytes`
+
+- `--inline-small-functions-bytes=<uint>`
+
+  Max number of bytes for the function to be considered small for inlining purposes
+
+- `--instrument`
+
+  Instrument code to generate accurate profile data
+
+- `--iterative-guess`
+
+  In non-LBR mode, guess edge counts using iterative technique
+
+- `--jt-footprint-optimize-for-icache`
+
+  With jt-footprint-reduction, only process PIC jump tables and turn off other
+  transformations that increase code size
+
+- `--jt-footprint-reduction`
+
+  Make jump tables smaller at the cost of using more instructions at jump
+  sites
+
+- `--jump-tables=<value>`
+
+  Jump tables support (default=basic)
+  - `none`: do not optimize functions with jump tables
+  - `basic`: optimize functions with jump tables
+  - `move`: move jump tables to a separate section
+  - `split`: split jump tables section into hot and cold based on function
+  execution frequency
+  - `aggressive`: aggressively split jump tables section based on usage of the
+  tables
+
+- `--keep-nops`
+
+  Keep no-op instructions. By default they are removed.
+
+- `--lite-threshold-count=<uint>`
+
+  Similar to `--lite-threshold-pct` but specify threshold using absolute function
+  call count. I.e. limit processing to functions executed at least the specified
+  number of times.
+
+- `--lite-threshold-pct=<uint>`
+
+  Threshold (in percent) for selecting functions to process in lite mode. Higher
+  threshold means fewer functions to process. E.g., a threshold of 90 means only
+  the top 10 percent of functions with profile will be processed.
+
+- `--mcf-use-rarcs`
+
+  In MCF, consider the possibility of cancelling flow to balance edges
+
+- `--memcpy1-spec=<func1,func2:cs1:cs2,func3:cs1,...>`
+
+  List of functions with call sites for which to specialize memcpy() for size 1
+
+- `--min-branch-clusters`
+
+  Use a modified clustering algorithm geared towards minimizing branches
+
+- `--no-inline`
+
+  Disable all inlining (overrides other inlining options)
+
+- `--no-scan`
+
+  Do not scan cold functions for external references (may result in slower binary)
+
+- `--peepholes=<value>`
+
+  Enable peephole optimizations
+  - `none`: disable peepholes
+  - `double-jumps`: remove double jumps when able
+  - `tailcall-traps`: insert tail call traps
+  - `useless-branches`: remove useless conditional branches
+  - `all`: enable all peephole optimizations
+
+- `--plt=<value>`
+
+  Optimize PLT calls (requires linking with -znow)
+  - `none`: do not optimize PLT calls
+  - `hot`: optimize executed (hot) PLT calls
+  - `all`: optimize all PLT calls
+
+- `--preserve-blocks-alignment`
+
+  Try to preserve basic block alignment
+
+- `--print-after-branch-fixup`
+
+  Print function after fixing local branches
+
+- `--print-after-jt-footprint-reduction`
+
+  Print function after jt-footprint-reduction pass
+
+- `--print-after-lowering`
+
+  Print function after instruction lowering
+
+- `--print-cache-metrics`
+
+  Calculate and print various metrics for instruction cache
+
+- `--print-clusters`
+
+  Print clusters
+
+- `--print-finalized`
+
+  Print function after CFG is finalized
+
+- `--print-fix-relaxations`
+
+  Print functions after fix relaxations pass
+
+- `--print-fix-riscv-calls`
+
+  Print functions after fix RISCV calls pass
+
+- `--print-fop`
+
+  Print functions after frame optimizer pass
+
+- `--print-function-statistics=<uint>`
+
+  Print statistics about basic block ordering
+
+- `--print-icf`
+
+  Print functions after ICF optimization
+
+- `--print-icp`
+
+  Print functions after indirect call promotion
+
+- `--print-inline`
+
+  Print functions after inlining optimization
+
+- `--print-longjmp`
+
+  Print functions after longjmp pass
+
+- `--print-optimize-bodyless`
+
+  Print functions after bodyless optimization
+
+- `--print-output-address-range`
+
+  Print output address range for each basic block in the function
+  when BinaryFunction::print is called
+
+- `--print-peepholes`
+
+  Print functions after peephole optimization
+
+- `--print-plt`
+
+  Print functions after PLT optimization
+
+- `--print-regreassign`
+
+  Print functions after regreassign pass
+
+- `--print-reordered`
+
+  Print functions after layout optimization
+
+- `--print-reordered-functions`
+
+  Print functions after clustering
+
+- `--print-sctc`
+
+  Print functions after conditional tail call simplification
+
+- `--print-simplify-rodata-loads`
+
+  Print functions after simplification of RO data loads
+
+- `--print-sorted-by=<value>`
+
+  Print functions sorted by order of dyno stats
+  - `executed-forward-branches`: executed forward branches
+  - `taken-forward-branches`: taken forward branches
+  - `executed-backward-branches`: executed backward branches
+  - `taken-backward-branches`: taken backward branches
+  - `executed-unconditional-branches`: executed unconditional branches
+  - `all-function-calls`: all function calls
+  - `indirect-calls`: indirect calls
+  - `PLT-calls`: PLT calls
+  - `executed-instructions`: executed instructions
+  - `executed-load-instructions`: executed load instructions
+  - `e...
[truncated]

``````````
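
The percentage-based thresholds described for `--align-blocks-threshold` and `--lite-threshold-pct` can be easy to misread, so here is a minimal sketch of the arithmetic as the option text words it. The function names, the strict `>` comparison, and the rounding-down of the kept count are assumptions made for illustration; this is not BOLT's implementation.

```python
from typing import Dict, List


def should_align_block(block_count: int, function_count: int, threshold_pct: int) -> bool:
    """Hypothetical helper: is the block hot enough relative to its function?

    threshold_pct is a percentage of the function's execution count, so 1000
    means the block must run more than 10 times as often as the function.
    """
    # Integer form of block_count / function_count > threshold_pct / 100.
    return block_count * 100 > function_count * threshold_pct


def lite_mode_selection(profiled: Dict[str, int], threshold_pct: int) -> List[str]:
    """Hypothetical helper: keep the hottest (100 - threshold_pct) percent.

    With threshold_pct = 90, only the top 10 percent of profiled functions
    remain. Rounding down is an assumption of this sketch.
    """
    ranked = sorted(profiled, key=lambda name: profiled[name], reverse=True)
    keep = len(ranked) * (100 - threshold_pct) // 100
    return ranked[:keep]
```

For example, with a threshold of 1000 a block executed 10001 times inside a function executed 1000 times qualifies for alignment, while one executed exactly 10000 times does not under the strict comparison assumed here.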

</details>
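
For readers who script BOLT runs, the SYNOPSIS line in the new file (`llvm-bolt <executable> -o <executable>.bolt -data=perf.fdata [options]`) can be assembled programmatically. The sketch below only builds the argument list and does not invoke `llvm-bolt`. The helper name `bolt_command` is hypothetical, and the options in the example are taken from the reference purely for illustration, not as a recommended configuration.

```python
import shlex
from typing import List, Optional


def bolt_command(executable: str, profile: str,
                 options: Optional[List[str]] = None) -> List[str]:
    """Build an argv list that mirrors the SYNOPSIS in the option reference."""
    argv = ["llvm-bolt", executable, "-o", executable + ".bolt", "-data=" + profile]
    # Extra options are appended verbatim, in the order given.
    argv.extend(options or [])
    return argv


if __name__ == "__main__":
    cmd = bolt_command(
        "app",          # hypothetical input binary name
        "perf.fdata",   # profile file name used in the SYNOPSIS
        ["--lite", "--icf", "--indirect-call-promotion=all",
         "--peepholes=all", "--dyno-stats"],
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the result directly usable with `subprocess.run` without shell quoting concerns.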


https://github.com/llvm/llvm-project/pull/92117

