[flang-commits] [flang] [flang] flang manpage overhaul (PR #144948)

Sun Jun 22 13:54:01 PDT 2025

pawosm-arm wrote:

This is how it looks after the recent changes:

```
FLANG(1)                                                                                                                                                Flang                                                                                                                                               FLANG(1)

NAME
       flang - Flang Documentation (In Progress)

SYNOPSIS
       flang [options] filename ...

DESCRIPTION
       flang  is  a Fortran compiler which supports all of the Fortran 95 and many newer language features. Flang supports OpenMP and has some support for OpenACC and CUDA. It encompasses preprocessing, parsing, optimization, code generation, assembly, and linking.  Depending on the options passed in, Flang
       will perform only some, or all, of these actions. While Flang is highly integrated, it is important to understand the stages of compilation in order to understand how to invoke it.  These stages are:

       Driver The flang executable is actually a small driver that orchestrates the execution of other tools such as the compiler, assembler and linker.  Typically you do not need to interact with the driver, but you transparently use it to run the other tools.

       Preprocessing
              This stage performs tokenization of the input source file, macro expansion, #include expansion and handles other preprocessor directives.

       Parsing and Semantic Analysis
              This stage parses the input file, translating preprocessor tokens into a parse tree.  Once in the form of a parse tree, it applies semantic analysis to compute types for expressions and determine whether the code is well formed. Parse errors and most compiler warnings  are  generated  by  this
              stage.

       Code Generation and Optimization
              This stage translates the parse tree into intermediate code (known as "LLVM IR") and, ultimately, machine code.  It also optimizes this intermediate code and handles target-specific code generation. The output of this stage is typically a ".s" file, referred to as an "assembly" file.

              Flang also supports the use of an integrated assembler, in which the code generator produces object files directly. This avoids the overhead of generating the ".s" file and calling the target assembler explicitly.

       Assembler
              This stage runs the target assembler to translate the output of the compiler into a target object file. The output of this stage is typically a ".o" file, referred to as an "object" file.

       Linker This stage runs the target linker to merge multiple object files into an executable or dynamic library. The output of this stage is typically an "a.out", ".dylib" or ".so" file.

OPTIONS
       -B<prefix>

       Search $prefix$file for executables, libraries, and data files. If $prefix is a directory, search $prefix/$file

       -Qunused-arguments

       Don't emit warning for unused driver arguments

       --config=<file>, --config <arg>

       Specify configuration file

       -dumpmachine

       Display the compiler's target processor

       -dumpversion

       Display the version of the compiler

       -fdo-concurrent-to-openmp=<arg>

       Try to map `do concurrent` loops to OpenMP [none|host|device]. <arg> must be 'none', 'host' or 'device'.

       -finit-global-zero, -fno-init-global-zero

       Zero initialize globals without default initialization (default)

       -fopenmp-targets=<arg1>,<arg2>...

       Specify a comma-separated list of triples for the OpenMP offloading targets to be supported

       -fppc-native-vector-element-order, -fno-ppc-native-vector-element-order

       Specifies PowerPC native vector element order (default)

       -frepack-arrays, -fno-repack-arrays

       Create  temporary  copies  of  non-contiguous assumed shape dummy arrays in subprogram prologues, and destroy them in subprogram epilogues.  The temporary copy is initialized with values from the original array in the prologue, if needed. In the epilogue, the current values in the temporary array are
       copied into the original array, if needed.

       Accessing the contiguous temporary in the program code may result in faster execution compared to accessing elements of the original array when they are sparse in memory. At the same time, the overhead of copying values between the original and the temporary arrays may be significant, which may
       slow down some programs.

       Enabling array repacking may also change the behavior of certain programs:

       • The  copy  actions may introduce a data race in valid OpenACC/OpenMP programs.  For example, if different threads execute the same subprogram with a non-contiguous assumed shape dummy array, and the different threads access unrelated parts of the array, then the whole array copy made in each thread
         will cause a data race.

       • OpenACC/OpenMP offload programs may behave incorrectly with regard to the device data environment, because the original array and the temporary may have different presence status on the device.

       • IS_CONTIGUOUS intrinsic may return TRUE with the array repacking enabled, whereas it would return FALSE with the repacking disabled.

       • The result of LOC intrinsic applied to an actual argument associated with a non-contiguous assumed shape dummy array may be different from the result of LOC applied to the dummy array.

       -frtlib-add-rpath, -fno-rtlib-add-rpath

       Add -rpath with architecture-specific resource directory to the linker flags. When --hip-link is specified, also add -rpath with HIP runtime library directory to the linker flags

       -fsave-main-program, -fno-save-main-program

       Place all main program variables in static memory (otherwise scalars may be placed on the stack)

       -fstack-arrays, -fno-stack-arrays

       Attempt to allocate array temporaries on the stack, no matter their size

       -fstack-repack-arrays, -fno-stack-repack-arrays

       Controls whether the array temporaries created under -frepack-arrays are allocated on the stack or on the heap.

       By default, the heap is used. Allocations of polymorphic types are always done on the heap, though this may change in future releases.

       -fversion-loops-for-stride, -fno-version-loops-for-stride

       Create unit-strided versions of loops

       --gcc-install-dir=<arg>

       Use GCC installation in the specified directory. The directory ends with path components like 'lib{,32,64}/gcc{,-cross}/$triple/$version'. Note: executables (e.g. ld) used by the compiler are not overridden by the selected GCC installation

       --gcc-toolchain=<arg>

       Specify a directory where Flang can find 'lib{,32,64}/gcc{,-cross}/$triple/$version'. Flang will use the GCC installation with the largest version

       -gpulibc

       Link the LLVM C Library for GPUs

       -help, --help

       Display available options

       --help-hidden

       Display help for hidden options

       -mllvm <arg>, -mllvm=<arg>

       Additional arguments to forward to LLVM's option processing

       -mmlir <arg>

       Additional arguments to forward to MLIR's option processing

       -module-dir<dir>, -J<arg>

       This option specifies where to put .mod files for compiled modules.  It is also added to the list of directories to be searched by a USE statement.  The default is the current directory.

       --no-default-config

       Disable loading default configuration files

       -nodefaultlibs

       -nogpulibc

       -nopie

       -o<file>

       Write output to <file>

       --offloadlib, --no-offloadlib

       Link device libraries for GPU device compilation

       -print-effective-triple, --print-effective-triple

       Print the effective target triple

       -print-resource-dir, --print-resource-dir

       Print the resource directory pathname that contains lib and include directories with the runtime libraries and MODULE files.

       -print-target-triple, --print-target-triple

       Print the normalized target triple

       -pthread, -no-pthread

       Support POSIX threads in generated code

       -rtlib=<arg>, --rtlib=<arg>

       Compiler runtime library to use

       -save-temps=<arg>, --save-temps=<arg>, -save-temps (equivalent to -save-temps=cwd), --save-temps (equivalent to -save-temps=cwd)

       Save intermediate compilation results. <arg> can be set to 'cwd' for current working directory, or 'obj' which will save temporary files in the same directory as the final output file

       --sysroot=<arg>

       --target=<arg>, -target <arg>

       Generate code for the given target

       -v

       Show commands to run and use verbose output

       --version

       Print version information

       -w, --no-warnings

       Suppress all warnings

       -x<language>

       Treat subsequent input files as having type <language>

   Actions
       The action to perform on the input.

       -E

       Only run the preprocessor

       -S

       Only run preprocess and compilation steps

       -c

       Only run preprocess, compile, and assemble steps

       -emit-llvm

       Use the LLVM representation for assembler and object files

       -fsyntax-only

       Run the preprocessor, parser and semantic analysis stages

   Compilation options
       Flags controlling the behavior of Flang during compilation. These flags have no effect during actions that do not perform compilation.

       -Xflang <arg>

       Pass <arg> to the flang compiler

       -moutline-atomics, -mno-outline-atomics

       Generate local calls to out-of-line atomic operations

       -print-supported-cpus, --print-supported-cpus, -mcpu=help, -mtune=help

       Print supported cpu models for the given target (if target is not specified, it will print the supported cpus for the default target)

       -std=<arg>, --std=<arg>

       Language standard to compile for

   Preprocessor options
       Flags controlling the behavior of the Flang preprocessor.

       -D<macro>=<value>

       Define <macro> to <value> (or 1 if <value> omitted)

       -P

       Disable linemarker output in -E mode

       -U<macro>

       Undefine macro <macro>

   Include path management
       Flags controlling how #includes are resolved to files.

       -I<dir>

       Add directory to include search path. For C++ inputs, if there are multiple -I options, these directories are searched in the order they are given before the standard system directories are searched. If the same directory is in the SYSTEM include search paths,  for  example  if  also  specified  with
       -isystem, the -I option will be ignored

       -isysroot<dir>

       Set the system root directory (usually /)

   Dumping preprocessor state
       Flags allowing the state of the preprocessor to be dumped in various ways.

       -dM

       Print macro definitions in -E mode instead of normal output

   Diagnostic options
       Flags controlling which warnings, errors, and remarks Flang will generate. See Clang's Diagnostic Reference for a full list of warning and remark flags.

       -R<remark>

       Enable the specified remark

       -Rpass-analysis=<arg>

       Report transformation analysis from optimization passes whose name matches the given POSIX regular expression

       -Rpass-missed=<arg>

       Report missed transformations by optimization passes whose name matches the given POSIX regular expression

       -Rpass=<arg>

       Report transformations performed by optimization passes whose name matches the given POSIX regular expression

       -W<warning>

       Enable the specified warning

   Target-independent compilation options
       -cpp

       Enable predefined and command line preprocessor macros

       -fPIC, -fno-PIC

       -fPIE, -fno-PIE

       -falternative-parameter-statement

       Enable the old style PARAMETER statement

       -fapprox-func, -fno-approx-func

       Allow certain math function calls to be replaced with an approximately equivalent calculation

       -fassociative-math, -fno-associative-math

       -fbackslash, -fno-backslash

       Specify that backslash in string introduces an escape character

       -fcolor-diagnostics, -fdiagnostics-color, -fno-color-diagnostics

       Enable colors in diagnostics

       -fconvert=<arg>

       Set endian conversion of data for unformatted files

       -fd-lines-as-code, -fno-d-lines-as-code

       Treat fixed form lines with 'd' or 'D' in the first column as blank.

       -fd-lines-as-comments, -fno-d-lines-as-comments

       Treat fixed form lines with 'd' or 'D' in the first column as comments.

       -fdefault-double-8

       Set the default double precision kind to an 8 byte wide type

       -fdefault-integer-8

       Set the default integer and logical kind to an 8 byte wide type

       -fdefault-real-8

       Set the default real kind to an 8 byte wide type

       -fdiagnostics-color=<arg>

       When to use colors in diagnostics. <arg> must be 'auto', 'always' or 'never'.

       -ffast-math, -fno-fast-math

       Allow aggressive, lossy floating-point optimizations

       -ffixed-form

       Process source files in fixed form

       -ffixed-line-length=<arg>, -ffixed-line-length-<arg>

       Set column after which characters are ignored in typical fixed-form lines in the source file

       -ffp-contract=<arg>

       Form  fused  FP  ops  (e.g. FMAs): fast (fuses across statements disregarding pragmas) | on (only fuses in the same statement unless dictated by pragmas) | off (never fuses) | fast-honor-pragmas (fuses across statements unless dictated by pragmas). Default is 'fast' for CUDA, 'fast-honor-pragmas' for
       HIP, and 'on' otherwise. <arg> must be 'fast', 'on', 'off' or 'fast-honor-pragmas'.

       -ffree-form

       Process source files in free form

       -fhermetic-module-files

       Emit hermetic module files (no nested USE association)

       -fhonor-infinities, -fno-honor-infinities

       Specify that floating-point optimizations are not allowed that assume arguments and results are not +-inf.

       -fhonor-nans, -fno-honor-nans

       Specify that floating-point optimizations are not allowed that assume arguments and results are not NANs.

       -fimplicit-none, -fno-implicit-none

       No implicit typing allowed unless overridden by IMPLICIT statements

       -fimplicit-none-ext, -fno-implicit-none-ext

       No implicit externals allowed

       -finput-charset=<arg>

       Specify the default character set for source files

       -finstrument-functions

       Generate calls to instrument function entry and exit

       -fintegrated-as, -fno-integrated-as

       Enable the integrated assembler

       -fintrinsic-modules-path <dir>

       This option specifies the location of pre-compiled intrinsic modules, if they are not in the default location expected by the compiler.

       -flarge-sizes

       Use INTEGER(KIND=8) for the result type in size-related intrinsics

       -flogical-abbreviations, -fno-logical-abbreviations

       Enable logical abbreviations

       -floop-interchange, -fno-loop-interchange

       Enable the loop interchange pass

       -flto=<arg>, -flto (equivalent to -flto=full), -flto=auto (equivalent to -flto=full), -flto=jobserver (equivalent to -flto=full)

       Set LTO mode. <arg> must be 'thin' or 'full'.

       -fms-runtime-lib=<arg>

       Specify Visual Studio C runtime library. "static" and "static_dbg" correspond to the cl flags /MT and /MTd which use the multithread, static version. "dll" and "dll_dbg" correspond to the cl flags /MD and /MDd which use the multithread, dll version. <arg> must  be  'static',  'static_dbg',  'dll'  or
       'dll_dbg'.

       -fomit-frame-pointer, -fno-omit-frame-pointer

       Omit the frame pointer from functions that don't need it. Some stack unwinding cases, such as profilers and sanitizers, may prefer specifying -fno-omit-frame-pointer. On many targets, -O1 and higher omit the frame pointer by default. -m[no-]omit-leaf-frame-pointer takes precedence for leaf functions

       -fopenacc

       Enable OpenACC

       -fopenmp, -fno-openmp

       Parse OpenMP pragmas and generate parallel code.

       -fopenmp-force-usm

       Force behavior as if the user specified pragma omp requires unified_shared_memory.

       -fopenmp-target-debug, -fno-openmp-target-debug

       Enable debugging in the OpenMP offloading device RTL

       -fopenmp-version=<arg>

       Set OpenMP version (e.g. 45 for OpenMP 4.5, 51 for OpenMP 5.1). Default value is 31 for Flang

       -fopenmp=<arg>

       -foptimization-record-file=<file>

       Specify the output name of the file containing the optimization remarks. Implies -fsave-optimization-record. On Darwin platforms, this cannot be used with multiple -arch <arch> options.

       -foptimization-record-passes=<regex>

       Only include passes which match a specified regular expression in the generated optimization record (by default, include all passes)

       -fpass-plugin=<dsopath>

       Load pass plugin from a dynamic shared object file (only with new pass manager).

       -fpic, -fno-pic

       -fpie, -fno-pie

       -fprofile-generate

       Generate instrumented code to collect execution counts into default.profraw (overridden by LLVM_PROFILE_FILE env var)

       -fprofile-use=<pathname>

       Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from <pathname>/default.profdata. Otherwise, it reads from file <pathname>.

       -frealloc-lhs, -fno-realloc-lhs

       If an allocatable left-hand side of an intrinsic assignment is unallocated or its shape/type does not match the right-hand side, then it is automatically (re)allocated

       -freciprocal-math, -fno-reciprocal-math

       Allow division operations to be reassociated

       -frecord-command-line, -fno-record-command-line

       Generate a section named ".GCC.command.line" containing the driver command-line. After linking, the section may contain multiple command lines, which will be individually terminated by null bytes. Separate arguments within a command line are combined with spaces; spaces and backslashes within an argument
       are escaped with backslashes. This format differs from the format of the equivalent section produced by GCC with the -frecord-gcc-switches flag.  This option is currently only supported on ELF targets.

       -frepack-arrays-contiguity=<arg>

       When -frepack-arrays is in effect, 'whole' enables repacking for arrays that are non-contiguous in any dimension, 'innermost' enables repacking for arrays that are non-contiguous in the innermost dimension (the default). <arg> must be 'whole' or 'innermost'.

       -fropi, -fno-ropi

       Generate read-only position independent code (ARM only)

       -frwpi, -fno-rwpi

       Generate read-write position independent code (ARM only)

       -fsave-optimization-record, -fno-save-optimization-record

       Generate a YAML optimization record file

       -fsave-optimization-record=<format>

       Generate an optimization record file in a specific format

       -fsigned-zeros, -fno-signed-zeros

       -fslp-vectorize, -fno-slp-vectorize, -ftree-slp-vectorize

       Enable the superword-level parallelism vectorization passes

       -fstrict-overflow, -fno-strict-overflow

       -ftime-report

       -ftime-report-json

       -funderscoring, -fno-underscoring

       Appends one trailing underscore to external names

       -funroll-loops, -fno-unroll-loops

       Turn on loop unroller

       -funsigned, -fno-unsigned

       Enables UNSIGNED type

       -fuse-ld=<arg>

       -fveclib=<arg>

       Use the given vector functions library. <arg> must be 'Accelerate', 'libmvec', 'MASSV', 'SVML', 'SLEEF', 'Darwin_libsystem_m', 'ArmPL', 'AMDLIBM' or 'none'.

       -fvectorize, -fno-vectorize, -ftree-vectorize

       Enable the loop vectorization passes

       -fverbose-asm, -fno-verbose-asm

       Generate verbose assembly output

       -fwrapv, -fno-wrapv

       Treat signed integer overflow as two's complement

       -fwrapv-pointer, -fno-wrapv-pointer

       Treat pointer overflow as two's complement

       -fxor-operator, -fno-xor-operator

       Enable .XOR. as a synonym of .NEQV.

       -nocpp

       Disable predefined and command line preprocessor macros

       -pedantic, --pedantic

       Warn on language extensions

   Common Offloading options
       --offload-arch=<arg>, --no-offload-arch=<arg>

       Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May  be
       specified more than once.

       --offload-device-only

       Only compile for the offloading device.

       --offload-host-device

       Compile for both the offloading host and device (default).

       --offload-host-only

       Only compile for the offloading host.

   HIP options
       --rocm-device-lib-path=<arg>

       ROCm device library path. Alternative to rocm-path.

       --rocm-path=<arg>

       ROCm installation path, used for finding and automatically linking required bitcode libraries.

   Target-dependent compilation options
       -m32

       -m64

       -mabi=<arg>

       -maix32

       -maix64

       -march=<arg>

       For a list of available architectures for the target use '-mcpu=help'

       -masm=<arg>

       -mcmodel=<arg>

       -mcode-object-version=<arg>

       Specify code object ABI version. Defaults to 6. (AMDGPU only). <arg> must be 'none', '4', '5' or '6'.

       -mcpu=<arg>

       For a list of available CPUs for the target use '-mcpu=help'

       -mdynamic-no-pic<arg>

       -mlarge-data-threshold=<arg>

       -mmacos-version-min=<arg>, -mmacosx-version-min=<arg>

       Set macOS deployment target

       -mprefer-vector-width=<arg>

       Specifies preferred vector width for auto-vectorization. Defaults to 'none' which allows target specific decisions.

       -mrecip

       Equivalent to '-mrecip=all'

       -mrecip=<arg1>,<arg2>...

       Control use of approximate reciprocal and reciprocal square root instructions followed by <n> iterations of Newton-Raphson refinement. <value> = ( ['!'] ['vec-'] ('rcp'|'sqrt') [('h'|'s'|'d')] [':'<n>] ) | 'all' | 'default' | 'none'

       -mrvv-vector-bits=<arg>

       Specify the size in bits of an RVV vector register. Defaults to the vector length agnostic value of "scalable". Accepts power of 2 values between 64 and 65536. Also accepts "zvl" to use the value implied by -march/-mcpu. (RISC-V only)

       -mtune=<arg>

       Only supported on AArch64, PowerPC, RISC-V, SPARC, SystemZ, and X86

   AARCH64
       -msve-vector-bits=<arg>

       Specify the size in bits of an SVE vector register. Defaults to the vector length agnostic value of "scalable". (AArch64 only)

   X86
       -mapx-features=<arg1>,<arg2>..., -mno-apx-features=<arg1>,<arg2>...

       Enable features of APX. <arg> must be 'egpr', 'push2pop2', 'ppx', 'ndd', 'ccmp', 'nf', 'cf' or 'zu'.

       -mevex512, -mno-evex512

   LoongArch
       -mannotate-tablejump, -mno-annotate-tablejump

       Annotate table jump instructions to correlate them with the jump table.

       -mdiv32, -mno-div32

       Use div.w[u] and mod.w[u] instructions with input not sign-extended.

       -mfrecipe, -mno-frecipe

       Enable frecipe.{s/d} and frsqrte.{s/d}

       -mlam-bh, -mno-lam-bh

       Enable amswap[_db].{b/h} and amadd[_db].{b/h}

       -mlamcas, -mno-lamcas

       Enable amcas[_db].{b/h/w/d}

       -mlasx, -mno-lasx

       Enable Loongson Advanced SIMD Extension (LASX).

       -mld-seq-sa, -mno-ld-seq-sa

       Do not generate same-address load-load barrier instructions (dbar 0x700)

       -mlsx, -mno-lsx

       Enable Loongson SIMD Extension (LSX).

       -mscq, -mno-scq

       Enable sc.q instruction.

       -msimd=<arg>

       Select the SIMD extension(s) to be enabled in LoongArch: 'none', 'lsx' or 'lasx'.

   Optimization level
       Flags controlling how much optimization should be performed.

       -O<arg>

       -Ofast<arg>

       Deprecated; use '-O3 -ffast-math -fstack-arrays' for the same behavior, or '-O3 -fstack-arrays' to enable only conforming optimizations

   Debug information generation
       Flags controlling how much and what kind of debug information should be generated.

   Kind and level of debug information
       -g

       Generate source-level debug information

   Debug level
       -g0

       -g2

       -g3

       -gline-directives-only

       Emit debug line info directives only

       -gline-tables-only, -g1

       Emit debug line number tables only

   Linker options
       Flags that are passed on to the linker

       -L<dir>

       Add directory to library search path

       -Wl,<arg>,<arg2>...

       Pass the comma separated arguments in <arg> to the linker

       -Xlinker <arg>

       Pass <arg> to the linker

       -Xoffload-linker<triple> <arg>

       Pass <arg> to the offload linkers or the ones identified by -<triple>

       -l<arg>

       -nostdlib

       -rdynamic

       -rpath <arg>

       -shared, --shared

       -shared-libflangrt

       Link the flang-rt shared library

       -static, --static

       -static-libflangrt

       Link the flang-rt static library

       -stdlib

AUTHOR
       Flang Contributors

COPYRIGHT
       2017-2025, The Flang Team

21                                                                                                                                                  Jun 22, 2025                                                                                                                                            FLANG(1)
```
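
As a quick cross-check of the Actions section against the stages listed under DESCRIPTION, here is a rough Python sketch of which stages each action flag stops after. The stage labels, the `LAST_STAGE` table and the `stages_for` helper are made up for illustration and are not part of Flang. The sketch takes the first action flag it sees and does not try to model how the real driver resolves several action flags at once, nor modifiers such as -emit-llvm.

```python
# Stage labels paraphrase the DESCRIPTION section of the man page above.
STAGES = ["preprocess", "parse+semantics", "codegen", "assemble", "link"]

# Last stage that runs for each action flag, as read from the Actions section.
# This table is illustrative only; Flang does not expose anything like it.
LAST_STAGE = {
    "-E": "preprocess",                  # "Only run the preprocessor"
    "-fsyntax-only": "parse+semantics",  # preprocessor, parser, semantic analysis
    "-S": "codegen",                     # preprocess and compilation steps
    "-c": "assemble",                    # preprocess, compile, and assemble steps
}


def stages_for(args):
    """Return the stages implied by the first action flag in args.

    With no action flag, every stage (including linking) is returned.
    """
    for arg in args:
        if arg in LAST_STAGE:
            stop = STAGES.index(LAST_STAGE[arg])
            return STAGES[: stop + 1]
    return list(STAGES)


if __name__ == "__main__":
    for cmd in (["hello.f90"], ["-c", "hello.f90"], ["-fsyntax-only", "hello.f90"]):
        print(" ".join(["flang"] + cmd), "->", ", ".join(stages_for(cmd)))
```

For example, `stages_for(["-S", "hello.f90"])` yields the first three stages, which matches the "typically a .s file" wording in the Code Generation and Optimization paragraph.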

https://github.com/llvm/llvm-project/pull/144948